A fast µC to FPGA bus

Nov 23, 2016

For reasons which will become clear later, I’d like to exchange data quickly between an STM32 µC and an FPGA. SPI is a serial bus, which can be pushed to several dozen Mbit/sec - but what if we want more?

Suppose we could “map” the FPGA into the STM32’s memory space and then simply read and write bytes? No handshaking, no interrupts, no polling. One advantage of such an approach is that it could transfer data really quickly through DMA - without tying up the µC’s CPU at all.

As it so happens, most STM32 chips with 100 pins or more include a “Flexible Static Memory Controller” which can do just that. And in an earlier weblog post I already used DMA to pump data into the (built-in) D/A converter at 2.7 million samples per second.

If we can make the FPGA act as a memory chip of some sort, and make it compatible with the FSMC, then that built-in hardware could take care of all the pesky little details.

It turns out that the simplest interface for bulk data transfers is the one implemented in NAND flash chips. These are not truly random-access: you send it a block number, and then you can read or write a bunch of consecutive bytes and it’ll auto-increment its internal address on each cycle.
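To make the auto-increment idea concrete, here is a minimal host-side sketch in Python. It is only a toy model of the behaviour described above, not the FPGA logic itself: the class name `NandStyleDevice`, its method names, and the 256-word size are all made up for illustration.

```python
class NandStyleDevice:
    """Toy model of an auto-incrementing, NAND-flash-like data port (hypothetical)."""

    def __init__(self, size_words=256):
        self.mem = [0] * size_words   # 16-bit words held by the device
        self.addr = 0                 # internal address counter

    def set_address(self, addr):
        """Address cycle: latch the starting position."""
        self.addr = addr % len(self.mem)

    def write_word(self, value):
        """Data write cycle: store one 16-bit word, then auto-increment."""
        self.mem[self.addr] = value & 0xFFFF
        self.addr = (self.addr + 1) % len(self.mem)

    def read_word(self):
        """Data read cycle: fetch one 16-bit word, then auto-increment."""
        value = self.mem[self.addr]
        self.addr = (self.addr + 1) % len(self.mem)
        return value


dev = NandStyleDevice()

# One address cycle, then consecutive writes with no further addressing.
dev.set_address(0x40)
for w in (0x1234, 0x4321, 0x5678):
    dev.write_word(w)

# Rewind with another address cycle and read the same words back.
dev.set_address(0x40)
readback = [dev.read_word() for _ in range(3)]
print([f"{w:04X}" for w in readback])
```

The point of the model is that the address is sent once per burst: after that, every data cycle moves exactly one word and the device keeps track of where it is.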

A benefit of this bus protocol is that it needs relatively few I/O pins: 21 pins are enough to transfer data one 16-bit word at a time.

This Storm IV FPGA board has just enough free pins to set up a test jig, using a compact STM32F103ZE board from eBay on top:

Time for some pin-planning and soldering!

Now we need to get the two talking, to verify that it all works. Let me just say that it took quite a bit of debugging to get there! It all kinda-sorta worked fairly quickly, but I kept seeing off-by-one errors, and the data read back did not match what had been sent out to the FPGA.

This is the time when a Logic Analyser from Saleae turns out to be invaluable. I didn’t want to solder wires to attach the probes to the above setup, so instead I simply hooked up another STM32F103ZE board:

It turns out that I was running the FSMC controller slightly too fast, and using the wrong edges of the read and write pulses. Once fixed, everything fell into place:

For testing, I just can’t get enough of Forth: easy to set up, fast, and great for interactive development. You simply cannot beat the “edit-compile-run” cycle when there is no compilation. There is a 1-sec upload delay, due to the way I re-send entire modified source files, but that’s of a different order.

The full test code (all 45 lines) is on GitHub. Here is the test data and basic test:

create wdata
hex
  1234 h, 4321 h, 5678 h, 8765 h, 2345 h, 5432 h, 6789 h, 9876 h,
  8080 h, 4040 h, 2020 h, 1010 h, 0808 h, 0404 h, 0202 h, 0101 h,
  1111 h, 2222 h, 3333 h, 4444 h, 5555 h, 6666 h, 7777 h, 8888 h,
decimal

: test
  fpga-init
  $00 wdata      fpga-write  $00 rdata fpga-read  show
  $40 wdata 16 + fpga-write  $40 rdata fpga-read  show
                             $00 rdata fpga-read  show
                             $40 rdata fpga-read  show
                             $80 rdata fpga-read  show
; test

And this is what gets reported via serial:

 1234 4321 5678 8765 2345 5432 6789 9876
 8080 4040 2020 1010 0808 0404 0202 0101 0000
 8080 4040 2020 1010 0808 0404 0202 0101
 1111 2222 3333 4444 5555 6666 7777 8888 0000
 1234 4321 5678 8765 2345 5432 6789 9876
 8080 4040 2020 1010 0808 0404 0202 0101 0000
 8080 4040 2020 1010 0808 0404 0202 0101
 1111 2222 3333 4444 5555 6666 7777 8888 0000
 0000 0000 0000 0000 0000 0000 0000 0000
 0000 0000 0000 0000 0000 0000 0000 0000 0000 ok.

Each test writes eight 32-bit values, which the FSMC then sends across as 16-bit words (since this interface is 16-bit only). Then we read back 8 values and print them out. Each test sends different values, and the results are indeed correct.
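As a rough illustration of that splitting, the Python sketch below breaks 32-bit values into the 16-bit words that would appear on the bus. The helper name `split_to_bus_words` is hypothetical, and the ordering (low half first) is an assumption based on the STM32 being little-endian and on the order seen in the output above; it is not taken from the test code.

```python
def split_to_bus_words(values32):
    """Split 32-bit values into 16-bit bus words, low half first (assumed order)."""
    words = []
    for v in values32:
        words.append(v & 0xFFFF)           # low 16 bits: first bus cycle
        words.append((v >> 16) & 0xFFFF)   # high 16 bits: second bus cycle
    return words


# The halfwords 1234 4321 5678 8765 sit in memory as two little-endian
# 32-bit values, 0x43211234 and 0x87655678.
bus_words = split_to_bus_words([0x43211234, 0x87655678])
print(" ".join(f"{w:04X}" for w in bus_words))
```

Under that assumption, eight 32-bit writes turn into sixteen bus cycles, and the words arrive at the FPGA in the same order as they are laid out in `wdata`.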

As a quick timing test, let’s read 2048 bytes:

: timing ( n -- )  \ time n 32-bit reads via the FSMC (512 reads = 1024 16-bit words)
  micros swap 0 do NAND @ drop loop micros swap - . ;
512 timing 236  ok.

As you can see, this takes 236 µs. A bare loop takes 43 µs, so the actual transfers take under 200 µs. That’s 2 KB in 200 µs: this setup is transferring data at > 10 MB/sec!
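The arithmetic behind that figure can be checked in a few lines of Python. The numbers are the ones reported above; the variable names are mine, and the 4 bytes per read assumes that each `NAND @` is a 32-bit fetch (which is what makes 512 reads add up to 2048 bytes).

```python
reads = 512            # loop count passed to "timing"
bytes_per_read = 4     # assumed: each "NAND @" fetches 32 bits
total_us = 236         # measured time for the full loop
loop_us = 43           # measured time for a bare loop

total_bytes = reads * bytes_per_read        # 2048 bytes
transfer_us = total_us - loop_us            # time spent on the bus transfers
rate_mb_s = total_bytes / transfer_us       # bytes per µs equals MB/s (10^6 bytes)
ns_per_bus_word = 1000 * transfer_us / (total_bytes // 2)  # per 16-bit cycle

print(f"{total_bytes} bytes in {transfer_us} µs")
print(f"{rate_mb_s:.1f} MB/s, about {ns_per_bus_word:.0f} ns per 16-bit word")
```

Even when the loop overhead is left in, the rate is 2048 bytes / 236 µs, or roughly 8.7 MB/s.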

Best of all, the transfer mechanism has become invisible: we can just write a position to “NAND-ADR” and then read or write consecutive words from/to “NAND”. From a µC code perspective, they’re both simply memory addresses - it’s magic!

Weblog © Jean-Claude Wippler. Generated by Hugo.