Tying SPI and DMA together

Mar 2017

If you consider µCs to be incapable of any “serious” data handling, then you’ll be in for a treat.

The following design was created for an upcoming project, which needs a fairly high-speed path for handling requests and transferring 512-byte blocks of data to and from an SD card.

One option is to act as a slave-side SPI device. Here’s how SPI works, courtesy of Wikipedia:

Note that the master drives the clock, which makes the two shift registers exchange one bit per clock cycle.

SPI is stunningly simple and elegant when only two devices are involved. After 8 clock cycles, one byte will have been transferred from the master to the slave, and one byte will have moved back in the other direction (in many cases, one of the two directions is ignored). And just by adding an SPI “select” line, the master can signal to the slave when a transaction starts and when it is complete.

Typical µC SPI hardware raises a request (an interrupt or a DMA request) every 8 clock cycles, i.e. once per byte, while the falling and rising edges of that extra “select” line delimit the beginning and end of each data packet, respectively.
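To make the framing idea concrete, here is a small host-side sketch that groups received bytes into packets using the select-line edges. The event trace, the `Event`/`Packet` types, and the size limits are all hypothetical; it assumes SEL is active-low (falling edge starts a packet, rising edge ends it), as used later in this article.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LEN 16   /* hypothetical maximum packet size for this sketch */

/* What a slave "sees": SEL edges, plus one byte every 8 clocks. */
typedef enum { EV_SEL_FALL, EV_SEL_RISE, EV_BYTE } EvKind;
typedef struct { EvKind kind; uint8_t value; } Event;
typedef struct { uint8_t data[MAX_LEN]; size_t len; } Packet;

/* Split an event trace into packets delimited by SEL edges.
   Bytes arriving while SEL is high (not selected) are ignored.
   Returns the number of complete packets stored in `out`. */
static size_t frame_packets(const Event *ev, size_t n, Packet *out, size_t max_pkts) {
    size_t count = 0;
    int selected = 0;
    for (size_t i = 0; i < n && count < max_pkts; i++) {
        Packet *p = &out[count];
        switch (ev[i].kind) {
        case EV_SEL_FALL:            /* falling edge: a new packet begins */
            selected = 1;
            p->len = 0;
            break;
        case EV_BYTE:                /* keep bytes only while selected */
            if (selected && p->len < MAX_LEN)
                p->data[p->len++] = ev[i].value;
            break;
        case EV_SEL_RISE:            /* rising edge: the packet is complete */
            if (selected)
                count++;
            selected = 0;
            break;
        }
    }
    return count;
}
```

On real hardware the bytes would come from the SPI peripheral and the edges from a pin interrupt; this only shows why two edges are enough to delimit packets of any length.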

By adding one more pin (let’s call it “BUSY”) from slave to master, the slave can also let the master know when it has processed an incoming request and is ready to provide a reply.

So all in all, 5 I/O pins (CLK, MOSI, MISO, SEL, and BUSY) are sufficient to send and receive “packets” in both directions between a master and a slave. In fact, a very similar mechanism is used between a µC (as master) and an SD card (as slave), except that there the busy signalling takes place on the MISO pin.

The transfer clock can run at a very high rate when the signal paths are short, say a few centimeters. You can easily clock at 8 MHz this way, and since each byte takes 8 clock cycles, that transfers one byte per microsecond.
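The throughput arithmetic, as a tiny sketch: SPI moves one bit per clock, so a byte takes 8 clocks. The helper names are made up for illustration, and the figures ignore any gaps between bytes or packets.

```c
#include <stdint.h>

/* One bit per clock, 8 bits per byte. */
static uint32_t spi_bytes_per_sec(uint32_t sclk_hz) {
    return sclk_hz / 8;
}

/* Microseconds needed to clock `nbytes` at `sclk_hz`, with no inter-byte gaps. */
static double spi_transfer_us(uint32_t sclk_hz, uint32_t nbytes) {
    return (double)nbytes * 8.0 * 1e6 / (double)sclk_hz;
}
```

At 8 MHz this gives 1,000,000 bytes per second, so one 512-byte SD card block takes 512 µs; at 18 MHz the same block would take about 228 µs.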

But there’s a catch: since the master drives the clock, it can simply hold off clocking until it’s ready to send and receive data. That is why an SPI master is so easy to implement in software: on the master side, SPI transfers are automatically throttled by the µC.
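A minimal sketch of such a software (bit-banged) SPI master, assuming mode 0 and MSB-first transfers. The `SpiPins` hooks are hypothetical stand-ins for GPIO writes and reads, and `BitSlave` is a simulated slave shift register used only so the sketch can run on a host. Since the master owns the clock, it could pause for any length of time between two edges and the slave would simply wait.

```c
#include <stdint.h>

/* Hypothetical pin hooks; on a real µC these would drive or read GPIO pins. */
typedef struct {
    void (*set_sclk)(int level, void *ctx);
    void (*set_mosi)(int level, void *ctx);
    int  (*get_miso)(void *ctx);
    void *ctx;
} SpiPins;

/* Mode 0, MSB first: set MOSI, raise CLK (both sides sample), lower CLK (both shift). */
static uint8_t spi_xfer(const SpiPins *p, uint8_t out) {
    uint8_t in = 0;
    for (int bit = 7; bit >= 0; bit--) {
        p->set_mosi((out >> bit) & 1, p->ctx);
        p->set_sclk(1, p->ctx);
        in = (uint8_t)((in << 1) | (p->get_miso(p->ctx) & 1));
        p->set_sclk(0, p->ctx);
    }
    return in;
}

/* Simulated slave: an 8-bit shift register whose MSB drives MISO. */
typedef struct { uint8_t reg; int mosi; int sampled; } BitSlave;

static void sim_sclk(int level, void *c) {
    BitSlave *s = c;
    if (level)
        s->sampled = s->mosi;                           /* rising edge: sample MOSI */
    else
        s->reg = (uint8_t)((s->reg << 1) | s->sampled); /* falling edge: shift it in */
}
static void sim_mosi(int level, void *c) { ((BitSlave *)c)->mosi = level; }
static int  sim_miso(void *c) { return (((BitSlave *)c)->reg >> 7) & 1; }
```

After one call to `spi_xfer`, the two registers have swapped contents: the master holds the slave's old byte and the slave holds the master's byte, which is exactly the exchange described above.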

On the slave side, we don’t have that luxury: bits arrive at a rate we can’t control. Worse, there’s no easy way for the master to see whether the bits are being received and sent correctly: all it can do is drive the MOSI pin and sample the MISO pin on clock edges.

If the maximum speed is low enough, transfers can be handled by polling the SPI peripheral in software, or with an interrupt generated for each byte. But with a clock rate of 8 MHz or more, the CPU won’t have enough time for this. That’s where DMA comes in: the DMA controller moves data directly between the SPI hardware and a memory buffer, without tying up the CPU at all.
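A quick way to see why per-byte interrupts don't scale is to count the CPU cycles available per byte. The sketch below assumes the F103's 72 MHz system clock used in this article; the helper name is illustrative.

```c
#include <stdint.h>

/* CPU cycles available per SPI byte (bytes per second = SPI clock / 8). */
static uint32_t cycles_per_byte(uint32_t cpu_hz, uint32_t sclk_hz) {
    return cpu_hz / (sclk_hz / 8);
}
```

At 8 MHz that leaves 72 cycles per byte, and at 18 MHz only 32. Those cycles would have to cover interrupt entry and exit as well as the actual work, which leaves little or nothing for anything else.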

With DMA, we can easily handle a byte per microsecond on a µC like the F103 running at 72 MHz. Since SPI is bidirectional, we need two DMA “channels” enabled at the same time: one to take bytes from SPI and store them in a memory buffer, and one to feed bytes from a second buffer to SPI. This setup must be repeated before each transfer.

The STM32F103’s DMA controller supports up to 7 concurrent transfers, but each peripheral is tied to a fixed channel:

Let’s use SPI2, which means setting up and activating DMA channel 4 (SPI2 RX) and channel 5 (SPI2 TX).
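Here is a rough sketch of re-arming both channels before a transfer. It uses a simulated `DmaChan` struct instead of the real DMA1 registers so it can run on a host, and the helper names are hypothetical. The bit positions (EN, DIR, MINC in DMA_CCRx; RXDMAEN, TXDMAEN in SPI_CR2) and the SPI2 data register address follow the STM32F1 reference manual.

```c
#include <stdint.h>

#define DMA_CCR_EN       (1u << 0)
#define DMA_CCR_DIR      (1u << 4)   /* 1 = read from memory, write to peripheral */
#define DMA_CCR_MINC     (1u << 7)   /* increment the memory address after each byte */
#define SPI_CR2_RXDMAEN  (1u << 0)
#define SPI_CR2_TXDMAEN  (1u << 1)
#define SPI2_DR_ADDR     0x4000380Cu /* SPI2 base 0x40003800 + DR offset 0x0C */

/* Simulated channel registers (stand-ins for DMA1 channel 4 and 5).
   CPAR/CMAR are 32 bits on the chip; uintptr_t lets this run on a host. */
typedef struct { uint32_t ccr, cndtr; uintptr_t cpar, cmar; } DmaChan;

/* Re-arm one channel: disable it, program addresses and count, then enable.
   CNDTR, CPAR and CMAR must not be written while the channel is enabled. */
static void dma_arm(DmaChan *ch, const void *buf, uint16_t len, int mem_to_periph) {
    ch->ccr &= ~DMA_CCR_EN;
    ch->cpar = SPI2_DR_ADDR;
    ch->cmar = (uintptr_t)buf;
    ch->cndtr = len;
    /* PSIZE = MSIZE = 0 (8-bit transfers), priority left at low */
    ch->ccr = DMA_CCR_MINC | (mem_to_periph ? DMA_CCR_DIR : 0u);
    ch->ccr |= DMA_CCR_EN;
}

/* Arm channel 4 (SPI2 RX -> rxbuf) and channel 5 (txbuf -> SPI2 TX),
   then let SPI2 issue DMA requests in both directions. */
static void spi2_arm(DmaChan *ch4_rx, DmaChan *ch5_tx, uint32_t *spi2_cr2,
                     uint8_t *rxbuf, const uint8_t *txbuf, uint16_t len) {
    dma_arm(ch4_rx, rxbuf, len, 0);
    dma_arm(ch5_tx, txbuf, len, 1);
    *spi2_cr2 |= SPI_CR2_RXDMAEN | SPI_CR2_TXDMAEN;
}
```

On the chip, the same writes would target the DMA1 channel 4/5 and SPI2 registers directly; the sequence is what matters here: disable, reprogram, enable, once per transfer.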

Here is the logic which needs to be implemented:

Note that the master is always in control of the transfers (in both directions); the BUSY signal only keeps the master waiting while the slave is handling the request.

Note also that the direction of the data in the second transfer depends on the request: useful data may flow either way.

By convention, the first byte from the master is the request code. In the second transfer, this first byte should be zero, since it’s not a new request but the concluding transfer of data.
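The two-transfer handshake can be simulated in a few lines. Everything here is hypothetical: `SlaveSim` is a stand-in for the real slave, the "reply" it computes is arbitrary, packets have a fixed size of 8 bytes, and only the read direction is shown. In this single-threaded model, the master's wait on BUSY simply runs the slave task until BUSY drops.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PKT 8   /* hypothetical fixed packet size; n must not exceed this */

typedef struct {
    bool busy;          /* BUSY pin, driven by the slave */
    uint8_t rx[PKT];    /* bytes received in the last transfer */
    uint8_t tx[PKT];    /* bytes primed for the next transfer */
    int requests;       /* number of requests handled */
} SlaveSim;

/* One SEL-delimited transfer: bytes swap places, then SEL rises and
   the slave's pin interrupt raises BUSY. */
static void transfer(SlaveSim *s, const uint8_t *out, uint8_t *in, int n) {
    for (int i = 0; i < n; i++) {
        in[i] = s->tx[i];
        s->rx[i] = out[i];
    }
    s->busy = true;
}

/* Slave task: non-zero first byte = new request, zero = concluding transfer. */
static void slave_task(SlaveSim *s) {
    if (s->rx[0] != 0) {
        for (int i = 0; i < PKT; i++)
            s->tx[i] = (uint8_t)(s->rx[0] + i);   /* arbitrary demo reply */
        s->requests++;
    }
    s->busy = false;    /* reply primed: the master may continue */
}

/* Master side of one request/reply exchange. */
static void request(SlaveSim *s, uint8_t code, uint8_t *reply, int n) {
    uint8_t out[PKT] = {0}, ignored[PKT];
    out[0] = code;
    transfer(s, out, ignored, n);   /* 1st transfer: request code in byte 0 */
    while (s->busy)
        slave_task(s);              /* master waits for BUSY to drop */
    memset(out, 0, sizeof out);     /* 2nd transfer: byte 0 is 0, not a new request */
    transfer(s, out, reply, n);
    while (s->busy)
        slave_task(s);
}
```

The reply only becomes visible in the second transfer, because the slave can't prepare it until it has seen the request, which is why every request costs two SEL-delimited transfers.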

Here is a demonstration of the whole process, as seen with a logic analyser:

On the slave side, the trick is to use a pin interrupt on the rising edge of SEL, which occurs at the end of each transfer. The rest can be handled by DMA, with no involvement of the CPU at all (and hence at ridiculously high speeds, if needed).

As in the previous article, we can use an interrupt on SEL (a pin change interrupt in this case) to wake up a task created specifically to handle these requests. Without going into the details here, you should be able to see that it’s the same trick as before:

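[: BUSY ios!            \ raise BUSY: the master must wait
   12 bit EXTI-PR !     \ clear the pending flag of EXTI line 12 (SEL)
   slavetask wake       \ hand the request over to the slave task
;] irq-exti10 !         \ install as the handler for EXTI lines 10..15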

Every time SEL goes high, this code runs: it sets BUSY high, clears the pending interrupt flag, and wakes slavetask. From then on, multitasking takes over, and it really doesn’t matter how long the slave task takes, since BUSY keeps the master waiting.

The slave task (details to follow in an upcoming article) contains all the logic to tie SPI’s RX and TX sides to two DMA channels, writing and reading two different buffers, in parallel!

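: slave&  \ this task will process all incoming SPI2 requests
  slavetask background
  begin
    stop  \ sleep until the SEL interrupt handler wakes us up

    vreqbuf c@ case
      \ ... here is the dispatch code to handle each incoming request
    endcase

    BUSY ioc!  \ reply is ready: lower BUSY so the master can proceed
  again ;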

There is one very tricky aspect here (isn’t there always?): in slave mode, the SPI hardware’s TX side must be fed the first byte to send out before the actual transfer starts. You can see why from the master/slave diagram at the start of this article: the moment a master clock pulse comes in, the slave hardware must start sending out the first reply bit, and there is no way for the slave to know in advance when, or at what rate, this will happen. On the slave side, we’re at the mercy of the master’s control of SEL and CLK, so we always have to be ready for action.
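A deliberately simplified model of this priming issue, not the real STM32 register layout: `SpiTx` stands for the TX data register and its "TX empty" flag, and `TxDma` for the TX DMA channel. All names are hypothetical. The point it shows: enabling TX DMA while TXE is set loads the first reply byte right away, so doing that *before* lowering BUSY guarantees the master's first clocks shift out real data instead of whatever was left in the register.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t dr;    /* next byte to go out on MISO */
    bool txe;      /* "TX empty": the data register may be refilled */
} SpiTx;

typedef struct {
    const uint8_t *src;
    int remaining;
    bool enabled;
} TxDma;

/* While TXE is set, an enabled TX DMA channel refills the data register. */
static void dma_service(SpiTx *spi, TxDma *dma) {
    if (spi->txe && dma->enabled && dma->remaining > 0) {
        spi->dr = *dma->src++;
        dma->remaining--;
        spi->txe = false;
    }
}

/* The master clocks one byte: whatever is in the data register goes out,
   primed or not, and DMA immediately refills it. */
static uint8_t master_clocks_byte(SpiTx *spi, TxDma *dma) {
    uint8_t sent = spi->dr;
    spi->txe = true;
    dma_service(spi, dma);
    return sent;
}

/* Last step of the slave task: arm TX DMA first (priming the register at
   once, since TXE is set), and only then lower BUSY. Returns BUSY's level. */
static bool arm_reply(SpiTx *spi, TxDma *dma, const uint8_t *reply, int len) {
    dma->src = reply;
    dma->remaining = len;
    dma->enabled = true;
    dma_service(spi, dma);   /* the first reply byte is now in place */
    return false;            /* BUSY low: the master may start clocking */
}
```

Without `arm_reply`, the first byte clocked out would be the stale register contents, which is exactly what the slave can never afford, since it doesn't know when the master will start.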

Note the implicit logic behind all this: when SEL goes high, BUSY is raised and the slave task is woken up; once the task has handled the request and set up DMA for the reply, it lowers BUSY again.

This design works surprisingly well: it supports SPI clock rates up to 18 MHz (one quarter of the slave’s 72 MHz system clock!), and generates only two interrupts per request/reply exchange, one at the end of each transfer. Those are the only moments the CPU gets involved, when the slave task is woken up to perform some real work.

Apart from that, there is virtually no load on the slave side: it’s all handled by the DMA controller. The CPU is free to do whatever it wants. It could be doing interactive stuff over serial, compiling Forth code, performing SD card I/O… or even all of that at the same time.

Which… is what we’re about to do next. Stay tuned!

Weblog © Jean-Claude Wippler. Generated by Hugo.