The amazing world of DMA
Mar 2016
There are a lot of features hiding in today’s microcontrollers - even the STM32F103 series includes some very nice peripherals:
- 2 to 3 A-to-D converters, sampling up to a million times per second
- on the larger devices: dual D-to-A converters, with 3 µs rise times
- 2 to 3 hardware SPI interfaces, supporting up to 18 Mbit/s
- 2 to 5 universal serial ports, some of them supporting up to 4.5 Mbit/s
That’s a lot of data, once you start using these peripherals.
With polling, it would be very hard to sustain any substantial data rates, let alone handle I/O from several peripherals all going on at the same time.
With interrupts, it becomes easier to deal with timing from different sources, but you also need to be extra careful to avoid race conditions - which can be very hard to debug and get 100% right.
But there’s also another problem with interrupts: overhead.
To “service” an interrupt, the CPU must stop what it’s doing, save the state, and switch to the interrupt handler. And when the handler returns, it must restore the state before the original code can be resumed. This can eat up quite a few clock cycles, if only to get that saved state in and out of memory. And it leads to latency, before the interrupt handler can perform its task.
In many situations, the sustained data rates are not actually that high. We may be receiving the bytes of a packet, or lines from a serial link, or sending out a reply to an earlier request. Even at top speed, all we really need is to efficiently collect (or emit) a certain number of bytes, and then we can deal with them all at once at a considerably slower pace.
One solution for this is to add FIFOs to each peripheral: that way they can collect all incoming bytes without losing any, even if the CPU isn’t using that data right away. Likewise for output: the CPU can fill an outbound FIFO as soon as it likes, and then move on to other tasks while the hardware clocks all those bytes out at the configured rate. But it’s expensive in terms of silicon.
Meet the Direct Memory Access controller: another brilliant hardware peripheral, whose only task is to move data around. In a way, it’s like a little CPU without computational capability - all it can do is fetch, store, count, and increment its internal address registers.
The DMA “engine” of an STM32F103 chip has 7 to 12 channels depending on chip model, which can each move data around independently. These can be set up to either send or receive data from an ADC, DAC, SPI, USART, etc.
As with interrupts, DMA performs data transfers without having to continuously poll. The code which is currently running need not be aware of it. The difference from interrupts is that even the CPU is not aware of these data transfers: DMA operates next to the CPU, grabbing its own access to peripherals and memory, and “stealing” memory cycles to perform its transfers. There’s “arbitration” involved, to keep all these cats, eh, bus masters out of each other’s way.
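To make the "fetch, store, count, increment" idea a bit more concrete, here is a small software model of a single circular DMA channel. It is only a sketch: the names (`dma-init`, `dma-step` and the four variables) are made up for illustration, and real DMA does all of this in hardware without running any code at all.

```forth
\ Software model of one circular DMA channel (illustration only).
\ All names below are hypothetical, not part of any library.
variable dma-addr    \ address of the next item to fetch
variable dma-left    \ items left in the current pass
variable dma-start   \ reload value for the address
variable dma-count   \ reload value for the counter

: dma-init ( addr count -- )  \ remember the buffer and start at its beginning
  dup dma-count !  dma-left !
  dup dma-start !  dma-addr ! ;

: dma-step ( -- x )  \ model one request: return the item the peripheral would get
  dma-addr @ @             \ fetch the current item
  1 cells dma-addr +!      \ increment the address register
  -1 dma-left +!           \ count down
  dma-left @ 0= if         \ circular mode: reload address and counter
    dma-start @ dma-addr !
    dma-count @ dma-left !
  then ;

create demo  10 , 20 , 30 ,   \ a three-item "wave table"
demo 3 dma-init
dma-step . dma-step . dma-step . dma-step .  \ prints 10 20 30 10
```

The fourth step wraps around to the start of the buffer, which is the behaviour circular mode provides for free in the real hardware.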
Here is an overview from the STM32F103 Reference Manual:
Similar to the FSMC in the previous article, it takes a bit of tinkering to set up a DMA stream, but the gains can be substantial. Imagine pushing 1 KB of data from RAM to a Digital-to-Analog converter (present on higher-end chip models):
- with DMA, the transfer of each 12-bit value will take one memory bus cycle
- with interrupts, it’s more like 20..50 CPU and memory cycles, from interrupt begin to end
If you’re feeding the DAC with values at 1 million samples per second, then this overhead will add up - to the point that an interrupt-based implementation might not even be fast enough!
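A rough back-of-the-envelope check of that claim, assuming a 72 MHz core clock (the maximum for the STM32F103; the actual clock setting is not stated here) and using a hypothetical helper word `irq-load%`:

```forth
\ Percentage of CPU cycles eaten by one interrupt per sample.
\ Assumptions: 72 MHz core clock; irq-load% is a made-up name.
: irq-load% ( cycles-per-irq samples-per-sec -- percent )
  * 100 72000000 */ ;   \ */ keeps a double-width intermediate result

20 1000000 irq-load% .  \ prints 27
50 1000000 irq-load% .  \ prints 69
```

So under these assumptions, somewhere between a quarter and two thirds of the CPU would be spent just getting in and out of the interrupt handler, before it does any useful work.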
Let’s try this. We’re going to use the same Hy-MiniSTM32V as with the FSMC. We’ll set up DMA in circular mode, causing it to send out values to the DAC from a fixed-size buffer over and over again. And to get a bit fancy, we’ll store the values of a sine wave in that buffer, so that a real (analog!) sine wave should come out once this all starts running. Code on GitHub, as usual.
First some basic non-DMA code to initialise and send values to both DACs:
    : 2dac! ( u1 u2 -- )  \ send values to each of the DACs
      16 lshift or DAC-DHR12RD ! ;

    : +dac ( -- )  \ initialise the two D/A converters on PA4 and PA5
      29 bit RCC-APB1ENR bis!  \ DACEN clock enable
      IMODE-ADC PA4 io-mode!
      IMODE-ADC PA5 io-mode!
      $00010001 DAC-CR !  \ enable channel 1 and 2
      0 0 2dac! ;

That’s the basic DAC peripheral. Fairly simple to set up and use from code. Note that 2dac! is defined first, since +dac calls it.
Here’s the gist of the DMA setup code (details omitted for brevity):
    : dac1-dma ( addr count -- )  \ feed DAC1 from wave table at given address
      1 bit RCC-AHBENR bis!  \ DMA2EN clock enable
      [...] DMA2-CNDTR3 !
      [...] DMA2-CMAR3 !
      [...] DMA2-CPAR3 !
      [...] DMA2-CCR3 !
      \ set up DAC1 to convert on each write from DMA2
      12 bit DAC-CR bis! ;
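The value written to the channel configuration register is among the omitted details. As an illustration only, and not necessarily what the actual code uses, here is how the flags for a circular, memory-to-peripheral transfer of 16-bit items could be combined. The bit positions follow the DMA_CCRx layout of the STM32F1 family; the constant names and the `dac-ccr` word are hypothetical.

```forth
\ Sketch only: one plausible channel configuration word, not the author's code.
$0001 constant DMA-EN       \ bit 0: channel enable
$0010 constant DMA-DIR      \ bit 4: read from memory, write to peripheral
$0020 constant DMA-CIRC     \ bit 5: circular mode, reload when the count hits 0
$0080 constant DMA-MINC     \ bit 7: increment the memory address after each item
$0100 constant DMA-PSIZE16  \ bits 9:8 = %01: 16-bit peripheral accesses
$0400 constant DMA-MSIZE16  \ bits 11:10 = %01: 16-bit memory accesses

: dac-ccr ( -- u )  \ combine the flags into a single register value ($5B1)
  DMA-EN DMA-DIR or DMA-CIRC or DMA-MINC or DMA-PSIZE16 or DMA-MSIZE16 or ;
```

The peripheral address is left fixed (no PINC flag), since every value goes to the same DAC data register.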
But we also need to use a timer, to drive this process, since there is no incoming event to trigger this stream. The timer period determines how fast new values will be sent to the DAC:
    : dac1-awg ( u -- )  \ generate on DAC1 via DMA with given timer period
      6 +timer +dac wavetable 8192 dac1-dma fill-sinewave ;
This, and the code to fill a wavetable with sine values can be found here.
And that’s it. If we enter “12 dac1-awg”, then the DAC will start producing a really nice and well-formed 4096-sample sine wave, as can be seen in this oscilloscope capture from pin PA4:
The resulting 675.67 Hz output frequency closely matches this calculation, which works out to about 676.1 Hz:
36 MHz <APB1-bus-freq> / 4096 <samples> / (12 <timer-limit> + 1)
In case you’re wondering: DMA is now driving our DAC at over 2.7 million samples per second.
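Both figures can be checked at the Forth prompt with integer arithmetic. This is a minimal sketch: `timer-clock`, `wave-samples`, `awg-rate`, and `awg-freq` are hypothetical names, and the 36 MHz and 4096 values are simply taken from the calculation above.

```forth
\ Sanity check of the numbers above; all names here are made up.
36000000 constant timer-clock   \ Hz, the APB1 figure used in the calculation
4096 constant wave-samples      \ samples per sine wave period

: awg-rate ( timer-limit -- samples-per-sec )  \ timer counts limit+1 ticks per sample
  1+ timer-clock swap / ;

: awg-freq ( timer-limit -- hz )  \ one full wave takes wave-samples samples
  awg-rate wave-samples / ;

12 awg-rate .  \ prints 2769230
12 awg-freq .  \ prints 676
```

Integer division truncates, so the second result is the whole-Hz part of the roughly 676.1 Hz computed above.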
The DAC actually has several other intriguing capabilities, such as generating triangle waves and even mixing pseudo-random noise into its output. See the code on GitHub for some examples.
But perhaps the most impressive part is that all this is happening in the background. The µC continues to run Mecrisp Forth, and remains as responsive to our typed-in commands as before. The DAC has become totally autonomous: there is not even a single interrupt involved here!
Next up: let’s find out what DMA can do for us on the Analog-to-Digital side…