Great ADC/DMA performance
May 2016
For the “JEM” JeeLabs Energy Monitor, we’re going to need to put the ADC on the Olimexino’s STM32F103 to some serious work: the goal is to acquire 4 ADC channels at 25 kHz each, so that we can capture a full cycle of the 50 Hz AC mains signal with a resolution of 500 samples, as well as collecting the readings of up to three current transformers.
Since the AC mains voltage is sampled via the negative peaks of the incoming 9V AC supply, we only get half cycles, with flat segments in between. At 50 Hz each of these segments lasts 10 ms, and to reconstruct a full cycle we need to capture at least 3 of them: in the worst case, two flat ones with only one complete negative half cycle in between. This requires a data sampling window of at least 30 ms.
As described earlier, we’re going to aim for the following setup:
- single ADC, acquiring 4 channels every 40 µs
- for each channel, two buffers of 800 samples
- this gives an acquisition time of 32 ms per buffer
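The numbers above fit together as follows. This is only the arithmetic, written as a small host-side C sketch. The function and constant names are made up for illustration, and nothing here touches the hardware:

```c
#include <stdint.h>

/* Acquisition plan from the text: 4 channels scanned every 40 us,
   800 samples per channel per buffer half. Times in microseconds. */
enum {
    CHANNELS         = 4,
    SCAN_PERIOD_US   = 40,   /* one scan of all 4 channels */
    SAMPLES_PER_HALF = 800,  /* per channel, per buffer half */
    MAINS_HZ         = 50
};

/* Per-channel sample rate in Hz: 1 s / 40 us = 25000 */
static uint32_t channel_rate_hz(void) { return 1000000u / SCAN_PERIOD_US; }

/* Samples per channel during one 20 ms mains cycle: 25000 / 50 = 500 */
static uint32_t samples_per_mains_cycle(void) { return channel_rate_hz() / MAINS_HZ; }

/* Time to fill one buffer half: 800 * 40 us = 32000 us */
static uint32_t half_buffer_us(void) { return (uint32_t)SAMPLES_PER_HALF * SCAN_PERIOD_US; }

/* Minimum window: three 10 ms half-cycle segments = 30000 us */
static uint32_t min_window_us(void) { return 3u * (1000000u / MAINS_HZ / 2u); }
```

The 32 ms buffer period comfortably covers the 30 ms worst-case window.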
The STM32F103 has a very capable ADC subsystem, as seen in this diagram from the datasheet:
To distill some other relevant info from the datasheet for our use case:
- the ADC can take up to 1 million samples per second
- it’s slightly less when running at 72 MHz (max ≈ 850 ksps)
- there’s a “SCAN” mode to read 1..16 specific ADC channels in rapid succession
- the ADC can be triggered to run from a hardware timer, set to 25 kHz in this case
This means we’re getting one new ADC reading every 10 µs on average (4 channels × 25 kHz = 100,000 readings per second). There is one catch: in scan mode, the ADC can only be used in combination with DMA, because all regular channels share the single ADC1-DR data register. That makes sense anyway, since these data rates would completely overwhelm the CPU if handled through interrupts.
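The "slightly less at 72 MHz" remark can be reproduced with a little arithmetic. The sketch assumes the F1 constraints as I read them in RM0008:

- The ADC clock is APB2 divided by 2, 4, 6 or 8, and it must not exceed 14 MHz.
- The fastest conversion takes 14 ADC clock cycles (1.5 cycles sampling + 12.5 cycles conversion).

With APB2 at 72 MHz, /6 is the fastest legal divider. Here is a host-side C sketch (names are illustrative):

```c
#include <stdint.h>

/* Assumptions, not from the text: APB2 at 72 MHz, ADC prescaler /6,
   because /4 would give 18 MHz, above the 14 MHz ADC clock limit. */
enum { PCLK2_HZ = 72000000, ADC_PRESCALER = 6 };

/* Fastest conversion: 1.5 sampling + 12.5 conversion = 14 ADC clock cycles */
enum { MIN_CYCLES_PER_CONVERSION = 14 };

static uint32_t adc_clock_hz(void) { return PCLK2_HZ / ADC_PRESCALER; } /* 12 MHz */

/* Upper bound on throughput: 12 MHz / 14 cycles, roughly 857 ksps */
static uint32_t max_samples_per_sec(void) {
    return adc_clock_hz() / MIN_CYCLES_PER_CONVERSION;
}

/* Aggregate rate needed here: 4 channels x 25 kHz */
static uint32_t needed_samples_per_sec(void) { return 4u * 25000u; }

/* Average spacing between readings in ns: 1e9 / 100000 = 10000 ns = 10 us */
static uint32_t avg_spacing_ns(void) { return 1000000000u / needed_samples_per_sec(); }
```

The required 100 ksps is far below that ceiling. This leaves room to pick a longer sample time than the 1.5-cycle minimum, as long as four conversions still fit in 40 µs.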
A benefit of using a hardware timer + DMA is that the ADC acquisition timing will be rock solid.
That DMA controller itself is an equally sophisticated part of the µC chip, by the way:
Note that both diagrams include hardware which is not present on the “low end” STM32F103RB used on the Olimexino-STM32 board: that chip has no ADC3 and only one DMA controller.
It takes quite some reading in the (1137-page!) reference manual for the STM32F1xx chips to figure out all the settings needed to implement the above acquisition mode. Then again, once that’s done, the code is remarkably short.
Here’s the basic DMA-based acquisition cycle to keep the ADC permanently running:
: adc1-dma ( addr count pin rate -- )  \ continuous DMA-based conversion
  3 +timer                \ set the ADC trigger rate using timer 3
  +adc  adc drop          \ perform one conversion to set up the ADC
  2dup 0 fill             \ clear sampling buffer
  0 bit RCC-AHBENR bis!   \ DMA1EN clock enable
  2/ DMA1-CNDTR1 !        \ 2-byte entries
  DMA1-CMAR1 !            \ write to address passed as input
  ADC1-DR DMA1-CPAR1 !    \ read from ADC1
  0                       \ register settings for CCR1 of DMA1:
  %01 10 lshift or        \ MSIZE = 16-bits
  %01 8 lshift or         \ PSIZE = 16 bits
  7 bit or                \ MINC
  5 bit or                \ CIRC
                          \ DIR = from peripheral to mem
  0 bit or                \ EN
  DMA1-CCR1 !
  0                       \ ADC1 triggers on timer 3 and feeds DMA1:
  20 bit or               \ EXTTRIG
  %100 17 lshift or       \ timer 3 TRGO event
  8 bit or                \ DMA
  0 bit or                \ ADON
  ADC1-CR2 ! ;
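For comparison, here are the register values adc1-dma assembles, written in C. This is a host-side sketch that only computes the bit patterns, using the bit positions from the Forth comments, which match RM0008's register layout. The function names are hypothetical, and nothing is written to hardware:

```c
#include <stdint.h>

#define BIT(n) (1u << (n))

/* DMA1 channel 1 CCR, mirroring the Forth word field by field */
static uint32_t dma_ccr_value(void) {
    return (1u << 10)   /* MSIZE = 01: 16-bit memory side     */
         | (1u << 8)    /* PSIZE = 01: 16-bit peripheral side */
         | BIT(7)       /* MINC: advance memory address       */
         | BIT(5)       /* CIRC: wrap around forever          */
                        /* DIR = 0: peripheral -> memory      */
         | BIT(0);      /* EN: enable the channel             */
}

/* ADC1 CR2: convert on timer 3 TRGO, hand each result to the DMA unit */
static uint32_t adc_cr2_value(void) {
    return BIT(20)      /* EXTTRIG: external trigger enabled  */
         | (4u << 17)   /* EXTSEL = 100: timer 3 TRGO event   */
         | BIT(8)       /* DMA: request a transfer per result */
         | BIT(0);      /* ADON                               */
}

/* CNDTR counts transfers, not bytes: 16-bit transfers -> bytes / 2 */
static uint32_t dma_cndtr_value(uint32_t buffer_bytes) { return buffer_bytes / 2u; }
```

CNDTR counts transfers rather than bytes, which is what the `2/` in the Forth code takes care of.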
It’s not so important at this stage how adc1-dma works, just what it does:
- a buffer + length is passed in, where the DMA unit will deposit all its readings
- the DMA unit is set up to fill this buffer in circular mode, going on forever
- the ADC is set up to acquire data on every timeout of timer 3, at the specified rate
This was created in an earlier experiment, titled Reading ADC samples via DMA to implement an oscilloscope. That was for a single channel, whereas here we need four. Luckily, we can keep that DMA code as is and modify the ADC settings on the fly to switch to 4-channel scan mode:
: quad-adc ( -- )  \ configure ADC and DMA for quad-channel continuous sampling
  +adc  6 us  adc-calib
  adata #abytes VAC-IN arate-clk adc1-dma
  VAC-IN adc#             \ 1st in scan sequence (SQ1)
  CT1 adc#  5 lshift or   \ 2nd in scan sequence (SQ2)
  CT2 adc# 10 lshift or   \ 3rd in scan sequence (SQ3)
  CT3 adc# 15 lshift or   \ 4th in scan sequence (SQ4)
  ADC1-SQR3 !             \ set up the ADC scan channels
  3 20 lshift ADC1-SQR1 ! \ four scan channels
  8 bit ADC1-CR1 bis!     \ enable SCAN mode
;
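The scan setup in quad-adc does two things: it packs four 5-bit channel numbers into SQR3, and it puts the sequence length minus one into the L field of SQR1. Here is a C sketch of that packing. The actual channel numbers behind VAC-IN, CT1, CT2 and CT3 aren't given here, so the channel numbers in the usage example are made up:

```c
#include <stdint.h>

/* SQR3 holds scan-sequence slots SQ1..SQ6, five bits each;
   0-based slot k lives at bit position 5*k. */
static uint32_t pack_sqr3(const uint8_t channels[], unsigned count) {
    uint32_t v = 0;
    for (unsigned k = 0; k < count && k < 6; k++)
        v |= (uint32_t)(channels[k] & 0x1Fu) << (5u * k);
    return v;
}

/* SQR1 field L (bits 23:20) holds the number of conversions minus one */
static uint32_t sqr1_length(unsigned count) { return (uint32_t)(count - 1u) << 20; }
```

For example, `pack_sqr3((const uint8_t[]){0, 1, 2, 3}, 4)` yields 0x18820 for hypothetical channels 0..3 in slots SQ1..SQ4. `sqr1_length(4)` gives 0x300000, the same as `3 20 lshift` in the Forth.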
The quad-adc word depends on a number of constants, defined as follows:
4 constant #adcs
800 constant #asamples
2 constant #abuffers
#adcs #asamples * #abuffers * 2* constant #abytes
40 constant arate-us
arate-us 72 * constant arate-clk
It also needs this definition of a 12.8 KB buffer to store all acquired data in:
#abytes buffer: adata
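If you'd rather see the sizing in C, the same constants and buffer could be declared as follows. This is a sketch, and the names mirror the Forth ones without the `#` prefix:

```c
#include <stdint.h>

/* C equivalents of the Forth constants */
enum {
    ADCS      = 4,
    ASAMPLES  = 800,
    ABUFFERS  = 2,
    ABYTES    = ADCS * ASAMPLES * ABUFFERS * 2, /* 2 bytes per 16-bit sample */
    ARATE_US  = 40,
    ARATE_CLK = ARATE_US * 72                   /* 72 MHz clock cycles in 40 us */
};

/* The sample buffer, declared as 16-bit words so it is halfword-aligned for DMA */
static uint16_t adata[ABYTES / 2];
```

ABYTES comes out at 12800, which is the 12.8 KB mentioned above. ARATE_CLK is 2880, the number of 72 MHz clock cycles in 40 µs.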
Note that the timer and DMA settings have not changed: timer 3 will fire once every 40 µs and trigger a burst of four ADC conversions, one for each channel. Each completed conversion then triggers a DMA transfer, filling up the circular buffer four times faster than before.
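Each conversion produces one result, and the DMA unit stores each result in turn. As a consequence, adata ends up interleaved in the order set up in SQR3: VAC-IN, CT1, CT2, CT3, VAC-IN, CT1, and so on. Here is a minimal C sketch of a hypothetical helper (not part of the JEM code) that pulls one channel out of a block of complete scans:

```c
#include <stddef.h>
#include <stdint.h>

enum { NCHAN = 4 };

/* The buffer holds interleaved frames: ch0 ch1 ch2 ch3 ch0 ch1 ...
   Copy channel `chan` out of `frames` complete scans starting at `block`. */
static void extract_channel(const uint16_t *block, size_t frames,
                            unsigned chan, uint16_t *out) {
    for (size_t i = 0; i < frames; i++)
        out[i] = block[i * NCHAN + chan];
}
```

Each half of adata holds 800 complete scans (3200 samples), so the halves line up with scan boundaries. The helper can therefore be pointed at either half directly.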
All this code looks complex, and in a way it is: this is a complex use case for the ADC + DMA hardware contained in the µC, after all! But in actual use it couldn’t be simpler: a single call to quad-adc starts everything.

That’s it. Now, as if by magic, the adata buffer will be continuously filled with ADC samples from all four channels, without the CPU doing any work at all. It’s all happening in the background, and perhaps most surprising of all: the current drawn for all this extra activity is only 2 mA!
The processing overhead is negligible: one 16-bit read and one 16-bit write by the DMA unit - once every 10 µs on average. Since both ADC and SRAM are on the fast internal bus, this will occupy that internal data bus less than 0.3% of the time.
We do need to be careful with timing, and synchronise our processing to avoid the DMA unit changing values while we’re still using them. This is solved by inspecting a few status bits in the DMA controller:

- one flag (HTIF1 in DMA1-ISR) is set when the buffer has been filled halfway
- another (TCIF1) is set when the buffer is full and the DMA unit starts over from the beginning

These events happen 40 µs × 800 samples = 32 ms apart, so we can simply poll them in the main loop of our application. There is no need to even introduce interrupts: 32 ms is a very long time for a µC running at 72 MHz.
At the halfway point, we have 32 ms to process the 1st buffer. At the end point, we have another 32 ms to process the 2nd buffer. And so on. This is the circular equivalent of double buffering.
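The polling logic could look something like the following C sketch. It assumes DMA channel 1, as used in adc1-dma. There, the half-transfer and transfer-complete flags are bits 2 and 1 of DMA1_ISR, and they are acknowledged by writing the same bit positions to DMA1_IFCR. To keep the sketch host-runnable, the function takes a snapshot of the status register and returns the clear mask instead of touching hardware:

```c
#include <stdint.h>

/* DMA1_ISR / DMA1_IFCR bit positions for channel 1 */
#define TCIF1 (1u << 1)  /* transfer complete: 2nd half just filled */
#define HTIF1 (1u << 2)  /* half transfer: 1st half just filled     */

enum { HALF_OVERRUN = -2, HALF_NONE = -1, HALF_FIRST = 0, HALF_SECOND = 1 };

/* Call once per main-loop pass. `isr` is a snapshot of DMA1_ISR; `*ifcr`
   receives the bits to write to DMA1_IFCR to acknowledge the event.
   Returns which half of the circular buffer is now safe to process. */
static int poll_dma_halves(uint32_t isr, uint32_t *ifcr) {
    if ((isr & (HTIF1 | TCIF1)) == (HTIF1 | TCIF1)) {
        *ifcr = HTIF1 | TCIF1;  /* both pending: processing fell behind */
        return HALF_OVERRUN;
    }
    if (isr & HTIF1) { *ifcr = HTIF1; return HALF_FIRST; }
    if (isr & TCIF1) { *ifcr = TCIF1; return HALF_SECOND; }
    *ifcr = 0;
    return HALF_NONE;
}
```

If both flags are set at once, the main loop fell behind by at least one half-buffer period, and the data it was about to process may already be getting overwritten. The sketch reports that as an overrun instead of guessing.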
Another subtle issue is that we can no longer use the ADC in polled mode. To read out the LiPo voltage, for example, we need to somehow make the ADC read out an extra channel without interfering with the above high-speed acquisition cycle. As it happens, the designers at STM thought of that too, and came up with the concept of “injected data channels”: it’s possible to make the ADC acquire 1..4 extra channels, and have it place the results in separate registers.
Using this mechanism, we could for example specify that we want to read PB0 as well (once!), and then simply wait for the ADC scan to pick that request up after it has taken care of all the regular channels. This will allow reading out a few other analog pins with at most 40..50 µs delay: the worst-case time needed by the ADC to start again and process our “injected” request.
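As a sketch of what the PB0 example would involve, here is the JSQR value for a single injected conversion, based on my reading of RM0008. PB0 is ADC channel 8. With JL = 0 the sequencer converts only the channel in JSQ4, not JSQ1, which is easy to get wrong. Starting the conversion (JSWSTART or an injected trigger) and reading back the result are left out, and the names are illustrative:

```c
#include <stdint.h>

/* JSQR: JSQ1..JSQ4 at bits 4:0, 9:5, 14:10, 19:15; JL (count - 1) at 21:20.
   With fewer than four injected conversions, the sequencer uses the last
   JL+1 slots, so a single conversion is taken from JSQ4. */
static uint32_t jsqr_single(unsigned channel) {
    return (0u << 20)                          /* JL = 0: one conversion */
         | ((uint32_t)(channel & 0x1Fu) << 15); /* channel goes in JSQ4 */
}

/* PB0 is ADC12_IN8 on the STM32F103 */
enum { PB0_ADC_CHANNEL = 8 };
```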
As you can see, modern ARM µCs are a lot more than just a CPU-with-some-memory!