Simulating an MMU¶
For a “grown-up” multi-tasker, it’s very useful to have a Memory Management Unit (MMU). An MMU can map memory into arbitrary address ranges, which allows a pre-compiled application to be loaded anywhere. It also makes it possible to launch multiple independent instances of the same application, by remapping its RAM differently for each instance.
Unfortunately, ARM Cortex-M µCs only have a Memory Protection Unit (MPU), which performs access control but does no remapping. And even this MPU is optional.
The Flexible Memory Controller¶
What some ARM µCs do have is a memory controller to support external memory chips. This built-in hardware can map sections of the 32-bit address space to a variety of memory chips (and LCD screens which act as memory). You need quite a few GPIO pins for this function, to be able to bring out all the address and data lines, plus a few control signals (CS, WE, etc.).
On STM32 chips with 144 pins or more, there tend to be enough spare pins to connect standard SRAM chips with 256 to 2048 kB of memory, using 8, 16, or 32 bits as data bus width. I’m going to use the STM32F723IE as an example, since it is the chip used on the F723-Disco board.
The key point is that the FMC has to re-use a few GPIO pins for this, such as these:
A18 | A17 | A16 | A15 | A14 | A13 |
---|---|---|---|---|---|
PD12 | PD11 | PG5 | PG4 | PG3 | PG2 |
In normal “FMC” use, these pins are set to alt mode 12 so the FMC can control them as address pins for the attached SRAM chip. But who says that we have to stick to this rule?
Taking over address pins¶
For the following discussion, it’s important to first explain how the demo app is going to show the effects of remapping memory. Some code:
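```cpp
// the FMC maps the external RAM chip at this address
auto mem = (uint8_t*) 0x6000'0000;
constexpr auto BYTES = 512*1024, STEP = 8*1024;

// tag the first byte of each 8 kB block with its own character
for (auto i = 0; i < BYTES/STEP; ++i) {
    auto c = '@' + i;
    mem[i*STEP] = c < 0x7F ? c : '?';  // 0x7F (DEL) is not printable
}
```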
This stores a different character code in the first byte of each of the 8 kB blocks in the PSRAM (pseudo-static RAM) chip present on the F723-Disco board:
```
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefg...rstuvwxyz{|}~? unmapped
```
There are 64 blocks in total, with blocks 40..49 not shown for brevity. Here’s what happens when the mode of the PD12 GPIO pin is changed to a fixed pull-down:
```
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_@ABCDEFG...RSTUVWXYZ[\]^_ D12:D
```
Only the lower half of the 512 kB memory is now accessible: the upper half of the address range shows a copy of that lower half. And here’s the mapping when PD12 is set to pull-up:
```
`abcdefghijklmnopqrstuvwxyz{|}~?`abcdefg...rstuvwxyz{|}~? D12:U
```
Note how a different address range is now visible in that same lower 256 kB section of FMC-mapped memory. In other words: that PD12 pin can be used to map another part of external RAM to the same low end of the address space. The FMC wants to control PD12 (i.e. A18) when accessing that part of memory, but it can’t as PD12 is now under GPIO control.
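Here is a minimal sketch of how a pin might be switched between FMC control and a fixed pull level. It assumes the usual STM32 GPIO register layout (two bits per pin in MODER and PUPDR, four bits per pin in the two AFR registers). The names `GpioRegs`, `PinMode`, and `setPinMode` are made up for this illustration and are not taken from the demo source.

```cpp
#include <cstdint>

// Register layout of one STM32 GPIO port (assumed; on real hardware this
// struct would overlay the port's base address).
struct GpioRegs {
    volatile uint32_t MODER, OTYPER, OSPEEDR, PUPDR, IDR, ODR, BSRR, LCKR;
    volatile uint32_t AFR[2];
};

enum class PinMode { Fmc, PullDown, PullUp };  // hypothetical names

// Hand a pin to the FMC (alt mode 12), or detach it and let a pull resistor
// hold the address line at a fixed level.
inline void setPinMode(GpioRegs& port, unsigned pin, PinMode mode) {
    const unsigned shift2 = pin * 2;
    uint32_t moder = port.MODER & ~(3u << shift2);  // 00 = input
    uint32_t pupdr = port.PUPDR & ~(3u << shift2);  // 00 = no pull

    if (mode == PinMode::Fmc) {
        const unsigned idx = pin / 8, shift4 = (pin % 8) * 4;
        port.AFR[idx] = (port.AFR[idx] & ~(0xFu << shift4)) | (12u << shift4);
        moder |= 2u << shift2;                      // 10 = alternate function
    } else {
        // MODER stays at 00 (input), so only the pull resistor sets the level
        pupdr |= (mode == PinMode::PullUp ? 1u : 2u) << shift2;  // 01 up, 10 down
    }
    port.PUPDR = pupdr;
    port.MODER = moder;
}
```

With a `GpioRegs&` referring to port D, `setPinMode(portD, 12, PinMode::PullDown)` would correspond to the `D12:D` case, and `PinMode::Fmc` would restore the unmapped state.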
Simulated address mapping¶
You can probably see where this is going: by turning several of the address pins into fixed zeros and ones, the total 512 kB memory mapping is changed into a smaller (power-of-two) memory area which gets replicated over the entire 512 kB address range.
Here are some examples of this, with the pin modes listed at the end of each line (D is pull-down, U is pull-up, and the mode applies to every pin listed on that line):
```
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefg...rstuvwxyz{|}~? unmapped
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_@ABCDEFG...RSTUVWXYZ[\]^_ D12:D
`abcdefghijklmnopqrstuvwxyz{|}~?`abcdefg...rstuvwxyz{|}~? D12:U
@@BBDDFFHHJJLLNNPPRRTTVVXXZZ\\^^``bbddff...rrttvvxxzz||~~ G2:D
AACCEEGGIIKKMMOOQQSSUUWWYY[[]]__aacceegg...ssuuwwyy{{}}?? G2:U
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefg...rstuvwxyz{|}~? unmapped
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_@ABCDEFG...RSTUVWXYZ[\]^_ D12:D
@ABCDEFGHIJKLMNO@ABCDEFGHIJKLMNO@ABCDEFG...BCDEFGHIJKLMNO D11:D,D12
@ABCDEFG@ABCDEFG@ABCDEFG@ABCDEFG@ABCDEFG...BCDEFG@ABCDEFG G5:D,D11,D12
@ABC@ABC@ABC@ABC@ABC@ABC@ABC@ABC@ABC@ABC...BC@ABC@ABC@ABC G4:D,G5,D11,D12
@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A...@A@A@A@A@A@A@A G3:D,G4,G5,D11,D12
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@...@@@@@@@@@@@@@@ G2:D,G3,G4,G5,D11,D12
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefg...rstuvwxyz{|}~? unmapped
`abcdefghijklmnopqrstuvwxyz{|}~?`abcdefg...rstuvwxyz{|}~? D12:U
pqrstuvwxyz{|}~?pqrstuvwxyz{|}~?pqrstuvw...rstuvwxyz{|}~? D11:U,D12
xyz{|}~?xyz{|}~?xyz{|}~?xyz{|}~?xyz{|}~?...z{|}~?xyz{|}~? G5:U,D11,D12
|}~?|}~?|}~?|}~?|}~?|}~?|}~?|}~?|}~?|}~?...~?|}~?|}~?|}~? G4:U,G5,D11,D12
~?~?~?~?~?~?~?~?~?~?~?~?~?~?~?~?~?~?~?~?...~?~?~?~?~?~?~? G3:U,G4,G5,D11,D12
????????????????????????????????????????...?????????????? G2:U,G3,G4,G5,D11,D12
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefg...rstuvwxyz{|}~? unmapped
```
For memory management, all that’s required is to stick to a small section of the address space in our µC application code: anything from the lower 4 kB to the lower 256 kB should do nicely.
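The effect of forcing address pins can be modelled on a desktop machine, without any hardware. The sketch below is such a model, not the demo code itself: it assumes that a forced pin simply replaces the corresponding address bit, with A18 acting as the 256 kB bit and A13 as the 8 kB bit, as the listings above suggest. The names `Forced`, `chipOffset`, and `visibleTags` are invented for this example.

```cpp
#include <cstdint>
#include <string>

constexpr uint32_t BYTES = 512 * 1024, STEP = 8 * 1024;

// Address bits taken away from the FMC: `mask` selects the forced bits,
// `value` holds the level each forced bit is pulled to.
struct Forced { uint32_t mask, value; };

// The offset the RAM chip actually sees for a given CPU-side offset.
constexpr uint32_t chipOffset(uint32_t offset, Forced f) {
    return (offset & ~f.mask) | (f.value & f.mask);
}

// Rebuild one row of the listing: the tag character visible in each 8 kB block.
std::string visibleTags(Forced f) {
    std::string row;
    for (uint32_t i = 0; i < BYTES / STEP; ++i) {
        const uint32_t block = chipOffset(i * STEP, f) / STEP;
        const char c = static_cast<char>('@' + block);
        row += c < 0x7F ? c : '?';  // same substitution as the demo loop
    }
    return row;
}
```

In this model, `visibleTags({1u << 18, 0})` corresponds to the `D12:D` row and `visibleTags({1u << 13, 1u << 13})` to the `G2:U` row, in both cases without the `...` elision.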
Managing multiple tasks¶
In the context of multi-tasking, it is now a matter of altering the address pins as part of the context switching logic: each task thinks that it has its own N kilobytes of RAM, starting at 0x6000'0000 (where the FMC maps this RAM chip), but in reality the mapping is adjusted so that it ends up using any part of the RAM chip.
All tasks can now be compiled with a fixed RAM memory area placed at 0x6000'0000, just like any other embedded µC build. And yet, there can be many tasks running concurrently without any of them aware of the “simulated virtual memory” mechanism just described.
This also supports multiple instances of the same task, each with its own RAM.
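As a rough illustration of what the context switch would have to compute, here is a sketch which turns a task's bank number into a level for each of the six address pins. It assumes every task gets a power-of-two window of 2^windowBits bytes and uses the pin-to-address-line table from earlier in this article. The names `AddrPin`, `Level`, and `levelsForBank` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// One remappable address line and the GPIO pin that carries it.
struct AddrPin { char port; uint8_t pin; uint8_t bit; };

constexpr std::array<AddrPin, 6> ADDR_PINS = {{
    {'D', 12, 18}, {'D', 11, 17}, {'G', 5, 16},
    {'G', 4, 15},  {'G', 3, 14},  {'G', 2, 13},
}};

enum class Level : uint8_t { Fmc, Low, High };  // hypothetical names

// Pin levels which make `bank` (counted in units of 2^windowBits bytes)
// appear at the start of the FMC window. Address lines below windowBits
// stay under FMC control, so the task can address its whole window.
inline std::array<Level, 6> levelsForBank(unsigned windowBits, unsigned bank) {
    std::array<Level, 6> levels{};
    const uint32_t base = static_cast<uint32_t>(bank) << windowBits;
    for (std::size_t i = 0; i < ADDR_PINS.size(); ++i) {
        const unsigned bit = ADDR_PINS[i].bit;
        if (bit < windowBits)
            levels[i] = Level::Fmc;
        else
            levels[i] = ((base >> bit) & 1u) ? Level::High : Level::Low;
    }
    return levels;
}
```

For example, `levelsForBank(16, 5)` describes a 64 kB window on the sixth 64 kB bank: PD12 high, PD11 low, PG5 high, and the remaining three pins left to the FMC. A context switch would then apply these levels to the pins before resuming the task.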
Caveats¶
This simulated MMU is created through GPIO trickery in software. The internal hardware does not know about any of this, which leads to some issues to be aware of:
- interrupts cannot assume a specific mapping to be in effect, so all interrupt state (including the main interrupt stack) has to be placed in non-mapped internal RAM
- likewise, DMA has no knowledge of any of this, and must also be limited to transfers to and from non-mapped internal RAM (unless it locks all mapping changes while active)
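One way to guard against these pitfalls is to check buffer addresses before handing them to an interrupt handler or a DMA channel. The following is only a sketch: `touchesMappedRam` is a hypothetical helper, and the constants assume the 512 kB window at 0x6000'0000 used throughout this article.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uintptr_t FMC_BASE = 0x6000'0000;  // start of the remapped window
constexpr uintptr_t FMC_SIZE = 512 * 1024;   // size of the external RAM

// True if any byte of [addr, addr+len) falls inside the remapped window,
// which makes the buffer unsafe for interrupt handlers and DMA.
// Note: assumes addr+len does not wrap around the address space.
constexpr bool touchesMappedRam(uintptr_t addr, std::size_t len) {
    return len > 0 && addr < FMC_BASE + FMC_SIZE && addr + len > FMC_BASE;
}
```

A DMA setup routine could then refuse any buffer for which `touchesMappedRam(...)` returns true, while buffers in internal RAM (typically found at 0x2000'0000 on Cortex-M parts) pass the check.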
Luckily, the F723 µC on the Discovery board also has 256 kB of internal SRAM, which ought to be more than enough for all non-mappable requirements.
It’s all just a first exploration for now. Time will tell if this approach is able to turn a little µC into a mainframe-like workhorse, in terms of supporting multiple concurrent tasks which are not able to interfere with each other. Can a 21st century µC do what “big iron” did decades ago?
Here is the source code for this demo: https://git.jeelabs.org/doodle/tree/smmu/mapped.cpp