C++ on bare metal

The JeeH library was created a while back to explore simpler C++ notations and more efficient coding of direct access to hardware registers on a microcontroller. I’ve been using and refining this library in most of my projects over the past few years. The ease of accessing GPIO, UART, SPI, and other devices has helped simplify a lot of my code, even though I’ve had to write everything from scratch, including ring buffers, interrupt-driven UART drivers, all the way to DMA, USB, LCD, and Ethernet. But JeeH is only the tip of the iceberg w.r.t. what’s possible.

This article introduces my next iteration: a C++ header file for µCs which uses some more advanced features of C++ to provide even more direct access to the hardware. The idea being that all the hard work will be done inside the C++ header(s), so that the real-world use of these features becomes convenient, concise, straightforward, and efficient .. an ambitious goal!

As always, I’ll use PlatformIO (aka “PIO”) as build environment. The µC dev-board I’ll use is the Nucleo-L432. All source code described here is available at See also these notes about my coding choices and conventions.

What to expect

As shown in this article, here is the code needed to blink an LED with CMSIS:

#include <stm32l4xx.h>

void delay (int n) { for (auto i = 0; i < n * 1000; ++i) asm (""); }

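int main () {
    RCC->AHB2ENR |= RCC_AHB2ENR_GPIOBEN;     // enable the GPIOB clock
    GPIOB->MODER &= ~GPIO_MODER_MODE3_Msk;   // clear the PB3 mode bits ...
    GPIOB->MODER |= GPIO_MODER_MODE3_0;      // ... and make PB3 an output

    while (true) {
        GPIOB->ODR |= GPIO_ODR_OD3;
        delay(100);   // pause length is arbitrary
        GPIOB->ODR &= ~GPIO_ODR_OD3_Msk;
        delay(100);
    }
}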

With the header I’m introducing, there are two ways to rewrite this - first, raw register access:

#include <jee.h>
using namespace jeeh;

void delay (int n) { for (auto i = 0; i < n * 1000; ++i) asm (""); }

int main () {
    enum { MODER=0x00, ODR=0x14 };

    RCC[AHB2ENR](1) = 1;
    GPIOB[MODER](6, 2) = 1;

    while (true) {
        GPIOB[ODR](3) = 1;
        delay(100);   // pause length is arbitrary
        GPIOB[ODR](3) = 0;
        delay(100);
    }
}

That doesn’t really improve things: it’s slightly more concise, but there are also a few extra constants, plus some C++ magic in the assignment statements. Here’s the second option:

#include <jee.h>
using namespace jeeh;

void delay (int n) { for (auto i = 0; i < n * 1000; ++i) asm (""); }

int main () {
    Pin led ("B3");

    while (true) {
        led = 1;
        delay(100);   // pause length is arbitrary
        led = 0;
        delay(100);
    }
}

The LED is obviously on pin PB3 and set to Push-pull mode. In addition there’s a led variable which can be set to 1 or 0 to control the LED (it’s the same convenient notation as in Mbed). But unlike most runtime libraries, this second version still compiles to the same efficient code as the first one. Being header-based, the C++ compiler will optimise the heck out of the above, and will inline just about everything, with raw memory writes to the hardware registers.

The delay() code in all these examples uses busy-looping to avoid bringing in some of the more advanced concepts, such as a systick-based timer, idling, and low-power sleep modes. These will be addressed later.

To illustrate the compactness of this (admittedly trivial) demo blink code, here is a size report:

RAM:   [          ]   0.0% (used 28 bytes from 65536 bytes)
Flash: [          ]   0.1% (used 392 bytes from 262144 bytes)

And for the generated assembly code, check out Matt Godbolt’s amazing Compiler Explorer.

Building up the notation

The main idea for simplifying low-level access is to make hardware registers, as well as the bits and subfields in them, look as much like C++ variables as possible, so that int i = <something>; and <something> = 1; act as natural notations to access and change hardware registers.

So RCC[AHB2ENR](1) = 1; means: take the h/w register at offset AHB2ENR in the RCC device, access bit 1 (bit 0 is the LSB), and set it to 1. And GPIOB[MODER](6, 2) = 1; means: take the register at offset MODER in the GPIOB device, and set the 2-bit field starting at bit 6 (i.e. bits 6..7) to the value 1.

This requires some C++ tricks, using templates, proxy objects, and overloading the = and int operators. Let’s start with a simpler case, changing the entire AHB2ENR register in RCC:

#include <cstdint>

template< uint32_t ADDR >
struct IoReg {
    struct Word {
        uint32_t o;

        operator int () const { return *(volatile uint32_t*) (ADDR+o); }
        void operator= (int v) const { *(volatile uint32_t*) (ADDR+o) = v; }
        // compound assignment needs its own overload: read, "or", write back
        void operator|= (int v) const { *this = *this | v; }
    };

    constexpr auto operator[] (uint32_t off) const { return Word{ off }; }
};

constexpr IoReg<0x4002'1000> RCC;

enum : uint16_t { AHB2ENR = 0x4C }; // RCC_AHB2ENR offset on the STM32L4

int main () {
    RCC[AHB2ENR] |= 1<<1;
}

This code enables the GPIOB clock, but without the bit access trickery:

  • RCC is defined as a const IoReg with 0x4002'1000 as value for ADDR
  • RCC[AHB2ENR] returns an object of type IoReg<0x4002'1000>::Word
  • ... |= 1<<1; will call operator int to read the register, “or” it with 1<<1 (i.e. 0x02), and then call operator= to save the result back in the register

This code also relies on some modern C++11 / C++17 constructs:

  • auto is a type declaration which adapts to the initialising or return value
  • Word{ ... } is a way to instantiate a Word (like a C cast, but more powerful)
  • constexpr is used in various places to help the C++ compiler optimise further
  • the apostrophe in the “0x4002'1000” constant is a digit separator, used only to improve readability
  • note also the Word struct nested inside the IoReg<> template struct

Proxy objects

The Word struct is called a “proxy” because it gets returned by the [] operator to represent a hardware register location without actually doing anything yet. The actual work is done when its int or = operator is called. These objects are only intended to be used as temporaries within an expression; they should in principle never be used to create variables or be stored anywhere.

But this is just a first step. Additional overloads of the () operator and a second Bits proxy type are used to extend this notation to Word(x) for access to bit x and Word(x,y) for access to bits x through x+y-1. See the full source code in git for details.
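As a rough illustration of how such a second proxy could work, here is a minimal sketch. It is not the actual jee.h code: the names Bits, FakeIo, fakeRegs, and demo are made up for this example, and the register base is a plain array instead of a template address, so that the logic can be tried out on a desktop compiler.

```cpp
#include <cassert>
#include <cstdint>

// Host-side stand-in for a peripheral: a plain array, not a fixed address.
uint32_t fakeRegs [8];

// Proxy for a bit field: "width" bits, starting at bit "pos" of one register.
struct Bits {
    volatile uint32_t* reg;
    uint8_t pos, width;

    uint32_t mask () const {
        return ((width < 32 ? 1u << width : 0u) - 1u) << pos;
    }

    operator uint32_t () const { return (*reg & mask()) >> pos; }
    void operator= (uint32_t v) const {
        *reg = (*reg & ~mask()) | ((v << pos) & mask());
    }
};

// Proxy for a whole register, as in the Word struct shown earlier.
struct Word {
    volatile uint32_t* reg;

    operator uint32_t () const { return *reg; }
    void operator= (uint32_t v) const { *reg = v; }

    // Word(x) selects bit x, Word(x, n) selects n bits starting at bit x
    Bits operator() (uint8_t pos, uint8_t width =1) const {
        return { reg, pos, width };
    }
};

// Hypothetical replacement for IoReg<ADDR>, offsets are in bytes.
struct FakeIo {
    uint32_t* base;
    Word operator[] (uint32_t off) const { return { base + off / 4 }; }
};

// Usage in the article's notation, returns the resulting MODER value.
uint32_t demo () {
    enum { MODER=0x00, ODR=0x14 };
    FakeIo GPIOB { fakeRegs };

    GPIOB[MODER] = 0xFFFFFFFF;  // pretend all pins start out in analog mode
    GPIOB[MODER](6, 2) = 1;     // bits 6..7 become 01, the rest is untouched
    GPIOB[ODR](3) = 1;          // set bit 3
    return GPIOB[MODER];
}
```

The point of the mask-and-shift in Bits is that a field write is a read-modify-write of the whole register, which is also why it should only ever live as a temporary inside an expression.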

And, given that everything above main() is in the jee.h header file, it should now be clear how things work under the hood. A few details not addressed here are: 1) support for bit-banding on some STM32 families (including the L432), and 2) a separate jee-svd.h header containing all the IoReg<...> register definitions for the L432 µC (more on that later).
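For readers unfamiliar with bit-banding: on Cortex-M3/M4 parts which implement it, each bit in the first 1 MB of the peripheral region is mirrored as a full word in an alias region, so a single bit can be changed with one plain write. The address calculation is sketched below. The function names are made up for this example and say nothing about how jee.h actually implements it.

```cpp
#include <cassert>
#include <cstdint>

// Peripheral bit-band: bits in 0x4000'0000..0x400F'FFFF are aliased as
// 32-bit words starting at 0x4200'0000 (32 alias words per register).
constexpr uint32_t bitBandAlias (uint32_t regAddr, uint32_t bit) {
    return 0x4200'0000 + (regAddr - 0x4000'0000) * 32 + bit * 4;
}

// Only the lowest 1 MB of the peripheral region has an alias.
constexpr bool inBitBandRegion (uint32_t regAddr) {
    return 0x4000'0000 <= regAddr && regAddr < 0x4010'0000;
}

// bit 1 of the first RCC register (RCC starts at 0x4002'1000)
static_assert(bitBandAlias(0x4002'1000, 1) == 0x4242'0004, "alias address");
static_assert(inBitBandRegion(0x4002'1000), "RCC is in the aliased range");
// GPIOB sits at 0x4800'0400, which is outside the aliased range
static_assert(!inBitBandRegion(0x4800'0400), "GPIOB is not aliased");
```

Note that with these addresses, a bit-band write could be used for the RCC registers but not for the GPIO ports of the L432.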

With this out of the way, none of this should be hard to understand anymore:

RCC[AHB2ENR](1) = 1;    // set bit 1 of RCC's AHB2ENR register to 1
GPIOB[MODER](6, 2) = 1; // set bits 6..7 of GPIOB's MODER register to 1
GPIOB[ODR](3) = 1;      // set bit 3 of GPIOB's ODR register to 1
GPIOB[ODR](3) = 0;      // set bit 3 of GPIOB's ODR register to 0

There are lots of magic names and numbers, but the point is that these are well-documented in the reference manual for each chip type. Instead of adding an extra layer of mind-numbingly long constants, just keep that manual nearby while writing hardware device interface code …

Anyway, so much for the most basic level of hardware register access. On to the next level.

GPIO pins and modes

GPIO pins are widely used on µCs, where each one could be described as “a single bit in one of the hardware registers”. But there is a bit more to it than that:

  • reading pins and setting them should be efficient and use little code
  • each pin has a “mode”, defining whether it is an input, output, etc
  • there are some additional settings, such as pull-ups and pin drive strengths

But the fact remains that once pins have been configured, their main purpose is to be read out or to be set or cleared. Sometimes in numerous places in an app.

On STM32, pins are grouped into “ports”, called GPIOA, GPIOB, etc. Each port has up to 16 pins connected to it, depending on the exact chip type used. The “native” naming convention is PA0, PB12, etc. and this is how they are documented in datasheets and reference manuals.

Pins can also be tied to other internal hardware devices, such as UARTs, SPI, I2C, etc. In this case, a few pins are usually connected in combination, and the proper configuration is then most often done in the driver setup code.

Given the countless ways pins are used and configured, and given that each new generation of chips adds more ways to do so, a string-based approach would be very convenient, as it is flexible enough to accommodate future extensions. It would be convenient to pass a specific set of pins to a driver, and equally convenient if the driver could then easily configure those pins. But strings are not great in terms of code overhead and efficiency, especially on small µCs and when performance is critical, e.g. for bit-banged SPI or I2C.

This is where C++17’s constexpr capabilities can help. They allow processing strings at compile time and generating settings which are virtually optimal at run time. The C++ declaration auto led = Pin ("B3"); can be transformed into a single-byte constant, encoding the B port and the 3 pin index - at compile time! In other words, Pin ("B3") is equivalent to a single byte, to the point even that the following code will compile properly:

if (Pin ("A4") <= pin && pin <= Pin ("A7")) ...

Even this will work:

switch (pin) {
    case Pin ("A0"): ... 
    case Pin ("B12"): ... 
}

When setting up a bit-banged SPI device, the pins could then be specified as one string:

auto sdcard = SpiGpio ("B1,A7,A15,C0");

In this case, the string does get passed in at run time, but the overhead of parsing it and turning it into Pin objects (in the SpiGpio constructor, i.e. once) will be very low.

Taking this a step further, strings can also be used to configure pin modes:

auto tx = Pin ("A9"), rx = Pin ("A10");
tx.init("V10"); // very high speed, alt mode 10
rx.init("H7"); // high speed, alt mode 7

(there’s little point making the pin speeds different, this is just a silly example)

In other words: a Pin object is an efficient string-less single-byte constant to represent pins in C++, even though in the source code it’s all based on easy-to-interpret string names and settings. When they cannot be parsed at compile time, strings will mostly end up in flash memory, as a compact notation and DSL, parsed and “interpreted” at run time.

Again, the C++ code is quite involved, but here’s the essence of it all:

struct Pin {
    uint8_t id;

    explicit constexpr Pin (char const* s) : id (parse(s)) {}

    operator int () const { ... }
    void operator= (int v) const { ... }
    void toggle () const { ... }

    void mode (char const* desc) const { ... }

    constexpr uint8_t parse (char const* s) {
        if (s == nullptr || *s < 'A' || *s >= 'P')
            return 0;
        uint8_t port = *s++ - '@', pnum = 0;
        while ('0' <= *s && *s <= '9')
            pnum = 10 * pnum + *s++ - '0';
        if (*s != 0 && *s != ':' && *s != ',')
            return 0;
        return 16*port + pnum;
    }
};

The key trick is that the Pin constructor and parse() are defined as constexpr, so that when the input string is known at compile time, the compiler will perform the parsing itself.

Perhaps the nicest part of it all is that when parsing can’t be done at compile time, the compiler will simply emit the code to parse at run time. This approach does not prohibit the use of configuration strings read in at run time from a serial port, from an EEPROM, or whatever.
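The compile-time vs run-time behaviour can be checked on any C++17 compiler with a stripped-down stand-in. In the sketch below, PinId, isOnPortA, ledIndex, and parseAtRunTime are names invented for this example. Only the parse logic is taken from the excerpt above, made static here so that it clearly does not depend on the object being constructed.

```cpp
#include <cassert>
#include <cstdint>

struct PinId {
    uint8_t id;

    explicit constexpr PinId (char const* s) : id (parse(s)) {}

    // same encoding as in the article: 16 * port + pin, with port A = 1
    static constexpr uint8_t parse (char const* s) {
        if (s == nullptr || *s < 'A' || *s >= 'P')
            return 0;
        uint8_t port = *s++ - '@', pnum = 0;
        while ('0' <= *s && *s <= '9')
            pnum = 10 * pnum + *s++ - '0';
        if (*s != 0 && *s != ':' && *s != ',')
            return 0;
        return 16 * port + pnum;
    }
};

// static_assert only accepts constants, so these prove that the strings
// have been parsed by the compiler, not at run time
static_assert(PinId("B3").id == 0x23, "port B = 2, pin 3");
static_assert(PinId("B12").id == 0x2C, "two-digit pin numbers");
static_assert(PinId("A9:V10").id == 0x19, "parsing stops at the colon");
static_assert(PinId("Z9").id == 0, "invalid port letter");

// range checks on pin ids, also usable as a constant expression
constexpr bool isOnPortA (PinId pin) {
    return PinId("A0").id <= pin.id && pin.id <= PinId("A15").id;
}
static_assert(isOnPortA(PinId("A7")) && !isOnPortA(PinId("B3")), "range");

// case labels must be constants, which parsed pin ids are
int ledIndex (PinId pin) {
    switch (pin.id) {
        case PinId("B3").id:  return 0;
        case PinId("B12").id: return 1;
        default:              return -1;
    }
}

// the very same constructor is used when the string is not a constant
uint8_t parseAtRunTime (char const* s) { return PinId(s).id; }
```

No separate run-time parser has to be written or kept in sync: the compiler picks whichever evaluation the context allows.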

Difference with JeeH

This compile- vs run-time flexibility was the main reason to improve on JeeH’s template-only approach, where each pin and device ended up being a separate type. This made it very tricky to write high-level code which could be configured to use different drivers (or even different instances, e.g. UARTs) at run time.

On STM32, the mode string supports the following letter + digit combinations:

  • A: analog, F: float, D: pull-down, U: pull-up
  • P: push-pull, O: open drain
  • L: low speed, N: normal speed, H: high speed, V: very high speed
  • 0..15: number, switch to alternate mode when specified
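To make the meaning of these letters concrete, here is one way such a mode string could be decoded into the values of the STM32 GPIO configuration fields. This is a sketch under assumptions, not the parser from jee.h: PinMode and parseMode are hypothetical names, the default is taken to be a floating input, and an alternate-function number is assumed to take priority over P and O for the mode field. It is marked constexpr only so that the examples can be verified with static_assert.

```cpp
#include <cassert>
#include <cstdint>

// Field values follow the STM32 GPIO register encodings.
struct PinMode {
    uint8_t mode  = 0;  // MODER:   0 input, 1 output, 2 alternate, 3 analog
    uint8_t otype = 0;  // OTYPER:  0 push-pull, 1 open drain
    uint8_t speed = 0;  // OSPEEDR: 0 low, 1 normal, 2 high, 3 very high
    uint8_t pull  = 0;  // PUPDR:   0 none, 1 pull-up, 2 pull-down
    uint8_t alt   = 0;  // AFR:     alternate function number
};

// Decode letters and digits up to the end of the string or the next comma.
constexpr PinMode parseMode (char const* s) {
    PinMode m {};
    bool hasAlt = false;
    for (; *s != 0 && *s != ','; ++s) {
        char c = *s;
        if ('0' <= c && c <= '9') {
            m.alt = 10 * m.alt + (c - '0');
            hasAlt = true;
        } else switch (c) {
            case 'A': m.mode = 3; break;
            case 'F': m.mode = 0; m.pull = 0; break;
            case 'D': m.pull = 2; break;
            case 'U': m.pull = 1; break;
            case 'P': m.mode = 1; m.otype = 0; break;
            case 'O': m.mode = 1; m.otype = 1; break;
            case 'L': m.speed = 0; break;
            case 'N': m.speed = 1; break;
            case 'H': m.speed = 2; break;
            case 'V': m.speed = 3; break;
            default:  break;  // unknown letters are ignored in this sketch
        }
    }
    if (hasAlt)
        m.mode = 2;  // assumption: a number always selects alternate mode
    return m;
}

static_assert(parseMode("V10").mode == 2, "alternate mode");
static_assert(parseMode("V10").speed == 3 && parseMode("V10").alt == 10, "V10");
static_assert(parseMode("H7").speed == 2 && parseMode("H7").alt == 7, "H7");
static_assert(parseMode("UH").pull == 1 && parseMode("UH").mode == 0, "UH");
static_assert(parseMode("A").mode == 3, "analog");
```

Writing these fields to the actual MODER, OTYPER, OSPEEDR, PUPDR, and AFR registers is left out here.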

Unlike pin names, modes aren’t parsed at compile time. This is probably due to my inexperience with this approach: I couldn’t get it to work automatically, only with an artificial extra “Mode” struct. That is why the Pin version is larger and includes the mode parser:

RAM:   [          ]   0.0% (used 28 bytes from 65536 bytes)
Flash: [          ]   0.3% (used 828 bytes from 262144 bytes)

Multi-pin configuration

Taking this approach yet further, the number of separate mode-config calls can be reduced by parsing strings just a bit more. The tx + rx configuration shown above can also be written as:

Pin pins [2];
Pin::config("A9:V10,A10:H7", pins, 2);

In this case, pins[0] will be the TX pin definition, and pins[1] will be RX. If the pin variables themselves are not needed by the app, this can be shortened to a one-liner:


And if several pins are set to the same mode, then all the subsequent ones can be omitted: "A1:A,A2,A3,A4,B9:UH,B10" is equivalent to "A1:A,A2:A,A3:A,A4:A,B9:UH,B10:UH".
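The carry-over rule for omitted modes is easy to express in code. The sketch below is a host-side illustration only: PinSpec, expandPins, and joinPins are hypothetical names, and it uses std::string and std::vector for brevity, which a parser meant for a small µC would avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct PinSpec { std::string pin, mode; };

// Split "A1:A,A2,B9:UH,B10" into (pin, mode) pairs.
// An entry without ":mode" reuses the mode of the previous entry.
std::vector<PinSpec> expandPins (std::string const& desc) {
    std::vector<PinSpec> out;
    std::string lastMode;
    std::size_t start = 0;
    while (start <= desc.size()) {
        std::size_t comma = desc.find(',', start);
        if (comma == std::string::npos)
            comma = desc.size();
        std::string item = desc.substr(start, comma - start);
        std::size_t colon = item.find(':');
        if (colon != std::string::npos) {
            lastMode = item.substr(colon + 1);
            item.resize(colon);
        }
        if (!item.empty())
            out.push_back({ item, lastMode });
        start = comma + 1;
    }
    return out;
}

// Turn the pairs back into the fully spelled-out notation.
std::string joinPins (std::vector<PinSpec> const& specs) {
    std::string s;
    for (auto const& p : specs) {
        if (!s.empty())
            s += ',';
        s += p.pin + ':' + p.mode;
    }
    return s;
}
```

With these two helpers, joinPins(expandPins("A1:A,A2,A3,A4,B9:UH,B10")) yields the spelled-out string "A1:A,A2:A,A3:A,A4:A,B9:UH,B10:UH".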

The strings have essentially become a tiny description language for modes. It may seem a bit over the top to support this sort of configurability, but it turns out to be extremely convenient, and performance-wise it really doesn’t matter, since pin configuration tends to be used very infrequently (often only as part of one-time app initialisation).

Running tests on a µC

I’m a big fan of Test Driven Development. There are many ways to do this (on real as well as simulated hardware), but I just want to illustrate how simple it can be with PIO, which supports the Unity test framework out of the box (not to be confused with the “Unity 3D Engine” …). Three steps need to be taken to make this work:

  1. Create a minimal UART interface
  2. Add a first test in the test/ directory
  3. Upload and run the test, fix any errors, and then write more tests

Step 1 only needs to be done once. I will again use the Nucleo-L432 board for this, which has an on-board ST-Link USB interface with a bridge to one of the STM32L432’s serial ports.

Minimal UART

Here is the code to implement a minimal, polled, send-only UART. It’s STM32L432-specific, in that the registers, bits, and pins will differ across the different STM32 chip variants:

#include "unittest_transport.h"
#include <jee.h>
using namespace jeeh;

enum { CR1=0x00, BRR=0x0C, ISR=0x1C, TDR=0x28 };

extern uint32_t SystemCoreClock; // Hz, set in CMSIS startup

void unittest_uart_begin () {
    RCC(EN_UART2) = 1;
    UART2[BRR] = SystemCoreClock / 115200;
    UART2[CR1] = (1<<3) | (1<<0); // TE UE

    for (auto i = SystemCoreClock/20; i > 0; --i) asm (""); // startup delay
}

void unittest_uart_putchar (char c) {
    while (!UART2[ISR](7)) {} // TXE
    UART2[TDR] = c;
}

void unittest_uart_flush () {}
void unittest_uart_end () {}

Let’s write a test

And here is a first trivial test case:

#include <unity.h>
#include <jee.h>
using namespace jeeh;

void t_smoke () {
    TEST_ASSERT_EQUAL(42, 40+1);
}

int main () {
    UNITY_BEGIN();
    RUN_TEST(t_smoke);
    UNITY_END();
    return 0;
}

It may seem silly, especially since it’s wrong, but it’s actually a good starting point, respectful of the TDD tradition: 1) write a failing test, 2) fix it, and 3) move on. Being able to actually run this test will verify that the setup is functioning and that the UART works, and it will illustrate what a failing test looks like:

$ pio test -e pin

The result should be something like this, assuming the Nucleo-L432 is connected:

[screenshot: failed test output]

Once the test has been corrected, the output will switch to all green text:

[screenshot: successful test output]

Testing the Pin type

I’ve combined the test and the UART code in a single file called test/main.cpp. Adding more tests is now a matter of editing this file and running the pio test -e pin command again.

The -e pin option prevents PIO from running all the other builds it knows about (cmsis and ioreg, in this case). This can be a bit confusing, since pio test does not include any code from src/.

To test GPIO pins, I’ve jumpered pins D0 and D1 together on the Nucleo-L432 board (these are Arduino-derived pin names, totally unrelated to the STM32’s PD0 and PD1 pins). This way, one of them can be set to an input while the other is changed, reading back the value and verifying that every combination works as expected. See the test/main.cpp file in git:

$ pio test -e pin
test/main.cpp:64:t_smoke    [PASSED]
test/main.cpp:65:t_jumper   [PASSED]
2 Tests 0 Failures 0 Ignored
============================ [PASSED] Took 2.75 seconds ===

Test    Environment    Status    Duration
------  -------------  --------  ------------
*       pin            PASSED    00:00:02.748
============================ 1 succeeded in 00:00:02.748 ==

Sure enough, it passes, all 18 checks! Again, keep in mind that the pins used in this test have to be jumpered together, and they are bound to be different on each µC development board.

All in all, quite a bit has been accomplished so far: a new notation in C++ to access hardware registers, a new Pin type with readable pin names and modes, and actual tests on a real board to verify that this code is working properly.

Other µC chip variants

Until now, all this code has been for the Nucleo-L432 board. What about other µC types?

I’ll limit myself to other chips from ST Microelectronics for now: there are hundreds of different STM32 variants out there, covering a huge range of features, sizes, and prices. All ARM Cortex based, all 32-bit, all usable in PIO by adjusting one board = ... setting in platformio.ini.

The good news is that everything in the jee.h header is essentially usable on any of these chips. The main differences are concentrated in the jee-svd.h header, with its list of hardware devices, addresses, interrupt IDs, and some other chip-specific details. So the challenge is one of placing the correct definitions in jee-svd.h before building a firmware image.


There is a formal definition of most ARM chips called the System View Description, which precisely documents each hardware register, its memory address, and all the bit fields inside it, as well as all interrupt vector numbers and names. It’s essentially a machine-readable form of the “register maps” found in all the huge Reference Manuals from ST and others.

These definitions are written in XML and can be found in the CMSIS-SVD repository on GitHub.

Three more bits of good news:

  • it’s very easy to parse these files using Martin Blech’s xmltodict Python module
  • PIO includes the relevant files whenever it auto-installs the toolchain and runtime libraries needed to build a specific target
  • PIO is based on SCons and allows customising its build process through “extra scripts”, as described in the Advanced Scripting section.

In short: all the information needed to generate the jee-svd.h header for a specific STM32 chip is available in PIO. All it needs is a bit of Python glue scripting to 1) parse and generate jee-svd.h and 2) call this script just before PIO starts compiling the code.

The first script is called in the scripts/ directory, with a copy of next to it. The second script is in scripts/, next to platformio.ini.

The script takes an (SVD!) name as argument and generates the header on stdout. It can be run manually from the top-level directory: scripts/ STM32L4x2 | less

The glue code in is very SCons-oriented. Its task is to figure out the current SVD name from the PIO project environment, check whether the current jee-svd.h file is for a different chip (by looking at the first text line), and update the jee-svd.h header if so. Thus, the (slow) process of parsing XML is avoided as long as we’re compiling for the same chip.

The result of all this custom glue logic is seamless automation: whenever PIO builds, it will always have the correct information for that chip in jee-svd.h.

This doesn’t solve all portability issues (different pins, UARTs, hardware devices, etc), but it’s a big step towards hiding many differences between all the chip variants - at least for STM32.

Let’s try it out

An STM32-based board I very much like to use, due to its low cost and all the functionality that comes on-board, is the Disco-F723. Lots of goodies: extra RAM, QSPI flash, LCD display, audio I2S codec, extra USB ports, breadboard interface, even a socket for an ESP8266 WiFi module.

While blinking an LED yet again would make an easy first test, I’ll go for the more complete jumpered-pin tests. The following steps are needed:

  • add an extra “f723” build section to platformio.ini
  • add a UART implementation for this slightly different processor
  • choose a different set of pins for the jumpered tests

The result is somewhat messier code at the end of test/main.cpp to match the UART and a change of pins in the jumper tests. But jee.h has not been affected at all, and jee-svd.h has auto-magically adapted itself to this µC with considerably more hardware functionality:

$ pio test -e f723
Test    Environment    Status    Duration
------  -------------  --------  ------------
*       f723           PASSED    00:00:02.556
============================ 1 succeeded in 00:00:02.556 ==

Note that PIO properly connects to whatever it needs when only one dev board is present.