Computing stuff tied to the physical world

Low-level programming

If you’re used to programming in languages such as JavaScript, Java, Go, Python, or PHP, then C can feel like a major step back, since the constructs it uses are much closer to the raw hardware than those of any of these languages. Even C++ suffers from this, as it has to use the same low-level semantics.

The most important difference is that C has a very “raw” model of memory, and therefore also of all variables and arrays. This is what memory looks like in C and C++:

[Figure: Memory]

There is no structure, but each byte is individually “addressable”. By convention, the leftmost byte is at address 0, and so in this example the rightmost one would be address 9.

[Figure: Addressing]

Everything else – absolutely everything – is mapped on top of this address space: there is an area where we have flash memory, another one for RAM, and yet another where we can find the hardware peripheral “registers”. Here is the address map of the LPC81x series:

[Figure: address map of the LPC81x series]

In this diagram, address zero is at the bottom. We’ll get into bit fields and pointers later, but the thing to keep in mind is that in C/C++ the entire 32-bit address space (from byte 0 to 4,294,967,295) is one ginormous pool, and that roughly 99.99% of it is “reserved” and unusable.

The most interesting areas start at 0x10000000 (RAM) and 0x40000000 (APB).

RAM is where all variables, arrays, stacks, and heaps are located. The APB and other areas are where hardware (and pin!) settings can be stored and retrieved. All I/O happens there.

When a C/C++ application is built, all the source code is translated (“compiled”) into machine language instructions (usually 2 bytes each on ARM Cortex-M), and then all the code is combined (“linked”) into a single binary firmware image.

Uploading / re-flashing means nothing other than: getting that firmware image through some magic into the µC’s flash memory, i.e. the memory starting at address zero.

On power-up, but also after a reset, the µC then starts executing the instructions it finds in its flash memory. Think of it as a tireless little engine which keeps track of where it is (the “program counter” or “instruction pointer”), fetches the next instruction from memory, and performs what it says. When done, it goes to the next instruction, does that, and so on.

This maps fairly closely to how the source code of a C/C++ program is set up: keep track of the line number, read and do what it says, then go on to the next line number. Some lines are if statements or while loops which can affect what gets done next. The same is the case in machine code. Not surprisingly, since C maps fairly directly onto machine instructions.

So a simple C statement such as “a = b * 123;” turns into machine instructions which:

  1. fetch the value of variable “b”
  2. multiply it by the constant “123”
  3. store the result in variable “a”.

Each of those variables is nothing but an alias for a specific address in RAM, as defined during the transformation to machine code.

If you were to look into what the “gcc” compiler and linker do, you’d be surprised how close C is to raw machine code (C++ slightly less so). Most of the C constructs have a very simple and obvious mapping to what a computer does, including the ARM-based LPC8xx µC.

The key benefit of C and C++ over writing actual machine code is that you don’t have to keep track of all the low-level details, i.e. which variable is where in RAM, where the code is located, how to perform if’s and while’s, and so on. Instead of losing our wits, we get to write in a symbolic fashion. Who cares that variable “a” maps to address 0x10000123?

Memory management

A key difference from other languages is how C and C++ manage memory, which in turn affects how (and where!) data is stored. In C and C++, there is no “automatic” memory management, other than the strict nesting of local variables in function calls.

In C, you can’t easily return a string of characters, for example – you have to decide where to put those characters, and then return a reference to that location (that’s a sneaky way of avoiding the word “pointer”). Fortunately, small values such as integers and floating point numbers (and references/pointers) can be returned and copied around as is.

Languages such as JavaScript, Java, Go, Python, and PHP are considerably more sophisticated in this area. They use a “garbage collector” or “reference counting” to keep track of strings, vectors (arrays), maps (hashes), sets, etc. Unfortunately, such robust memory management techniques are generally too demanding for small embedded µC’s. We simply don’t have that luxury when flash and RAM are both so limited!

In C, errors in the use of memory are common, and the results are usually disastrous (crashes and hangs). In these other languages, it is much harder to mess up the use of memory.

Which makes it even more important to get a really good grip on how values, bits, bytes, words, arrays, and pointers work in C/C++. It’s not hard, only a matter of getting it right.
