Computing stuff tied to the physical world

Lots of bits and pieces

Integers are values. But we need to be more precise than that: integers are also represented in some way inside a computer. We think in units, tens, and hundreds when we see “123”.

A computer only has the binary format to deal with, and treats an integer as a combination of 1’s, 2’s, 4’s, 8’s, 16’s, 32’s, 64’s, and 128’s when it deals with the integer 123:

  • 123 = 64 + 32 + 16 + 8 + 2 + 1

i.e.

  • 123 = 1x 64 + 1x 32 + 1x 16 + 1x 8 + 0x 4 + 1x 2 + 1x 1

That’s 1111011 when you omit the powers of two, which are always the same. It’s just like what we were taught early on: 123 = 1x 100 + 2x 10 + 3x 1, and then we omit the 100/10/1.

Binary is really simple, but also very tedious… lots of 0’s and 1’s, and nothing else.

That’s why we invented hex notation: take each of the bits in groups of 4, and encode the 16 possible combinations as one “hex digit”: 0=0, 1=1, …, 9=9, 10=A, 11=B, …, 14=E, 15=F.

So binary 1111011 can be rewritten as 0111 1011 (adding zeroes on the left changes nothing), and then converted to hex digits: 0111 = decimal 7 = hex 7, and 1011 = decimal 11 = hex B. In C, this can be written as “0x7B” or “0x7b” (case is ignored in hex digits), or “0x007B”.

It takes some time to get used to hex notation, but it sure is a lot more concise than binary! Unlike decimal notation, hex notation has the property that you can always easily convert it to and from binary, 4 bits at a time, no matter how large the value. Digits are independent.

From here on, all values will be either in decimal or in hex. Just keep in mind that they are nothing more than notational representations of the underlying integer values.

It’s all about going round

RAM and flash memory are a range of bytes (a byte is 8 bits, i.e. 2 hex digits). But a byte is far too small to do much with. It can only store integer values 0x00 to 0xFF (i.e. 0..255).

That’s why we group multiple bytes as one. A 2-byte “int” can have values 0x0000 to 0xFFFF (0..65,535). A 4-byte int can be 0x00000000 to 0xFFFFFFFF (0..4,294,967,295).

What about negative values?

Easy. We throw away half the positive values and re-interpret them as negative ones. So instead of storing the range 0..255 in a byte, we re-use it to store the values -128..+127. Likewise, 2-byte ints: -32,768..+32,767 and 4-byte ints: -2,147,483,648..+2,147,483,647.

But there is some trickery involved: while we could define 0x00 as -128, 0x80 as 0, and 0xFF as 127, that’s not how it’s done. It’s quite convenient to have 0x00 represent zero.

Instead, we define 0x00 as 0, 0x7F as 127, and.. 0x80 as -128, and 0xFF as -1. This is called 2’s complement signed notation. It has some very nice properties:

  • 0x00 is 0, and adding 1 to it leads to 0x01, just as we’d expect
  • 0x7F is 127, and we’re not supposed to ever add 1 to it, because then it’ll overflow
  • 0x80 is -128, somewhat arbitrarily, but adding 1 to it is 0x81, i.e. -127 – makes sense
  • 0xFF is -1, and if we add 1 to it and ignore the carry, we get 0x00 – aha, that’s zero!

It helps to see the number scale mapping as a circle, i.e. to switch to modulo arithmetic:

[Figure: number circle]

We cut the circle at some point to turn it into a linear series of integers. Moving one step to the right adds 1, moving one step to the left subtracts one:

[Figure: number circle flat]

The place where the circle has been cut is not so important. With 2’s complement, we keep the zero value mapped to 0, but that makes the first step to the left wrap around to the top!

In each of these cases, moving one step to the right increases the value by 1, except at the point where the series wraps around: at the edges, or in the middle for 2’s complement.

Just to re-iterate: the top of those three series above is used for unsigned values, and the bottom one is how signed byte values are represented in computer memory. The middle is how we humans think of signed values in the sense of progressing from smallest to largest.

Note that the values 0..127 are represented in exactly the same way signed and unsigned.

Another useful property is that for negative values, the top bit is always 1 (0x80..0xFF).

Bits

So far, we’ve looked at bytes, and combinations of 2 and 4 bytes for larger values. Let’s go the other way now: bytes are 8 bits. On a low level, we need to be able to extract them and mess with them in various ways. That’s where the “&” (AND), “|” (OR), and “^” (XOR) operators in C/C++ come into the picture, as well as “<<” (LSHIFT) and “>>” (RSHIFT).

If variable “a” has the integer value 123, then we can extract its lower 3 bits using any of:

int lower3 = a & 0x07;
int lower3 = a & 7;
int lower3 = a & 0b00000111;

To get the upper 3 bits of a byte out, we first shift everything down 5 bits (discarding the lower 5):

int upper3 = (a >> 5) & 0x07;
int upper3 = (a >> 5) & 7;
int upper3 = (a >> 5) & 0b00000111;

To get bits 4, 3, and 2 out, we can use any of:

int mid3 = (a >> 2) & 0x07;
int mid3 = (a >> 2) & 7;
int mid3 = (a >> 2) & 0b00000111;

Another (tricky!) way which is equivalent is to first mask the bits, then shift them down:

int mid3 = (a & 0x1C) >> 2;
int mid3 = (a & 28) >> 2;
int mid3 = (a & 0b00011100) >> 2;

Shifting right by 1 is like throwing the lowest bit away. This is the same as dividing by 2 and dropping the remainder – just as throwing the last digit of “123” away is the same as dividing by 10 for us humans.

Similarly, shifting left by 1 and inserting a zero is the same as multiplying by 2. Again, in the decimal world, adding a 0 to the end of “123” is the same as multiplying it by 10.

Computers think in powers of 2, we think in powers of 10.

There are many more details, but that’s the gist of it.

More C/C++ notation

An important difference worth going into is “&” vs “&&” and “|” vs “||”. This can be quite confusing in C/C++. The single character operators are bitwise operations, as shown above.

The “&&” also means “and”, and “||” also means “or”, but in a very tricky way:

  • if (a && b) ... means: if a is true then if b is also true, then …
  • if (a || b) ... means: if a is true, or failing that if b is true, then …

These operations are for efficiently doing something only when needed. With a && b, if a is not true, then we don’t even check b, since we already know it won’t matter. Similarly, with a || b, if a is true, then we don’t need to check b anymore, the result is already true.

Because the second operand is skipped whenever the first one already decides the outcome, the code can run faster – “&&” and “||” are mostly used in if’s and while’s.

Structs

In C/C++, you can define a new custom structure type called “MyStruct” as follows:

typedef struct {
    int a, b;
    byte c, d;
} MyStruct;

And then, you can define a variable “x” with this:

MyStruct x;

What this does is define a consecutive area of memory containing first the int values “a” and then “b”, followed by the byte values “c” and then “d”. The size of each of these values depends completely on the machine architecture. On an ARM Cortex chip, the fields of MyStruct need 10 bytes (each int 4, and 1 for each byte), although compilers there will normally round sizeof(MyStruct) up to 12 to keep the ints aligned on 4-byte boundaries. On an 8-bit AVR, ints are 2 bytes, so MyStruct will need 6 bytes.

But the idea is the same: the struct defines a “type” with different fields, each having their own “member name” and their own specified type (and hence number of bytes used).

If a new variable “x” is defined as the MyStruct shown above, and it happens to be assigned to RAM address 0x10000010 and up, then on an ARM chip:

  • “x.a” will be at addresses 0x10000010..0x10000013
  • “x.b” will be at addresses 0x10000014..0x10000017
  • “x.c” will be at address 0x10000018
  • “x.d” will be at address 0x10000019

Structs are a way to group data together. The reason for that is pointers. Coming up next.
