Computing stuff tied to the physical world

Casting types in C/C++

Now that we have pointers safely tucked into our tool belt, we can explore some of the more esoteric sides of pointers. And yes, dear reader, there are definitely a couple…

In C and C++, pointers are typed. The address itself could be anything, but a pointer declaration must always specify the type of what it points to.

But what if we need to get at the underlying bytes? For example, say we have some structured data we’re collecting in a remote sensor node:

typedef struct {
    uint8_t light, moved, humi, lobat;
    int16_t temp;
} Payload; 

Side note: the “uint8_t” and “int16_t” types are commonly used in C/C++ to indicate precisely what the type is, i.e. unsigned single-byte and signed double-byte in this case. They are defined in a standard C header file called “stdint.h”.

A quick count tells us that this data structure uses 6 bytes of memory.
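The count can also be left to the compiler. This is a minimal sketch, assuming a typical compiler that adds no padding to this particular layout (four single bytes followed by a 2-byte value). The helper names `payload_size` and `payload_temp_offset` are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t light, moved, humi, lobat;
    int16_t temp;
} Payload;

/* Number of bytes the compiler reserves for one Payload. */
size_t payload_size(void) {
    return sizeof(Payload);
}

/* Byte offset of the "temp" field from the start of the struct. */
size_t payload_temp_offset(void) {
    return offsetof(Payload, temp);
}
```

On common platforms `payload_size()` gives 6 and `payload_temp_offset()` gives 4. With a different field order the compiler may insert padding bytes, so it is worth checking rather than counting by hand.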

How can we send this to another system… as 6 bytes?

Type casts

This is where C/C++ type casts come in: they let you force the compiler to treat something (a constant, variable, or pointer) as something else. Casts have several uses, but keep in mind that you’re overruling the safety of normal compiler type checking when using them.

A cast can be written as “(<type>) <value>”:

  • (uint32_t) 'a' : convert character (1 byte) code “a” to a 32-bit unsigned int
  • (uint8_t*) p : convert a variable p (of any type) to a pointer to uint8_t’s
  • (uint32_t) p : convert p (which could be a pointer!) to a 32-bit unsigned int
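Coming back to the sensor payload, the pointer cast is what gives access to its bytes. The following is only a sketch: `payload_to_bytes` is a hypothetical helper, and the actual sending is left out since it depends on the radio or serial driver in use.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t light, moved, humi, lobat;
    int16_t temp;
} Payload;

/* Copy the raw bytes of a Payload into out[] and return the byte count.
   The cast re-labels the struct's address as a pointer to plain bytes. */
size_t payload_to_bytes(const Payload* p, uint8_t* out) {
    const uint8_t* bytes = (const uint8_t*) p;
    for (size_t i = 0; i < sizeof(Payload); ++i)
        out[i] = bytes[i];
    return sizeof(Payload);
}
```

The first four bytes come out in field order. The order of the two “temp” bytes depends on the byte order of the CPU, so both ends of the link have to agree on it.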

Casts can also force truncation, dropping bits if the result consists of fewer bytes:

  • (uint8_t) 0x1234 : convert a hex constant to an unsigned byte (i.e. 0x34)

Or transform an unsigned value into a signed one, or vice versa, for example:

  • (uint8_t) -1 : convert the signed int “-1” to an unsigned byte (i.e. “255”)
  • (int8_t) 254 : convert the value “254” to a signed byte (i.e. “-2”)

Despite appearances, none of these casts perform any calculation (other than chopping off a few bits or extending a sign bit, which is almost a no-op for computers). It’s better to view them as “re-interpretations” of a bit pattern. A byte pointer pointing to address 0x0003 can only be passed around or dereferenced. But when cast to an int, it no longer acts in the same way: the value is the same, but it can now be doubled, squared, whatever.
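The truncating and sign-changing casts listed above can be tried out with a few one-line helpers. The function names are invented for this sketch, and the signed result assumes a two's complement machine, which covers essentially every current compiler target:

```c
#include <stdint.h>

/* Keep only the lowest 8 bits of an int. */
uint8_t to_unsigned_byte(int value) {
    return (uint8_t) value;
}

/* Treat the lowest 8 bits of an int as a signed byte. */
int8_t to_signed_byte(int value) {
    return (int8_t) value;
}
```

With these, `to_unsigned_byte(0x1234)` gives 0x34, `to_unsigned_byte(-1)` gives 255, and `to_signed_byte(254)` gives -2.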

Numeric types

There is one variant of type casts which differs from the rest. This is when you cast an int to a float, or a double, or vice versa. This does lead to some computational effort in the CPU, to convert integers to floating point numbers, or the other way around:

  • “(int) 12.345” is the integer value 12
  • “(float) 12345” is the floating point value 12345.0

It’s unfortunate that the same term “casting” is used for all the above. Numeric type casts are transformations which keep the semantics intact as much as possible, i.e. converting between two types of number systems – and very different internal representations!
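To make the difference in internal representation visible, here is a small sketch that performs the numeric conversions and also exposes the raw bits of a float. It assumes a 32-bit IEEE 754 “float”, as found on nearly all current hardware, and the helper names are hypothetical. The bits are copied with memcpy rather than a pointer cast, which is one portable way to look at them:

```c
#include <stdint.h>
#include <string.h>

/* Numeric conversion: the fraction is discarded, the value is kept. */
int float_to_int(float f) {
    return (int) f;
}

/* Numeric conversion in the other direction. */
float int_to_float(int i) {
    return (float) i;
}

/* Raw bit pattern of a float, copied byte for byte, no conversion. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Here `float_to_int(12.345f)` is 12, while `float_bits(1.0f)` is 0x3F800000: the same number, stored as a completely different bit pattern than the integer 1.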

The first class of casts is a completely different animal, as it exists to overrule the normal compiler checks – providing access to internal details of how data is stored in memory.

Const and volatile

The C/C++ language has a few special cases where casts can also be essential:

  • The “const” modifier indicates to the compiler that the variable cannot change, allowing the compiler to apply more aggressive optimisation techniques, often leading to more efficient code (smaller, faster, often both). Example:

    const int LED = 12;
    

    This is similar to “#define LED 12” – in some cases one or the other approach is required, but in general it’s best to try and use this “const” notation over #define’s.

  • The “volatile” type indicates to the compiler that the data is not normal memory, and could actually change at unexpected times. Hardware registers are often volatile, and so are variables which get changed in interrupts. Again, it’s a modifier, not a data type in itself, so an example would be:

    volatile uint32_t counter;
    

Both of these are important hints for the compiler. The trouble is that sometimes you want to overrule the logic and force the compiler to ignore these modifiers. As follows:

int myLed = (int) LED;
uint32_t now = (uint32_t) counter;

Casts and pointers

Those last two examples are in fact misleading. They work, but they tend to be useless and redundant because values are copied as a snapshot: neither const nor volatile apply to the result – the compiler knows this and in fact accepts the above without the casts.

This is not the case with pointers, when “const” or “volatile” is used:

const char* text = "abc";
volatile uint8_t* ioReg = ...;

Such declarations are common. The problem is that now these pointers are forced to have those modifier properties. You cannot (for good reason!) enter these statements:

text[1] = 'd';
uint8_t* myPtr = ioReg;

The reasoning here is that these pointers refer to something constant or volatile. Copying such a pointer does not change the properties of what it refers to.

But if you really know what you’re doing, you can disable the normal checks:

((char*) text)[1] = 'd';
uint8_t* myPtr = (uint8_t*) ioReg;

Now the compiler will no longer generate an error. But you’re now also completely on your own as to whether the above code will really do what you intended. The safety net is gone.
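Whether such a cast is harmless depends on what the pointer really refers to. In the sketch below, `patch_second_char` is a hypothetical helper which casts away “const” and writes through the pointer. That is fine when the caller passes a writable buffer, but it is undefined behaviour when the pointer refers to a string literal such as "abc", which may well live in read-only memory:

```c
/* Overwrite the second character through a pointer-to-const.
   Only valid if the underlying memory is actually writable. */
void patch_second_char(const char* text, char c) {
    ((char*) text)[1] = c;
}
```

Calling it on a local array declared as `char buf[] = "abc";` turns the contents into "adc", because the array is an ordinary writable copy of the literal.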

Notation, notation

There is a lot of cryptic notation in C (and even more so in C++):

  • “x & y” is the bit-wise AND operator
  • “x * y” is the arithmetic multiplication operator
  • “&var” returns a pointer to var (if var is of type T, then “&var” will be of type T*)
  • “*ptr” is the reverse, a dereference (if ptr is a T*, then “*ptr” will be a T)
  • “name(value)” is a function call, with one argument “value”
  • “(T) value” is a cast of value to type T (it’s better to write it as “((T) value)”)
  • “a[1]” is the 2nd element of array “a” (since elements start at zero in C/C++)
  • “a” is the address of array “a” (“&a” gives the same address, although its type is pointer-to-array)
  • “a” is also the address of a[0], i.e. pointing to the first element of array “a”
  • “a+1” is the address of a[1], i.e. pointing to the next element in array “a”
  • “*(a+1)” is the value of a[1], i.e. the actual element (the same as “a[1]”)

If you think about it for a while, “a” as array without indexing is very much like a pointer.
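The array identities from the list above can be checked directly. This is just a sketch with a made-up function name, returning 1 when every identity holds:

```c
/* Returns 1 if the array/pointer identities hold for a sample array. */
int array_identities_hold(void) {
    int a[3] = { 10, 20, 30 };
    return (a == &a[0])                  /* "a" acts as a pointer to a[0] */
        && (a + 1 == &a[1])              /* a+1 points to the next element */
        && (*(a + 1) == a[1])            /* dereferencing gives the element */
        && ((void*) a == (void*) &a);    /* same address, different type */
}
```

The last comparison needs casts because “a” and “&a” have different pointer types, even though they hold the same address.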

And now some more type casting:

  • “(int*) a” is a pointer to an int, regardless what type “a” is (it better be meaningful!)
  • “(uint8_t) *(int*) p” is: the “int” at p, dereferenced, and then truncated to a uint8_t
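Reading such an expression from right to left helps. As a small sketch, with `low_byte_at` as a hypothetical helper which is only meaningful if p really points at an int:

```c
#include <stdint.h>

/* Treat p as a pointer to int, fetch that int, keep its lowest 8 bits. */
uint8_t low_byte_at(void* p) {
    return (uint8_t) *(int*) p;
}
```

If p points at an int holding 0x1234, the result is 0x34. The truncation works on the value, so this outcome does not depend on the byte order of the CPU.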

You have to read really carefully when you come across this, a very common idiom:

int* p = ...;
int a = *p++;
int b = *p++;

What it means:

  • p is a pointer to some integer, where and what is stored there we don’t know
  • dereference “p” (get the int it points at), put that in “a”, then increment “p” (by 4!)
  • dereference “p” (get the int it points at), put that in “b”, then increment “p” again

So a gets the value of the int at address p, and b gets the value of the next int in memory at a higher address. After these statements have been executed, p has been increased by 8 (assuming an int is 4 bytes, as is the case on ARM Cortex µC’s).
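Here is the same idiom as a complete function which also reports how far p has moved, measured in bytes. The struct `TwoInts` and the function `read_two` are invented for this sketch:

```c
#include <stddef.h>

/* Hypothetical result type: the two values read, plus bytes advanced. */
typedef struct {
    int a, b;
    ptrdiff_t advanced;
} TwoInts;

/* Read two consecutive ints starting at p, using the *p++ idiom. */
TwoInts read_two(int* p) {
    int* start = p;
    TwoInts r;
    r.a = *p++;    /* fetch the int at p, then step p to the next int */
    r.b = *p++;    /* fetch that next int, then step p once more */
    r.advanced = (char*) p - (char*) start;    /* distance in bytes */
    return r;
}
```

Given an array holding 7 and 9, this returns a = 7, b = 9, and an advance of twice the size of an int, i.e. 8 bytes when an int is 4 bytes.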

Some tricky cases with pointers and auto-increment – subtle details, vast differences:

  • “*p++” means: get (or set) what p points to, then increment p (same as “*(p++)”)
  • “(*p)++” means: return *p, then increment what p points to (p itself is unchanged!)
  • “*++p” is similar to “*p++”, but it increments p first, then dereferences it
  • “++(*p)” is similar to “(*p)++”, but it increments first, then returns the value
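The four forms can be compared side by side on a tiny array. This is a sketch with a made-up function name; the pointer and the array are reset between steps so that each form starts from the same state:

```c
/* Apply each increment form to a fresh pointer into v[] = { 10, 20 }. */
void increment_forms(int results[4]) {
    int v[2] = { 10, 20 };
    int* p = v;

    results[0] = *p++;      /* reads v[0] (10), then p moves on to v[1] */
    p = v;                  /* rewind p for the next form */

    results[1] = (*p)++;    /* returns v[0] (10), then v[0] becomes 11 */
    v[0] = 10;              /* restore the original value */

    results[2] = *++p;      /* p moves to v[1] first, then reads it (20) */
    p = v;                  /* rewind p again */

    results[3] = ++(*p);    /* v[0] becomes 11 first, then 11 is returned */
}
```

The results are 10, 10, 20, and 11 respectively. The first two forms return the same value, but one changes the pointer and the other changes the data.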

So there you have it – the conciseness of C/C++ at its best and at its worst. It’s been like that for several decades now. And all we can do is to get used to it and learn to live with it.
