Computing stuff tied to the physical world

Decoding bit fields

In Software on Sep 5, 2013 at 00:01

Prompted by a recent discussion on the forum, and because it’s a recurring theme, I thought I’d go a bit (heh…) into how bit fields work.

One common use case in Jee-land is packets coming in from a roomNode sketch, which the RF12demo sketch reports on USB serial as something like:

OK 3 23 125 22 2

The roomNode sketch tightly packs its results into a packet using this C structure:

struct {
    byte light;     // light sensor: 0..255
    byte moved :1;  // motion detector: 0..1
    byte humi  :7;  // humidity: 0..100
    int temp   :10; // temperature: -500..+500 (tenths)
    byte lobat :1;  // supply voltage dropped under 3.1V: 0..1
} payload;

So how do you get the values back out, given just those 4 bytes (plus a header byte)?
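As a quick preview, here is a minimal sketch of a hand-written decoder for those 4 bytes. It assumes the compiler packed the fields tightly, starting at the least significant bit of each byte and with no padding, so check that against your own compiler before relying on it. The `RoomReading` type and the `decode_room` function are hypothetical names made up for this illustration; they are not part of the roomNode or RF12demo sketches.

```c
#include <stdint.h>

/* Hypothetical holder for the decoded readings (names follow the struct fields). */
typedef struct {
    uint8_t light;   /* light sensor: 0..255 */
    uint8_t moved;   /* motion detector: 0..1 */
    uint8_t humi;    /* humidity: 0..100 */
    int16_t temp;    /* temperature in tenths, sign-extended from 10 bits */
    uint8_t lobat;   /* low-battery flag: 0..1 */
} RoomReading;

/* Decode the 4 payload bytes that follow the header byte.
   ASSUMPTION: fields are packed LSB-first, with no padding between them. */
static RoomReading decode_room(const uint8_t b[4]) {
    RoomReading r;
    r.light = b[0];                 /* byte 0: all 8 bits */
    r.moved = b[1] & 0x01;          /* byte 1, bit 0 */
    r.humi  = b[1] >> 1;            /* byte 1, bits 1..7 */

    /* temp: 8 bits from byte 2 plus the low 2 bits of byte 3 */
    int16_t t = (int16_t)((b[2] | (b[3] << 8)) & 0x3FF);
    if (t & 0x200)                  /* bit 9 is the sign bit of a 10-bit value */
        t -= 0x400;
    r.temp  = t;

    r.lobat = (b[3] >> 2) & 0x01;   /* byte 3, bit 2 */
    return r;
}
```

For the sample line above, the bytes to pass in would be 23, 125, 22 and 2; the leading 3 is the header byte and is not part of the payload.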

To illustrate, I’ll use a slightly more complicated example. But first, some diagrams:



That’s 4 consecutive bytes in memory, and two ways a 32-bit long int can be mapped to it:

  • little-endian (LE) means that the first byte (at #00) points to the little end of the int
  • big-endian (BE) means that the first byte (at #00) points to the big end of the int

Obvious, right? On x86 machines, and also the ATmega, everything is stored in LE format, so that’s what I’ll focus on from now on (PIC is also LE; ARM and MIPS can be configured either way, with ARM almost always run as LE).
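To make the two mappings concrete, here is a small sketch that assembles a 32-bit value from 4 consecutive bytes both ways. The helper names `read_le32` and `read_be32` are made up for this example. Because they build the value with shifts, they give the same answer no matter what the byte order of the machine running them is.

```c
#include <stdint.h>

/* Little-endian: the first byte (lowest address) is the little end. */
static uint32_t read_le32(const uint8_t *p) {
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Big-endian: the first byte (lowest address) is the big end. */
static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

Given the bytes 0x78 0x56 0x34 0x12 in that order, the LE reading is 0x12345678 and the BE reading is 0x78563412.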

Let’s use this example, which is a little more interesting (and harder) to figure out:

struct {
    unsigned long before :15;
    unsigned long value :12;  // the value we're interested in
    unsigned long after :3;
} payload;

That’s 30 bits of payload. On an ATmega the compiler will use the layout shown on the left:


But here’s where things start getting hairy. If you had used the definition below instead, then you’d have gotten a different packed structure, as shown above on the right:

struct {
    unsigned int before :15;
    unsigned int value :12;  // the value we're interested in
    unsigned int after :3;
} payload;

Subtle difference, eh? The reason for this is that the declared type (long vs int) determines the unit of memory into which the compiler tries to squeeze the bit fields. Since an int is 16 bits on the ATmega, and 15 bits were already assigned to the “before” field, the next “value” field won’t fit in the single bit that is left, so the compiler will restart in a fresh 16-bit unit!

IOW, be sure to understand what the compiler does before trying to decode stuff: the C standard leaves the layout of bit fields up to the implementation. One way to figure out what’s going on is to set the values to some distinctive bit patterns and see what comes out as bytes:

payload.before = 0b1;  // bit 0 set
payload.value = 0b11;  // bits 0 and 1 set
payload.after = 0b111; // bits 0, 1, and 2 set
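Here is one way that experiment could look as a small program for a desktop machine. It is a sketch under a few assumptions: `uint32_t` and `uint16_t` stand in for the ATmega’s 32-bit `unsigned long` and 16-bit `unsigned int` (using them as bit-field types is a compiler extension that GCC accepts), and the names `Wide`, `Narrow`, `raw_bytes`, `dump` and `demo` are all made up for this example. What it prints depends on the compiler and target, which is exactly the point of running it.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 32-bit storage unit: stands in for "unsigned long" on the ATmega. */
struct Wide {
    uint32_t before : 15;
    uint32_t value  : 12;
    uint32_t after  : 3;
};

/* 16-bit storage unit: stands in for "unsigned int" on the ATmega. */
struct Narrow {
    uint16_t before : 15;
    uint16_t value  : 12;
    uint16_t after  : 3;
};

/* Copy an object's raw bytes into out[] (at most max bytes); returns the count. */
static size_t raw_bytes(const void *obj, size_t size, uint8_t *out, size_t max) {
    size_t n = size < max ? size : max;
    memcpy(out, obj, n);
    return n;
}

/* Print the bytes of an object in memory order, lowest address first. */
static void dump(const char *label, const void *obj, size_t size) {
    uint8_t buf[16];
    size_t n = raw_bytes(obj, size, buf, sizeof buf);
    printf("%s (%u bytes):", label, (unsigned)n);
    for (size_t i = 0; i < n; ++i)
        printf(" %02X", buf[i]);
    printf("\n");
}

static void demo(void) {
    struct Wide w;
    struct Narrow s;
    memset(&w, 0, sizeof w);        /* clear any padding bits first */
    memset(&s, 0, sizeof s);

    w.before = 0x1; w.value = 0x3; w.after = 0x7;   /* same patterns as above */
    s.before = 0x1; s.value = 0x3; s.after = 0x7;

    dump("wide", &w, sizeof w);
    dump("narrow", &s, sizeof s);
}
```

Calling `demo()` from `main` prints one line per struct. If the two lines differ, the compiler is letting the declared type influence the packing; if they are identical, it is packing the bits tightly regardless of type. Either way, run the same test with the actual compiler for your target before trusting a hand-written decoder.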

Tomorrow, I’ll show a couple of ways to extract this bit field.

  1. That loose packing looks like a bit of a gotcha waiting to happen. Thanks for the heads up.

    I’m familiar with compilers trying to align data to speed things up, but the language I use most, Delphi (pascal++ if you like) has an explicit “packed” operator to force things up nice and tight, although having said that, it doesn’t actually have support for bit field definitions, so all you’re squashing up is word boundaries to byte boundaries!

    Not that it worries me too much, I grew up on shifts, rotates and bit masks :-)

  2. ~jcw, great writeup :) I was beginning to think everyone was dancing around with bytes according to some arcane intuited knowledge. heh.

    When I first glanced over the post I thought I saw some kind of padding going on. But once I had read it, I needed to know all about this irregularity in packing on AVR. What purpose would it serve anyway?

    But I cannot reproduce the example you gave. When I pack both structs, they look the same to me. What am I forgetting? I tried some other packing patterns too, e.g. 3 times a byte-size type into 7-bit fields to see what happens etc. Nothing irregular.

    I assume the skipping-to-byte-boundary might be done at compilation time by AVR-GCC? Is it some compiler flag perhaps? I’m using an Arduino (1.0.1 or something)+Makefile setup on Linux btw.

    • Hmm, maybe avr-gcc isn’t packing the way I thought it did. I have to admit that I didn’t actually check the details on an 8-bit machine such as the ATmega. I’ll investigate, thx.
