Computing stuff tied to the physical world

Debugging with LEDs

In AVR on Apr 10, 2010 at 00:01

There was a problem in the RF12 driver with startup, with hard-to-reproduce behavior, unfortunately. I couldn’t make heads or tails of it, until more details started coming in via the Jee Labs forumlong live open source!

The problem, as originally identified: sketches don’t start reliably, when power is applied through batteries.

The diagnosis: it’s not related to battery power, it’s due to the RFM12B staying in some sort of “powering-up” limbo mode for several seconds.

Trouble with this sort of problems is that they can be very hard to reproduce, and even harder to debug. It all happens at startup, before any comms with the outside world is possible, and what’s worse: sending out some debug bytes over the serial port may affect the problem, due to the time it takes to do so, or the interrupts taking place, or even RAM memory being changed by the debugging.

They’re called Heisenbugs, and they’re the worst…

In this case, the breakthrough came when I had a “reliably failing” setup: running the lcd_demo.pde sketch and inserting a call to rf12_config() was enough to get a hang after each power-up. Good – now I can chase this bug!

In this case, there was an LCD hooked up, but it gets initialized after the rf12_config() call. All I wanted to know at first, is where the code hangs.

Time for the LED debugger:

Dsc 1331

Two LEDs, inserted in an unused port. Both with 1 kΩ current limiting resistors in series (470 Ω would be brighter). One between DIO and GND, the other between +3V and AIO.

The code to init this is quite simple:

pinMode(5, OUTPUT);
digitalWrite(5, 0);
pinMode(15, OUTPUT);
digitalWrite(15, 1);

I used port 2, i.e. DIO = Arduino pin 5 and AIO = Arduino pin 15. Due to the way these LEDs are connected, the one on the left will light when DIO is set high and the one on the right will light when AIO is set low. The above code starts with both LEDs off.

Now, all I have to do is turn DIO on at the start, and move the code to turn it off further and further into the initalization code:

digitalWrite(5, 1);
digitalWrite(5, 0);

As expected, the LED never turns off again – rf12_config is going haywire.

Then I tried simplifying, replacing rf12_confg() by a call to rf12_intialize(). Same outcome: hangs. So it’s not the EEPROM code.

Tried out several things: delays, sending stuff to the RFM12B driver to try and initialize it better, or even twice – no difference: stil hangs.

Hm. The RF12 driver uses interrupts. Maybe it’s in there (yuck, even harder to debug). I called rf12_inititalize() with 0 as first arg, so attachInterrupt() would not be called – and bingo … no more hang.

So the RFM12B is generating interrupts, at a point when I don’t expect it: i.e. right after initialization. Interrupts should only happen when sending out data bytes (TX FIFO empty) or when receiving them (RX FIFO full).

To make a long story short: the RFM12B is pulling the IRQ line low for some eight seconds after power up. Since the IRQ line is set up as a level interrupt, that means it’s generating interrupts all the time!

No matter what I tried, I couldn’t make the RFM12B stop generating interrupts. But then, after about 8 seconds, it ended up going high and all was fine. Go figure.

So now I’ve added a loop which waits and polls the RFM12B in the init code, until it stops pulling IRQ low. On power up, that takes around 8 seconds. On reset, there is no wait. This should fix the remaining issue reported with hanging of JeeNodes, battery-powered or otherwise, i.e. any scenario which doesn’t use resets to restart the ATmega when the RFM12B is ready. If this affected you, get the latest code and re-compile / upload your sketch.

In hindsight, I’m surprised it worked before this fix!

Case closed. One more victory over bits and silicon. Onwards!