Yesterday’s weblog post documented my attempts to debug a problem while programming the JeeNode Micro with a Flash Board. I really thought it was a hardware bug.
The results changed slightly as I tried different things, yet they were entirely consistent and repeatable when I retried exactly the same steps over and over again. That should have set off an alarm bell, but alas, it took me quite some time to figure it out: repeatable is a hint that the problem might be a software issue, not hardware.
So the next step was to use the data dump feature of the scope to collect the data being written as well as the data being read back. These can be saved as CSV files, and with a bit of scripting, I ended up with this comparison:
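The comparison step can be sketched roughly as follows. This is a minimal Python sketch, not the script used here: it assumes each scope CSV export has already been reduced to one hex byte value per row (a hypothetical format), and for every mismatching byte it reports which bits were lost (written as 1, read as 0) and which were gained (written as 0, read as 1).

```python
import csv
import io


def load_bytes(lines):
    """Parse one hex byte per CSV row (hypothetical format, e.g. "0x0C" or "0C")."""
    return [int(row[0], 16) for row in csv.reader(lines) if row]


def compare(written, read):
    """Yield (offset, written, read, lost, gained) for every mismatching byte.

    lost   = bits written as 1 but read back as 0
    gained = bits written as 0 but read back as 1
    """
    for offset, (w, r) in enumerate(zip(written, read)):
        if w != r:
            yield offset, w, r, w & ~r & 0xFF, r & ~w & 0xFF


# Made-up sample data standing in for the two CSV exports.
written = load_bytes(io.StringIO("0x0C\n0x94\n0x34\n"))
read = load_bytes(io.StringIO("0x0C\n0x14\n0x30\n"))
for offset, w, r, lost, gained in compare(written, read):
    print(f"{offset:04x}: wrote {w:02x} read {r:02x} lost {lost:02x} gained {gained:02x}")
```

If the "gained" column stays at 00 while "lost" does not, every error is a 1 that turned into a 0, which is exactly the pattern that matters below.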
This shows three editor windows superimposed and lined up, from left to right:
- the data read back
- the data written
- the hex file with the compiled sketch
As you can see, the writes are exactly what they should be – it’s the read which gives junk. And it’s not even that far off – just a few bits!
Looking closer, it is clear that some bits are “0” where they should have been “1”.
Uh, oh – I think I know what’s going on…
It looks like the chip hasn’t gone through an erase cycle! Flash programming works as follows: you “erase” the memory to set all the bits to “1”, and then you “program” it to set some of them to “0”. Programming can only flip bits from “1” to “0”!
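The erase/program rule boils down to a bitwise AND: programming a cell leaves it at old AND new. The toy model below is a sketch with made-up byte values standing in for the old and new sketch; it shows why programming over stale contents gives bytes that are just a few bits off, and always in the 1-to-0 direction.

```python
ERASED = 0xFF  # an erased flash byte reads back as all ones


def erase(page):
    """Erase: every bit goes back to 1."""
    return [ERASED] * len(page)


def program(page, data):
    """Program: bits can only go from 1 to 0, so each cell ends up as old AND new."""
    return [old & new for old, new in zip(page, data)]


old_sketch = [0x0C, 0x94, 0x34, 0x00]  # hypothetical previous flash contents
new_sketch = [0x0C, 0x94, 0x3B, 0x01]  # hypothetical new image

print([hex(b) for b in program(old_sketch, new_sketch)])         # no erase: junk
print([hex(b) for b in program(erase(old_sketch), new_sketch)])  # erase first: correct
```

The first line prints `['0xc', '0x94', '0x30', '0x0']` and the second `['0xc', '0x94', '0x3b', '0x1']`: without the erase, only the bits that happened to be 1 in both images survive.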
And indeed… that was the problem. The Arduino boot loaders are set up to auto-erase on request, so avrdude is called from the IDE with “bulk erase” disabled. That way, the IDE just sends the pages it wants to program, and the boot loader will make sure the erase is performed just before writing the page.
In this case, however, there is no boot loader: we are using the normal ISP conventions, whereby erasing and programming must be explicitly requested.
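The difference between the two upload paths can be simulated with a toy flash model. This is an illustrative Python sketch, not how avrdude or any real boot loader is implemented: the `Flash` class, the 4-byte page size, and the function names are all hypothetical. The boot loader erases each page right before writing it, while over ISP the only way to get bits back to 1 is an explicit chip erase.

```python
class Flash:
    """Toy flash: erasing sets bits to 1, programming can only clear bits to 0."""

    def __init__(self, contents, page_size=4):  # 4-byte pages: illustration only
        self.cells = list(contents)
        self.page_size = page_size

    def erase_page(self, page):
        """Page erase, as a self-programming boot loader can do."""
        start = page * self.page_size
        end = min(start + self.page_size, len(self.cells))
        self.cells[start:end] = [0xFF] * (end - start)

    def chip_erase(self):
        """Erase everything, as must be requested explicitly over ISP."""
        self.cells = [0xFF] * len(self.cells)

    def write_page(self, page, data):
        start = page * self.page_size
        for i, value in enumerate(data):
            self.cells[start + i] &= value  # bits can only go 1 -> 0


def pages(image, page_size):
    """Split an image into (page number, chunk) pairs."""
    for start in range(0, len(image), page_size):
        yield start // page_size, image[start:start + page_size]


def bootloader_upload(flash, image):
    """Boot-loader style: each page is erased just before it is written."""
    for page, chunk in pages(image, flash.page_size):
        flash.erase_page(page)
        flash.write_page(page, chunk)


def isp_upload(flash, image, chip_erase):
    """ISP style: nothing is erased unless a chip erase is requested."""
    if chip_erase:
        flash.chip_erase()
    for page, chunk in pages(image, flash.page_size):
        flash.write_page(page, chunk)


old = [0x34] * 8    # hypothetical previous sketch still in flash
image = [0x3B] * 8  # hypothetical new sketch

for label, upload in [
    ("boot loader", bootloader_upload),
    ("ISP, no erase", lambda f, img: isp_upload(f, img, chip_erase=False)),
    ("ISP, chip erase", lambda f, img: isp_upload(f, img, chip_erase=True)),
]:
    flash = Flash(old)
    upload(flash, image)
    print(label, flash.cells == image)
```

Only the "ISP, no erase" case ends up with a mismatch, which is the situation described above.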
It was a matter of dropping the “-D” flag from the avrdude command in platform.txt:
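For reference, here is what that difference looks like on an avrdude command line. The `-D` option disables avrdude's automatic chip erase before writing flash; without it, a `-U flash:w:…` operation is preceded by a chip erase. This is not the actual platform.txt pattern: the part name, programmer type, and port are placeholders (the JeeNode Micro is assumed to use an ATtiny84, and `avrisp` and `/dev/ttyUSB0` are hypothetical choices).

```python
def avrdude_args(hex_file, auto_erase=True):
    """Build an avrdude argument list for an ISP flash upload.

    Assumptions (not taken from platform.txt): ATtiny84 target,
    'avrisp' programmer type and /dev/ttyUSB0 port are placeholders.
    """
    args = ["avrdude", "-p", "attiny84", "-c", "avrisp", "-P", "/dev/ttyUSB0"]
    if not auto_erase:
        args.append("-D")  # -D: skip the automatic chip erase before writing flash
    args.append(f"-Uflash:w:{hex_file}:i")  # write flash from an Intel hex file
    return args


print(" ".join(avrdude_args("sketch.hex", auto_erase=False)))  # before: no erase
print(" ".join(avrdude_args("sketch.hex")))                    # after: chip erase first
```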
And now it works, flawlessly! Doh – this “bug” sure has kept me busy for a while…
“repeatable is a hint”
But only a hint. Debugging requires all the detective abilities you can muster. You get better with time and practice, but even after many years of writing and fixing software you can still be thrown. Nice coverage of a tough “bug”.
I greatly appreciate the posts about bug hunting and hints on how to hunt bugs (like stack painting, etc.). They are at least as useful as how-tos (or more), but unfortunately they are far less documented than how-tos. Thank you!
I’m glad that there are people in the world who have time to investigate these kinds of issues. Respect…
Really appreciate your openness and honesty in writing about mistakes that look like bugs .. so very helpful .. this is the sort of trap I might well have walked into. Thank you indeed. Tony, New Zealand.
I guessed correctly! I actually ran into the same problem when I wrote a barebones SPI AVR flasher for my Raspberry Pi project. It mostly seemed to work, all the data on the wire was looking good, and the sketch would partially checksum OK. I pulled my hair out for quite some time until I realized that programming can only unset (0) bits and chip erase is the only way to get them back to set (1).
My aha moment was thinking there was a connection between how the fuse bits operated and the rest of program memory. Again, the only way to set a bit was to do a chip erase.
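For a hand-rolled SPI flasher like the one described above, the missing step is a single ISP instruction. A minimal Python sketch, assuming the AVR serial-programming instruction set from the AVR datasheets (Programming Enable is `0xAC 0x53 0x00 0x00`, Chip Erase is `0xAC 0x80 0x00 0x00`), that RESET is already held low, and that `spi_transfer` and `wait_erase` are hypothetical hooks supplied by the caller:

```python
PROGRAMMING_ENABLE = bytes([0xAC, 0x53, 0x00, 0x00])
CHIP_ERASE = bytes([0xAC, 0x80, 0x00, 0x00])


def enter_programming_mode(spi_transfer):
    """Send Programming Enable; the target echoes 0x53 while the third byte is clocked."""
    reply = spi_transfer(PROGRAMMING_ENABLE)
    if reply[2] != 0x53:
        raise RuntimeError("no 0x53 echo: target is not in programming mode")


def chip_erase(spi_transfer, wait_erase):
    """Set every flash bit back to 1, the only ISP way to undo earlier programming."""
    spi_transfer(CHIP_ERASE)
    wait_erase()  # hypothetical hook: wait the datasheet erase time or poll RDY/BSY


if __name__ == "__main__":
    sent = []

    def fake_spi(data):
        """Stand-in for real SPI hardware: records commands, fakes a valid echo."""
        sent.append(bytes(data))
        return bytes([0x00, 0xAC, 0x53, 0x00])

    enter_programming_mode(fake_spi)
    chip_erase(fake_spi, wait_erase=lambda: None)
    print([b.hex() for b in sent])
```

Page writes would follow only after this erase has completed.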
In fact, the -D (not -d) option in avrdude turns off a safeguard that keeps you from forgetting this: When the -U option with flash memory is specified, avrdude will perform a chip erase before starting any of the programming operations, since it generally is a mistake to program the flash without performing an erase first.