There was a discussion on the Arduino developer’s mailing list about the impact of a small change to the digitalWrite() function, and for some time I’ve been hearing that digitalWrite() has a huge amount of overhead.
Time to find out.
Here is the sketch I used to measure how often a pin I/O command can be issued using various mechanisms:
The logic is that I’m counting how often the same command can be called between timer overflows, i.e. every 1024 µs (one byte, incrementing @ 16 MHz / 64), before the timer tick count changes again.
And here’s the sample output:
There’s a small amount of jitter, which tells me the loops are syncing up almost exactly on the timer ticks. Interrupts have not been disabled, so the timer interrupt is indeed being serviced – once for each loop.
What these values tell me, is that we can do about:
- 10 analog 10-bit readings per millisecond with analogRead()
- 128 pwm settings per millisecond with analogWrite()
- 220 pin reads per millisecond with digitalRead()
- 224 pin writes per millisecond with digitalWrite()
- 1056 pin reads per millisecond with direct port reads
- 1059 pin writes per millisecond with direct port writes
(I’ve corrected the counts by 1000/1024 to arrive at these millisecond values)
So the Arduino’s digital I/O in IDE version 0017 can do roughly 1/5th the speed of direct port access on a 16 MHz ATmega328.
But WAIT! – There’s a large systematic error in the above calculations, due to the loop overhead. It looks like the loop takes 1024000/1251 = 819 ns overhead, so the actual values are quite different: digitalRead() -> 3712 ns, direct port read -> 151 ns. Now the values are more like 1/25th!
So let’s redo this with more I/O in each loop iteration (all 4 ports):
The sample output now becomes:
With these results we get: one digitalRead() takes 4134 ns, one direct port read takes 83 ns (again correcting for 819 ns loop overhead). The conclusion being that digitalRead() is 50x as slow as direct port reads.
Which one is correct? I don’t know for sure. I retried the direct port read with 16 entries per loop, and got 67 ns, which seems to indicate that a direct port read takes one processor cycle (62.5 ns), as I would indeed expect.
Conclusion: if performance is the goal, then we may need to ditch the Arduino approach.
Update – Based on JimS’s timing code (see comments): digitalRead() = 58 cycles and direct pin read = 1 cycle.
Update #2 – The “1 cycle” mentioned above is indeed what I measured, but incorrect. The bit extraction was probably optimized away. So it looks like direct pin access can’t be more than 29x faster than digitalRead(). As pointed out by WestfW in the comments, digitalRead() and digitalWrite() have predictable performance across all use cases, including when the pin number is variable. In some cases that may matter more than raw speed.
Update #3 – Another caveat – Lies, damn lies, and statistics! – is that the register allocations for the above loops make it extremely difficult to draw exact conclusions. Let me just conclude with: there are order-of-magnitude performance implications, depending on how you do things. As long as you keep that in mind, you’ll be fine.