Computing stuff tied to the physical world

C++ templates

In AVR, Software on Jan 12, 2010 at 00:01

A recent post described the performance loss in the Arduino’s digitalRead() and digitalWrite() functions, compared to raw pin access.

Can we do better – i.e. hide the details, yet still get the benefits of raw I/O? Sure.

If you’ve used JeeNodes and in particular the “Ports” library, you’ll have noticed that there is a C++ class which hides the details of each port (i.e. JeeNode port, not ATmega port). Let’s look at that first:

[Screenshot: declaration of the Port class from the Ports library]

I’ve omitted the implementation, but there are still lots of secondary details.

The main point is that this is now C++, and uses a “Port” object as its central mechanism. Each object has one byte of data, containing the port number (1..4).
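Here is a minimal host-side sketch of that design: a class whose only state is one byte holding the port number, forwarding to digitalRead()/digitalWrite()-style calls. It is not the library's implementation. The names `RuntimePort`, `dioPin()`, `fake_digitalWrite()` and `fake_digitalRead()` are hypothetical, and an array of pin levels stands in for the hardware so the code runs on a PC. The only hardware fact assumed is the JeeNode convention that port N's DIO line is Arduino digital pin N + 3.

```cpp
#include <cstdint>
using std::uint8_t;

typedef uint8_t byte;  // as in the Arduino core

// Hypothetical host-side stand-ins for the Arduino core calls: an array of
// pin levels replaces the real hardware.
byte fake_pins[20];
void fake_digitalWrite(byte pin, byte value) { fake_pins[pin] = value ? 1 : 0; }
byte fake_digitalRead(byte pin) { return fake_pins[pin]; }

// Hypothetical Port-style class: its only state is the port number (1..4).
class RuntimePort {
    byte portNum;  // fetched from the object on every call
public:
    explicit RuntimePort(byte num) : portNum(num) {}
    // JeeNode convention: port N's DIO line is Arduino digital pin N + 3.
    byte dioPin() const { return portNum + 3; }
    void digiWrite(byte value) const { fake_digitalWrite(dioPin(), value); }
    byte digiRead() const { return fake_digitalRead(dioPin()); }
};
```

Writing `RuntimePort orig(1);` followed by `orig.digiWrite(1);` mirrors the post's `Port orig (1);`. Each call reads `portNum` from the object and hands the resulting pin number to a general-purpose pin routine.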

Due to heavy inlining, there is almost no additional overhead in using the Port class instead of calling digitalRead() and digitalWrite() directly, on which it is based. I verified this by running tests similar to those in the recent post about pin I/O:

[Screenshot: pin I/O timing test using the Port class]

The test uses the definition “Port orig (1);”, and sure enough the results are nearly the same.

There are two issues that make this approach sub-optimal: it relies on the slow digitalRead() and digitalWrite() calls, and it stores the port number in a memory location that has to be read at run time. There is no way for the compiler to optimize such calls, even though in this example “orig.digiRead()” amounts to nothing more than “bitRead(PIND, 4)”.
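To see what the compiler is missing, here is roughly what a port 1 read and write boil down to once the pin is fixed at compile time. The bit macros are modelled on the Arduino core's bitRead()/bitSet()/bitClear()/bitWrite(). `sim_PORTD` and `sim_PIND` are hypothetical plain variables standing in for the ATmega's port D registers, and `port1_digiRead()`/`port1_digiWrite()` are illustrative names. On the ATmega, a pin's input level is read from the PINx register, while PORTx holds the output latch (and the pull-up setting for inputs).

```cpp
#include <cstdint>
using std::uint8_t;

// Bit helpers modelled on the Arduino core's macros of the same names.
#define bitRead(value, bit) (((value) >> (bit)) & 0x01)
#define bitSet(value, bit) ((value) |= (1UL << (bit)))
#define bitClear(value, bit) ((value) &= ~(1UL << (bit)))
#define bitWrite(value, bit, bitvalue) ((bitvalue) ? bitSet(value, bit) : bitClear(value, bit))

// Hypothetical stand-ins for the port D registers. On real hardware these are
// volatile memory-mapped registers, not plain variables.
uint8_t sim_PORTD = 0;  // output latch (also pull-up enable for inputs)
uint8_t sim_PIND = 0;   // current input levels

// JeeNode port 1 DIO is Arduino pin 4, i.e. bit 4 of port D. With the port
// known at compile time, a read is one bit test and a write one bit update.
inline uint8_t port1_digiRead() { return bitRead(sim_PIND, 4); }
inline void port1_digiWrite(uint8_t v) { bitWrite(sim_PORTD, 4, v); }
```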

That’s where C++ templates come in. Check out this definition of a new “XPort” class (named that way to avoid a name conflict) and an example of use for port 1:

[Screenshot: the XPort class template, with an example of use for port 1]

(As you can see, I’m switching to a different, and hopefully clearer, API along the way.)

There’s some funky <…> stuff going on. We’re in fact not declaring one class, but a whole family of classes, parametrized by the integer included in the <…> notation on the last line.
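The “family of classes” idea can be seen in isolation with a tiny template. `PortFamily` is an illustrative name, not part of any library. Each distinct integer argument produces a separate class, and the value is a compile-time constant inside it. The `static_assert` lines are checked by the compiler, not at run time.

```cpp
#include <cstdint>
#include <type_traits>
using std::uint8_t;

// Hypothetical class template: each distinct N yields a separate class.
template <uint8_t N>
class PortFamily {
public:
    static constexpr uint8_t number = N;  // baked into the type, not stored per object
};

typedef PortFamily<1> PortOne;
typedef PortFamily<2> PortTwo;

// Checked by the compiler: two arguments, two unrelated classes.
static_assert(!std::is_same<PortOne, PortTwo>::value, "distinct classes");
static_assert(PortOne::number == 1 && PortTwo::number == 2, "known at compile time");
// The number lives in the type, so objects carry no data member for it.
static_assert(std::is_empty<PortOne>::value, "no per-object state");
```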

The big difference is that each class now has that integer value “built in”, so to speak. So we can define member functions that pass that value directly to the corresponding bitRead() and bitWrite() macros. And then, all of a sudden, all the overhead vanishes: since the members need no access to object state, they can be made static, and since all the information is known in the header, they can be made inline as well.

So the above template is C++’s modern way of doing far more at compile time, allowing the optimizer to generate much better code.
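A minimal sketch of that idea follows, assuming a template in the spirit of XPort rather than reproducing the class from the screenshot. `XPortSketch`, `dioWrite()`/`dioRead()` and the `sim_PORTD`/`sim_PIND` variables are hypothetical; on real hardware the variables would be the PORTD and PIND registers. As before, JeeNode port N's DIO line is taken to be Arduino pin N + 3, i.e. bit N + 3 of port D.

```cpp
#include <cstdint>
using std::uint8_t;

// Hypothetical stand-ins for the PORTD/PIND registers (plain variables here).
uint8_t sim_PORTD = 0;
uint8_t sim_PIND = 0;

// Hypothetical template in the spirit of XPort: N is the JeeNode port (1..4).
template <uint8_t N>
class XPortSketch {
    static_assert(N >= 1 && N <= 4, "JeeNode ports are numbered 1..4");
    static constexpr uint8_t bit = N + 3;  // port N's DIO is bit N + 3 of port D
public:
    // No object state is used, so the members are static; being defined in
    // the class body, they are also implicitly inline.
    static void dioWrite(bool on) {
        if (on) sim_PORTD |= (uint8_t)(1u << bit);
        else    sim_PORTD &= (uint8_t)~(1u << bit);
    }
    static bool dioRead() { return (sim_PIND >> bit) & 1; }
};

XPortSketch<1> one;  // used like any other object, but holds no data
```

Because `N` is part of the type, `one.dioWrite(true)` needs no object state: once inlined, each call can reduce to setting or clearing one fixed bit in one fixed register.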

Note that templates come with some pitfalls. First, it’s very easy to inadvertently generate huge amounts of code, so very careful use of inlining and base-class derivation is essential. Second, templates tend to be “instantiated” by the compiler as late as possible, which can lead to confusing error messages when the templates are wrong or used wrongly.
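Both pitfalls have common mitigations, sketched here under stated assumptions. Bulky code goes into an ordinary non-template base class so it exists only once, the template adds only thin inline wrappers that supply a constant, and a `static_assert` on the template argument (C++11 or later) turns misuse into a single readable error at the point of instantiation. `PortBase`, `PortT`, `pulses()` and `sim_PORTD` are hypothetical names; toggling a simulated register merely stands in for “bulky” code.

```cpp
#include <cstdint>
using std::uint8_t;

uint8_t sim_PORTD = 0;  // hypothetical stand-in for the PORTD register

// Non-template base: the bulky code is compiled once and shared by every
// instantiation, instead of being duplicated per port.
class PortBase {
protected:
    // Toggle the masked bit(s) high then low, count times (a stand-in for
    // any larger routine).
    static unsigned pulses(uint8_t& reg, uint8_t mask, unsigned count);
};

unsigned PortBase::pulses(uint8_t& reg, uint8_t mask, unsigned count) {
    for (unsigned i = 0; i < count; ++i) {
        reg |= mask;            // drive high
        reg &= (uint8_t)~mask;  // drive low
    }
    return count;
}

template <uint8_t N>
class PortT : public PortBase {
    // Fails at instantiation with one readable message, rather than a cascade
    // of errors from deep inside the template.
    static_assert(N >= 1 && N <= 4, "JeeNode ports are numbered 1..4");
public:
    static constexpr uint8_t mask = (uint8_t)(1u << (N + 3));
    // Thin inline wrapper: the only per-port code is passing a constant.
    static unsigned pulse(unsigned count) { return pulses(sim_PORTD, mask, count); }
};
```

With this layout, `PortT<1>::pulse(3)` and `PortT<2>::pulse(3)` share one copy of `pulses()`, while a call such as `PortT<5>::pulse(1)` stops compilation at the `static_assert` with its message.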

I’m still just exploring this approach for embedded use. The potential performance gains are substantial enough to give it a serious try. My hope is that the hard work can be done in a library, so that everyone else can just use it and benefit from these gains without having to think much about templates, let alone implement new ones. The “one” object declared above acts like any other C++ object, so using it will be just as easy as non-template objects.

Does the above lead to fast code? You bet. Here’s a test sketch:

[Screenshot: test sketch]

And here’s some sample output:

[Screenshot: sample output]

As you can see, values 5 and 6 are virtually the same as values 7 and 8. We’ve obtained the performance of direct pin access while using a high-level port-style notation to access those pins. This is why templates are so attractive for embedded use.

The timings are different from the previous post because the loops are coded differently. In this case, only the relative comparisons are relevant.

  1. With regard to the XPort API, I still believe that, in the long run, it is better not to abbreviate API terms for the sake of brevity. Everybody abbreviates words differently, and keeping track of how each library abbreviates a given word wastes time and makes code harder to read. Many IDEs have word completion, so you don’t have to type more. Just in the above examples there are “d”, “digi”, and “digital”.

    • I’m using “dio” and “aio” because those are the names on the JeeNode headers. The above is indeed a mishmash, because it compares all the different approaches. I agree that consistency is the least one should aim for.

  2. Nice! If you replace the PORTD and PORTC constants in your template with the right formulas, you could access all Arduino pins and create a generic digitalRead/digitalWrite template. Then, instead of declaring ‘pins’, you declare template instances for the pins.

    The following:

    byte pinLED = 13;
    digitalWrite(pinLED, HIGH);

    becomes,

    XPin<13> pinLED;
    pinLED.digitalWrite(HIGH);

    • That’s an interesting convention as well. Here’s a sneak preview of what I’ve been trying out: http://jeelabs.org/viewvc/svn/jeelabs/trunk/avr/JeeLib/JeeLib.h?view=markup

      #include <JeeLib.h>

      using namespace Jee::Arduino;
      digitalWrite(13, 1);

      Am still exploring. One big dilemma is to what extent to maintain Arduino compatibility…

    • It could be dangerous: if I use variables as my pin numbers without knowing better, the inlined code could make the code size explode.

      I liked your template idea, which forces constants (does it?). With the same type of (pin<8?….:…) code placed in a template, you can force a user to either use the XPin method (fast, but the pin must be a compile-time constant) or else swap the parameters slightly and get the Arduino flexibility (with the speed penalty).

  3. I have no experience with all this so far, am just trying to figure out what the tradeoffs are for now. Yes, code size is a concern, but IMO constant pin numbers are the most common (especially if everything built on top also uses similar templates).

    Looks like there are still lots of trade-offs left to discover w.r.t. elegance, simplicity, understandability, and performance. Never a dull moment… :)
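The XPin idea from the second comment, and the constant-versus-variable trade-off raised in the reply, can be sketched as follows. The helper names (`portFor()`, `maskFor()`, `setPin()`, `digitalWritePin()`) and the `sim_PORT*` variables are hypothetical; the pin mapping is the standard one for ATmega168/328-based Arduinos (pins 0-7 on port D, 8-13 on port B, 14-19 on port C). A non-type template argument must be a compile-time constant, so `XPin<pin>` with an ordinary variable `pin` does not compile, and code with variable pin numbers has to use the out-of-line function instead.

```cpp
#include <cstdint>
using std::uint8_t;

// Hypothetical stand-ins for the ATmega168/328 output registers.
uint8_t sim_PORTB = 0, sim_PORTC = 0, sim_PORTD = 0;

// Standard Arduino mapping: pins 0-7 -> PD0-PD7, 8-13 -> PB0-PB5,
// 14-19 (A0-A5) -> PC0-PC5.
inline uint8_t& portFor(uint8_t pin) {
    return pin < 8 ? sim_PORTD : pin < 14 ? sim_PORTB : sim_PORTC;
}
inline uint8_t maskFor(uint8_t pin) {
    return (uint8_t)(1u << (pin < 8 ? pin : pin < 14 ? pin - 8 : pin - 14));
}
inline void setPin(uint8_t pin, bool high) {
    if (high) portFor(pin) |= maskFor(pin);
    else      portFor(pin) &= (uint8_t)~maskFor(pin);
}

// Fast path: PIN must be a compile-time constant, so after inlining the
// register and mask choices above can be resolved by the compiler.
template <uint8_t PIN>
struct XPin {
    static_assert(PIN < 20, "this sketch covers pins 0..19 only");
    static void digitalWrite(bool high) { setPin(PIN, high); }
};

// Flexible path: an ordinary out-of-line function, so a variable pin number
// costs a few run-time decisions per call but no duplicated code.
void digitalWritePin(uint8_t pin, bool high) { setPin(pin, high); }
```

The runtime path lives in one out-of-line function, so variable pin numbers don't duplicate code at each call site, while each `XPin<...>::digitalWrite()` call is inline and, with optimization enabled, can fold down to a single bit operation.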
