
LevelDB, MQTT, and Redis

In Software on Sep 27, 2013 at 00:01

To follow up on yesterday’s post, here are some more interesting properties of LevelDB, in the context of Node.js and HouseMon:

  • The levelup wrapper for Node supports streaming, i.e. traversing a set of entries for reading, as well as sending an entire set of PUTs and/or DELs as a stream of requests. This is a very nice match for Node.js and for interfaces which need to be sent across the network (including to/from browsers).

  • Events are generated for each change to the database. These can easily be translated into a “live stream” of changes. This is really nothing other than publish / subscribe, but again in a form which matches Node’s asynchronous design really well.

  • Standard encoding conventions can be defined for keys and/or values. In the case of HouseMon, I’m using UTF-8 as the default key encoding and JSON as the value encoding. The latter means that the default “stuff” stored in the database is simply a JavaScript object, and that is also what you get back on access and traversal. Individual cases can still override these defaults.
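
Here is a minimal sketch of those three ingredients with the levelup API – the ./storage path and the keys are made-up examples, not HouseMon’s actual layout:

var levelup = require('levelup');
var db = levelup('./storage', { keyEncoding: 'utf8', valueEncoding: 'json' });

// values are plain JavaScript objects, thanks to the JSON value encoding
db.put('reading~roomnode~temp', { value: 17 }, function (err) {
  // a batch applies a set of PUTs and/or DELs as one all-or-nothing operation
  db.batch([
    { type: 'put', key: 'reading~roomnode~humi', value: { value: 68 } },
    { type: 'del', key: 'reading~roomnode~old' }
  ], function (err) {
    // traversal is a Node.js read stream over a (sorted) key range
    db.createReadStream({ start: 'reading~', end: 'reading~\xff' })
      .on('data', function (entry) {
        console.log(entry.key, '=', entry.value);
      });
  });
});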

Together, all these features and conventions end up playing together very nicely. One particularly interesting aspect is the “persistent pub-sub” model you can implement with this. This is an area where the Message Queue Telemetry Transport (MQTT) protocol has been gaining a lot of attention, for things like home automation and the Internet of Things (IoT).

MQTT is a message-based system. There are channels, named in a hierarchical manner, e.g. “/location/device/sensor”, to which one can publish measurement values (“publish /kitchen/roomnode/temp 17”). Other devices or software can subscribe to such channels, and get notified each time a value is published.

One interesting aspect of MQTT is its “RETAIN” functionality: for each channel, the last value can be retained, meaning that anyone subscribing to a channel will immediately be sent the last retained value, as if it just happened. This turns an ephemeral message-based setup into a state-based mechanism with change propagation, because you can subscribe to a channel after devices have published new values on it, yet still see what the best known value for that channel is, currently. With RETAIN, the MQTT system becomes a little database, with one value per channel. This can be very convenient after power-up as well.
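
Here’s what that looks like as a sketch with the mqtt package for Node.js, assuming a broker is running on localhost (topic and value taken from the example above):

var mqtt = require('mqtt');

// first client: publish a reading with the RETAIN flag set
var sensor = mqtt.connect('mqtt://localhost');
sensor.on('connect', function () {
  sensor.publish('/kitchen/roomnode/temp', '17', { retain: true });
});

// second client, subscribing *after* the publish: the broker immediately
// sends it the retained value, as if it had just been published
var display = mqtt.connect('mqtt://localhost');
display.on('connect', function () {
  display.subscribe('/kitchen/roomnode/temp');
});
display.on('message', function (topic, message) {
  console.log(topic, '=', message.toString()); // /kitchen/roomnode/temp = 17
});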

LevelDB comes from the other side, i.e. a database with change tracking, but the effect is exactly the same: you can subscribe to a set of keys, get all their latest values, and be notified from that point on whenever any key in this range changes. This makes LevelDB and MQTT functionally equivalent, although LevelDB can handle much larger key spaces (note also that MQTT has features which LevelDB lacks, such as QoS).
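
A sketch of that idea on top of levelup (not HouseMon’s actual code): stream the current values of a key range first, then keep forwarding changes via levelup’s 'put' events. A real implementation would also have to deal with changes arriving while the traversal is still in progress:

function subscribe(db, prefix, emit) {
  db.createReadStream({ start: prefix, end: prefix + '\xff' })
    .on('data', function (entry) {      // the best known values so far...
      emit(entry.key, entry.value);
    })
    .on('end', function () {            // ...followed by live changes
      db.on('put', function (key, value) {
        if (key.slice(0, prefix.length) === prefix)
          emit(key, value);
      });
    });
}

subscribe(db, 'status~', function (key, value) {
  console.log('update:', key, '=', value);
});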

It’s also interesting to compare this with the Redis database: Redis is another key-value store, but with more data structure options. The main difference with LevelDB is that Redis is memory-based, with periodic snapshots to disk, whereas LevelDB is disk-based, with caching in memory (and thus able to handle datasets which far exceed available RAM). Redis, LevelDB, and MQTT all support pub-sub in some form.

One reason for me to select LevelDB for HouseMon is that it’s an in-process embedded database. This means that there is no need for a separate process to run alongside Node.js – and hence no need to install anything extra, or to keep a background daemon running.

By switching to LevelDB, the HouseMon installation procedure has been simplified to:

  • download a recent version of HouseMon from GitHub
  • install Node (which includes the “npm” package manager)
  • run “npm install” to get everything needed by HouseMon

And that’s it. At this point, you just start up Node.js and everything will work.

But it’s not just convenience. LevelDB is also so compact, performant, and scalable, that HouseMon can now store all raw data as well as a complete set of aggregated per-hour values without running into resource limits, even on a Raspberry Pi. According to my rough estimates, it should all end up using less than 1 GB per year, so everything will easily fit on an 8 or 16 GB SD card.

The LevelDB database

In Software on Sep 26, 2013 at 00:01

In recent posts, I’ve mentioned LevelDB a few times – a database with some quite interesting properties.

Conceptually, LevelDB is ridiculously simple: a “key-value” store which stores all entries in sorted key order. Keys and values can be anything (including empty). LevelDB is not a toy database: it can handle hundreds of millions of keys, and does so very efficiently.

It’s completely up to the application how to use this storage scheme, and how to define key structure and sub-keys. I’m using a simple prefix-based mechanism: some name, a “~” character, and the rest of the key, with further fields separated by more “~” characters. The implicit sorting then makes it easy to get all the keys for any prefix.
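
For example, fetching everything under one prefix is just a range traversal, using an end key which sorts after all the others (a sketch with the levelup API and a made-up key scheme):

db.createReadStream({
  start: 'reading~roomnode~',      // first possible key with this prefix
  end: 'reading~roomnode~\xff'     // sorts after all keys with this prefix
}).on('data', function (entry) {
  console.log(entry.key, entry.value);
});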

LevelDB is good at only a few things, but these work together quite effectively:

  • fast traversal in sorted and reverse-sorted order, over any range of keys
  • fast writing, regardless of which keys are being written, for add / replace / delete
  • compact key storage, by compressing common prefix bytes
  • compact data storage, by a periodic compaction using a fast compression algorithm
  • modifications can be batched, with a guaranteed all-or-nothing success on completion
  • traversal is done in an isolated manner, i.e. freezing the state w.r.t. further changes

Note that there are essentially just a few operations: GET, PUT, DEL, BATCH (of PUTs and DELs), and forward/backward traversal.

Perhaps the most limiting property of LevelDB is that it’s an in-process embedded library, and that therefore only a single process can have the database open. This is not a multi-user or multi-process solution, although one could be created on top by adding a server interface to it. This is the same as for Redis, which runs as one separate process.

To give you a rough idea of how the underlying log-structured merge-tree algorithm works, assume you have a datafile with a sorted set of key-value pairs, written sequentially to file. In the case of LevelDB, these files are normally capped at 2 MB each, so there may be a number of files which together constitute the entire dataset.

Here’s a trivial example with three key-value pairs:

[figure: one-level – a single sorted file with three key-value pairs]

It’s easy to see that one can do fast read traversals either way in this data, by locating the start and end position (e.g. via binary search).

The writing part is where things become interesting. Instead of modifying the files, we collect changes in memory for a bit, and then write a second-level file with changes – again sorted, but now not only containing the PUTs but also the DELs, i.e. indicators to ignore the previous entries with the same keys.

Here’s the same example, but with two new keys, one replacement, and one deletion:

[figure: two-levels – the same data, plus a second-level file with two new keys, a replacement, and a deletion]

A full traversal would return the entries with keys A, AB, B, and D in this last example.

Together, these two levels of data constitute the new state of the database. Traversal now has to keep track of two cursors, moving through each level at the appropriate rate.

For more changes, we can add yet another level, but evidently things will degrade quickly if we keep doing this. Old data would never get dropped, and the number of levels would rapidly grow. The cleverness comes from the occasional merge steps inserted by LevelDB: once in a while, it takes two levels, runs through all entries in both of them, and produces a new merged level to replace those two levels. This cleans up all the obsolete and duplicate entries, while still keeping everything in sorted order.
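
Here’s a toy version of such a merge step, with the two levels reconstructed from the diagrams above as plain arrays (a real implementation streams from files, of course):

function mergeLevels(older, newer) {        // arrays of {key, value, del}
  var out = [], i = 0, j = 0;
  while (i < older.length || j < newer.length) {
    var a = older[i], b = newer[j];
    if (b === undefined || (a !== undefined && a.key < b.key)) {
      out.push(a); i++;                     // only present in the old level
    } else if (a === undefined || b.key < a.key) {
      if (!b.del) out.push(b); j++;         // new entry (skip DEL markers)
    } else {
      if (!b.del) out.push(b); i++; j++;    // same key: the newer level wins
    }
  }
  return out;
}

var level1 = [ {key: 'A', value: 1}, {key: 'B', value: 2}, {key: 'C', value: 3} ];
var level2 = [ {key: 'AB', value: 4}, {key: 'B', value: 5},
               {key: 'C', del: true}, {key: 'D', value: 6} ];

console.log(mergeLevels(level1, level2).map(function (e) { return e.key; }));
// -> [ 'A', 'AB', 'B', 'D' ], as in the traversal example above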

The consequence of this process, is that most of the data will end up getting copied, a few times even. It’s very much like a generational garbage collector with from- and to-spaces. So there’s a trade-off in the amount of writing this leads to. But there is also a compression mechanism involved, which reduces the file sizes, especially when there is a lot of repetition across neighbouring key-value pairs.

So why would this be a good match for HouseMon and time series data in general?

Well, first of all, in terms of I/O this is extremely efficient for time-series data collection: a bunch of values with different keys and a timestamp, getting inserted all over the place. With traditional round-robin storage and slots, you end up touching each file in some spot all the time – this leads to a few bytes written in a block, and requires a full block read-modify-write cycle for every single change. With the LevelDB approach, all the changes will be written sequentially, with the merge process delayed (and batched!) to a different moment in time. I’m quite certain that storage of large amounts of real-time measurement data will scale substantially better this way than any traditional RRDB, while producing the same end results for access. The same argument applies to indexed B-trees.

The second reason, is that data sorted on key and time is bound to be highly compressible: temperature, light levels, etc – they all tend to vary slowly in time. Even a fairly crude compression algorithm will work well when readings are laid out in that particular order.

And thirdly: when the keys have the time stamp at the end, we end up with the data laid out in an optimal way for graphing: any graph series will be a consecutive set of entries in the database, and that’s where LevelDB’s traversal works at its best. This will be the case for raw as well as aggregated data.

One drawback could be the delayed writing. This is the same issue with Redis, which only saves its data to file once in a while. But this is no big deal in HouseMon, since all the raw data is being appended to plain-text log files anyway – so on startup, all we have to do is replay the end of the last log file, to make sure everything ends up in LevelDB as well.

Stay tuned for a few other reasons on why LevelDB works well in Node.js …

Animation in the browser

In Software on Sep 23, 2013 at 00:01

If you’ve programmed for Mac OS X (my actual experience in that area is fairly minimal), then you may know that it has a feature called Core Animation – as a way to implement all sorts of flashy smooth UI transitions – sliding, fading, colour changes, etc.

The basic idea of animation in this context is quite simple: go from state A to state B by varying one or more values gradually over a specified amount of time.

It looks like web browsers are latching onto this trend as well, with CSS Animations. It has been possible to implement all sorts of animations for some time, using timed JavaScript loops running in the background to simulate this sort of thing, as jQuery has been doing. Then again, getting such flashy features into a whole set of independently-developed (and competing) web browsers is bound to take more time, but yeah… it’s happening. Here’s an illustration of what text can do in today’s browsers.

In Angular, browser-side animation has recently become quite a bit simpler.

I’m no big fan of attention-drawing screen effects. There was a time when HTML’s BLINK tag got so much, ehm, attention that every page tried to look like a neon sign. Oh, look, a new toy tag, and it’s BLINKING… neat, let’s use it everywhere!

Even slide transitions (will!) get boring after a while.

But with dynamic web pages updating in real time, it seems to me that there is a case to be made for subtle hints that something has just happened. As a trial, I used a fade transition (from jQuery) in the previous version of HouseMon to briefly change the background of a changed row to very light yellow, which then fades back to white:

[screenshot: a changed table row briefly highlighted in light yellow]

The effect seems to work well – for me anyway: it’s easy to spot the latest changes in a table, without getting too distracted when looking at some specific data.

In the latest version of HouseMon, I wanted to try the same effect using the new Angular animation feature, and I must say: the basic mechanism is really simple – you make animations happen by defining the CSS start settings, the finish settings, and the time period plus transition style to use. In the case of adding a row to a table or deleting a row, it’s all built-in and done by just adding some CSS definitions, as described here.

In this particular case, things are slightly more involved because it’s not the addition or deletion of a row I wanted to animate, but the change of a timestamp on a row. This requires defining a custom Angular “directive”. Directives are the cornerstone of Angular: they can (re-) define the behaviour of HTML elements or attributes, and let you invent new ones to do just about anything possible in the DOM.

Unfortunately, directives can also be a bit obscure at times. Here is the one I ended up defining for HouseMon:

# directive: run a 'highlight' animation whenever the watched value changes
ng.directive 'highlightOnChange', ($animate) ->
  (scope, elem, attrs) ->                    # the linking function
    scope.$watch attrs.highlightOnChange, -> # watch the given expression
      $animate.addClass elem, 'highlight', ->
        attrs.$removeClass 'highlight'       # clean up once the animation ends

With it, we can now add a custom highlightOnChange attribute to an element to make it highlight whenever a value on the page changes.

Here is how it gets used (in Jade) on the status table rows shown in yesterday’s post:

tr(highlight-on-change='row.time')

In prose: when row.time changes, start the highlight transition, which adds a custom highlight class and removes it again at the end of the animation sequence.

The CSS then defines the actual animation (using Stylus notation):

.highlight-add
  transition 1s linear all
  background lighten(yellow, 75%)
.highlight-add.highlight-add-active
  background white

I’m leaving out some finicky details, but that’s the essence of it all: set up an $animate call to trigger on the proper conditions, and then define what is going to happen in CSS.

Note that Angular picks name suffixes for the CSS classes during the different phases of animation, but then it can all be used for numerous types of animation: you pick the CSS setting(s) at start and finish, and the browser’s CSS engine will automatically “run” the selected animation – covering motions, colour changes, and lots more (could fonts be emboldened gradually or even be morphed in more sophisticated ways? I haven’t tried it).

The result of all this is a table with subtle row highlights wherever a new reading comes in. Triggered by some Angular magic, but the actual highlighting is a (not-yet-100%-standardised) browser effect.

Working with a CSS grid

In Software on Sep 22, 2013 at 00:01

As promised yesterday, here is some information on how you can get page layouts off the ground real quick with the Zurb Foundation 4 CSS framework. If you’re more into Twitter Bootstrap, see this excellent comparison and overview. They are very similar.

It took a little while to sink in, until I read this article about how it’s supposed to be used, and skimmed through a 10-part series on webdesign tuts+. Then the coin dropped:

  • you create rows of content, using a div with class row
  • each row has 12 columns to put stuff in
  • you put the sections of your page content in divs with class large-6 and columns
  • each section goes next to the preceding one, so 6 and 6 puts the split in the middle
  • any number from 1 to 12 goes, as long as you use at most 12 columns per row

So far so good, and pretty trivial. But here’s where it gets interesting:

  • each column can have more rows, and each row again offers 12 columns to lay out
  • for mobile use, you can add small-N classes as well, with other column allocations

I haven’t done much with this yet, but this page was trivial to generate:

[screenshot: the HouseMon status page]

The complete definition of that inner layout is in app/status/view.jade:

.row
  .large-9.columns
    .row
      .large-8.columns
        h3 Status
      .large-4.columns
        h3
        input(type='text',ng-model='statusQuery',placeholder='Search...')
    .row
      .large-12.columns
        table(width='100%')
          tr
            th Driver
            th Origin
            th Name
            th Title
            th Value
            th Unit
            th Time
          tr(ng-repeat='row in status | filter:statusQuery | orderBy:"key"')
            td {{row.type}}
            td {{row.tag}}
            td {{row.name}}
            td {{row.title}}
            td(align='right') {{row.value}}
            td {{row.unit}}
            td {{row.time | date:"MM-dd   HH:mm:ss"}}

In prose:

  • the information shown uses the first 9 columns (the remaining 3 are unused)
  • in those 9 columns, we have a header row and then the actual data table as a second row
  • the header is an h3 title on the left and an input on the right, split as 2:1
  • the table takes the full width (of those 9 columns in which it resides)
  • the rest is standard HTML in Jade notation, and an Angular ng-repeat loop
  • there’s also live row filtering in there, using some Angular conventions
  • the alternating row backgrounds and general table style are Foundation’s defaults

Actually, I left out one thing: the entries automatically get highlighted whenever the time stamp changes. This animation is a nice way to draw attention to updates – stay tuned.

Responsive CSS layouts

In Software on Sep 21, 2013 at 00:01

One of the things today’s web adds to the mix, is the need to address devices with widely varying screen sizes. Seeing a web site made for an “old” 1024×768 screen layout squished onto a mobile phone screen is not really great if everything just ends up smaller on-screen.

Enter grid-based CSS frameworks and responsive CSS.

There’s a nice site to quickly try designs at different sizes, at cybercrab.com/screencheck. Here’s how my current HouseMon screen looks on two different simulated screens:

[screenshot: HouseMon on a large simulated screen]

[screenshot: HouseMon on a small simulated screen]

It’s not just reflowing CSS elements (the bottom line stacks vertically on a small screen), it also completely alters the behaviour of the navigation bar at the top. In a way which works well on small screens: click in the top right corner to fold out the different menu choices.

Both Zurb Foundation and Twitter Bootstrap are “CSS frameworks” which provide a huge set of professionally designed UI elements – and now also do this responsive thing.

The key point here is that you don’t need to do much to get all this for free. It takes some fiddling and googling to set all the HTML sections up just right, but that’s it. In my case, I’m using Angular’s “ng-repeat” to generate the navigation bar menu entries on-the-fly:

nav.top-bar(ng-controller='NavCtrl')
  ul.title-area
    li.name
      h1
        a(ng-href='{{appInfo.home}}')
          img(src='/main/logo.png',height=22,width=22)
          |   {{appInfo.name}}
    li.toggle-topbar.menu-icon
      a(href='#')
        span
  section.top-bar-section
    ul.left
      li.myNav(ng-repeat='nav in navbar')
        a(ng-href='{{nav.route}}') {{nav.title}}
      li.divider
    ul.right
      li.myNav
        a(href='/admin') Admin

Yeah, ok, pretty tricky stuff. But it’s all documented, and nowhere near the effort it’d take to do this sort of thing from scratch. Let alone get the style and visual layout consistent and working across all the different browsers and versions.

Tomorrow, I’ll describe the way grid layouts work with Zurb Foundation 4 – it’s sooo easy!

In praise of AngularJS

In Software on Sep 20, 2013 at 00:01

This is how AngularJS announces itself on its home page:

HTML enhanced for web apps!

Hm, ok, but not terribly informative. Here’s the next paragraph:

HTML is great for declaring static documents, but it falters when we try to use it for declaring dynamic views in web-applications. AngularJS lets you extend HTML vocabulary for your application. The resulting environment is extraordinarily expressive, readable, and quick to develop.

That’s better. Let me try to ease you into this “framework” on which HouseMon is based:

  • Angular (“NG”) is client-side only. It doesn’t exist on the server. It’s all browser stuff.
  • It doesn’t take over. You could use NG for parts of a page. HouseMon is 100% NG.
  • It comes with many concepts, conventions, and terms you must get used to. They rock.

I can think of two main stumbling blocks: “Dependency Injection” and all that “service”, “factory”, and “controller” stuff (not to mention “directives”). And Model-View-Something.

It’s not hard. It wasn’t designed to be hard. The terminology is pretty clever. I’d say that it was invented to produce a mental model which can be used and extended. And like a beautiful clock, it doesn’t start ticking until all the little pieces are positioned just right. Then it will run (and evolve!) forever.

Dependency injection is nothing new, it’s essentially another name for “require” (Node.js), “import” (Python), or “#include” (C/C++). Let’s take a fictional piece of CoffeeScript code:

value = 123
myCode = (foo, bar) -> foo * value + bar

console.log myCode(100, 45)

The output will be “12345”. But with dependency injection, we can do something else:

value = 123
myCode = (foo, bar) -> foo * value + bar

define 'foo', 100
define 'bar', 45
console.log inject(myCode)

This is not working code, but it illustrates how we could define things in some more global context, then call a function and give it access to that context. Note that the “inject” call does something very magical: it looks at the names of myCode’s arguments, and then somehow looks them up and finds the values to pass to myCode.

That’s dependency injection. And Angular uses it all over the place. It will often call functions through a special “injector” and get hold of all args expected by the function. One reason this works is because closures can be used to get local scope information into the function. Note how myCode had access to “value” via a normal JavaScript closure.

In short: dependency injection is a way for functions to import all the pieces they need. This makes them extremely modular (and it’s great for test suites, because you can inject special versions and mock objects to test each piece of code outside its normal context).
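
This is not Angular’s actual implementation, but the trick can be sketched in a dozen lines of JavaScript – Angular’s injector does indeed read argument names via Function.toString():

var registry = {};
function define(name, value) { registry[name] = value; }

function inject(fn) {
  var names = fn.toString()
    .match(/\(([^)]*)\)/)[1]                // the text between the parentheses
    .split(',')
    .map(function (s) { return s.trim(); })
    .filter(Boolean);
  return fn.apply(null, names.map(function (n) { return registry[n]; }));
}

var value = 123;
var myCode = function (foo, bar) { return foo * value + bar; };

define('foo', 100);
define('bar', 45);
console.log(inject(myCode)); // -> 12345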

The other big hurdle I had to overcome when starting out with Angular, is all those services, providers, factories, controllers, and directives. As it turns out, that too is much simpler than it might seem:

  • Angular code is packaged as “modules” (I actually only use a single one in HouseMon)
  • in these modules, you define services, etc – more on this in a moment
  • the whole startup is orchestrated in a specific way: boot, config, run, fly
  • ok, well, not “fly” – that’s just my name for it…

The startup sequence is key – to understand what goes where, and what gets called when:

  • the browser starts by loading an HTML page with <script> tags in it (doh)
  • all the code needed on the browser side has to be loaded up front
  • what this code does is define modules, services, and so on
  • nothing is really “running” at this point, these are all function definitions
  • the last step is to call angular.bootstrap(document, ['myApp']);
  • at this point, all the config sections defined in the modules are run
  • once that is over, all the run sections are run, this is the real application start
  • when that is done, we in essence enter The Big Event Loop in the browser: lift off!

So on the client side, i.e. the browser, the entire application needs to be structured as a set of definitions. A few things need to be done early on (config), a few more once everything has been defined and Angular’s “$rootScope” has been put in place (read: the main page context is ready, a bit like the DOM’s “document.onload”), and then it all starts doing real work through the controllers tied to various HTML elements in the DOM.
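
In skeleton form, that sequence looks like this (with ‘myApp’ as a placeholder name):

var app = angular.module('myApp', []);

app.config(function ($provide) {
  // config phase: runs first; only providers and constants can be
  // injected here - nothing is "live" yet
});

app.run(function ($rootScope) {
  // run phase: the injector is complete - the real application start
});

// the final step: tie the module to the page, and lift off
angular.bootstrap(document, ['myApp']);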

Providers, factories, services, and even constants and values, are merely variations on a theme. There is an excellent page by Matt Haggard describing these differences. It’s all smoke and mirrors, really. Everything except directives is a provider in Angular.

Angular is like the casing of a watch, with a powerful pre-loaded spring already in place. You “just” have to figure out the role of the different pieces, put a few of them in place, and it’ll run all by itself from then on. It’s one of the most elegant frameworks I’ve ever seen.

With HouseMon, I’ve put the main pieces in place to show a web page in the browser with a navigation top bar, and the hooks to tie into Primus and RPC. The rest is all up to the modules, i.e. the subfolders added to the app/ folder. Each client.coffee file is an Angular module. HouseMon will automatically load them all into the browser (via a /primus/primus.js file generated on-the-fly by Primus). This has turned out to be extremely modular and flexible – and it couldn’t have been done without Angular.

If you’re looking for a “way in” for the HouseMon client-side code, then the place to start is app/index.jade, which uses Jade’s extend mechanism to tie into main/layout.jade, so that would be the second place to start. After that, it’s all Angular-structured code, i.e. views, view templates with controllers, and the services they call.

I’m quite excited by all this, because I’ve never before been able to develop software in such a truly modular fashion. The interfaces are high-level, clean (and clear) and all future functionality can now be added, extended, replaced, or even removed again – step by step.

Ground control to… browser

In Software on Sep 19, 2013 at 00:01

After yesterday’s RPC example, which makes the server an easy-to-use service in client code, let’s see how to do things in the other direction.

I should note that there is usually much less reason to do this, since at any point in time there can be zero, one, or occasionally more clients connected to the server in the first place. Still, it could be useful for a server to collect client status info, or to periodically save some state. The benefit of server-initiated requests is that it turns authentication around, assuming we trust the server.

Ok, so here’s some silly code on the client which I’d like to be able to call from the server:

ng = angular.module 'myApp'

ng.config ($stateProvider, navbarProvider, primus) ->
  $stateProvider.state 'view1',
    url: '/'
    templateUrl: 'view1/view.html'
    controller: 'View1Ctrl'
  navbarProvider.add '/', 'View1', 11
  
  primus.client.view1_twice = (x) ->
    2 * x

ng.controller 'View1Ctrl', ->

This is the “View 1” module, similar to the Admin module, exposing code for server use:

  primus.client.view1_twice = (x) ->
    2 * x

On the server side, the first thing to note is that we can’t just call this at any point in time. It’s only available when a client is connected. In fact, if more than one client is connected, then there will be multiple instances of this code. Here is app/view1/host.coffee:

module.exports = (app, plugin) ->
  app.on 'running', (primus) ->
    primus.on 'connection', (spark) ->
      
      spark.client('view1_twice', 123).then (res) ->
        console.log 'double', res

The output on the server console, whenever a client connects, is – rather unsurprisingly:

    double 246

What’s more interesting here, is the logic of that server-side code:

  • the app/view1/host.coffee exports one function, accepting two args
  • when loaded on startup, it will be called with the app and Primus plugin object
  • we’re in setup mode at this point, so this is a good time to define event handlers
  • the app.on 'running' handler gets called when all server setup is complete
  • now we can set up a handler to capture all new client connections
  • when such a connection comes in, Primus will hand us a “spark” to talk to
  • a spark is essentially a WebSocket session, it supports a few calls and streaming
  • now we can call the ‘view1_twice’ code on the client via RPC
  • this creates and returns a promise, since the RPC will take a little bit of time
  • we set it up so that when the promise is fulfilled, console.log ... gets called

And that’s it. As with yesterday’s setup, all the communication and handling of asynchronous callbacks is taken care of behind the scenes.

There is some asymmetry in naming conventions and the uses of promises, but here’s the summary:

  • to define a function ‘foo’ on the host, define it as app.host.foo = …
  • to call that function on the client, use host ‘foo’, …
  • the result is an Angular promise, use “.then …” to pick it up if needed
  • to define a function ‘bar’ on the client, define it as primus.client.bar = …
  • to call that function from the server, use spark.client ‘bar’, …
  • the result is a q promise, use “.then …” to pick it up

Ok, so there are some slight differences between the two promise API’s. But the main benefit of this all is that the passing of time and networking involved is completely abstracted away. Maybe I’ll be able to unify this further, some day.

Promise-based RPC

In Software on Sep 18, 2013 at 00:01

Or maybe this post should have been titled: “Taming asynchronous RPC”.

Programming with asynchronous I/O is tricky. The proper solution IMO, is to have full coroutine or generator support in the browser and in the server. This has been in the works for some time now (see ES6 Harmony). In short: coroutines and generators can “yield”, which means something that takes time can suspend the current stack state until it gets resumed. A bit like green (non-preemptive) threads, and with very little overhead.

But we ain’t there yet, and waiting for all the major browsers to reach that point might be a bit of a stretch (see the “Generators (yield)” entry on this page). The latest Node.js v0.11.7 development release does seem to support it, but that’s only half the story.

So promises it is for now. On the server side, this is available via Kris Kowal’s q package. On the client side, AngularJS includes promises as $q. And as mentioned before, I’m also using q-connection as a promise-based Remote Procedure Call (RPC) mechanism.

The result is really neat, IMO. First, RPC from the client, calling a function on the server. This is app/admin/client.coffee for the client side:

ng = angular.module 'myApp'

ng.config ($stateProvider) ->
  $stateProvider.state 'admin',
    url: '/admin'
    templateUrl: 'admin/view.html'
    controller: 'AdminCtrl'

ng.controller 'AdminCtrl', ($scope, host) ->
  $scope.hello = host 'admin_dbinfo'

This Angular boilerplate defines an Admin page with app/admin/view.jade template:

h3 Storage use
table
  tr
    th Group
    th Size
  tr(ng-repeat='(group,size) in hello')
    td {{group}}
    td(align='right') ≈ {{size}} b

The key is this one line:

$scope.hello = host 'admin_dbinfo'

That’s all there is to doing RPC with a round-trip over the network. And here’s the result, as shown in the browser (formatted with the Zurb Foundation CSS framework):

[screenshot: the “Storage use” table on the Admin page]

There’s more to it than meets the eye, but it’s all nicely tucked away: the host call does the request, and returns a promise. Angular knows how to deal with promises, so it will fill in the “hello” field when the reply arrives, asynchronously. IOW, the above code looks like synchronous code, but does lots of things at another point in time than you might think.

On the server, this is the app/admin/host.coffee code:

Q = require 'q'

module.exports = (app, plugin) ->
  app.on 'setup', ->

    app.host.admin_dbinfo = ->
      q = Q.defer()
      app.db.getPrefixDetails q.resolve
      q.promise

It’s slightly more involved because promises are more exposed. First, the RPC call is defined on application setup. When called, it creates a new promise, calls getPrefixDetails with a callback, and immediately returns this promise to fill in the result later.

At some point, getPrefixDetails will call the q.resolve callback with the result as argument, and the promise becomes fulfilled. The reply is then sent to the client by the q-connection package.

Note that there are three asynchronous calls involved in total:

    rpc call -> network send -> database action -> network reply -> show
                ============    ===============    =============

Each of the underlined steps is asynchronous and based on callbacks. Yet the only step we had to deal with explicitly on the server was that custom database action.

Tomorrow, I’ll show RPC in the other direction, i.e. the server calling client code.

Flow – the inside perspective

In Software on Sep 17, 2013 at 00:01

This is the last of a four-part series on designing big apps (“big” as in not embedded, not necessarily many lines of code – on the contrary, in fact).

The current 0.8 version of HouseMon is my first foray into the world of streams and promises, on the host server as well as in the browser client.

First a note about source code structure: HouseMon consists of a set of subfolders in the app folder, and differs from the previous SocketStream-based design in that dependency info, client-side code, client-side layout files + images, and host-side code are now all grouped per module, instead of strewn across client, static, and server folders.

The “main” module starts the ball rolling, the other modules mostly just register themselves in a global app “registry”, which is simply a tree of nested key/value mappings. Here’s a somewhat stripped down version of app/main/host.coffee:

module.exports = (app, plugin) ->
  app.register 'nodemap.rf12-868,42,2', 'testnode'
  app.register 'nodemap.rf12-2', 'roomnode'
  app.register 'nodemap.rf12-3', 'radioblip'
  # etc...

  app.on 'running', ->
    Serial = @registry.interface.serial
    Logger = @registry.sink.logger
    Parser = @registry.pipe.parser
    Dispatcher = @registry.pipe.dispatcher
    ReadingLog = @registry.sink.readinglog

    app.db.on 'put', (key, val) ->
      console.log 'db:', key, '=', val

    jeelink = new Serial('usb-A900ad5m').on 'open', ->

      jeelink # log raw data to file, as timestamped lines of text
        .pipe(new Logger) # sink, can't chain this further

      jeelink # log decoded readings as entries in the database
        .pipe(new Parser)
        .pipe(new Dispatcher)
        .pipe(new ReadingLog app.db)

This is all server-side stuff and many details of the API are still in flux. It’s reading data from a JeeLink via USB serial, at which point we get a stream of messages of the form:

{ dev: 'usb-A900ad5m', msg: 'OK 2 116 254 1 0 121 15' }

(this can be seen by inserting “.on('data', console.log)” in the above pipeline)

These messages get sent to a “Logger” stream, which saves things to file in raw form:

L 17:18:49.618 usb-A900ad5m OK 2 116 254 1 0 121 15

One nice property of Node streams is that you can connect them to multiple outputs, and they’ll each receive these messages. So in the code above, the messages are also sent to a pipeline of “transform” streams.
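
Here’s a self-contained sketch of that fan-out, using the current Node.js stream API (object mode, since these are message objects rather than bytes):

const { Readable, Writable } = require('stream');

function sink(name) {
  return new Writable({
    objectMode: true,
    write(msg, enc, done) { console.log(name, msg.msg); done(); }
  });
}

const source = Readable.from([
  { dev: 'usb-A900ad5m', msg: 'OK 2 116 254 1 0 121 15' }
]);

// both sinks receive every message the source emits
source.pipe(sink('logger:'));
source.pipe(sink('parser:'));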

The “Parser” stream understands RF12demo’s text output lines, and transforms it into this:

{ dev: 'usb-A900ad5m',
  msg: <Buffer 02 74 fe 01 00 79 0f>,
  type: 'rf12-868,42,2' }

Now each message has a type and a binary payload. The next step takes this through a “Dispatcher”, which looks up the type in the registry to tag it as a “testnode” and then processes this data using the definition stored in app/drivers/testnode.coffee:

module.exports = (app) ->

  app.register 'driver.testnode',
    in: 'Buffer'
    out:
      batt:
        title: 'Battery status', unit: 'V', scale: 3, min: 0, max: 5

    decode: (data) ->
      { batt: data.msg.readUInt16LE 5 }

A lot of this is meta information about the results decoded by this particular driver. The result of the dispatch process is a message like this:

{ dev: 'usb-A900ad5m',
  msg: { batt: 3961 },
  type: 'testnode',
  tag: 'rf12-868,42,2' }

The data has now been decoded. The result is one parameter in this case. At the end of the pipeline is the “ReadingLog” write stream which timestamps the data and saves it into the database. To see what’s going on, I’ve added an “on ‘put’” handler, which shows all changes to the database, such as:

db: reading~testnode~rf12-868,42,2~1379353286905 = { batt: 3961 }

The ~ separator is a convention with LevelDB to partition the key namespace. Since the timestamp is included in the key, this storage will store each entry in the database, i.e. as historical storage.

This was just a first example. The “StatusTable” stream takes a reading such as this:

db: reading~roomnode~rf12-868,5,24~1379344595028 = {
  "light": 20,
  "humi": 68,
  "moved": 1,
  "temp": 143
}

… and stores each value separately:

db: status~roomnode/rf12-868,5,24/light = {
  "key":"roomnode/rf12-868,5,24/light",
  "name":"light",
  "value":20,
  "type":"roomnode",
  "tag":"rf12-868,5,24",
  "time":1379344595028
}
// and 3 more...

Here, the information is placed inside the message, including the time stamp. Keys must be unique, so in this case only the last value will be kept in the database. In other words: this fills a “status” table with the latest value of each measurement value.

As last example, here is a pipeline which replays a saved log file, decodes it, and treats it as new readings (great for testing, but useless in production, clearly):

createLogStream = @registry.source.logstream

createLogStream('app/replay/20121130.txt.gz')
  # .pipe(new Replayer)
  .pipe(new Parser)
  .pipe(new Dispatcher)
  .pipe(new StatusTable app.db)

Unlike the previous HouseMon design, this now properly deals with back-pressure.

The “Replayer” stream is a fun one: it takes each message and delays passing it through (with an updated timestamp), so that this stream will simulate a real-time feed with new messages trickling in. Without it, the above pipeline processes the file as fast as it can.
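
Conceptually, such a stream could look like this sketch of a delaying pass-through (the real Replayer also derives its delays from the recorded timestamps):

const { Transform } = require('stream');

function replayer(msDelay) {
  return new Transform({
    objectMode: true,
    transform(msg, enc, done) {
      setTimeout(() => {
        // pass the message through with a fresh timestamp
        this.push(Object.assign({}, msg, { time: Date.now() }));
        done(); // finishing late is also what throttles the pipeline
      }, msDelay);
    }
  });
}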

The next step will be to connect a change stream through the WebSocket to the client, so that live status information can be displayed on-screen, using exactly the same techniques.

As you can see, it’s streams all the way down. Onwards!

Flow – the application perspective

In Software on Sep 16, 2013 at 00:01

This is the third of a four-part series on designing big apps (“big” as in not embedded, not necessarily many lines of code – on the contrary, in fact).

Because I couldn’t fit it in three parts after all.

Let’s talk about the big picture, in terms of technologies. What goes where, and such.

A decade or so ago, this would have been a decent model of how web apps work, I think:

[diagram: app1 – the classic server-rendered web app]

All the pages, styles, images, and some code are fetched as HTTP requests and rendered on the server, which sits between the persistent state (files and databases) and the network-facing end.

A RESTful design would then include a clean structure w.r.t. how that state on the server is accessed and altered. Then we throw in some behind-the-scenes logic with Ajax to make pages more responsive. This evolved into general-purpose client-side JavaScript libraries like jQuery, to get lots of the repetitiveness out of the developer’s workflow. Great stuff.

The assumption here is that servers are big and fast, and that they work reliably, whereas clients vary in capabilities, versions, and performance, and that network connection stability needs to stay out of the “essence” of the application logic.

In a way, it works well. This is the context in which database-backed websites became all the rage: not just a few files and templating to tweak the website, but the essence of the application is stored in a database, with “widgets”, “blocks”, “groups”, and “panes” sewn together by an ever-more elaborate server-side framework – Ruby on Rails, Django, Drupal, etc. WordPress and Redmine too, which I gratefully rely on for the JeeLabs sites.

But there’s something odd going on: a few days ago, the WordPress server VM which runs this daily weblog here at JeeLabs crashed on an out-of-memory error. I used to reserve 512 MB RAM for it, but had scaled it back to 256 MB before the summer break. So 256 MB is not enough apparently, to present a couple of weblog pages and images, and handle some text searches and a simple commenting system.

(We just passed 6000 comments the other day. It’s great to see y’all involved – thanks!)

Ok, so what’s a measly 512 MB, eh?

Well, to me it just doesn’t add up. Ok, so there’s a few hundred MB of images by now. Total. The server is running off an SSD disk, so we should be able to easily serve images with say 25 MB of RAM. But why does WP “need” hundreds of MB’s of RAM to serve a few dozen MB’s of daily weblog posts? Total.

It doesn’t add up. Self-inflicted overhead, for what is 95% a trivial task: serving the same texts and images over and over again to visitors of this website (and automated spiders).

The Redmine project setup is even weirder: currently consuming nearly 700 MB of RAM, yet this ought to be a trivial task, which could probably be served entirely out of say 50 MB of RAM. A tenfold resource consumption difference.

In the context of the home monitoring and automation I’d like to take a bit further, this sort of resource waste is no longer a luxury I can afford, since my aim is to run HouseMon on a very low-end Linux board (because of its low power consumption, and because it really doesn’t do much in terms of computation). Ok, so maybe we need a bit more than 640 KB, but hey… three orders of magnitude?

In this context, I think we can do better. Instead of a large server-side byte-shuffling engine, we can now do this – courtesy of modern browsers, Node.js, and AngularJS:

[diagram: app2 – static assets plus a real-time server]

The server side has been reduced to its bare minimum: every “asset” that doesn’t change gets served as is (from files, a database, whatever). This includes HTML, CSS, JavaScript, images, plain text, and any other “document” type blob of information. Nginx is a great example of how a tiny web server based on async I/O and written in C can take care of any load we down-to-earth netizens are likely to encounter.

Let me stress this point: there is no dynamic “templating” or “page generation” on the server side. This takes place in the browser – abstracted away by AngularJS and the DOM.

In parallel, there’s a “Real-Time Server” running, which handles the application logic and everything that needs to be dynamic: configuration! status! measurements! commands! I’m using Node.js, and I’ve started using the fascinating LevelDB database system for all data persistence. The clever bit of LevelDB is that it doesn’t just fetch and store data, it actually lets you keep flows running with changes streamed in and out of it (see the level-livefeed and multilevel projects).

So instead of copying data in and out of a persistent data store, this also becomes a way of staying up to date with all changes on that data. The fusion of a database and pubsub.

On the client side, AngularJS takes care of the bi-directional flow between state (model) and display (view), and Primus acts as generic connection library for getting all this activity across the net – with connection keep-alive and automatic reconnect. Lastly, I’m thinking of incorporating the q-connection library for managing all asynchronicity. It’s fascinating to see network round trip delays being woven into the fabric of an (async / event-driven) JavaScript application. With promises, client/server apps are starting to feel like a single-system application again. It all looks quite, ehm… promising :)

Lots of bits that need to work together. So far, I’m really delighted by the flexibility this offers, and the fact that the server side is getting simpler: just the autonomous data acquisition, a scalable database, a responsive WebSocket server to make the clients (i.e. browsers) come alive, and everything else served as static data. State is confined to the essence of the app. The rest doesn’t change (other than during development of course), needs no big backup solution, and the whole kaboodle should easily fit on a Raspberry Pi – maybe something even leaner than that, one day.

The last instalment tomorrow will be about how HouseMon is structured internally.

PS. I’m not saying this is the only way to go. Just glad to have found a working concoction.

Flow – the developer perspective

In Software on Sep 15, 2013 at 00:01

This is the second of a four-part series on designing big apps (“big” as in not embedded, not necessarily many lines of code – on the contrary, in fact).

Software development is all about “flow”, in two very different ways: the flow of the development process itself, i.e. the edit-run-debug cycle, and the flow of data through the running application.

I’ve been looking at lots of options on how to address that first bit, i.e. how to set up an environment which can help me through the edit-run-debug cycle as smoothly and quickly as possible. I didn’t find anything that really fits my needs, so as any good ol’ software developer would do, I scratched this itch recently and ended up creating a new tool.

It’s all about flow. Iterating through the coding process without pushing more buttons or keys than needed. Here’s how it works – there are three kinds of files I need to deal with:

  • HTML – app pages and docs, in the form of Jade and Markdown text files
  • CSS – layout and visual design, in the form of Stylus text files
  • CODE – the logic of it all, host- and client-side, in the form of CoffeeScript text files

Each of these are not the real thing, in the sense that they are all dialects which need to be translated to HTML, CSS, and JavaScript, respectively. But I don’t want a solution which introduces lots of extra steps into that oh-so treasured edit-run-debug cycle of software development. No temporary files, please – let’s generate everything as needed on-the-fly, and let’s set up a single task to take care of it all.

The big issue here is when to do all that pre- (and re-) processing. Again, since staying in the flow is the goal, that really should happen whenever needed and as quickly as possible. The workflow I’ve found to suit me well is as follows:

  • place all the development source files into one nested area (doh!)
  • have the system watch for changes to any files in this area
  • if it’s a HTML file change: reload the browser
  • if it’s a CSS file change: update the browser (no need to reload)
  • if it’s a CODE file change: restart the server and reload the browser

The term for this is “live reload”. It’s been in use for ages by web developers. You save a file, and BAM … the browser updates itself. But that’s only half the story with a client + server setup, as you also may have to restart the server. In the past, I’ve done so using the nodemon utility. It works, but along with other tools to watch for changes and pre-compile to the desired format, it all got a bit clunky.

In the new HouseMon project, it’s all done behind the scenes in a single process. Well, two actually: when in development mode, HouseMon forks itself into a supervisor and a worker child. The supervisor just restarts the worker whenever it exits. And the worker watches for files changes and either reloads or adjusts the browser, or exits. The latter becomes a way for the worker to quickly restart itself from scratch due to a code change.
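
The supervisor half of that can be sketched in a few lines (worker.js being a hypothetical entry point for the worker, which does the file watching and exits to request a restart):

var fork = require('child_process').fork;

function startWorker() {
  fork('./worker.js').on('exit', function (code) {
    console.log('worker exited (' + code + '), restarting');
    startWorker(); // a crash-loop guard would be wise in real code
  });
}
startWorker();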

The result works phenomenally well: I edit in MacVim, multiple files even, and nothing happens. I can bring up the Dash reference guides to look up something. Still nothing happens. But when I switch away from MacVim, it’s set up to save all changes, and everything then kicks in on auto-pilot. Instantly or a few seconds later (depending on the change), the latest changes will be running. The browser may lose a little of its context, but the URL stays the same, so I’m still looking at the same page. The console is open, and I can see what’s going on there. Without ever switching to the browser (ok, I need to switch away from MacVim to something else, but often that will be the command line).

It’s not yet the same as live in-app development (as was possible with Tcl), but it’s getting mighty close, because the app returns to where it was before without pushing any buttons.

There’s one other trick in this setup which is a huge time-saver: whenever the supervisor is launched (“node .”), it scans through all the folders inside the main ./app folder, and collects npm and bower package files as well as Makefile files. All dependencies and requirements are then processed as needed, making the use of 3rd party packages virtually effortless – on the server and in the client. Total automation. Look ma, no hands!

None of this machinery needs to be present in production mode. All this reloading and package dependency handling stuff can then be ignored. Deployment just needs a bunch of static files, served through Node.js, Nginx, Apache, whatever – plus a light-weight Node.js server for the host-side logic, using WebSockets or some fallback mechanism.

As for the big picture on how HouseMon itself is structured… more coming, tomorrow.

Flow – the user perspective

In Software on Sep 14, 2013 at 00:01

This is the first of a four-part series on designing big apps (“big” as in not embedded, not necessarily many lines of code – on the contrary, in fact).

Ok, so maybe what follows is not the user perspective, but my user perspective?

I’m not a fan of clicking buttons. It’s a computer invention, and it sucks.

Interfaces (I’m not keen on the term user interfaces either, but it’s more to the point than just “interfaces”) – anyway, interfaces need interaction to indicate what information you want to see, and to perform a real action, i.e. something that has effect on some permanent state. From setting up preferences or a configuration, to turning on a light.

But apart from that, interaction to fetch or update something is tedious, unwanted, silly, and a waste of time. Pushing a button to see the time is 2,718,281 times more inconvenient than looking at the time. Just like asking for something involves a lot more interaction (and social subtleties) than having the liberty to just do it.

When I look outside the window, I expect to see the actual weather. And when I see information on a screen, I expect it to be up-to-date. I don’t care if the browser does constant refreshes to pull changes from a server or whether it’s set up to respond to push notifications coming in behind the scenes. When I see “22.7°C”, it needs to be a real reading, with a known and acceptable lag and accuracy – unless it’s historical data.

This goes a lot further than numbers and graphs. When I see a button, then to me that is a promise (and invitation) that some action will be taken when I push it. Touch on mobile devices has the advantage here, although clicking via the mouse or typing a power-user keyboard shortcut is often perfectly acceptable – even if slightly more indirect.

What I don’t want is a button which does nothing, or generates an error message. If that button isn’t intended to be used, then it should be disabled or it shouldn’t be there at all:

[screenshots: the “Graphs” install page in HouseMon]

That’s the “Graphs” install page in HouseMon (the wording could have been better).

When a list or graph of readings is shown on-screen, this needs to auto-update in real time. That’s the whole point of HouseMon – access to specific information (that choice is a preference, and will require interaction) to see what’s going on. Now – or within the last few minutes, in the case of sensors which are periodically sending out their readings. A value without an indication of its “up-to-date-ness” is useless. That could be either implicit by displaying it in some special way if the information is stale (slowly turning red?), or explicit, by displaying a time stamp next to it (a bit ugly and tedious).

Would you want it any other way when doing online banking? Would you accept seeing your account balance without a guarantee that it is recent? Nah, me neither :) – so why do we put up with web pages showing copies of some information, captured at some point in time, getting obsolete the very moment that page is shown?

When “on the web” – as a user – I want to deal with accurate information. When I see something tagged as “updated 2 minutes ago”, then I want to see that “2” change into a “3” within the next minute. See GitHub for an example of this. It works, it makes sense, and it makes the whole technology get out of the way: the web should be an intermediary between some information and me. All the stuff in between is secondary.

Information needs to update – we need to stop copying it into a web page and sending that copy off to the browser. All the lipstick in the world won’t help if what we see is a copy of what we’re looking for. As developers, we need to stop making web sites fancy – we should make them live first and then make them gorgeous. That’s what RIAs and SPAs are about.

One more thing – not only does information need to flow, these flows should be visualised:

[screenshot: a visualisation of data flows]

Such a user interface can be used during development to manage data flows and to design software which ties each result shown on-screen back to its origin. In tomorrow’s apps, every change should propagate – all the way to the pixels shown on-screen by the browser.

Now let’s go back to static web servers to make it happen. Stay tuned…

Decoding bit fields – part 2

In Software on Sep 6, 2013 at 00:01

To follow up on yesterday’s post, here is some code to extract a bit field. First, a refresher:

[figure: bytes5 – four packed bytes, with the field to extract marked in blue]

The field marked in blue is the value we’d like to extract. One way to do this is to “manually” pick each of the pieces and assemble them into the desired form:

[figure: bytes4 – assembling the field from pieces of each byte]

Here is a little CoffeeScript test version:

buf = new Buffer [23, 125, 22, 2]

for i in [0..3]
  console.log "buf[#{i}] = #{buf[i]} = #{buf[i].toString(2)} b"

x = buf.readUInt32LE 0
console.log "read as 32-bit LE = #{x.toString 2} b"

b1 = buf[1] >> 7
b2 = buf[2]
b3 = buf[3] & 0b111
w = b1 + (b2 << 1) + (b3 << 9)
console.log "extracted from buf: #{w} = #{w.toString 2} b"

v = (x >> 15) & 0b111111111111
console.log "extracted from int: #{v} = #{v.toString 2} b"

Or, if you prefer JavaScript, the same thing:

var buf = new Buffer([23, 125, 22, 2]);

for (var i = 0; i <= 3; ++i) {
  console.log("buf["+i+"] = "+buf[i]+" = "+buf[i].toString(2)+" b");
}

var x = buf.readUInt32LE(0);
console.log("read as 32-bit LE = "+x.toString(2)+" b");

var b1 = buf[1] >> 7;
var b2 = buf[2];
var b3 = buf[3] & 0x7;
var w = b1 + (b2 << 1) + (b3 << 9);
console.log("extracted from buf: "+w+" = "+w.toString(2)+" b");

var v = (x >> 15) & 0xfff;
console.log("extracted from int: "+v+" = "+v.toString(2)+" b");

The output is:

    buf[0] = 23 = 10111 b
    buf[1] = 125 = 1111101 b
    buf[2] = 22 = 10110 b
    buf[3] = 2 = 10 b
    read as 32-bit LE = 10000101100111110100010111 b
    extracted from buf: 1068 = 10000101100 b
    extracted from int: 1068 = 10000101100 b

This illustrates two ways to extract the data: “w” was extracted as described above, but “v” used a trick: first use built-in logic to extract an integer field which is too big (but with all the bits in the right order), then use bit shifting and masking to pull out the required field. The latter is much simpler, more concise, and usually also a little faster.

Here’s an example in Python, illustrating that second approach via “unpack”:

import struct

buf = struct.pack('BBBB', 23, 125, 22, 2)
(x,) = struct.unpack('<l', buf)

v = (x >> 15) & 0xFFF
print(v)

And lastly, the same in Lua, another popular language:

require 'struct'
require 'bit'

buf = struct.pack('BBBB', 23, 125, 22, 2)
x = struct.unpack('<I4', buf)

v = bit.band(bit.rshift(x, 15), 0xFFF)
print(v)

So there you go, lots of ways to extract useful data from the RF12demo sketch output!

Decoding bit fields

In Software on Sep 5, 2013 at 00:01

Prompted by a recent discussion on the forum, and because it’s a recurring theme, I thought I’d go a bit (heh…) into how bit fields work.

One common use-case in Jee-land is with packets coming in from a roomNode sketch, which are reported by the RF12demo sketch on USB serial as something like:

OK 3 23 125 22 2

The roomNode sketch tightly packs its results into a packet using this C structure:

struct {
    byte light;     // light sensor: 0..255
    byte moved :1;  // motion detector: 0..1
    byte humi  :7;  // humidity: 0..100
    int temp   :10; // temperature: -500..+500 (tenths)
    byte lobat :1;  // supply voltage dropped under 3.1V: 0..1
} payload;

So how do you get the values back out, given just those 4 bytes (plus a header byte)?

To illustrate, I’ll use a slightly more complicated example. But first, some diagrams:

bytes1

bytes2

That’s 4 consecutive bytes in memory, and two ways a 32-bit long int can be mapped to it:

  • little-endian (LE) means that the first byte (at #00) points to the little end of the int
  • big-endian (BE) means that the first byte (at #00) points to the big end of the int

Obvious, right? On x86 machines, and also the ATmega, everything is stored in LE format, so that’s what I’ll focus on from now on (PIC and ARM are also LE, but MIPS is BE).
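
One way to see this for yourself – a little test of my own, not part of the original post – is to overlay a 4-byte array and a 32-bit int in a union:

#include <stdint.h>
#include <stdio.h>

int main (void) {
    union { uint32_t value; uint8_t bytes[4]; } test;
    test.value = 0x04030201;
    // on a little-endian machine, this prints: 1 2 3 4
    printf("%d %d %d %d\n",
           test.bytes[0], test.bytes[1], test.bytes[2], test.bytes[3]);
    return 0;
}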

Let’s use this example, which is a little more interesting (and harder) to figure out:

struct {
    unsigned long before :15;
    unsigned long value :12;  // the value we're interested in
    unsigned long after :3;
} payload;

That’s 30 bits of payload. On an ATmega the compiler will use the layout shown on the left:

bytes3

But here’s where things start getting hairy. If you had used the definition below instead, then you’d have gotten a different packed structure, as shown above on the right:

struct {
    unsigned int before :15;
    unsigned int value :12;  // the value we're interested in
    unsigned int after :3;
} payload;

Subtle difference, eh? The reason for this is that the declared size (long vs int) determines the unit of memory in which the compiler tries to squeeze the bit field. Since an int is 16 bits, and 15 bits were already assigned to the “before” field, the next “value” field won’t fit, so the compiler restarts on a fresh byte boundary! IOW, be sure to understand what the compiler does before trying to decode stuff.

One way to figure out what’s going on is to set the values to some distinctive bit patterns and see what comes out as bytes:

payload.before = 0b1;  // bit 0 set
payload.value = 0b11;  // bits 0 and 1 set
payload.after = 0b111; // bits 0, 1, and 2 set
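
To see the actual packing, a few extra lines (my addition, assuming the struct is called payload as above) will dump the raw bytes:

const byte* p = (const byte*) &payload;
for (byte i = 0; i < sizeof payload; ++i)
    Serial.println(p[i], BIN);  // one bit pattern per byte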

Tomorrow, I’ll show a couple of ways to extract this bit field.

Back pressure

In Software on Sep 4, 2013 at 00:01

No, not the medical term, and not this carambolage either (though it’s related):

52-car-pile-up

What I’m talking about is the software kind: back pressure is what you need in some cases to avoid running into resource limits. There are many scenarios where this can be an issue:

  • In the case of highway pile-ups, the solution is to add road signalling, so drivers can be warned to slow down ahead of time – before that decision is taken from them…
  • Sending out more data than a receiver can handle – this is why traditional serial links had “handshake” mechanisms, either software (XON/XOFF) or hardware (CTS/RTS).
  • On the I2C bus, hardware-based clock stretching is often supported as a mechanism for a slave to slow down the master, i.e. an explicit form of back pressure.
  • Sending out more packets than (some node in) a network can handle – which is why backpressure routing was invented.
  • Writing out more data to the disk than the driver/controller can handle – in this case, the OS kernel will step in and suspend your process until things calm down again.
  • Bringing a web server to its knees when it gets more requests than it can handle – crummy sites often suffer from this, unless the front end is clever enough to reject requests at some point, instead of queueing up ever more work than it can possibly deal with.
  • Filling up memory in some dynamic programming languages, where the garbage collector can’t keep up and fails to release unused memory fast enough (assuming there is any memory to release, that is).

That last one is the one that bit me recently, as I was trying to reprocess my 5 years of data from JeeMon and HouseMon, to feed it into the new LevelDB storage system. The problem arises because so much in Node.js is asynchronous, i.e. you can send off a value to another part of the app, and the call will return immediately. In a heavy loop, it’s easy to send off so much data that the callee never gets a chance to process it all.

I knew that this sort of processing would be hard in HouseMon, even for a modern laptop with oodles of CPU power and gigabytes of RAM. And even though it should all run on a Raspberry Pi eventually, I didn’t mind if reprocessing one year of log files would take, say, an entire day. The idea being that you only need to do this once, and perhaps repeat it when there is a major change in the main code.

But things went much worse than expected: after force-feeding about 4 months of logs (a few hundred thousand converted data readings), the Node.js process RAM consumption was about 1.5 GB, and Node.js was frantically running its garbage collector to try and deal with the situation. At that point, all processing stopped with a single CPU thread stuck at 100%, and things locked up so hard that Node.js didn’t even respond to a CTRL-C interrupt.

Now, 1.5 GB is a known limit in the V8 engine used in Node.js, and to be honest it really is more than enough for the purposes and contexts for which I’m using it in HouseMon. The solution is not more memory – the problem is that it keeps filling up. I haven’t solved this problem yet, but it’s clear that some sort of back pressure mechanism is needed here – well… either that, or there’s some nasty memory leak in my code (not unlikely, actually).

Note that there are elegant solutions to this problem. One of them is to stop having a producer push data and calls down a processing pipeline, and switch to a design where the consumer pulls data when it is ready for it. This was in fact one of the recent big changes in Node.js 0.10, with its streams2 redesign.

Even on an embedded system, the lack of back pressure may cause trouble in software. This is why there is an rf12_canSend() call in the RF12 driver: because of it, you cannot ever feed the driver more packets than the (relatively slow) wireless RFM12B module can handle.
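
In sketch code, that constraint shows up as the familiar wait loop (quoted from memory, so treat the details as approximate – payload stands for whatever data the sketch wants to send):

while (!rf12_canSend())
    rf12_recvDone();  // keep the RF12 driver's state machine going
rf12_sendStart(0, &payload, sizeof payload);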

Soooo… in theory, back pressure is always needed when you have some constraint further down the processing pipeline. In practice, this issue can be ignored most of the time due to the slack present in most systems: if we send out at most a few messages per minute, as is common with home monitoring and automation, then it is extremely unlikely that any part of the system will ever get into any sort of overload. Here, back pressure can be ignored.

The Knapsack problem

In Musings on Jun 20, 2013 at 00:01

This is related to the power consumption puzzle I ran into the other day.

The Knapsack problem is a mathematical problem, which is perhaps most easily described using this image from Wikipedia:

500px-Knapsack.svg

Given a known sum, but without information about which items were used to reach that sum (weight in this example), can we deduce what went in, or at least reason about it and perhaps exclude certain items?

This is a one-dimensional (constraint) knapsack problem, i.e. we’re only dealing with one dimension (weight). Now you might be wondering what this has to do with anything on this weblog… did I lose my marbles?

The relevance here is that reasoning about this is very much like reasoning about which power consumers are involved when you measure the total house consumption: which devices and appliances are involved at any point in time, given a power consumption graph such as this one?

Screen Shot 2013-06-17 at 09.35.49

By now, I know for example that the blip around 7:00 is the kitchen fridge:

Screen Shot 2013-06-17 at 09.35.49

A sharp start pulse as the compressor motor powers up, followed by a relatively constant period of about 80 W power consumption. Knowing this means I could subtract the pattern from the total, leaving me with a cleaned up graph which is hopefully easier to reason about w.r.t. other power consumers. Such as that ≈ 100 W power segment from 6:35 to 7:45. A lamp? Unlikely, since it’s already light outside this time of year.

Figuring out which power consumers are active would be very useful. It would let us put a price on each appliance, and it would let us track potential degradation over time, or things like a fridge or freezer needing to be cleaned due to accumulated ice, or how its average power consumption relates to room temperature, for example. It would also let me figure out which lamps should be replaced – not all lamps are energy efficient around here, but I don’t want to discard lamps which are only used a few minutes a day anyway.

Obviously, putting an energy meter next to each appliance would solve it all. But that’s not very economical and also a bit wasteful (those meters are going to draw some current too, you know). Besides, not all consumers are easily isolated from the rest of the house.

My thinking was that perhaps we could use other sensors as circumstantial evidence. If a lamp goes on, the light level must go up as well, right? And if a fridge or washing machine turns on, it’ll start humming and generating heat in the back.

The other helpful bit of information is that some devices have clear usage patterns. I can recognise our dishwasher from its very distinctive double power usage pulse. And the kitchen boiler from its known 3-minute 2000 W power drain. Over time, one could accumulate the variance and shape of several of these bigger consumers. Even normal lamps have a plain rectangular shape with a fairly precise power consumption pattern.

Maybe prolonged data collection plus a few well thought-out sensors around the house can give just enough information to help solve the knapsack problem?

Fast I/O on a Raspberry Pi

In Hardware, Software, Linux on Jun 15, 2013 at 00:01

To follow up on yesterday’s post about WiringPi and how it was used to display an image on a colour LCD screen attached to the RPi via its GPIO pins.

A quick calculation indicated that the WiringPi library is able to toggle the GPIO pins millions of times per second. Now this might not seem so special if you are used to an ATmega328 on a JeeNode or other Arduino-like board, but actually it is.

Keep in mind that the RPi code is running under Linux, a multi-tasking 32-bit operating system which not only does all sorts of things at (nearly) the same time, but which also has very solid memory and application protection mechanisms in place:

  • each application runs in its own address space
  • the time each application runs is controlled by Linux, not the app itself
  • applications can crash, but they cannot cause the entire system to crash
  • dozens, hundreds, even thousands of applications can run next to each other
  • when memory is low, Linux can swap some parts to disk (not recommended with SD’s)

So while the LCD code works well, and much faster than one might expect for all the GPIO toggling it’s doing, this isn’t such a trivial outcome. The simplest way to deal with the RPi’s GPIO pins, is to manipulate the files and directories under the /sys/class/gpio/ “device tree”. This is incredibly useful because you can even manipulate it via plain shell script, using nothing but the “echo”, “cat”, and “ls” commands. Part of the convenience is the fact that these manipulations can take place entirely in ASCII, e.g. writing the string “1” or “0” to set/reset GPIO pins.
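
The same file-based access also works from C, of course. Here's a minimal sketch of my own, assuming GPIO 17 has already been exported and configured as an output:

#include <fcntl.h>
#include <unistd.h>

int main (void) {
    // every write of "1" or "0" goes through the kernel's virtual file system
    int fd = open("/sys/class/gpio/gpio17/value", O_WRONLY);
    if (fd < 0)
        return 1;
    for (int i = 0; i < 1000; ++i) {
        write(fd, "1", 1);
        write(fd, "0", 1);
    }
    close(fd);
    return 0;
}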

But the convenience of the /sys/class/gpio/ virtual file system access comes at a price: it’s not very fast. There is too much involved to deal with individual GPIO pins as files!

WiringPi uses a different approach: “memory mapped” I/O – the chip’s registers are mapped straight into the process address space (via mmap on /dev/mem).

Is it fast? You bet. I timed the processing time of this snippet of C code:

int i;
for (i = 0; i < 100000000; ++i) {  // 100 million iterations...
  digitalWrite(1, 1);              // ...each with one full pulse:
  digitalWrite(1, 0);              // pin high, then low again
}
return 0;

Here’s the result:

    real    0m19.592s
    user    0m19.490s
    sys     0m0.030s

That’s over 5 million pulses (two toggles) per second.

This magic is possible because the I/O address space (which is normally completely inaccessible to user programs) has been mapped into a section of the user program address space. This means that there is no operating system call overhead involved in toggling I/O bits: once the kernel has set up the mapping, each load and store goes straight to the underlying I/O registers – no system call, no copying.

How minimal? Well, it works out to just under 100 ns per call to digitalWrite() (19.6 s for 200 million calls), so even when running at the RPi’s maximum 1 GHz clock rate, that’s fewer than 100 machine cycles per call.

Note how the RPi can almost toggle an I/O pin as fast as an ATmega running at 16 MHz. But the devil is in the details: “can” is the keyword here. Being a multi-tasking operating system, there is no guarantee whatsoever that the GPIO pin will always toggle at this rate. You may see occasional hiccups on the order of milliseconds, in fact.

Is the RPi fast? Definitely! Is it guaranteed to be fast at all times? Nope!

Accurate timing with jittery packets

In Hardware on Jun 11, 2013 at 00:01

A few days back I mentioned the SYNC message mechanism in CANopen, and how it can have about 150 µs jitter when sent out on a CAN bus running at 1 Mbit/s.

Here’s the basic idea of sending out SYNC packets at a pre-defined fixed rate:

JC's Grid, page 78

Once you have such a mechanism, you can do things like send out commands which all devices need to perform on the next sync, for example. Nice for controlling the X, Y, and Z axes of a CNC machine with the accuracy needed to make them all move exactly in sync.

The SYNC message has a very high priority on the CAN bus, but even so, it can be delayed if it wants to send while some other transmission is taking place. So even though it will probably be the first one to go out, it still has to wait for the bus to become available:

JC's Grid, page 78 copy

As a result, that SYNC pulse we wanted to arrive at time T ended up coming in at T + X. With X being under 200 µs when the bus is running at 1 Mbit/s – but still.

Can we do better?

Yes, and no. We can’t avoid the jitter in that SYNC packet. But while the master sending out that SYNC packet cannot time it more precisely, it does know when the packet actually went out. So it does in fact know what T + X ended up being for that transmission.

And here’s the clever bit: the master can now send out a second (slightly lower priority) packet with a time stamp in it, i.e. containing that “T + X” value, down to the microsecond in fact. The arrival time of that time stamp packet is not so important. What matters is that each slave tracks exactly when the sync came in, and then uses the time stamp to adjust for any difference or drift that may be taking place.
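
In C-style pseudo code, the slave side of this trick might look as follows – just a sketch of the idea, with localMicros() and adjustLocalClock() as made-up helpers:

#include <stdint.h>

extern uint32_t localMicros (void);             // made-up local µs clock
extern void adjustLocalClock (int32_t offset);  // made-up drift correction

static uint32_t syncArrival;  // local time at which the last SYNC came in

void onSyncPacket (void) {
    syncArrival = localMicros();  // record arrival as precisely as possible
}

void onTimeStampPacket (uint32_t masterTime) {
    // masterTime is the master's measured "T + X" for that same SYNC
    adjustLocalClock((int32_t) (masterTime - syncArrival));
}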

This won’t help make the SYNC pulses come in more accurately, but it will let each slave know exactly what the time of that pulse was, and hence how accurately their own clocks are running. With this mechanism, all clocks can safely be assumed to run within 1..2 µs of each other, and therefore all future commands can now be tied to specific times which are known to be matched up between all slaves with two orders of magnitude better accuracy.

Ingenious, that CAN bus, eh?

Blinking in real-time

In AVR, Software on May 24, 2013 at 00:01

As promised yesterday, here’s an example sketch which uses the ChibiOS RTOS to create a separate task for keeping an LED blinking at 2 Hz, no matter what else the code is doing:

#include <ChibiOS_AVR.h>

static WORKING_AREA(waThread1, 50);

void Thread1 () {
  const uint8_t LED_PIN = 9;
  pinMode(LED_PIN, OUTPUT);
  
  while (1) {
    digitalWrite(LED_PIN, LOW);
    chThdSleepMilliseconds(100);
    digitalWrite(LED_PIN, HIGH);
    chThdSleepMilliseconds(400);
  }
}

void setup () {
  chBegin(mainThread);
}

void mainThread () {
  chThdCreateStatic(waThread1, sizeof (waThread1),
                    NORMALPRIO + 2, (tfunc_t) Thread1, 0);
  while (true)
    loop();
}

void loop () {
  delay(1000);
}

There are several things to note about this approach:

  • there’s now a “Thread1” task, which does all the LED blinking, even the LED pin setup
  • each task needs a working area for its stack, this will consume a bit of memory
  • calls to delay() are forbidden inside threads, they need to play nice and go to sleep
  • only a few changes are needed, compared to the original setup() and loop() code
  • chBegin() is what starts the RTOS going, and mainThread() takes over control
  • to keep things similar to what Arduino does, I decided to call loop() when idling

Note that inside loop() there is a call to delay(), but that’s ok: at some point, the RTOS runs out of other things to do, so we might as well make the main thread similar to what the Arduino does. There is also an idle task – it runs (but does nothing) whenever no other tasks are asking for processor time.

Note that despite the delay() call, the LED still blinks at the proper rate. You’re looking at a real multitasking “kernel” running inside the ATmega328 here, and it’s preemptive, which simply means that the RTOS can (and will) decide to break off any current activity if there is something more important that needs to be done first. This includes suddenly disrupting that delay() call, and letting Thread1 run to keep the LED blinking.

In case you’re wondering: this compiles to 3,120 bytes of code – ChibiOS is really tiny.

Stay tuned for details on how to get this working in your projects… it’s very easy!

It’s time for real-time

In Software on May 23, 2013 at 00:01

For some time, I’ve been doodling around with various open-source Real-Time Operating System (RTOS) options. There are quite a few out there to get lost in…

But first, what is an RTOS, and why would you want one?

The RTOS is code which can manage multiple tasks in a computer. You can see what it does by considering what sort of code you’d write if you wanted to periodically read out some sensors, not necessarily all at the same time or equally often. Then, perhaps you want to respond to external events such as a button press or a PIR sensor firing. And let’s also try to report all this on the serial port, and throw in a command-line configuration interface on that same serial port…

Oh, and in between, let’s go into a low-power mode to save energy.

Such code can be written without an RTOS – in fact, that’s what I did with a (simpler) example for the roomNode sketch. But it gets tricky, and everything can become a huge tangle of variables, loops, and conditions… before you know it, you end up with spaghetti!

In short, the problem is blocking code – when you write something like this, for example:


const byte LED = 9;  // any free output pin will do

void setup () {}

void loop () {
  digitalWrite(LED, HIGH);
  delay(100);
  digitalWrite(LED, LOW);
  delay(400);
}

The delay() calls will put the processor into a busy loop for as long as needed to make the requested number of milliseconds pass. And while this is the case, nothing else can be done by the processor, other than handling hardware interrupts (such as timer ticks).

What if you wanted to respond to button presses? Or make a second LED blink at a different rate at the same time? Or respond to commands on the serial port?

This is why I added a MilliTimer class to JeeLib early on. Let’s rewrite the code:


MilliTimer ledTimer;
bool ledOn;

void setup () {
  ledTimer.set(1); // start the timer
}

void loop () {
  if (ledTimer.poll()) {
    if (ledOn) {
      digitalWrite(LED, LOW);
      ledTimer.set(400);
    } else {
      digitalWrite(LED, HIGH);
      ledTimer.set(100);
    }
    ledOn = ! ledOn;
  }
  // anything ...
}

It’s a bit more code, but the point is that this implementation is no longer blocking: instead of stopping on a delay() call, we now track the progress of time through the MilliTimer, we keep track of the LED state, and we adjust the time to wait for the next change.

As a result, the comment line at the end gets “executed” all the time, and this is where we can now perform other tasks while the LED is blinking in the background, so to speak.

You can get a lot done this way, but things do tend to become more complicated. The simple flow of each separate activity starts to become a mix of convoluted flows.

With an RTOS, you can create several tasks which appear to all run in parallel. You don’t call delay() – instead, you tell the RTOS to suspend your task for a certain amount of time (or until a certain event happens, which is the real magic sauce of RTOSes, actually).

So in pseudo code, we can now write our app as:

  TASK 1:
    turn LED on
    wait 100 ms
    turn LED off
    wait 400 ms
    repeat

  MAIN:
    start task 1
    do other stuff (including starting more tasks)

All the logic related to making the LED blink has been moved “out of the way”.

Tomorrow I’ll expand on this, using an RTOS which works fine in the Arduino IDE.

Sharing Node ID’s

In Software on Mar 30, 2013 at 00:01

The RF12 driver has a 5-bit slot in its header byte to identify either the sender or the destination of a packet. That translates to 32 distinct values, of which node ID “0” and “31” are special (for OOK and catch-all use, respectively).

So within a netgroup (of which there can be over 250), you get at most 30 different nodes to use. Multiple netgroups are available to extend this, of course, but then you need to set up one central node “listener” for each netgroup, as you can’t listen to multiple netgroups with a single radio.

I’m using JeeNodes all over the place here at JeeLabs, and several of them have been in use for several years now, running on their own node ID. In fact, many have never been reflashed / updated since the day they were put into operation. So node ID’s are starting to become a bit scarce here…

For testing purposes, I use a different netgroup, which nicely keeps the packets out of the central HouseMon setup used for, eh… production. Very convenient.

But with the new JeeNode Micro, I actually would like to get lots more nodes going permanently. Especially if the coin cell version can be made to run for a long time, then it would be great to use them as door sensors, lots more room nodes, and also stuff outside the house, such as the mailbox, humidity sensors, and who knows what else…

Now the nice thing about send-only nodes such as door sensors, is that it is actually possible to re-use the same node ID. All we need is a “secondary ID” inside the packet.

I’ve started doing this with the radioBlip2 nodes I’m using to run my battery lifetime tests:

Screen Shot 2013-03-26 at 10.11.14

(BATT-0 is this unit, BATT-1 is a JNµ w/ coin cell, BATT-2 is a JNµ w/ booster on 1x AA)

This is very easy to do: add the extra unique secondary ID in the payload and have the decoder pick up that byte as a way to name the node as “BATT-<N>”. In this case, it’s even compatible with the original radioBlip node which does not have the ID + battery levels.

Here is the payload structure from radioBlip2, with secondary ID and two battery levels:

struct {
  long ping;      // 32-bit counter
  byte id :7;     // identity, should be different for each node
  byte boost :1;  // whether compiled for boost chip or not
  byte vcc1;      // VCC before transmit, 1.0V = 0 .. 6.0V = 250
  byte vcc2;      // battery voltage (BOOST=1), or VCC after transmit (BOOST=0)
} payload;

And here’s the relevant code snippet from the radioBlip.coffee decoder in HouseMon:

  decode: (raw, cb) ->
    count = raw.readUInt32LE(1)
    result =
      tag: 'BATT-0'
      ping: count
      age: count / (86400 / 64) | 0
    if raw.length >= 8
      result.tag = "BATT-#{raw[5]&0x7F}"
      result.vpre = 50 + raw[6]
      if raw[5] & 0x80
        # if high bit of id is set, this is a boost node reporting its battery
        # this is ratiometric (proportional) w.r.t. the "vpre" just measured
        result.vbatt = result.vpre * raw[7] / 255 | 0
      else
        # in the non-boost case, the second value is vcc after last transmit
        # this is always set, except in the first transmission after power-up
        result.vpost = 50 + raw[7] if raw[7]
    cb result

In case you’re wondering: the battery voltage is encoded as 20 mV steps above 1.0V, allowing a single byte to represent values from 1.0 to 6.1V (drat, it’ll wrap when the boost version drops under 1.0V, oh well…).

Note that this trick works fine for send-only nodes, but it would need some additional logic in nodes which expect an ACK, since the ACK will be addressed to all nodes with the same ID. In many cases, that won’t matter, since presumably all the other nodes will be sound asleep, but for precise single-node ACKs, the reply will also need to include the extra secondary ID, and all the nodes will then have to check and compare it.

So now it’s possible to re-use a node ID for an entire “family” of nodes:

  • over 250 different netgroups are available for use
  • each netgroup can have a maximum of 30 different node ID’s
  • each node ID can be shared with over 250 send-only nodes

That’s over 1.8 million nodes supported by the RF12 driver. More than enough, I’d say :)

JeeNode Micro breakout

In AVR, Hardware on Mar 24, 2013 at 00:01

While messing around with a bunch of JeeNode Micro’s here, I decided to create a little convenience breakout board for it:

DSC_4399

Sooo many ways to mess up – it definitely helps to avoid silly mistakes! This particular board doesn’t do that much, but it’s still a nice convenience when you’re drowning in cables and trying to debug some of the nastier bits, such as the Flash Board ISP programmer:

DSC_4406

The above setup was the result of what turned out to be a wild goose chase – stay tuned…

Processing P1 data

In Software on Jan 3, 2013 at 00:01

Last post in a series of three (previous posts here and here).

The decoder for this data, in CoffeeScript, is as follows:

Screen Shot 2012-12-31 at 15.28.22

Note that the API of these decoders is still changing. They are now completely independent little snippets of code which do only one thing – no assumptions on where the data comes from, or what is to be done with the decoded results. Each decoder takes the data, creates an object with decoded fields, and finishes by calling the supplied “cb” callback function.

Here is some sample output, including a bit of debugging:

Screen Shot 2012-12-31 at 13.43.10

As you can see, this example packet used 19 bytes to encode 10 values plus a format code.

Explanation of the values shown:

  • usew is 0: no power is being drawn from the grid
  • genw is 6: power feed into the grid is 10 x 6, i.e. ≈ 60W
  • p1 + p3 is current total consumption: 8 + 191, i.e. 199W
  • p2 is current solar power output: 258W

With a rate of about 50 kbit/s using the standard RF12 driver settings – i.e. 20 µs per bit – and with some 10 bytes of packet header + footer overhead, this translates to (19 + 10) bytes × 8 bits × 20 µs = 4,640 µs of “on-air” time once every 10 seconds, i.e. still under 0.05 % bandwidth utilisation. Fine.

This is weblog post #1200 – onwards! :)

Encoding P1 data

In Software on Jan 2, 2013 at 00:01

After yesterday’s hardware hookup, the next step is to set up the proper software for this.

There are a number of design and implementation decisions involved:

  • how to format the data as a wireless packet
  • how to generate and send out that packet on a JeeNode
  • how to pick up and decode the data in Node.js

Let’s start with the packet: there will be one every 10 seconds, and it’s best to keep the packets as small as possible. I do not want to send out differences, like I did with the otRelay setup, since there’s really not that much to send. But I also don’t want to put too many decisions into the JeeNode sketch, in case things change at some point in the future.

The packet format I came up with is one which I’d like to re-use for some of the future nodes here at JeeLabs, since it’s a fairly convenient and general-purpose format:

       (format) (longvalue-1) (longvalue-2) ...

Yes, that’s right: longs!

The reason for this is that the electricity meter counters are in Watt-hour, and will immediately exceed what can be stored as 16-bit ints. And I really do want to send out the full values, also for gas consumption, which is in 1000th of a m3, i.e. in liters.

But for these P1 data packets that would be a format code + 8 longs, i.e. at least 33 bytes of data, which seems a bit wasteful. Especially since not all values need longs.

The solution is to encode each value as a variable-length integer, using only as many bytes as necessary to represent each value. The way this is done is to store 7 bits of the value in each byte, reserving the top-most bit for a flag which is only set on the last byte.

With this encoding, 0 is sent as 0x80, 127 is sent as 0xFF, whereas 128 is sent as two bytes 0x01 + 0x80, 129 is sent as 0x01 + 0x81, 1024 as 0x08 + 0x80, etc.

Ints up to 7 bits take 1 byte, up to 14 take 2, … up to 28 take 4, and 32 will take 5 bytes.

It’s relatively simple to implement this on an ATmega, using a bit of recursion:

Screen Shot 2012-12-31 at 13.47.01
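
Since that’s only an image, here is roughly what such a recursive encoder could look like – my reconstruction of the idea, not the literal code, with payload[] and fill assumed to be the sketch’s packet buffer and fill count:

static byte payload[66];  // outgoing packet buffer (assumed)
static byte fill;         // bytes used so far (assumed)

// append v as a variable-length integer: 7 bits per byte, high-order
// bits first, with the top bit set only on the last byte
static void addValue (unsigned long v, bool last =true) {
    if (v >= 128)
        addValue(v >> 7, false);
    payload[fill++] = (v & 0x7F) | (last ? 0x80 : 0);
}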

The full code of this p1scanner.ino sketch is now in JeeLib on GitHub.

Tomorrow, I’ll finish off with the code used on the receiving end and some results.

Extracting data from P1 packets

In Software on Dec 1, 2012 at 00:01

Ok, now that I have serial data from the P1 port with electricity and gas consumption readings, I would like to do something with it – like sending it out over wireless. The plan is to extend the homePower code in the node which is already collecting pulse data. But let’s not move too fast here – I don’t want to disrupt a running setup before it’s necessary.

So the first task ahead is to scan / parse those incoming packets shown yesterday.

There are several sketches and examples floating around the web on how to do this, but I thought it might be interesting to add a “minimalistic sauce” to the mix. The point is that an ATmega (let alone an ATtiny) is very ill-suited to string parsing, due to its severely limited memory. These packets consist of several hundreds of bytes of text, and if you want to do anything else alongside this parsing, then it’s frighteningly easy to run out of RAM.

So let’s tackle this from a somewhat different angle: what is the minimal processing we could apply to the incoming characters to extract the interesting values from them? Do we really have to collect each line and then apply string processing to it, followed by some text-to-number conversion?

This is the sketch I came up with (“Look, ma! No string processing!”):

Screen Shot 2012 11 29 at 20 44 16

This is a complete sketch, with yesterday’s test data built right into it. You’re looking at a scanner implemented as a hand-made Finite State Machine. The first quirk is that the “state” is spread out over three global variables. The second twist is that the above logic ignores everything it doesn’t care about.

Here’s what comes out, see if you can unravel the logic (see yesterday’s post for the data):

Screen Shot 2012 11 29 at 20 44 49

Yep – that’s just about what I need. This scanner requires no intermediate buffer (just 7 bytes of variable storage) and also very little code. The numeric type codes correspond to different parameters, each with a certain numeric value (I don’t care at this point what they mean). Some values have 8 digits precision, so I’m using a 32-bit int for conversion.
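
Since the sketch is only shown as an image, here’s a simplified stand-in of my own to illustrate the no-strings idea – this is not the actual code, just the same principle in miniature, with report() as a made-up callback:

static unsigned int type;    // numeric code built from the digits before '('
static unsigned long value;  // some values have 8 digits, so use a 32-bit int
static bool inValue;         // true while scanning between '(' and ')'

extern void report (unsigned int type, unsigned long value);  // made up

// feed each incoming character to this function - nothing gets buffered
void scan (char c) {
    if (c == '(') {
        inValue = true;
        value = 0;
    } else if (c == ')') {
        report(type, value);  // one complete (type, value) result
        inValue = false;
        type = 0;
    } else if ('0' <= c && c <= '9') {
        if (inValue)
            value = 10 * value + (c - '0');
        else
            type = 10 * type + (c - '0');
    }
    // everything else (letters, '.', '-', ':', CR/LF) is simply ignored
}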

Such a scanner will easily fit, even in an ATtiny. The moral of this story is: when processing data – even textual data – you don’t always have to think in terms of strings and parsing. Although regular expressions are probably the easiest way to parse such data, most 8-bit microcontrollers simply don’t have the memory for such “elaborate” tools. So there’s room to get a bit more creative. It’s always worth asking: can it be done simpler?

PS. I had a lot of fun coming up with this approach. Minimalism is an addictive game.

Push-scanning with coroutines

In Software on Oct 18, 2012 at 00:01

The “benstream.lua” decoder I mentioned yesterday is based on coroutines. Since probably not everyone is familiar with them (they do not exist in languages such as C and C++), it seems worthwhile to go over this topic briefly.

Let’s start with this completely self-contained example, written in Lua:

Screen Shot 2012 10 15 at 01 23 15

Here’s the output generated from the test code at the end:

  (string) abcde
  (string) 123
  (number) 12345
  (boolean) false
  (number) 987
  (boolean) false
  (number) 654
  (table) 
  (number) 321
  (table) 
  (number) 99999999
  (boolean) true
  (string) one
  (number) 11
  (string) two
  (number) 22
  (table) 
  (string) bye!

There is some weird stuff in here, notably how the list start, dict start, and list/dict end are returned as “false”, “true”, and the empty table ({} in Lua), respectively. The only reason for this is that these three values are distinguishable from all the other possible return values, i.e. strings and numbers.

But the main point of this demo is to show what coroutines are and what they do. If you’re not familiar with them, they do take getting used to… but as you’ll see, it’s no big deal.

First, think about the task: we’re feeding characters into a function, and expect it to track where it is and return complete “tokens” once enough data has been fed through it. If it’s not ready, the “cwrap” function will return nil.

The trouble is that such code cannot be in control: it can’t “pull” more characters from the input when it needs them. Instead, it has to wait until it gets called again, somehow figure out where it was the last time, and then deal with that next character. For an input sequence such as “5:abcde”, we need to track being in the leading count, then skip the colon, then track the actual data bytes as they come in. In the C code I added to the EmBencode library recently, I had to implement a Finite State Machine to keep track of things. It’s not that hard, but it feels backwards. It’s like constantly being called away during a meeting, and having to pick up the discussion again each time you come back :)

Now look at the above code. It performs the same task as the C code, but it’s written as if it was in control! – i.e. we’re calling nextChar() to get the next character, looping and waiting as needed, and yet the coroutine is not actually pulling any data. As you can see, the test code at the end is feeding characters one by one – the normal way to “push” data, not “pull” it from the parsing loop. How is this possible?

The magic happens in nextChar(), which calls coroutine.yield() with an argument.

Yield does two things:

  • it saves the complete state of execution (i.e. call stack)
  • it causes the coroutine to return the argument given to it

It’s like popping the call stack, with one essential difference: we’re not throwing the stack contents away, we’re just moving it aside.

Calling a coroutine does almost exactly the opposite:

  • it restores the saved state of execution
  • it causes coroutine.yield() to return with the argument of the call

These are not full-blown continuations, as in Lisp, but sort of halfway there. There is an asymmetry in that there is a point in the code which creates and starts a coroutine, but from then on, these almost act like they are calling each other!

And that’s exactly what turns a “pull” scanner into a “push” scanner: the code thinks it is calling nextChar() to immediately obtain the next input character, but the system sneakily puts the code to sleep, and resumes the original caller. When the original caller is ready to push another character, the system again sneakily changes the perspective, resuming the scanner with the new data, as if it had never stopped.

This is in fact a pretty efficient way to perform multiple tasks. It’s not exactly multi-tasking, because the caller and the coroutine need to alternate calls to each other in lock-step for all this trickery to work, but the effect is almost the same.

The only confusing bit perhaps, is that argument to nextChar() – what does it do?

Well, this is the way to communicate results back to the caller. Every time the scanner calls nextChar(), it supplies it with the last token it found (or nil). The conversation goes like this: “I’m done, I’ve got this result for you, now please give me the next character”. If you think of “coroutine.yield” as sort of a synonym for “return”, then it probably looks a lot more familiar already – the difference being that we’re not returning from the caller, but from the entire coroutine context.

The beauty of coroutines is that you can hide the magic so that most of the code never knows it is running as part of a coroutine. It just does its thing, and “asks” for more by calling a function such as nextChar(), expecting to get a result as soon as that returns.

Which it does, but only after having been put under narcosis for a while. Coroutines are a very neat trick, which can simplify the code in languages such as Lua, Python, and Tcl.

Time-controlled transmissions

In Hardware, Software on Jun 27, 2011 at 00:01

Receiving data on remote battery-powered JeeNodes is a bit of a dilemma: you can’t just leave the receiver on, because it’ll quickly drain the battery. Compare this to sending, where nodes can easily run for months on end.

The difference is that with a remote node initiating all transmissions, it really only has to enable the RFM12B wireless module very briefly. With a 12..23 mA current drain, brevity matters!

So how do you get data from a central node, to remote and power-starved nodes?

One way is to poll: let the remote node ask for data, and return that data in an ACK packet as soon as asked. This will indeed work, and is probably the easiest way to implement that return data path towards remote nodes. One drawback is that if all nodes start polling a lot, the band may become overloaded and there will be more collisions.

Another approach is to agree on when to communicate. The receiver again “polls” the airwaves, but now it tracks real time and knows when transmissions intended for it might occur. This is more complex, because it requires the transmitter(s) and receiver(s) to be in sync, and to stay in sync over time.

Note that both approaches imply a difficult trade-off between power consumption and responsiveness. Maximum responsiveness requires leaving the receiver on at all times – which just isn’t an option. But suppose we were able to stay in sync within 1 ms on both sides. The receiver would then only have to start 1 ms early, and wait up to 2 ms for a packet to come in. If it does this once a second, then it would still be on just 0.2% of the time, i.e. a 500-fold power saving.

Let’s try this out. Here’s the timedRecv.pde sketch (now in the RF12 library):

Screen Shot 2011 06 24 at 20.54.37

It listens for incoming packets, then goes into low-power mode for 2 seconds, then it starts listening again. The essential trick is to report two values as ACK to the sender: the time the receiver started listening (relative to that receiver’s notion of time), and the number of milliseconds it had to wait for the packet to arrive.

There’s no actual data processing – I’m just interested in the syncing bit here.

The sender side is in the timedSend.pde sketch:

Screen Shot 2011 06 24 at 20.58.17

This one tries to send a new packet each time the receiver is listening. If done right, the receiver will wake up at just the right time, and then go to sleep again. The ACK we get back in the sender contains valuable information, because it lets us see how accurate our timing was.

Here’s what I get when sending a new packet exactly 2,096 milliseconds after an ACK comes in:

Screen Shot 2011 06 24 at 20.39.33

Not bad, one ack for each packet sent out, and the receiver only has to wait about 6 milliseconds with its wireless receiver powered up. I’ve let it run for 15 minutes, and it didn’t miss a beat.

For some reason, send times need to be ≈ 2.1 s instead of the expected 2.0 s.

Now let’s try with 2,095 milliseconds:

Screen Shot 2011 06 24 at 20.37.17

Something strange is happening: there’s consistently 1 missed packet for each 5 successful ones!

My hunch is that there’s an interaction with the watchdog timer on the receiver end, which is used to power down for 2000 milliseconds. I suspect that when you ask it to run for 16 ms (the minimum), it won’t actually synchronise its timer, but will fit the request into what is essentially a free-running counter.

There may also be some unforeseen interaction due to the ACKs which get sent back, i.e. there’s a complete round-trip involved in the above mechanism.

Hmm… this will need further analysis.

I’m using a standard JeeNode on the receiving end, i.e. running at 16 MHz with a ceramic resonator (specs say it’s 0.5% accurate). On the sender side, where timing is much more important, I’m using a JeeLink which conveniently has an accurate 16 MHz crystal (specs say it’s 10 ppm, i.e. 0.001% accurate).

But still – even this simple example illustrates how a remote can receive data while keeping its wireless module off more than 99% of the time.

Avoiding memory use

In Hardware on May 25, 2011 at 00:01

On an ATmega328, memory is a scarce resource, as I’ve mentioned recently. Flash memory is usually OK, I’ve yet to run into the 30..32 Kb limit on code. But the crunch comes in all other types of memory – especially RAM.

This becomes apparent with the Graphics Board, which needs a 1 Kb buffer for its display, and with the EtherCard, which can often not even fit a full 1.5 Kb Ethernet packet in RAM.

The Graphics Board limitation is not too painful, because there’s the “JeePU“, which can off-load the graphics display to another JeeNode via a wireless connection. Something similar could probably be done based on I2C or a serial interface.

But the EtherCard limitation is awkward, because this essentially prevents us from building more meaningful web interfaces, and richer web server functionality, for example.

The irony is that there’s plenty of unused RAM memory, just around the corner in this case: the ENC28J60 Ethernet controller chip has 8 Kb RAM, of which some 3.5 Kb could be used for further Ethernet packet buffers… if only the code were written differently!

In fact, we have to wonder why we need any RAM at all, given that the controller has so much of it.

The problem with the EtherCard library, is that it copies an entire received frame to RAM before use, and that it has to build up an entire frame in RAM to send it out.

I’d like to improve on that, but the question is how.

A first improvement is already in the EtherCard library: using strings in flash memory. There’s also basic string expansion, which you can see in action in this code, taken literally from the etherNode.pde example sketch:

Screen shot 2011 05 24 at 22 57 26

The $D’s get expanded to integer values, supplied as additional arguments to buf.emit_p(). This simplifies generating web pages with values (and strings) inserted, but it doesn’t address the issue that the entire web page is still being constructed in a RAM buffer.

Wouldn’t it be nice if we could do better than that, especially since the packet needs to end up inside the ENC28J60 controller anyway?

Two possibilities: 1) generate directly to the ENC28J60’s RAM, or 2) generate a description of the result, so that the real output data can be produced when needed.

For now, this is all just a mental exercise. It looks like option #1 could be implemented fairly easily. The benefit would be that only the MAC + IP header needs to stay in RAM, and that the payload would go directly into the controller chip. A huge RAM saving!

But that’s only a partial solution. The problem is that it assumes an entire TCP/IP response would fit in RAM. For simple cases, that would indeed be enough – but what if we want to send out a multi-packet file from some other storage, such as an external Memory Plug or an SD card?

The current EtherCard library definitely isn’t up to this task, but I’d like to think that one day it will be.

So option #2 might be a better way: instead of preparing a buffer with the data that needs to be sent, we prepare a set of instructions which describe what is needed to generate the buffer. This way, we could generate a huge “buffer” – larger than available RAM – and then produce individual packets as needed, i.e. as the TCP/IP session advances, packet by packet.

This way, we could have some large “file” in a Memory Plug, and its contents would be copied to the ENC28J60’s RAM on-demand, instead of all in advance.

It seems like a lot of contortions to get something more powerful going for the EtherCard, but this approach is in fact a very common one, called Zero Copy. With “big” computers, the copying is done with DMA and special-purpose hardware, but the principle is the same: don’t copy bytes around more than strictly needed. In our case, that means copying bytes from EEPROM to ENC28J60, without requiring large intermediate buffers.

Hm, maybe it’s time to create a “ZeroCopy” library, sort of a “software based generic DMA” between various types of memory… maybe even to/from external I/O devices such as a serial port or I2C device?

It could perhaps be modeled a bit like Tcl’s “channels” and “fcopy” command. Not sure yet… we’ll see.

RF bootstrap design

In Software on May 24, 2011 at 00:01

After some discussion on the forum, I’d like to present a draft design for an over-the-air bootstrap mechanism, IOW: being able to upload a sketch to a remote JeeNode over wireless.

Warning: there is no release date. It’ll be announced when I get it working (unless someone else gets there first). This is just to get some thoughts down, and have a first mental design to think about and shoot at.

The basic idea is that each remote node contacts a boot server after power up, or when requested to do so by the currently running sketch.

Each node has a built-in unique 2-byte remote ID, and is configured to contact a specific boot server (i.e. RF12 band, group, and node ID).

STEP 1

First we must find out what sketch should be running on this node. This is done by sending out a wireless packet to the boot server and waiting for a reply packet:

  • remote -> server: initial request w/ my remote ID
  • server -> remote: reply with 12 bytes of data

These 12 bytes are encrypted using a pre-shared secret key (PSK), which is unique for each node and known only to that node and the boot server. No one but the boot server can send a valid reply, and no one but the remote node can decode that reply properly.

The reply contains 6 values:

  1. remote ID
  2. sketch ID
  3. sketch length in bytes
  4. sketch checksum
  5. extra sketch check
  6. checksum over the above values 1..5
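
To make this concrete, those 12 bytes could be laid out as six 16-bit words – one possible layout, sketched here for illustration only:

struct BootReply {
    word remoteID;  // 1. remote ID
    word sketchID;  // 2. sketch ID
    word length;    // 3. sketch length in bytes
    word checksum;  // 4. sketch checksum
    word extra;     // 5. extra sketch check
    word check;     // 6. checksum over values 1..5
};  // 12 bytes total, encrypted as a whole with the node's PSK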

After decoding this info, the remote knows:

  • that the reply is valid and came from a trusted boot server
  • what sketch should be present in flash memory
  • how to verify that the stored sketch is complete and correct
  • how to verify the next upload, if we decide to start one

The remote has a sketch ID, length and checksum stored in EEPROM. If they match with the reply and the sketch in memory has the correct checksum, then we move forward to step 3.

If no reply comes in within a reasonable amount of time, we also jump to step 3.

STEP 2

Now we need to update the sketch in flash memory. We know the sketch ID to get, we know how to contact the boot server, and we know how to verify the sketch once it has been completely transferred to us.

So this is where most of the work happens: send out a request for some bytes, and wait for a reply containing those bytes – then rinse and repeat for all bytes:

  • remote -> server: request data for block X, sketch Y
  • server -> remote: reply with a check value (X ^ Y) and 64 bytes of data

The remote node gets data 64 bytes at a time, and burns them to flash memory. The process repeats until all data has been transferred. Timeouts and bad packets lead to repeated requests.

The last reply contains 0..63 bytes of data, indicating that it is the final packet. The remote node saves this to flash memory, and goes to step 3.

STEP 3

Now we have the proper sketch, unless something went wrong earlier.

The final step is to verify that the sketch in flash memory is correct, by calculating its checksum and comparing it with the value in EEPROM.

If the checksum is bad, we set a watchdog timer to reset us in a few seconds, and … power down. All our efforts were in vain, so we will retry later.

Else we have the proper sketch and it’s available in flash memory, so we leave bootstrap mode and launch it.

That’s all!

ROBUSTNESS

This scheme requires a working boot server. If none is found or in range, then the bootstrap will not find out about a new sketch to load, and will either launch the current sketch (if valid), or hit a reset and try booting again a few seconds later.

Not only do we need a working boot server, that server must also have an entry for our remote ID (and our PSK) to be able to generate a properly encrypted reply. The remote ID of a node can be recovered if lost, by resetting the node and listening for the first request it sends out.

If the sketch hangs, then the node will hang. But even then a hard reset or power cycle of the node will again start the boot sequence, and allows us to get a better sketch loaded into the node. The only drawback is that it needs a hard reset, which can’t be triggered remotely (unless the crashing sketch happens to trigger the reset, through the watchdog or otherwise).

Errors during reception lead to a failed checksum at the end, which then leads to a reset and a new boot loading attempt. There is no resume mechanism, so such a case does mean we have to fetch all the data blocks again.

SECURITY

This is the hard part. Nodes which end up running some arbitrary sketch have the potential to cause a lot of damage if they also control real devices (lights are fairly harmless, but thermostats and door locks aren’t!).

The first line of defense comes from the fact that it is the remote node which decides when to fetch an update. You can’t simply send packets and make remote nodes reflash themselves if they don’t want to.

You could interrupt AC mains and force a reset in mains-powered nodes, but I’m not going to address that. Nor am I going to address the case of physically grabbing hold of a node or the boot server and messing with it.

The entire protection is based on that initial reply packet, which tells each remote node what sketch it should be running. Only a boot server which knows the remote node’s PSK is able to send out a reply which the remote node will accept.

It seems to me that the actual sketch data need not be protected, since these packets are only sent out in response to requests from a remote node (which asks for a specific sketch ID). Bad packets of any kind will cause the final checksums to fail, and prevent such a sketch from ever being started.

As for packets flying around in a fully operational home network: that level of security is a completely separate issue. Sketches can implement whatever encryption they like, to secure day-to-day operation. In fact, the RF12 library includes an encryption mechanism based on XTEA for just that purpose – see this weblog post.

But for a bootstrap mechanism, which has to fit in 4 Kb including the entire RF12 wireless packet driver, we don’t have that luxury. Which is why I hope that the above will be enough to make it practical – and safe!

Saving RAM space

In AVR, Software on May 23, 2011 at 00:01

Yesterday’s post was about finding out how much free memory there is in an ATmega running your sketch.

The most common out-of-memory case is free RAM, which is where all the interesting stuff happens – not surprising, if you interpret “interesting” as “changing”, which by necessity has to happen mostly in RAM.

Let’s go into some ways to reduce RAM usage. As mentioned yesterday, C strings are often a major cause of RAM bloat. Here’s part of a simple sketch to report which one of five buttons has been pressed:

Screen shot 2011 05 22 at 21 34 00

Let’s assume that checkButton() returns a value from 1 to 5 when a button press has been detected, and 0 otherwise. The problem? We’ve just used about 150 bytes of RAM…

Given how simple and regular this example is, here’s an easy way to improve on it:

Screen shot 2011 05 22 at 21 34 45

Total RAM usage will drop to just over 50 bytes.

Here’s another way to do it:

Screen shot 2011 05 22 at 21 42 43

This one uses 42 bytes for the data, and 26 bytes for the remaining two strings, i.e. total 68 bytes. I’ve included this example because it illustrates a more data-driven approach, but it leads to some waste because the colors array requires a fixed amount of 6×7 character space.

Here’s a variation of that, which is more idiomatic in C:

Screen shot 2011 05 22 at 21 46 14

It differs in a subtle but important detail: the array is now an array of pointers to string constants.

Estimating RAM use is slightly more involved: 1+4+6+5+7+7 bytes for the strings (including the zero byte at the end of each one) = 30 bytes, PLUS 12 bytes for the pointer array (6 pointers, each 2 bytes). That’s still 42 bytes, so no gain compared to the previous fixed-size array.

Using standard C/C++, that’s about all you can do. And it still wastes some 40..70 bytes of RAM. This may not sound like much, but keep in mind that the same will happen everywhere you use a string in your code. C strings are painfully awkward for tiny embedded MPU’s such as the ATmega and ATtiny series.

Fortunately, there is one more trick at our disposal, which removes the need for RAM altogether …

The trick is to place these strings in flash memory, alongside the code, and extract the characters of the string whenever we need them. It’s a great trick, but it will affect our sketch everywhere, unfortunately.

First of all, we need to include this line at the top of our sketch:

    #include <avr/pgmspace.h>

This header file gives access to a number of preprocessor macros and functions, needed to define strings in the proper way, and to read the character data from flash memory at run time.

The reason for this added complexity, is that flash memory isn’t simply an “address” you can read out. The AVR family uses two separate address spaces for code and data. This is called a Harvard architecture. As far as pointers go in C, there is no access to data in flash memory. Well – there is, because function pointers in C automatically refer to code in flash memory, but there is no way to mix these: data pointers cannot access flash, and function pointers cannot refer to RAM.

Back to the task at hand. We need a small utility function which can print a string located in flash ROM memory:

    void showString (PGM_P s) {
        char c;
        while ((c = pgm_read_byte(s++)) != 0)
            Serial.print(c);
    }

Note that the argument is not a const char*, but a PGM_P (defined in the pgmspace.h include file).

Now let’s redo the code with this ROM-based approach:

Screen shot 2011 05 22 at 22 16 10

The result? No RAM is used up by any of these strings, yippie!

The price to pay is a slightly larger compiled sketch, and more importantly: we have to use that “PSTR(…)” notation with each of the strings to make it all work.
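
For example, reporting a button press might then be written like this (hypothetical strings, using the showString() defined above):

    void reportButton (byte i) {
        showString(PSTR("Button "));  // this string lives in flash, not RAM
        Serial.print(i);
        showString(PSTR(" was pressed\n"));
    }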

This technique is not invasive, i.e. you don’t have to choose between RAM-based and ROM-based C strings for the entire sketch. It’s probably easier to only do this in those parts of your sketch which use lots of strings.

ATmega memory use

In AVR, Software on May 22, 2011 at 00:01

Sometimes, it’s useful to find out how much memory a sketch uses.

Sometimes, it’s essential to do so, i.e. when you’re reaching the limit. Because strange and totally unpredictable things happen once you run out of memory.

Running out of flash memory for the code is easy to avoid, as the Arduino IDE will tell you exactly how much is being used after each compile / upload.

Running out of EEPROM memory is harder, but usually not an issue, since very few sketches use substantial amounts of EEPROM, if any.

Running out of RAM space is the nasty one. Because it can happen at any time, not necessarily at startup, and not even predictably because interrupt routines can trigger the problem.

There are three areas in RAM:

  • static data, i.e. global variables and arrays … and strings!
  • the “heap”, which gets used if you call malloc() and free()
  • the “stack”, which is what gets consumed as one function calls another

The heap grows up, and is used in a fairly unpredictable manner. If you release areas, then they will lead to unused gaps in the heap, which get re-used by new calls to malloc() if the requested block fits in those gaps.

At any point in time, there is a highest point in RAM occupied by the heap. This value can be found in a system variable called __brkval.

The stack is located at the end of RAM, and expands and contracts down towards the heap area. Stack space gets allocated and released as needed by functions calling other functions. That’s where local variables get stored.

The trick is to keep RAM usage low, because it’s a scarce resource: an ATmega has a mere 2048 bytes of RAM.

Here’s a small utility function which determines how much RAM is currently unused:

    int freeRam () {
        extern int __heap_start, *__brkval;
        int v;
        // the gap between the top of the heap and the current stack position
        return (int) &v - (__brkval == 0 ? (int) &__heap_start : (int) __brkval);
    }

And here’s a sketch using that code:

    void setup () {
        Serial.begin(57600);
        Serial.println("\n[memCheck]");
        Serial.println(freeRam());
    }

    void loop () {}

The result will be:

    [memCheck]
    1846

Ok, so we have about 1.8 Kb free in a tiny sketch which does almost nothing. More precisely: a basic sketch uses 2048 – 1846 = 202 bytes for internal bookkeeping and stuff (128 bytes of which are needed for the hardware serial input buffer, BTW).

When that value drops to 0, your sketch will crash. This might show as an endless loop, strange calculations or output, or constant restarts. Don’t expect a nice error message!

Let’s make a tiny change:

    Serial.println("\n[memCheck2]");

Output:

    [memCheck2]
    1844

Huh?

The first part of the explanation is that all C strings are also stored in RAM! This explains why adding a single character to the string reduced available memory.

The second part of the explanation is that flash memory is organized in words. So where you’d expect a single-byte effect, the difference sometimes gets rounded up to two bytes.

And the third part of the explanation is that all C strings also get stored in flash memory. The reason is that RAM contents are undefined on power-up, so one of the tasks performed by the C runtime startup code is to copy all the strings from flash to RAM.

Moral of the story: be very careful when doing things with strings on an ATmega, because you may find that you quickly run out of memory space. Adding lots of verbose debugging print statements might cause more problems than you think!

Update – Here’s a great AVR RAM memory overview, from the avr-libc website.

C++ overloading with enums

In Software on May 20, 2011 at 00:01

A brief excursion into the land of advanced C++ tricks…

Create an enumTrick sketch with the following contents:

  #include "header.h"

  static int fun (One x) {
      return 10 + x;
  }

  static int fun (Two x) {
      return 20 + x;
  }

  void setup () {
      Serial.begin(57600);
      Serial.println("\n[enumTrick]");
      Serial.println(fun(A));
      Serial.println(fun(B));
      Serial.println(fun(C));
      Serial.println(fun(D));
      Serial.println(fun(E));
      Serial.println(fun(F));
  }

  void loop () {}

Now create a “header.h” file with the following contents:

  typedef enum { A, B, C } One;
  typedef enum { D, E, F } Two;

The stupid Arduino IDE pre-processing logic makes it necessary to place these definitions in a separate header file, unfortunately.

You can see that there are two definitions of fun (heh).

This code compiles without errors, because C++ supports “function overloading”, a mechanism to disambiguate function calls through the number and type of the arguments. In this case the args differ in type, being different enumeration constants.

Here’s the output:

  [enumTrick]
  10
  11
  12
  20
  21
  22

Note that if you were to call fun with an integer value, you’d get an error:

  enumTrick.cpp: In function 'void setup()':
  enumTrick:17: error: call of overloaded 'fun(int)' is ambiguous
  enumTrick.cpp:12: note: candidates are: int fun(One) <near match>
  enumTrick.cpp:13: note:                 int fun(Two) <near match>

Ok, so what’s the point?

Well, this provides a way to create a simpler API for drivers. Say we have a device which can turn the lights on and off, and dim the lights to a certain level (a Dimmer Plug, perhaps, or an X10 interface). We could do this:

  typedef enum { ON, OFF } CmdSet1;
  typedef enum { DIM } CmdSet2;

  void DeviceControl (CmdSet1 cmd) { ... }
  void DeviceControl (CmdSet2 cmd, byte level) { ... }

Now the only valid calls are those where ON or OFF is specified and nothing else, or DIM is specified and a single byte value is required. Every other mix generates an error at compile time. And there’s a single “DeviceControl” function you need to deal with and remember.

It’s definitely a contrived example, but I’m currently digging into some stuff which could probably benefit from this. Basically, you get static typing to help resolve more cases at compile time, which I expect will lead to more compact code.

Which – on a low-end 8-bit MPU – can be a big deal!

No time, yet

In Hardware, Software on Feb 5, 2011 at 00:01

Heh, I bet this isn’t about what you thought the title suggested :)

I’ve been spending a couple of hours today, trying to get a DCF77 time code receiver working on the new ookRelay2.pde sketch. There’s a dcf77demo.pde sketch in the Ports library, which actually does all the hard work of decoding that pulse train.

It’s fairly tricky code, and I tend to throw in lots of tricky hacks (silly me, I know):

Screen Shot 2011 02 04 at 23.31.52

To use this, dcf77poll() has to be called often in loop(), like so:

Screen Shot 2011 02 04 at 23.14.35

(lots of #ifdef’s in there now, to make the sketch more configurable)
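In outline, and ignoring all those #ifdef’s, the polling amounts to something like this (the exact return convention of dcf77poll() is my assumption here):

    void loop () {
        // keep feeding the DCF77 decoder, on every pass through the main loop
        if (dcf77poll()) {
            // assumption: true means a complete time frame was decoded,
            // so the new date and time can now be reported
        }
        // ... handle the 433/868 MHz OOK decoding, etc ...
    }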

The weird thing is that this all worked fine in the previous incarnation of the OOK relay.

This timing code is far less critical than the 433/868 MHz OOK decoding, by the way. Pulses come in once a second, and all it needs to do is disambiguate short and long pulses. Easy stuff for an ATmega.

Except today… well, I don’t know why, but I can’t make any sense out of what’s happening. Been debugging with a few prints to the serial port, and with LEDs, but no luck. I’ve switched prototype setups, redone it all from scratch, changed DCF77 receivers… nada. Worst of all, there’s no pattern.

It’s probably the wrong moon phase, or somethin’ – I’ll revisit this some other time (soon).

Poor Man’s Mesh Network

In Software on Jan 16, 2011 at 00:01

The recent packet relay and Extended node ID posts set the stage for a very simple way to create a fairly long-range (and very low-cost) little Wireless Sensor Network around the house.

Say hello to the Poor Man’s Mesh Network :)

Screen Shot 2011 01 14 at 13.54.50

(the main computer doesn’t have to be a PC/Mac, an embedded Linux box would also work fine)

Well, that’s a bit presumptuous. It’s not really “mesh” in the sense that there is no dynamic or adaptive routing or re-configuration involved at all. All the packet routes are static, so failure in one of the relays for example, will make everything “behind” it unreachable.

In the world of wireless, that matters, because there are always unexpected sources of interference, and occasionally they completely wipe out the ability of a network to communicate on a given frequency band. For example, a couple of times a year, the data I’m collecting from my electricity and gas meters here at JeeLabs just stops getting in. Not sure what’s going on, but I’m not touching that setup in any way, and it’s unrelated to power outages. It might not be an RF issue, but who knows.

So what I’m referring to is not a super-duper, full-fledged, kitchen-sink-included type of wireless network. But given the extremely low cost of the nodes, and the fact that the software needs less than 4 kb of code, I think it’s a good example of how far you can get when using simplicity as guiding principle. Even an 8-bit MPU with 8 Kb of flash and 512 bytes of RAM would probably be sufficient for sensor nodes – so with the ATmega328’s used in JeeNodes, we really have oodles of spare capacity to implement pretty nifty applications on top.

Most of the details of how this works have already been presented in previous posts.

The one extra insight is that the packet relay mechanism can be stacked. We can get data across N relays if we’re willing to limit our data packets to 66-N bytes. So by accepting a slight reduction in payload length, we can extend the number of relays, and hence the total maximum range, as much as we like. Wanna get 10 times as far? No problem, just place a bunch of relays in the right spots along the whole path. Note that ACK round-trip delays will increase in such a setup.

The big design trade-off here is that all packet routing is static, i.e. it has to be set up manually. Each sensor node (the stars in that diagram) needs to have a netgroup which matches a relay or central node nearby, and within each netgroup each sensor node has to have a unique ID.

It’s not as bad as it may seem though. First of all, the range of the RFM12B on 433, 868, and 915 MHz is pretty good, because sub-GHz radio waves are far less attenuated by walls and concrete floors than 2.4 GHz signals. This means that in a small home, you wouldn’t even need a relay at all. I get almost full coverage from one centrally-placed node here at JeeLabs, even though the house is full of stone walls and reinforced concrete floors. As I mentioned before, I expect to get to the curb and to the far end of our (small) garden with one or two relay hops.

Second, this star topology is very easy to adjust when you need to extend it or make changes – especially if all packet relays are one “hop” away from the central node, i.e. directly talking to it. You can turn one relay off, make changes to the nodes behind it, and then turn it back on, and the rest of the network will continue to work just fine during this change.

I’ve extended the groupRelay.pde sketch a bit further, to be able to configure all the parameters in it from the serial/USB side. These settings are saved in EEPROM, and will continue to work across power loss. This means that a relay node can now be as simple as this:

Dsc 2412

IOW, a JeeLink, plugged into a tiny cheapo USB power adapter. All you need to do is pre-load the groupRelay sketch on it (once), adjust its settings (as often as you like), and plug it in where you need it. How’s that for a maintenance-free solution? And you can add/drop/alter the netgroup structure of the entire network at any time, as long as you’re willing to re-configure the affected nodes. If some of them turn out to be hard to reach because they are at the limit of the range, just insert an extra relay and tell the central software about the topology change.

It doesn’t have to be a JeeLink of course. A JeeNode, or some home-brew solution would work just as well.

Now that this design has become a reality, I intend to sprinkle a lot more sensors around the house. There have been lots of little projects waiting for this level of connectivity, from some nodes outside near the entrance, to a node to replace one of the first projects I worked on at JeeLabs!

So there you go. Who needs complexity?

Extended node IDs

In Software on Jan 15, 2011 at 00:01

The packet relay implementation shown a few days ago has some properties which somewhat limit its usability.

It all has to do with node IDs. As a relay it all works fine, i.e. packets will get relayed as intended. But the problem is that with a relay, all the packets relayed into another netgroup appear to come from that single relay node.

To go into this, let’s first set up a relay with the following configuration:

  • the main netgroup is #1, the relay listens on netgroup #2
  • the central node in netgroup 1 is node 31
  • the relay node listens on netgroup 2 as node 31
  • the relay sends packets out to netgroup 1 as node 30

Here’s that setup:

Screen Shot 2011 01 14 at 12.05.01

So – in principle – we could have up to 59 sensor nodes for getting actual work done. But there’s a problem: all packets sent by nodes 1..30 in netgroup 2 look like packets coming from node 30 once they reach the central node in netgroup 1!

Note that this is no problem if the relay is only used to extend the range of a single node: all we have to do is give the relay the same node ID in netgroup 1 as the one assigned to that single node in netgroup 2.

But with more nodes behind the relay, we’re losing all their node ID’s. This is not very practical if we need to know exactly where the reported sensor data was generated, for example. We’re hitting the limitation that there are only 31 different ways to identify all incoming packets!

The solution is to insert the node ID of the original node as first byte of the payload data. So a packet coming from say node 7 in netgroup 2, will come in as a packet from node 30 (the relay), with an extra byte inserted in front of the packet, containing the value 7. The groupRelay.pde sketch has been extended to support this “multi-node relaying” capability.
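In code, the essential step is to copy the incoming packet into a buffer with one extra byte in front. Roughly like this, with buf being the relay’s packet buffer (a sketch of the idea, not the actual groupRelay code):

    buf[0] = rf12_hdr & RF12_HDR_MASK;             // original sender's node ID
    memcpy(buf + 1, (void*) rf12_data, rf12_len);  // then the original payload
    rf12_sendStart(0, buf, rf12_len + 1);          // re-send with 1 extra byte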

On the receiving end, the software needs to know that node 30 in netgroup 1 is a relay. It then knows that the first byte is not data but an “extended” node ID, and it can use that to re-identify the packet as coming from a node in netgroup 2.

The extra data byte means that the maximum payload length across a relay is one byte less than what the nodes in netgroup 1 can send to the central node, i.e. 65 bytes instead of the 66 bytes supported by the RF12 driver.

Let’s use some conventions for node ID’s to support this mechanism. Node ID’s will be assigned as follows:

  • node ID’s 1..26 are available for “real” end points, e.g. sensor nodes
  • node ID’s 27..30 are available for relays and other “management” nodes
  • node ID 31 is to be used as standard ID for the central receiver
  • node ID 31 is also used by relays on the listening side
  • lastly, node 31 is normally the node which always listens, and which sends out ACKs

Note that there is nothing in the RF12 driver implementation which enforces such a numbering scheme. It’s just a convention to simplify configuration, and to simplify the software later on.

This scheme has the following implications:

  • the maximum number of end point sensor nodes is 26 x 250 = 6,500 nodes
  • each netgroup can have at most 4 group relays, i.e. the “fanout” is up to 4
  • each relay introduces a “hop” and may reduce the max payload size by one byte

Tomorrow, I’ll describe how to use this in a larger context.

Packet relaying vs. storage

In Software on Jan 13, 2011 at 00:01

In yesterday’s post I introduced a groupRelay.pde sketch, which implements a packet relay.

This can be used to (approximately) double the range between sensor nodes and the central data-collecting node. I’ve got two uses for this myself:

  • To try and get through two layers of reinforced concrete here at JeeLabs, i.e. from the garage to the living room to the office floor where my central data-collecting node is. I can get through one floor just fine (easily, even with a few extra walls), but two is giving me trouble.

  • To have a simple way to work with multiple groups of JeeNodes around here for testing and experimentation, while still allowing me to “merge” one of the test groups with the main, eh, “production” group. This can easily be accomplished by turning a suitably-configured relay on or off.

Note that all traffic takes place in the same 868 MHz frequency band. This isn’t a way to double the amount of bandwidth – all the packets flying around here have to compete for the same RF air space. All it does is separate the available space into distinct logical groups, i.e. net groups, which can be used together.

To summarize from yesterday’s post, this is how the relay code works right now:

Screen Shot 2011 01 12 at 18.07.18

If you think of time as advancing from top to bottom in this diagram, then you can see how the packet comes in, then gets sent out, then the ACK comes in, and finally the ACK gets sent back to the originating node. Let’s call this the Packet pass-through (PPT) approach.

This is very similar to how web requests work across the internet. There is an “end-to-end” communication path, with replies creating one long “round trip”.

But that’s not the only way to do things. The other way is to use a Store-and-forward (SAF) mechanism:

Screen Shot 2011 01 12 at 18.07.30

In this case, the relay accepts the packet, stores it, and immediately sends back an ACK to the originating node. Then it turns around and tries to get the stored packet to its destination.

This is how email works, BTW. The SMTP servers on which email is built can all store emails, and then re-send those emails one step closer to their intended destination.

There are several differences between PPT and SAF:

  • with PPT, it takes longer for the originating node to get back an ACK
  • with SAF, you get an ACK right away, even before the destination has the data
  • with PPT, all failures look the same: no proper ACK is ever received
  • with SAF, you might have gotten an ACK, even though the destination never got the data
  • with PPT, the logic of the code is very simple, and little RAM is needed
  • with SAF, you need to store packets and implement timeouts and re-transmission

But perhaps most importantly for our purposes, PPT allows us to place payload data in the ACK packet, i.e. ACKs can contain replies, whereas with SAF, you can’t put anything in an ACK, because the originating node already got an empty ACK from the relay.

Since SAF is harder to implement, needs more storage, and can’t handle ACK reply data, it’s just an inferior solution compared to PPT, right?

Not so fast. The main benefit of SAF, is that it can deal with nodes which don’t have to be available at the same time. If the relay is always on, then it will always accept requests from originating nodes. But the destination nodes need not be available at that time. In fact, the destination node might use polling, and ask the intermediate relay node whether there is data waiting to be sent out to it. In effect, the SAF relay now becomes sort of a PO box which collects all incoming mail until someone picks it up.

The implications for battery-powered wireless networks are quite important. With an always-on relay node in the middle, all the other nodes can now go to sleep whenever they want, while still allowing any node to get data to any other node. The basic mechanism for this is that the low-power nodes sleep most of the time (yeay, micro power!) and then periodically contact the relay node in one of two ways:

  • sending out a packet they want to get to some other place
  • polling the relay to get data waiting for them back as ACK reply data

The “sleep most of the time” bit is an essential aspect of low-power wireless networks. They can’t afford to keep a node awake and listening for incoming wireless packets all the time. An RFM12B draws about 15 mA while in receive mode (more than an ATmega!), and keeping it on would quickly deplete any battery.

So if we want to create an ultra low-power wireless network, we will need a central relay node which is always on; all the other nodes can then decide for themselves when to send things out and when to ask that central node for data. Which means they could sleep 99.5% of the time and wake up for only a few milliseconds every second, for example. Which is of course great for battery life.

BTW, in case you hadn’t noticed: we’re now entering the world of mesh-networking…

But the drawbacks of SAF remain: more complex logic, and the need to be able to queue up a lot of packets. So we need one node which is always on, and has plenty of memory. Hmmm, ponder, ponder… I remember having seen something suitable.

Of course: the JeeLink! It draws power via USB and has a large DataFlash memory buffer. Whee, nice! :)

Relaying RF12 packets

In Software on Jan 12, 2011 at 00:01

Since the RF12 driver does not implement a full OSI network “stack”, there are no such things as routers and mesh networks in Jee-land. This means you’re basically limited to the range of a single 868/915 MHz point-to-point packet connection.

There are a number of ways to increase the range. One is to use a directional antenna (I’ve never tried this, but it has been mentioned on the discussion forum). Another option is to lower the transmission baud rate and narrow the bandwidth settings inside the radio, so that the power is “beamed” into a narrower frequency range. Both of these are based on RF properties.

A third option if you can’t get from here to there in one go is to take multiple “hops”. That’s what a mesh network tries to do fully automatically, adapting on the fly to varying reception conditions and network topologies.

I’m more interested in much simpler approaches, which can easily be implemented in a little 8-bit ATmega with limited code storage. There’s a lot we can do, even within these pretty hard constraints – and it’s fun to push those little boundaries!

So here’s another option: a dedicated “packet relay” node, which listens to a specific RF12 net group most of the time, and when a packet comes in, it turns around and sends the packet out over another net group. Apart from picking two net groups for this, the mechanism should be fairly transparent.

Here’s what happens when you install such a relay:

Screen Shot 2011 01 11 at 23.44.59

That only covers packets going one way. A refinement is to also deal with ACK packets. In that case, the relay should wait a little while to see whether an ACK comes in, and if it does, send that ACK back out to the original packet source:

Screen Shot 2011 01 11 at 23.45.05

Here’s a groupRelay.pde sketch, which implements this:

Screen Shot 2011 01 11 at 23.58.18
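Stripped to its bare essence, a one-way relay amounts to something like this (not the actual groupRelay code: the ACK logic and configurable settings are omitted, and the group and node numbers are just examples):

    #include <Ports.h>
    #include <RF12.h>

    void setup () {
        rf12_initialize(31, RF12_868MHZ, 2);        // listen on net group 2
    }

    void loop () {
        if (rf12_recvDone() && rf12_crc == 0) {
            byte buf[66], len = rf12_len;
            memcpy(buf, (void*) rf12_data, len);    // save a copy of the data
            rf12_initialize(30, RF12_868MHZ, 1);    // switch to net group 1
            while (!rf12_canSend())
                rf12_recvDone();
            rf12_sendStart(0, buf, len);            // re-broadcast the copy
            rf12_initialize(31, RF12_868MHZ, 2);    // and resume listening
        }
    }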

It’s pretty straightforward. But it’s only made for broadcast scenarios: the originating node must be using a broadcast packet (i.e. sending to the special node “0”). The relay can then simply re-broadcast the information.

In the case of ACKs, the originating node id will be in the header. The relay saves this, sends out a packet with an ACK-type request of its own, waits briefly for such an ACK to come in, and when it does, sends an ACK back to the originating node.

Note how the loss of packets will have the same effect as without a relay. The only difference being that there are 2 hops (or 4 w/ ACK) where packets can be lost or get damaged.

Tomorrow, I’ll explain a little bit what’s going on and what sorts of trade-offs this leads to.

Subversion

In Software on Dec 15, 2010 at 00:01

After yesterday’s post about source code control systems, here are some notes about getting started with it.

I’ll focus on Subversion, because that’s what I’m using.

Update, 2013 – Subversion is no longer in use, everything is now on GitHub.

There are two sides to a version control system: the server, where the source code “repository” is kept, and the client, which is your own workstation / PC. In distributed systems (DVCS), for example Git, there is no such asymmetry – client and server can perform the same tasks. But Subversion uses a centralized model: one repository, any number of clients.

You’ll need at least the client software to be able to use Subversion:

  • On Windows, you can use TortoiseSVN. It integrates deeply with the Windows Explorer, meaning that the current status of each file is always visible. See this page for some screen shots.

  • On Mac OSX it’s built-in (starting with 10.5, Leopard, I think) – although you probably have to install the Xcode that came with your Mac. You can check by typing “svn” in the terminal. If you get “Type ‘svn help’ for usage.”, then it’s ready for business.

  • On Linux, it’s a package called “subversion”. Install via your favorite package manager, depending on which flavor of Linux you’re using (e.g. Ubuntu).

Here’s the big picture:

Screen Shot 2010 12 14 at 15.26.17

Assuming you’re trying this out with existing source code (from JeeLabs, for example), then the server is somewhere on the internet, so you don’t have to deal with its setup. A little terminology and some conventions:

  • the server has all the source code and all the changes in a “repository”
  • you can get a local “working copy” by “checking it out” from the server
  • you can do anything you like on your copy, but…
  • don’t rename files or move anything around, without telling subversion
  • you can send changes back by “checking them in” (a.k.a. “commit”)
  • you can adjust your copy to the latest version by “updating” your copy

So the basic idea is to check out a working copy (once), and then do an update whenever you want to make sure you have the latest and the greatest. This update is totally safe: if you made changes on your working copy, and you do an update, subversion will tell you that there are changes, and occasionally maybe even a “conflict”, i.e. when your changes interfere with changes made by others.

Note that conflicts can occur in two ways: when updating your modified copy, or when committing your changes back to the server. The latter can lead to a conflict when someone made changes which you didn’t pick up yet.

If you don’t intend to make changes, or only minimally, then conflicts are a non-issue. In that case, subversion is simply a convenient (and very quick/efficient) way to keep up to date with all the latest changes.

There is another convenience, in that you can check to see what has changed, before updating. Perhaps you want to first examine the changes – and only the changes – to decide whether you want to adopt that latest version. This can be done with the subversion “status” and “diff” commands, but with the new Redmine setup at JeeLabs, it’s actually much simpler to use the web interface for this. Simply browse the repository from the web, starting here, and review the comments shown under the “Latest revisions” header (last ones are at the top). By zooming in as described yesterday, you can quickly see whether the changes are of interest to you. You could even decide to update only a single source file. Subversion will track all this, even with some parts updated and others not.

Let me give an example how to use subversion for getting the Ports library. I’ll use the command-line interface for this, assuming that it maps relatively simply to the GUI-based TortoiseSVN on Windows.

Step 1 – Initial check-out

You need to be in the proper place to check out. In this case, we want to get a “Ports” folder inside a “libraries” folder, which in turn needs to be placed inside the Arduino IDE’s “sketches” directory. On Mac OSX, this is “~/Documents/Arduino”. So the first thing to do, is make sure that the libraries folder exists:

    % mkdir -p ~/Documents/Arduino/libraries

Then go to that directory:

    % cd ~/Documents/Arduino/libraries

Now, we’ll check out the Ports library from JeeLabs:

    % svn checkout svn://svn.jeelabs.org/jeelabs/trunk/libraries/Ports

A new directory is created with the same name, i.e. “Ports”. It will contain all the source files. So we’re essentially done.

One thing to note, is that subversion needs to track a bit of information (such as where this Ports thing came from and when it was checked out). It does this in sub-folders called “.svn” – the leading dot is a convention to keep this area hidden. So it’s there all right, but you often won’t see it. That’s fine. Never go in there and make any changes, or you’ll completely mess up your working copy, as far as subversion is concerned.

Step 2 – Using the source code

Nothing much to say here. Just use it as if you had created the Ports folder yourself, perhaps by unpacking a ZIP archive. Apart from the “.svn” hidden areas, it’s the same thing. You should check out a copy of the RF12 library as well, since that’s always needed when using the Ports library:

    % cd ~/Documents/Arduino/libraries
    % svn checkout svn://svn.jeelabs.org/jeelabs/trunk/libraries/RF12

You can make changes to the source files if you want to. Subversion will know what you did, when asked, because it keeps an extra copy of the original files in the “.svn” folders, and besides, it could also use the server to check.

The one thing you can’t do with subversion (some other systems are more lenient), is to delete, rename, or move files and folders as you were used to. Don’t do things like these, or use other commands or programs to the same effect:

    % mv foo.c bar.c
    % rm foo.h

Instead, ask subversion to do it for you, and all will be well:

    % svn mv foo.c bar.c
    % svn rm foo.h

Step 3 – Tracking updates

Updating your copy to the latest version in the repository is trivial, and can be done at any time with:

    % svn update

You will get a concise summary of which files were changed. Changes may also include adding new files and removing obsolete ones. After the update, your working copy will be in sync with the repository again.

If you only want to find out what would be changed, without actually updating anything, use the status command:

    % svn status -u

But usually it’s simpler to use the web interface for this. For the Ports library, just go to https://jeelabs.net/projects/cafe/repository/show/Ports.

Step 4 – Submitting changes

If you have write access to the repository, then you can also submit the changes you made in your working copy. All you need is a reason… :)

Seriously, the only thing that needs some thought is what to put in the comment describing this change. Because Subversion insists on some comment. I usually check in after every little change and add a very brief note describing it, such as:

    % svn ci -m "fixed bug X so Y works again"

Note that with all these commands, you don’t have to mention the repository at all. Subversion knows which repository is being referenced by looking at the information in the “.svn” folder in the current directory you’re in.

I won’t go into conflict resolution or any of the numerous more advanced commands in subversion. Check the excellent manual online to find out more.

That’s it!

The only comment I’d like to add to conclude this weblog post, is that the above was written specifically for subversion. Many other systems exists, but once you are used to the terminology and conventions of one, using another version control system is usually no big deal. Lots of people seem to be using Git these days. No doubt due to the free GitHub source repository server, available to anyone who wants to maintain their source code in a public place. It comes with a nice and powerful set of features, see the help section for an overview.

Have fun with version control – and say goodbye to version mixups once and for all!

Source code

In Software on Dec 14, 2010 at 00:01

From the beginning, all Open Source Software at JeeLabs has been maintained in a system called Subversion. This is a version control system which makes it easy to manage changes in (large amounts of) source code over (large periods of) time.

There are many such systems, with names like CVS, Subversion, Bazaar, Git, Mercurial, and more. I used CVS in the past, and switched to Subversion many years ago. It suits me well.

If you think that version control is not much more than “making frequent backups in an organized manner”, then you may be in for a surprise. Please keep reading…

All of these systems deal with the same task: managing edits. And all of them have reached a level of functionality and maturity, that frankly, I couldn’t understand how anyone writing software these days could live without…

That’s quite a statement, so let me explain.

Software development is a process. You write code, you fix bugs, you add new features over time – sometimes over a period of years, in fact. Coding is an error-prone activity, from trivial typo’s, to syntax errors, to feature omissions, to design errors… it’s all part of the process.

Suppose someone finds a bug. Wouldn’t it be great if you could go back and see what changes you made, and in what order? Maybe the bug is a new one? Maybe it got introduced by a recent change? Maybe it’s part of a very specific feature?

Here’s an extract of a special “annotated” view of the RF12.h header file. This overview was generated by Redmine, as it is now used for the JeeLabs Café:

Screen Shot 2010 12 13 at 21.05.29

The color-coded areas are “change sets”. The numbers indicate in which “revision” this particular code was added or last changed. The numbers are clickable, so if I click on “5976” for example, to view the last change related to the rf12_sendStart() code, I get this:

Screen Shot 2010 12 13 at 21.08.23

This shows exactly which files were changed at the same time, and the comment I added when I made that change, i.e. “more rf12_sendStart() low power modes”. The value is not just this little overview, but the fact that all other files and changes have been omitted. All irrelevant info is gone.

Let’s say I want to find out what was changed in the “radioBlip.pde” sketch at that time. I can click on the “(diff)” link, to get this:

Screen Shot 2010 12 13 at 21.10.20

Aha, this is apparently where I added some extra low-power-mode options to the RF12 driver, as I was writing the radioBlip example (red = deleted code, green = added code).

Note how only a small part of the source code is shown. I’m only interested in what changed, so all I get is a “diff” with a few lines of context. I can always go back to my full copy of the source code to see the rest.

This particular change was made 4 months ago. It took me three clicks in the web browser to find out what happened. Not “sort of”, but precisely. I don’t know about you, but to me that is an incredible way to keep track of things. I never have to guess, I can go back in time and zoom in on any specific detail I’m interested in right now.

Let me drive the point home: this isn’t a feature, it’s a necessity. There is no way I could call myself a software developer without tools like these.

All version control systems do this, it doesn’t really matter which one you use. All I can say is: if you haven’t been using any of them, pick one and put your whole (software) life in it. Today. You will have to learn some new conventions and some new habits. You will feel constrained initially. But once that wears off and the new habits kick in, you will wonder how you ever did without.

This doesn’t just apply to “official” software development, btw. Anything that takes more than say a day to work on, will benefit from revision control, IMO. Some people put their entire websites in there. Some people even put their entire home directory in there!

The nice thing about modern revision control systems, is that they are networked. With proper care and authentication, source code can live in a public “repository” accessible to more people. Websites such as http://github.com/ specialize in providing a complete environment for anyone who wishes to manage their source code in public. Completely free. All you need is the proper “client software” for the version control system you’re using, and a server (such as Github) where you get source code and send changes back.

By now, I no longer consider the source code on my hard disk to be crucially important. It’s a copy, “checked out” from my main subversion repository, which lives on an internet-facing server. I can get an up to date copy anywhere, just by checking out a new copy (as long as I remember where the server is, and my personal username/password).

The second killer feature (no less) of a version control system, is that it lets people work together on a single project, no matter where they are. Everyone gets their own locally checked-out copy of the same project, and everyone can make whatever changes they like, and “check-in” the changes whenever they please.

This happens very efficiently, because version control systems are immensely clever about dealing with text changes. They don’t send around complete projects and source files all the time, they quickly send little changes around (called “diffs”). This makes the mechanism 100% scalable. You could be working on a project with millions of lines of source code. Yet when you send in a change, the process is almost as quick as with a tiny project.

When more than one person makes a change to a source file, and sends in that change, three things can happen:

  • there is no one else who changed that file, so the change is accepted as is
  • someone else changed that file, but not in the same place – again, the change is automatically accepted as is, and “merged” in
  • someone else changed one or more of the same lines, in which case the change is flagged as a conflict

The last case is the only one which needs a bit of attention. The changes need to be compared, and someone has to decide what the final version should look like. Usually, the intent is easy to spot, and things can be resolved without any further discussion. In rare cases, the two people involved will have to discuss and resolve the issue between them.

This sounds like a lot of hassle. But it’s in fact one of my favorite features: I can check out someone else’s source code, and use it. But I can also make changes in any way I like, without ever sending them in. IOW, I can tweak any code I have in any way I like. Never will I risk losing those changes, even if I never send them in (often, they are not of general use anyway). And I’ll always be able to go back and see what my own changes were.

So with this guaranteed checking and comparing of source code, it becomes impossible to lose changes. My own, or anyone else’s. The version control system will flag any potential problem before it actually becomes one.

More tomorrow, as I describe how to get version control going on your own machine, and how to hook up to the source code in the JeeLabs repository.

If you can’t wait and already know how to use Subversion, all the JeeLabs code can be found here:

    svn://svn.jeelabs.org/jeelabs/trunk/

If you just want to have a quick look around using the new Redmine web interface: the JeeLabs libraries can be browsed here, while JeeMon has its own repository area, which can be browsed here.

RF12 acknowledgements

In Software on Dec 11, 2010 at 00:01

The RFM12B wireless module is a transceiver, i.e. able to send and receive packets over wireless. This is an important advantage over simple sensor units which just send out what they measure, and things like RF-controlled power switches which only listen to incoming data but are not able to report their current state.

The only thing is… it’s a bit more work.

This is reflected in how the RF12 library works:

  • simple reception is a matter of regularly polling with rf12_recvDone()
  • simple transmission means you also have to call rf12_canSend() and rf12_sendStart()
  • the above are both essentially uni-directional, so packets can get lost

The second mechanism added to RF12 was a set of “easy transmission” functions, i.e. rf12_easyPoll() and rf12_easySend(). These look similar, but they send out data packets asking for an ACK (acknowledge) packet from the receiver to confirm that the packet was correctly received. If nothing comes in, they will re-send the packet (and repeat a few times, if needed). This mechanism greatly improves the chance of a message arriving properly at the destination. Losing an occasional packet is one thing, losing all retries is a lot less likely!

Note that packets can be damaged or get lost at any time. It may well be that the original packet arrived just fine, but the ACK got lost instead. The sender will resend, and then (probably) get the ACK which stops this retry cycle.

So with the easy transmission functions, note that very occasionally a packet might be received twice. If it is crucial to weed these out, you can include a counter in your data packets to help detect and ignore duplicates.

With RF12demo as receiver, ACK handling is automatic. It knows when the originating node wants to get an ACK, and will send it out as soon as possible. This is reported in the output as the text “-> ack”.

The code for this in RF12demo is horrendous:

Screen Shot 2010 12 10 at 18.27.08

This is silly, and overkill for simple cases. So let’s improve on it.

I’ve added two utility definitions to the RF12.h header, which can simplify the above code to:

Screen Shot 2010 12 10 at 19.58.02

That’s better, eh?

The rest is just there to deal with a special configuration setting in RF12demo.

So if all you want is to add logic in your own sketch to send back an empty ACK packet when requested, the above can be simplified even further to:

Screen Shot 2010 12 10 at 19.59.09

For completeness, here’s a complete processing loop for a receiving sketch which supports nodes using the easy transmission mechanism:

Screen Shot 2010 12 10 at 19.59.45
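Reconstructed in outline, such a loop looks something like this (what happens in the processing step is up to each sketch):

    void loop () {
        if (rf12_recvDone() && rf12_crc == 0) {
            // ... process the rf12_len bytes of payload in rf12_data ...
            if (RF12_WANTS_ACK)
                rf12_sendStart(RF12_ACK_REPLY, 0, 0);
        }
    }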

You have to send out the ACK after processing the packet, because the rf12_sendStart() call will re-use the same packet buffer and overwrite the incoming packet.

Also, RF12_WANTS_ACK and RF12_ACK_REPLY are defined as macros which access the global rf12_hdr variable, as set by rf12_recvDone(). IOW, the convenience comes for free, but it does depend on some fixed assumptions. I can’t think of a situation where this would lead to problems, given that RF12-based sketches are probably all structured in the same way, and that globals are part of the RF12 driver.

For another example, see the blink_recv.pde sketch, which has also been simplified with these two macros.

Binary packet decoding – part 2

In AVR, Software on Dec 8, 2010 at 00:01

Yesterday’s post showed how to get a 2-byte integer back out of a packet when reported as separate bytes:
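In short: with a the low byte and b the high byte, the value comes back as:

    int value = a + b * 256;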

Unfortunately, all is not well yet. Without going into details, the above may fail on 32-bit and 64-bit machines when sending a negative value such as -12345. And it’s not so convenient with other types of data. For example, here’s how you would have to reconstruct a 4-byte long containing 123456789, reported as 4 bytes:

Screen Shot 2010 12 07 at 09.56.08
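Written out, that reconstruction goes along these lines, with b0..b3 being the four received bytes, lowest first:

    long value = (long) b0 | ((long) b1 << 8) |
                 ((long) b2 << 16) | ((long) b3 << 24);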

And what about floating point values and C structs? The trouble with these, is that the receiving party doing the conversion needs to know exactly what the internal byte representation of the ATmega is.

Here is an even more complex example, as used in the roomNode.pde sketch:

Screen Shot 2010 12 07 at 08.44.28

This combines different measurement values into a 4-byte C struct using bit fields. Note how the “temp” value crosses two bytes, but only uses specific bits in them.
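For reference, that payload definition is roughly as follows (the exact field names and widths may differ in your copy of the sketch):

    struct {
        byte light;         // light sensor: 0..255
        byte moved  :1;     // motion detector: 0..1
        byte humi   :7;     // humidity: 0..100
        int  temp   :10;    // temperature: -500..+500 (tenths)
        byte lobat  :1;     // supply voltage low indicator
    } payload;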

Fortunately, there is a fairly simple way to deal with all this. The trick is to decode the values back into meaningful values by the receiving ATmega instead of an attached PC. When doing so, we can re-use the same definition of the information. By using the same hardware and the same C/C++ compiler on both sides, i.e. the Arduino IDE, all internal byte representation details can be left to the compiler.

Let’s start with yesterday’s 2-byte example again.

I’m going to rewrite it slightly, as:

Screen Shot 2010 12 07 at 08.57.23
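In outline, the sending side now looks something like this (the struct name is mine):

    struct Payload { int value; };
    Payload payload;

    void sendIt () {
        payload.value = 12345;
        while (!rf12_canSend())
            rf12_recvDone();
        rf12_sendStart(0, &payload, sizeof payload);
    }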

No big deal. This sends out exactly the same packet. But now, we can rewrite the receiving sketch as follows:

Screen Shot 2010 12 07 at 09.00.14
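Again in outline, re-using the exact same struct definition on the receiving side:

    struct Payload { int value; };

    void loop () {
        if (rf12_recvDone() && rf12_crc == 0) {
            const Payload* p = (const Payload*) rf12_data;
            Serial.print("MEAS ");
            Serial.println(p->value);
        }
    }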

The effect will be to send the following line to the serial / USB connection:

    MEAS 12345

The magic incantation is this line:

Screen Shot 2010 12 07 at 09.01.45

It uses a C typecast to force the interpretation of the bytes in the receive buffer into the “Payload” type. Which happens to be the same as the one used by the sending node.

The benefit of doing it this way, is that the same approach can be used to transfer any type of data as a packet. Here is an example of how a Room Node sketch sends out a 4-byte struct with various measurement results:

Screen Shot 2010 12 07 at 09.07.07

And here’s how the receiving node can convert the bytes in the packet back to the proper values:

Screen Shot 2010 12 07 at 09.10.55

The output will look like:

    ROOM 123 1 78 -15 0

Nice and tidy. Exactly the values we were after!

It looks like a lot of work, but it’s all very straightforward to implement. Most importantly, the correspondence between what happens in the sender and the receiver should now be obvious. It would be trivial to include more data. Or to change some field into a long or a float, or to use more or fewer bits for any of the bit fields. Note also that we don’t even need to know how large the packet is that gets sent, nor what all the individual bytes contain. Whatever the sender does to map values into a packet, will be reversed by the receiver.

This works, as long as the two struct definitions match. One way to make sure they match, is to place the payload definition in a separate header file, say “payload.h” and then include that file in both sketches using this line:

Screen Shot 2010 12 07 at 09.16.47
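That line being simply:

    #include "payload.h"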

The price to pay for this flexibility and “representation independence”, is that you have to write your own receiving sketch. The generic RF12demo sketch cannot be used as is, since it does not have knowledge of the packet structures used by the sending nodes.

This can become a problem if different nodes use different packets sizes and structures. One way to simplify this, is to place all nodes using the same packet layout into a single net group, and then have one receiver per net group, each implemented in the way described above. Another option is to have a single receiver which knows about the different types of packets, and which switches into the proper decoding mode depending on who sent the packet.

Enough for now. Hopefully this will help you implement your own custom WSN to match exactly what you need.

Update – Silly mistake: the “rf12_sendData()” call doesn’t exist – it should be “rf12_sendStart()”.

Binary packet decoding

In AVR, Software on Dec 7, 2010 at 00:01

The RF12 library used with the RFM12B wireless radio on JeeNodes is based on the principle of sending individual “packets” of data. I’ve described the reasons for this design choice in a number of posts.

Let me summarize what’s going on with wireless:

  • RFM12B-based nodes can send binary packets of 0..66 bytes
  • these packets can contain any type of data you want
  • a checksum detects transmission errors to let you ignore bad packets
  • dealing with packet loss requires an ACK + re-transmission mechanism

Packets have the nice property that they either arrive intact as a whole or not at all. You won’t get garbled or inter-mixed packets when multiple nodes happen to send at (nearly) the same time. Compare this to some other solutions where all the characters sent end up in one big “soup” if the sending happens (nearly) simultaneously.

But first: what’s a packet?

Well, loosely speaking, you could say that a packet is like one line of text. In fact, that’s exactly what you end up with when using the RF12demo sketch as central receiver: a line of text on the serial/USB connection for each received packet. Packets with valid checksums will be shown as lines starting with “OK”, e.g.:

    OK 3 128 192 1 0
    OK 23 79 103 190 0
    OK 3 129 192 1 0
    OK 2 25 99 200 0
    OK 3 130 192 1 0
    OK 24 2 121 163 0
    OK 5 86 97 201 0
    OK 3 131 192 1 0

Let’s examine how that corresponds with the actual data sent by the node.

  • All the numbers are byte values, shown as numbers in the range 0..255.
  • The first byte is a header byte, which usually includes the node ID of the sender, plus some extra info such as whether the sender expects an ACK back.
  • The remaining data bytes are an exact copy of what was sent.

There appears to be some confusion about how to deal with the binary data in such packets, so let me go into it all in a bit more detail.

Let’s start with a simple example – sending one byte:

Screen Shot 2010 12 06 at 22.32.26
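In essence, the sending code is something like:

    byte value = 123;
    rf12_sendStart(0, &value, 1);    // broadcast a single byte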

I’m leaving out tons of details, such as calling rf12_recvDone() and rf12_canSend() at the appropriate moments. This code is simply broadcasting one value as a packet for anyone who cares to listen (on the same frequency band and net group). Let’s also assume this sender’s node ID is 1.

Here’s how RF12demo reports reception of this packet:

    OK 1 123

Trivial, right? Now let’s extend this a bit:

Screen Shot 2010 12 06 at 22.38.13
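Again in essence:

    int value = 12345;
    rf12_sendStart(0, &value, sizeof value);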

Two things changed:

  • we’re now sending a larger int, i.e. a 2-byte value
  • instead of passing length 2, the compiler calculates it for us with the C “sizeof” keyword

Now, the incoming packet will be reported as:

    OK 1 57 48

No “1”, “2”, “3”, “4”, or “5” in sight! What happened?

Welcome to the world of multi-byte values, as computers deal with them:

  • a C “int” requires 2 bytes to represent
  • bytes can only contain values 0..255
  • 12345 will be “encoded” in two bytes as “12345 divided by 256” and “12345 modulo 256”
  • 12345 / 256 is 48 – this is the “upper” value (the top 8 bits)
  • 12345 % 256 is 57 – this is the “lower” value (the low 8 bits)
  • an ATmega stores values in little-endian format, i.e. the least significant byte comes first
  • hence, as bytes, the int “12345” is represented as first 57 and then 48
  • and sure enough, that’s exactly what we got back from RF12demo

Yeah, ok, but why should we care about such details?

Indeed, on normal PC’s (desktop and mobile) we rarely need to. We just think in terms of our numbering system and let the computer do the conversions to and from text for us. That’s exactly what “Serial.print(12345)” does under the hood, even on an Arduino or a JeeNode. Keep in mind that “12345” is also a specific representation of the abstract quantity it stands for (and “0x3039” would be another one).

So we could have converted the number 12345 to the string “12345”, placed it into a packet as 5 bytes, and then we’d have gotten this message:

    OK 1 49 50 51 52 53

Hm. Still not quite what we were looking for. Because now we’re dealing with ASCII text, which itself is also an encoding!

But we could build a modified version of RF12demo which converts that ASCII-encoded result back to something like this:

    OKSTR 1 12345

There are however a few reasons why this is not necessarily a good idea:

  • sending would take 5 bytes instead of 2
  • string manipulation uses more RAM, which is scarce on an ATmega
  • lots of such little inefficiencies will add up, once more data is involved

There is an enormous gap in performance and availability of resources between a modern CPU (even on the simplest mobile phones) and this 8-bit few-bucks-chip we call an “ATmega”. Mega, hah!

But you probably didn’t really want to hear any of this. You just want your data back, right?

One way to accomplish this, is to keep RF12demo just as it is, and perform the proper transformation on the receiving PC. Given variables “a” = 57, and “b” = 48, you can get the int value back with this calculation:

Screen Shot 2010 12 06 at 23.11.04
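Which, written out in code, is simply:

    int value = a + b * 256;    // 57 + 48 * 256 = 12345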

Sure enough, 57 + 48 * 256 is… 12345 – hurray!

It’s obviously not hard to implement such a transformation in PHP, Python, C#, Delphi, VBasic, Java, Tcl… whatever your language of choice is.

But there’s more to it, alas (hints: negative values, floating point, structs, bitfields).

Stay tuned… more options to deal with these representation details tomorrow!

Update – Silly mistake: the “rf12_sendData()” call doesn’t exist – it should be “rf12_sendStart()”.

Electric circuits

In Hardware on Dec 5, 2010 at 00:01

At the risk of prolonging this “electronics course” and boring everyone after the past two installments about power and switching regulators, here are some thoughts about electronic circuits.

I find the hydraulic analogy very useful, as long as you don’t push it too far. There is a (thick) book called Practical Electronics for Inventors by Paul Scherz (2007) which manages to construct analogies for just about all types of electronic components. It’s a stretch, but it does help understand things.

Here’s an example I found on the web:

Transistor

It’s quite easy to see how a small current can control a larger one. It also falls short, because “current” is really “pressure” in this case. My suggestion would be: do look at those analogies if you want to quickly build a mental model of how a specific component works, but keep the limitations of such analogies in mind.

The neat thing about electricity, is that you can completely understand a circuit by just applying a bit of logic. By which I mean: reason how it probably / sort-of works. And predict some of its behavior. It really pays to invest a bit in understanding electronics circuits.

There is a whole world to discover when you get into this. There are lots of conventions to figure out, but the neat thing is that these conventions are pretty much standardized by now. Electric circuits have been around for a couple of centuries, even if some of the newer components are only decades old.

Here’s part of the Infrared Plug, as a circuit diagram:

Screen Shot 2010 12 04 at 23.47.50

Even without knowing all the conventions, you’re probably able to guess many of the components. Connectors (PORT1, SV1), resistors (R1, R2, R3, R4), capacitors (C1, C2), a transistor (Q1), two LEDs (LED1, LED2), and a box marked “LM555D” which is an integrated circuit for building timers and oscillators.

How it works would be a bit of a long story for this post, but let’s make a guess:

  • C1 is charged up because it’s connected between + (VCC) and - (GND)
  • there are two resistors (R1 and R2) which limit the current flowing into it
  • the charge on C1 is measured through pin 6 of the chip (THR, i.e. threshold)
  • pin 7 (DIS, i.e. discharge) is actually an output, which can be either open or low

What the LM555D does (consider it magic for now), is watch the voltage level on THR. When it is low, DIS is not connected. The capacitor will charge up, until THR reaches a certain voltage.

At that point, the chip (through magic), decides to pull DIS low, i.e. put it at almost 0 volts, i.e. ground level.

While DIS is low, something completely different happens: with pin 6 being at a fairly high voltage, and DIS low, capacitor C1 will start to discharge, with current flowing into R1 in the opposite direction.

At some point, the THR voltage will be very low again, at which point the chip (magically) decides that it’s time to disconnect DIS. Then the cycle repeats: C1 will start to charge up through R1 and R2, etc… ad infinitum.

These 4 components are the heart of what is called an astable multivibrator.
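For a standard 555 astable circuit, the oscillation frequency follows directly from the two timing resistors and the capacitor:

    f ≈ 1.44 / ((Ra + 2 × Rb) × C)

Here, Ra is the resistance from VCC to the DIS pin, Rb the resistance from DIS to the capacitor, and C the timing capacitor itself. Pick suitable values, and out comes 38 KHz.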

We haven’t even looked at the rest of the circuit yet. I’ll leave it at that for now, just mentioning that the transistor is there to drive a strong current through the LEDs in a rapid ON/OFF pattern.

The components are dimensioned in such a way that this circuit oscillates at… 38 KHz. Which “happens” to be what most remotes send out as well.

There is a lot more to say about electric circuits, of course. I just wanted to point out that there is a huge amount of information you can glean from such “schematics” of various circuit diagrams.

Since everything made at Jee Labs is open source, you can dive into all of this. Look for the PDF’s linked on the various hardware pages if you’re interested.

It’s fun. It’s enormously informative. And unlike my use of the term “magic” there is really nothing special going on at all. It’s only magic until you dive in. If you have questions about how JeeNode pins are hooked up, for example: check out the manual (PDF). It’s all in there!

Update – for entertaining introductions of the main electrical components, there are a couple of short videos by Collin Cunningham on the Gizmodo website.

Ethernet web client

In Hardware, Software on Nov 29, 2010 at 00:01

The Ether Card and the associated EtherCard library offer an easy way to connect a JeeNode to a LAN and internet. I consider this option useful enough to even want it within reach on a breadboard, so I designed the Bridge Board to support it as well. Here’s the setup, based on a JeeNode Experimenter’s Pack:

Dsc 2365

This is all it takes for an Ethernet-connected setup, and all JeeNode ports are still freely available for project use.

However, the EtherCard library only includes some examples for the web server approach, such as the EtherNode. What about using this setup to send requests out, and POSTing results to a remote web server?

Well, encouraged by a few great DNS-related patches to the code by Gérard Chevalier, who was also kind enough to show me how he does DNS lookups and web requests, I’ve come up with two new example sketches:

  • The getStaticIP.pde sketch does just a HTTP get to a fixed IP address.
  • The getViaDNS.pde sketch looks up my server’s IP address by name, via a DNS lookup on startup.

Here is the code of getViaDNS:

Screen Shot 2010 11 27 at 15.38.35

Note that to make any kind of outgoing request outside your LAN, the IP address of the local LAN gateway has to be specified at setup time.

The dependency on the Ports library is due to the use of the MilliTimer class in this code. And because of that, the RF12 library also needs to be included (it won’t increase the code size). Welcome to dependency hell…

Here is some sample output:

Screen Shot 2010 11 27 at 15.29.28

Even though this example is relatively short, I still find it quite awkward. The EtherCard library code really needs a major overhaul – far too much of the underlying mess is still being exposed for no good reason, IMO.

Furthermore, a massive code cleanup will be needed to overcome the current limitations of the EtherCard library: the need for a large RAM buffer, and the limitation that TCP requests and replies may only use a single ethernet frame (approx 1500 bytes).

But getting it working first was more important, and with these new examples you can now see how to implement outgoing TCP/IP requests (not just web/http, btw). Pachube, anyone?

The obligatory clock…

In Software on Nov 22, 2010 at 00:01

Take a Graphics Board, a JeeNode (or JeeNode USB, or JeeSMD), some code, and you get this:

Dsc 2303

I guess it’s sort of the equivalent of “Hello World” for graphical displays by now.

This is my code for it:

Screen Shot 2010 11 21 at 17.38.44

It even includes a blinking colon! :)

There’s a separate header file with the ten bitmaps used for each digit.

Here is a copy of the modified ST7565 code I use.

This clock runs on the ATmega’s 16 MHz clock, which is only accurate to about 0.5% with a ceramic resonator. More importantly, this clock has no notion of real time – it just starts counting when turned on. The value it starts from uses a funky trick available in RTClib: it’s set to the compile time of the sketch … this is actually not a bad choice during development. But for real use, you’ll need to add an RTC Plug or some other means of obtaining the time (such as a DCF77 receiver, if you’re in Europe).
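
That compile-time trick is worth spelling out. A minimal version, using RTClib’s software-only RTC_Millis class (display code omitted):

    #include <Wire.h>
    #include <RTClib.h>

    RTC_Millis rtc;     // software clock, driven by the millis() timer

    void setup () {
        // the funky trick: start the clock at this sketch's compile time
        rtc.begin(DateTime(__DATE__, __TIME__));
    }

    void loop () {
        DateTime now = rtc.now();
        // draw now.hour() and now.minute() on the display here
    }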

Anyway. At least now you can see that it really is a graphical display.

Bleep!

In AVR, Hardware, Software on Nov 21, 2010 at 00:01

Puzzle for you – what is this thing?

Dsc 2301

Maybe the demo sketch helps?

Screen Shot 2010 11 18 at 12.25.23

Answer: it’s an 8-bit digital-to-analog converter, using an R-2R ladder network (with 10 kΩ & 20 kΩ resistors).

The above toneGen.pde sketch generates a sine wave of approx 1 KHz:

Screen Shot 2010 11 18 at 12.25.17

There are lots of ways to improve on this and turn it into a general-purpose function generator, for example.

The interesting part is that all four ports are used for the required 8 I/O pins, and that due to the regularity of the pin assignment, they can all be set at once with very little code (see the sketch after this list). The pin assignments for the DAC are:

  • bit 0 = AIO1
  • bit 1 = AIO2
  • bit 2 = AIO3
  • bit 3 = AIO4
  • bit 4 = DIO1
  • bit 5 = DIO2
  • bit 6 = DIO3
  • bit 7 = DIO4
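
Here’s a sketch of that idea, assuming the standard JeeNode pin mapping, i.e. DIO1..4 = PD4..PD7 and AIO1..4 = PC0..PC3 on the ATmega (the original sketch differs in its details, such as keeping the sine table in flash):

    byte sine256[256];      // one full sine wave cycle, 8-bit amplitude

    // send one sample to the R-2R ladder: the low nibble maps to AIO1..4
    // (PC0..PC3) and the high nibble to DIO1..4 (PD4..PD7), so two port
    // writes set all eight DAC bits at once
    static void sendSample (byte value) {
        PORTC = (PORTC & 0xF0) | (value & 0x0F);
        PORTD = (PORTD & 0x0F) | (value & 0xF0);
    }

    void setup () {
        for (int i = 0; i < 256; ++i)
            sine256[i] = 128 + 127 * sin(i * M_PI / 128);
        DDRC |= 0x0F;       // AIO1..4 as outputs
        DDRD |= 0xF0;       // DIO1..4 as outputs
    }

    void loop () {
        static byte index;
        sendSample(sine256[index++]);   // 256 steps per sine wave cycle
    }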

The maximum attainable frequency is about 1.84 KHz with this software approach (1.95 KHz with the sine256 table placed in RAM), but that’s only if you use all 256 steps of the sine wave. The loop itself runs at about 1.84 * 256 ≈ 470 KHz, so based on the Nyquist–Shannon sampling theorem, it should be possible to generate decent signals up to well over 200 KHz. The trick is to increment the “index” variable by larger values:

Screen Shot 2010 11 18 at 12.54.06

Here’s the corresponding output:

Screen Shot 2010 11 18 at 12.55.31

Still a pretty decent sine wave, so 16 resistors are all it takes to cover the entire audible frequency range.

By using fixed-point calculations for “fractional indexing”, you can get even more fine-grained control over the frequency of the generated signal. The following version generates an 8.01 KHz sine wave on my setup (note that “index” is now a 16 bit unsigned integer):

Screen Shot 2010 11 18 at 13.03.55
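
Building on the DAC sketch above, fractional indexing amounts to little more than this (the step value is made up – tune it to get the frequency you’re after):

    void loop () {
        // 8.8 fixed-point phase accumulator: the high byte selects the
        // table entry, the low byte accumulates the fractional part
        static word index;
        index += 0x0480;                // hypothetical step value
        sendSample(sine256[index >> 8]);
    }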

Update – I’ve changed the main loop to avoid the calling overhead of loop() itself. That increases the maximum attainable frequency by another 50%. Note that interrupts must be disabled to produce a clean tone.

Speedier graphics

In Software on Nov 19, 2010 at 00:01

The Graphics Board is going to enable a bunch of fun projects around here.

Unfortunately, the ST7565 library is a bit slow in one specific way – display updates. Since everything is buffered in RAM, most other operations are actually quite fast, but to get that data onto the gLCD takes time.

I had to find out how much time, of course. Here’s my test sketch:

Screen Shot 2010 11 17 at 19.45.42

All the loop does is send the same RAM buffer contents to the display, over and over again. The time it takes in microseconds is sent to the serial port, and the result is quite bad, actually:

  • 126 milliseconds, i.e. 8 refreshes per second, max!

The good news is that it’s a clear bottleneck, so chances are that it can be found and then hopefully also avoided. Sleuthing time!

The ST7565 core primitives, responsible for getting the data out to the display, were coded as follows:

Screen Shot 2010 11 17 at 18.43.49

Guess what: the shiftOut() in the Arduino library is written with calls to digitalWrite(). That rings a bell.

Ah yes, good OLD digitalWrite() again, eh? Of course…

So I rewrote this code using fixed pin numbers and direct port access:

Screen Shot 2010 11 17 at 18.45.18
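
In outline, such a rewrite looks like this – with the pin choices below being hypothetical stand-ins for whatever the Graphics Board is actually wired to:

    // fixed pins: data on PB1, clock on PB0 (hypothetical assignments)
    static void fastShiftOut (byte data) {
        for (byte mask = 0x80; mask != 0; mask >>= 1) {
            if (data & mask)
                PORTB |= bit(1);        // data bit high
            else
                PORTB &= ~ bit(1);      // data bit low
            PORTB |= bit(0);            // clock high
            PORTB &= ~ bit(0);          // clock low
        }
    }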

And sure enough, it’s almost exactly TEN times faster:

  • 12.3 milliseconds, i.e. 80 refreshes per second.

Needless to say, I’m going to leave these changes in my copy of the ST7565 library – even though that means it’s no longer general purpose since the pin assignments are now hard-coded. A 10-fold performance increase of something which really needs to be snappy and responsive is non-negotiable for me.

Here is a copy of the ST7565 code I use.

Could this bottleneck have been avoided?

The ST7565 library was clearly written as a general-purpose library, so making it usable with any set of pin assignments makes a lot of sense. The price paid in this case was a 10-fold drop in performance, plus a few extra bytes of RAM used in the ST7565 class.

I’ll revisit this topic some other time, to discuss the trade-offs and implications of compile-time vs run-time logic, as well as tricks such as: putting the pin choices in a header file, using #include for source files, pre-processing and code generation, and C++ templates.

For now, I’m just happy with the 80 Hz refresh rate :)

Update – I had to slow down the SPI clock a bit, because the display was not 100% reliable with the code shown above. The fix is in the ZIP file.

Assembling the Graphics Board

In Hardware on Nov 16, 2010 at 00:01

This is going to be a weblog post with lots of pictures…

The Graphics Board is easy to assemble, but there are two things to watch out for:

  • the polarity markings for two of the electrolytic 1µF caps are reversed
  • if you solder the display onto the board, then you can’t easily reach the back anymore

The former is a mini goof-up by yours truly; the latter is something to keep in mind when deciding which headers you are going to use (you can side-step the issue by mounting the display using headers, but that will increase the height of the entire stack).

Here goes. As usual, I’m going to proceed from low-profile to high-profile components, because that lets you put the components in, flip the board over, and press on it to get the components soldered snugly against the board.

First the 100 Ω backlight series resistor:

Dsc 2251 2

Note that there’s a solder jumper right above it. The default is to connect the middle pad with the leftmost pad marked “P”. This connects the backlight permanently to the PWR line. If you want to control the backlight through the IRQ signal, you can set the solder jumper to the “I” position instead (see the backlight() docs). This can be changed later if you change your mind.

Then 5 yellow decoupling caps. These are not polarized, but make sure you put them into the right holes:

Dsc 2249

Update: The latest kit version ships with ceramic capacitors replacing the electrolytic capacitors. This gives longer life and simplifies construction, since these capacitors are not polarized. See the wiki for more details.

The next step needs some care, because these two electrolytic caps are marked the wrong way on the silkscreen – they should be placed as follows:

Dsc 2251

The proper way to connect these is to put the long lead on the right and on the bottom, respectively. Or, same thing: put the gray stripes of the plastic case insulation on the left and on the top.

Note: the two caps on the left will only just barely fit when a JeeNode v5 is added on later. If you use a JeeNode v4, it is probably better to place them flat on their side pointing left. When in doubt, make sure you check the mechanical layout before soldering these two leftmost caps!

The other two 1 µF caps go as indicated on the board, they will in fact be oriented exactly opposite to the caps right next to them:

Dsc 2252


Flippin’ bits – revisited

In Software on Nov 14, 2010 at 00:01

For one of the infinite number of projects here at Jee Labs, I wanted to know how fast you can toggle a pin. This has been covered before in a previous weblog post, but this time I’m going to actually measure the results.

I’m going to focus on sending out 8 pulses in a row, because my goal is to transfer one byte in or out of the system (i.e. software-based SPI).

Here’s the simplest possible way to do it:

Screen Shot 2010 11 14 at 02.13.56

Measured result = 124.60 KHz, i.e. ≈ 8 µs per bit.

Some of that is loop overhead, and one trick to avoid that is to “unroll” the loop:

Screen Shot 2010 11 13 at 18.19.02

Measured result = 129.14 KHz – no big difference?

The reason is that the Arduino library’s “digitalWrite()” is very slow. So let’s go back to the loop and use a faster mechanism:

Screen Shot 2010 11 14 at 02.17.19

Measured result = 1.72 MHz – whoa, 0.58 µs per bit!

Now let’s unroll that loop again:

Screen Shot 2010 11 13 at 18.25.30

Measured result = 3.03 MHz – yep, now the loop overhead makes a big difference…

Can we do better? Yes, we can (heh) – using the PIND trick to toggle an output:

Screen Shot 2010 11 14 at 02.18.30

Measured result = 2.16 MHz – that’s ≈ 0.46 µs per bit.

And now, at these speeds, loop unrolling makes an even bigger difference:

Screen Shot 2010 11 13 at 18.31.54

Measured result = 4.71 MHz!
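
For reference, a reconstruction of that last version, assuming the output pin is PD4 (DIO1 on a JeeNode):

    void setup () {
        DDRD |= bit(4);     // PD4 as output
    }

    void loop () {
        // writing a 1 to a bit in the PIN register toggles that output;
        // sixteen toggles produce eight full pulses per loop() pass
        PIND = bit(4); PIND = bit(4); PIND = bit(4); PIND = bit(4);
        PIND = bit(4); PIND = bit(4); PIND = bit(4); PIND = bit(4);
        PIND = bit(4); PIND = bit(4); PIND = bit(4); PIND = bit(4);
        PIND = bit(4); PIND = bit(4); PIND = bit(4); PIND = bit(4);
    }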

That’s a bit odd though. The JeeNode is running at 16 MHz, and 4.71 MHz is not a very clear multiple or divisor of anything. How can a regular sequence of statements generate such an irregular frequency? The puzzle is solved by looking at the bigger picture with a scope:

Screen Shot 2010 11 13 at 18.59.09

(as you can see, my scope can’t sample 8 MHz square waves with good fidelity)

The scope measurement says it all: these are short bursts, because now the calling overhead of “loop()” is taking almost as much time as the 8 pulses.

If I unroll the loop further to contain several hundred “PIND = bit(4);” statements, the frequency readout increases to 7.82 MHz. IOW, each C statement takes one processor cycle!

Quite an amazing feat – the AVR MPUs are clearly RISC-class processors. And the gcc compiler is generating optimal code in this case.

So there you have it. A lowly JeeNode (or Arduino) can generate multi-megahertz signals on an I/O pin, just by writing a few lines of C code!

Note that these are limiting values. As soon as you start adding more logic to these loops – whether unrolled or not – the maximum attainable frequency will quickly drop. But still: not too shabby for them little chips!

Update – measurements were slightly off, because not all loops were 8 long. Fixed now, same conclusions.

IR packet decoding

In Software on Nov 10, 2010 at 00:01

The InfraredPlug class in the Ports library has been extended with a decoder which tries to recognize incoming IR pulse trains. Remember that the data is returned as a list of on/off times, expressed as multiples of the “slot size” given to the InfraredPlug constructor on setup.

I merely added a new “decoder()” member:

Screen Shot 2010 11 09 at 12.16.09

It currently only recognizes the NEC protocol, which is what three of my remotes here use. The NEC packets have 32 bits of payload (or no bits at all in the case of a “repeat” packet).

Here is how the ir_recv.pde sketch calls this new decoder:

Screen Shot 2010 11 09 at 12.19.35

Sample output from three different buttons on each of three different remotes:

Screen Shot 2010 11 09 at 12.52.03

The first remote (starting with 246) shows a “classical” NEC remote, where each second byte is the 1’s complement of the one before it.

The second remote (starting with 162) uses 1’s complement only on the command bytes at the end.

The third remote (starting with 119) is an Apple Remote which uses the extended NEC format, and presumably another way to check the validity of received packets.

Full details about the NEC and other protocols can be found on San Bergmans’ page.

Reflow profiles, again

In Software on Nov 2, 2010 at 00:01

Now that the thermocouple readout is working properly, things are moving again with the reflow controller.

What we need to do is set up a reflow temperature “profile” which drives the whole process. First the top level code which controls the whole shebang:

Screen Shot 2010 11 01 at 12.10.38

I’ve left out the reporting part, but basically we need to respond to button presses and go through the whole profile once activated. The reflow profile logic is handled by the “setPhaseAndTemp()” function:

Screen Shot 2010 11 01 at 12.12.33

As you can see, it is a simple state machine which moves from one phase to the next according to current temperature and timing conditions. The parameters I’m currently using are as follows:

Screen Shot 2010 11 01 at 12.19.36

Each phase has slightly different conditions, and criteria with which it decides to move on to the next phase.
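
To give an idea of the structure – every temperature and time below is made up for illustration, these are not the actual profile parameters – such a state machine looks roughly like this:

    enum { WARMUP, RAMPUP, SOAK, REFLOW, COOLDOWN, DONE };

    byte phase;         // current phase
    int target;         // setpoint handed to the heater control logic
    int temp;           // latest thermocouple reading, in °C
    word phaseSecs;     // seconds spent in the current phase so far

    static void nextPhase (byte p) { phase = p; phaseSecs = 0; }

    static void setPhaseAndTemp () {
        switch (phase) {
            case WARMUP:        // dry the parts at a modest temperature
                target = 75;
                if (temp >= 70 && phaseSecs >= 60) nextPhase(RAMPUP);
                break;
            case RAMPUP:        // climb towards the soak zone
                target = 150;
                if (temp >= 145) nextPhase(SOAK);
                break;
            case SOAK:          // let the temperature even out
                target = 180;
                if (phaseSecs >= 90) nextPhase(REFLOW);
                break;
            case REFLOW:        // brief peak to melt the solder
                target = 250;
                if (temp >= 240) nextPhase(COOLDOWN);
                break;
            case COOLDOWN:      // heater off, wait until safe to open
                target = 0;
                if (temp < 50) nextPhase(DONE);
                break;
        }
    }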

I’ve added a “warmup” phase at the start for two reasons: 1) to always start off on the main part of the reflow profile from a fairly well-defined temperature, and 2) to implement a drying period which removes any humidity present in the chips, since some chips seem to be sensitive to that. That does somewhat lengthen the whole cycle, because there is a lot of overshoot at low temperatures.

Tomorrow, I’ll describe the complete setup and show you how it performs.

Conquering the thermocouple

In Hardware, Software on Nov 1, 2010 at 00:01

(No Halloween stuff on this side of the pond – I’ll defer to Seth Godin for some comments on that…)

A while back, I had to shelve my experiments with the reflow controller, because I couldn’t get a reliable temperature reading from the Thermo Plug when using a thermocouple.

Or rather, sometimes it worked, sometimes it didn’t: the physical computerer’s equivalent of a nightmare!

The thermocouple circuit is very sensitive to ground currents, apparently. The effect was that my setup would work fine on batteries, but jump all over the place when attached to the USB port. Not very convenient for development, obviously.

It still has some unexplained behavior, but I’ve been able to narrow it down, so there are two new pieces of good news: 1) it only misbehaves while data is being transferred over the USB port, and 2) with some averaging, the readout is actually rock solid, both on batteries and on USB. I still see a difference in readout when data is transferred over USB, but since this is a JeeNode, I can work around that in the final version: go wireless!

Here’s the readout code which produces good readings – all remaining jitter is now in 1/10’s of degrees Celsius:

Screen Shot 2010 10 31 at 18.48.10

The output is in 1/100’s of °C, because I’m trying to avoid floating point math in this sketch.
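
The essence is plain averaging and integer math. Something along these lines – the analog pin and the 3.3V reference are assumptions here, not necessarily what the actual sketch uses:

    // averaged thermocouple readout, in 1/100's of °C
    static long readThermoCenti () {
        long sum = 0;
        for (byte i = 0; i < 100; ++i)
            sum += analogRead(1);           // AD597 output on AIO2
        long millivolt = sum * 3300L / (100L * 1023);
        // the AD597 produces 10 mV/°C, so mV × 10 = 1/100's of °C
        return 10 * millivolt;
    }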

And here is the measuring side of my new reflow setup:

Dsc 2200

The thermocouple is taped to the thin aluminium insert in the grill using heat-resistant Kapton tape. When I turn on the heater, I now see a clear rise in temperature within seconds – perfect!

Note that I’m using a 4x AA pack instead of 3x AA, because the AD597 needs at least 2V more on its supply line than the highest output voltage it is going to report. With 4x 1.2V (worst case, i.e. near-empty eneloops), the range will be at least (4.8 - 2) / 0.010 = 280°C, i.e. plenty!

And indeed, I’ve verified that at 250°, it reports valid temperatures on the attached LCD Plug w/ display.

The other plug you see in the lower left is a Blink Plug, with two pushbuttons and two LEDs.

Let’s see if this time around we can get the whole thing going properly!

Sending data TO remote nodes

In Software on Oct 31, 2010 at 00:01

Yesterday’s post described an easy way to get some data from remote battery-powered nodes to a central node. This is the most common scenario for a WSN, i.e. when reading sensors scattered around the house, for example.

Sometimes, you need more reliability, in which case the remote node can request an “ACK” and wait (briefly) until that is received. If something went wrong, the remote node can then retry a little later. The key benefit of ACKs is that you know for sure that your data packet has been picked up.

But what if you want to send data from the central node TO a remote node?

There are a couple of hurdles to overcome. First of all, remote nodes running on batteries cannot continuously listen for incoming requests – the RFM12B receiver draws more current than the ATmega at full power, and would drain the battery within a few days. There is simply no way a remote node can be responsive 100% of the time.

One solution is to agree on specific times, so that both sides know when communication is possible. Even just listening 5 ms every 500 ms would create a fairly responsive setup, and still take only 1% of the battery as compared to the always-on approach.

But this TDMA-like approach requires all parties to be (and remain!) in sync, i.e. they all need to have pretty accurate clocks. And you have to solve the initial sync when powered up as well as when reception fails for a prolonged period of time.

A much simpler mechanism is to let the remote take the initiative at all times: let it send out a “poll” packet every so often, so we can send an ACK packet with some payload data if the remote node needs to be signaled. There is a price: sending a packet takes even more power than listening for a packet, so battery consumption will be higher than with the previous option.

The next issue is how to generate those acks-with-payload. Until now, most uses of the RF12 driver required only packet reception or simple no-payload acks. This is built into RF12demo and works as follows:

Screen Shot 2010 10 30 at 20.17.37

That’s not quite good enough for sending out data to remote nodes, because the central JeeLink wouldn’t know what payload to include in the ACK.

The solution is RF12demo’s “collect mode”, which is enabled by sending the “1c” command to RF12demo (you can disable it again with “0c”). What collect mode does is prevent automatic ACKs from being sent out to remote nodes requesting them. Instead, the task is delegated to the attached computer:

Screen Shot 2010 10 30 at 20.17.48

IOW, in collect mode, it becomes the PC/Mac’s task to send out an ACK (using the “s” command). This gives you complete control to send out whatever you want in the ACK. So with this setup, remote nodes can simply broadcast an empty “poll” packet using:

    rf12_sendStart(0, 0, 0);

… and then process the incoming ACK payload as input.
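
Put together, the remote side of this poll-and-ack cycle might look like this – a sketch of the idiom using the standard RF12 calls, with the exact header handling depending on how the central node matches things up:

    while (!rf12_canSend())
        rf12_recvDone();
    rf12_sendStart(RF12_HDR_ACK, 0, 0);     // empty poll, ACK requested
    rf12_sendWait(2);                       // standby until send completes

    MilliTimer ackTimer;
    while (!ackTimer.poll(16))              // allow ~16 ms for the reply
        if (rf12_recvDone() && rf12_crc == 0 && (rf12_hdr & RF12_HDR_CTL)) {
            // rf12_data[0 .. rf12_len-1] holds the payload which the
            // central node put into its ACK - act on it here
            break;
        }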

It’s a good first step, since it solves the problem of how to get data from a central node to a remote node. But it too has a price: the way ACKs are generated, you need to take the round-trip time from JeeLink to PC/Mac and back into account. At 57600 baud, that takes at least a few milliseconds. This means the remote node will have to wait longer for the reply ACK – with the RFM12B still in receive mode, i.e. drawing quite a bit of current!

You can’t win ’em all. This simple setup will probably work fine with remotes set to wait for the ACK using Sleepy::loseSomeTime(16). A more advanced setup will need more smarts in the JeeLink, so that it can send out the ACK right away – without the extra PC round-trip delay.

I’ll explore this approach further when I get to controlling remote nodes. But that will need more work – such as secure transmissions: once we start controlling stuff by wireless, we’ll need to look into authorization (who may control this node?), authentication (is this packet really from who it says it is?), and being replay-proof (can’t snoop the packet and re-send it later). These are all big topics!

More on this some other time…

Simple RF12 driver sends

In AVR, Software on Oct 30, 2010 at 00:01

(Whoops… this post got mis-scheduled – fixed now)

Yesterday’s post illustrates an approach I’ve recently discovered for using the RF12 driver in a very simple way. This works in one very clear-cut usage scenario: sending wireless packets out periodically (without ACK).

Here’s the basic idiom:

Screen Shot 2010 10 29 at 13.01.24

What this does is completely ignore any incoming data: the sketch just waits for permission to send when it needs to, and then waits for the send to complete by specifying “2” as the last arg to rf12_sendStart().

No tricky loops, no idle polling, everything in one place.
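
Reconstructed as a complete example (node ID, band, net group, and payload are all example values):

    #include <RF12.h>

    struct { int light; } payload;      // hypothetical payload layout

    void setup () {
        rf12_initialize(17, RF12_868MHZ, 5);    // node 17, group 5
    }

    void loop () {
        payload.light = analogRead(0);          // measure something
        while (!rf12_canSend())
            rf12_recvDone();        // keep the driver state machine going
        rf12_sendStart(0, &payload, sizeof payload, 2); // "2" = wait for it
        delay(1000);
    }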

With a few lines of extra code, the RFM12B module can be kept off while not used – saving roughly 15 mA:

Screen Shot 2010 10 29 at 13.07.11

And with just a few more lines using the Sleepy class, you get a low-power version which uses microamps instead of milliamps of current 99% of the time:

Screen Shot 2010 10 29 at 13.09.03

Note the addition of the watchdog interrupt handler, which is required when calling Sleepy::loseSomeTime().

The loseSomeTime() call can only be used with time ranges of 16..65000 milliseconds, and is not as accurate as when running normally. It’s trivial to extend the time range, of course – let’s say you want to power down for 10 minutes:

Screen Shot 2010 10 29 at 13.11.16
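
… presumably something like a simple loop of one-minute naps:

    // ten 60-second power-downs in a row add up to 10 minutes
    for (byte i = 0; i < 10; ++i)
        Sleepy::loseSomeTime(60000);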

Another point to keep in mind with sleep modes, is that it isn’t always easy to keep track of time and allow other interrupts to wake you up again. See this recent post for a discussion about this.

But for simple Wireless Sensor Network node scenarios, the above idioms give you a very easy way to turn your sketches into low-power versions which can support months of operation on a single set of batteries.

Update – it looks like the above rf12_sleep() arguments are completely wrong. They should be:

  • rf12_sleep(N) turns the radio off with a wakeup timer enabled if N is 1..127
  • rf12_sleep(0) turns the radio off
  • rf12_sleep(-1) turns the radio back on

This list matches what is documented on the wiki page.

Parsing input commands

In Software on Oct 24, 2010 at 00:01

I often need some way to get a bit of data and a few commands into an Arduino-type sketch from the serial port. Trivial but tedious stuff, if you have to write that code all over again each time.

Trouble is, the ATmega has very limited RAM space and string processing capabilities, so writing a super duper fancy string input parser is too wasteful of resources to be practical, in most cases.

In the RF12demo sketch, I used a very simple postfix notation, which lets you enter some bytes of data and a one-letter command. The syntax is:

<arg1>,<arg2>,... <cmd-char>

It works fairly well, but arguments are limited to single bytes and have to be entered as decimal values.

Here’s another attempt to simplify this sort of task. I’ve added an “InputParser” class to the Ports library, which takes care of basic parsing, while remaining flexible enough to allow use in different sketches. It supports input of bytes, ints, and longs in decimal or hex format, as well as short strings. It’s not JeeNode specific, and should work with any Arduino.

Here is the InputParser class definition (minor details omitted):

Screen Shot 2010 10 23 at 20.11.12

Note that many choices were made to keep the code overhead low, while allowing enough flexibility to input data in different ways. As a consequence, the syntax is a bit strict and limited:

  • args are separated by spaces or commas, with the command code typed last
  • plain numbers are interpreted as bytes: 123
  • negative values can be entered using a trailing minus sign: 123-
  • append a decimal point to enter 2-byte ints: 12345.
  • append a colon to enter as 4-byte longs: 123456789:
  • prefix with a dollar sign to use hex mode: $1A / $1A2B. / $123456:
  • enter strings between double quotes: “this is a string”

I’ve added a “parser_demo.pde” sketch to illustrate its use:

Screen Shot 2010 10 23 at 20.13.05

When initialized, the parser must be given a buffer it can use to collect all argument data and input strings in. In this simple example, a 50-byte buffer is allocated dynamically.

Each command is a separate function, which pulls its variables from the parser object when called. The parser is initialized with a table of these commands, listing the command code and minimum number of expected bytes, along with the function to call.
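
In outline, that structure looks as follows – reconstructed from memory, so treat the exact signatures as approximations and check parser_demo.pde for the real thing:

    #include <Ports.h>

    extern InputParser parser;      // defined below, used by the handlers

    static void valueCmd () {
        int value;
        parser >> value;            // pull a 2-byte int off the arguments
        Serial.print("value = ");
        Serial.println(value);
    }

    InputParser::Commands cmdTab[] = {
        { 'v', 2, valueCmd },       // 'v' needs at least 2 bytes of input
        { 0 }                       // end-of-table marker
    };

    InputParser parser (50, cmdTab);    // 50-byte buffer, allocated dynamically

    void setup () { Serial.begin(57600); }

    void loop () { parser.poll(); }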

Here are some examples of input:

h
?
v
1234. v
"Hello, there!" s
123456789: "abc" $100. "defgh" 12345. c

Here is the output, corresponding to each of the above commands:

[parser_demo]
Hello!
Known commands: h v l s c
Not enough data, need 2 bytes
value = 1234
string = <Hello, there!>
complex = 123456789 <abc> 256 <defgh> 12345

Note that you get a list of known commands when entering a bad command (or “?”).

That last example illustrates how to pass a variety of input argument types to a command. The code for this is:

Screen Shot 2010 10 23 at 20.25.51

This demo requires about 5 Kb, including some 850 bytes for the command functions, 600 bytes for malloc (this is optional), and 600 bytes for the Serial class. The parser itself uses some 1.7 Kb.

By default, the parser object is attached to the “Serial” stream, but it can be hooked up to something else if necessary, such as an LCD or Ethernet.

Enjoy!

Room Node – next steps

In AVR, Software on Oct 19, 2010 at 00:01

First of all: thanks to everyone who commented on the recent posts, both online and by direct email. Fantastic feedback, lots of ideas, and several valuable corrections.

In yesterday’s post, I agonized about how hard it is to track time in some sort of reasonably continuous way when most of it is spent in comatose state. Fortunately, a Room Node won’t need that awareness very much. I just wanted to show how things get complex in terms of watchdogs running-but-not-yet-expired, and multiple interrupt sources.

For the next version of the rooms.pde sketch, my plan of attack is to simply ignore the issue. The watchdog will be used for guaranteed wake-up, while PIR motion detection and radio reception interrupts will simply cause the millis() clock to lose time, occasionally.

For ACKs, I’m going to start simple and wait for up to 2 milliseconds in idle mode, before resuming lower-power options again. One interesting suggestion was to slow down the clock while doing so (through the system clock pre-scaler), but that has to be done with care because a slower-running ATmega will also respond more slowly to RF12 driver interrupts – and there is not that much slack there.

Here’s the updated Scheduler class, now included in the Ports library:

Screen Shot 2010 10 18 at 22.54.27

Only a minor extension on the API side: by using pollWaiting() instead of poll(), you can tell the system to enter low-power mode until the next scheduled task. If another interrupt pre-empts this wait cycle, then it’ll break out of power-down mode and go through the loop and re-enter the low-power state the next time around.

The other change I’m currently testing is that a few Scheduler functions can be called from interrupt code, so this provides a way for interrupt code to safely schedule, resume, or cancel a task “via the main thread”.

This is the tiny change needed to make the schedule.pde demo energy efficient:

Screen Shot 2010 10 18 at 23.01.24
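
In other words, presumably little more than swapping poll() for pollWaiting() in the main loop:

    void loop () {
        // pollWaiting() powers down until the next scheduled task is due
        switch (scheduler.pollWaiting()) {
            case TASK1: /* ... as before ... */ break;
            case TASK2: /* ... as before ... */ break;
        }
    }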

However, because it now uses the watchdog, you also need to add the following boilerplate to the sketch:

ISR(WDT_vect) { Sleepy::watchdogEvent(); }

The demo now draws 25 µA when I remove the LEDs.

So here’s the deal: if you can manage to write all your code in such a way that it never calls delay() – or delayMicroseconds() with large values – and instead uses this Scheduler-with-tasks approach, then you’ll get fairly low power usage almost for free… i.e. without having to do anything else! (just don’t forget to turn the radio on and off as needed)

The code has been checked into subversion, and is also available as ZIP archive – see the Ports info.

Update – more changes checked in to better handle watchdog vs other interrupts.

Tracking time in your sleep

In AVR, Software on Oct 18, 2010 at 00:01

No, this isn’t a story about bio-rhythms :)

One of the challenges I’ll be up against with Room Nodes is how to keep track of time. The fact is that an ATmega is extraordinarily power efficient when turned off, and with it a JeeNode – under a few microamps if you get all the little details right. That leads to battery lifetimes which are essentially only determined by self-discharge!

But there are two problems with power off: 1) you need to be 100% sure that some external event will pull the ATmega out of this comatose state again, and 2) you can completely lose track of time.

Wireless packets are of no use for power-down mode: the RFM12B consumes many milliamps when turned on to receive packets. So you can’t leave the radio on and expect some external packets to tell you what time it is.

Meet the watchdog…

Fortunately, the ATmega has a watchdog, which runs on an internal oscillator. It isn’t quite as accurate, but it’ll let you wake up after 16 ms, 32 ms, … up to 8 seconds. Accuracy isn’t such a big deal for Room Nodes: I don’t really need to know the temperature on that strict a schedule. Once every 4 .. 6 minutes is fine, who cares…

There’s a Sleepy class in the Ports library, which manages the watchdog. It can be used to “lose time” in a decently accurate way, and will use the slowest watchdog settings it can to get it out of power-down mode at just about the right time. To not disrupt too many activities, the “millis()” timer is then adjusted as if the clock had been running constantly. Great, works pretty well.

It’s not enough, though.

As planned for the next implementation, a Room Node needs to sleep one minute between wakeups to readout some sensors, but it also needs to wake up right away if the motion sensor triggers.

One solution would be to wake up every 100 ms or so, and check the PIR motion sensor to see whether it changed. Not perfect, but a 0.1 s delay is not the end of the world. What’s worse though, is that this node will now wake up 864,000 times per day, just to check a pin which only occasionally changes. This sort of polling is bound to waste some power.

Meet the pin-change interrupt…

This is where pin-change interrupts come in. They allow going into full power-down, and then getting woken up by a change on any of a specified set of pins. Which is perfect, right?

Eh… not so fast:

Screen Shot 2010 10 17 at 22.05.30

Q: What time is it when the pin-change occurred?

A: No idea. Somewhere between the last watchdog and the one which will come next?

IOW, the trouble with the watchdog is that you still don’t really track time. You just know (approximately) what time it is when the watchdog fires again.

If the watchdog fires say every 8 seconds, then all we know at the time of a pin-change interrupt, is that we’re somewhere inside that 8s cycle.

We can only get back on track by waiting for that next watchdog again (and what if the pin change fires a second time?). In the meantime, our best bet is to assume the pin change happened at the very start of the watchdog cycle. That way we only need to move the clock forward a little, once the watchdog lets us deduce the correct moment. FYI: anything is better than adjusting a clock backwards (timers firing again, too fast, etc).

Now as I said before, I don’t really mind losing track of time to a certain extent. But if we’re using 8-second intervals to get from one important measurement time to the next, i.e. to implement a 1-minute readout interval, then we basically get an 8-second inaccuracy whenever the PIR motion detector triggers.

That’s tricky. Motion detection should be reported right away, with an ACK since it’s such an important event.

So we’re somewhere inside that 8-second watchdog cycle, and now we want to efficiently go through a wireless packet send and an ACK cycle? How do you do that? You could set the watchdog to 16 ms and then start the receiver and power down. The reception of an ACK or the watchdog will get us back, right? This way we don’t spend too much time waiting for an ACK with the receiver turned on, guzzling electrons.

The trouble is that the watchdog is not available at this point: we still want to let that original 8-second cycle complete to get our knowledge of time back. Remember that the watchdog was started to get us back out in 8 seconds, but that it got pre-empted by a pin-change.

Let me try an analogy: the watchdog is like throwing a ball straight up into the air and going to sleep, in the knowledge that the ball will hit us and wake us up a known amount of time from now. In itself a pretty neat trick to keep track of the passage of time, when you don’t have a clock. Well, maybe not for humans…

The scenario that messes this up is that something else woke us up before the ball came down. If we re-use that ball for something else, then we have lost our way to track time. If we let that ball bring us back into sync, fine, but then it’ll be unavailable for other timing tasks.

I can think of a couple solutions:

  • Dribble – never use the watchdog for very long periods of time. Keep waking up very frequently, then an occasional pin-change won’t throw us off by much.

  • Delegate – get back on track by asking for an ack which tells us what time it is. This relies on a central receiving node (which is always on anyway), to tell us how to adjust our clock again.

  • Fudge it – don’t use the watchdog timer, but go into idle mode to wait for the ack, and make that idle period as short as possible – perhaps 2 ms. IOW, the ACK has to reach us within 2 milliseconds, and we’re not dropping into complete powerdown during that time. We might even get really smart about this and require that the reply come back exactly 1 .. 2 ms after the send, and then turn off the radio for 1 ms, before turning it on for 1 ms. Sounds crazy, until you realize that 1 ms of radio time uses as much energy as 5 seconds of power down time – which adds up over a year! This is a bit like what TDMA does, BTW.

All three options look practical enough to consider. Dribbling uses slightly more power, but probably not that much. Delegation requires a central node which plays along and replies with an informative ack (but longer packets take longer to receive, oops!). Fudging it means the ATmega will be in idle mode a millisec or two, which is perhaps not that wasteful (I haven’t done the math on that yet).

So there you go. Low power stuff isn’t always trivial, once you start pushing for the real gains…

Scheduling multiple tasks

In Software on Oct 17, 2010 at 00:01

Since Room Nodes are going to become more sophisticated in terms of timing, I’m adding some support code to simplify my work later on. In most cases, I think it’s not a good idea to start generalizing too early, but this is a pretty common utility: scheduling multiple activities in parallel. No rocket science, and badly needed…

So here’s a little “Scheduler” class I’m going to add to the Ports library:

Screen Shot 2010 10 17 at 00.00.53

Nothing fancy. My main concern is to keep RAM usage low. So the idea is that you can have a fixed number of tasks, each one with (at most) one timer running. Timers can be started and cancelled at will. The term “task” is used very loosely here: anything that needs to have an independent timer mechanism can be treated as a separate task. This is not multi-tasking or threading – just a way to manage activities which need to be performed some time into the future.

Unlike the MilliTimer class, which works for durations up to 60000 milliseconds, the Scheduler class works in tenths of seconds, and supports durations from 0 to 6000 seconds, i.e. over an hour and a half. A zero duration means: run the associated task as soon as possible.

Probably best to show this in action with an example:

Screen Shot 2010 10 16 at 23.56.53

The way to use the scheduler is to define an enum with all the tasks there can be, plus a “TASK_LIMIT” value, which is in fact simply the number of tasks.

Then you declare a “Scheduler” instance, and give it that TASK_LIMIT value, so it can allocate the proper amount of memory (each task needs only 2 bytes).

The only remaining thing to do is call the “poll()” member and handle each task request in a big switch statement.

The demo will blink two LEDs at different rates, the classical (and simplest) example of doing two different things at the same time. Eh… almost the same time.
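
Reconstructed along those lines – LED pins and periods are example values, see the actual schedule.pde for details:

    #include <Ports.h>

    enum { TASK1, TASK2, TASK_LIMIT };

    Scheduler scheduler (TASK_LIMIT);

    byte led1, led2;

    void setup () {
        pinMode(4, OUTPUT);             // hypothetical LED pins
        pinMode(5, OUTPUT);
        scheduler.timer(TASK1, 0);      // start both tasks right away
        scheduler.timer(TASK2, 0);
    }

    void loop () {
        switch (scheduler.poll()) {
            case TASK1:                 // toggle LED 1 every 0.5 s
                digitalWrite(4, led1 = !led1);
                scheduler.timer(TASK1, 5);  // durations are in tenths
                break;
            case TASK2:                 // toggle LED 2 every 0.3 s
                digitalWrite(5, led2 = !led2);
                scheduler.timer(TASK2, 3);
                break;
        }
    }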

I’m still putting the finishing touches on this code and making sure that this will play nicely with low-power and power-down modes, so it hasn’t been checked in yet – it should all be ready in at most a day or two. At which time this will become part of the Ports library, along with the above “schedule.pde” example sketch.

IR decoding with pin-change interrupts

In AVR, Hardware, Software on Oct 14, 2010 at 00:01

Yesterday’s post described a new “InfraredPlug” class which handles the main task of decoding IR pulse timings. The “ir_recv.pde” example sketch presented there depended on constant polling to keep the process going, i.e. there has to be a line like this in loop():

    ir.poll();

Worse, the accuracy of the whole process depends on calling this really often, i.e. at least every 100 µs or so. This is necessary to be able to time the pulse widths sufficiently accurately.

Can’t we do better?

Sure we can. The trick is to use interrupts instead of polling. Since I was anticipating support for pin-change interrupts, I already designed the class API for it. And because of that, the changes needed to switch to an interrupt-driven sketch are surprisingly small.

I’ve added a new ir_recv_irq.pde sketch to the Ports library, which illustrates this.

The differences between using polling mode and pin-change interrupts in the code are as follows. First of all, we need to add an interrupt handler:

Screen Shot 2010 10 13 at 00.26.10

Second, we need to enable those interrupts on AIO2, i.e. analog pin 1:

Screen Shot 2010 10 13 at 00.26.44

And lastly, we can now get rid of that nasty poll() call in the loop:

Screen Shot 2010 10 13 at 00.27.51
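
Put together, the additions amount to roughly this – assuming “ir” is the InfraredPlug instance, and knowing that AIO2 is analog pin 1, i.e. PC1 / PCINT9 on the ATmega328:

    ISR(PCINT1_vect) {
        ir.poll();              // an edge occurred - let the driver time it
    }

    void setup () {
        // ... existing InfraredPlug setup ...
        PCMSK1 |= bit(1);       // watch for changes on PC1 (analog pin 1)
        PCICR |= bit(PCIE1);    // enable pin-change interrupts for port C
    }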

That’s all there is to it. Does it still work? Of course:

Screen Shot 2010 10 13 at 00.29.16

Note: I made some small changes in the InfraredPlug implementation to tolerate interrupts and avoid race conditions.

This all seems like an insignificant change, but keep in mind that it completely changes the real-time requirements: instead of having to poll several thousand times per second to avoid missing pulses or measuring them incorrectly, we can now check for results whenever we feel like it. Waiting too long would still miss data packets of course, but our code can now continue to do other lengthy things (or go into a low-power mode). Checking for incoming packets a few times a second is sufficient (IR remotes send out a packet every 100 ms or so while a button is pressed).

So the IR decoder now has the same background behavior as the RF12 driver: you don’t need to poll it in real-time, you just need to check once in a while to see whether a new packet has been received. Best of all, perhaps, is that you can continue to use calls to delay() even though they make the main loop less responsive.

There is another side effect of this change: if your code includes a call to “ir.send()”, then the receiver will see your own transmission, and report it as an incoming packet as well. Which shows that it’s running in the background. This could even be used for collision detection if you want to build a fancy IR wireless network on top of all this.

So there you go: an improved version of the InfraredPlug class, which lets you use either explicit polling or pin-change interrupts. The choice is yours…

Software PWM at 1 KHz

In AVR, Software on Oct 3, 2010 at 00:01

While pondering about some PWM requirements for a new project here, I was looking at the rgbAdjust.pde sketch again, as outlined in the Remote RGB strip control weblog post from a few months back. It does PWM in software, and plays some tricks to be able to do so on up to 8 I/O pins of a JeeNode, i.e. DIO and AIO on all 4 ports. The main requirement there, was that the PWM must happen fast enough to avoid any visible flickering.

The rgbAdjust sketch works as follows: prepare an array with 256 time slots, each indicating whether an I/O pin should be on or off during that time slot. Since each array element consists of one byte, there is room for up to 8 such bit patterns in parallel. Then continuously loop through all slots to “play back” the stored PWM patterns.

There is one additional refinement: I don’t actually store the values, but only store a 1-bit wherever the value flips. That shaves off some overhead when rapidly changing I/O pins (see the Flippin’ bits post).

There are some edge cases (there always are, in software), such as dealing with full on and full off. Those two cases require no bit flipping, whereas normally there are always exactly two flips in the 256-cycle loop. But that’s about it. It works well, and when I simplified the code to support only brightness values 0..100 instead of the original 0..255, the PWM rate went up to over 250 Hz, removing all visible flicker.

So what rgbAdjust does, is loop around like crazy, keeping track of which pins to flip. ATmega’s are good at that, and because the RF12 driver is interrupt-driven, you can still continue to receive wireless data and control the RGB settings remotely.

But still, a bit complex for such a simple task. Isn’t there a simpler way?

As it turns out, there is… and it’ll even bump the PWM rate to 1 KHz. I have no idea what our cat sees, but I wouldn’t be surprised if cats turned out to be more sensitive than us humans. And since I intend to put tons of LED strips around the house, it better be pleasant for all its inhabitants!

What occurred to me, is that you could re-use a hardware counter which is always running in the ATmega when working with the Arduino libraries: the TIMER-0 millisecond clock!

It increments every 4 µs, from 0 to 255, and wraps around every 1024 µs. So if we take the current value of the timer as the current time slot, then all we need to do is use that same map as in the original rgbAdjust sketch to set all I/O pins!

Something like this, basically:

Screen Shot 2010 10 01 at 01.41.11

(assuming that the map[] array has been set up properly)

No more complex loops. All we need to do is call this code really, really often. It won’t matter whether some interrupts occur once in a while, or whether some extra code is included to check for packet reception, for example. What might happen (in the worst case, and only very rarely) is that a pin gets turned on or off a few microseconds late. No big deal, and most importantly: no systematic errors!
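
The polling side is tiny. A sketch of the idea, with a hypothetical mapping of the 8 bits onto the port registers:

    byte slots[256];    // same role as map[] in rgbAdjust: 1 bit per pin

    // call this as often as possible - TIMER-0 is the time base: TCNT0
    // increments every 4 µs and wraps every 1024 µs, i.e. ~1 KHz PWM
    static void pwmPoll () {
        byte bits = slots[TCNT0];
        PORTD = (PORTD & 0x0F) | (bits & 0xF0);     // DIO1..4
        PORTC = (PORTC & 0xF0) | (bits & 0x0F);     // AIO1..4
    }

    void loop () {
        pwmPoll();
        // ... other quick work, e.g. checking for incoming RF12 packets ...
        pwmPoll();
    }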

It’s fairly easy to do some other work in between, as long as the main code gets called as often as possible:

Screen Shot 2010 10 01 at 01.51.18

I’ve applied this approach to an updated rgbRemote.pde sketch in the RF12 library, and sure enough, the dimming is very smooth for intensity levels 25..255. Below 25, there is some flickering – perhaps from the millis() timer? Furthermore, I’m back to being able to dim with full 24-bit accuracy, i.e. 8 bits on each of the RGB color controls. Which could be fairly important when finely adjusting the white balance!

So there you have it: simpler AND better! – all for the same price as before :)

Save data to a logfile with JeeMon

In Software on Oct 1, 2010 at 00:01

I’ve added a little demo application called “SaveToFile” to JeeMon which saves data from an attached JeeNode or JeeLink to file, complete with timestamps.

When started, you’ll need to specify which device you want to log (unless there is only one):

Screen Shot 2010 09 30 at 21.01.18

Click on one of the lines, and you’ll get this dialog on Mac OS X (Windows and Linux look a little different):

Screen Shot 2010 09 30 at 21.01.31

Once you click “Save”, the main window changes to show the last received line:

Screen Shot 2010 09 30 at 21.02.07

Here’s the contents of the log file at this point:

Screen Shot 2010 09 30 at 21.02.55

You can just leave it running to collect more data.

The following code implements all this and shows a bit of Tcl and Tk scripting in action:

Screen Shot 2010 09 30 at 21.22.40

This code is now included in the JeeMon on-line demos, so all you need to do is get JeeMon, launch it, and then click on “Run web-based demo”:

Screen Shot 2010 09 30 at 21.04.34

(make sure you don’t have a file called “application.tcl” next to JeeMon)

A window will open, just select the “SaveToFile” link, and it’ll be downloaded to the “JeeMon-demos” folder and then launched.

To run this code from the downloaded copy, launch JeeMon as:

    ./JeeMon JeeMon-demos/SaveToFile.zip

To view the code, unpack that ZIP file and you’ll get what is shown above. You can adapt it to your needs, of course. It’s all open source software!

Sending strings in packets

In Software on Sep 29, 2010 at 00:01

Prompted by a question on the forum, I thought it’d be a good idea to write an extra “PacketBuffer” class which makes it easy to fill a buffer with string data to send out over wireless.

This new packetBuf.pde sketch in the RF12 library shows how to set up a packet buffer, fill it with some string data, and send it to other nodes in the same net group:

Screen Shot 2010 09 27 at 11.23.09

If you want to use the PacketBuffer class in your own code, just copy the dozen lines or so from the above example. The code is very small, because all the heavy lifting is done by the standard Arduino “Print” class.
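
For reference, those dozen lines come down to this (on recent Arduino IDEs write() must return size_t, older ones declared it as void):

    #include <RF12.h>

    class PacketBuffer : public Print {
    public:
        PacketBuffer () : fill (0) {}
        const byte* buffer () const { return buf; }
        byte length () const { return fill; }
        void reset () { fill = 0; }
        virtual size_t write (uint8_t ch) {
            if (fill < sizeof buf) {
                buf[fill++] = ch;
                return 1;
            }
            return 0;
        }
    private:
        byte fill, buf[RF12_MAXDATA];
    };

    PacketBuffer payload;
    // fill it like any "Print" object, then send it off:
    //   payload.print("temp ");
    //   payload.print(123);
    //   rf12_sendStart(0, payload.buffer(), payload.length());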

The code includes logic to report incoming packets, which are also assumed to always contain text strings. To try this example, you’ll need at least two nodes, of course. Here’s output from the two I tried this with:

Screen Shot 2010 09 27 at 11.01.55

And the other side (not at the same time):

Screen Shot 2010 09 27 at 11.02.42

But having made this demo, I have to add that in general it’s not such a good idea to send out everything in string form. It takes (much) more code and RAM to deal with strings, and the receiver cannot easily convert everything back to numeric values. Then again, if you just want to send out strings to report or process on the PC, then the above may come in handy.

Enjoy!

AA board testing

In Hardware on Sep 23, 2010 at 00:01

The latest AA Power board is one of those ideas where I can’t help but think… why didn’t I think of it before?

There’s a definite cost trade-off (that LTC3525 is expensive, and it’s only available in SMD so it has to be assembled), but I’m definitely going to switch my room nodes over to it – once that elusive how to enclose it? question gets answered, that is…

Anyway. I’ve been assembling dozens of those boards lately. Hand-soldering them, in fact – because the paste stencils aren’t ready yet. The soldering part is easy, I’ve got that down fairly well nowadays. I like soldering.

The issue is that each board must be tested. Recalls are tedious / inconvenient / annoying for everyone, and not just embarrassing for me, but totally robbing me of my pride in what I do and what I make.

Doh. Welcome to the real world.

Time to use those Pogo test pins. The nice part is that the AA Power board is ridiculously easy to test: apply 1.5V and watch 3.3V come out. Or, more loosely: connect it to an AAA battery, and watch it drive a green LED.

Here’s the AAA battery + green LED + Pogo pins part:

Dsc 1961

Note that there is no circuit on this board – no components, just the missing pieces.

To use it, I simply push the board under test against it, and voilà …

Dsc 1962

If it lights up green (see lower right corner), it works!

Great time saver. It caught mistakes in about 1 in 10 boards on the first test – and after touching those up, every one of the first 50 boards ended up working.

Onwards!

More RF12 driver notes

In Software on Sep 15, 2010 at 00:01

Yesterday’s RF12 driver changes added a new feature to capture packets from different net groups, and to easily send out packets to different netgroups (don’t hold your breath for a “group broadcast” to every node out there).

As reported on the forum, it’s still not perfect due to the high number of false packet triggers.

One effect of this is that correct packets are missed occasionally. Here is the output from yesterday again:

OKG 15 4 248 48
OKG 15 4 250 48
OKG 5 61 7 23 83 97 7 0 155 79
OKG 15 4 251 48
OKG 15 4 252 48
OKG 15 4 7 49
OKG 15 4 255 48

Node 4 in net group 15 is a little test node which sends out a packet once a second (it’s the radioBlip.pde sketch, in fact). And as you can see, 3 packets already got lost in that brief time window.

My hunch is that the radio syncs on far too many bit patterns. This starts the radio off, collecting bytes, until either the proper packet length has been reached or the maximum of 68 bytes (hdr + length + data).

The problem is that a correct preamble and sync pattern right after such a false start will be interpreted as data. It would be possible to solve this in software, by passing a set of valid net groups to the RF12 driver and having it abort reception immediately when the incoming group byte doesn’t match. The driver could then restart a fresh sync search right away, hopefully capturing a subsequent real packet.

More work will be needed in this direction, but I wanted to get the change in so others can try it out and have a go at coming up with improvements for this code.

Two more small changes were added to the RF12 driver: a way to use it in “buffer-less transmit” mode, and a slightly more flexible way to wait for the completion of packet sending.

Buffer-less transmit mode is a way to avoid having to use extra memory for sending out packets. An example:

    if (rf12_canSend()) {
        struct { ... } *buf = (void*) rf12_buf;
        rf12_len = sizeof *buf;
        // fill in values here, i.e. "buf->temp = 123", etc.
        rf12_sendStart(0);
    }

So instead of passing an area of memory to copy into the rf12_buf area, as done with a traditional call to the 3-argument rf12_sendStart(), you set up a pointer directly into that area and store your results there.

The crucial point is that you can only do this between the time when rf12_canSend() returns 1 and a subsequent call to rf12_sendStart(). That is the only time when the data in rf12_buf is guaranteed not to get changed by the RF12 driver itself.

In many cases, you don’t need all this trickery. This is just a way to reduce the amount of (scarce) ram needed to work with packets and the RF12 driver.

The other change in the RF12 driver concerns the following call, which uses the optional “sync” parameter to specify how to wait for send-completion:

    rf12_sendStart(hdr, buf, len, sync);

This recent addition to the RF12 driver is useful for reducing power consumption, because it lets you send out a packet through the RFM12B while the ATmega itself enters a low power mode.

I have added a new function called rf12_sendWait() which now does that waiting side separately. The above call is still available for backward compatibility, but for future use, it should be written as two calls:

    rf12_sendStart(hdr, buf, len);
    rf12_sendWait(sync);

The reason for this change, is that this now also supports the buffer-less transmit mode described above:

    if (rf12_canSend()) {
        struct { ... } *buf = (void*) rf12_buf;
        rf12_len = sizeof *buf;
        // fill in values here, i.e. "buf->temp = 123", etc.
        rf12_sendStart(0);
        rf12_sendWait(2);
    }

Small changes, all in all. This will of course be documented on the cafe/docs site, but since that site is going to be replaced by a more advanced wiki-based setup in the near future, this post will have to do for now. I don’t want to deal with two sets of documentation… maintaining one set is hard enough!

New RF12 driver mode

In Software on Sep 14, 2010 at 00:01

The RF12 driver has been extended with the ability to send and receive packets from any net group. This feature was long overdue – it allows a node to listen to packets coming from different net groups, and could be used to implement a packet forwarding “relay” or a mesh network.

The way to activate this “any net-group” mode, is to initialize the RF12 driver with net group zero:

    rf12_initialize(12, RF12_868MHZ, 0);

This puts the RFM12B in a special 1-byte sync mode. The new rf12_grp variable will contain the net group whenever a packet is received, i.e. whenever rf12_recvDone() returns true.

To send a packet to a specific netgroup, say 34, you can now use the following code:

    if (rf12_canSend()) {
        ...
        rf12_grp = 34;
        rf12_sendStart(...);
    }

The RF12demo.pde has been extended to support this “any net-group” mode – simply set the group as “0g”.

When used in this special mode, the format of the reported lines changes slightly: “OK” becomes “OKG” plus the net group number – followed by the header byte and data bytes, as usual.

Here is some sample output, first normal mode, then any net-group mode:

Current configuration:
 A i1* g5 @ 868 MHz 
OK 3 215 66 0 0
OK 61 7 23 83 97 7 0 155 79
OK 61 52 239 0 218 39
OK 61 9 2 8 68 161 75 15 0 16 0
OK 3 217 66 0 0
OK 61 52 240 0 222 39
> 0g
 ?G 45 25 37 209 155 76 99 138 66 247 140 29 251 47 157 163 51 158
 ?G 7 133
 ?G 144 133 255 220 119 254 229 249 92 225 94 213 221 102 160 1 233 59 251
 ?G 255 185 205 23 165 55 175 172 161 229 207 108 141 152 56 127 208 134
> 1q
OKG 15 4 248 48
OKG 15 4 250 48
OKG 5 61 7 23 83 97 7 0 155 79
OKG 15 4 251 48
OKG 15 4 252 48
OKG 15 4 7 49
OKG 15 4 255 48

There is however a serious problem with the promiscuous / any net-group mode: the RFM12B module will report a lot of garbage. I added a “quiet” option to RF12demo to be able to suppress the reporting of bad packets, i.e. dropping “?…” lines. That’s why no “?…” lines were reported once the “1q” command was entered.

As you can see in the output, packets from both net-groups 5 and 15 are picked up and reported. Yippie!

The new version of the RF12 library has been checked into subversion and is also available as a ZIP file, see the software page for details.

There is a bit more to say about these changes… so stay tuned!

Modular nodes

In Software on Sep 13, 2010 at 00:01

Ok, so we have JeeNodes and JeePlugs, and it’s now possible to sense and hook up all sorts of fun stuff. In theory, it’s all trivial to use and easy to integrate with what you already have, right? Well… in practice there’s a lot of duplication involved – literally, in fact: for my experiments, I often take an existing sketch, make a fresh copy and start tweaking it. Shudder…

  • New plug. New bit of code to include in the sketch for that plug. New sketch.

  • New device connected. New bit of code to talk to that device. New sketch.

  • New idea. New logic to implement that idea. New sketch.

Yawn. Some of this WSN stuff sure is starting to become real tedious…

There are a couple of ways to deal with this. The traditional way is to modularize the source code as much as possible: create a separate header and implementation source file for each new plug, device, sensor, and function which might need to be re-used at some point. Then all you have to do is create a new sketch (again!) and include the bits and pieces you want to use in this sketch.

I have a big problem with that. You end up with dozens – if not hundreds – of tiny little files, all with virtually no code in them, since most interface definitions and interface implementations are trivial. My problem is not strictly the number of little files, but the loss of overview, and the inability to re-factor such code collections across the board. It just becomes harder and harder to simplify common patterns, which only show up once you’ve got a certain amount of code. The noise of the C/C++ programming itself starts to drown out the essence of all these (small & similar) bits of interface code.

The other serious problem with too fine-grained modularization of the source code is that you end up with a dependency nightmare. Some of my sketches need the RF12 driver, others need the PortsI2C class, yet others use the MilliTimer.

At the opposite end of the spectrum is the copy-and-paste until you drop approach, whereby you take the code (i.e. sketches) you have, and make copies of it all, picking the pieces you want to re-use, and dropping everything else. I’ve been doing that a bit lately, because most of this code is so trivial, but it’s a recipe for disaster – not only do I end up with more and more slightly different versions of everything over time, it also becomes virtually impossible to manage bug fixes and fold them into all the affected sources.

A version control system such as subversion can help (I couldn’t live without it), but it really just masks the underlying issue: some parts of the code deal with the essence of the interface, while other parts exist only to turn the code into a compilable unit.

There is another alternative: go all out with C++ and OO, i.e. create some class hierarchies and make lots of members virtual. So that slight variations of existing code can be implemented as derived classes in C++, with only a bit of code for the pieces which differ from the previous implementation. This is not very practical on embedded microcontrollers such as the ATmega, however. V-tables (the technique used internally in C++ to implement such abstractions) tend to eat up memory when used for elaborate class hierarchies, and worse still, much of that memory will have to be in RAM.

There is a solution for this too, BTW: C++ templates. But I fear that the introduction of template programming (and meta-programming) is going to make the code virtually impenetrable for everyone except hard-core and professional C++ programmers. Already, my use of C++ in sketches is scaring some people off, from what I hear…

Is there a way to deal with a growing variety of little interface code snippets, in such a way that we don’t have to bring in a huge amount of machinery? Is there some way to plug in the required code in the same way as JeePlugs can be plugged in and used? Can we somehow re-use bits and pieces without having to copy and paste sketches together all the time?

I think there is…

The approach I’d like to introduce here is “code generation”. This technique has been around for ages, and it has been used (and abused) in a wide range of tasks.

The idea is to define a notation (a related buzzword is “DSL”) which is better suited for the specific requirements of Physical Computing, Wireless Sensor Nodes, and Home Automation. And then to generate real C/C++/sketch code from a specification which uses this notation to describe the bits and pieces involved:

Screen Shot 2010 09 12 at 17.16.38

To create a sketch for a JeeNode with the Room Board on it and using say an EtherCard as interface to the outside world, one could write something like the following specification:

Screen Shot 2010 09 12 at 18.29.39
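
(The screenshots aren’t reproduced here. Purely as an illustration of the idea – this is hypothetical syntax, not an actual tool – such a specification might look something like:)

    Node roomNode {
        Use RoomBoard;
        Use EtherCard;
        // plus the sketch-specific logic, written as ordinary C/C++ code
    }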

The key point to make here is that this is not really a new language. The code you add is the same code you’d write if you had to create the sketch from scratch. But the repetitive stuff is gone. In a way, this is copy-and-paste taken to extremes: it is automated to the point that you no longer have to think of it as copying: all the pieces are simply there for immediate re-use.

Problems will not be gone simply by switching to a code generator approach. There will still be dependencies involved, for example. The “RoomBoard” device might well need the MilliTimer class to function properly. But it is no longer part of the code you write. It doesn’t show up in the source file, there’s no #include line as there would be in C/C++ or in a sketch. Which means it also no longer matters at this level whether the RoomBoard driver uses a MilliTimer class or not.

Code generation in itself also doesn’t solve the issue of having lots of little snippets of code. But what you can do, is combine lots of them together in one source file, and then have the generator pick what it needs each time it is used:

Define RoomBoard {
    ...
}
Define EtherCard {
    ...
}

The technique of code generation has many implications. For one, you have to go through one more step before the real code can be compiled and uploaded – you first have to produce an acceptable sketch. And when you make a mistake, the system has to be able to point you to the error in the original specification file, not to some very obscure C/C++ statement in the generated source code.

And of course it’s a whole new mechanism on top of what already exists. One more piece of the puzzle which has to be implemented and maintained (by someone – not necessarily you). And documented – because even if the specification files can reduce a large amount of boilerplate code to a single line, that one line still needs to be clearly documented.

So far, these notes are just a thought-experiment. I’ll no doubt keep on muddling along with new sketches and plugs and nodes for some time to come.

But wouldn’t it be nice if one day, something like this were a reality?

Sleepy class

In AVR, Software on Sep 4, 2010 at 00:01

To make it simpler to experiment with the very low-power states of the ATmega on the JeeNode, I’ve moved some code to the Ports library.

It’s all wrapped up into a new “Sleepy” class:

Screen Shot 2010 09 03 at 14.07.54

See the powerdown_demo.pde and radioBlip.pde sketches for examples of use.
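
For reference, here’s a minimal sketch of typical usage – assuming the Sleepy API as described above, i.e. loseSomeTime() for a timed power-down, with the watchdog interrupt forwarded to the class:

    #include <Ports.h>

    // the Sleepy class uses the watchdog timer, so the watchdog
    // interrupt must be forwarded to it
    ISR(WDT_vect) { Sleepy::watchdogEvent(); }

    void setup () {}

    void loop () {
        // ... do a quick bit of work ...
        Sleepy::loseSomeTime(60000);    // power down for ± 60 seconds
    }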

This class is documented in the Café:

Screen Shot 2010 09 03 at 14.23.39

FWIW, I’m also evaluating the Redmine system as a way to bring the code repository, Café docs, issue tracker, and Talk forums all into one context:

Screen Shot 2010 09 03 at 14.24.50

That site is still experimental, so I’m not making it public yet. The one missing feature holding me back is that Redmine does not have a good spam prevention mechanism, such as Akismet. At least last time I looked. But all in all, this would be a great way to provide a place to describe projects, fill in the documentation, and track all code changes and bugs collaboratively. If you’d like to have a sneak preview, or want to have a place to describe your project, or perhaps would like to help with the fairly gigantic task of getting a good documentation section going, please email me.

I’ve started copying over some content, but it’s going to take a while before everything has been brought over and adjusted. Both the old and the new system use Markdown, but there are always them pesky little details…

Anyway, back to the topic at hand – enjoy the low-power options, and please consider sharing your explorations, findings, and inventions in the Talk discussion forums.

Sleep!

In AVR, Software on Sep 1, 2010 at 00:01

The “powerdown_demo.pde” sketch in this recent post draws 20 µA, which surprised me a bit…

A while back, I got it down to a fraction of that, by turning off the brown-out detector (BOD) through a fuse bit on the ATmega. That’s a little circuit which prevents the ATmega from coming out of reset and doing anything if the voltage is too low.

As it turns out, you can turn off the BOD in software, but only for sleep mode. The reasoning being that there’s no point in protecting the processor from going berserk while it’s not doing anything in the first place…

Well, my code to turn off the BOD was wrong. You have to do this right before going to sleep. Here’s the updated powerdown_demo.pde sketch:

Screen Shot 2010 08 28 at 12.50.18

(correction – I mixed up my bits: change “PINB |= …” to “PINB = …” or “PORTD ^= …”)
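
The essence of the trick can be sketched with the avr-libc macros (assuming your avr-libc is recent enough to have sleep_bod_disable() – the point being that the BOD bits must be changed just a few cycles before the SLEEP instruction):

    #include <avr/interrupt.h>
    #include <avr/sleep.h>

    static void powerDown () {
        set_sleep_mode(SLEEP_MODE_PWR_DOWN);
        cli();
        sleep_enable();
        sleep_bod_disable();    // only effective right before sleeping
        sei();
        sleep_cpu();            // the BOD stays off while asleep
        sleep_disable();
    }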

The result?

Dsc 1872

Now we’re cookin’ again!

Fractional bits?

In AVR, Hardware, Software on Aug 28, 2010 at 00:01

To continue this little weblog series on bits, I’m going to go into bit fractions.

Yeah, right… there is no such thing, of course – unless you’re into probabilities or fuzzy logic, perhaps.

What I actually want to do, is describe “analog output” on the ATmega, using the Arduino library’s analogWrite() function. And throw in some bit manipulations along the way, to stay somewhat on topic.

The little secret with analogWrite() is that it doesn’t do what its name suggests. The ATmega has no way of generating an analog signal, i.e. a voltage level between 0 and VCC.

Instead, a pulse is generated, with a varying duty cycle. I.e. the “on” and “off” times of the pulse will differ, with the ratio of these times proportional to the 0..255 value passed as argument to analogWrite(). With “0”, the signal will be 100% off, with “255” the signal will be 100% on. With “128” the pulse will be on the same amount of time as off. With “1”, it will be on very briefly, and then off most of the time, and so on…

The neat thing is that if you connect it to an incandescent lamp or a motor, the effect will be that these light up / turn at less than their full power, due to the time it takes for these devices to try and follow the pulse. So the effect is similar to a fractional adjustment: you can dim / slow down these devices by using analogWrite().

It even works with LEDs, although these turn on and off very fast. In this case, the reason is that our eyes can’t follow such fast changes, and so we perceive the result as dimmed as well. A whole industry was once created around this “persistence of vision” property of our eyes – it’s called TV…

Here’s a sketch which uses this pulsed output to control the brightness of a LED connected to DIO3 (i.e. D6):

Screen Shot 2010 08 27 at 16.44.38
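
(The sketch itself isn’t reproduced here – a minimal reconstruction would be something like:)

    byte level;

    void setup () {}

    void loop () {
        analogWrite(6, level++);    // wraps from 255 back to 0
        delay(10);                  // slow the ramp down a bit
    }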

Note that I didn’t have to define pin 6 as an output, analogWrite() does that.

What the above does, is ramp up gradually from 0 to 255, and then repeat:

Screen Shot 2010 08 27 at 16.49.06

Suppose we want it to fade in and out instead:

Screen Shot 2010 08 27 at 16.49.17

Try implementing this yourself.

Note that you’re going to need at least 9 bits of information to do this: 8 for the brightness level and 1 to keep track of whether you’re currently in the up ramp or in the down ramp:

Here’s one way to do it, using some bit trickery:

Screen Shot 2010 08 27 at 17.05.15
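
(Again a reconstruction of the gist, based on the notes below:)

    word level;

    void setup () {}

    void loop () {
        word value = level;
        if (bitRead(level, 8))  // bit 8 toggles every 256 counts
            value = ~value;     // flip the bits: 0 becomes 255, 1 becomes 254, etc
        analogWrite(6, value);  // analogWrite() ignores the upper bits
        ++level;
        delay(10);
    }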

A few notes:

  • I’ve changed the “level” variable from an 8-bit byte to a 16-bit word
  • bit 8 toggles from 0 to 1 and back every 256 level counts
  • it’ll be 1 when level is 256..511, 768..1023, etc
  • when it’s 1, we flip the bits, i.e. 0 becomes 255, 1 becomes 254, etc
  • the analogWrite() function ignores all upper bits

If you think that was an obscure call to analogWrite(), try this one:

    analogWrite(6, level ^ -((level >> 8) & 1));

Maybe you can decipher it when written slightly differently?

    analogWrite(6, level ^ -bitRead(level, 8));

(hint: bitRead() always returns either 0 or 1)

It’s all pretty geeky stuff, and let’s hope you’ll never have to deal with code such as this again, but the point of this story is that there’s no magic. You just have to know what each operator does, and how to translate an integer from decimal to binary notation and back.

I’ll summarize my intuitive interpretation of bit operators below:

  • “X | Y” = take X and copy all the 1’s of Y into it
  • “X & Y” = take X and copy all the 0’s of Y into it
  • “X ^ Y” = take X and flip all the bits where Y has 1’s
  • “~ X” = flip all the bits of X
  • “- X” = arithmetic minus (same as “(~X) + 1” !)
  • “! X” = 1 if X is zero, 0 otherwise
  • “X << N” = multiply X by 2, N times
  • “X >> N” = divide X by 2, N times

Some tricks based on this:

  • “~ 0” = all bits set to 1 (same as “-1” !)
  • “~ 0 << N” = all bits 1, but N lowest bits set to 0
  • “bit(N) - 1” = a constant with N lowest bits set to 1
  • “X & (bit(N) - 1)” = the N lowest bits of X, the rest is 0
  • “X & ~ (bit(N) - 1)” = X, but with the N lowest bits set to 0
  • “!! X” = 0 if X is zero, 1 otherwise

A useful rule when writing logical expressions is: when in doubt, parenthesize! – see C operator precedence.

Sooo… use bit(), bitRead(), bitWrite(), bitSet(), and bitClear() wherever you can, since it usually makes the code easier to read. But there’s no need to get lost if you see ^&|~!’s in your expression – just slow down and decode such expressions step by step!

Flippin’ bits

In AVR, Hardware, Software on Aug 27, 2010 at 00:01

After yesterday’s post about setting and clearing bits, let’s explore reversing bits, i.e. changing them from 0 to 1 and back. And let’s do it by blinking an LED attached to DIO of port 1 – i.e. Arduino digital pin 4:

Screen Shot 2010 08 26 at 10.31.26

The “if (onOff == 0)” etc. is the logic that toggles onOff between 0 and 1 on each pass through the loop:

    if (onOff == 0) onOff = 1; else onOff = 0;

But there are lots of ways to do the same thing, coded differently:

    if (onOff == 0) onOff = 1; else onOff = 0;
    if (onOff == 0) onOff = bit(0); else onOff = 0;
    if (onOff == 0) bitSet(onOff, 0); else bitClear(onOff, 0);
    onOff = onOff ? 0 : 1;
    onOff = (~ onOff) & 1;
    onOff = (onOff + 1) & 1;
    onOff = ! onOff;
    onOff = 1 - onOff;
    onOff = onOff ^ 1;
    onOff ^= 1;

See if you can figure out all of these.

Take your pick. Those last two use C’s XOR operator. I tend to prefer shorter source code, so I’d use that last notation (note that the resulting compiled code is not necessarily shorter than the other examples).

Now suppose you have a byte value “X” and you want to flip the 4th bit in it, while not changing anything else. That’s a bit more work. It could be done like this, for example:

    if (bitRead(X, 4) == 0) bitSet(X, 4); else bitClear(X, 4);

Or like either of these:

    X = X ^ bit(4);
    X ^= bit(4);

This shows clearly that the “^” XOR operator does exactly what we need: flip bits.

Back to blinking an actual LED, as done with the above sketch. Here’s a little mind bender – another sketch, doing the same using raw ports and the XOR operator:

Screen Shot 2010 08 26 at 10.58.10
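
(No screenshot here either – the gist, reconstructed:)

    void setup () {
        DDRD |= bit(4);     // make Arduino digital pin 4 (PD4) an output
    }

    void loop () {
        PORTD ^= bit(4);    // flip just that one output bit
        delay(500);
    }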

The first example was doing things “the Arduino way”, using pinMode() and digitalWrite(). It compiles to 890 bytes of code. This second example goes straight to the hardware and uses 554 bytes of code:

  • Arduino digital pin 4 is bit 4 on the “D port” of an ATmega
  • “DDRD” is the “Data Direction Register”, where we set up pin 4 as an output
  • “PORTD” is the output “Port Register”, which controls the actual output signal

You can see the XOR in action in that last example. It takes all the output bits of port D (Arduino pins 0 .. 7), and flips just a single bit, i.e. bit 4.

Just for kicks, I’ll show you one more way to blink the LED:

Screen Shot 2010 08 26 at 11.03.44
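
(Reconstructed gist:)

    void setup () {
        DDRD |= bit(4);     // pin 4 as output, as before
    }

    void loop () {
        PIND = bit(4);      // writing a 1 to PIND toggles that output pin
        delay(500);
    }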

This uses a relatively little-known feature of the hardware, which actually has “bit flipping” built-in. The “PIND” register is normally used for input, i.e. for reading the state of a pin as an input signal. But you can also write to that register. When you do, it will be used to flip output pins, but only for the bits which were set to 1. It’s essentially a built-in XOR.

That last example uses 550 bytes of code, most of which is overhead from the Arduino run-time library (setting up the milliseconds timer, etc). So what’s in a measly 4 bytes, right? Wrong. There is a minute, but sometimes important difference: the other approaches all had to read the register value first, flip the bit, and then write the value back. This last version only writes a (constant) value to a register. With interrupts, that can be very important: this last version can’t ever go wrong, it will always flip the requested bit. The other version could have an interrupt occur between the read and the write. It’s a known issue for the Arduino Mega. It can lead to code which runs for a week, and then fails mysteriously. Bugs like these are fiendishly hard to properly diagnose.

Bit-flipping can be quite useful for physical computing. Not only does it let you easily toggle specific bits, and change the state of some output pins, it can also be a way to clear a bit. Let’s say you need to generate a (very) quick pulse. Here are four ways to accomplish the same thing:

    bitSet(PORTD, 4); bitClear(PORTD, 4);
    PORTD |= bit(4); PORTD ^= bit(4);
    PORTD |= bit(4); PIND = bit(4);
    PIND = bit(4); PIND = bit(4);

That second one based on XOR works, because bit 4 is known to be one, so setting it to zero is always the same as flipping it. That’s also why the third PORTD/PIND example works, with PIND doing the XOR in hardware. Lastly, the fourth approach will only work if bit 4 was initially zero. It’s the fastest one, and does not suffer from the interrupt race condition mentioned above.

Ok, that’s enough flippin’ for one day!

Tomorrow, I’m going to go into, ehm… “fractional bits” (haha!) ;)

Update – see comment below on why “bitSet(PORTD, 4); bitClear(PORTD, 4);” are also interrupt-safe (mostly – but not on every pin of an Arduino Mega!).

Bit manipulation

In AVR, Software on Aug 26, 2010 at 00:01

Today I’d like to go into bit manipulation, ehm, a bit

You need bit manipulation when you’re dealing with the individual bits in a byte, such as on the I/O ports of an ATmega, for example.

First the easy approach – use these predefined macros from the Arduino library:

  • bit(N) – returns an integer with the N’th bit set to 1
  • bitRead(X,N) – returns the N-th bit of X as 0 or 1
  • bitWrite(X,N,B) – sets Nth bit of X to B (0 or 1)
  • bitSet(X,N) – sets the Nth bit of X to 1
  • bitClear(X,N) – sets the Nth bit of X to 0

This is why you might see code such as the following:

    bitSet(WDTCSR, WDIE);

This means: “set the Watchdog Interrupt Enable to 1 in the Watchdog Timer Control Register”. The WDTCSR and WDIE terms are predefined constants. WDIE is 6, for example.

Note that some of these routines can be written in terms of the others, i.e.

  • bitSet(X,N) is the same as bitWrite(X,N,1)
  • bitClear(X,N) is the same as bitWrite(X,N,0)

But what does it all mean?

Well, let’s dive in. First make sure that you are comfortable with “bit shifting”. The expression “bit(3)” is the same as “1 << 3”, which in turn is the same as doubling the value 1 three times, i.e. the value eight. So “bit(0)” is 1 doubled zero times (i.e. 1) and “bit(7)” is 1 doubled 7 times, i.e. 128. Bytes have 8 bits numbered 0 to 7, so all you need for (byte-sized) hardware registers is to remember that bits 0..7 map to (specific!) integers with values 1 to 128.

Setting a bit is like OR-ing the bit with the rest of the value. The following statements are all identical:

    WDTCSR = WDTCSR | bit(WDIE);
    WDTCSR = WDTCSR | (1 << WDIE);
    WDTCSR = WDTCSR | (1 << 6);
    WDTCSR = WDTCSR | 0b1000000;
    WDTCSR = WDTCSR | 0x40;
    WDTCSR = WDTCSR | 64;

This, in turn, can be abbreviated in C as:

    WDTCSR |= bit(WDIE);
    WDTCSR |= (1 << WDIE);
    WDTCSR |= (1 << 6);
    etc...

Or you could write:

    bitSet(WDTCSR, WDIE);

It’s all the same. So OR-ing is about setting bits (to 1).

Likewise, AND-ing is about not clearing bits (to 0). Whoa, that’s confusing. This expression returns a value which is what X was, but only for bit N:

    X & bit(N);

So this will change X to a value with all bits except bit N set to zero:

    X = X & bit(N);

To put it differently: X will lose its original bits, except bit N, which will be left alone. All the bits are set to zero, except bit N.

Usually, you want the opposite, setting only bit N to zero. That too is accomplished with AND-ing, but you have to flip all the 0’s to 1 and all the 1’s to 0 first. Hang in there, it’s a slightly longer story. This sets bit N to zero:

    X = X & ~ bit(N);

Let’s examine what’s going on here. First, “bit(N)” is a value with only the Nth bit set. Now, “~ bit(N)” is a value with all the bits flipped around (“~” is called the complement operator in C), so that’s a value with all but the Nth bit set. Everything is 1, but bit N is 0.

Now we can tackle the expression “X & ~ bit(N)”. Since AND-ing is about “not clearing bits”, that means that the result of this expression is all the bits of X unchanged where “~ bit(N)” was one, which is almost everywhere. The only bit that differs is bit N – it is zero in “~ bit(N)”, therefore that particular bit will “not not clear …” (a double negation!): it will be cleared (to 0) in the result.

Finally, we replace X by that result. So X will change in precisely one bit: bit N. That bit will be cleared to zero, the rest is not affected. In short: we’ve cleared bit N.

Confused?

Well, that’s why the bit/bitSet/etc macro definitions were introduced. These expressions are all identical:

    X = X & ~ bit(N);
    X = X & ~ (1 << N);
    if (X & bit(N)) X = X - bit(N);
    bitClear(X, N);

That last one is clearest by far, because it conveys the actual operation with a well-chosen name: clear bit N, leave the rest alone.

So why would anyone ever choose to use anything but the bit/bitRead/etc routines?

Many reasons. Habit, perhaps. Coming from another environment which doesn’t have these macros. Being so used to this bit-manipulation that the use of words doesn’t really look any clearer. Whatever…

But another more important reason is that you can’t do everything with these bit/bitSet/bitClear routines. Sometimes you just have to go to the raw code. Such as when you need to set multiple bits at once, or flip bits. That’s why the ATmega datasheet has examples like these:

    WDTCSR |= (1<<WDCE) | (1<<WDE);

By now, you should be able to decode such a statement. It’s the same as:

    WDTCSR |= bit(WDCE) | bit(WDE);

In other words:

    WDTCSR = WDTCSR | bit(WDCE) | bit(WDE);

Which in turn is almost the same as these two statements together:

    WDTCSR |= bit(WDCE);
    WDTCSR |= bit(WDE);

That in turn, can be written as these two lines:

    bitSet(WDTCSR, WDCE);
    bitSet(WDTCSR, WDE);

Much clearer, right? Except… it’s not 100% identical!

The problem is a hardware issue: timing. The above two statements will set both bits, but not at the same time! For hardware register settings, that difference can be important. There is a fraction of a microsecond between the WDCE bit being set and the WDE bit being set. Unfortunately, in some cases that causes real problems – your code won’t work as expected.
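
This watchdog setup is exactly such a case. The datasheet requires WDCE and WDE to be set in a single write, after which you get only four clock cycles to make the actual change – roughly like this (a sketch; check the datasheet of your particular ATmega for the details):

    static void setupWatchdog () {
        cli();
        // timed sequence: WDCE and WDE must be set in one single write ...
        WDTCSR |= bit(WDCE) | bit(WDE);
        // ... and the real change must follow within 4 clock cycles
        WDTCSR = bit(WDIE) | bit(WDP3);     // interrupt mode, ± 4 s timeout
        sei();
    }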

Tomorrow, I’ll continue on this topic, but it’ll be a bit more fun, because there will be LEDs involved!

Dsc 1385 2

(Please ignore the cable on the left, I snatched the above picture from this post)

Update – see this excellent Wikipedia article for more details about bitwise operations.

Simplified button interface

In AVR, Software on Aug 24, 2010 at 00:01

The Blink Plug has two pushbuttons and two LEDs. The buttons are simple miniature switches, but nothing is ever simple in microcontroller-land: reading out the state of a pushbutton reliably can be deceptively hard, due to mechanical bounce issues.

The Ports library has had a BlinkPlug class for some time now, including a “pushed()” function to do all the debouncing. Unfortunately, that function turned out to be a bit harder to use than I originally intended.

Time to add some more code to the BlinkPlug class!

I’ve added a new “buttonCheck()” member, which returns events instead of state. That makes it a lot easier to detect when people press a button, which is usually all you’re after anyway.

Here’s a new button_demo.pde example sketch, which illustrates the new functionality:

Screen Shot 2010 08 23 at 17.44.21

Sample output:

Screen Shot 2010 08 23 at 17.44.39

As you can see, it’s now a lot simpler to detect when people press or release one of the two buttons on a Blink Plug. Each time you call buttonCheck(), you’ll get one of the following events:

ALL_OFF, ON1, OFF1, ON2, OFF2, SOME_ON.

You have to keep calling “buttonCheck()” reasonably often, at least 10 times per second, if you don’t want to miss any events. Calling it all the time in the main loop is fine. Keep in mind that ON1, etc. will be returned only once for each actual button press.

You can still call “state()” whenever you want, to check the position of either button. But when you use buttonCheck(), you should not call the old – now deprecated – “pushed()” function, as these two will interfere with each other.
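
In sketch form, polling for these events looks roughly like this (a sketch based on the description above – the port number is an assumption, and the exact scoping of the event constants may differ, so check Ports.h):

    #include <Ports.h>

    BlinkPlug buttons (3);      // assuming the Blink Plug is on port 3

    void setup () {
        Serial.begin(57600);
    }

    void loop () {
        switch (buttons.buttonCheck()) {
            case BlinkPlug::ON1:  Serial.println("button 1 pressed");  break;
            case BlinkPlug::OFF1: Serial.println("button 1 released"); break;
            case BlinkPlug::ON2:  Serial.println("button 2 pressed");  break;
            case BlinkPlug::OFF2: Serial.println("button 2 released"); break;
        }
    }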

This code is now part of the Ports library (subversion and ZIP). Gory details are in Ports.cpp, near line 230.

Serial communication vs packets

In Hardware, Software on Jul 12, 2010 at 00:01

When you hook two devices up via wires, you’ve got essentially two options: parallel, i.e. one wire for each bit you want to transmit & receive (example: memory cards inside a PC). Or serial, where information gets sent across bit by bit over only a few wires (examples: ethernet, USB, I2C). Parallel can achieve very high speeds with little circuitry, but serial is more convenient and cheaper for large distances.

Serial communication is very common. The model even carries through to the way we think about the “command line” – a stream of characters typed in, followed by a stream of output characters. Not surprising, since terminals used to be connected via RS232 serial links.

Wireless connections are also essentially serial: you rapidly turn a transmitter on and off (OOK), or you change its frequency of operation (FSK), to get the individual bits across.

But there’s a lot more to it than that.

With two devices connected together, you get a peer-to-peer setup with a link which is dedicated for them. This means they can send whenever they please and things will work. The same can be done with wireless: as long as only two devices are involved, one device can send whenever it likes and the other will receive the signal just fine (within a certain range, evidently).

With such a peer-to-peer setup, the serial nature of the communication channel is obvious: A sends some characters, and B will receive them, in the same order and (almost) at the same time.

But what if you’ve got more than two devices? Ah, now it gets interesting…

With wires, you could do this:

Screen Shot 2010 07 11 at 11.20.41

It’s easy to set up, but it’s pretty expensive: lots of wires all over the place (N x (N-1) / 2 for N devices) plus lots of interfaces on each device (N-1). With 10 devices, that would be 45 wires and 90 interfaces!

Worse still, this is very hard to use with wireless, where each “wire” would need to be a dedicated frequency band.

The solution is to share a single wire – called multi-drop:

Screen Shot 2010 07 11 at 11.24.58

Now there’s one wire, a couple of “taps”, and one interface per device. Much cheaper!

Trouble is, you’ve now created a “channel” which is no longer dedicated to each device (or “node” as it is usually called in such a context). They can’t just talk whenever they like anymore!

A whole new slew of issues now: How do you find out when the channel is available? What do you do when you can’t send something right away – save it up? How long? How much can you save up? What if someone else hijacked the channel and never stops transmitting? What if all nodes want to send more than the channel can handle? How do you get your information out to a specific node? Can all nodes listen to everything?

Welcome to the world of networking.

All of a sudden, simple one-on-one exchanges become quite complex. You’ll need more software to play nice on the channel. All nodes need the same software revision. And you’ve got to deal with being told “not now”.

Note that these issues apply to wired solutions sharing the same channel (RS485, Canbus, USB, Ethernet) as well as to all wireless networks.

Simple OOK transmitters used in weather station sensors just ignore the issue. They send whenever they want to, in an après moi le déluge fashion… (“what the heck, I don’t care whether my message arrives”). This usually works fairly well when transmissions are short, and when lost transmissions are no big deal – they’ll send out a new reading a few minutes later anyway.

Another aspect of this shotgun approach is that it’s a broadcast mechanism. The sending node transmits its messages into the air without interest as to who receives them, or whether there’s anyone listening even. All it needs to do is include a unique code, so that the receiver(s) will be able to tell who sent the message.

For weather sensors, the above is ok. For security / alarm purposes, it’s a bit unfortunate – missing an intrusion alert is not so great. So what the simplest systems do is to yell a bit louder: repeat the alert message many times, in the hope that at least one will arrive intact. No guarantees, yet some very common security systems seem to be happy with that level of reliability.

For more robust setups, you really need bi-directional communication, even if the payload only flows in one direction. Then each receiver can let the transmitter know when it got a packet intact.

There’s a lot more (software) complexity involved to use a channel effectively, to get data across reliably with “ACK” packets, to detect new and lost nodes, to deal with “congestion” and external causes of bad reception, etc.

With JeeNodes and wireless comms via the RFM12B module, the basic RF12 driver is somewhere in the middle between unchecked uni-directional transmission and fully checked self-adapting configurations.

So what does this all mean for the “end user” ?

Well, first of all: wireless communication can fail. A node can be out of range, or a badly-behaved machine can be sending out RF interference to such an extent that nothing gets across no matter what nodes do. Wireless communication can fail, it’s as simple as that! But with bi-directional communication, at least all nodes can find out whether things work or not.

The second key property of communication via a shared channel, is that you can’t just send whenever you like. You have to be able to either save things up until later, or discard messages to let future ones through.

This means that treating a wireless channel as a serial link is really a very bad idea. Keep in mind that the baudrate can drop to zero – which means you must be prepared to save up an unbounded amount of data for re-transmission. And the more you intend to re-transmit later, the longer you’re going to need that channel once it becomes available. That will frustrate all the other nodes trying to do the same thing.

One way around this is to use an RF link with very high data rates. That way there will be a lot of slack when nodes want to catch up. But then you still need to be able to buffer all that data in the first place. Not a great idea for limited devices such as an ATmega…

The better way is to design the system to work well with occasional loss of packets. Take an energy meter, for example: don’t send each pulse or rotation trigger, but keep a count and send the current count value. That way, lost packets will not affect the accuracy of the results – they will merely be updated less frequently while the RF link is down.
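
A sketch of that counter idea, with hypothetical names, using the RF12 driver’s broadcast convention:

    static word pulseCount;     // running total – a lost packet only delays an update

    static void onPulse () {    // e.g. called from an interrupt or a polling loop
        ++pulseCount;
    }

    static void report () {
        if (rf12_canSend())
            rf12_sendStart(0, &pulseCount, sizeof pulseCount);  // broadcast, no ACK
    }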

The RF12 driver used in JeeNodes was designed for the context of a little data, sent on a periodic basis. The difference with a serial link, is that you don’t get garbled text on the other side, but packets (i.e. chunks of data). All you need to keep in mind is that occasionally an entire packet won’t make it.

This design also deals with multiple nodes. Each incoming packet can have a “node ID” so receivers can tell everything apart. Packets never get mixed up or combined or split in any way. Each packet is a verified and consistent amount of data.

Couldn’t we implement a virtual serial link?

Well, yes – there are well-known techniques to implement a virtual circuit on top of a packet-based communication channel.

But doing so would be a bad idea, for reasons which have hopefully become clear from the above. A virtual circuit would either have to act as perfect channel (not feasible with finite data storage) or drop characters in unpredictable places. It is far more practical to impose a packet / chunk structure on the sender, and then be allowed to drop chunks with clearly-defined boundaries when the RF link is out of service or overloaded.

The moral of the story: think in packets when using JeeNode wireless comms – you’ll get a lot more done!

Update – see some good comments by John M below, about IP, UDP, TCP, and the OSI model which describes all the levels of abstraction involved with networking, and all the standard terminology.

RF12 communication

In Hardware on Jul 11, 2010 at 00:01

(This weblog post seems to accidentally have escaped into the wild a few days early…)

The RFM12(B) wireless radio module from HopeRF, as used on the JeeNode, uses “Frequency Shift Keying” (FSK) to get information across a wireless channel on the 433, 868, or 915 MHz band.

With wireless, there’s always a trade-off between speed and range. More speed, i.e. a higher data rate, lets you either get more data across in the same time or the same amount of data in less time, which might reduce battery consumption. But higher data rates require a larger frequency shift in the transmitter for the receiver to still be able to detect all the bit transitions reliably. A larger frequency shift wastes more power though (I think…). And a larger frequency shift means that the receiver has to accept more bandwidth to catch all the signal details.

Btw, another way to extend the range is to improve the antennas – see this discussion.

I’ll leave the narrow-band vs. wide-band details to the EE’s and amateur radio experts in this world, along with all the RF / HF calculations, because frankly I’m at the limit of my knowledge on these topics.

But what the above comes down to is that we’ve got essentially three parameters to fool around with when using RFM12B’s for wireless networking:

  • the data rate, which needs to be identical on both sides
  • the frequency shift on the transmitter side
  • the bandwidth on the receiver to filter out unwanted signals

Here are the relevant sections from the HopeRF RF12B datasheet:

Data rate

Screen Shot 2010 07 08 at 00.58.29

Frequency shift

Screen Shot 2010 07 08 at 01.00.43

Bandwidth

Screen Shot 2010 07 08 at 01.01.15

Screen Shot 2010 07 08 at 01.01.37

The challenge is to find “good” settings, which really depends on what you’re after. The settings used in the RF12 v3 driver are as follows:

  • Data rate = 49.142 kbaud (see this discussion)
  • Frequency shift is set to 90 KHz
  • Bandwidth is set to 134 KHz

This was chosen partly on what I found around the web, and partly by pushing the data rate up while verifying that the range in the home would be sufficient for my own purposes (i.e. to reach the office from the electricity meter: a few concrete walls and floors).

It can probably be improved, but since such changes affect all the nodes in a net group, I never bothered after the initial tests were “good enough”.

The RF12 library now includes a new rf12_control() function, which allows making changes to these parameters. It’s a low-level option, but you could easily add some wrappers and an API to adjust parameters in a more intuitive way.
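
For example, to drop the data rate for a bit more range – a sketch, with the register value derived from the datasheet formula BR = 10000 / 29 / (R + 1) kbps, and keeping in mind that every node in the net group must be changed the same way:

    rf12_initialize(1, RF12_868MHZ, 5); // node 1, 868 MHz, net group 5
    rf12_control(0xC623);               // bit rate register, R = 0x23: ± 9.6 kbaud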

As mentioned in the forum, there’s a (Windows-only) tool to do the conversion to hex parameter settings. That ought to make it fairly easy to tweak these things, if you want to push the envelope.

Some people will no doubt be quite interested in such optimizations, so if you’ve found an interesting new combination of parameters, please consider sharing your findings in this forum discussion.

Having said all this, please keep in mind that these settings will still lead to fairly low data rates. The default data rate corresponds to ≈ 6 Kbytes/second of one-way data, assuming perfect conditions and 100% utilization (“hogging”) of the frequency band. With the official ISM rules imposed on the 868 MHz frequency band in Europe, each node is allowed to use only 1% of that rate – i.e. about 60 bytes per second of throughput! (there are no such restrictions @ 433 and 915 MHz – but 915 is not allowed in Europe).

There are alternate bands in the 860’ish MHz range, but I’ve never quite figured out what works where, so for now I’m sticking to this simple 1% rule. For day-to-day sensing and signaling purposes around the house, it’s actually plenty.

To put things into perspective: the IEEE 802.15.4 standard used by XBee’s has up to 16 channels of 250 Kbaud each at its disposal, when operated at 2.4 GHz. It’s a whole different ball game. And a different range: 2.4 GHz gets absorbed far more by walls than the sub-GHz bands (which is why mesh networking becomes a lot more important, which requires more resources, making it harder to keep overall battery consumption low, etc).

So you see: speed, range, complexity, cost – lots of tradeoffs, as with everything in this world!

Update – just got an email with a lot more info about the 868 MHz regulations (for the Netherlands, but I assume it’ll be the same across Europe). Looks like cordless headphones get 40 channels to pick from with 100% utilization (in the 864..868 MHz range), so you could pick one of those channels to avoid the 1% rule.

Assembling the JeeSMD, part 2

In AVR, Hardware on Jul 9, 2010 at 00:01

Yesterday’s post was about assembling all the SMD components of the JeeSMD kit.

The last step is to program a sketch into the ATmega. This isn’t as straightforward as with a JeeNode, because there’s no on-board FTDI or USB serial port hookup.

It’s fairly easy to create an FTDI connection, but even if you do, you’ll still need an ISP programmer to install a boot loader (see this recent post for some background).

So let’s hook up an ISP programmer first:

Dsc 1787

I’m using a somewhat weird setup: first of all, my cable connector was attached the wrong way around, so I always have to use this one in that weird folded-over position.

But a more important issue is that the ISP connection needs to use pins 1..6 of the 2×4-pin SPI/ISP connector on the JeeSMD. That doesn’t work with normal flat cable connectors, which assume 2×3 pins and are too wide to fit in a 2×4-pin header. My solution is to insert wire-wrap pins the wrong way around into the cable header. This effectively extends the connector, but now it won’t be as wide and it’ll fit just fine. Another solution would be to only solder 2×3 pins in the SPI/ISP position – you can always add two more pins later.

Once you’ve passed that hurdle, you can use any ISP programmer you like. There have been several posts about this on the weblog, as listed here.

Now, if you want to use FTDI, then presumably you just uploaded a bootloader into the ATmega, with all the proper fuse settings, etc. The next step then, is to somehow connect to a 6-pin FTDI header.

There are several ways to do this. The one I use nowadays, is through a Carrier Board, which includes the 6-pin FTDI connector:

Dsc 1786

The point about the FTDI connector is that it’s almost trivial. All you need is 4 wires to GND, PWR, TX, and RX – plus a way to reset the board from the RTS signal. The clever way to generate a reset is to insert a 0.1 µF capacitor between the serial side RTS and the ATmega’s reset pin. Tiny trick, huge implications (does the name “Arduino” ring a bell?).

So how does the Carrier Board implement FTDI? Easy: it adds the capacitor. And you can easily do that yourself without a Carrier Board. Here’s how:

Screen Shot 2010 07 08 at 23.20.52

Note that what FTDI calls “RX” is connected to what the ATmega calls “TXD”, and vice versa. It’s all a matter of perspective… Once you have the FTDI connection set up, you can upload sketches using the Arduino IDE just as with any other board. All you need is a USB-BUB or some other equivalent USB-to-FTDI interface.

Congratulations: that’s all it takes to build and use the Arduino-compatible JeeSMD!

Assembling the JeeSMD

In AVR, Hardware on Jul 8, 2010 at 00:01

The JeeSMD is a kit with tiny “Surface Mounted Device” (SMD) components. SMD was designed for automated assembly with Pick & Place machines, but with a bit of patience it’s fairly easy to assemble a board by hand – see this post for an overview of what you will need.

You won’t be able to do this without at least a fine-tipped (0.4..0.6mm) soldering iron plus the following tools:

Dsc 1773

A magnifier lamp also helps, I know I couldn’t do this without one anymore!

This is a step-by-step guide on how to assemble the SMD kit, which consists of these parts:

Dsc 1772

The tiny ones (don’t sneeze!) are hardest to tell apart:

Jee smd Closeup

(thanks to Steve E for the macro shot – I just added some labels)

There are two 10 kΩ resistors in there, although you only need one. That lets you get started without having to worry too much – if you mess up completely, just remove it and start over with the other one.

For a fantastic resource with detailed videos about hand-soldering SMD, see this Curious Inventor page.

So let’s get started. The first thing to do is to apply flux wherever you’re going to solder. The flux is essential, because the flux inside your solder wire will have evaporated long before you can move your tip from the wire to the part.

I’m right-handed, so that’s how I hold my soldering iron. That leaves only my left hand for the tweezers – and no hand at all for the soldering wire:

Dsc 1774

Use your tweezers for all these parts, and don’t let go – once a 0603 part flies off or drops on the floor, your chances of finding it again are next to zero. Best is to work on a clean flat surface with everything around you removed.

The trick is to place the part and then push it down while you touch it with your iron with solder already on it. The moment a part is soldered down on at least one pin, it becomes a lot easier:

Dsc 1775

The matte “stain” you see around these pads is the flux, which has dried up but is still active.

(Remainder continued after the break…)


Uploading without avrdude

In AVR, Software on Jul 1, 2010 at 00:01

In the future, I’d like to upload new firmware to JeeNodes, Arduino’s, and other AVR boards through channels other than a serial port or USB. Uploading to a “slave plug” via I2C would be neat, for example.

That means the standard avrdude won’t do. Besides, after having coded various types of ISP sketches recently, I realized that the upload mechanism is really quite simple. If all you need is STK500 compatibility (as used by several ISP programmers and by the Arduino boot loader itself), then avrdude is overkill.

So here’s a demo “rig” for JeeMon which does the same thing as avrdude, i.e. upload a sketch over a serial port:

Screen Shot 2010 06 27 at 23.19.45

That removes the need to compile and install avrdude. Better still, this should work as is on every platform supported by JeeMon.

(Note: the above code is now part of JeeMon, but the source code can also be found here on the web)

Onwards!

Update – the above code has been integrated into JeeMon as new Upload rig – with dudeLoader renamed to “stk500” and readIntelHex now called “readHexFile”. Here’s a new demo “application.tcl” using this:

Screen Shot 2010 06 28 at 01.46.48

Now works with the Arduino boot loader as well as with the Flash Board ISP programmer (add “19200” arg).

(Reminder: the Jee Labs shop will be closed from July 14th through August 14th)

RFM12B as spectrum analyzer

In Software on Jun 27, 2010 at 00:01

Intrigued by a very interesting post by “loomi” on the discussion forum, I wanted to find out more…

What he did was use the RFM12B as a crude spectrum analyzer: sweep the frequency across its entire range, and use the RSSI threshold and status bit to find out whether there is any signal on that frequency.

First of all, I added an rf12_control() function to the RF12 library, to allow access to the low-level registers of the RFM12B. This allows changing the frequency and RSSI settings, and reading out the RSSI status bit.

On the software side, a sketch is needed for the JeeNode which does the sweeping and measuring:

Screen Shot 2010 06 21 at 00.26.21
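
The sketch isn’t reproduced here, but the core idea can be sketched as follows – my rough reconstruction, not loomi’s actual code, with register values taken from the RFM12B datasheet (0xA000 sets the frequency, 0x94A0 the RSSI threshold, bit 8 of the status word is the RSSI flag):

    #include <RF12.h>

    void setup () {
        Serial.begin(57600);
        rf12_initialize(1, RF12_868MHZ);
        rf12_control(0x82DD);   // make sure the receiver is on
    }

    void loop () {
        // one sweep = 476 frequency steps across the band
        for (word f = 96; f <= 3896; f += 8) {
            rf12_control(0xA000 + f);           // set the frequency
            byte strength = 0;
            for (byte level = 0; level < 6; ++level) {
                rf12_control(0x94A0 | level);   // RSSI threshold: -103 + 6 * level dBm
                delayMicroseconds(1000);        // let the RSSI detector settle
                if ((rf12_control(0x0000) & 0x0100) == 0)
                    break;                      // signal is below this threshold
                strength = level + 1;
            }
            Serial.print(strength);
        }
        Serial.println();
    }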

This reports one line of 476 digits 0..6 for each sweep. Each digit in this line of text represents the measured signal strength at that particular frequency.

The RSSI readout is very crude. All you can do is set a threshold and read out a single bit in the status register, telling you whether the signal is above or below the threshold. Tedious, but doable.

The fun part starts once you get to plotting this as a graph. I used JeeMon and wrote a Tcl/Tk script for it:

Screen Shot 2010 06 21 at 00.30.15

And here’s some sample output:

Screen Shot 2010 06 21 at 00.21.30

The green line is the default frequency setting of the RF12 driver.

Each sweep will add a set of black dots to the graph, but since the values are being accumulated in a “counts” array, the dots will creep higher and higher, drawing a bar as more sweeps come in. Using the current delays, one sweep takes a few seconds, so to get the above graph I had to keep JeeMon running for a few minutes.

When left running for some 15 minutes, the large bars will move out of range, but the smaller accumulated counts will now become clearer:

Screen Shot 2010 06 21 at 00.30.53

I haven’t figured out whether these values are valid, nor what could be causing all this RF activity. The peaks are fairly sharp though, so it would seem like a reasonable set of measurements.

Remote RGB strip control

In Software on Jun 15, 2010 at 00:01

To continue this series on driving RGB strips with a JeeNode, here is a sketch which allows setting the brightness levels using PWM. It’s a bit long for a weblog post, but I thought it’d be useful, since there is quite a bit of trickery in here. Notes follow below:

Screen Shot 2010 06 11 at 23.17.29

This code does some hefty bit manipulation. The idea is to keep an array of 256 time “slots”, with bits to indicate when an I/O pin should be toggled on or off. The loop then continuously scans these slots at the rate of ≈ 32 microseconds per slot (this corresponds to roughly 120 Hz). The 8 bits in each slot map to the JeeNode’s I/O pins: bits 0..3 = AIO1 .. AIO4 and 4..7 = DIO1 .. DIO4.

A refinement is that only the first 255 of the 256 slots are scanned. This way, 0 is 100% off and 255 is 100% on.

The static “masks” array defines which setting gets mapped to which I/O pin. It depends on the way the output driver is connected to the JeeNode.

The main PWM timing loop is done fully in software. It will run slightly irregularly due to timer interrupts and RF12 driver interrupts, but the effects aren’t noticeable.
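
The essence of that loop might look like this – a simplified illustration of the technique, not the original code (it uses the PINx toggle trick described in the earlier “Flippin’ bits” post):

    static byte slots[256];     // one byte per time slot, one toggle bit per I/O pin

    static void pwmScan () {
        // scan only the first 255 slots, so 0 is 100% off and 255 is 100% on
        for (word i = 0; i < 255; ++i) {
            byte bits = slots[i];
            PINC = bits & 0x0F;     // toggle AIO1..AIO4 (bits 0..3 = PC0..PC3)
            PIND = bits & 0xF0;     // toggle DIO1..DIO4 (bits 4..7 = PD4..PD7)
            delayMicroseconds(24);  // tuned so each slot comes out at ≈ 32 µs
        }
    }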

The RGBW values are stored in EEPROM, so that the LEDs come back on with the same settings after power cycling. The settings can be adjusted by sending a packet with the new values to node 30, group 5 @ 868 MHz.

This sketch could be extended to support “animations”, i.e. ramping up/down to specific levels, mood lights, etc – I’m not interested in that, I just want to be able to trim the color of my room lighting to a pleasant level of white:

Dsc 1723

All I did to adjust the strip on the right was send the following “RGBW” command out via a JeeLink:

255,140,40,120,30s

The above sketch does PWM, but this whole thing is still being turned on and off with an old-fashioned mechanical switch … if you still remember those :)

FWIW, I’m considering making a dual-channel “MOSFET Plug” – two of those could then be used to replicate this same setup for the LED strip on the left.

Tomorrow: a little GUI front end for all this.

A Happy Ending!

In AVR, Hardware, Software on Jun 10, 2010 at 00:01

The multi-ISP programmer I built and started using some two weeks ago, turned out to be quite a nightmare. Not only were incorrectly programmed ATmegas sent out to about two dozen people, I also had to go through about 70 kits, prepared as new stock just days after I started using this programmer. Twice!

Yes, twice. Because the first “fix” turned out to be insufficient. Doh.

This was a clear case of one bug hiding another, and another, and another. Yep, four bugs: a bug in the MemoryStream class in the Ports library, a timing problem exposed by fixing that bug, and two incorrect assumptions about how the “avrdude” utility works. I’ve now got an explanation for everything that went wrong.

There’s no doubt some interesting psychology at work here … I was so proud of my idea of programming multiple ATmega’s! The main idea was to create an AVRISP-compatible unit which stores everything sent through it and then just replays the saved stream as often as needed. Trouble is, I jumped to conclusions the minute a first “run” worked. Roll the presses! Print it! Tell it to the world!

Anyway. There is a happy ending to all this!

The latest version of the isp_capture.pde sketch in the Ports library has been working properly for over a week now, programming well over a hundred ATmega’s (and it now does auto shut-down a few minutes after use):

Isp Capture Output

The last bug was a very puzzling one: everything worked, but sometimes the fuses wouldn’t get programmed. It turns out that avrdude first reads the fuses, and only sends out commands to program them if the fuses don’t match the new value. Since the programmer needs to work with brand-new as well as previously-programmed chips, the replay mechanism would have depended on the prior state of the chips: not good.

The solution is very simple: I now always program each fuse twice, with two different (valid) values. The second one will remain in force, evidently. Since the replay code was already ignoring fuse mismatch checks, this now means that even if the first setting is skipped by avrdude, the second will always be emitted.

Here is the shell script to prepare the Flash Boards:

Screen Shot 2010 05 30 at 01.35.40

So this has now become a very useful tool at Jee Labs:

Dsc 1432

I love the on-board LiPo battery: I can grab it, use it where I need it, and put it back – no wires = freedom!

For pre-loading the fuses, boot loader, and RF12demo, it already saved me a huge amount of time. Its “burn rate” is up to 500 chips/hour. And Mr. Murphy taught me some valuable lessons along the way…

And now it’s time to move on!

SMD hand-soldering tools

In Hardware on Jun 8, 2010 at 00:01

I’ve been hand-soldering SMD for some time now. The reflow approach is better suited for batches and recurring work, but sometimes I don’t have solder paste stencils, and sometimes I just need to do one-off builds.

There are tons of instructions on the web. Search for “curious inventor smd”, for example, for some great videos.

First thing to note is that it’s not as hard as it seems. Soldering SMDs by hand is still just that: soldering. The only thing you have to take care of, IMO, is alignment. Once a component is soldered on in two or more places, it’s virtually impossible to adjust it. And pushing too hard, or heating far too long, just gives you lifted solder pads and broken traces, because SMD pads are not as robust as a plated-through hole. Once that little piece of copper comes off the board, you’re hosed…

Ok, here are the tools I use:

  • a soldering iron
  • tweezers
  • a flux pen
  • solder wick
  • solder
  • magnifying lamp

That’s it. I hardly ever use my hot-air soldering station. I don’t have a de-soldering unit (a soldering iron with a hole in it and air suction, basically). For lots of SMD-by-hand work, you don’t need ’em. Although for serious “rework”, you probably do.

Here’s the “Ersa i-CON nano” soldering station I’ve been using for a month or so now:

Dsc 1515

The iron is excellent. Small, and due to its construction it also stays incredibly cool (yeah, in both senses):

Dsc 1516

The tip is held on with a spring. It’s clever, because that knob shields even more of the heat, and the tip sits around the heat core, absorbing every bit of heat instead of radiating it out:

Dsc 1517

The warm-up speed of this thing is incredible, it’s ready for use in under 10 seconds. The unit will fall back to 240°C when not used for a few minutes, and drops all the way back to about 45°C when left unused yet longer. It’s configurable, but in an odd way: a Windows program can write a file to a µSD card, which you then insert into the base. It’ll pick up the settings, and use them from then on. I’ve configured the presets as 280°, 320°, and 360°, respectively. For manual work (with leaded solder) I usually set it to 320°. For work on reflowed boards (using unleaded solder), I use the 360° setting.

Should have bought this ages ago. I use the smallest tip available, which is a 0.4 mm round tip. It’s slightly too pointed IMO, so the solder tends to move away from the tip due to adhesive forces. But it works well, and the heat really gets all the way to the tip. On the handle, I hardly feel any heat. I check the display to see if it’s on.

The holder is also clever. It’s some sort of shaped rubber, I think. Heavy, so it won’t slide away, nor get damaged from touching it with the iron, and the way it is shaped means that at rest the handle will cool down all the way back to room temperature.

It’s not cheap (Conrad #588374), but to me it’s worth every penny. Having said that, there are no doubt other units at a fraction of the price which should also work really well. What I’d look for: pencil-like, regulated, and no more than 40 to 60 Watt. I fell for the instant startup, the configurability, and the auto switch-off.

Here’s my previous 15 Watt Weller soldering iron, a few decades old by now (tip replaced only once, I think):

Dsc 1518

Also really nice, but slightly underpowered (ground pins and ground planes are a bit tricky, as are thick copper wires). Another difference is the length of the tip – I really like the Ersa’s short pencil-like dimensions.

My daughter still prefers the Weller btw, so we’re both happy now :)

Here are the other tools I use:

Dsc 1519

Not much to say about this, other than that you really should get tweezers which were designed for this work. They become an extension of your hand. They let you pick up grains of sand, even blind-folded. Well, almost. They may not look like much, but use them gently or you’ll bend them and bring the two opposite sides out of alignment – rendering them useless. Can’t remember where I got these – DigiKey and Farnell have ’em too.

The flux pen is essential for SMD work: apply flux, place component, then solder. When fixing things, I often apply flux again. The enemies are corroded solder and corroded pads – flux takes care of both. This is Farnell #876732.

The wick is equally essential. It took me a while to learn an important lesson: shorted pins are no big deal. Just wick the solder away and everything will be fine again. Can’t get the solder off? Just add more solder first, and then use the solder wick. Snip off each piece after use, as solder wick can only be used once. The type I use is not great – it leaves an ugly brown flux residue.

And last but not least, get a magnifying glass with built in circular lighting:

Dsc 1521

Actually, my daughter prefers broad daylight and nothing between her eyes and her work. To which I can only say: 1) yeah, but she’s 21! and 2) I’m a night owl … starlight + moonlight don’t seem to cut it for me! :)

I think I got that lamp from Conrad, but I can’t find it anymore. It was under € 15, ten times cheaper than another lamp I got with a long arm. The short arm is inconvenient, but it’s more solid. The big benefit of this 3x magnifier is that it has a built in 10x close-up section. Great for very close inspection. The one drawback of this small light is that you can’t look through with both eyes, so you don’t get stereoscopic vision with better depth clues. The cover is useless, btw. I haven’t needed a microscope yet, although there are limits to what you can see with this thing.

Anyway, that’s what I’m using. The rest is patience. And practice.

Why use acknowledgements

In Software on Jun 6, 2010 at 00:01

Yesterday’s post showed how to receive packets sent out by the “bmp085demo.pde” sketch, and report them back to the serial port. By reporting in the same format as the sender, this makes it possible to easily switch between a local direct-connect setup and a two-node wireless setup.

But there’s a bit of unnecessary complexity in there.

Note that I’m using the Easy Transmission mechanism, which is indeed easy to use – only four lines of C code is all it takes to turn a local node into a wireless node.

If you try out this setup, you’ll see that the receiver doesn’t always report packets, i.e. not every second. That’s not because it’s missing packets, but because the Easy Transmission mechanism includes logic to avoid sending anything if the data hasn’t changed.

There is a way to force sending packets with rf12_easySend.

But there’s another issue with this setup. The Easy Transmission mechanism uses acknowledgements and timers to make sure that all data arrives at its destination. If no ACK is received within one second, the data will be resent.

Trouble is, this is a bit silly: we’re sending out fresh readings every second anyway, in this setup! Why use timeouts and a retransmission mechanism, just to resend potentially outdated information in the first place?

This setup doesn't need a retransmission mechanism; it adds no value here.

In fact, we don’t even need ACKs: what’s the point of an ACK if the proper arrival at the receiver end doesn’t really matter to the sender? In this setup, the sender is going to send updated readings once a second anyway.

This is a clear case of overkill, so let’s simplify. Here is the improved sender:

Screen Shot 2010 05 26 at 22.11.57

It's not shorter than before, because the rf12_easy* functions really are quite easy and compact. But the code size is reduced a fair bit:

Screen Shot 2010 05 26 at 21.33.00

vs.

Screen Shot 2010 05 26 at 22.13.12

(Much of the remaining code size is due to the code needed for the BMP085 pressure calculations)

And here’s the receiver, which no longer sends ACKs:

Screen Shot 2010 05 26 at 22.08.15

The difference in size is due to no longer loading any transmit code:

Screen Shot 2010 05 26 at 21.38.27

vs.

Screen Shot 2010 05 26 at 22.07.28

More importantly perhaps, the receiver now reports readings once a second, including unchanged ones:

Screen Shot 2010 05 26 at 22.22.32

… and of course, without the ACKs, there will be about half as many packets flying through the air.

The moral of this story is: don’t include functionality if it doesn’t serve a need!

Switching from direct serial to wireless

In Software on Jun 5, 2010 at 00:01

The recent weblog post about the BMP085 sensor used on the Pressure Plug sends its readings out to the serial port once a second. It also included a few extra lines of code to send the results wirelessly via the RF12 driver. I added that code to illustrate how easy it is to go from a wired hookup to a wireless hookup with JeeNodes:

Screen Shot 2010 05 26 at 21.36.01

Sending it is only half the story – now I need to pluck it out of the air again. I could have used the RF12demo sketch with – say – a JeeLink to pick up these transmissions, but then I'd get something like this back:

Screen Shot 2010 05 26 at 21.33.22

I.e. 6 bytes of raw data. One way to deal with this is to write some code on the other end of the serial port, i.e. on the receiving workstation, to decode the reported temperature and pressure values. That’s what I’ve been doing with JeeMon on several occasions. Then again, not everyone wants to use JeeMon, probably.

Another way is to simply create a special-purpose sketch to report the proper values. Such as this one:

Screen Shot 2010 06 24 at 18.42.53

Sample output:

Screen Shot 2010 05 26 at 21.27.22

I used the same format for the output as the "bmp085demo.pde" sketch, but since the two raw data values are not included in the packets, I just report 0's for them. As you can see, the result is more or less the same as having the Pressure Plug attached directly.

The ACK logic in this sketch looks a bit complicated due to the bit masking that’s going on. Basically, the goal is to only send an ACK back if the sending node requests one. And since we assume that the sending node used a broadcast, we can then extract the sending node ID from the received packet, and send an ACK just to that node.

Tomorrow, I’ll describe how all this can be streamlined and simplified.

Good news and bad news

In Hardware on May 26, 2010 at 00:01

You’re supposed to tell the bad news first, so…

The Carrier Board described a few days ago works nicely, along with the box, Carrier Card, and EtherCard.

BUT… I completely goofed with the optional DC power jack connector :(

With the current board, the center pin is connected to ground. Whoops! I’ve hacked it for now by rewiring stuff a bit, and leaving 2 of the 4 solder pads on the power jack unsoldered:

Dsc 1475

If you look very closely, you’ll see some black electrical isolation tape between the board and this side of the power jack. The other side of the jack looks like this:

Dsc 1476

Warning – the wires are not attached correctly in this picture. The PWR line on the top side needs to be hooked up to the rightmost pin on the DC power jack.

It all looks worse than it actually is, because the whole thing gets mounted into a box and is also held into place by the cutout in the outer wall. So the power jack can’t really move around, despite the fact that it’s only held by two solder joints on a single side.

Oh well – s…tuff happens.

And the good news is…

There are actually two much simpler and stronger workarounds for this problem, because only a single copper pad is causing the problem. The copper pad in the very corner of the board is the only one that needs to be fixed!

The first workaround is to cut the two thin traces connecting this pad with the surrounding ground plane:

Dsc 1483

Since the traces are right next to the edge of the board and very thin, all you need is a Stanley knife to cut those two traces. A V-type cut is the way to do it:

Dsc 1482

Use a multimeter or continuity tester to verify that the pad is indeed no longer connected to ground.

Voilà! Now that the corner pad has been isolated, the power jack can be soldered on in the normal way, and wired into the rest of the circuit as needed.

The second workaround is to cut the pin off from the jack itself, so it can’t touch the exposed corner pad:

Dsc 1481

Better safe than sorry, so it’s probably best to also use insulating tape, as was done above.

IOW, the incorrect power jack connection is a major glitch, but there are several effective ways to work around it. Given that not everyone will even want that DC power jack option, I’m going to stick with these PCBs.

Communication 101 – Networks

In Hardware, Software on May 4, 2010 at 00:01

After yesterday’s introduction, let’s tackle the more advanced version of communication: networking.

For example a set of machines connected via Ethernet:

Screen Shot 2010 04 26 at 13.48.00

(often, the hub will also be connected to a router and/or internet)

While similar to “point-to-point” communication on the surface, and sometimes even at the hardware level, this is completely different in terms of software.

The difference between dedicated links and networks, is that the network is shared for use by multiple nodes. It can’t deal with everything at once, so again, “time” is a crucial aspect of networking.

First of all, you've got to take turns talking. How do you take turns? Not so easy. In face-to-face conversations, we use an intricate system of pauses, expressions, gestures, and behaviors to effortlessly take turns in a conversation. On the phone, it's a little harder; on a phone with a noticeable time lag, it can be frustratingly hard.

With computers, one thing you have to do is break each transmission into chunks – i.e. “packets”. Then it’s a matter of getting all the packets across, one by one, in the correct order, until the task is done. Or abort between packets, and allow changing roles so that other nodes can start to transmit.

Ethernet is a common way to connect computers, and TCP/IP is the standard used to make communication possible. The whole internet is built on it. On a network, everyone has to play by mutually agreed rules. These can be fairly complex, which explains why microcontrollers need some extra hardware and code to be able to act as an Ethernet node – and it can easily reach the limits of what they can do.

The nice bit about networking is that once you’re “on” a network, any node can communicate with any other node. But again, there is quite a bit of machinery and logic needed to make that possible.

The two key features of the serial communication described yesterday are that it is reliable and throttled. Well, not so on a network. Packets get lost or damaged in transit. Someone could be messing with the cable while two units are exchanging packets, and disrupt the whole process. Even if two nodes are working 100%, they can still fail to exchange a packet!

So with networking, you have to deal with things like timeouts, acknowledgement packets (which can get damaged as well), and re-transmissions. You also have to deal with flow control, to avoid sending more data than a receiver can handle. Imagine sending a steady stream of packets: if one of them gets lost, we have to detect the failure (takes time), re-send the packet (takes more time), and catch up with the new incoming data!

Before you know it, yesterday’s “Serial.println(value);” example turns out to require a lot of logic and error-handling.

It gets worse: what if the receiving node isn’t even connected right now?

The transmitter should be able to detect this so it can decide what to do.

Sometimes, there is no alternative but to ignore it. Sometimes, that’s not even such a big deal – for example with a sensor which is periodically sending out new readings. It’ll fail, but once the receiver is back online, it’ll all start to work again.

If you've ever looked into a "networking stack" (i.e. its code), or even implemented one yourself, you know that writing this sort of code is a complex task. It's not easy to communicate reliably when the communication channel is unreliable.

This daily weblog is a nice analogy, btw. I send out daily “posts” (packets), but this one for example continues where yesterday’s discussion left off. In a way, by assuming that you, dear reader, will follow along for more than one post, I’m creating a “story” (virtual circuit), based on daily chunks. It’s a fairly robust communication stream operating at a constant rate. Based on nothing but web pages. So now you can think about packets, next time you watch a YouTube video :)

What about wireless networks?

Screen Shot 2010 04 26 at 13.48.08

The good news is: wireless networks have to deal with most of the same issues as wired networks, and these issues are well understood and solvable by now.

The bad news is: wireless networks have to deal with a lot more unreliability than wired networks. All it takes to disrupt a wireless network is some RF interference from an old arcing motor, or even just walking out of range!

It’s possible to maintain the illusion of a serial connection over a network – it’s called a virtual circuit. TCP/IP does that: when you talk to a web server, you often exchange a lot more than will fit in one data packet. So TCP/IP sets up a mechanism which creates the illusion of a serial link – a virtual pipe between sender and receiver. Put stuff in, and it’ll come out, eventually.

Except that this illusion can break down. There’s no way to maintain the illusion of a permanent “connection” when you unplug the receiver, for example. Or walk out of range in a wireless network.

There’s no way even to give guarantees about communication speed. You might be at the very edge of a wireless network, with the system frantically trying to get your packets across, and succeeding perhaps once an hour. Oof – made it – next – yikes, failed again – etc.

In a wireless sensor network (WSN), which keeps sending new readings all the time, the illusion of a virtual circuit – i.e. virtual serial connections – can break down. And what’s worse: when it does, it’ll do exactly the wrong thing: it will try to get the oldest packet across, then the next, and so on. Which – if the network is really slow – is just going to lead to a permanent backlog.

What you want in a WSN, is to occasionally drop the oldest readings, which are more and more obsolete anyway, if that helps recover and obtain the most recent readings again.

A bad WSN can give you lots of data you don't want, whereas a good WSN will give you the latest data it can. The trick is to stop trying to send an old value as soon as you've got a new and more up-to-date reading.

This is the reason why the RF12 driver used with JeeNodes uses a packet delivery model, instead of a virtual circuit model. In networking terms: UDP instead of TCP. Also: best effort instead of (semi-) guaranteed in-order delivery.

Nice bonus: packet delivery is far simpler to implement than virtual circuits.

What has worked so well for teletypes and serial consoles for several decades, is not necessarily a good idea today. Not in the context of networks, particularly WSN’s. For another example, you need look no further than the web: part of HTTP’s success is based on the fact that it is “state-less”, i.e. not fully connection-based.

So all I have to do now is convince the whole world that thinking of communication as serial connections can be awful in some scenarios!

Heh, piece of cake: just make the whole world read this weblog, eh?

Communication 101

In Hardware, Software on May 3, 2010 at 00:01

Triggered by a recent post on the discussion forum, it occurred to me that I may be taking way too many concepts for granted. My problem is that I’ve been around computers too long. Not so much in being socially inept (just a bit introvert :) – no, the issue is that I seem to have become pretty familiar with how they work, from silicon to database to web-server.

This is a huge hurdle when trying to share and explain what I’m doing, and it probably makes this weblog harder to dive into than I intended, as a friend recently pointed out – an insight for which I’m very grateful.

So after this little bit of personal communication, let me get to the point: what I’d like to do from time to time on this weblog is to go into some key topic, relevant to the projects here at Jee Labs, but doing it in such a way that will hopefully bring more insight across to people who share the enthusiasm for all this Physical Computing stuff, but not necessarily all that techie background.

Not to worry – this is not the start of a textbook :) – nor a treatise. Just trying to clarify some stuff. Succinctly, as much as I can. If you know all this, or if it bores you – bear with me for one or two posts, I will go back to other topics again in the next posts. When I make mistakes, or say nonsense, please do correct me. I live to learn.

Today, I’ll kick this off with Communication (Wikipedia) – in the context of information exchanges between computers and peripherals, and within the hardware of various types of systems.

First off: why communicate? Because that’s what computers do, essentially. Number crunching (i.e. the literal sense of “to compute”) is secondary by now.

Examples, in the context of physical computing:

  • sending a measurement value to our PC
  • sending information to a display
  • sending data to an attached chip or device
  • sending a control signal to turn things on or off
  • sending packets by wireless to another node

How can information be sent? In short: as bits, from a software perspective, or as electric / magnetic / optical / mechanical signals from a hardware perspective. You could say that Physical Computing is about bridging those software and hardware perspectives. Sensing “stuff” and making “stuff” happen. With the fascinating part being that there is computation and change awareness (state) and decision-taking involved via the microcontroller which sits in the middle of it all.

This is a big topic. Let’s narrow it down. Let’s focus on communication in the form of bits, because ultimately just about everything goes through this stage.

Screen Shot 2010 04 26 at 13.38.51

Let’s take that first example: sending a measurement value to our PC. How do you do that? Simple, right? Just put something like this in your sketch:

Serial.println(value);

Whoa… not so fast. There’s a lot going on here:

  • we select an interface (Serial)
  • we fetch the measurement value from a variable (value)
  • we convert that data to a text string (println)
  • we transmit the text, character by character, over a serial link
  • somehow that serial link uses electrical signals to travel over a wire (hint: FTDI)
  • somehow this finds its way into a USB port (hint: device driver)
  • somehow this is picked up by an application (hint: COM or /dev/tty port)
  • somehow this appears on the screen (hint: serial console)

And what’s probably the most complex aspect of this entire process: it takes time. What appears to happen in less than the blink of an eye to us, is in fact a huge time consumer as far as the microcontroller is concerned.

If we ignore the details, you have to admit that this works pretty well, and that we can indeed easily get results from a microcontroller board to our PC.

That’s due to two key features of this comms channel:

  • the connection is reliable: what is sent, will arrive, eventually
  • the connection is throttled: sending is slowed down to match the reception speed

It’s easy to take this for granted, but not everything works that way. When you send data to an attached LCD display for example, the software has to take care not to send things too fast. LCD displays need time, and there are limits to how fast information can be presented to them. The Arduino “LiquidCrystal” library is full of little software delays, to avoid “overrun”, i.e. sending stuff faster than the LCD can properly handle.

The trouble with delays is that they hold everything up!

Well, that’s the whole point of delays, of course, but in a microcontroller, it means you don’t get to do anything else while that delay is in progress. Which has its own problems, as I’ve described earlier.

If you think about it for a moment, you might see how delays in fact make communication virtually impossible: because if you're supposed to wait for all the little steps to complete, how can you possibly handle incoming data, which has no idea that you're currently waiting in delays and not listening to anything?

I won’t go into the (hardware-based) solutions which work around this issue, but it has been solved, to a certain extent. This is why data coming in from the USB link will usually arrive just as expected, yet at the same time sending out data usually slows down the microcontroller in clearly noticeable ways. Try making a blinking LED, and then sending a string with Serial.println in between each blink. Sure enough, either the blinking or the serial output will become slower or even irregular.

Communication of data takes time. We don't have infinitely fast connections. Even something as simple as "Serial.println(value);" is full of nasty side-effects and details, especially on a microcontroller such as the ATmega.

It’s good to keep this in mind. This is one reason why something as seemingly simple as collecting data from multiple serial streams requires a fairly advanced library such as NewSoftSerial, and why even that clever code has its limitations.

Tomorrow, I’ll talk about packets, networking, and wireless communication.

Software- and hardware-I2C

In Software on Mar 15, 2010 at 00:01

Until now, the Ports library supported software I2C on the 4 “ports”, whereas hardware I2C required the Arduino’s Wire library. The unfortunate bit is that the API for these two libraries is slightly different (the Ports library uses a slightly more OO design).

See the difference in resulting code with and without Plug shield, for example.

Triggered by an idea on the forum, I decided to extend the Ports library, so that “port 0” gets redirected to Analog 4 (PC4) and Analog 5 (PC5), i.e. the hardware SDA/SCL pins on an ATmega.

So now you can use the Plug Shield with the same code based on the PortI2C class as with JeeNodes. Simply specify port zero when using this with an Arduino and a Plug Shield!

Here’s the Arduino example again, which no longer requires the Wire library:

Screen shot 2010-03-13 at 19.27.28.png

Note the use of port 0 to connect to hardware I2C pins.

The benefit is that all plugs for which code exists based on the Ports library (Pressure, UART, LCD, etc) can now be used on an Arduino with a Plug Shield.

A nice extra is that this will also work on an Arduino Mega, without requiring two extra patch cables to hook up to the hardware I2C pins.

Long live simplicity!

Pushing data as web client

In Software on Mar 3, 2010 at 00:01

Another way to get data to another place is via push HTTP requests, i.e. acting as a client for another web server somewhere on the internet.

The Pachube site is one place to send measurement results. I’ve set up a test feed to use with JeeMon, with two data streams. One of the nice features is that you can easily produce and embed graphs from that system:

Screen shot 2010-02-26 at 02.45.13.png

Here’s how I implemented it in JeeMon. First, I created a configuration section:

Screen shot 2010-02-25 at 22.37.12.png

(I’ve omitted most of my private API key, but anyone can sign up and get their own …)

Next, the code, in the form of a “pachube.tcl” file / module / rig:

Screen shot 2010-03-02 at 03.11.12.png

Note how the configuration file in fact contains “settings” which are simply evaluated as a script: the two lines starting with “Fetch …” are handled by the Fetch proc in pachube.tcl, using the JeeMon notification mechanism to get called back whenever either of the “meter1” or “meter2” readings changes. The Update proc then updates the “latest” variable accordingly. Lastly, SendToWebSite periodically does an HTTP PUT request in the background, supplying all the info needed to successfully submit to the Pachube website.

Btw, this simple example is flawed, in that it does not calculate averages – it just sends the last reading. But things like averaging require a sense of history, and persistence. Haven’t added that to JeeMon yet…

The missing link, as usual, is a line in the application file to start the ball rolling:

Screen shot 2010-03-02 at 03.03.46.png

There’s a lot of flexibility at the protocol and network levels. Such as creating an XML request via templates, if the remote server needs XML. This isn’t limited to HTTP requests, or to using port 80 – send emails, FTP files, etc.

There are quite a few details in the above code which I won’t go into. Again, I’m doing this mostly to show how little code it takes to initiate periodic HTTP client requests to an outside website.

Apart from sites such as Pachube, this could also be used from a tiny embedded Linux running JeeMon, to submit incoming data to a different setup elsewhere on the LAN, for example. IOW, JeeMon can be a front-end for other systems. It’s not about lock-in. It’s a switchboard. It can glue systems together. It bridges gaps.

JeeMon as web server

In Software on Mar 2, 2010 at 00:01

JeeMon comes with a nifty built-in HTTP/1.1 web server (using coroutines to handle requests in parallel).

So let’s create another real-time status display, but as web application this time:

Screen shot 2010-02-25 at 17.56.25.png

As before, I’m leaving out the lipstick – just showing the core functionality.

Here’s what I did to generate this page. First of all, I’m re-using some functionality which is already in the GUI version, to dynamically construct a matrix with the proper columns and rows, so the “statusWindow” code has to be running for the above to work. For real use, that code should probably be reorganized into its own rig.

The main task was to create an HTML template in a file called "statusPage.tmpl":

Screen shot 2010-02-26 at 00.41.23.png

This uses a simple templating mechanism based on Tcl’s “subst” command. The meta tag sets up the page to self-refresh every 5 seconds. The last line tells the Wibble web server to deliver the page as HTML.

To launch the server, this line was added to the main application file (port 8080, current dir is doc root):

Screen shot 2010-02-25 at 18.00.15.png

That’s it – there’s very little to activating a web server in JeeMon, as you can see. Which is important, because on tiny embedded Linux systems, an HTTP server will probably be the only option to present information.

Creating a full-blown site with CSS, JavaScript, and Ajax is a matter of adding more files – the usual stuff…

Did I mention that it’s all 100% open source, so you can browse / extend / change all of this? – I did? Oh, ok ;)

Node discovery

In Software on Mar 1, 2010 at 00:01

This post isn’t about GUIs, but about the dynamic behavior behind them.

And about wireless. Let’s take the voltmeter setup I described recently, and make it send its readings over the air – it’s a JeeNode after all, so it has wireless connectivity built in.

I extended the original sketch to send readings out roughly once per second:

Screen shot 2010-02-25 at 15.57.03.png

Extending the host side to also work from received packets is trivial:

Screen shot 2010-02-25 at 16.49.26.png

The problem now, is that this node will start sending out packets, but JeeMon has no idea what to do with them. One day, automatic node type registration would be great to add, but for now I’ll just use the configuration file to associate nodes with sketches. Here’s an extract of my current “config.txt” file:

Screen shot 2010-02-25 at 16.00.30.png

For this example, I added the “3/ { type analogPlug }” line to the “868:5/” entry (i.e. 868 MHz, netgroup 5). And sure enough, JeeMon will now launch the GUI for this node, displaying results coming in over the air in real-time:

Screen shot 2010-02-25 at 16.03.00.png

Here’s the relevant part of the log output:

Screen shot 2010-02-25 at 16.04.34.png

The fun part is that a timeout mechanism has also been implemented. If no packets come in from the analog node for 5 seconds, the connection is torn down and the window is closed again. The timeout value is sketch-specific and is included in the above configuration info.

In other words: node is ON => window appears, node is OFF => window goes away. Look ma, no hands!

Note: it’s not good UI practice to open and close windows like this. In a more refined setup, these windows would be panels in a larger window, or they could just be grayed-out to signal their disconnected state, for example.

The analog node is surprisingly convenient this way. I can use it anywhere, turn it on, look on the screen, and turn it off when I’m done. It wouldn’t be hard to add another wireless node to show the results on an LCD screen, btw. If this is extended with a way to assign node IDs ad-hoc, and a way for a sketch to identify itself over the air, then a whole family of wireless “instruments” could be created – using nothing but a JeeNode and some plugs.

This illustrates a guiding principle in JeeMon: don't ask, just do it! – JeeMon tries to be as dynamic as possible, adapting to changes in status and configuration by adjusting everything it does accordingly. In the case of nodes, new ones are discovered and stale ones are weeded out, in real-time. With objects getting created and destroyed automatically. What these objects do, as a side effect, is up to each one of them.

Right now, this dynamism in JeeMon is still fairly messy. There’s too much state in too many places which needs to be managed carefully. But with a bit of luck, this can eventually be simplified further to create a very consistent and predictable behavior.

Secure transmissions

In Software on Feb 23, 2010 at 00:01

For some time, I’ve been thinking about adding optional encryption to the RF12 wireless driver. I’ll leave it to the cryptography experts to give hard guarantees about the resulting security of such a system…

The basic goal is to provide a mechanism which lets me get messages across with a high level of confidence that an outsider cannot successfully do the same.

The main weakness which most home automation systems such as FS20 and KAKU completely fail to address is message replay. The "house code" used by FS20 has 16 bits and the address has 8 bits, with the naive conclusion being that it takes millions of attempts to send out messages to which my devices respond. Unfortunately, that's a huge fallacy: all you have to do is sit outside the house for a while, eavesdropping on the radio packets, and once you've got them, you've figured out the house code(s) and addresses in use…

I don’t care too much about people picking up signals which turn the lights on or close the curtains. You don’t need technology to see those events from outside the house anyway. I do care about controlling more important things, such as a server or a door opener.

Here are my design choices for optional encryption in the RF12 driver:

  • The cipher used is David Wheeler’s XXTEA, which takes under 2 Kb of code.
  • The keys are 128 bits, they have to be stored in EEPROM on all nodes involved.
  • All nodes in the same net group will use either no encryption or a shared encryption key.
  • A sequence number of 6, 14, 22, or 30 bits is added to each packet.

To start with the latter: XXTEA requires padding of packets to a multiple of 4 bytes. What I’ve done is add the sequence number at the end, using as many bytes as needed to pad to the proper length, with 2 bits to indicate the sequence number size. Encrypted packets must be 4..62 bytes long. It’s up to the sender to decide what size packets to send out, and implicitly how many bits of the sequence number to include. Each new transmission bumps the sequence number.

To enable encryption, call the new rf12_encrypt() function with a pointer to the 16-byte key (in EEPROM):

Screen shot 2010-02-21 at 18.37.36.png

Encryption will then be transparently applied to both sending and receiving sides. This mechanism also works in combination with the easy transmission functions. To disable encryption, pass a null pointer instead.

The received sequence number is available as a new “rf12_seq” global variable. It is up to the receiver (or in the case of acks: the originator) to ascertain that the sequence number is in the proper range. Bogus transmissions will decrypt to an inappropriate sequence number. To make absolutely certain that the packet is from a trusted source, include some known / fixed bytes – these will only be correct if the proper encryption key was used.

This new functionality has been implemented in such a way that the extra code is only included in your sketch if you actually have a call to rf12_encrypt(). Without it, the RF12 driver still adds less than 3 Kb overhead.

I’ve added two sample sketches called “crypSend” and “crypRecv” to the RF12 library. The test code sends packets with 4..14 bytes of data, containing “ABC 0123456789” (truncated to the proper length). The receiving end alternates between receiving in encrypted mode for 10 packets, then plaintext for another 10, etc:

Screen shot 2010-02-21 at 22.42.27.png

As expected, the encrypted packets look like gibberish and are always padded to multiples of 4 bytes. Note also that the received sequence number is only 6 bits on every 4th packet, when the packet size allows for only one byte padding. The strongest protection against replay attacks will be obtained by sending packets which are precisely a multiple of 4 bytes (with a 30-bit sequence number included in the 4 bytes added for padding).

So this should provide a fair amount of protection for scenarios that need it. Onwards!

Declarative configuration

In Software on Feb 22, 2010 at 00:01

Things are starting to come together quite nicely with JeeMon – all in less than 1000 lines of Tcl code so far. That’s not to say that it’s perfect yet – I’m still massively rearranging some of these code structures – but there is already some functionality to play with, which helps expose the strengths and weaknesses so far.

The basic approach is declarative in nature: specifying what should happen, but not necessarily in the same place as when or how all the work should be done.

Here’s my first tentative configuration file:

Screen shot 2010-02-21 at 14.31.44.png

For each serial interface, an action is specified when that interface comes on-line. For example I can now plug in my A8007Up6 device (a USB-BUB with a JeeNode) and JeeMon will open a terminal window showing the output stream. Unplugging closes the window again (and automatically cleans everything up).

Right now, I’m editing this config file by hand, but at some point that could be augmented with a set of GUI or web-based configuration panels (even remotely).

There are a lot of challenges ahead. One of them is decoupling things so that code and data end up organized in the most convenient, logical, and flexible way. I’m trying very hard to avoid any calls to the “Config” module from generic JeeMon code. This allows you to set up the configuration file structure and hierarchy in any way you like.

My application file currently looks as follows:

Screen shot 2010-02-21 at 15.42.58.png

As you can see, I’m setting up a periodic port scanner and supplying a “SerialEvent” callback to decide what to do when serial ports get added or removed. In those decisions, most of the logic is driven from the “config.txt” configuration file. I’m about to implement a similar callback for radio events, i.e. RF12 nodes coming online and dropping off again.

In a way, the above two files are the entire application.

The rest is utility code to make the above convenient and concise. That rest might well be 99% of the code, but it’s all structured as loosely-coupled “rigs” (modules), which can be used / overridden / ignored at will.

The way I see it, there can be three sources for such utility code: your own code, plug-ins from others, and a couple of rigs included in the JeeMon core itself. The above two files require just the core, using the built-in rigs (Config, Gui, JeeSketch, Log, Serial, and SysDep).

Serial USB devices

In Software on Feb 21, 2010 at 00:01

For JeeMon, I’d like to be able to auto-detect USB device insertions and to identify FTDI serial numbers (with names like “FTE54GKC” and “A9007CNE”).

Cross-platform… i.e. on Windows, Mac, and Linux!

It looks like it can be done.

On Windows, we can browse the good ol’ registry, as follows:

Screen shot 2010-02-20 at 15.24.16.png

Sample output:

Screen shot 2010-02-20 at 15.24.41.png

This was tested on Win2000 and Windows 7, in the hope that everything in between works the same.

On Mac OS X, it’s easiest (for a change):

Screen shot 2010-02-20 at 15.26.02.png

Sample output:

Screen shot 2010-02-20 at 15.26.37.png

On Linux, it’s a bit messy because there are so many different distributions:

Screen shot 2010-02-20 at 15.27.43.png

Sample output:

Screen shot 2010-02-20 at 15.28.01.png

Tested on Debian 5 (Lenny), Ubuntu 9.10 (Karmic), and Gentoo.

The above code can be found in the subversion repository, i.e. here.

I’m testing this all at once on a single machine btw, courtesy of VMware Fusion. And using JeeMon’s built-in self-update mechanism to quickly get new versions across while debugging and tweaking things.

By calling “SysDep listSerialPorts” periodically, we can automatically detect a change and see which plug was inserted (FTDI only for now). Without depending on any external libs or executables.

Onwards!

RF transport independence

In Software on Feb 20, 2010 at 00:01

With the basic serial interface access and dispatch in place in JeeMon, it’s time to move on to JeeNode / JeeLink packet processing.

What I want is to be able to forget all about how readings got to JeeMon. It shouldn’t make a difference whether I’m using a directly connected Room Node and grabbing the readings off its serial USB connection, or whether they came in through the air via the RF12 wireless driver – or by any other means. Take a snapshot with your cell phone, send the picture to EverNote, have it OCR’d, and then grab the readings off the web … whatever!

The way I’ve tied this into JeeMon, is to let the interface to RF12DEMO act as de-multiplexer. This is purely a decision at the RF12DEMO listener level. Each incoming packet is examined to determine which node it came from. Then we need a way to map from nodes to listener class – i.e. find out what sketch is running on the remote node. This is hard-coded for now:

Screen shot 2010-02-18 at 22.57.41.png

What this does is actually a bit more elaborate: a RF12DEMO listener will set up and manage a bunch of extra “RF12_PacketStream” listeners, one for each node. When packets come in, they will simply be dispatched to the corresponding stream. Each packet stream can process its packets in a different way.

The fun part is that these packet streams can use the same listener classes as with direct serial interfaces. All they need is an extra “decode_RF12” method:

Screen shot 2010-02-18 at 23.02.04.png

The task of decode_RF12 is to re-cast the incoming packet as messages of the same structure as what would come in over a serial connection.

Here’s the “rooms” listener as example:

Screen shot 2010-02-18 at 23.04.34.png

This one class encapsulates all the protocol details of room nodes, both serial and via RF12. When a 4-byte data packet comes in via RF12 (as $a..$d), the bits are decoded and an “onMessage” call is generated with a “ROOM” message id and the 5 decoded readings.

Here is a log of this code in action, one message per line:

Screen shot 2010-02-18 at 22.37.15.png

The way to read these messages is as key-value pairs, i.e. id = OK, type = RF12, name = usb-A900ad5m, etc.

The first two lines show an incoming OK message from node 21 (53=21+32), which is then turned into a ROOM message, tagged as coming from the “rf12-868.5.23” packet listener.

The next 3 lines are more involved: first an EM10 message came in over USB, then an OK message came in which got dispatched again, as the same EM10 message. That’s because I’m running JeeMon with a direct connection to the ookRelay board, even though it transmits all its information over RF12. So everything from the ookRelay comes in twice (great for debugging).

The point is that the two EM10 messages have the same info. It no longer matters how the message got here (but it is traceable, using the remaining details). And all the code to accomplish this is in a single source file, right next to the sketch running on the ookRelay board.

This design makes it possible to develop an application using only the serial USB connection, and then later add logic to send the information via RF12 (or not). Infrared, XBee, Twitter, anything: transport independence!

Note that nothing is done with these decoded messages at this stage. This messaging framework is independent of higher-level application decisions, such as where to store / send / show msgs, or even when to process them.

Serial port encapsulation

In Software on Feb 19, 2010 at 00:01

This post continues to look a bit into the new JeeMon design.

Let’s focus on serial interfaces first, mostly USB. There’s a “Serial” module which does all the basics. On the Mac, if I want to open device /dev/tty.usbserial-A900ad5l, then the following call with do everything:

    Serial connect usb-A900ad5l 57600

The name of the device would be different on Windows and Linux (COM5, or USB1), but that's all.

By default, this creates a new Serial object, which logs all incoming text to stdout. To send a command out, we need to keep a handle to this object:

    set conn [Serial connect usb-A900ad5l 57600]
    $conn send "some text"

Nice, but not very exciting…

Let’s take it one step further. The “JeeSketch” module does the automation described in the previous post, i.e. detect the running sketch, associate it with a class, instantiate an object for it, and call the methods of that object whenever text lines come in over that serial port. Here’s a complete JeeMon custom “application.tcl” program:

Screen shot 2010-02-18 at 14.44.59.png

First, all the sketch drivers are made available with one or more “register” calls. This lets the appropriate classes take over for each new serial connection – depending on what sketch is running. That’s all it takes. Servicing such serial ports now becomes an event-driven background activity.

The listeners are defined in separate files, one for each type of sketch:

Screen shot 2010-02-18 at 15.05.58.png

The ookRelay/host.tcl file looks like this, for example:

Screen shot 2010-02-18 at 14.51.58.png

This structure makes it easy to manage stuff that belongs together. Projects can be exchanged (or archived, or revision-controlled), with all the pieces needed to use them in one place. And as far as I’m concerned, it won’t be limited to JeeNodes etc, or Tcl, or a specific platform. This has to remain general enough to hook up to any hardware and use with any language (via networking, files, and direct launching of executables/scripts). My goal for JeeMon is not to limit anyone’s options, but to create a simple switchboard between whatever is needed.

(I’m still mucking around with the organization and naming of code and files, as you can see)

Nice, but still not very practical…

The problem with the above is that it doesn’t deal with devices getting plugged in or unplugged. Well, unplugging is the easy bit – the above code will automatically clean up after itself on connection loss, so that part is covered.

Wouldn’t it be nice if we could just plug in new devices and get them to automatically start doing something?

I implemented such a mechanism in a recent revision of the code, but I’m hesitant to add it again – because it was Mac OS X specific, where USB devices connected via the FTDI driver include a unique code. On Windows and Linux, you just get COM<n> and USB<n> devices, where “n” seems to be related to the order and number of device insertions.

I haven’t looked into “libusb” yet. Should I? Will it help for Windows too? Do I need to start learning about USB enumeration? What OSS-compatible tools and libs are there?

Update – on Linux, it looks like /sys/bus/usb/devices/* has all the info needed to identify USB devices. So that only leaves Windows – good: at most one lib or dll to deal with.

In the kitchen

In Software on Feb 18, 2010 at 00:01

Ok, so maybe it's time to start describing some of the new stuff cooking in the kitchen (well, the lab).

I’ve been exploring software designs to use as basis for JeeMon, that new switchboard-type application for physical computing devices. The idea is to treat the system as one big message-passing machine – a bit like Message-Oriented-Middleware (MOM), which has been around for ages, but without getting sucked into any heavy-weight conventions.

Messages can be passed around, queued, stored, duplicated, filtered, transmitted, returned, ignored, sorted, etc. After all, living organisms do nothing but send chemical messages around, so why not do the same for an infra-structure focused on physical computing (sensors, displays, actuators) and home automation?

If you’ve been following this weblog for a while, you’ll have noticed that almost all the output I generate in sketches is of the form “identifier arg1 arg2 …”. So for example, packets received by RF12demo look like:

    OK 61 9 2 8 79 243 87 13 0 15 0

A barometric reading from the BMP085 sensor on the pressure plug may get reported as:

    BMP 233 10019

And so on. One or more upper case letters, optionally followed by digits. Then the payload (which may also include floating point and string values, not just ints).

Another convention I’ve been sticking with is to report the name of the sketch on the serial port during startup:

    [ookRelay]

Or an identifier plus the current configuration settings:

    [RF12DEMO] W i23 g5 @ 868 MHz

These two conventions can be used for an object-oriented design. With a bit of preparation, the name of such sketches can be automatically associated with a class, and each line treated as a method call.

Here’s the skeleton of the first level of code I use for decoding RF12demo.pde output:

Screen shot 2010-02-16 at 00.15.27.png

Another class definition, for the ookRelay.pde sketch:

Screen shot 2010-02-16 at 00.16.11.png

Or to put it differently: with JeeMon running, you can hook up a device (JeeLink, JeeNode, Arduino, etc) containing some sketch and the matching class will automatically be associated with that serial port, once the sketch has been identified. A new object is then created, and its methods will be called whenever the device sends new output (messages?) over the serial line.

Simple!

What I would like to do, is manage the sketch (C/C++) and the class (Tcl) together, perhaps as two files in a common development directory for that project. That way the interface between the two pieces of hardware essentially vanishes into the background. The point being that each class can then do all sorts of things, such as storing results in a database, sending it to another system over the network, updating web server pages, popping up a GUI window to show incoming data in real-time, etc.

This mechanism is very simple, even under the hood. This matters, because it has to work even when JeeMon is running on low-end embedded Linux hardware. But at the same time, such a MOM + OO design will allow considerable flexibility for abstract / high-end use later.

PS. If you're familiar with Tcl, you might be surprised to see all this "oo" stuff in here. That's because JeeMon uses Tcl 8.6 with a built-in object system. Multiple inheritance, mixins, filters, dynamic class change support, delegation, prototypes / per-object methods, it's all in there. I'm also using ensembles, and there's an interesting coroutine-based web server waiting in the wings (called wibble).

For the record: nothing ever gets added just to be buzz-word compliant. If a feature simplifies application-level concepts and leads to a cleaner implementation in JeeMon, it’ll be used, else I’ll pass. Life’s too short to jump on bandwagons.

New date / time / RTC library

In AVR, Software on Feb 5, 2010 at 00:01

Not so long ago, I had the opportunity to work a bit on something which has bugged me for a long time – the lack of date and time handling in connection with RTC chips. There are a few libraries out there, but I think I could do better – i.e. make it simpler, smaller, yet sufficiently powerful for real day-to-day use.

Seeing where this was going on the Arduino developer mailing list (and disagreeing with just about everything that happens over there), I decided to put my money (well, time) where my mouth is, and build my own library.

Here’s the header file of the new RTClib Arduino-compatible library:

Screen shot 2010-02-04 at 13.52.13.png

This lets you do date / time calculations, and it provides two different ways to implement a clock: via a hardware chip or using the built-in millis() timer.

RTClib has been checked into subversion, see the CODE page for details on how to get it.

It includes four example sketches:

  • datecalc illustrates how to do calculate with dates and times
  • ds1307 interfaces with a DS1307 RTC chip, connected via the Wire library
  • plugrtc interfaces with the RTC Plug, connected via the Ports library
  • softrtc demonstrates how to do the same with just software

One fun trick I added, inspired by a comment from Limor Fried, is to allow initializing a DateTime object from the __DATE__ and __TIME__ strings generated by the C compiler. That means you can run that "softrtc" sketch without hardware support, and it'll automatically have its clock set to the compilation date of the sketch, i.e. fairly close to correct. Not good enough for general use, but great during quick debug cycles when you're re-compiling your sketch all the time anyway.

Note that to use RTClib, you need to include the “Wire.h” library – even if you don’t use it!

The inability to properly deal with libraries, particularly in a resource-constrained embedded processor context, is one of the aspects of the current Arduino direction which irritates me – see an older post for more details.

More C++ template trials

In AVR, Software on Jan 18, 2010 at 00:01

Here are some more results in trying to use templates for embedded software on JeeNodes. This time, it’s about using Jee Plugs with bit-banged I2C on one of the 4 ports and with built-in TWI hardware on pins A4/A5.

Let's start with an extract of the latest JeeLib.h header:

Screen shot 2010-01-17 at 11.50.27.png

I’ve omitted all the actual code, for brevity in this example. The full code is here. The previous version of JeeLib has been moved to the avr-jeelib-first branch in subversion.

The Port<N> class is essentially the same as in an earlier post. It generates efficient code for raw pin access, given fixed pin assignments known at compile time.

The above code adds a templatized PortI2C<N,W> class, where N is again the port number (1..4) and W is a constant which determines the bus speed. As with the Port<N> class, this leads to a class definition which requires no state and which can therefore consist entirely of inlined static members.

A HardwareI2C<W> is also defined, with the same API as PortI2C<N,W>, but using the hardware TWI logic in the ATmega. The point is that in use, PortI2C<N,W> and HardwareI2C<W> objects are interchangeable.

You can see how all this templating stuff complicates the naming of even simple classes such as these.

The final class implemented above is DeviceI2C<P,A> – it does the same as DeviceI2C in the original Ports library, but again using templates to “bring in” the underlying port classes and the device I2C address.

Here is an example sketch built with all this new code:

Screen shot 2010-01-17 at 12.48.05.png

It supports two bit-banged I2C devices on ports 1 and 2, respectively, as well as a third I2C device driven by the built-in TWI hardware.

This compiles to 980 bytes (using Arduino IDE 0017).

The good news is that this generates pretty efficient code. It’s not 100% inlined – but quite a bit is, especially at the lower levels, so the result looks like a pretty good implementation of a high-level I2C driver which can be used for both bit-banged and hardware-supported I2C, all by changing one declaration at the top of the sketch.

But there are quite a few inconveniences with this approach…

First of all, note that the declarations at the top are fairly obscure. I did my best to simplify, but all this template mumbo-jumbo means you have to understand pretty well how to declare a port, and how to declare an I2C device object for that port. The “typeof” keyword in there is a GCC extension, without which these declarations would have looked even more complex.

The major trade-off is that the above example essentially generates separate code for each of these three I2C devices. There is virtually no code sharing. This can lead to code bloat, despite the fact that each version generates pretty good code. In practice this might not be so important – it is not likely after all that you’ll need all three types of I2C buses in the same sketch. Just keep in mind that you’re trading off efficient hard-wired coding against the need to generate such code for each type of I2C access you might need.

So would this be a good foundation to build on? I don’t know yet…

C++ templates do seem to get a lot more of the logic “into” the compiler. Instead of passing say an I2C device address as constant to an object, we generate a highly customized class which is special-cased to implement just one device at just one address. With the result that the compiler can perform quite impressive optimizations. In the above example, there are lots of cbi and sbi instructions in the generated code, just as if you were to dig in and hand-craft an optimized implementation for exactly what you need. All from a small general-purpose library!

But it comes at a price. There is no (non-template) "DeviceI2C" class anymore. Writing a class on top to implement access to the UART Plug for example, means this class has to use templates as well. It's a bit like "const" – once you start on the path of templates, they will start to permeate all your code. Yikes!

The other “cost” is that all templates have to end up in header files. The size and complexity of the “JeeLib.h” header file is going to increase immensely. Not so great if you want to get to grips with it and just use the darn stuff. On the plus side, I was pleasantly surprised that error messages are fairly good now.

These drawbacks may be acceptable if all the template code can indeed remain inside the library – I sure wouldn’t want to impose the need for all library users to learn the intricacies of templates. So maybe it’s all doable – but this approach has major implications.

Is it all worth it? Hm, big decision. I do not like big decisions, not at this stage…

New library experiments

In AVR, Software on Jan 14, 2010 at 00:01

Encouraged by the previous post, I started writing a bit more code based on C++ features such as templates and namespaces, in the form of an Arduino library (compatible with it, not depending on it):

Screen shot 2010-01-09 at 02.08.27.png

Things to point out:

Fewer #defines (none so far): I turned bitRead() and bitWrite() into template functions (getBit and setBit), so that they can be used with 1..4 byte ints, just like the macros (a sketch of what these could look like follows below).

The Port class is inside a new “Jee” namespace, so there is no conflict with the existing Ports library.

Atomic access hasn’t been solved. Some of this won’t work if I/O pins get changed by interrupt code. I’m hoping to solve that at a higher level.

There are compatibility definitions for pinMode(), digitalRead(), and digitalWrite() using the normal Arduino pin numbering conventions (only Duemilanove so far). These are in their own namespace, but can also be used without any qualifiers by adding “using namespace Jee::Arduino;” to the sketch.

Totally incomplete stuff, untested, and not 100% compatible with Arduino in fact.

The other thing I want to explore, is to treat the choice of what a pin does as a pin initialization mode. Hence the enum with Mode_IN .. Mode_PWM definitions. The underlying code is PortBase::modeSetter(), but it hasn’t been written yet. It’s all steamy hot vapor for now.

Update – I’ve placed the code in subversion, but the API is going to be in flux for a long time.

Update #2 – Atomic access is actually better than I thought. With constant pin numbers, setBit() will use the cbi/sbi instructions, which are atomic.

C++ templates

In AVR, Software on Jan 12, 2010 at 00:01

A recent post described the performance loss in the Arduino’s digitalRead() and digitalWrite() functions, compared to raw pin access.

Can we do better – i.e. hide the details, yet still get the benefits of raw I/O? Sure.

If you’ve used JeeNodes and in particular the “Ports” library, you’ll have noticed that there is a C++ class which hides the details of each port (i.e. JeeNode port, not ATmega port). Let’s look at that first:

Screen shot 2010-01-06 at 12.40.09.png

I’ve omitted the implementation, but there are still lots of secondary details.

The main point is that this is now C++, and uses a “Port” object as central mechanism. Each object has one byte of data, containing the port number (1..4).

Due to heavy inlining, there is almost no additional overhead for using the Port class over using digitalRead() and digitalWrite(), on which it is based. I verified it by running similar tests as in the recent post about pin I/O:

Screen shot 2010-01-06 at 12.48.50.png

Using the definition “Port orig (1);” – and sure enough the results are nearly the same.

There are two issues which make this approach sub-optimal: using the slow digital read/write calls, and storing the port number in a memory location which needs to be accessed at run time. There is no way for the compiler to optimize such calls, even though "orig.digiRead()" ought to be equivalent to writing "bitRead(PORTD, 4)" in this example.

That’s where C++ templates come in. Check out this definition of a new “XPort” class (named that way to avoid a name conflict) and an example of use for port 1:

Screen shot 2010-01-06 at 12.54.02.png

(As you can see, I’m switching to a different, and hopefully clearer, API along the way)

There’s some funky <…> stuff going on. We’re in fact not declaring one class, but a whole family of classes, parametrized by the integer included in the <…> notation on the last line.

The big difference is that each class now has that integer value "built-in", so to speak. So we can define member functions which directly pass that value on to the corresponding bitRead() and bitWrite() macros. And then all of a sudden, all the overhead vanishes: since the member needs no access to object state, it can be made static, and since all the info is known in the header, it can be made inline as well.

So the above template is C++’s modern way of doing far more at compile time, allowing the optimizer to generate much better code.

Note that templates come with some pitfalls: first of all, it’s very easy to inadvertently generate huge amounts of code, so very careful inlining and base class derivation is essential. The second problem is that templates tend to be “instantiated” as late as possible by the compiler, which can lead to confusing error messages when the templates are wrong or used wrongly.

I’m still just exploring this approach for embedded use. The potential performance gains are substantial enough to give it a serious try. My hope is that the hard work can be done in a library, so that everyone else can just use it and benefit from these gains without having to think much about templates, let alone implement new ones. The “one” object declared above acts like any other C++ object, so using it will be just as easy as non-template objects.

Does the above lead to fast code? You bet. Here’s a test sketch:

[Screen shot 2010-01-06 at 13.05.29.png – the test sketch]
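
That sketch is another lost screenshot. The original timed a series of eight variants; this hedged, abridged stand-in shows just the flavor of two of them, using the XPort reconstruction from above:

    // Abridged benchmark in the spirit of the original test sketch,
    // which timed eight variants - this version times only two.
    void setup () {
        Serial.begin(57600);
        one.mode(OUTPUT);                   // port 1 digital pin as output

        uint32_t start = micros();
        for (uint16_t i = 0; i < 10000; ++i)
            one.digiWrite(i & 1);           // templated port access
        Serial.println(micros() - start);

        start = micros();
        for (uint16_t i = 0; i < 10000; ++i)
            bitWrite(PORTD, 4, i & 1);      // raw pin access, for comparison
        Serial.println(micros() - start);
    }

    void loop () {}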

And here’s some sample output:

[Screen shot 2010-01-06 at 13.06.14.png – sample timing output]

As you can see, values 5 and 6 are virtually the same as values 7 and 8. We’ve obtained the performance of direct pin access while using a high-level port-style notation to access those pins. This is why templates are so attractive for embedded use.

The timings are different from the previous post because the loops are coded differently. In this case, only the relative comparisons are relevant.

Wasting time

In Software on Jan 2, 2010 at 00:01

No, not your time or mine… the topic I want to go into here is how to let a micro-controller deal with things that happen over time.

The way an Arduino sketch works is as follows:

[Screen shot 2009-12-28 at 14.59.53.png – control flow of a typical Arduino sketch]

The little slashes represent real code, where “stuff happens” so to speak. You can see that there is a clear flow of control, from the start of the code to the end, with delays and calls to other bits of code.

The trouble with this is that it can’t deal with multiple events. You can’t count pulses on an input pin and keep a LED blinking at the same time, for example.

In the old days, interactive computer interfaces were written in this same way. You’d enter a couple of values and then you got some results back. If you made a mistake halfway in the sequence, you had to restart and enter everything all over again.

Then “command menus” were invented: elaborate decision trees of the form “do you want to do X, Y, or Z?” – now at least it was possible to go back just one step. Today, user interfaces are event driven, responding to whatever interaction the user initiates instead of presenting choices. The user is in control, not the computer.

The traditional event-driven logic uses a dispatch / switchboard approach:

[Screen shot 2009-12-28 at 15.00.03.png – the dispatch / switchboard approach]

The more modern approach is to use callbacks and closures. Fancier still is to use coroutines or continuations.

Could we use something similar for micro-controllers? Sure.

Callbacks are not that convenient in C, since the language supports neither closures (which would make them pleasant to use), nor coroutines or continuations. Also, callbacks would have to be represented by pointers to C functions (2 bytes), whereas dispatch codes can probably be represented as a single byte (as long as we don’t have more than 256 of them). Efficiency and memory space matter on small 8-bit chips such as the ATmega.

This requires a change in mindset when writing sketches. A good way to get into such an event-oriented style is to imagine that there is no delay() function. In event-oriented code, you’re not allowed to wait for time to pass.

So how can we blink a LED if we can’t wait to switch it on and off? Well, instead of delays, we have timers. We’re still not allowed to loop until the timer expires, but we can tell this new type of timer to generate an event when its time has come.

For a low-end implementation, an event can simply be a small integer dispatch code.

So instead of waiting until we can turn the LED on or off again, we tell the system to wake us up again at the right time. This approach is also a perfect match for low-power nodes, by the way: sleep and wake-up on events can be taken literally to enter power-down mode and start up again.

The code might look something like this:

[Screen shot 2009-12-28 at 16.00.55.png – event-based blink code]
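
Since that screenshot is gone too, here’s a hedged reconstruction of the idea – setTimer() and getNextEvent() are hypothetical calls, standing in for whatever the event dispatcher will actually provide:

    // Hedged sketch: setTimer() and getNextEvent() are hypothetical,
    // they represent the dispatcher described in this post.
    enum { BLINK_ON = 1, BLINK_OFF };   // small integer dispatch codes

    void setup () {
        pinMode(13, OUTPUT);
        setTimer(BLINK_ON, 0);          // request the first event right away
    }

    void loop () {
        switch (getNextEvent()) {       // may power down until an event is due
            case BLINK_ON:
                digitalWrite(13, HIGH);
                setTimer(BLINK_OFF, 100);   // wake us up again in 100 ms
                break;
            case BLINK_OFF:
                digitalWrite(13, LOW);
                setTimer(BLINK_ON, 400);    // ... and again 400 ms later
                break;
        }
    }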

Sure, it takes some more work than this delay-based code:

[Screen shot 2009-12-28 at 16.03.06.png – delay-based blink code]
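
Presumably the classic blink pattern, along these lines (the exact delays are a guess):

    // The familiar delay-based version, for comparison.
    void setup () {
        pinMode(13, OUTPUT);
    }

    void loop () {
        digitalWrite(13, HIGH);
        delay(100);                     // nothing else can happen here...
        digitalWrite(13, LOW);
        delay(400);                     // ... or here
    }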

… but it has the huge benefit that it’s now fairly simple to deal with other activities at the same time. What you need to do is turn each of those activities into one or more events as well. They can then be added right next to the blink event code, as additional case(s) in the above “switch” statement.

An event-based dispatch mechanism adds a lot of flexibility. Tasks such as counting pulses, blinking LEDs, serial communication, wireless packet reception and transmission – these can all be processed as they occur. With the nice bonus that low-power sleep modes can become fully automatic: when there are no events, go to sleep!

The trick is to ban all uses of “delay()” and “delayMicroseconds()” and to never use “busy loops” to wait for something to happen. This has far-reaching implications, because all libraries must also play by these rules.

I’m going to explore this approach further. Maybe one of the existing nanokernels could be used as a foundation. To qualify, it must be truly minimal (and I’m picky!) – i.e. it has to fit into an ATmega328 without claiming too much flash or RAM memory space.

Update – here’s a web page by “inthebitz” which illustrates the problems described above. It’s an absolutely genuine attempt to help people get started, and it no doubt does a very fine job at that, but it also shows how everything gets serialized time-wise. For slow things, it’ll be just fine of course.