Computing stuff tied to the physical world

Lessons from history

In Musings on Aug 12, 2015 at 00:01

(No, not the kind of history lessons we all got treated to in school…)

What I’d like to talk about, is how to deal with sensor readings over time. As described in last week’s post, there’s the “raw” data:

raw/rf12/868-5/3 "0000000000038d09090082666a"
raw/rf12/868-5/3 "0000000000038e09090082666a"
raw/rf12/868-5/3 "0000000000038f090900826666"

… and there’s the decoded data, i.e. in this case:

sensor/BATT-2 {"node":"rf12/868-5/3","ping":592269,
sensor/BATT-2 {"node":"rf12/868-5/3","ping":592270,
sensor/BATT-2 {"node":"rf12/868-5/3","ping":592271,

In both cases, we’re in fact dealing with a series of readings over time. This aspect tends to get lost a bit when using MQTT, since each new reading is sent to the same topic, replacing the previous data. MQTT is (and should be) 100% real-time, but blissfully unaware of time.

The raw data is valuable information, because everything else derives from it. This is why in HouseMon I stored each entry as timestamped text in a logfile. With proper care, the raw data can be an excellent way to “replay” all received data, whether after a major database or other system failure, or to import all the data into a new software application.

So much for the raw data, and keeping a historical archive of it all – which is good practice, IMO. I’ve been saving raw data for some 8 years now. It requires relatively little storage when saved as daily text files and gzip-compressed: about 180 Mb/year nowadays.

Now let’s look a bit more at that decoded sensor data…

When working on HouseMon, I noticed that it’s incredibly useful to have access to both the latest value and the previous value. In the case of these “BATT-*” nodes, for example, having both allows us to determine the elapsed time since the previous reception (using the “time” field), or to check whether any packets have been missed (using the “ping” counter).

With readings of cumulative or aggregating values, the previous reading is in fact essential to be able to calculate an instantaneous rate (think: gas and electricity meters).

In the past, I implemented this by having each entry store a previous and a latest value (and time stamp), but with MQTT we could actually simplify this considerably.

The trick is to use MQTT’s brilliant “RETAIN” flag:

  • in each published sensor message, we set the RETAIN flag to true
  • this causes the MQTT broker (server) to permanently store that message
  • when a new client connects, it will get all saved messages re-sent to it the moment it subscribes to a corresponding topic (or wildcard topic)
  • such re-sent messages are flagged, and can be recognised as such by the client, to distinguish them from genuinely new real-time messages
  • in a way, retained message handling is a bit like a store-and-forward mechanism
  • … but do keep in mind that only the last message for each topic is retained

What’s the point? Ah, glad you asked :)

In MQTT, a RETAINed message is one which can very gracefully deal with client connects and disconnects: a client need not be connected or subscribed at the time such a message is published. With RETAIN, the client will receive the message the moment it connects and subscribes, even if this is after the time of publication.

In other words: RETAIN flags a message as representing the latest state for that topic.

The best example is perhaps a switch which can be either ON or OFF: whenever the switch is flipped we publish either “ON” or “OFF” to topic “my/switch”. What if the user interface app is not running at the time? When it comes online, it would be very useful to know the last published value, and by setting the RETAIN flag we make sure it’ll be sent right away.

The collection of RETAINed messages can also be viewed as a simple key-value database.

For an excellent series of posts about MQTT, see this index page from HiveMQ.

But I digress – back to the history aspect of all this…

If every “sensor/…” topic has its RETAIN flag set, then we’ll receive all the last-known states the moment we connect and subscribe as MQTT client. We can then immediately save these in memory, as “previous” values.

Now, whenever a new value comes in:

  • we have the previous value available
  • we can do whatever we need to do in our application
  • when done, we overwrite the saved previous value with the new one

So in memory, our applications will have access to the previous data, but we don’t have to deal with this aspect in the MQTT broker – it remains totally ignorant of this mechanism. It simply collects messages, and pushes them to apps interested in them: pure pub-sub!