Computing stuff tied to the physical world

Sending Bencode data

In AVR, Software on Sep 30, 2012 at 00:01

As mentioned a while back, I’m adopting ZeroMQ and Bencode on Win/Mac/Linux for future software development. The idea is to focus on moving structured data around, as a foundation for what’s going on at JeeLabs.

So let’s start on the JeeNode side, with the simplest aspect of it all: generating Bencoded data. I’ve set up a new library on GitHub and am mirroring it on Redmine. It’s called “EmBencode” (embedded, ok?). You can “git clone” a copy directly into your Arduino’s “libraries” folder if you want to try it out, or grab a ZIP archive.

This serialSend sketch sends some data from a JeeNode/Arduino/etc. via its serial port:

[Screenshot: source code of the serialSend sketch]

Note that we have to define the connection between this library and the Arduino’s “Serial” class by defining the “PushChar” function declared in the EmBencode library.

One thing to point out is that this code uses the C++ “function overloading” mechanism: depending on the type of data given as argument to push(), the appropriate member function for that type of value gets called. The C++ compiler automagically does the right thing when pushing strings and numbers.
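To make the overloading idea concrete, here is a minimal sketch in plain C++ that runs on a desktop. It is not the actual EmBencode source: `MiniEncoder`, `pushChar`, `pushCount` and the `sent` buffer are hypothetical names made up for this illustration, and the output is collected in a string rather than written to a serial port.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in, NOT the actual EmBencode source.
static std::string sent;                    // collects what would go out the serial port

// The single hook the application has to supply: where does each byte go?
// (On an Arduino this would typically forward to Serial.write(ch).)
static void pushChar(char ch) { sent += ch; }

struct MiniEncoder {
    // Zero-terminated string: "<byte count>:<bytes>"
    static void push(const char* str) { push(str, std::strlen(str)); }

    // Raw bytes with an explicit length, so embedded 0-bytes are fine
    static void push(const void* ptr, std::size_t len) {
        pushCount(len);
        pushChar(':');
        const char* p = static_cast<const char*>(ptr);
        while (len-- > 0) pushChar(*p++);
    }

    // Signed integer: "i<decimal digits>e"
    static void push(long val) {
        pushChar('i');
        unsigned long mag = static_cast<unsigned long>(val);
        if (val < 0) { pushChar('-'); mag = 0UL - mag; }
        pushCount(mag);
        pushChar('e');
    }

    // Emits decimal digits recursively, so no digit buffer is needed
    static void pushCount(unsigned long n) {
        if (n >= 10) pushCount(n / 10);
        pushChar(static_cast<char>('0' + n % 10));
    }
};
```

With this in place, `MiniEncoder::push("abcde")` picks the string overload and emits `5:abcde`, while `MiniEncoder::push(12345)` picks the integer overload and emits `i12345e`.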

Apart from that, this is simply an example which sends out a bit of data – i.e. some raw values as well as some structured data, each one right after the other.

Here’s what you’ll see on the IDE’s serial console monitor:

    5:abcde3:123i12345eli987eli654eei321ee
    i999999999ed3:onei11e3:twoi22ee4:bye!

(the output has been broken up to make it fit on this page)

It looks like gobbledygook, but when read slowly from left to right, you can see each of the calls and what they generate. I’ve indented the code to match the structures being sent.
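For readers who want to trace the output call by call, the following is a reconstruction worked backwards from the output shown above. It is a sketch under assumptions, not the original sketch code: the `Enc` class, the `emit` sink and the `wire` string are hypothetical, and the method names are chosen for illustration only. The indentation mirrors the nesting of the lists and the dict.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

static std::string wire;                          // stands in for the serial line
static void emit(char ch) { wire += ch; }         // hypothetical byte sink

static void emitNumber(unsigned long n) {         // decimal digits, most significant first
    if (n >= 10) emitNumber(n / 10);
    emit(static_cast<char>('0' + n % 10));
}

// Hypothetical encoder: every call writes its bytes right away and keeps no state.
struct Enc {
    void push(const char* s) {                    // string: <count>:<bytes>
        std::size_t n = std::strlen(s);
        emitNumber(n);
        emit(':');
        while (n-- > 0) emit(*s++);
    }
    void push(long v) {                           // integer: i<digits>e
        emit('i');
        unsigned long mag = static_cast<unsigned long>(v);
        if (v < 0) { emit('-'); mag = 0UL - mag; }
        emitNumber(mag);
        emit('e');
    }
    void startList() { emit('l'); }
    void endList()   { emit('e'); }
    void startDict() { emit('d'); }
    void endDict()   { emit('e'); }
};

std::string sendExample() {
    wire.clear();
    Enc enc;
    enc.push("abcde");                // 5:abcde
    enc.push("123");                  // 3:123
    enc.push(12345);                  // i12345e
    enc.startList();                  // l
      enc.push(987);                  //   i987e
      enc.startList();                //   l
        enc.push(654);                //     i654e
      enc.endList();                  //   e
      enc.push(321);                  //   i321e
    enc.endList();                    // e
    enc.push(999999999);              // i999999999e
    enc.startDict();                  // d
      enc.push("one"); enc.push(11);  //   3:one i11e
      enc.push("two"); enc.push(22);  //   3:two i22e
    enc.endDict();                    // e
    enc.push("bye!");                 // 4:bye!
    return wire;
}
```

Note that the start/end calls each emit a single character and nothing else, which is why the encoder has nothing to remember between calls.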

You can check out the EmBencode.h header file to see how the encoder is implemented. It’s all fairly straightforward. More interesting, perhaps, is that this code requires no RAM (other than the run-time stack). There is no state we need to track for encoding arbitrarily complex data structures.

(Try figuring out how to decode this stuff – it’s quite tricky to do in an elegant way!)

Tomorrow, I’ll process this odd-looking Bencoded data on the other side of the serial line.

  1. I know a few Haskell libraries which would really make this a breeze. See: http://hackage.haskell.org/package/bencode However, I don’t think that a lot of people are going to learn Haskell just to do this.

    • FP languages such as Haskell take a different mindset. I’ve never been able to flex my mind far enough to “get it”. The package you mention intrigues me, but it’s still 340 lines of code. And I literally can’t make heads or tails of it – though sometimes I wish I could…

  2. Hi JC

    Instead of requesting the user to define its own Bencode::PushChar(), I would suggest making EmBencode a pure abstract class with PushChar being a pure virtual member function. Users could then easily create derived classes where they define their implementation of PushChar.

    It might be interesting to make EmBencode compatible with any class derived from Stream in order to directly use all Function functions.

    Ideally being able to define MySerialBencode class as derived from HardwareSerial and EmBencode would make it easy to use.

    • Yes, that’s indeed an option. C++ templates would support even more flexibility. I just wanted to explore a different path this time. Virtual classes add some runtime and memory overhead, although I agree that in this case it’s not such a big deal.

      Note that standard Bencode only supports strings (including binary data with 0-bytes in it), signed integer numbers (any precision) and dict/list structures. I’m not sure how much we could benefit from a mix-in with the Stream class hierarchy. Also, I’m trying to replace the serial character-based comms with packetised comms, so sending out anything other than these packets over the same line would confuse the other (decoding) end.
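For illustration, the abstract-class variant discussed in this thread could look roughly like the sketch below. This is not how EmBencode is written: `BencodeBase`, `StringBencode` and the lower-case `pushChar` are hypothetical names, and only the string and integer encodings are included to keep it short.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical abstract encoder: derived classes decide where the bytes go.
class BencodeBase {
public:
    virtual ~BencodeBase() {}

    void push(const char* s) {                    // string: <count>:<bytes>
        std::size_t n = std::strlen(s);
        pushNumber(n);
        pushChar(':');
        while (n-- > 0) pushChar(*s++);
    }

    void push(long v) {                           // integer: i<digits>e
        pushChar('i');
        unsigned long mag = static_cast<unsigned long>(v);
        if (v < 0) { pushChar('-'); mag = 0UL - mag; }
        pushNumber(mag);
        pushChar('e');
    }

protected:
    virtual void pushChar(char ch) = 0;           // the only thing a subclass must supply

private:
    void pushNumber(unsigned long n) {
        if (n >= 10) pushNumber(n / 10);
        pushChar(static_cast<char>('0' + n % 10));
    }
};

// Example subclass that appends to a string; an Arduino version
// would forward each byte to a serial port instead.
class StringBencode : public BencodeBase {
public:
    std::string out;
protected:
    void pushChar(char ch) override { out += ch; }
};
```

The trade-off mentioned above shows up here as the vtable and the indirect call on every byte, which is small but not free on an 8-bit microcontroller.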

    • “Also, I’m trying to replace the serial character-based comms with packetised comms, so sending out anything other than these packets over the same line would confuse the other (decoding) end.”

      That would be nice, using bencode all the time!

      I already wanted to use this to interface my JeeNodes to HomeSeer, as it standardizes the interface independent of the node’s number of sensors etc…

  3. Functional languages take a “more mathematical” mindset. The basic idea is to program a description of what you want to get done instead of programming how to do it.

    FP, and especially Haskell, works “almost” the same way that real math works. Still, I wouldn’t recommend diving into it for control applications on the scale of JeeLabs because it’s just too complex and plain weird at first glance. Also, if you want the rest of the world to be able to write extensions to your programs, FP is definitely not the way to go right now. Also, IO and everything that uses “states” is basically non-existent in “a traditional way” (like in C, Java, Python etc.). Not very comfortable for a home control system.

    But then again, you (JCW) might actually like it, because if you put on some pink sunglasses TCL is somewhat like Functional Programming.

    However, if you want materials to dive into it, there is an introduction course at Utrecht University and the materials are freely available here: http://www.cs.uu.nl/wiki/FP/WebHome (warning, most of it is in Dutch) and the book used is here: http://www.cs.uu.nl/wiki/bin/viewfile/FP/CourseLiterature?filename=fp.pdf

    If you are really just getting started, try the very basic interpreter Helium first, because it provides very helpful answers. Description here: https://en.wikipedia.org/wiki/Helium_%28Haskell%29 downloadable from here: http://www.cs.uu.nl/wiki/Helium

    To understand the package I linked to above, you will need to have a firm grasp on the basics of Haskell, but you’ll also need to study the material from the “Languages and Compilers” course. Course website is here: http://www.cs.uu.nl/wiki/TC/WebHome and the course “handbook” is here: http://www.cs.uu.nl/wiki/pub/TC/CourseMaterials/TC-20111101.pdf (at least that one is in English)

    I think I’ve posted enough materials around here to keep anyone who’s willing to dive into it busy for at least two months… and I’ve barely scratched the surface of what’s possible. For example: You can write a (very) basic Java to machine code compiler on your own in under two weeks if you master this stuff.

    Happy hacking! But it’s definitely not for starters……

  4. I wrote a Bencode decoder (possibly incomplete) in Python a few weeks ago to decode .torrent files and generate the equivalent Python data structure. It’s only 40 lines long. You can take a look at http://weber.fi.eu.org/tmp/bt.py if you want. Its major drawback for embedded systems is that it is recursive so it takes stack space, which is equivalent to maintaining a state.
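The point about recursion can be shown with a small C++ sketch. It is not a port of the Python script linked above: `skipValue` is a hypothetical function that only walks over one Bencode value without building any data structure, it assumes well-formed input, and the `depth`/`maxDepth` parameters exist purely to show how the recursion depth follows the nesting of the data.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Walks over the Bencode value starting at s[pos] and returns the index
// just past it. Assumes well-formed input; a real decoder would need
// bounds checks and error handling.
std::size_t skipValue(const std::string& s, std::size_t pos,
                      int depth, int& maxDepth) {
    if (depth > maxDepth) maxDepth = depth;

    char c = s[pos];
    if (c == 'i') {                               // integer: i<digits>e
        return s.find('e', pos) + 1;
    }
    if (c == 'l' || c == 'd') {                   // list or dict: items until 'e'
        ++pos;
        while (s[pos] != 'e')
            pos = skipValue(s, pos, depth + 1, maxDepth);   // one stack frame per level
        return pos + 1;
    }

    // string: <byte count>:<bytes>
    std::size_t len = 0;
    while (std::isdigit(static_cast<unsigned char>(s[pos])))
        len = len * 10 + static_cast<std::size_t>(s[pos++] - '0');
    return pos + 1 + len;                         // skip the ':' and the payload
}
```

Each nested list or dict costs one more stack frame, so the stack plays the role of the state a non-recursive decoder would have to keep explicitly.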

  5. This is great! I wanted to send structured data wirelessly from some Arduinos to a Ruby app and tried using a subset of the JSON spec. The implementation exceeded my memory requirements so I ended up with a custom encoding scheme which I’m not that happy with.

    I’m going to rebuild and give EmBencode a try instead. Thanks!

  6. Is it reasonable to allow the character generality of UTF-8 rather than be stuck with 7-bit ASCII encoding for strings? UTF-8 maps to ASCII in that code page and is self-announcing otherwise – count then becomes glyph_count rather than byte_count.

    Bonne idée?

    • The integer sent in front of each string has to be a byte count, because that’s the only way to treat strings as opaque data. IOW: yes, some strings can be assumed to be UTF-8 encoded by the receiver, but Bencode cannot know which ones are and hence cannot switch to using glyph counts. They need to be recomputed on the receiver end (or passed along as a separate int). Any variable-sized encoding will have the same issue – only raw bytes can be transmitted.

      Note that there is nothing ASCII or 7-bit specific in this protocol. It wraps bytes and anything having more structure than that needs to apply that knowledge either implicitly or by sending the encoding name as a separate string (in ASCII, so you don’t end up with a chicken-and-egg problem!).
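A small C++ sketch of the byte-count point, under stated assumptions: `bencodeString` and `codePoints` are hypothetical helpers written for this example, and counting UTF-8 code points is used as a rough proxy for glyphs (the two are not the same in general).

```cpp
#include <cstddef>
#include <string>

// Frames a string the Bencode way: the prefix is the number of BYTES.
std::string bencodeString(const std::string& bytes) {
    return std::to_string(bytes.size()) + ":" + bytes;
}

// What a receiver could do if it knows the bytes are UTF-8:
// count code points by skipping continuation bytes (10xxxxxx).
std::size_t codePoints(const std::string& bytes) {
    std::size_t n = 0;
    for (unsigned char c : bytes)
        if ((c & 0xC0) != 0x80) ++n;
    return n;
}
```

For the UTF-8 encoding of “idée” (the bytes `i d 0xC3 0xA9 e`), the length prefix is 5 because that is the byte count, while `codePoints` returns 4 on the receiving side.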
