Computing stuff tied to the physical world

Bencode in Lua, Python, and Tcl

In Software, Linux on Oct 4, 2012 at 00:01

Ok, let’s get on with this Bencode thing. Here is how it can be used on Linux, assuming you have Lua, Python, or Tcl installed on your system.

Let me show you around in the wonderful land of rocks, eggs, and packages (which, if I had my way, would have been called rigs … but, hey, whatever).

Lua

Lua’s Bencode implementation by Moritz Wilhelmy is on GitHub. To install, do:

    luarocks install bencode

(Actually, I had to use “sudo”; this appears to be solved in newer versions of LuaRocks.)

LuaRocks – a clever play on words if you think about it – is Lua’s way of installing well-known packages. It can be installed on Debian using “sudo apt-get install luarocks”.

The package ends up as a single file: /usr/local/share/lua/5.1/bencode.lua

Encoding example, using Lua interactively:

    > require 'bencode'
    > s = {1,2,'abc',{234},{a=1,b=2},321}
    > print(bencode.encode(s))
    li1ei2e3:abcli234eed1:ai1e1:bi2eei321ee

To try out decoding, we need a little extra utility code to show us the decoded data structure. I simply copied table_print() and to_string() from this page into the interpreter, and did:

    > t = 'li1ei2e3:abcli234eed1:ai1e1:bi2eei321ee'
    > print(to_string(bencode.decode(t)))
    "1"
    "2"
    "abc"
    {
      "234"
    }
    {
      a = "1"
      b = "2"
    }
    "321"

(hmmm… ints are shown as strings, I’ll assume that’s a flaw in to_string)

Python

Python’s Bencode implementation by Thomas Rampelberg is on PyPI. Install it as follows:

    PKGS=http://pypi.python.org/packages/2.7
    easy_install $PKGS/b/bencode/bencode-1.0-py2.7.egg

It ends up as “/usr/local/lib/python2.7/dist-packages/bencode-1.0-py2.7.egg” – this is a ZIP archive, since eggs can be used without unpacking.

This depends on easy_install. I installed it first, using this magic incantation:

    wget http://peak.telecommunity.com/dist/ez_setup.py
    python ez_setup.py

And it ended up in /usr/local/bin/ – the standard spot for user-installed code on Linux.

Now let’s encode the same thing in Python, interactively:

    >>> import bencode
    >>> bencode.bencode([1,2,'abc',[234],{'a':1,'b':2},321])
    'li1ei2e3:abcli234eed1:ai1e1:bi2eei321ee'
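To see how that output string is put together, here is a minimal encoder sketch. It is not the code of the bencode package used above: the name `encode_sketch` is made up for this illustration, and it assumes Python 3, where the result is a byte string. Integers become `i<digits>e`, strings become `<length>:<bytes>`, lists are wrapped in `l…e`, and dictionaries in `d…e` with their keys sorted.

```python
def encode_sketch(value):
    """Return the Bencode form of value as bytes (illustrative sketch only)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; Bencode has no boolean type
        raise TypeError("Bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value                      # integer: i<digits>e
    if isinstance(value, str):
        value = value.encode("utf-8")               # text is sent as raw bytes
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)       # string: <length>:<bytes>
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(encode_sketch(v) for v in value) + b"e"
    if isinstance(value, dict):
        # keys are strings, written in sorted order of their raw bytes
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        body = b"".join(encode_sketch(k) + encode_sketch(v) for k, v in items)
        return b"d" + body + b"e"
    raise TypeError("cannot Bencode %r" % type(value))


print(encode_sketch([1, 2, 'abc', [234], {'a': 1, 'b': 2}, 321]))
# prints b'li1ei2e3:abcli234eed1:ai1e1:bi2eei321ee'
```

Because every string carries its own length, nothing inside it needs quoting or escaping.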

And now let’s decode that string back into a data structure:

    >>> bencode.bdecode('li1ei2e3:abcli234eed1:ai1e1:bi2eei321ee')
    [1, 2, 'abc', [234], {'a': 1, 'b': 2}, 321]
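Decoding is just as mechanical: the first byte of each item says what follows. The sketch below is a hypothetical stand-alone decoder, not the package's own implementation. The names `decode_sketch` and `_decode_at` are invented here, and it assumes Python 3 with bytes as input, so decoded strings come back as bytes.

```python
def _decode_at(data, pos):
    """Decode one item starting at pos; return (value, position after it)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                                # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                                # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode_at(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                                # dict: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = _decode_at(data, pos)
            value, pos = _decode_at(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                              # string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("unexpected data at offset %d" % pos)


def decode_sketch(data):
    """Decode a complete Bencoded byte string (illustrative sketch only)."""
    value, pos = _decode_at(data, 0)
    if pos != len(data):
        raise ValueError("trailing data after offset %d" % pos)
    return value


print(decode_sketch(b'li1ei2e3:abcli234eed1:ai1e1:bi2eei321ee'))
# prints [1, 2, b'abc', [234], {b'a': 1, b'b': 2}, 321]
```

Note that the string case never inspects the payload bytes; it only slices them out by length.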

Note: for another particularly easy to read Python decoder, see Mathieu Weber’s version.

Tcl

Tcl’s Bencode implementation by Andreas Kupries is called Bee and is part of Tcllib.

Tcllib is a Debian package, so it can be installed using “sudo apt-get install tcllib”.

Ok, so installation is trivial, but here we run into an important difference: Tcl’s data structures are not “intrinsically typed”. The type (and performance) depends on how you use the data, following Tcl’s “everything is a string” mantra.

Let’s start with decoding instead, because that’s very similar to the previous examples:

    % package require bee
    0.1
    % bee::decode li1ei2e3:abcli234eed1:ai1e1:bi2eei321ee
    1 2 abc 234 {a 1 b 2} 321

Decoding works fine, but as you can see, type information vanishes in the Tcl context. We’ll need to explicitly construct the data structure with the types needed for Bencoding.

I’m going to use some Tcl trickery by first defining shorthand commands S, N, L, and D to abbreviate the calls, and then construct the data step by step as a nested set of calls:

    % foreach {x y} {S String N Number L ListArgs D DictArgs} {
        interp alias {} $x {} bee::encode$y
    }
    % L [N 1] [N 2] [S abc] [L [N 234]] [D a [N 1] b [N 2]] [N 321]
    li1ei2e3:abcli234eed1:ai1e1:bi2eei321ee
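Why the explicit calls matter can be shown with a small Python sketch (Python 3 assumed). The two helpers below are hypothetical stand-ins whose names merely echo bee::encodeNumber and bee::encodeString; they are not part of any library. Given the same untyped text "234", the caller has to pick which of two different encodings is meant.

```python
def encode_number(text):
    """Treat the value as an integer: i<digits>e (hypothetical helper)."""
    return b"i%de" % int(text)


def encode_string(text):
    """Treat the value as a string: <length>:<bytes> (hypothetical helper)."""
    raw = str(text).encode("utf-8")
    return b"%d:%s" % (len(raw), raw)


value = "234"                  # all an untyped language can see is this text
print(encode_number(value))    # prints b'i234e'
print(encode_string(value))    # prints b'3:234'
```

Both results are valid Bencode, which is why the encoder cannot guess on its own.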

So we got what we wanted, but as you can see: not all the roads to Rome are the same!

  1. Every time I look at Bencoded data I think “Why not JSON?”. Did you already answer this question?

  2. Perhaps you could expand on your rather terse dismissal of JSON? Yes, I have read your link above. Thank you.

  3. JSON does not handle binary data. Sending floats as text requires conversion, which can be expensive. Quoted strings mean that the upper bound on the size of arbitrary data doubles. Encoding and parsing JSON is (somewhat) more involved. Scanning a string is not O(size of string) but O(complexity of string).

    The most important one to me is probably the binary data. With quoting, you have to deal with two representations for the same thing, which makes it hard to implement zero-copy mechanisms (passing pointers and refs to avoid copying data all the time). The parsing comes back at every corner, whereas with raw data you can just hand it to the consumer and get out of the way.

    Anyway – whatever I choose, people will disagree.

  4. Thanks for the insights. Personally I don’t see binary data as all that important since I am mostly doing serial port comms. And floating point is ‘expensive’ on the AVR already since it is emulated in software. I wonder if there might be a subset of JSON that would be useful yet practical to parse — it seems most of your concerns are in the parsing. It seems like the upside of JSON might allow non-geeks to access Jeenodes from their browsers.

    I hope I don’t appear disagreeable; that was not my intent at all, but rather to get some insights. Thank you.

    • It’s a tough trade-off, for sure. But I’m trying to cover a very large range: simple use for 8-bit µC’s, yet also sending code images / sketches for uploading, and in fact even multi-MB image data (no longer in the 8-bit realm, clearly).

      On the high-end, Bencode will even let me do random access in vectorised data, which could go all the way to acting as database container with memory-mapped files. But that’s uncharted territory, and not something I’ll be exploring any day soon…

  5. I’m getting to like bencode more and more due to its ability to handle binary ‘blobs’!

    I’m aiming to use bencode for all my JeeNodes and trying to ‘kill’ specific JeeNodes like the Roomboard by turning them into generic sensors (temp, humidity and movement in the case of the roomboard) so I can use generic decoders/interpreters on the PC side…

    This is however a bit of work, but with the binary blobs, I can pack the payload data from ‘old’ JeeNodes (those not using the bencode messages) into a simple bencode formatted envelope (which contains the type of message for the message dispatcher on the PC).

    Now I can use a simple dispatcher on my PC with as little effort as possible: little changes to existing JeeNodes, and no change at all for the decoders that interpret the actual (binary in this case) message!

    Bencode rocks

  6. Did you look at MessagePack?

    • Yes, I’ve seen it, thx. Resembles BSON, from what I can tell.

      Personally, if I had to design a format, I’d choose a variable int format instead of 10 different types. For scalar ints anyway, whereas for vectors, I’d prefer an even wider range of 1/2/4/8/16/… bit widths.
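The binary-data point from the discussion above can be made concrete with a short Python 3 sketch. It is only an illustration under stated assumptions: the dictionary key `data` is invented, and base64 is just one common workaround for carrying bytes in JSON, not something proposed in the thread.

```python
import base64
import json

payload = bytes(range(256))            # an arbitrary binary blob

# JSON has no byte-string type, so the blob must first be turned into text
# (base64 is assumed here as the workaround)
as_json = json.dumps(
    {"data": base64.b64encode(payload).decode("ascii")}
).encode("ascii")

# Bencode only prefixes the raw bytes with their length
header = b"d4:data%d:" % len(payload)
as_bencode = header + payload + b"e"

print(len(payload), len(as_json), len(as_bencode))

# the receiver can hand out the blob as a slice of the message buffer,
# with no unescaping and no copy
view = memoryview(as_bencode)[len(header):len(header) + len(payload)]
print(view.tobytes() == payload)       # prints True
```

The `memoryview` slice stands in for the "pass a pointer" idea: the consumer reads the payload straight out of the received buffer.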
