The JET data store, take 2

Feb 2016

This article supersedes an earlier one on the JET data store, but it’s probably still useful to read that original article first.

While many of the design choices remain the same, the API has changed a bit. The description below matches what is currently implemented on GitHub and included in the latest release.

Terminology and semantics

What hasn’t changed is the way the data store is tied into MQTT. The hub listens for messages matching the “!/#” and “@/#” topic patterns and interprets these as stores and fetches, respectively.

Here is how to store the text “abc” in an item “c” inside a bucket “b”, which is in turn inside bucket “a” at the top level:

jet pub '!/a/b/c' abc

(the topic has to be quoted, because “!” has special significance in the shell)

The term bucket is from the underlying BoltDB package. As you can see, this looks very much like storing text in a file “c” in the directory “/a/b/”. From now on, we’ll stop calling these nested levels buckets, and use the terms directories and directory paths instead. But to avoid confusion with real files, let’s also continue to call “c” an item, and not a file. After all, it’s not accessible via the file system: it only exists somewhere inside the BoltDB data file.

There is one important limitation in BoltDB: the top level can only contain directories. Items must be placed inside a directory, i.e. “!/c” is an invalid item reference, since it’s not in a dir.

Another convention added in this redesign is that directories must always be specified with a trailing slash, whereas items may not end in one. So “!/a/b/” and “@/a/b/” refer to the (nested) sub-directory “b”, while “!/a/b” and “@/a/b” refer to the item “b”. Also, empty names are no longer allowed, i.e. a path cannot contain two slashes next to each other (“…//…” is invalid).
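The path rules above are easy to check mechanically. Here is a minimal sketch in Python, not the hub's actual code: the function name parse_path is made up for illustration, and it takes the path without its leading “!” or “@”.

```python
def parse_path(path):
    """Classify a store path as ("dir", names) or ("item", names).

    Hypothetical helper: it only encodes the conventions described in the
    text (trailing slash = directory, no empty names, no top-level items).
    """
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    names = path.split("/")[1:]      # drop the empty string before the first slash
    is_dir = names[-1] == ""         # a trailing slash leaves an empty last element
    if is_dir:
        names = names[:-1]
    if "" in names:
        raise ValueError("empty names are not allowed: %r" % path)
    if not is_dir and len(names) < 2:
        raise ValueError("items must be inside a directory: %r" % path)
    return ("dir" if is_dir else "item"), names


print(parse_path("/a/b/"))   # the sub-directory "b"
print(parse_path("/a/b"))    # the item "b" inside directory "a"
```

With this, “/c” and “/a//b” are both rejected, while “/” on its own is treated as the top level.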

Extracting / fetching data is a two-step process: you send a message with a topic corresponding to the item of interest, and as payload a “reply topic”. Like this, for example:

jet pub @/a/b/c '"abcde"'

Note the extra double quotes: “abcde” is a JSON string which names the topic where the reply will be sent (it should not normally start with “@/…”!). To see what’s happening, we have to subscribe to that topic before sending out the fetch request, i.e. by keeping a separate terminal window open and running this command:

jet sub abcde

No double quotes this time: the topic is always a plain string, not JSON. If we now re-send that “jet pub '@/a/b/c' '"abcde"'” request, we’ll see this output appear in the subscription:

abcde = abc

In summary: to store a value, send it as payload to the proper “!/…” topic. To fetch a value, set up a subscription listener to pick up the reply, then send a message to the proper “@/…” topic and specify our listener topic as payload, formatted as JSON.
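For clients other than the “jet” command, the two message shapes can be built as follows. This is only a sketch: the function names store_message and fetch_message are invented here, and actually publishing the resulting topic/payload pair would need an MQTT client library, which is left out.

```python
import json


def store_message(path, payload):
    """Return the (topic, payload) pair for a store; the payload is sent as is."""
    return "!" + path, payload


def fetch_message(path, reply_topic):
    """Return the (topic, payload) pair for a fetch.

    The reply topic travels as a JSON string, hence the extra double quotes.
    """
    return "@" + path, json.dumps(reply_topic).encode("utf-8")


print(store_message("/a/b/c", b"abc"))
print(fetch_message("/a/b/c", "abcde"))
```

The client still has to subscribe to the reply topic before publishing the fetch message, otherwise the reply may be missed.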

This approach turns MQTT into an RPC mechanism for the data store. Any MQTT client can use the data store if it adheres to the above convention: this is not limited to JetPacks. As long as the hub is active, the store will process these requests.

Payload considerations

MQTT topics are always plain text strings, with “/” to segment the key space. Null bytes and control characters should be avoided, but UTF-8 is fine.

MQTT payloads can be anything: plain text, JSON-formatted text, or binary data. The same holds for the data store: it takes a number of bytes, whatever their format might be, and returns them as is. There is no hard limit for the size of a payload.

Note that in JET, many parts of the system do expect JSON-formatted payloads. For numeric values, there is no difference, but strings will need to be double-quoted when this is the case.
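To illustrate the difference between plain and JSON-formatted payloads, here is a small Python example using the standard json module. The variable names are just for this example.

```python
import json

# A number looks the same as plain text and as JSON.
as_number = json.dumps(123)

# A string gains double quotes when formatted as JSON.
as_string = json.dumps("abc")

print(as_number)   # three characters: 1, 2, 3
print(as_string)   # five characters, including the two double quotes
```

Decoding with json.loads gives back the original values, so a part of the system that expects JSON can round-trip both.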

Storing data

As already shown above, you store an item by sending it as “!/…” message:

jet pub '!/a/b/c' 123

If the item exists, it will be overwritten. If the directory “/a/b/c/” exists, you’ll get an error - items and directories cannot have the same name.

All intermediate directory levels are automatically created if necessary. Again, this will fail if any of the directory names already exist as item names.

You can also store multiple items in one go, by storing a JSON object to a directory. The above could also have been written as:

jet pub '!/a/b/' '{"c":123}'

Same effect, and to store multiple items, we could have done:

jet pub '!/a/b/' '{"c":123,"d":456}'

This creates (or overwrites) two items in the “/a/b/” directory. This is an atomic operation: all the items are saved as part of a single transaction.

Multi-stores can be convenient to “unpack” an object into separate items, but since the request uses JSON, you can only use it to store JSON-formatted data. To store arbitrary text or binary data, you have to use the single version.
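A multi-store can be pictured as unpacking the JSON object into one entry per item. The sketch below models the store as a flat Python dict, which is an assumption made for illustration only (the real store uses BoltDB, and how the hub re-encodes each value is not described here; compact JSON is assumed). The name unpack_multi_store is made up.

```python
import json


def unpack_multi_store(dir_path, payload):
    """Turn a multi-store payload into {item path: JSON-encoded bytes}."""
    if not dir_path.endswith("/"):
        raise ValueError("multi-stores must target a directory")
    obj = json.loads(payload)
    if not isinstance(obj, dict):
        raise ValueError("multi-store payload must be a JSON object")
    return {
        dir_path + name: json.dumps(value, separators=(",", ":")).encode("utf-8")
        for name, value in obj.items()
    }


# Toy stand-in for the data store: one existing item, then a multi-store.
store = {"/a/b/x": b"old"}
store.update(unpack_multi_store("/a/b/", '{"c":123,"d":456}'))
print(sorted(store))
```

Because the new entries are merged in with a single update, the existing item “x” is left alone, and an empty object adds nothing.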

The following request is a no-operation, except that it will create “/a/b/” if it didn’t exist:

jet pub '!/a/b/' '{}'

Note that a multi-store does not affect other items in the same directory. Items are “merged into” the directory, leaving the rest unchanged: nothing gets deleted. Speaking of which…

Deleting data

Deleting an item is done by sending it an empty payload:

jet pub '!/a/b/c' ''

Or, equivalently:

jet delete '!/a/b/c'

Note that this cannot be done via a multi-store:

jet pub '!/a/b/' '{"c":""}'

This will store the empty JSON string (with its double quotes), not a zero-length payload, which may not be what you had in mind.

You can also delete a directory and everything it contains, including any sub-directories, by sending the empty payload to the directory:

jet pub '!/a/b/' ''

As before, the item vs. directory distinction is made through the trailing slash.

Empty payloads

As you can see, empty payloads play a special role. This is not the same as the empty JSON string (“”) or even JSON’s “null”: both consist of a small number of bytes, even if they represent “nothingness”.

Storing empty payloads deletes stuff from the store. But since fetching a non-existing item also returns the empty payload, you can often ignore this behaviour. The only difference is in directory listings, as described below.
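The role of the empty payload can be mimicked with a toy model. The sketch below again uses a flat Python dict in place of the real store, so it only illustrates the semantics described above; the function names store and fetch are invented, and multi-stores are left out.

```python
def store(items, path, payload):
    """Apply a single store request to the toy dict 'items'."""
    if payload != b"":
        if path.endswith("/"):
            raise NotImplementedError("multi-stores are not modelled here")
        items[path] = payload
    elif path.endswith("/"):
        # Empty payload sent to a directory: drop everything below it.
        for key in [k for k in items if k.startswith(path)]:
            del items[key]
    else:
        # Empty payload sent to an item: delete it (no error if absent).
        items.pop(path, None)


def fetch(items, path):
    """A missing item reads back as the empty payload."""
    return items.get(path, b"")


items = {}
store(items, "/a/b/c", b'""')    # empty JSON string: two bytes, so it is stored
print(fetch(items, "/a/b/c"))
store(items, "/a/b/c", b"")      # zero-length payload: the item is deleted
print(fetch(items, "/a/b/c"))
```

After the second store, the item is gone, yet fetching it still yields an empty payload, which is why the distinction rarely matters to the reader of a value.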

Fetching data

The fetching behaviour of the store has already been described above, but for completeness, here is a quick example anyway:

jet pub @/a/b/c '"abcde"'

When this message is sent, the hub looks up the item “c” in the directory “/a/b/” and publishes its payload to the reply topic “abcde”.

If the item does not exist, an empty payload will nevertheless be sent. But in case “/a/b/” doesn’t exist, the hub will report an error in its log instead, and not send anything back.
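The difference between a missing item and a missing directory can be sketched as follows. This is not the hub's implementation: directories are modelled as nested Python dicts, the name handle_fetch is made up, and returning None stands in for “log an error and send no reply”.

```python
def handle_fetch(root, path):
    """Return the reply payload for an item fetch, or None if no reply is sent."""
    *dirs, item = path.strip("/").split("/")
    node = root
    for name in dirs:
        node = node.get(name)
        if not isinstance(node, dict):
            return None              # directory missing: error in the log, no reply
    value = node.get(item, b"")
    # Assumption: a name that turns out to be a directory reads as empty.
    return value if isinstance(value, bytes) else b""


root = {"a": {"b": {"c": b"abc"}}}
print(handle_fetch(root, "/a/b/c"))      # existing item
print(handle_fetch(root, "/a/b/zzz"))    # missing item
print(handle_fetch(root, "/a/x/c"))      # missing directory
```

A requesting app therefore needs a timeout: for a missing directory, the reply never arrives.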

Listing directories

One request type has not yet been presented. The data store also offers a way to scan its contents, allowing you to enumerate all items in either the top level or any existing directory.

This again uses the “@/…” notation, with a reply topic as payload. The difference is that now the topic refers to a directory. An example:

jet pub @/a/b/ '"abcde"'

The result, as reported by “jet sub abcde”, might be something like:

abcde = {"a":2,"b":0,"c":4}

Here, “/a/b/” contains items “a” and “c”, with payloads of size 2 and 4, respectively, as well as a subdirectory “b”. Subdirectories always have zero size, which is never the case for normal items.

Names are stored in sorted order (sorted as raw bytes that is, not UTF-8 or anything fancy), but JSON object attributes aren’t always kept in order (they’re usually implemented as hash tables).
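Since the order of JSON attributes cannot be relied on, a client may want to re-sort a listing itself. Here is a small sketch of how a listing reply could be split into items and subdirectories, using the zero-size convention. The name split_listing is invented for this example.

```python
import json


def split_listing(payload):
    """Split a directory listing into ([(item, size), ...], [subdir, ...])."""
    listing = json.loads(payload)
    # Re-sort by raw bytes, matching the order used inside the store.
    names = sorted(listing, key=lambda name: name.encode("utf-8"))
    items = [(name, listing[name]) for name in names if listing[name] > 0]
    subdirs = [name for name in names if listing[name] == 0]
    return items, subdirs


items, subdirs = split_listing('{"c":4,"b":0,"a":2}')
print(items)
print(subdirs)
```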

More advanced searches - such as ranges and globs - can be implemented later, by passing in more information than just a reply topic string. This could also be used for on-the-fly statistics, i.e. scanning and summarising data on the hub, and reporting only the resulting metrics.

Reply topics

As you can see, all accesses require some reply topic to get the results back to the requesting app. These topics should be unique, to avoid confusion about which reply relates to which request.

The plan is to have a convention for any JetPack to easily come up with such reply topic names, and to add some utility code which will wait for a reply and timeout if nothing comes in quickly. Since each JetPack has a unique name when it connects to MQTT, and since the hub manages these names when it starts them up, we can probably choose topics with the following structure:

packs/<packname>/replies/<seqnum>

This way, each pack can easily track and issue its own sequence numbers. Other (non-JetPack) applications will have to come up with their own unique reply topics.
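As a sketch of how a pack might hand out such topics, assuming the proposed structure is adopted as is: the class name ReplyTopics and the choice to start counting at 1 are both made up here, and none of this is implemented in JET yet.

```python
import itertools


class ReplyTopics:
    """Hand out unique reply topics for one pack, using a running sequence number."""

    def __init__(self, packname):
        self._packname = packname
        self._seq = itertools.count(1)   # assumption: sequence numbers start at 1

    def next_topic(self):
        return "packs/%s/replies/%d" % (self._packname, next(self._seq))


topics = ReplyTopics("demo")
print(topics.next_topic())
print(topics.next_topic())
```

Each request gets a fresh topic, so a late reply to an earlier request can't be mistaken for the answer to a newer one.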

For the time being (early Feb 2016), reply topics are not yet automated.

Weblog © Jean-Claude Wippler.