DSLs and expressiveness Feb 2016

The KEY design choice in Forth is its dual data- and return-stack. Forth is a stack-oriented programming language. Another characterisation is that Forth is a concatenative language. Here’s what this means:

Data is manipulated on a stack - this example pushes “1”, then “2” on the stack, then applies the “+” operation on those two values, replacing them with their sum, and ends by running the “.” operation which pops-and-prints the top of the stack, i.e. “3”:

1 2 + .

Suppose we have this sequence, using the “/” division operator:

10 2 / . 20 2 / . 30 2 / .

The output will be “5 10 15”. What about this, then?

10 2 / 20 2 / 30 2 / . . .

A little mental exercise will tell you that it will print out “15 10 5”. Now try this:

10 2 20 2 30 2 / / / . . .

In this case, the output will be: “0 2 10” (whoops, that was a division by zero!) . One more:

10 2 20 2 30 2 / . / . / .

Output: “15 10 5”. Hey, it’s the same as two examples back! What happened?

It may seem like silly mental gymnastics, but there’s something very deep going on here, with far-reaching implications. The first thing to note is that operations (they tend to be verbs) take input (if any) from the top of the data stack and leave output (if any) on the stack in return. Not just numbers: strings, pointers, objects, anything.

That’s where the concatenative aspect comes in: operators can be combined into new ones without having to know what they do to the stack! - we can define a “2/” operator for example:

: 2/ 2 / ;

Looks quirky, eh? What this means is: “when you see a “2/” operator (yes: “2/” is a valid name in Forth, see below), execute “2” followed by “/”. Now that first sequence above can be written as:

10 2/ . 20 2/ . 30 2/ .

Not very useful, but now we could make it more wordy - for example:

: half 2 / ;
10 half . 20 half . 30 half .

We could even define things like “: 3div / / / ;”, or “: triple-dot 3 0 do . loop ;” !

Here’s another example (let’s assume that “delay” expects milliseconds as input):

: seconds 1000 * ;
12 seconds delay

You can see how carefully chosen (natural?) names can lead to fairly readable phrases.

It’s still not a very sophisticated example, but the point is that you can look at any Forth source code and simply scan for repetitions without having a clue about the inner details. Any sequence occurring more than once is a hint that there’s something more general going on. Turning that into a definition with a more meaningful name is a very Forth’ish thing to do. And guess what? As you implement more code in Forth, the process becomes second nature as you write!

And that’s really what Forth is about: creating a Domain-Specific Language (DSL) while writing code. As you formulate your logic in Forth words, you invent steps and come up with names for them. Don’t look back too much, just keep on writing. At some point, any (near-) repetition will show through. Then you can redefine things slightly to make the same phrases become more widely usable. And before you know it, you’re in a universe of your own. Writing in terms uniquely adapted to the task at hand, combining very small existing pieces into larger ones.

The big surprise is that this effort coincides with getting to grips with the logic of a problem.

It’s not uncommon to write dozens, or even hundreds of new word definitions, as you progress. One thing to keep in mind is that Forth is essentially bottom-up: you can only use words which have been defined before it (there are ways around this, writing recursive calls is in fact easy).

There’s another intriguing property of this stack-based approach: there are no local variables! This might seem like a disadvantage, but it also means that you don’t have to invent little names all the time - the code becomes breath-takingly short because of this. All emphasis should go to defining lots of words with well-chosen action names, each operating on only a few stack entries.

As you can see, Forth code can look a bit unusual to the untrained - C/C++ influenced - eye:

These define the words used to perform low-level I/O pin operations. The line “: io ...” defines “io” as a new function, and text enclosed in parentheses is treated as a comment.

In Forth, there is only one syntax rule: a “word” is anything up to the next whitespace character. Some words can take over and read stuff after them, in which case the effects can be different. That’s why the “\” word can act as a comment: it eats up everything after it until the end-of-line.

These definitions are not so much an implementation (although, that too) as they are about defining a vocabulary and a notation which fits the task at hand - in this case defining I/O pins and getting/setting/flipping their value - “@” is a common idiom for fetches and “!” for stores.

And what to make of this implementation of bit-banged I2C, using the above?

Yes, this will look daunting at first, but keep in mind the concatenative nature of it all. You can essentially ignore most of these lower-level definitions. The example at the bottom illustrates this - all you need to look at, are these two lines, consisting mostly of comment text:

: rtc! ( v reg -- ) \ write v to RTC register
  ... ;
: rtc@ ( reg -- v ) \ read RTC register
  ... ;

The “( reg -- v )” comment documents the “stack effect”, i.e. what this functions expects on the stack (reg) and what it puts back in return (v). Note that in a way, local variable names have crept back in as comments, but only for documentation purposes (and often as a data type, not a name). The code itself still runs off the stack, and is extremely concise because of it.

This is how to read out the seconds value from register 0 of the RTC over I2C and print it out:

0 rtc@ .

Everything else can be ignored - as long as the underlying code works properly, that is!

Is Forth low-level? Is it high-level? It definitely seems able to bridge everything from super-low bare silicon hardware access to the conceptually abstract application-level logic. You decide…

Weblog © Jean-Claude Wippler. Generated by Hugo.