Forth in 7 easy steps Mar 2016
Forth is a simple language, with simple rules and a simple execution model. It’s also self-hosted and interactive - you can type this at the prompt:
1 2 + .<cr>
What you’ll get as response is this (details may differ, everything below is for Mecrisp Forth):
1 2 + . 3 ok.<lf>
       ^^^^^^^^^^^ added by Forth
All code is parsed into a sequence of “words”, where a word is anything between whitespace. The above code consists of 4 words: 1, 2, +, and . - these words operate on a data stack. Numbers “push themselves” onto the stack, i.e. 1 and 2. The word + is predefined: it replaces the top two stack entries with a single one, their sum. The word . pops the top value off the stack and prints it as a number followed by a space.
Since there are no more words left, Mecrisp then prints ok. followed by a linefeed, and waits for a new line of text to parse, so it can process more words.
That’s it. The essence of Forth. A few more steps like this, and you’ll have the complete story.
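As a slightly longer illustration of the same execution model, here is a sketch that computes (3 + 4) * 5, entered one word per line. The * word (multiplication) and the trailing \ comments are not introduced above; the comments show the assumed stack contents after each word, with the top of the stack on the right.

```forth
3      \ stack: 3
4      \ stack: 3 4
+      \ stack: 7
5      \ stack: 7 5
*      \ stack: 35
.      \ stack: (empty), prints 35
```

Typed on a single line as 3 4 + 5 * . the result is the same: Forth only cares about the sequence of words, not how they are spread over lines.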
To define this as a new
demo word, type this (let’s omit the obvious
<cr> from now on):
: demo 1 2 + . ;
The : word starts a definition and ; ends it. Nothing interesting happens, but now this works:
demo 3 ok.
    ^^^^^^ added by Forth
New words must be defined in terms of existing ones. Here’s another definition:
: JC's-triple-demo demo demo demo ;
Word names can be anything other than whitespace (including UTF-8, and even
2 - but that could lead to major confusion).
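To make the idea of building on existing words concrete, here is a small sketch with two made-up words, square and cube. It relies on dup (duplicate the top stack entry) and * (multiply), both standard words that have not been introduced above.

```forth
: square dup * ;          \ n -> n*n
: cube   dup square * ;   \ n -> n*n*n, defined in terms of square
3 square .                \ prints 9
3 cube .                  \ prints 27
```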
That’s step 2, now you know how to extend the language.
The definition of words is an important mechanism. You’ve already seen the data stack, but there is also a stack for words and their compiled code, which is called the “dictionary” in Forth. New words are added to the end, and words are looked up in reverse order, so that the last one will be used when words are re-defined.
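The reverse lookup order can be seen in a short sketch. The names answer and ask are hypothetical, and Mecrisp may print a notice when a name is redefined.

```forth
: answer 1 ;      \ first definition
: ask answer ;    \ compiles a call to the first answer
: answer 2 ;      \ newer entry with the same name
answer .          \ prints 2: lookup finds the newest entry first
ask .             \ prints 1: ask still calls the code it was compiled with
```

Redefining a word therefore only affects code compiled afterwards; existing definitions keep calling the older version.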
The : word is quite special: it will parse the next word in the input stream and add it as a fresh definition to the dictionary. It also sets a “state” flag to compile mode. And then it returns to the main loop in Forth to process all remaining words.
And here’s The Big Trick, part 1: the main loop will parse more words, but since the state flag is set, it will append calls to these words to the dictionary instead of executing them.
At some point, Forth will need to finish the definition and return to run mode.
The Big Trick, part 2: a word can be marked as “immediate”. When this is the
case, it overrides the state logic in the main loop, and gets executed right
away, even in compile mode. So there’s an immediate
; word in Forth, which
does two things: add a “return” statement to the end of the dictionary, and
reset the state flag to zero.
Immediate words enable magical behaviour in Forth, because they’ll switch back to run mode during compilation. They can do anything (or to be more precise: they are in fact the compiler).
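A minimal sketch of a user-defined immediate word, assuming that immediate, placed right after a definition, flags the most recent word (the usual Forth convention; details may differ between Forths). The names note and demo2 are made up, and the print-string word used here is only explained further down.

```forth
: note ." (compiling) " ; immediate   \ hypothetical word, flagged as immediate
: demo2 note ." (running) " ;         \ note executes here, while demo2 is compiled
demo2                                 \ prints (running)
```

The text (compiling) shows up once, when demo2 is defined, and never when demo2 is called: note ran in the middle of compilation and left no call behind in the dictionary.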
That’s step 3. This is how Forth unifies “run mode” and “compile mode”.
Forth has conditionals and loops. Here’s a rewritten version of the above:
: looping-demo 3 0 do demo loop ;
This should be readable by now: 3 and 0 just get added to the stack, do pops them as loop limits (in funky reverse order), then comes the loop body, then loop, which presumably knows how to count and repeat, and the closing semicolon to finish the definition of looping-demo.
It should come as no surprise that do and loop are immediate words. They append code to the dictionary to implement the do loop and use the data stack (while in compile mode) to track branch offsets. There’s also a word called i to push the current loop value on the stack. As you can see, common words in Forth tend to have very short names.
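Here is a sketch of the loop index word i in action. The names count-demo and sum-below are hypothetical, and swap (exchange the top two stack entries) is a standard word not covered above.

```forth
: count-demo 5 0 do i . loop ;                  \ i runs from 0 up to, not including, 5
count-demo                                      \ prints 0 1 2 3 4
: sum-below ( n -- sum ) 0 swap 0 do i + loop ; \ add up 0 .. n-1
5 sum-below .                                   \ prints 10
```

In sum-below the running total stays on the data stack underneath the loop limits, which do removes before the loop body runs.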
This example loops until a key is pressed (key? pushes a flag on the stack, which until then tests):
: boring-demo begin demo key? until ;
Since do, begin, etc. generate code, they can only be used inside word definitions. You can’t use them interactively, i.e. in run mode, but you can enter a definition and call it, all on one line.
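A begin ... until loop does not need a keyboard to be useful. The following sketch counts down from a number passed on the stack; countdown is a made-up name, and dup, drop, 1- (subtract one) and 0= (test for zero) are standard words not introduced above.

```forth
: countdown ( n -- )
  begin
    dup .        \ print the current value, keeping a copy
    1-           \ decrement it
    dup 0=       \ flag for until: true once we reach zero
  until
  drop ;         \ discard the leftover zero
3 countdown      \ prints 3 2 1
```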
Immediate words are also used to implement if, else, then, and several other jump-based words. The funky order of if ... else ... then takes a little getting used to, but it’s trivial stuff:
: even-demo 123 2 mod 0 = if ." EVEN!" then ;
: even-odd-demo 123 2 mod 0 = if ." EVEN!" else ." ODD!" then ;
Several new words have been used here. The mod word is used to calculate the remainder of a division, = will compare two values, and ." ..." prints a string. Note the space after the opening quote: in Forth, everything is a word, so the print-string word is called ." and must be enclosed in spaces. Note also that the closing quote does not need a space in front, because the ." word plays special tricks with parsing.
Such unconventional syntax details come from the fact that Forth uses stacks and treats everything as words - the price of a simple uniform data + parsing model.
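The demos above test a fixed number. As a sketch of the more typical pattern, the following variant takes its input from the stack and splits the test from the printing; even? and parity are hypothetical names.

```forth
: even? ( n -- flag ) 2 mod 0 = ;                        \ leaves a true/false flag
: parity ( n -- ) even? if ." even" else ." odd" then ;  \ if consumes the flag
4 parity    \ prints even
7 parity    \ prints odd
```

The if word pops the flag left by even? and picks a branch; then merely marks where both branches join up again.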
Congratulations, this was step 4, with a little peek inside the compiler!
Words can call other words, and do loops can be nested (it might help to view do loops as a special way to “call” their body repeatedly).
This nesting is actually what all other languages also have, using a “return” stack. In Forth, the return stack is separate from the data stack. This is what gives the language its concatenative properties (a term coined decades after Forth was invented).
When a defined word is executed, it pushes the current instruction pointer on the return stack, and starts executing its own code. When it returns, it pops the instruction pointer back off the return stack and resumes where it left off. Do loops also use the return stack to store some state.
In day-to-day use, the data and return stacks are the only ones that matter. The dictionary (i.e. code stack) and a stack to allocate RAM variables from can safely be ignored most of the time.
What about other data? Here is a constant and a variable definition:
123456789 constant MY-CONST
987654321 variable my-var
Constants are just that, they can be used wherever a value is needed, and push their value onto the data stack when executed. The convention is to write them in uppercase, but Forth is case insensitive (for ASCII characters, not for UTF-8!), so it won’t matter during use.
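A short sketch of a constant used inside a definition; MS-PER-SEC and sec>ms are made-up names. The last line relies on the case insensitivity described above and may fail on a case-sensitive Forth.

```forth
1000 constant MS-PER-SEC
: sec>ms ( u1 -- u2 ) MS-PER-SEC * ;   \ seconds to milliseconds
3 sec>ms .                              \ prints 3000
ms-per-sec .                            \ prints 1000 (same word, different case)
```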
A variable pushes its address. To fetch (and then print) the value, you need to use the @ word:
my-var @ . 987654321 ok.
          ^^^^^^^^^^^^^^^ added by Forth
To store a value, there’s the
! word, which expects a value and an address on the stack:
123 my-var ! my-var @ . 123 ok.
                       ^^^^^^^^^ added by Forth
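Fetch and store combine naturally into a read-modify-write sequence. The sketch below keeps a counter in a variable; counter and bump are hypothetical names, 1+ (add one) is a standard word not introduced above, and the initial value in front of variable follows the Mecrisp convention shown earlier (many other Forths expect no initial value there).

```forth
0 variable counter                       \ Mecrisp style: initial value first
: bump ( -- ) counter @ 1+ counter ! ;   \ fetch, add one, store back
bump bump bump
counter @ .                              \ prints 3
```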
For allocating larger memory areas in RAM, there’s the buffer: word:
200 buffer: my-buffer
This sets aside a 200-byte word-aligned area in RAM. It remains available
as long as
my-buffer is in the dictionary. Executing
my-buffer will push
its buffer address on the (data) stack.
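Since a buffer is just an address, its contents are accessed with fetch and store words plus some address arithmetic. This sketch assumes the standard byte-wide words c! and c@, which are not covered above; scratch is a made-up name.

```forth
16 buffer: scratch
65 scratch c!          \ store the byte 65 at the start of the buffer
66 scratch 1+ c!       \ store 66 one byte further along
scratch c@ .           \ prints 65
scratch 1+ c@ .        \ prints 66
```

Nothing checks the bounds: storing at scratch 16 + or beyond would silently overwrite whatever lives next in RAM.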
You’ve made it through step 5, now you know all about stacks and memory.
As you can imagine, there are a large number of words in Forth, each with its own behaviour and stack effect. Inline comments between words called ( and ) are normally used to document a word, followed by a \ comment about what it does. If ! were defined in Forth (it isn’t, it’s a primitive), it could have been documented as follows:
: ! ( u|n a-addr -- ) \ Stores single number in memory
  ... ;
u|n means: an unsigned or signed integer, and
a-addr means aligned
address. Everything before the
-- is what is expected as stack input,
everything after is the stack result (nothing in this example). These are just
comments and conventions, Forth will skip all that.
Similarly, + might have been defined as:
: + ( u1|n1 u2|n2 -- u3|n3 ) ... ; \ Addition
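The same conventions apply to your own definitions. Here is a sketch of a fully documented word; average is a hypothetical name, and / (integer division) is a standard word not introduced above.

```forth
: average ( n1 n2 -- n3 ) \ Integer mean of two numbers
  + 2 / ;
10 20 average .           \ prints 15
```

Reading only the first line is enough to use the word: two numbers go in, one comes out.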
Many words affect only the data stack, but a few mess with the return stack. Like so:
: >r ( x -- ) ( R: -- x ) ... ;
What this says is: the value on the data stack before calling >r will end up on the return stack afterwards. So >r moves an item from the data stack to the return stack (better get it off again with r> or rdrop before the current word returns, else the code will probably crash!).
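A minimal sketch of balanced return-stack use, assuming the standard r> word (move an item back from the return stack to the data stack). The word add-under is a made-up name; the comments track both stacks after each step.

```forth
: add-under ( n1 n2 x -- n1+n2 x ) \ Add the two items below the top
  >r      \ park x on the return stack      data: n1 n2     R: x
  +       \ add the two remaining items     data: n1+n2     R: x
  r> ;    \ bring x back before returning   data: n1+n2 x   R: (empty)
10 20 3 add-under . .   \ prints 3 30
```

Every >r is matched by an r> within the same definition, so the return address is back on top of the return stack by the time ; executes.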
These stack effect comments are a critical part of the documentation of each word, since there are usually no local variables in Forth.
Here is the glossary of the pre-defined words in Mecrisp Forth. There are a few hundred of them, but no worries: you can explore and gradually expand your vocabulary - only a fraction of these are needed to start programming in Forth.
Yeay, step 6 - you’re all set to build up your Forth vocabulary!
The last step that remains is to try things out and look for examples and more documentation on the web. See the “Dive into Forth” series, part one, two, and three for a recent exploration here at JeeLabs. The PWM module (documented here) shows one example of how to implement a hardware feature in Forth.
A last point to make is that Forth lives extremely “close to the metal”. Any suggestion of high-level coding is purely smoke-and-mirrors. It has just enough machinery to be reasonably useful, and to let you compile and extend it with more definitions to do whatever you need.
And there you have it: Forth in sixteen hundred, ehm… words. Hopefully this intro can help you wrap your mind around an intriguingly powerful and concise programming language.