Let's trace, profile, and patch May 2017
Although Mecrisp Forth “compiles” source code to machine language, it’s not
really a compiler in the traditional sense, as with say C or C++. In Forth, the
compilation process and runtime execution are deeply intertwined. Words like
while”, and even “
(” + “
)” comments are special in the
sense that they affect the compilation process, but they’re still “words”,
i.e. bits of functionality which can be defined (and re-defined) in Forth
itself - even on-the-fly!
This lends itself to all sorts of neat tricks, which can provide new features
with surprisingly little extra code. One such example is tracing, here is a
definition which calls
: ta 2 . ; ok. : tb 4 . ; ok. : tc 1 . ta 3 . tb 5 . ; ok.
Here’s the output it normally generates:
tc 1 2 3 4 5 ok.
And here’s what happens when tracing is enabled:
trace-on ok. tc --> ( 1 ) --> tc Stack: [0 ] TOS: 42 *> 1 --> ( 2 ) --> ta Stack: [0 ] TOS: 42 *> 2 <-- ( 2 ) Stack: [0 ] TOS: 42 *> 3 --> ( 2 ) --> tb Stack: [0 ] TOS: 42 *> 4 <-- ( 2 ) Stack: [0 ] TOS: 42 *> 5 <-- ( 1 ) Stack: [0 ] TOS: 42 *> ok.
Every call, and every exit from these words prints out the word and the contents
of the data stack at that point in time. Note that the definitions of
tc were not changed!
The magic is performed by the
code, which needs to be inserted before the above definitions. This code will
override the “
:” word to insert a call a special
Similarly, a call to
trace-exit will be inserted before “
;”, which is also
re-defined. All words defined after this special trace code will have this
trace capability compiled into them.
The actual mechanics of this trick is fairly advanced stuff, but it’s interesting that a tracing facility can be implemented outside the Mecrisp core itself, just using some clever Forth code!
A similar, but even more elaborate trick is performed by profile.fs, which lets us track how often words get executed, and then later on print out these counters:
: ta ; : tb 0 do ta loop ; Redefine (cnt). ok. : tc dup 0 do dup tb loop drop ; Redefine (cnt). ok.
Given arg N,
tc will be called once,
tb will be called N times, and
tc will be called N * N times:
100 tc ok.
profile, all these counts will be printed out (in reverse order,
and in hex):
profile 200026D4 00000001 tc 2000267C 00000064 tb 20002630 00002710 ta ok.
This is implemented by creating a new
(cnt) variable before each new definition
and inserting code to increment that variable on each entry. Here are the
last words in RAM, latest-to-first:
words Address: 200026D4 Link: 200026B8 Flags: 00000000 Code: 200026DE Name: tc Address: 200026B8 Link: 2000267C Flags: 00000040 Code: 200026C4 Name: (cnt) Address: 2000267C Link: 20002660 Flags: 00000000 Code: 20002686 Name: tb Address: 20002660 Link: 20002630 Flags: 00000040 Code: 2000266C Name: (cnt) Address: 20002630 Link: 20002614 Flags: 00001820 Code: 2000263A Name: ta Address: 20002614 Link: 200025CC Flags: 00000040 Code: 20002620 Name: (cnt) Address: 200025CC Link: 2000252C Flags: 00000000 Code: 200025D4 Name: : Address: 2000252C Link: 200024A4 Flags: 00000000 Code: 2000253A Name: profile [...]
Note that all the variables used to store these call counts have the same
(cnt)” name. This is harmless, since the profiler will iterate through the
entire dictionary anyway, while printing out each count and the name of the
corresponding word defined right next to it.
Profiling can be a great way to decide which words need to be optimised for top performance, but it will prevent the Mecrisp compiler from applying its constant folding tricks and such. You’re likely to need several iterations to identify and then optimise the key words in your app, while switching back to non-profiled code to measure the exact performance improvements.
The last trick described here takes things one step further still, by making all words runtime-patchable. This addresses an issue which occasionally comes up during development:
Say you have an app with lots of code (i.e. words), and at some late stage, it turns out that there is a bug “somewhere” in there. Let’s also assume that the code won’t fit in RAM, and that it’s a bit inconvenient to re-build (and re-flash) the entire application from scratch each time.
Sometimes, you just want to tinker with one specific word, deep inside your application: to make it print out some values, or to try out some code changes. The problem is that although that word can easily be compiled again, even interactively, all the other words using it will still point to the original code. This is a consequence of Forth’s “code as a stack” dictionary design.
As an example, let’s assume
foo is supected to contain a bug and
: foo ... ; : bar ... foo ... ;
To try out a modified
foo, we have to change it and recompile both
bar to see the effect. But what if the rest of the app is built on top and
depends even more on
foo, directly or indirectly? The benefit of fast
interactive compilation won’t be of any help in such a case.
We could do something like this to allow patching after the fact:
: (foo) ... ; ' (foo) variable 'foo : foo 'foo execute ; : bar ... foo ... ;
This would continue to work (with a slightly higher call overhead), while also
allowing the replacement of the original
foo code with a new version:
: foo2 ... ; ' foo2 'foo !
But it gets tedious if you don’t know yet which word may have to be re-routed later on, during debugging, and have to prepare a whole slew of word patches. That’s where routes.fs comes in.
routes.fs package installed – just as with
profiler.fs – the “
:” definition is modified to generate a branch at the
start of each word, inserting an extra variable called “
(vec)” to allow
“vectorising” the word.
So for example, with
routes.fs in place, we can define words
tc in the same way as at the start of this article:
: ta 2 . ; ok. : tb 4 . ; Redefine (vec). ok. : tc 1 . ta 3 . tb 5 . ; Redefine (vec). ok.
In normal use, nothing changes:
tc 1 2 3 4 5 ok.
But now we can make changes to words called by
tc, as this example
: ta' 22 . ; route ta ta' Redefine (vec). ok. tc 1 22 3 4 5 ok. : tb' 44 . ; route tb tb' Redefine (vec). ok. tc 1 22 3 44 5 ok. unroute ta ok. tc 1 2 3 44 5 ok. routes tb --> tb' ok.
Note: the “
Redefine (vec).” warnings are caused by the fact that the
is still overriding the “
:” word, so even these new definitions are being made
patchable. It’s harmless.
What you can see, is that although
tc depends on
tb as defined
right before it, we can override those two with new definitions, and
tc will then
happily call the new code patches!
Again, the implementation for this re-routing trick is fairly involved: a new variable is defined,
and then each word checks whether that variable is non-zero. If so, it’ll
call that value instead (using
execute), else the original code is used. The
tricky bit is keeping things matched up, and offering a way to set the proper
variable to a specified value (even when all are called “
Tracing, profiling, and re-routing/patching all use similar tricks (and cannot be combined). The impact on code size and performance will be the subject of the next article…