Let's trace, profile, and patch May 2017

Although Mecrisp Forth “compiles” source code to machine language, it’s not really a compiler in the traditional sense, as with say C or C++. In Forth, the compilation process and runtime execution are deeply intertwined. Words like “:”, “if”, “while”, and even “(” + “)” comments are special in the sense that they affect the compilation process, but they’re still “words”, i.e. bits of functionality which can be defined (and re-defined) in Forth itself - even on-the-fly!

Execution traces

This lends itself to all sorts of neat tricks, which can provide new features with surprisingly little extra code. One such example is tracing, here is a tc definition which calls ta and tb:

: ta 2 . ;  ok.
: tb 4 . ;  ok.
: tc 1 . ta 3 . tb  5 . ;  ok.

Here’s the output it normally generates:

tc 1 2 3 4 5  ok.

And here’s what happens when tracing is enabled:

trace-on  ok.
tc    --> ( 1 ) --> tc  Stack: [0 ]  TOS: 42  *>
1    --> ( 2 ) --> ta  Stack: [0 ]  TOS: 42  *>
2    <-- ( 2 ) Stack: [0 ]  TOS: 42  *>
3    --> ( 2 ) --> tb  Stack: [0 ]  TOS: 42  *>
4    <-- ( 2 ) Stack: [0 ]  TOS: 42  *>
5    <-- ( 1 ) Stack: [0 ]  TOS: 42  *>

Every call, and every exit from these words prints out the word and the contents of the data stack at that point in time. Note that the definitions of ta, tb, and tc were not changed!

The magic is performed by the trace.fs code, which needs to be inserted before the above definitions. This code will override the “:” word to insert a call a special trace-entry word. Similarly, a call to trace-exit will be inserted before “;”, which is also re-defined. All words defined after this special trace code will have this trace capability compiled into them.

The actual mechanics of this trick is fairly advanced stuff, but it’s interesting that a tracing facility can be implemented outside the Mecrisp core itself, just using some clever Forth code!

Call profiles

A similar, but even more elaborate trick is performed by profile.fs, which lets us track how often words get executed, and then later on print out these counters:

: ta ;
: tb 0 do ta loop ; Redefine (cnt).  ok.
: tc dup 0 do dup tb loop drop ; Redefine (cnt).  ok.

Given arg N, tc will be called once, tb will be called N times, and tc will be called N * N times:

100 tc  ok.

When calling profile, all these counts will be printed out (in reverse order, and in hex):

200026D4 00000001 tc
2000267C 00000064 tb
20002630 00002710 ta

This is implemented by creating a new (cnt) variable before each new definition and inserting code to increment that variable on each entry. Here are the last words in RAM, latest-to-first:

Address: 200026D4 Link: 200026B8 Flags: 00000000 Code: 200026DE Name: tc
Address: 200026B8 Link: 2000267C Flags: 00000040 Code: 200026C4 Name: (cnt)
Address: 2000267C Link: 20002660 Flags: 00000000 Code: 20002686 Name: tb
Address: 20002660 Link: 20002630 Flags: 00000040 Code: 2000266C Name: (cnt)
Address: 20002630 Link: 20002614 Flags: 00001820 Code: 2000263A Name: ta
Address: 20002614 Link: 200025CC Flags: 00000040 Code: 20002620 Name: (cnt)
Address: 200025CC Link: 2000252C Flags: 00000000 Code: 200025D4 Name: :
Address: 2000252C Link: 200024A4 Flags: 00000000 Code: 2000253A Name: profile

Note that all the variables used to store these call counts have the same “(cnt)” name. This is harmless, since the profiler will iterate through the entire dictionary anyway, while printing out each count and the name of the corresponding word defined right next to it.

Profiling can be a great way to decide which words need to be optimised for top performance, but it will prevent the Mecrisp compiler from applying its constant folding tricks and such. You’re likely to need several iterations to identify and then optimise the key words in your app, while switching back to non-profiled code to measure the exact performance improvements.

Patchable code

The last trick described here takes things one step further still, by making all words runtime-patchable. This addresses an issue which occasionally comes up during development:

Say you have an app with lots of code (i.e. words), and at some late stage, it turns out that there is a bug “somewhere” in there. Let’s also assume that the code won’t fit in RAM, and that it’s a bit inconvenient to re-build (and re-flash) the entire application from scratch each time.

Sometimes, you just want to tinker with one specific word, deep inside your application: to make it print out some values, or to try out some code changes. The problem is that although that word can easily be compiled again, even interactively, all the other words using it will still point to the original code. This is a consequence of Forth’s “code as a stack” dictionary design.

As an example, let’s assume foo is supected to contain a bug and bar depends on it:

: foo ... ;
: bar ... foo ... ;

To try out a modified foo, we have to change it and recompile both foo and bar to see the effect. But what if the rest of the app is built on top and depends even more on foo, directly or indirectly? The benefit of fast interactive compilation won’t be of any help in such a case.

We could do something like this to allow patching after the fact:

: (foo) ... ;
' (foo) variable 'foo
: foo 'foo execute ;
: bar ... foo ... ;

This would continue to work (with a slightly higher call overhead), while also allowing the replacement of the original foo code with a new version:

: foo2 ... ; ' foo2 'foo !

But it gets tedious if you don’t know yet which word may have to be re-routed later on, during debugging, and have to prepare a whole slew of word patches. That’s where routes.fs comes in.

With the routes.fs package installed – just as with trace.fs and profiler.fs – the “:” definition is modified to generate a branch at the start of each word, inserting an extra variable called “(vec)” to allow “vectorising” the word.

So for example, with routes.fs in place, we can define words ta, tb, and tc in the same way as at the start of this article:

: ta 2 . ;  ok.
: tb 4 . ; Redefine (vec). ok.
: tc 1 . ta 3 . tb  5 . ; Redefine (vec). ok.

In normal use, nothing changes:

tc 1 2 3 4 5  ok.

But now we can make changes to words called by tc, as this example illustrates:

: ta' 22 . ; route ta ta' Redefine (vec).  ok.
tc 1 22 3 4 5  ok.
: tb' 44 . ; route tb tb' Redefine (vec).  ok.
tc 1 22 3 44 5  ok.
unroute ta  ok.
tc 1 2 3 44 5  ok.
tb  --> tb'

Note: the “Redefine (vec).” warnings are caused by the fact that the routes.fs code is still overriding the “:” word, so even these new definitions are being made patchable. It’s harmless.

What you can see, is that although tc depends on ta and tb as defined right before it, we can override those two with new definitions, and tc will then happily call the new code patches!

Again, the implementation for this re-routing trick is fairly involved: a new variable is defined, and then each word checks whether that variable is non-zero. If so, it’ll call that value instead (using execute), else the original code is used. The tricky bit is keeping things matched up, and offering a way to set the proper variable to a specified value (even when all are called “(vec)”!).

Tracing, profiling, and re-routing/patching all use similar tricks (and cannot be combined). The impact on code size and performance will be the subject of the next article…

Weblog © Jean-Claude Wippler. Generated by Hugo.