While writing my introduction to threaded code I suggested that non-leaf routines – colon words, in Forth lingo – can be represented as a list of addresses that get executed in sequence. This suggested that all code executes in a straight line and then returns to its caller. We all know this isn’t true, but threaded code that runs in a straight line is already tricky and tedious to explain; explaining literals and control structures at the same time seemed like too much.
But we are here now, and we understand how threading-in-a-straight-line works, so let’s see how loops, branches, and literals work in a threaded world.
In the real world, code doesn’t always execute in a straight line and then return – and yet, at the moment, which the machinery that we’ve defined so far, that’s all we can do. We also have no way to represent literal data – addresses or numbers that we might want to compute with. We can write a code word called +
that pops two values off the data stack, adds them, and pushes the result, but if we want to add 8 to something, how do we do that?
With just three runtime code words we can add if/then/else, while loops, and literals to our system.
The names are arbitrary, but in many Forth systems the convention is to give them parenthesized names. These are not words that mere mortals normally execute; they exist as part of the hidden “runtime fabric” of the system. The parentheses suggest their “underlying” or “runtime” nature.
Here they are:
(lit) Fetch into W the address pointed to by IP Increment IP by address size (skip literal) Push W onto the data stack Execute NEXT
(branch) Fetch into W the address pointed to by IP Set IP to W Execute NEXT
(0branch) Pop a value off the data stack If the value is zero, execute (branch) Otherwise, increment IP by address size (skip branch address)
The (branch)
above does an absolute branch. If you prefer relative branches, modify it thus:
(branch) Fetch into W the address pointed to by IP Set IP to IP + W Execute NEXT
Let’s assume we have a code word +
that adds two values. To create a word that adds 8 to whatever is on the top of the stack, we would create a colon word with the following (textual) definition:
: add8 8 + ;
which, assuming an ITC system, would get compiled to:
add8 ~~~~ address of NEST ( all colon words start with this) address of (lit) 8 address of + address of UNNEST
The ;
adds the address of UNNEST at the end and ends the compilation.
(This is getting outside the scope of our discussion, which is about the execution machinery, but in Forth systems ;
has two tasks: to compile the address of UNNEST, and to exit “compilation mode” – a special mode of textual interpretation that is the essence of the Forth compiler – and return to normal “interpretation” mode.)
For if/then/else and looping we use flag values – truth values – that we compute and leave on the stack for (0branch)
to test and consume.
To make this even more concrete, let’s add five more code words to our system:
< Pop two values off the data stack If the first is less than the second, push -1 Otherwise, push 0 Execute NEXT
0= Pop a value off the data stack If the value is zero, push -1 Otherwise, push 0 Execute NEXT
dup Pop a value off the data stack Push it back onto data stack Push it a second time Execute NEXT
2dup Pop two values off the data stack Push them back onto data stack, in the same order Push them a second time, also in the same order Execute NEXT
swap Pop two values off the data stack Push them back onto data stack, in the reverse order Execute NEXT
In many Forth systems -1 is the preferred “true” value because it is represented (on almost all machines!!) by an all-ones bit pattern, which can be logically ANDed with other values... but we digress. ;-)
From our definition of (0branch)
above it should be clear that any non-zero value is true, and only zero is false.
Let’s write a simple loop. (There is a better, more efficient, and more idiomatic way to do this, but it is outside the scope of this example. ;-)
-8 begin 1 + dup 0= until
This will get compiled to the following “address list”:
00 address of (lit) 04 -8 08 address of (lit) 0c 1 10 address of + 14 address of dup 18 address of 0= 1c address of (0branch) 20 08 24 ...
We have to dup
the value before we test it with 0=
because 0=
consumes the value before pushing its true or false flag.
The word begin
compiles no code; it “marks” the location that (0branch)
later branches back to.
I’ve added memory addresses in hex (which assume a byte-addressed machine with 32-bit addresses) to make the branch destination easier to describe.
We write if/then like this:
2dup < if swap then
This compiles to the following:
00 address of 2dup 04 address of < 08 address of (0branch) 0c 1c 10 address of swap 1c ...
As before, we 2dup
the values because <
consumes them.
Like begin
, then
compiles no code; it resolves the forward branch that if
compiled. Words like if, then, begin, and until are called compiling words because they execute at compile time and do something special, like resolve a forward or backward branch address.
Adding an else
clause looks like this:
0= if 4 else 8 then +
This code compiles to:
00 address of 0= 04 address of (0branch) 08 1c 0c address of (lit) 10 4 14 address of (branch) 18 24 1c address of (lit) 20 8 24 address of + 28 ...
if
compiles the (0branch)
, else
compiles the (branch)
and resolves the forward (0branch)
, and then
resolves the forward (branch)
.
Remember: (0branch)
means branch if false.