While writing my introduction to threaded code I suggested that non-leaf routines – colon words, in Forth lingo – can be represented as a list of addresses that get executed in sequence. This suggested that all code executes in a straight line and then returns to its caller. We all know this isn’t true, but threaded code that runs in a straight line is already tricky and tedious to explain; explaining literals and control structures at the same time seemed like too much.

But we are here now, and we understand how threading-in-a-straight-line works, so let’s see how loops, branches, and literals work in a threaded world.

In the real world, code doesn’t always execute in a straight line and then return – and yet, at the moment, which the machinery that we’ve defined so far, that’s all we can do. We also have no way to represent literal data – addresses or numbers that we might want to compute with. We can write a code word called + that pops two values off the data stack, adds them, and pushes the result, but if we want to add 8 to something, how do we do that?

With just three runtime code words we can add if/then/else, while loops, and literals to our system.

The names are arbitrary, but in many Forth systems the convention is to give them parenthesized names. These are not words that mere mortals normally execute; they exist as part of the hidden “runtime fabric” of the system. The parentheses suggest their “underlying” or “runtime” nature.

Here they are:

  (lit)      Fetch into W the address pointed to by IP
             Increment IP by address size (skip literal)
             Push W onto the data stack
             Execute NEXT
  (branch)   Fetch into W the address pointed to by IP
             Set IP to W
             Execute NEXT
  (0branch)  Pop a value off the data stack
             If the value is zero, execute (branch)
             Otherwise, increment IP by address size (skip branch address)

The (branch) above does an absolute branch. If you prefer relative branches, modify it thus:

  (branch)   Fetch into W the address pointed to by IP
             Set IP to IP + W
             Execute NEXT

Let’s assume we have a code word + that adds two values. To create a word that adds 8 to whatever is on the top of the stack, we would create a colon word with the following (textual) definition:

  : add8   8 +  ;

which, assuming an ITC system, would get compiled to:

  address of NEST     ( all colon words start with this)
  address of (lit)
  address of +
  address of UNNEST

The ; adds the address of UNNEST at the end and ends the compilation.

(This is getting outside the scope of our discussion, which is about the execution machinery, but in Forth systems ; has two tasks: to compile the address of UNNEST, and to exit “compilation mode” – a special mode of textual interpretation that is the essence of the Forth compiler – and return to normal “interpretation” mode.)

For if/then/else and looping we use flag values – truth values – that we compute and leave on the stack for (0branch) to test and consume.

To make this even more concrete, let’s add five more code words to our system:

  <       Pop two values off the data stack
          If the first is less than the second, push -1
          Otherwise, push 0
          Execute NEXT
  0=      Pop a value off the data stack
          If the value is zero, push -1
          Otherwise, push 0
          Execute NEXT
  dup     Pop a value off the data stack
          Push it back onto data stack
          Push it a second time
          Execute NEXT
  2dup    Pop two values off the data stack
          Push them back onto data stack, in the same order
          Push them a second time, also in the same order
          Execute NEXT
  swap    Pop two values off the data stack
          Push them back onto data stack, in the reverse order
          Execute NEXT

In many Forth systems -1 is the preferred “true” value because it is represented (on almost all machines!!) by an all-ones bit pattern, which can be logically ANDed with other values... but we digress. ;-)

From our definition of (0branch) above it should be clear that any non-zero value is true, and only zero is false.

Let’s write a simple loop. (There is a better, more efficient, and more idiomatic way to do this, but it is outside the scope of this example. ;-)

  -8   begin  1 +  dup 0= until

This will get compiled to the following “address list”:

  00  address of (lit)
  04  -8
  08  address of (lit)
  0c  1
  10  address of +
  14  address of dup
  18  address of 0=
  1c  address of (0branch)
  20  08
  24  ...

We have to dup the value before we test it with 0= because 0= consumes the value before pushing its true or false flag.

The word begin compiles no code; it “marks” the location that (0branch) later branches back to.

I’ve added memory addresses in hex (which assume a byte-addressed machine with 32-bit addresses) to make the branch destination easier to describe.

We write if/then like this:

  2dup <  if  swap  then

This compiles to the following:

  00  address of 2dup
  04  address of <
  08  address of (0branch)
  0c  1c
  10  address of swap
  1c  ...

As before, we 2dup the values because < consumes them.

Like begin, then compiles no code; it resolves the forward branch that if compiled. Words like if, then, begin, and until are called compiling words because they execute at compile time and do something special, like resolve a forward or backward branch address.

Adding an else clause looks like this:

  0= if  4  else  8  then  +

This code compiles to:

  00  address of 0=
  04  address of (0branch)
  08  1c
  0c  address of (lit)
  10  4
  14  address of (branch)
  18  24
  1c  address of (lit)
  20  8
  24  address of +
  28  ...

if compiles the (0branch), else compiles the (branch) and resolves the forward (0branch), and then resolves the forward (branch).

Remember: (0branch) means branch if false.