This file is out of date. It describes an earlier version of muforth, based on a simple native-code compiler for the Intel x86 architecture. Some of the ideas live on in the current, threaded version, so it’s still worth reading.

Why muforth?

Why write another Forth when there are so many around? Because everyone else’s Forth is wrong. ;-)

First of all, muforth is most emphatically not ANS-compatible. It would be silly to write another when there are perfectly good ones available. I wanted instead to build a Forth with specific features, several of which are shamelessly stolen from Chuck Moore’s colorforth.

I think I started with the question, “What would colorforth look like without color?” and went from there. I wanted a substrate for experiments, a good metacompiler, and something small and simple.

Why the name?

From startup.mu4:

The idea is to move as much code as possible out of the Forth kernel. Hence the name: “mu” is the Greek letter often used in engineering to represent “micro”. I had called it “nu” Forth, because it was new, but I like this nu Greek letter better.

Why another Forth?

In other words, why not keep using dforth, an earlier Linux Forth I wrote that I have used successfully on several projects?

dforth has some qualities that I now construe as defects. dforth

I wanted to go the other way. My desiderata included the following:

Map of the territory

The following sections are not really in any order. Some lead into the next one. Some are just the thing I thought of next as I was writing. In any case, if you get bored reading one section, just skip to the next one. It’s bound to be more interesting!

Written in C

muforth is written in C. This was a hard decision, and involved some difficult tradeoffs.

When implementing dforth, I initially wanted to write it in C; but I was also committed to doing an ITC Forth, and doing that in C I found to be very clumsy. I bit the bullet and wrote in assembler, which was quite liberating. The only real downside was the frustration of using m4 as a macro processor – I needed more than the C preprocessor could give me, but I grew to hate m4. I found using it to be a case of programming by trial-and-error, rather than something on which I could bring knowledge and experience to bear.

This time around I was willing to give on a few things in order to be able to write in C.

Of course, muforth is far from architecture-neutral, as it contains a native code compiler that compiles little chunks of x86 machine code. This would need to be changed to run on other architectures. I tried to keep it simple, so this shouldn’t be hard. A few hours’ work, perhaps, once you understand the structure of muforth.

There are several places where writing in C involved compromise or cleverness; I’ll try to talk about them in their structural context. The main concerns:

Since I wanted to be able to write, in C, several utility routines that I to call from Forth, I decided to write much of the kernel of muforth using a Forth calling convention. I wrote such “shareable” words as functions with void arguments returning void, and they “manually” get parameters from, and push their results onto, the Forth stack – which is simply a C array of ints.

Any routine that uses the Forth stack (rather than the C stack) in this way has its name prefixed by “mu_”. By putting an entry for a “mu_” function in one of the arrays that fill the initial dictionary chains for .forth. and .compiler. it is possible to call the function as a word from Forth.

By doing things this way I was able to write, only once and in C, words that otherwise I would have to rewrite in Forth. I wanted to avoid such an error-prone and ugly approach; the cost that I paid was to be forced to generate crappy native code, since the C compiler dictated my use of registers, and in particular refused to allow me to put important global variables in registers.

Another side effect of this decision is that any code that calls into the C library needs to act like a glue function between the two stacks. There are some examples of this in tty.c and time.c. It’s pretty simple – a routine simply moves values between the two stacks as needed. Since the return stack is shared, and since there is no “execution engine” that needs to be fired up on re-entry to Forth – unlike a threaded Forth, which requires this – the only “stack convention” to worry about making sure the right values get moved between the two stacks.

The only place in the code that needs to “worry” about calling conventions is the native compiler code. It needs to know, in particular, which registers the C compiler treats as caller-saved, and which callee-saved, so it can generate code that won’t clobber any of its caller’s values, and, likewise, that its caller won’t clobber. I determined this largely through a trial-and-error process of writing useless and bizarre C code that consumed lots of registers and made lots of calls, but didn’t do anything, and then looking at the assembler output generated by the C compiler. Because C and Forth can call each other “seamlessly”, my code had to play nicely with C’s ideas about register usage.

Oddities and idiosyncrasies

muforth has many differences from more conventional (old-school?) Forths. As I said, it’s emphatically not ANS-compatible. Instead I tried to choose and use the best ideas I could find. Many of them are, perhaps not surprisingly, from two Forths written by Chuck Moore: cmFORTH (for the Novix chip) and colorforth (for the x86 PC).

Though I’ve chosen to be intentionally incompatible, I haven’t done so to be difficult or contrarian, but instead because being compatible would have forced me to give up much of what I wanted to accomplish.

Ideas from cmFORTH

cmFORTH’s main innovation, in my view, was to get rid of IMMEDIATE words. As any Forth old-timer knows, IMMEDIATE words are executed at compile time. But they are also executed at “interpret” time, which causes no end of trouble and leads to horrendously strained games being played with the names of words.

Instead of IMMEDIATE words, cmFORTH has an entirely separate set of words in the dictionary. There are two sets – I call them “chains” – forth and compiler. The compiler words play the role of IMMEDIATE words in cmFORTH and muforth.

In fact, the same idea exists in colorforth, but Chuck changed the names. There is a lot more explanation of this in the section on the dictionary.

Since you need a way to force the compilation, rather than execution, of compiler words, Chuck defined the word \ that searches in the compiler chain for a token, compiles it if found, and complains if not found.

I call this word \c (compile from compiler chain); and I define \ to be more like ANS’s POSTPONE. There is much more explanation of this in the section on the funny compiler word called \.

Ideas from colorforth

In colorforth the color of a word specifies what the interpreter should do with it. Red means define a new word; the token in red is the name of the new word. Yellow means execute; green means compile. There are some other colors as well, but for our discussion these are the most interesting.

In order to be able to calculate constants inside a colon definition, you need only switch to yellow, do the calculation, and then switch back to green. At the yellow-to-green transition the compiler generates a literal with the value of the calculation done by the yellow words.

How was this done “traditionally”? Like this:

  : blog  [ 25 80 * ] literal + ;

The word literal is a compiler word, which gets executed at compile time; it compiles a literal from the value on the top of the stack – in this case, the result of 25 80 *.

But that literal is ugly. In colorforth it goes away. I wanted to make it go away in muforth as well. So what did I do? I made ] always generate a literal. If you need to jump out of the compiler (using [), do something, and then jump back, you can use -]. It doesn’t create a literal.

Our example above becomes

  : blog  [ 25 80 * ] + ;

which is much nicer. A few neat examples, from startup.mu4:

  : 0   [ dup dup xor ] ;
  : -1  [ 0 invert ] ;
  : 1   [ -1 negate ] ;

A side effect is that literal is no longer a compiler word. Instead of being used inside normal Forth words to compile literals, it is instead used inside compiler words, but is itself a normal Forth word. In both cases it runs at compile time, but the mechanism has changed – it now finds itself inside compiler words instead of being one itself.

Since there isn’t really any compiler “state” in colorforth – each word tells the interpreter, by its color, how to treat it – there is no need to bracket definitions with : and ;. A red word marks the start of a new definition; nothing really marks the end. In fact, it’s possible, in colorforth, to write a word whose execution “falls thru” into the next word. This is considered bad style by some, but assembler programmers over the years have used it to great advantage. colorforth gives you this option. (Heh heh. So does muforth.)

If we don’t need to mark the start and end of a word with : and ;, why is there a word called ; in colorforth? What does it do? It exits from, or returns from, the current word, but without it any way marking it as “done”. It is like EXIT in more traditional Forths.

I coveted these qualities for muforth, but had the constraint that my compiler does have state, and does switch back and forth between interpreting and compiling. So I need ; to end my words, like it always has. So I needed two new words:

The first I called ^ to capture the idea of jumping up out of the word. The second I already had. It’s called [, and is used, as explained above, to “temporarily” exit the compiler so that we can do a calculation or something. Great! That’s all we need. Since we do not smudge or hide newly defined words, the only action that needs to be taken to exit the compiler is to switch from compile to interpret state. This is exactly what [ does: it stops the compiler, but doesn’t compile a “return” (or convert a tail-call to a jump).

Then there is the usual ; which does both these things. In fact, it simply calls one (^) and then the other ([). Its definition is basically:

  : ;  \ ^  \ [  ;

where \ is used to force the compilation of a compiler word.

colorforth has the oddball feature that if leaves the top of stack intact. This can be useful or annoying. I thought it would be interesting to have both options – destructive and non- – so I defined =if and =while in addition to if and while. if and while consume the top of stack; =if and =while leave it alone.

And, just as in colorforth, there is no else in muforth! You can always get by without it, and often the code factoring improves. For example, you can rewrite

  <test> if  a  else  b  then  c ;


  <test> if  a c ^  then  b c ;

colorforth doesn’t “smudge” (hide) new definitions until they are complete – and neither does muforth. This makes writing recursive definitions easy, but makes redefinition harder (you have to rename the old function first).

colorforth lacks the looping words begin, while, until, and repeat. (It retains for and next.) In colorforth, to write a loop, you simply “call” a word from within itself. For this to work, colorforth has to be tail-recursive. So muforth is as well. I liked the idea of being able to write iterations as syntactically recursive procedures, as in Lisp; however, I haven’t used it much. ;-) See the section on the joy of tail-recursion for more details.


As part of my effort to strip everything out of the kernel of muforth that is not necessary for bootstrapping – keeping only what is necessary to make colon definitions – I stripped number parsing, number formatting, and most i/o out of the kernel.

There aren’t even any constants defined in muforth! I calculate them, starting with 0, in startup.mu4. From there I get some basic constants I need for deciding if a character is a digit, and at that point I can write the number parsing code.

The number parsing code has some flaws, and isn’t that different from what you’d find in F83 or an ANS Forth. The main thing I added was radix prefixes. Even though you’re in, say, hex, you can enter a number in any of 4 radices by prefixing it with a single character:

  "  -> hex
  '  -> octal
  #  -> decimal
  %  -> binary

This can be a great headache saver.


muforth is like an elephant; it never forgets. ;-)

muforth does not define the common Forth word forget, nor does it define empty, as colorforth does. In the presence of deferred words, any kind of forget is error-prone. I find it better to quit and reload. This has never bothered me, or felt limiting in any way. And it certainly makes memory allocation easier. ;-)


I struggled with how to define and use strings in muforth. I really like length-prefixed strings, but using them with C is a pain. After several false starts I arrived at a simple structure for strings that solves not only the problem of C compatibility, but a host of other problems related to having strings “inline” with code – the traditional Forth way.

Since muforth is intended to run on modern machines with lots of caches, it seemed important to separate code and data. This affects not only strings but also create/does>. I’ll talk about create/does> in a later section.

Except in a few places – throw comes to mind – a string in muforth is represented on the stack by a tuple: (start, length). start points to the first character; length is the count of characters. The string may or may not be zero-terminated; no muforth code depends on the presence of such termination.

But what about saving strings for later use? Now we have to have a structure for strings in memory, and we’d like it to be C compatible, since maybe the muforth word we’re calling is going to call into the C library. Maybe we’re opening a file by name.

My conclusion was to compile the tuple

  (length, string, termination, padding).

length is a cell-sized count, so very large compiled strings are possible. string is an array of the characters. termination is a single null character – always present in compiled strings! And padding gets us to a cell boundary.

Note that it’s possible to “compile” a string but not allot the space for it. This is useful for temporary strings that need to be zero-terminated, such as filenames to be passed down into the C library. Unfortunately it’s easy to clobber the string thusly compiled. I used to have “ld” use a compiled but not alloted string; now it allots it too, so even “temporary” filename strings use up data space. Oh well. ^C, up-arrow, return. ;-)

Unlike traditional Forths, I compile the string into “data” space, not code space. Then I compile a literal that at runtime pushes the address, not of length, but of string. This way I can easily pass it to C code, and I can just as easily convert it to a canonical muforth (start, length) string by executing count:

  : count  ( z" - a u)  dup  cell- @ ;

z” here is a mnemonic for zero-terminated, length-prefixed, compiled string.

One of my subgoals with muforth was to have only one kind of literal. Forths traditionally have had several different kinds of string literals, all of them inline (thereby possibly causing cache problems by mixing code with data) and therefore requiring strange and convoluted code to “jump over” the inline string in order to continue execution. By converting string literals to ordinary literals, lots of problems go away, and the use of strings now becomes neatly postfix as well. If I want a literaled compiled string to push (start, length) I simply follow the literal with a call to count. Look for “string,” in startup.mu4 and you’ll see how it all works.

Note also that this kind of literal can be used for any arbitrary data structure, not just strings; it defines a universal literal. What is a clump of data but a (start, length) tuple? You can compile any kind of data this way, make a literal of it as a counted string, and then call code that knows how to pull bits out of it and do something with them.

Digression about tokenizing, strings and suffixes

Before digging into the interesting details about how muforth’s tokenizer works I want to make a digression that I hope will shed some light on the odd data structure that I use to capture the state of the tokenizer. I should say that these ideas came after I had settled on the structure and written the code – long after. I wanted to find a “formal” (or semi-formal ;-) explanation of the data structure and why it is well-suited to tokenizing. While walking today, I think I found it.

Think about what a tokenizer does. It looks at a string, a character at a time, classifying each character as a delimiter or token character. Its job is to (possibly) skip leading delimiters, collect a sequence of token characters, and (possibly) consume the delimiter following the last token character. It returns a (start, length) string representing the token. Then it somehow “remembers” which characters it has seen, so that the next time it gets called it will skip over them, rather than starting over at the start of the string.

It peels a character off the front, does something with it, and sets aside the rest. At the lowest level, then, a tokenizer looks at successively shorter suffixes of its input string. This is the main insight that leads to our data structure. But before we get there, let’s look at what is commonly done.

Almost all Forths represent the string to be tokenized as a (start, length) tuple, and the count of characters already consumed by the tokenizer as an offset, commonly called >in. When >in is zero, we have consumed none of the string; when >in == length, we have consumed the entire string. If we think of >in as the length of the prefix of the string already consumed, then (length – >in) is the length of the suffix yet to be consumed. (Invariant: len(prefix) + len(suffix) == length.)

We should note that using (start, length) to represent successive suffixes is clumsy. Here are a few such successive suffixes:

  (start, length)          the whole string
  (start + 1, length - 1)  .. minus the first character
  (start + 2, length - 2)  .. minus the first two characters
  (start + length, 0)      the empty suffix

For each successive suffix, as >in goes from zero to length, it is added to start, and subtracted from length.

If we were looking at successive prefixes this would be a great way to represent things;

  (start, length)          the whole string
  (start, length - 1)      .. minus the last character
  (start, length - 2)      .. minus the last two characters
  (start, 0)               the empty prefix

But that’s not what a tokenizer does, so why use this representation? (I’ve heard that at least one Algol 60 compiler read its input backwards, which somehow made parsing easier and more elegant. In that situation some variant of this successive prefixes method might nicely apply.)

Why not instead put our anchor point at the end, which is the same for all suffixes of the original string? Then we would represent the start of the suffix as an offset from the end, like this (setting end = start + length):

  (end, -length)          the whole string
  (end, -length + 1)      .. minus the first character
  (end, -length + 2)      .. minus the first two characters
  (end, 0)               the empty suffix

Note that only the second one part of the tuple is changing. This should be a clue that this representation is a better match for suffixing than (start, length). In fact, if we compare this representation of suffixes with using (start, length) to represent successive prefixes, we see that they are duals of each other. In both cases only the second part of the tuple changes; the anchor remains the same. The second part goes from either length to zero (prefix) or -length to zero (suffix).

Practically this means that there is less work to do on each call to the tokenizer, since we already know which part of the text we’ve seen (the prefix that we’ve “thrown away”), and less work to do for each character, since we are only changing (and checking for exhaustion) one part of our data structure.

With that introduction, we are ready for...

Tokenizing the input

This is one of the cool innovations in muforth.

In muforth I define a notional string type, called “text”, that is better suited than normal (start, length) strings to tokenizing. I say “notional” because it never shows up in the language, only in the C implementation. A text is something a bit odd. I’ll explain.

I discovered, in writing the tokenizer for dforth, that using a string (start, length) and an offset (past what we have seen already) was clumsy and slow. This is what Forths have traditionally done. The state of the interpreter is captured by two variables: source and >in. source is a double variable containing a pointer to the first character, and a length – a normal string. >in is an offset past the characters that we’ve looked at already, in previous calls to the tokenizer.

On every call to the tokenizer you first have to offset the pointer forward (by >in), past the text you’ve seen already, and the count backwards (also by >in), since it counts characters yet to be seen. You’re always offsetting two things. Even when consuming a single character during scanning you have to increment the pointer, and decrement the count. Two operations. Clumsy.

Instead, I thought, why not run everything backwards?

If a string is represented by the tuple

  (start, length)

then a text describing the same sequence of bytes in memory could be represented by the tuple

  (end, -length), where end = address + length.

The type “text” consists of a pointer just past the end of the text, and a negative offset from the end back to the start. When we start tokenizing a chunk of text, we set a global variable called first to “point” to the first character. But first is really a negative offset from the end of the text.

In this scenario, the current state of the tokenizer is captured in two variables: source (a text), and first, an offset running from -length up to 0.

On each call to the tokenizer, first already points past the the text we’ve seen already, so we’re ready to start looking for a new token. Now might perhaps be a good time to mention a little secret: muforth has two tokenizers. One – token – assumes that tokens are separated by whitespace, and skips leading whitespace. The other – parse – takes a delimiter character and scans for a trailing delimiter, but assumes that the new token starts at first – it does not try to skip any leading delimiters.

Here are the details of how this works. Let’s assume we’ve called token, and we’re going to skip leading whitespace. first “points” to the first character that we haven’t seen yet. So, while the character at first is whitespace and first < 0 (not at the end of the text), increment first. Now either we’ve run out of text, or we’ve found a non-whitespace character. Either way, this is the first character of our token.

We leave first pointing to it, set last = first, and then start scanning with last, incrementing it as long as it points to a non-whitespace character and remains < 0. When we reach the end of the token, which happens either if we find a trailing delimiter, or if we run out of text to scan, last is left pointing just past the last character of the token.

Another way of thinking about this is that we either run out of input text (trailing = 0), or we “consume” a single trailing delimeter character (trailing = 1).

We’ve now “bracketed” a token. first points to its first character; last points just beyond its last character. (These “pointers” are actually both negative offsets from the end of the source text.) The (start, length) string tuple for the token is then

  (end + first, last - first)

We consume the token and trailing delimiter (if there was one) from the input stream by setting

  first = last + trailing.

Now we’re ready to look for the next token.

Of course, the astute reader will be wondering about the various boundary cases when the text runs out. By being careful about trailing, we ensure that first is never left > 0. At the end of parsing a token it will be <= 0. If no characters were found between first and last, which can only happen if first = 0, possibly after skipping leading whitespace (think about it), then lastfirst will be 0 as well. At end of text we always return the token

  (end, 0)

and the interpreter code is smart enough to recognize this zero-length token as the end.

So how do we interpret text typed by the user or loaded from a file?

In the first case, simply call interpret with the tuple (start, length) that describes the buffer holding the input text that the user typed in. (It’s called typing in startup.mu4.)

For files, we simply mmap them, and then pass the (buffer, length) tuple to interpret! This make muforth’s interpreter code look much more like a BLOCK-based Forth – which it isn’t – than the file-based Forth that it is.

Nested loading of files is possible because the file loading code actually calls evaluate (rather than interpret directly), which saves the current source and first, calls interpret on the new file’s mmap’ed text, then restores the old source and first, and keeps going from there.

Since nested interpretation of the typing buffer isn’t possible, it is interpreted directly, with no call to evaluate (see quit in startup.mu4).

The dictionary

The dictionary in muforth is very simple. It is a set of intertwined linked lists of words. Each list represents what I call a chain, and consists of related words. The head of each chain is the word most recently defined on that chain.

The dictionary space is linearly allocated. Each entry consists of a tuple:

  (link, code, count, name, padding)

The dictionary can in theory contain an arbitrary number of intertwined chains. muforth starts up with only two chains defined; their names and functions I shamelessly stole from Chuck Moore, but this time from cmFORTH. They are called “forth” and “compiler”. (He uses the same idea in colorforth, but calls the sets of words “forth” and “macro”, and they are arrays rather than lists.)

The forth chain consists of “normal” words: words that should be executed when mentioned outside a colon definition, and compiled when mentioned inside one.

The compiler chain fills the role that so-called IMMEDIATE words fill in more traditional Forths, but without their problems. Because it is a completely separate chain that is not searched outside of colon definitions, it is possible to have a word with the same name in both chains with no confusion. A perfect example is .". Outside a definition it simply scans for a trailing " delimiter and then echoes (types) the string. Inside a definition it compiles the string into “data” space, and compiles code to type it out at runtime. This is intuitively what we expect.

Inside a colon definition the interpreter searches the compiler chain, and if it finds the token there it executes it rather than compiling it. In other words, words on the compiler chain execute at compile time, and tend to be used to build control structures like if/then and begin/while/repeat. Because of this, we often call them “compiling words”.

But that’s not the end of the story. If the token is not found on the compiler chain, we look on the forth chain, and if found there it is compiled.

Ok, so how do we specify which chain a word gets compiled into, and how do we specify “search orders” like the above?

A variable, called current, points to the “current” chain – the one that new definitions go into. By convention chains have funny names: forth and compiler are really called .forth. and .compiler.. I chose those names to indicate their chainness. (The dots on the ends are supposed to look like “links”.) When you mention a chain by name it simply pushes its address. It’s basically a constant.

So, also by convention, we define the words forth and compiler. These do more than simply name the chain; they make it the current chain, so new words get compiled into it.

How does this work? Simple!

  : definitions  ( chain)  current ! ;
  : forth        .forth. definitions ;
  : compiler  .compiler. definitions ;

Any time we define a new chain, .foo. , we also define foo as above.

Then our code can look like this:

  [words compiled into .forth. chain]
  [words compiled into .compiler. chain]

Now we can specify what chain words get compiled into, but what about specifying which chains get searched, and when, and what we do with the words that we find? I’ll first explain the basic dictionary search mechanism, but to fully answer this question I have to talk about the structure of the interpreter – another cool muforth innovation.

find is the workhorse word that searches the dictionary. You supply it a token (start, length), and the pointer to the head of a dictionary chain (such as .forth.). find returns true and a pointer to the word’s code if it found it, or false and the (start, length) tuple you originally gave it if not. This way it’s easy to specify a sequence of searches of chains, using the same token over and over.

Which brings us to...

The interpreter

Like everything in muforth, I wanted the interpreter to be dead simple, but incredibly flexible. Traditionally, Forth interpreters have relied on lots of gross hacks and “tricks” that not only make them hard to understand, but hard to use as well.

The interpreter basically looks like this:

  : interpret  ( start length)
    >text source 2!  first !
    begin  token  dup while  consume ?stack  repeat ;

>text converts (start, length) – the string we want to interpret – into (end, -length) & -length, which go into source and first, respectively. (This Forth code is a bit hypothetical; this really happens in C in interpret.c.)

The loop is simple. Get a token. If its length is 0, we’re done. Otherwise, consume the token and then check the stack for over- & underflow.

Wow. That’s it?

The magic all happens in consume. Depending on the state of the interpreter, consume does wildy different things with the token.

It may not surprise Forth old-timers, but muforth has a variable called state. Its use is much different than the traditional one, though. My state points to a two-cell data structure. The first cell is a pointer to the code to consume a token; the second is a pointer to code to display an informative prompt, so if you’re doing something at the keyboard, you’ll know what state the interpreter is in.

consume is then defined as:

  : consume  state @  @execute ;

I call this (consume, prompt) tuple a mode. muforth has only two modes built-in. Can you guess what they are? Interpret and compile, of course!

Here is the consume function for interpret mode. It has a funny name. Because [ leaves the colon compiler and ] enters it, the interpret-mode consume function is called _[ and its compiler dual is _].

  : _[   ( interpret one token)
       .forth. find  if  execute ^ then  number ;

Look in .forth. for the token. If found, execute it. If not, try to convert it as a number. If that fails, number will complain. Note that there is no mention of .compiler.! Words in .compiler. are invisible to interpret mode.

The colon compiler’s consume function is this:

  : _]   ( compile one token)
    .compiler. find  if  execute ^ then
       .forth. find  if  compile, ^ then  number, ;

Search .compiler. for the token. If found, execute it (it is a compiling word). If not, search .forth.. If found there, compile it. If not found, try to convert it as a number. If that fails, number, will complain. If it succeeds, it will compile a literal (hence the comma in the name number,.)

Super-simple, right?

Again, Forth old-timers will laugh at how simple this is compared to what it replaces: ONLY/ALSO, arrays of weird nibble-encoded dictionary chain indices, etc.

I think the fundamental difference between this approach and the traditional ones is that the traditional approaches allow for – in fact, demand – the run-time creation of search orders. In my humble opinion, that’s simply the wrong time to be creating a search order! When you figure out what you’re trying to do – what your interpreter mode is supposed to accomplish – you’re going to arrive at a search order that is fixed and constant for that interpreter mode.

I should add that the traditional approaches only specify a search order; you still have to have immediate words and other gross hacks in order to have compiling words work. My way specifies not only the search order, but exactly what to do when a token is found in a particular chain. And the “encoding” for it is very simple and easy to read. In fact, it’s the simplest thing that could possibly work: straight Forth code!

What makes this really nifty is that it’s trivial to create more modes for the interpreter. startup.mu4 creates a mode for conditional interpretation, by creating a new dictionary chain called .conditional. and creating a new interpreter mode. It’s a bit hairy, mostly because it allows nested conditionals, but it’s not much code and it’s a nice approach: use the interpreter to scan for tokens for you! That’s what it’s for!

The simplicity and extensibility of this approach wins big for metacompilation as well, which, when I finish working on muforth’s “universal metacompiler”, I’d like to write about.

The funny compiler word called \

ANS Forth has a word called POSTPONE that is horrendously complicated but unfortunately actually useful.

Before I talk about that I want to mention the idea of forcing interpretation or compilation from a different dictionary chain than the current interpreter mode would normally choose. An easy example is that we’re writing a compiling word that does the same thing that another compiling word does, but then does a bit of its own stuff as well. Without some kind of “escape” mechanism there is no way to compile a call to the other compiling word, since any mention of it inside a colon definition will result in its execution rather than its compilation. What to do?

muforth defines several such escape words.

  \f forces compilation of the following token from the .forth. chain
  \c forces compilation from the .compiler. chain

But if you look through startup.mu4, you’ll see very few references to either of these words. Mostly you’ll see references to a funny word called \.

\ is my name for POSTPONE, courtesy of Chuck Moore. (It was part of cmFORTH.)

\ is like \c, but it does more than that.

First it looks for the token on .compiler. and compiles it if found. (This is what \c does, but it stops there – in fact it complains if the token wasn’t found on .compiler..)

If instead it’s on the .forth. chain, \ compiles code into the word that we’re currently compiling, that, when later executed, will compile a call to the word we found in .forth..

It’s kind of a postpone, no matter what. Let’s say we’re defining the word foo, like this:

  : foo  \ bar ;

If bar is a .compiler. word, postpone its execution until foo executes. If bar is a .forth. word, postpone its compilation until foo executes. That’s probably the best way to think about it.

With that in mind, let’s think about...

Structure compiling words

One way to write a loop is like this:

   begin ... <test> while ... repeat

begin marks the beginning. <test> leaves a flag on the stack, which while consumes, falling through if true and jumping past repeat if false. repeat then jumps to the code following begin.

So what gets compiled looks something like this, using labels (like “beginning:”) as jump destinations:

    branch if zero to end
    branch to beginning

begin compiles nothing, but marks the top of the loop by pushing the address of beginning. until compiles the branch if zero. repeat compiles the branch to beginning, the address pushed on the stack by begin.

Let’s compare this to the code produced by if/then.

  <test> if ... then

This compiles

    branch if zero to end

Hmm. This looks just like the code that while is supposed to compile. Do if and while do exactly the same thing? Not quite. But before we look at the difference, let’s think about what repeat has to do. It has to compile a backwards branch to beginning, and then it has to fix up the address of the forward branch that while compiled so that it branches to end.

So what about while? After compiling the conditional branch, the stack has two pointers on it: beginning, and a reference to the forward branch that repeat will fix up. But they’re in the wrong order! repeat needs to see beginning first, then the forward branch fix up address. So we need to swap the addresses on the stack. While in theory repeat could do this, in practice that’s the wrong place, because nothing prevents us, with properly defined words, from doing

  begin ... while  ... until ... then

which is actually a very useful construct. Unless while does the swap, until will break, because like repeat it expects the address of the beginning of the loop on the top of the stack.

All of this explains why while and repeat are defined thus:

  : while   ( dest - src dest)   \ if  swap ;
  : repeat  ( src dest -)     \ again  \ then ;

In fact, repeat’s definition as \ again \ then suggests using until ... then in our own code.

Of course, I haven’t given any good reason for using \ over \c to compile references in compiling words to other compiling words. Oh well.

The joy of tail recursion

Tail-recursion was a motivation for writing muforth. What the heck is it? you ask. Fair enough.

If I write

  : foo  a b c foo ;

What will happen when foo executes? On a normal Forth, the last reference to foo will compile a call to foo, which, when executed, will push the return stack. (Actually, in most Forths, in the presence of SMUDGE, the reference to foo inside foo will fail unless we somehow tell the compiler that foo is “recursive”.)

After foo executes for a while, the return stack will overflow, and the program will crash.

But let’s look closer at foo. Is there any reason to push the stack when we call foo from itself? What’s it going to do when it returns? It’s going to return to its caller! So why return to ourselves if we’re going to immediately return to our caller? Why indeed!

The call to foo is in what Scheme people call “tail position”. Any call in tail position has nothing terribly interesing to do when it returns – it’s going to return directly to its caller. So what was discovered is that calls in tail position can be converted to jumps. This conversion is called “tail-call conversion”, and it’s very simple, but very powerful.

What happens when we do this? Suddenly foo becomes a word that contains an infinite loop, but one that consumes zero stack space. We’ve defined a word that is syntactically recursive, but procedurally iterative. This is a very powerful idea, but one that’s been with us since the dawn of Lisp, way back when only FORTRAN existed.

Using tail-recursion obviates the need for any looping constructs. muforth has them, but really it doesn’t need them.

A last question worth asking is, when can we do this conversion, and when can’t we? At any exit from a word, either via ^ or ; we can do the conversion. There is one case where we have to be careful. If the byte after our converted tail-call is a jump destination, we have to compile a return instruction there.

In the following code:

  : foo  if a b c then  foo ;

“call foo” gets converted to “jump foo”. What about:

  : foo  if a b c foo then ;

Here the conversion happens too, but also a return instruction gets compiled after the “jump foo”. Strange but true!

Simple and tail-recursive native compiler

Since muforth text is compiled into native code, we need to define a compiler. Compilers are usually big and complicated, and take a long time to write. This is because most languages are big and complicated, and also because the compiler writers are interested in the best runtime performance they can get.

Forth is simple, and with muforth I’m really more interested in simplicity, ease of use, and ease of understanding than in raw performance. The code that the compiler generates frankly sucks, but it’s really simple. Amazingly even sucky code can run fast enough to be perfectly usable and useful!

So, how does it work?

Like most Forths, muforth spends most of its time generating sequences of calls to words. Threaded Forths do the same thing: they compile sequences of “execution tokens” (to use the FORTH-83 and ANS term) which are essentially calls with the call opcode stripped off. So, at the lowest level, almost everything that gets compiled in muforth is a call.

The compiler “remembers” the address of the last call instruction that it compiled, and so when we’re about to compile a “return” to exit from the word, if the last instruction compiled was a call (and therefore it’s in tail position), it gets converted to a jump.

That’s pretty simple. But what else is there?

One of the issues with writing muforth was that gcc didn’t let me define a global register variable to use for the stack pointer or the top of stack, both of which, in the absence of gcc limitations, I wanted. I can’t think of a way to generate even reasonably good code unless I have those two global registers. But I can’t have them. Pooh.

So, the stack is simply a C array (of cell_t’s), and the stack pointer (sp) is a normal C global variable that lives in memory, that points to a cell_t. TOP is just sp[0].

If I am generating code that does anything with the stack, I first have to get sp into a register, indirect off of it, maybe modify it, and then write it back. So there are a bunch of routines in the compiler C code that compile these various operations.

There is also code to generate four different kinds of branch instructions, the combinatorial product of (unconditional branch, branch if top == 0) and (pop the top, do not pop the top), and also the goo underlying for/next.

Since getting at the return stack (the true hardware stack) is impossible without low-level support, there are several routines that compile various pushes, pops, and copies that involve the return stack.

That’s about it! For the x86 it’s about 300 lines of code.

All of the smarts for compiling control structures (if/then and friends), and new data structure types (create/does>) is all defined in Forth in startup.mu4.

Towards a new create/does>

First, let’s talk a bit about “old” create/does>, what it does, and how it (used to) work.

Basically, create creates a new word in the dictionary, and does> causes this new word, when later executed, to execute a particular piece of code.

create and does> are used to define “defining words” – words that build new kinds of data structures. The defining word contains code to create the data structure, and code to be executed by all the “child words” that this defining word is used to define.

So a word called “definer” could be written:

  : definer  create  <build the data structure>
              does>  <do something with data structure>  ;

And used:

   definer foo

The code to build the data structure is executed when the definer executes – this is also the compile time of foo. The code to do something with the data is executed when foo executes. So we can create new structures and new behaviors. But how do we link them? In other words, how to tell the behavior code – which lives in definer and is thus common to all words that definer defines – where to find the data of each child word, such as foo?

What used to happen is that the child word, foo, would look like this when compiled:

    call definer_code
    <foo's data>

When foo is executed the call to definer_code has the side effect of pushing the address of foo’s data onto the return stack. (The use of “call” here is confusing. call is being used for its behavior “push address of following byte and then jump”, which is what a call does. But normally the “address of following byte” is construed as a return address. Here it is used as a data address, which is popped and never used as a return address.)

definer_code can then pop the data address off the return stack, and push it onto the data stack, where the definer_code can use it.

But there are a couple of subtleties here. One is that definer_code is Forth, so in addition to moving the address of foo’s data onto the correct stack, an entry into the Forth interpreter (for threaded Forths), including a push of the return stack, has to occur as well. Another is that the body of foo now mixes code with data, something that I am trying to avoid with muforth.

I’m going to avoid explaining the subtleties of how threaded Forths have traditionally accomplished this. It’s one of the most strange and wonderful aspects of Forth, but it’s also subtle and complicated, and the way muforth does it is quite a bit simpler. Maybe when muforth’s way really makes a lot of sense to you you’ll be ready to conquer the “old” way.

In summary then, when the child word executes two things have to happen:

When I started thinking about clean but clever ways to push an address of data, and then jump to a common piece of code, but without mixing code and data in the child word, I realized that it was impossible. The best I could do was to load a register with a constant, and then jump to the common code.

But this, it turns out, is all you need! The constant can be a constant, with intrinsic value of its own, or it can be the address of the child word’s data, if it has any.

What this means in usage is that whereas the old-style does> “knew” what address to push, my new-style does> has to be passed a constant value to compile into the child. If the constant should be an address of a data area, then call ram to get an address, and pass it to does>. Sometimes the stack manipulation is bit clumsy, but it’s a good system in general.

It wasn’t necessary to do things this way. I could have kept the old behavior, that does> compiles code to push the address of the next free byte of ram – in other words, that the “constant” it pushes is always an address. This makes the defining words look a little cleaner, but it is less efficient when you actually want to compile a constant into the data space (you have to comma it into ram when you define the word and then fetch it out again whenever you want to use it), and it doesn’t allow for the address being anywhere but the next available byte of ram. On ROMed systems you sometimes want does> to push a ROM address instead. This new way lets you choose which you want, by simply supplying the proper constant to does>.

Inside the guts of create/does>

With that introduction, let’s look at the gory implementation details. The code examples here are very x86-specific since at present that is the only implementation. But it’s the ideas that I’m interested in conveying, so hopefully this will not be a deterrent.

First of all, does> changes from compiling an address to compiling a constant – which could, of course, be an address. does> splits the code to push a constant into two parts: a load part and a push part. The load part is the code:

  movl $constant,%edx

The push part is the call (or jump!) to push-literal. This is the same code that literal generates; we’re splitting it into two parts to save code space.

Here’s how it works. Let’s say we’re writing an assembler and need words for addressing modes. Like many create/does> words, these consist of a constant and some code. The old way to do this would go something like:

  : md  create ,  does> @  a b c ;

This is a clumsy way to work with constants – compiling them into the word only to fetch them out again – and is inefficient when compiled to native code. My modernized way looks like this:

  : md  create  does>  a b c ;
  octal 004 md deferred

does> by default creates a constant, which gets pushed when the child word – deferred in this example – is executed. The actual code that is compiled looks like this:

  md: call create         ; define-time
      call ;does          ; define-time
      call push_literal   ; run-time
      call a              ; run-time
      call b              ; run-time
      jmp  c              ; run-time
      movl $004, %edx
      jmp md_does

Wait a minute! How did that ;does thing get in there? Well, does> is actually a compiler word. It compiles “call ;does” followed by “call push_literal” into the word being defined – md in this case.

When md is called – to create a child word, or, equivalently (for OO types) an instancemd create’s the new word and then ;does compiles the load constant and the jump to md’s run-time code.

If we hadn’t split the push-constant part, the code in each child word – and there could be lots of them! – would have to have an extra call:

  md: call create         ; define-time
      call ;does          ; define-time
      call a              ; run-time
      call b              ; run-time
      jmp  c              ; run-time
      movl $'006, %edx
      call push_literal
      jmp md_does

What we have essentially done is move the “call push_literal”, since it is common to every child word, out of the child and into the parent.

With that long-winded intro, we can now get into some of the odd subtleties of this brave new world.

The first subtlety is that while does> and constant – as defined below – both create some type of constant, they do it slightly differently. constant creates a word and writes code to push a constant – both the load and push halves of that action – and then forces a tail-call conversion of the call to push_literal, changing it to a jump and thereby ending the word. It can do nothing more than push its constant.

This ends up being syntactic sugar;

  4 constant cell


  : cell  4 ;

produce exactly the same code; namely:

    movl $4,%edx
    jmp  push_literal

does> is different; it leaves open – in fact encourages! – doing something exciting with the constant that has just been pushed. (Remember that this constant could be the address of some data structure.) It splits the constant-pushing task into two parts, as described above, putting the load into the child and the push into the parent, but leaves the call to push_literal as a call that is expected to be followed, as in md above, with code that processes the constant.

To drive home this subtle difference, assume that we defined constant thus:

  : constant  create does> ;
  4 constant cell

Now what do we have?

    call create
    call ;does
    jmp  push_literal
    movl $4,%edx
    jmp  constant_does

This is slightly less efficient; it compiles extra code into the parent word – the run-time code “jmp push_literal” – and to execute this constant we do a jump to a jump, which is both silly and slow.

Note: the trailing ; in our example definition of constant above changed the “call push_literal” to “jmp push_literal”. Sneaky!

The other important subtlety is that these constants – esp. when they are the address of some data structure – can be in the way when compiling said data structure. The constant that you want compiled into the child word has to be on the stack when ;does> executes. It’s a bit clumsy, but you have to do something like this:

  : clumsy  create  ram  push  , ,  pop  does>  a b c ;
  thing1 thing2 clumsy example

I suppose you could write that like this too, with consequent changes in how the data is “fetched” and used in a, b, and c:

  : less-clumsy  create  literal  does>  a' b' c' ;
  thing1 thing2 less-clumsy example

In this case we compile two literals, the second one being “split in two” in the canonical way:

    call create
    call literal    ; compiles code to load & push a literal
    call ;does
    call push_literal
    call aprime
    call bprime
    jmp  cprime
    movl $thing2,%edx
    call push_literal
    movl $thing1,%edx
    jmp  less_clumsy_does

With this new kind of create/does>, you choose how to do it.

Having understood create/does> you’ve tackled one of sublime mysteries of Forth. The other sublime mystery is our next subject...

The structure and interpretation of metacompilers

A metacompiler is what most people outside of Forth call a cross-compiler: a compiler that runs on one machine (the “host”) and compiles code for another (the “target”). We do not here discuss an exciting variation called the Canadian Cross, wherein we compile a cross compiler on one machine (the “compile host”), that executes on a second machine (the “host”), to compile code for a third machine (the “target”). While exciting, there isn’t at the moment the need or application for this when using muforth.

What makes metacompilers hard to write and understand? I think it’s a combination of three things, which interrelate in complicated ways:

I’ll talk about them in turn, but they are intertwingled, so it won’t be an entirely linear discussion.

Time phases of metacompilation

Let’s start with the idea of metacompiler time phases; this should lead naturally to discussion of the other aspects.

This aspect of metacompilation is rather ill-defined, and different folks have pioneered different approaches over the years. Unfortunately (as I’m not much of a fan of Perl or its philosophy) there’s more than one way to do it. One chooses ease over clumsiness, perhaps. But it my experience “ease” often translates into ambiguity and confusion. My discussion here will focus on an approach that chooses explicitness and verbosity over “ease”.

Roughly speaking, there are three time phases of metacompilation:

  1. compiling the metacompiler code;
  2. executing the metacompiler code, compiling the target code;
  3. executing the target code.

(1) and (2) occur on the host machine; (3) occurs on the target machine. (If we are doing tethered debug, (3) needs a little help from the host machine, but mostly the execution occurs on the target.)

We can break this down still further.

In (1) we are compiling code that will later execute on the host to read a text stream of “target Forth code” and build a memory image of the compiled target code. We’ll likely need all of the following:

In the past I have overlapped defining this infrastructure with metacompiling the target code, in order to keep things together that seem to “belong” together. This is a nicety that complicates the system. We will not be discussing such a system here; rather, we will try to “firewall” the phases from each other as much as possible, in the interest of modularity.

Looking at the list above we immediately notice that we will be redefining almost all the basic infrastructure words with metacompiler versions, and we will be creating target versions of most of the .forth. chain. Creating separate chains for these new words will help dramatically to prevent confusion and ambiguity.

In both phases (1) and (2) being able to specify exactly which version of a word we want becomes critical. That brings us naturally to our second big metacompiler subject...

When words collide; or, how to keep aliasing from driving you crazy

You have to be unambiguously clear about which version of dup you mean when you call it by name. This is the idea of “when words collide”: you type “dup”, but because of the design of the system you get the wrong dup (the host’s instead of the target’s, the metacompiler’s rather than the host’s, etc.) and something gets compiled or executed, and a while later everything crashes down, but you have no idea why.

This kind error is a Royal Pain to find and fix. It is much nicer to force disambiguation of “which dup” up front, rather than pay later. The muforth way is to create several dictionary chains, one for each “context” in which a word might appear, and then to create several interpreter modes, each one corresponding roughly to a phase or task in the metacompiler.

John Rible and Bill Muench wrote an elegant metacompiler in which they pioneered the use of the terms “outside” and “inside” to mean “metacompiler words that occur outside of colon definitions” (mostly defining words) and “metacompiler words occuring inside colon definitions” (mostly compiling words). I really like the descriptiveness of these terms and to honor their inventors have named the dictionary chains in muforth’s metacompiler that serve similar purposes .outside. and .inside.. They are just like .forth. and .compiler. on the host, but have the advantage of having names distinct from, and cooler than, the host names.

In addition to these, I define .assembler. to hold the target assembler words, and .target. to hold the words that we’re – finally! – defining on the target.

By putting the words into different chains like this it becomes much easier to know what you are getting when you say “dup”. We define an interpreter mode for each phase of metacompilation, and we try to define each mode to minimize ambiguity – the collision of words, or accidental getting of something other than called for – even at the cost of increasing verbosity.

Instead of building into our interpreter modes automatic searches of all the chains that might contain the word we want – thus dramatically increasing the likelihood of word collision – we define a prefix “escape” mechanism for executing or compiling a word from a chain other than the one that would be “normal” for our interpreter mode. A perfect example is the use of \ and \c to force compilation of .compiler. words.

For every chain in the metacompiler we define an escape, thus:

  \f  compile from .forth.
  \c  compile from .compiler.
  \o  compile from .outside.
  \i  compile from .inside.
  \a  compile from .assembler.
  \t  compile from .target.  (occasionally useful if .target. words are
                              defined using create/does>)

As a concrete example, assume we are writing an outside word that happens to depend on both words from .forth. and .outside.. Since it would be folly (I think – I may address this question further later on) to define an interpreter mode that searches .outside. then .forth. (or vice versa), we instead use the regular Forth compiler to define .outside. words – which means that every use of an .outside. word within the definition must be prefixed with \o. This might seem clumsy and verbose, but it has the advantage of being unambiguous.

A key point to remember is this: while defining all the metacompiler words we are running the normal muforth interpreter and compiler: only .forth. and .compiler. are being searched. If we want to reference an .outside. word that we just defined, we have to be explicit about it: eg, “\o here” .

We can now sketch the possible “consume” functions for various interpreter modes. The .outside. words remote, compile, and number, respectively do remote execution on the target, compilation of a procedure call for the target, and compilation of a literal for the target. (In defining the following modes, I’ve left off the prompts for clarity.)

 -:  .assembler. find  if  execute ^ then
       .outside. find  if  execute ^ then  number ;
 mode meta-assembler
 -:  .outside. find  if  execute ^ then
      .target. find  if  \o remote ^ then  number ;
 mode meta-outside
 -:  .inside. find  if  execute ^ then
     .target. find  if  \o compile, ^ then  \o number, ;
 mode meta-inside
 -:  .target. find  if  \o remote ^ then  number ;
 mode target

Things to note:

Once we have written all the parts of our metacompiler, the only thing left to do to start it up is to change the interpreter mode to meta-outside.

In order to be able to switch among the modes defined for the metacompiler we’ll probably want something like:

  : -]   ( start the meta colon compiler)  meta-inside ;
  :  ]   \o literal  \o -] ;
  : :    \o name  \o -] ;
  : [    ( switch to outside)  meta-outside ;
  : ^    <compile return on target> ;
  : ;    \i ^  \i [ ;

These are analogous to the code in .forth. and .compiler..

If code and end-code bracket an assembler definition, we might define something like this:

  : code  \o name  meta-assembler ;
  : end-code  meta-outside ;

The Janus effect; or, the two faces of defining words

A subtle and terribly confusing aspect of metacompilers is the definition of defining words, like : and variable. The trouble stems from their dual nature: they have two faces, one facing the host Forth, and one facing the target.

All defining words are, by definition, create/does> words. Normally what happens between create and does> is that we write code to build a data structure, and between does> and ; we write code that is executed to do something with the data. The difficulty is that, in a metacompiler, the words between create and does> execute on the host, and the words between does> and ; execute on the target.

We are already confused enough by the aliasing of word names that naturally happens in a metacompiler; now we have to deal with words (defining words) that draw from two (or more) different sets of words, and have several distinct time phases built into them as well! Yow!

This potential for utter confusion led me to move away from my past efforts at “ease” and “naturalness” in the writing of defining words, towards a more explicit way that requires the author to specify the source of all non .forth. words.

The best way to illustrate is with an example, followed by a discussion of each token and its execution behavior.

Let’s define a target defining word for arrays. Like all target defining words, it goes into .outside..

  : array  \o create  \o ram  swap  \o cells  \o allot
           \o does>

Of course here my beautiful system breaks down.

I need a special does> for defining words.

So, you see that meta-compilation is hard. I’ve thought about this many times over many years, and I still can’t get it right. ;-)

The end

I hope this helped to explain how muforth works and why I think it’s cool. I hope you’ll think it’s cool too, and use it to do interesting things!

I’d love to get feedback on the ideas here.