A starting point on virtual machines and architecture

Stack machines

Learning about stack machines is an adjustment for someone like me, since I’m used to register machines. I tried my hand at it, reading the last chapter of “Smalltalk-80: The Language and its Implementation” to look at the description of, and the source code for, the Smalltalk VM, and got lost pretty quickly.

I read articles on Stack machines, Burroughs large systems, and the Burroughs instruction set on Wikipedia. I think these are good backgrounders for what one needs to understand before digging into an implementation.

Trivia note: Four years ago, just for exploration, I tried my hand at learning the Forth programming language, reading “Starting Forth,” by Leo Brodie, and doing the exercises. If you know Forth, and look at the Burroughs instruction set, you will see similarities. Charles Moore got inspiration for creating Forth from the Burroughs systems.

I also found the following video helpful in organizing my thoughts on this subject. I was reading “Squeak: Open Personal Computing and Multimedia,” by Mark Guzdial and Kim Rose, and the chapters on the Squeak VM and its memory management got me asking some questions about the basic components of a VM.

The title of the video below is “How to build a virtual machine,” but it’s really about how to build a virtual stack machine. (The term “virtual machine” means a simulation of a computer architecture.) In the video, Terence Parr showed how to build a very simple virtual stack machine in Java. I like the scale of this example. It’s not threatening. If you have basic programming skills, you can do everything you see in this walk-through and get a stack machine that works! He did not, however, demonstrate how to build a compiler, so you can write high-level source code for the stack machine, nor a garbage collector for memory management, nor a debugger, so you can step through the code, look at values interactively, and trace through the stack. 🙂 However, he demonstrated a trace function that you can use for primitive debugging.

He created some simple programs to execute in bytecode, by writing some extra code as part of the VM project. The bytecodes (just an array of numbers) are submitted to the virtual processor by making a function call. The result of each program (a single value) shows up on the virtual machine’s stack.
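To make the shape of this concrete, here is a minimal sketch of the idea in Java. This is my own illustration, not Parr’s exact code: the opcode names, their numbering, and the stack size are all assumptions.

```java
// A toy stack machine: the "program" is just an int array of opcodes and
// operands, and cpu() runs a fetch-decode-execute loop over it.
// (Opcode names/numbers are illustrative assumptions, not Parr's exact code.)
public class TinyVM {
    static final int ICONST = 1;  // push the next operand onto the stack
    static final int IADD   = 2;  // pop two ints, push their sum
    static final int PRINT  = 3;  // pop and print the top of the stack
    static final int HALT   = 4;  // stop the fetch-decode-execute loop

    int[] stack = new int[100];
    int sp = -1;   // stack pointer; -1 means the stack is empty
    int ip = 0;    // instruction pointer into the code array

    void cpu(int[] code) {
        while (ip < code.length) {
            int opcode = code[ip++];
            switch (opcode) {
                case ICONST: stack[++sp] = code[ip++]; break;
                case IADD: {
                    int b = stack[sp--], a = stack[sp--];
                    stack[++sp] = a + b;
                    break;
                }
                case PRINT: System.out.println(stack[sp--]); break;
                case HALT: return;
            }
        }
    }

    public static void main(String[] args) {
        // "1 + 2" as bytecode: push 1, push 2, add, print the result
        new TinyVM().cpu(new int[]{ICONST, 1, ICONST, 2, IADD, PRINT, HALT});
    }
}
```

Running this prints 3, the single value left by the program, which matches the behavior described above: you hand the processor an array of numbers, and the answer shows up on the stack.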

Parr’s presentation gets a bit disorganized when he talks about the call stack. Just have some patience. He clears it up towards the end.

Edit 6/22/2020: I realized, once I tried following Terence’s presentation to recreate his VM, that he didn’t clearly show all of his code. You can see most of the code he uses by pausing the video when parts of it are shown in his IDE, but there’s some he doesn’t show at all, or modifies a lot. So, I’m going to paste in some code here to try to clear up the left-out and confusing spots.

He changed the cpu() routine a lot, so I figured I would show how it ended up in the later part of his presentation, when he added stack tracing and memory dump capability. I’ve also included the code he used for an important routine for tracing, called stackString().

void cpu() {
   int opcode = code[ip];
   while (opcode != HALT && ip < code.length) {
      if (trace) System.err.printf("%-35s", disInstr());
      ip++;
      switch (opcode) {
         // case statements for the bytecode operators go here
      }
      if (trace) System.err.println(stackString());
      opcode = code[ip];
   }
   // after the loop exits, trace the final instruction and dump data memory
   if (trace) System.err.printf("%-35s", disassemble());
   if (trace) System.err.println(stackString());
   if (trace) dumpDataMemory();
}

String stackString() {
   StringBuilder buf = new StringBuilder();
   buf.append("stack=[");
   for (int i = 0; i <= sp; i++) {
      int o = stack[i];
      buf.append(" ");
      buf.append(o);
   }
   buf.append(" ]");
   return buf.toString();
}

In the third phase of his presentation, when he showed how to compute Fibonacci numbers with this VM, he didn’t show some bytecode operators that are necessary for making the Fibonacci program work. I’m showing the code here for those bytecodes, which I wrote myself. I must admit, I have not tested this code in Java, as I translated Terence’s code into a different language to try it out. If it doesn’t work, I apologize in advance. I can say the logic is sound, so any errors this code generates will just require a little debugging of my misunderstanding of how Java specifies things.

These case statements go inside the switch statement in the cpu() function. Note that Terence created a couple of constants, TRUE and FALSE, which hold the values 1 and 0, respectively. You should include them in your code as well:

case ILT: {
   int val1 = stack[sp--];
   int val2 = stack[sp--];
   if (val2 < val1)
      stack[++sp] = TRUE;
   else
      stack[++sp] = FALSE;
   break;
}
case ISUB: {
   int val1 = stack[sp--];
   int val2 = stack[sp--];
   stack[++sp] = val2 - val1;
   break;
}
case IMUL: {
   int val1 = stack[sp--];
   int val2 = stack[sp--];
   stack[++sp] = val2 * val1;
   break;
}
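One detail that’s easy to get backwards here is the operand order: val1 is popped first, so it’s the right-hand operand, and val2 is the left-hand one. A quick standalone check (my own test harness, not part of Terence’s VM) makes the ordering visible:

```java
// Verifies the operand order used by ISUB-style operators: for "10 - 4",
// 10 is pushed first (left operand), 4 second (right operand), and the
// result of val2 - val1 is 6, not -6.
public class OperandOrder {
    public static void main(String[] args) {
        int[] stack = new int[10];
        int sp = -1;
        stack[++sp] = 10;        // left operand, pushed first
        stack[++sp] = 4;         // right operand, pushed second
        int val1 = stack[sp--];  // pops 4
        int val2 = stack[sp--];  // pops 10
        stack[++sp] = val2 - val1;
        System.out.println(stack[sp]);
    }
}
```

This prints 6. If you accidentally write val1 - val2, subtraction and comparison come out reversed, which is the kind of bug that makes a Fibonacci program quietly loop forever.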

You may or may not notice this, but in the rush of assembling the code for his presentation, he ended up duplicating code in disInstr() and disassemble() (he shows the code for both). When he’s through putting everything together, these functions end up with the exact same code in them, though his code calls both of them. Of course, under normal circumstances, he would’ve just used a single function to disassemble the bytecodes.

To get some ideas for “where to go from here,” I liked reading How to build a simple stack machine, by Nicola Malizia. Though Nicola’s English is wobbly, what I like is that he sketches out some basic logic for implementing control structures, variables, and procedures in a way that’s mostly compatible with the machine model that Terence laid out. An element he adds is labels for branching, so we can have mnemonics for addresses, instead of just numbers. Nicola defined a language for doing all of this.

Where he departs from Terence’s model is that he implements what I’d call an interpreter, not a VM, though these distinctions get blurry. He doesn’t create an assembler for his machine, so he has no bytecodes to execute. In Nicola’s scheme, the instruction pointer literally indexes an array of instruction strings that are parsed as the program executes. However, he implements a stack, with stack frames.
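To make the contrast concrete, here’s a minimal sketch of that string-interpreter style, with a first pass that resolves labels to array indices so branches can use mnemonics. I’ve written it in Java for consistency with Terence’s code (Nicola used a different language), and the instruction names are my own, not Nicola’s:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// A string-parsing interpreter: instructions stay as text, the "instruction
// pointer" indexes the string array directly, and a label pass maps names
// like "done" to positions so jumps can use mnemonics instead of numbers.
public class StringInterpreter {
    public static void run(String[] program) {
        Deque<Integer> stack = new ArrayDeque<>();
        Map<String, Integer> labels = new HashMap<>();
        // First pass: record the position of each "name:" label line.
        for (int i = 0; i < program.length; i++)
            if (program[i].endsWith(":"))
                labels.put(program[i].substring(0, program[i].length() - 1), i);
        int ip = 0;
        while (ip < program.length) {
            String[] parts = program[ip++].trim().split("\\s+");
            switch (parts[0]) {
                case "push":  stack.push(Integer.parseInt(parts[1])); break;
                case "add":   stack.push(stack.pop() + stack.pop()); break;
                case "jump":  ip = labels.get(parts[1]); break;
                case "print": System.out.println(stack.peek()); break;
                case "halt":  return;
                default: break; // label lines are no-ops at run time
            }
        }
    }

    public static void main(String[] args) {
        run(new String[]{
            "push 2", "push 3", "add", "jump done",
            "push 99",          // skipped by the jump
            "done:", "print", "halt"
        });
    }
}
```

Running this prints 5. Notice there’s no bytecode anywhere: every instruction is re-parsed from its string each time it executes, which is exactly the trade-off that makes this an interpreter rather than a VM.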

Getting acquainted with the concept of architecture

In my learning process, I’ve liked working with a small-scale (8-bit) architecture from my youth (in an emulator), based on the 6502 processor. It’s kept things more comfortable as I’m getting used to ideas about using computing for modeling. I really enjoyed the presentation below, by David Holz, which is about a 16-bit VM that runs on another vintage 8-bit computer, the Commodore 64. The C-64 uses a 6510 processor, a very close relative of the 6502. This 16-bit VM is called AcheronVM.

If you’re not already familiar with it, you’ll probably want to learn some 6502 assembly language, to understand what Holz talked about, since his scheme uses 6502 code as a kind of microcode for the 16-bit virtual processor. 6502 assembly is a simple CISC instruction set (compared to today’s processors).

What I particularly like is his presentation gets across a basic idea in something Alan Kay told me about many years ago, when I first started corresponding with him:

[T]o me, most of programming is really architectural design (and most of what is wrong with programming today is that it is not at all about architectural design but just about tinkering a few effectors to add on to an already disastrous mess).

What I used to understand about the term “architectural design” had to do with designing data structures and APIs (perhaps abstract data types), but that’s not really what Kay was talking about. I had some experience with it by going through the third chapter in SICP, seeing what’s possible with computing mathematical series using lazy evaluation (I wrote about a few exercises in that chapter, at SICP: Chapter 3 and exercises 3.59, 3.60, and 3.62). What comes through in Holz’s presentation is the importance (and power) of architecture in the design of a VM.

A few years back, I read about the Lisp Machines K-Machine, a computer that Lisp Machines, Inc. never released. The linked article talks about a design feature that Holz also used in his architecture, called “tag bits.” It’s a method of encoding meta-information about an instruction, or an address. It was also used in the Burroughs machines.
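As a toy illustration of the idea (my own sketch, not the K-Machine’s or Holz’s actual encoding), a few low bits of each word can be reserved to say what kind of value the remaining bits hold:

```java
// Tag bits, illustrated: the low 2 bits of a word carry meta-information
// (is this an immediate integer, or a pointer?), and the payload lives in
// the remaining bits. The tag values here are arbitrary assumptions.
public class TaggedWord {
    static final int TAG_MASK    = 0b11;
    static final int TAG_INT     = 0b00;  // immediate integer
    static final int TAG_POINTER = 0b01;  // address of an object

    static int makeInt(int n)        { return (n << 2) | TAG_INT; }
    static int makePointer(int addr) { return (addr << 2) | TAG_POINTER; }
    static boolean isInt(int w)      { return (w & TAG_MASK) == TAG_INT; }
    static int intValue(int w)       { return w >> 2; }  // arithmetic shift keeps the sign

    public static void main(String[] args) {
        int w = makeInt(-7);
        System.out.println(isInt(w) + " " + intValue(w));
    }
}
```

This prints "true -7": the word still round-trips a signed integer, and the hardware (or VM) can check the tag before deciding how to treat the value, which is what lets such machines dispatch on type at the instruction level.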


I’ve been interested in this topic for years, because I had a “2×4 across the head” experience while I was working in IT services in the mid-90s. I was working on a software project that was part of an in-house data-driven application platform. It was an interpreter that implemented a page description language, which also allowed some “scripting.” It was used for generating reports. I did a lot of wrestling with it, since it ran on MS-DOS, and was written in C. The combination of these was an awful experience.

At the time, it inspired me to think about how to make the interpreter’s architecture better (because I didn’t know any other approach was possible). I’m using “architecture,” in the old sense of how I thought about it. The approach I used ended up being like trying to graft onto/augment a badly designed machine. In hindsight, what I was really trying to improve upon was the combination of running C on MS-DOS, without modifying the architecture offered by either. My effort produced some improvement, but the code was a lot of make-work. In hindsight, what I came up with was the kind of coding scheme that justified the use of meta-languages with code generators in the 2000s, as it became very repetitive.

What I’d say is when you see that pattern developing, it might actually be a signal that you need to improve the architecture you’re using (here, I’m using the term in the new sense), because what you’re really doing is compensating for a systemic weakness, exemplified by the operating system and/or the programming language you’re using. I had no idea how to do that. It wasn’t part of the curriculum in my computer science education, which I regret.

As a commenter on Terence Parr’s video said, “It’s too bad that this isn’t part of a software engineer’s education anymore.” The commenter mentioned a couple of examples where this design method was used to good effect in commercial software in the 1980s, and mentioned Don Knuth’s book, “The Art of Computer Programming,” which uses a VM (called MIX in the older editions, redesigned as MMIX in later material).

Though, I can’t help reflecting on something else Kay said back in the ’70s, “People who are really serious about software should make their own hardware.” For me, that’s going to have to wait for another day.
