Code FormatWhile Smalltalk and Java defined virtual machines with dozens of bytecodes, Self traditionally has done its job with only eight (Self 4.1 has added a few more). All of them are designed around the concept of a stack machine, or a zero address machine.
The description of the current code format can be seen in the Oliver page. This is optimized for a hardwired hardware implementation. For example: in software it is cheaper to add than to replace bits but in hardware rearranging bits is just wires and so extremely cheap.
All addressing is done relative to the stack top rather than some frame base, so the memory bandwidth for saving and restoring frame pointers is saved. This requires the compiler to adjust addresses as the stack grows and shrinks and the return instruction must know how much to trim back the stack to eliminate the current frame.
Another optimization that makes the stack simpler in the most common case is the scheme adopted in VisualWorks 5 and Squeak VI4 where blocks don't directly access previous frames. Rather they copy any read-only information they need and store any shared information in special indirection arrays. The example given in the VisualWorks paper is:
inject: thisValue into: binaryBlock
|nextValue| nextValue := thisValue. self do: [ :each | nextValue := binaryBlock value: nextValue value: each]. ^nextValueThis has one shared mutable data (nextValue) and one shared read-only data (binaryBlock). By using an indirection array both shared data become read-only:
inject: thisValue int: binaryBlock
|indVec| indVec := Array new: 1 with: thisValue. self do: [ :each | indVec at: 1 put: (binaryBlock value: (indVec at: 1) value: each)]. ^indVec at: 1Using regular array operations this would be a bit slow and long, so a small set of special instructions take care of the indirections. We translate the main method to these bytecodes:
The code for the block is translated to:
The numbers in the descriptions (like var 2) refer to what the original Smalltalk bytecodes would use. Not only does the varying stack complicate the mapping to what the actual bytecode uses, but the fact that temporaries grow in the opposite direction that arguments adds to the confusion:
A copy of "self" is present in the return stack at a known depth and a special bytecode can fetch it when needed.
The compiler isn't very smart so the above code has a few redudant instructions, like fetching "binaryBlock" from the block data into a temporary even though that is only used once.
Note that this example is the extreme case and most code wouldn't have all these indirections and copying.
Smalltalk-80 ("Blue Book")The original specification for the Smalltalk virtual machine tried to have a very compact encoding with a basic instruction length of one byte. This is described in detail in chapter 28 of the book "Smalltalk-80: The Language And Its Implementation" by Adele Goldberg and David Robson. This was known as the "blue book" and the current version ("purple book") no longer includes the chapters about the implementation, but fortunately these are
Previous Design (presented at OOPSLA 2003)Everything is described in terms of objects and messages. The hardware defines a set of "primitive objects", each of which can understand two distinct messages. The basic bytecode format is
where "rrr" indicates one of the eight primitive objects as the receiver, "m" is the message that is sent to it and "oooo" is a four bit argument.
Two special registers hold copies of the top elements of the stack: T and S. The stack pointer indicates the memory position corresponding to S.
Operations marked in red might cause a trap to the stack cache overflow/underflow software. Operations in green might cause a second level icache miss, which is handled in software (the first level miss is handled by the hardware). Operations in blue might cause a data cache miss, which is usually handled by hardware but might trap to software when the object is not found (either its segment isn't loaded or it only exists in a compressed state).
In addition to the operations shown above, all instructions except the extend ones start with op:=op<<4+ir/\15 and end with op:=0. All instructions except jump and indirect send end with vpc:=pc+. There is an extra fetch cycle that does ir:=icache[vpc]. The "mask" that appears in some instructions depends on the current size of op either as is or converted from words to bytes, depending on the instruction.
The send: and send:to: instructions are particularly interesting because they add a level of indirection. The latter is the regular high level message send and expresses most of the elements in the original source code. The direct send: is interpreted as also sending a one bit message to a selected primitive object but without the extra argument of the basic bytecodes. And the "address space" of these primitive objects is separate from the one in the basic bytecodes (and isn't limited to three bits, thanks to the extend bytecodes), as shown in the following table:
Only 28 of the possible 31 "short instructions" have been defined so far. All 256 instructions are shown in this table (the most significant bits are in the lines):
It should be obvious that several of these instructions are a really bad idea or don't make sense, and so should never be executed. They are marked in red in the above table. The ones marked in blue would be ok except that word 0 in each eight word block in the instruction cache is reserved for the vpc value for that block. Both kinds of instructions are there to make the project more uniform and simpler.
It would seem that an instruction such as "ext 0" would have no effect, but it actually changes the mask value for the following instruction and that could change its meaning. Any instruction not defined above is considered a nop (no operation).
SelfSelf 4.1.2 uses four bits for the op code field and four bits for the index field:
For the extended instructions (opcode 4) the index field has the actual opcode:
In Self 4.1 and older, there were only 8 bytecodes. The bottom five bits were the operand and the top three bits were the opcodes:
Older AlternativesIs it possible to have even fewer than 8 instructions? Well, two of the Self bytecodes implement regular and directed resends (messages to super, in Smalltalk terms) but they are so rarely used that they could be handled as primitives (which use the regular send bytecodes in Self) without much overhead. And the "push self" bytecode is redundant since each method context has a slot named "self" and so sending a message with this selector to it would yield the exact same result.
The "extend index" bytecode is only needed because of the small size of the literal index field in the bytecode (5 bits allows at most 32 literals per methods). If we were to match each instruction directly with its literal operand we could drop this bytecode and would need only 2 bits to encode the remaining four instructions. The problem is that now literals which appear repeatedly in the source will no longer use up a single literal vector entry. Surprisingly, there is a net saving of space with this approach. See:
Can we continue this trimming even further? Sure, if we don't mind some more radical changes in the virtual machine. First we allow slots to contain not only regular objects and methods but "continuations" as well. For more about continuations, see the Lisp dialect Scheme. Now when we create a new context object, not only do we include a "self" slot filled with the receiver but also a "^" slot pointing to the sending context (directly or suitably wrapped if we need to distinguish contexts from continuations). Sending the up arrow message ("^" is the poor ASCII approximation), with some expression as an argument, to the current context (an implicit self send) will give us the exact effect of a non local return in Self or Smalltalk. Simply not having these slots for block contexts (they would inherit them from their outer context) does the right thing automatically.
Now all we have left is message sends and "push literal". Most objects can't be used as message selectors and we could modify that to "no objects can be used" by creating a special selector type instead of using regular immutable strings. Such a change could be a good idea in any case for the persistent object system. I'll take an example from the Smalltalk "blue book":
merge: aRectangle | minPoint maxPoint | minPoint <- origin min: aRectangle origin. maxPoint <- corner max: aRectangle corner. ^ Rectangle origin: minPoint corner: maxPoint
and rewrite it as Self:
merge: aRectangle = ( | minPoint. maxPoint | minPoint: origin min: aRectangle origin. maxPoint: corner max: aRectangle corner. Rectangle origin: minPoint Corner: maxPoint )
Note that there are no literals here (see the part about syntax for more on that) and there is no need to have an up arrow before the last expression (though it would still work if there was one). We still need one bit to distinguish between regular message sends and implicit receiver message sends, so I will use color for that. Green selectors will be sent to the top of the stack while red selectors will be sent to the current context. Vectors will be shown as space separated elements between parenthesis, and code will just be a vector of selectors:
( aRectangle origin origin min: minPoint: aRectangle corner corner max: maxPoint: maxPoint minPoint Rectangle origin:Corner: )
Due to a lack of "pop" or "drop" instructions, this method will accumulate useless stuff on the stack as it executes but that is not really a problem. This looks a lot like Forth:
arguments receiver selector
We might make it more Lisp-like instead by reversing that order. An interesting alternative would be an infix notation similar to that used in the source:
receiver selector arguments
So our vm level code for the example method would look like:
( ( minPoint: ( ( origin ) min: ( aRectangle origin ) ) ) ( maxPoint: ( ( corner ) max: ( aRectangle corner ) ) ) ( ( Rectangle ) origin:Corner: ( minPoint ) ( maxPoint ) ) )
We don't need colors to separate the regular and implicit receiver messages since the latter are always the first selector in each subvector while the former are always the second element.
This is close enough to an Abstract Syntax Tree (AST) representation of the code that it is easy to do all kinds of code manipulation. It isn't too easy to read, however, and the large number of nested vector implies a lot of overhead. One simplification is possible since the first selector in an argument expression can be always considered an implied receiver send even if it isn't in a nested vector:
( ( minPoint: origin min: aRectangle origin ) ( maxPoint: corner max: aRectangle corner ) ( Rectangle origin:Corner: minPoint ( maxPoint ) ) )
This is harder to manipulate since it is closer to the unparsed source, but it is actually just as easy to interpret. One additional step in this direction would be to include a special "." token to separate top level expressions and arguments instead of using nested vectors:
( minPoint: origin min: aRectangle origin . maxPoint: corner max: aRectangle corner . Rectangle origin:Corner: minPoint . maxPoint )
And so we are back to a token stream format for virtual machine code, just like what the very first Smalltalk implementations used. Though this last format is the most compact (after the Forth-like one), the more structured one above it might be the best choice.