dietST

This project was created to fit a whole Smalltalk computer into an FPGA with only 15 thousand gates. The original design for Oliver was like a rather extended version of the MISC Forth chips and it seems that 16 bit versions of these barely fit in this FPGA and certainly don´t leave enough room for the peripherals.

So space is traded off for time whenever possible. Though the architecture is 16 bits wide, all of the data paths are only 4 bits even though that means simple operations take at least four clocks to execute. In addition, there are as few registers as possible with all data being stored in memory or three special caches (described below).

Stack Cache

This is divided into sixteen blocks of 16 words (of 16 bits) each. Each block holds the contents of a Context object. Block zero is reserved for interrupts and traps while the others cache a part of the current stack.

For a context with A arguments and T temporaries, the contents of the corresponding block are:

0	method
1	program counter
2	sender context
3	stack pointer
4	arg0
.	...
3+A	argA
4+A	temp 0
.	...
3+A+T	tempT
4+A+T	stack element 0
15	stack element 11-A-T

Methods which need more than 11 words of arguments, temporaries and evaluation stack can't run on this architecture, but the compiler can rewrite such code as nested blocks to get around this restriction.

Object Table Cache

This is divided into 64 blocks of four words each. Each block holds the first four words of some object. These entries are a subset of the system Object Table, which takes up 128KB of main memory at a fixed address. Each entry has the following format:

0	object ID
1	code object
2	physical address (low)
3	physical address (high) and GC flags

In the actual Object Table, the first field is redundant since the object ID is the same as the entry's index in the table (but with bit 15 set). But it is needed in the cache to check that we have the right entry.

The code object is like a class in Smalltalk or map in Self in that it is used to indicate an object's "type". It is a vector that includes the compiled code for all of the object's methods (both local and inherited from parent objects). At the begining of each code object there is an entry for each possible selector in the system. This takes up a lot of space since about 95% of these entries just call the "message not understood" method. Each entry is two words long, so very short methods can be entirely contained in there. Longer methods can have a jump instruction to the rest of the code after the selector table. The "method" field in the stack cache is actually a pointer to a code object.

The fields of the objects are not cached, but always read/written directly from memory. That is unfortunate, but there is simply not enough RAM on the chip to deal with this.

Instruction Cache

This is divided into 128 blocks of four words each. Instructions are addressed on a nibble (4 bit) boundary. Each entry looks like:

0	code object
1	program pointer (PC)
2	instructions at PC to PC+3
3	instructions at PC+4 to PC+7

The first two words are used to compare with words 0 and 1 of the current context to see if we have the right entry. Instructions have the following format:

s s

where "i" is the instruction opcode (0 = push literal, 1 = send message), "x" is the extra bit that is added to the operand (explained below) and "ss" is the size of the operand in nibbles (but 0 means 4 nibbles, or 16 bits).

For the push literal instruction, the extra bit is used to set bit 15 in the case of operands with 12 bits or less. This allows both small integers (bit 15 = 0) and normal objects (bit 15 = 1) to be stored in the stack.

When a send instruction has an operand of less than 16 bits, then the lower 4 bits are "x 0 0 0" (where "x" is the extra bit from the instruction nibble) and the operand (extended with zeros) specifies the top 12 bits. When the operand is exactly one nibble long then one of these primitive operations is executed instead of an actual message send:

operand	x	selector	stack growth	description
0 0 0 0	0	dup	1	duplicates the element on the top of the stack (TOS)
0 0 0 0	1	pop	-1	eliminates the element from the TOS
0 0 0 1	0	swap:	0	exchanges TOS with the element next to the top of the stack (NOS)
0 0 0 1	1	over:	1	pushes a copy of NOS on the stack
0 0 1 0	0	at:put:	-3	finds the field NOS in object TOS and stores the word in the third element in the stack there. all three words are popped from the stack
0 0 1 0	1	at:	-2	finds the field NOS in object TOS and pushes its content on the stack in place of the previous two words
0 0 1 1	0	.	0	.
0 0 1 1	1	.	0	.
0 1 0 0	0	addInteger:	-1	adds TOS and NOS replacing them with the result
0 1 0 0	1	subInteger:	-1	subtracts NOS from TOS
0 1 0 1	0	shiftLeft	0	multiples TOS by two
0 1 0 1	1	shiftRight	0	divides TOS by two
0 1 1 0	0	andInteger:	-1	bitwise AND of TOS and NOS
0 1 1 0	1	orInteger:	-1	bitwise OR of TOS and NOS
0 1 1 1	0	notInteger	0	inverts all bits in TOS
0 1 1 1	1	xorInteger:	-1	bitwise XOR of TOS and NOS
1 0 0 0	0	.	0	.
1 0 0 0	1	.	0	.
1 0 0 1	0	.	0	.
1 0 0 1	1	.	0	.
1 0 1 0	0	popTemp:	-2	NOS is stored in the field indicated by TOS in the current context
1 0 1 0	1	popFarTemp:	-2	NOS is stored in the field indicated by TOS in the frame also indicated by TOS
1 0 1 1	0	pushTemp	0	the field indicated by TOS from the current context is fetched and replaces TOS
1 0 1 1	1	pushFarTemp	0	the field indicated by TOS from the frame also indicated by TOS is fecthed and replaced TOS
1 1 0 0	0	enter	4	a new frame is allocated and becomes the current one after its first four words are filled in
1 1 0 0	1	grabArg	1 / -1	the TOS from the sender frame is popped and pushed into the current one
1 1 0 1	0	.	0	.
1 1 0 1	1	.	0	.
1 1 1 0	0	jump	-1	TOS is popped into PC
1 1 1 0	1	nop	0	doesn't do anything (any instructon not defined acts as a NOP, but this is the "official" one)
1 1 1 1	0	jumpOnZero	-2	TOS is popped into PC if NOS is zero. NOS and TOS are eliminated in either case
1 1 1 1	1	return	-1 / 1	TOS is pushed on the sender's stack and the sender becomes the current context

Note that a normal message is sent if the operand is 8 bits or more, even if the bit pattern is one from the above table, so the first 32 selectors can be used for regular messages without confusion.

Leaf Methods

Simple methods that don't invoke any other methods (not counting the primitives in the above table) are called "leaf methods" and account for a large part of all message sends. To optimize the small ones, the "enter" instruction can be omitted so that a new stack frame is not created when executing this method. The temporary PC points to code in the new method, but the PC field of the stack frame remains as it was, so after two words of instructions the execution automatically returns to the sender.

The most common use for leaf methods is as accessors for slots in the objects. If we had something like (in Self notation)

   ( | x <- 3.4. z = 7 | )

we would have the following inside the code object:

x	literal 0. swap:.
x+4	at:. nop.
x:	literal 0. swap:.
x:+4	at:Put:. nop.
z	pop. literal 7.
z+4	nop. nop.

Links to this Page

Desenvolvimento do Oliver last edited on 9 November 2008 at 6:08:02 pm by 192.168.1.23
Other last edited on 1 April 2011 at 8:48:28 pm by 192.168.2.3