Oliver

The hardware platform for the Oliver truck terminal includes:

a simple stack processor using a Xilinx Spartan II XC2S50 programmable chip (the board accepts anything from an XC2S15 up to an XC2S100)
a 256 K by 16 Flash memory (or up to 2 M by 16)
a Xilinx CPLD XC9536 chip to allow the Spartan II to be loaded from the parallel Flash memory
a 4 M by 16 SDRAM chip

While data is being loaded into the FPGA, the Flash is configured as 8 bits wide so that half of the first 140 K (in the top 256K of the memory chip) is not wasted.

Here is the project log (in Portuguese only, sorry): Desenvolvimento do Oliver
And the previous one: Teste do Oliver

Another page has an example showing how the instruction set is used with blocks that share read/write data with their enclosing method.

Objects and Instructions

The processor has 16 bit wide data paths, but virtual addresses for memory are two groups of 16 bits: the object number (normally from the self register, see below) and the field number or offset. The memory controller includes a virtually addressed cache. On a cache miss, a 32 bit physical address is generated and four words are transferred to/from main memory.

Small integer objects have a special representation and are not stored in the cache. The top 15 bits of the integer reference are the value itself, while the lowest bit is the "tag". Since for small integers that tag's value is zero, all 16 bits can be used in additions and subtractions and the result will automatically be the correct small integer. Some other operations, like shift, are more complicated.

bits 15 to 1	bit 0
i i i i i i i i i i i i i i i	0
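The tagging scheme above can be sketched in a few lines of Python. This is only an illustration of the representation described here, not the hardware's logic; the function names (to_ref, from_ref, add_refs) are made up for the example.

```python
MASK16 = 0xFFFF  # the data paths are 16 bits wide


def to_ref(value):
    """Encode a signed 15 bit value as a tagged small integer reference."""
    assert -(1 << 14) <= value < (1 << 14), "value does not fit in 15 bits"
    return (value << 1) & MASK16  # value in bits 15:1, tag (bit 0) is zero


def from_ref(ref):
    """Decode a tagged small integer reference into a signed Python int."""
    assert ref & 1 == 0, "tag bit must be zero for small integers"
    value = ref >> 1
    # Bit 14 of the 15 bit value is the sign bit.
    return value - (1 << 15) if value & (1 << 14) else value


def add_refs(a, b):
    """Plain 16 bit addition: two zero tags add up to a zero tag."""
    return (a + b) & MASK16


print(hex(to_ref(-4)))                           # tagged form of -4
print(from_ref(add_refs(to_ref(5), to_ref(-7))))  # 5 + (-7)
```

Overflow out of the 15 bit range is not detected in this sketch. A shift is harder than an addition because a right shift of the raw word moves a value bit into the tag position, so the tag has to be cleared afterwards.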

The first four words of each object are found in an Object Table which takes up the first 128 Kwords of memory. To locate one of these words, take bits 15:1 of the object ID, multiply by four and add the word number. The first two words contain the high and low 16 bits of the physical address of the rest of the object (if it is more than four words long).
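As a rough sketch, the Object Table lookup described above could be written as follows. The function names are hypothetical, and the code assumes that the Object Table starts at physical word address 0.

```python
def object_table_address(object_id, word):
    """Word address of one of the first four words of an object."""
    assert 0 <= word < 4, "only four words live in the Object Table"
    object_number = (object_id >> 1) & 0x7FFF  # bits 15:1 of the object ID
    return object_number * 4 + word


def body_address(high_word, low_word):
    """32 bit physical address of the rest of the object, built from
    words 0 (high 16 bits) and 1 (low 16 bits) of its table entry."""
    return ((high_word & 0xFFFF) << 16) | (low_word & 0xFFFF)


# The largest object ID lands on the last word of the 128 Kword table.
print(hex(object_table_address(0xFFFF, 3)))
```

With 15 bits of object number and four words per entry, the table covers exactly 32 K entries, which matches the 128 Kwords mentioned above.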

Given an object ID, it is easy to find the corresponding "code object" by just masking some bits. If the bottom bit of the self register is zero, then the code object ID is also zero (this is used for integer objects); if it is one, then the mask depends on bits 15 and 14 of the self register:

bits 15 and 14	bits 14 to 1 (mask)	bit 0
0 0	. 1 1 1 1 1 1 1 1 1 1 1 0 0	1
0 1	. 1 1 1 1 1 1 1 1 0 0 0 0 0	1
1	1 1 1 1 1 0 0 0 0 0 0 0 0 0	1
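One possible reading of the mask table, written as Python. It assumes that the "." entries and bit 15 pass through unchanged, so the three masks become 0xFFF9, 0xFFC1 and 0xFC01 when written as full 16 bit values. The function name is hypothetical.

```python
def code_object(self_ref):
    """Return the code object ID for the object in the self register."""
    if self_ref & 1 == 0:
        return 0  # small integers: the text gives object ID zero here
    top = (self_ref >> 14) & 0b11
    if top == 0b00:
        mask = 0xFFF9  # clears bits 2:1, groups of 4 objects share code
    elif top == 0b01:
        mask = 0xFFC1  # clears bits 5:1, groups of 32 objects share code
    else:
        mask = 0xFC01  # bit 15 set: clears bits 9:1, groups of 512
    return self_ref & mask


print(hex(code_object(0x000B)))
print(hex(code_object(0x4025)))
print(hex(code_object(0x8123)))
```

Under this reading, small groups of objects at the bottom of the ID space each get their own code object, while the upper half of the ID space is split into larger groups.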

Instructions in code objects are simply a sequence of 16 bit words with a mix of pairs of bytecodes and object references. The garbage collector can't distinguish between the two, but since these same object references are also present embedded in the source code there is no problem.

The main instruction format for a bytecode is a four bit operation field followed by a four bit operand field:

.	??00xxxx	??01xxxx	??10xxxx	??11xxxx
00??xxxx	prefix	constant	specialA	specialB
01??xxxx	send	jump	jumpFalse	jumpTrue
10??xxxx	literal	temp	indirect	var
11??xxxx	return	tempPut	indirectPut	varPut

Here is a description of each instruction (with operation codes in hex):

code	instruction	details
0x	prefix	extends the operand of the following instruction by four bits
1x	constant	pushes a constant value indicated by x (as described below) on the stack
2x	specialA	sends message "a" to the hardware object x
3x	specialB	sends message "b" to the hardware object x
4x	send	sends the message at pc#x (see below) to the object at the top of the stack by setting pc to that value after saving pc and self to the return stack
5x	jump	sets pc to pc$x (see below)
6x	jumpFalse	pops the top of the stack. If that was false then sets pc to pc$x
7x	jumpTrue	pops the top of the stack. If that was true then sets pc to pc$x
8x	literal	pushes the word at pc#x to the stack
9x	temp	pushes the xth word from the top of the stack to the stack
Ax	indirect	pushes word 0 (but see below) of the indirect vector at the xth word from the top of the stack to the stack
Bx	var	pushes the xth word in self to the stack
Cx	return	pops the stack to the xth word from the top of the stack and makes that the new top of the stack. Pops pc and self from the return stack.
Dx	tempPut	pops the stack to the xth word from the top of the stack
Ex	indirectPut	pops the stack to word 0 of the indirect vector at the xth word from the top of the stack
Fx	varPut	pops the stack to the xth word of self
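A minimal decoder for this instruction format might look like the sketch below. It assumes that each prefix contributes the next higher four bits of the operand (consistent with 0120 being the extended form of 20) and that the input holds only bytecodes, with no embedded object references. The names OPERATIONS and decode are made up for the example.

```python
OPERATIONS = [
    "prefix", "constant", "specialA", "specialB",
    "send", "jump", "jumpFalse", "jumpTrue",
    "literal", "temp", "indirect", "var",
    "return", "tempPut", "indirectPut", "varPut",
]


def decode(bytecodes):
    """Return a list of (instruction name, operand, operand width in bits)."""
    result = []
    operand = 0
    width = 0
    for byte in bytecodes:
        operation = (byte >> 4) & 0xF
        operand = (operand << 4) | (byte & 0xF)  # earlier nibbles end up higher
        width += 4
        if operation == 0:
            continue  # prefix: keep accumulating operand bits
        result.append((OPERATIONS[operation], operand, width))
        operand, width = 0, 0
    return result


print(decode([0x90, 0xD1]))  # temp 0; tempPut 1
print(decode([0x01, 0x20]))  # specialA with an 8 bit operand
```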

The constant instruction expands its operand from 4 to 16 bits in one of two ways. When bit 0 of the operand is zero, bit 3 of the operand is copied into bits 15 through 4 of the value pushed on the stack, which sign extends the operand into a small integer. When bit 0 is one, bits 3 to 1 of the operand (abc) replace bits 5 to 3 of 0000000000abc0d1 (where d = b or c) and the result is pushed on the stack. This allows us to push the following 16 objects, in operand order: 0, code for smallIntegers, 1, nil, 2, false, 3, true, -4, code for code objects, -3, localMem, -2, specialA, -1, specialB.
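The expansion can be checked with a short Python sketch. It assumes that for even operands the sign bit fills every bit above the operand (bits 15 through 4), which is what the listed values -4 to -1 require as tagged small integers. The function name is hypothetical.

```python
def expand_constant(x):
    """Expand the 4 bit operand of the constant instruction to 16 bits."""
    assert 0 <= x <= 0xF
    if x & 1 == 0:
        # Small integer: sign extend bit 3 of the operand (assumption above).
        return x | 0xFFF0 if x & 0x8 else x
    a = (x >> 3) & 1
    b = (x >> 2) & 1
    c = (x >> 1) & 1
    d = b | c
    # Pattern 0000000000abc0d1 for the predefined objects.
    return (a << 5) | (b << 4) | (c << 3) | (d << 1) | 1


NAMES = ["0", "code for smallIntegers", "1", "nil", "2", "false", "3", "true",
         "-4", "code for code objects", "-3", "localMem", "-2", "specialA",
         "-1", "specialB"]

for x, name in enumerate(NAMES):
    print(f"1{x:X}  {expand_constant(x):04X}  {name}")
```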

The notation pc#x used in the above table means that the least significant bits of pc are replaced by the value of x. The number of bits actually replaced depends on how wide x is, which is normally 4 bits but can be extended with one or more prefix instructions. The value of x is shifted relative to pc in order to address words. The notation pc$x is the same thing but without the shift, so bytes are addressed instead.
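The two notations could be modelled as below. This sketch makes several assumptions that the text does not spell out: pc is a byte address, a word is two bytes so the shift is one bit to the left, and the word form also clears bit 0 of pc. The function names and the prefixes parameter are hypothetical.

```python
def pc_dollar(pc, x, prefixes=0):
    """pc$x: replace the low bits of pc with x (byte addressing)."""
    width = 4 * (1 + prefixes)  # each prefix adds four operand bits
    mask = (1 << width) - 1
    return (pc & ~mask) | (x & mask)


def pc_hash(pc, x, prefixes=0):
    """pc#x: like pc$x, but x is shifted left one bit to address words."""
    width = 4 * (1 + prefixes)
    mask = ((1 << width) - 1) << 1
    return (pc & ~(mask | 1)) | ((x << 1) & mask)


print(hex(pc_dollar(0x1234, 0x7)))
print(hex(pc_hash(0x1234, 0x7)))
print(hex(pc_dollar(0x1234, 0xAB, prefixes=1)))
```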

The descriptions of indirect and indirectPut say that word 0 of the array is affected. That is true if x is only 4 bits wide, but if it has been extended with one or more prefix instructions then the lowest four bits select the array in the stack and the higher bits select a word other than 0 in the array.

Those familiar with stack machines might think that the instruction set described so far is not sufficient for fully manipulating the stack. On one hand, full manipulation is not necessary for code generated from Smalltalk sources, and on the other, special cases of the above instructions have exactly the same effect as the "missing" instructions:

normal stack instruction	equivalent sequence
pop	tempPut 0
dup	temp 0
over	temp 1
swap	temp 1; temp 1; tempPut 3; tempPut 1
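These equivalences can be checked with a tiny simulation. The sketch models the data stack as a Python list with the top at the end, and assumes that tempPut counts its offset before the pop takes place (the reading under which tempPut 0 behaves as pop). The names temp, temp_put and run are made up for the example.

```python
def temp(stack, x):
    """temp x: push a copy of the xth word from the top of the stack."""
    stack.append(stack[-1 - x])


def temp_put(stack, x):
    """tempPut x: store the top into the xth word from the top, then pop."""
    stack[-1 - x] = stack[-1]
    stack.pop()


def run(stack, program):
    """Execute a list of (instruction, operand) pairs on a copy of stack."""
    stack = list(stack)
    operations = {"temp": temp, "tempPut": temp_put}
    for name, x in program:
        operations[name](stack, x)
    return stack


POP = [("tempPut", 0)]
DUP = [("temp", 0)]
OVER = [("temp", 1)]
SWAP = [("temp", 1), ("temp", 1), ("tempPut", 3), ("tempPut", 1)]

print(run([10, 20], POP))
print(run([10, 20], DUP))
print(run([10, 20], OVER))
print(run([10, 20], SWAP))
```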

While these basic instructions do the needed control flow and data manipulation, all actual processing is done by sending a special message to a hardware object. There are two special messages, called "a" and "b", which combined with sixteen possible hardware objects make for a total of 32 processing instructions. Note that with one prefix instruction we can have an additional 480 processing instructions (16 bits wide), but we only use 8 of them here: the first four of each special message type are "raw" equivalents of their corresponding one byte instruction. So if 20 is add then 0120 is rawAdd which operates on all 16 bits instead of just valid 15 bit integers. All of these instructions take any operands they need from the stack and push their result back to the stack. When a hardware object is absent or if it can't execute the requested instruction for some reason (incompatible operands, overflow, etc.) then the instruction is interpreted as a normal send to either the specialA or specialB software object with the object number as the message selector.

These are the one byte stack instructions (where T means the element on the top of the stack and N is the element right under T):

code	instruction	details	code	instruction	details
20	add	adds T and N	30	sub	subtracts N from T
21	mult	multiplies T and N	31	rotate	rotate or shift N as indicated by T
22	and	logical and of T and N	32	andnot	logical and of T and not N
23	or	logical or of T and N	33	xor	logical exclusive or of T and N
24	less	pushes true if T is less than N else false	34	greater	pushes true if T is greater than N else false
25	equal	pushes true if T is equal to N else false	35	int	pushes true if T is an integer else false
26	test	the result of ANDing T with the data from the i/o port is pushed on the stack	36	clear	the result of ANDing the inverse of T with the data from the i/o port is saved to the port
27	set	the result of ORing T with the data from the i/o port is saved to the port	37	toggle	the result of XORing T with the data from the i/o port is saved to the port
28	at	the word indicated by T in self is pushed on the stack	38	atPut	N is stored in the word indicated by T in self
29	next	the four word array (object, index, step, limit) indicated by T is used as a stream from which a word is fetched and pushed on the stack	39	nextPut	the four word array (object, index, step, limit) indicated by T is used as a stream to which N is stored
2A	atEnd	pushes true if the second element in the array indicated by T is greater than the fourth element else false	3A	rewind	sets the elements of the (object, ?, ?, ?) array indicated by T to (object, 0, 1, object size)
2B			3B
2C	thisCtx	pushes (or makes) a reference to the current context	3C	pushSelf	pushes a reference to the current receiver
2D	jumpInd	sets the PC to the value of T	3D	perform	indirect send of T to object N (not popped)
2E			3E	mkind	makes an indirection array with T elements initialized from the data underneath that on the stack
2F	block	makes a block object which will do a local return. It has T+2 elements: N as the initial PC, "self", and T elements initialized from the data under N on the stack	3F	retblk	makes a block object which will do a non local return. It has T+4 elements: N as the initial PC, "self", the data stack pointer, the return stack pointer, and T elements initialized from the data under N on the stack

A feature of the translator is that it will hardcode popular control structures, such as "ifTrue:" and "whileFalse:", using the jump instructions. This is used in all Smalltalk implementations except for Self and is limiting compared to what is achieved by Self's famous adaptive compilation technology, but it will have to be enough for this simple implementation.

Local Memory and I/O Ports

The internal registers and memory of the processor are mapped into a Local Memory address space which is accessed as a special object.

start	end	use
0000	0007	i/o ports for tasks 8 to 15
0008	01FF	data and low level code
0200	02FF	data stack cache for each task
0300	03FF	return stack cache for each task
0400	07FF	data/instruction cache
8000	800F	receiver for each task
8010	801F	pc for each task
8020	802F	data stack pointer for each task
8030	803F	return stack pointer for each task

In addition, each task is associated with a particular i/o port which is used implicitly by the test, set, clear and toggle instructions. A task still has full access to the i/o ports for other tasks, but that requires a longer sequence of instructions.
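The four implicit port instructions can be sketched as follows. The code assumes that the ports for tasks 8 to 15 sit at local addresses 0 to 7 in task order and that a port is 16 bits wide; neither detail is stated explicitly above, and all names are hypothetical.

```python
MASK16 = 0xFFFF
FIRST_PORT_TASK = 8  # ports exist for tasks 8 to 15


def port_address(task):
    """Local Memory address of a task's i/o port (assumed to be in task order)."""
    assert 8 <= task <= 15, "only tasks 8 to 15 have an i/o port"
    return task - FIRST_PORT_TASK


def io_test(local_mem, task, t):
    """test: the result of ANDing T with the port data is pushed."""
    return t & local_mem[port_address(task)]


def io_set(local_mem, task, t):
    """set: T ORed with the port data is saved to the port."""
    address = port_address(task)
    local_mem[address] = (local_mem[address] | t) & MASK16


def io_clear(local_mem, task, t):
    """clear: the inverse of T ANDed with the port data is saved to the port."""
    address = port_address(task)
    local_mem[address] = local_mem[address] & ~t & MASK16


def io_toggle(local_mem, task, t):
    """toggle: T XORed with the port data is saved to the port."""
    address = port_address(task)
    local_mem[address] = (local_mem[address] ^ t) & MASK16


local_mem = [0] * 8
io_set(local_mem, 9, 0b0110)
io_clear(local_mem, 9, 0b0010)
io_toggle(local_mem, 9, 0b0101)
print(local_mem, io_test(local_mem, 9, 0b0011))
```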

Blocks

The code for a block object will be included in the same code object as the method in which it appears (as is traditional for Smalltalk implementations). Some blocks end with a non local return and are created with the retblk instruction while the rest are created with the block instruction instead. Any read-only state needed by the block is copied to it so that it doesn't have to reference its lexically scoped external context. For the case of read/write state shared between one or more contexts, that is placed in an indirection array. The array itself can be seen as read-only state so the reference to it can be copied as before.
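The copying scheme can be imitated in Python to show why it works. In this sketch a block only sees the values copied into it at creation time, and shared read/write state lives in a one element list standing in for an indirection array. The Block class and make_counter are illustrative names, not part of the Oliver design.

```python
class Block:
    """A block that carries copies of the state it needs."""

    def __init__(self, code, copied):
        self.code = code
        self.copied = tuple(copied)  # read-only copies, fixed at creation

    def value(self, *args):
        return self.code(self.copied, *args)


def increment_code(copied):
    step, cell = copied
    cell[0] += step  # writes go through the indirection array
    return cell[0]


def read_code(copied):
    (cell,) = copied
    return cell[0]


def make_counter(step):
    cell = [0]  # indirection array holding the shared read/write variable
    # Both blocks receive a copy of the reference to the same array.
    return Block(increment_code, [step, cell]), Block(read_code, [cell])


increment, read = make_counter(5)
increment.value()
increment.value()
print(read.value())
```

Neither code function refers to the variables of make_counter directly, which mirrors a block that never has to reference its enclosing context.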

Input and Output

Besides the processor, the FPGA implements a set of I/O devices:

NTSC video output - 8 bits per pixel (2 bits for value, 2 for saturation and 4 for hue, which allows 133 unique colors) at 320 by 234 pixels
smart LCD output - a 20 character by 4 line LCD, for example. It can also interface to a small graphical LCD.
serial I/O - can either be one RS232 interface with two control lines or two separate RS232 interfaces.
USB - can't be used with an XC2S15
raw keyboard input - can have from 20 to 70 keys
