RISC42

This is a simple 32 bit (4 bytes) RISC processor design which uses 16 bit (2 bytes) instructions. The idea is to have an efficient encoding for programs compiled from C, though its new features (skip and cascade) might be a little complicated to include in the most popular C compilers.

For running dynamic object-oriented languages there are several extensions which could be used, either separately or together:

PICMode for fast dispatch and type feedback
BCMode to speed up bytecode interpreters
Object data caches to improve garbage collectors

Registers

RISC42 has 16 visible 32 bit wide registers and three special registers per task: PC (program counter), STATUS and STACK. The PC is a 28 bit byte address that points to a 16 bit word in the virtual (L2) instruction cache that contains the currently executing instruction.

Registers 0 to 14 are the local registers while R15 is always read as having a zero value. There are no global registers but since operations which would normally need these are handled in other threads (with their own locals) this is not a problem.

STATUS

The format of STATUS is:

31 to 24	23 to 14	13 to 4	3	2	1	0
skip0 to skip7	BankA	BankB	N	Z	C	V

The 10 BankA bits in STATUS indicate the bottom block of 16 registers in register memory (which can be up to 16K words in size) while the BankB field indicates the top block. R0 is always in BankA but depending on the alignment (see the STACK registers) a number of registers from R14 down can be in BankB.

The eight skip bits represent the following eight instructions. At the start of each new instruction these bits are shifted left and if what was previously skip0 is 1 then that instruction is turned into a NOOP. All eight bits are cleared by branch or jump instructions.

The N bit indicates that the previous result was negative, the Z bit that it was zero, the C bit that a carry or borrow happened while calculating the result and V that an overflow happened.
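
As a rough illustration, the STATUS layout and the skip shifting can be modelled in a few lines of Python. This is only a sketch: the function names are made up for this example, and it assumes that a 0 is shifted into skip7, which the text above does not state.

```python
def decode_status(status):
    """Split a 32 bit STATUS word into its fields (layout from the table above)."""
    return {
        "skip":  (status >> 24) & 0xFF,   # bits 31..24, skip0 is bit 31
        "bankA": (status >> 14) & 0x3FF,  # bits 23..14
        "bankB": (status >> 4) & 0x3FF,   # bits 13..4
        "N": (status >> 3) & 1,
        "Z": (status >> 2) & 1,
        "C": (status >> 1) & 1,
        "V": status & 1,
    }


def step_skip(status):
    """Advance the skip field by one instruction.

    Returns (new_status, skipped); skipped is True when the instruction
    that is about to start must be turned into a NOOP.
    """
    skipped = bool(status & 0x80000000)      # the old skip0
    skip = ((status >> 24) << 1) & 0xFF      # shift left; ASSUMPTION: 0 enters at skip7
    return (status & 0x00FFFFFF) | (skip << 24), skipped
```

Calling step_skip once per instruction start gives the NOOP decision for that instruction; clearing the field on branches and jumps is not modelled here.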

STACK

The format of STACK is:

31 30	29 to 15	14	13 to 0
P B	Task	R	SP

Task indicates the currently executing thread.

Bit P indicates that this task is in PICMode and bit B indicates the BCMode. Bit R indicates that the current task is ready to run.

SP indicates the address corresponding to R0 in the current task's stack object. The four bottom bits in SP indicate the alignment and so how R0 to R14 should be mapped within the 32 registers pointed to by BankA and BankB.
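
The following Python sketch decodes the STACK fields and shows one possible reading of the register mapping. The mapping rule is an assumption, not something the text spells out: Rn is placed at slot (alignment + n) of a 32 entry window whose lower half is the block named by BankA and whose upper half is the block named by BankB. The function names are invented for the example.

```python
def decode_stack(stack):
    """Split a 32 bit STACK word into its fields."""
    return {
        "P": (stack >> 31) & 1,          # PICMode
        "B": (stack >> 30) & 1,          # BCMode
        "task": (stack >> 15) & 0x7FFF,  # bits 29..15
        "R": (stack >> 14) & 1,          # ready to run
        "SP": stack & 0x3FFF,            # bits 13..0
    }


def physical_register(n, sp, bank_a, bank_b):
    """Map local register n (0..14) to a word index in register memory.

    ASSUMPTION: slot = alignment + n, slots 0..15 live in the BankA block
    and slots 16..31 in the BankB block; each block holds 16 words.
    """
    if not 0 <= n <= 14:
        raise ValueError("only R0..R14 are mapped; R15 reads as zero")
    slot = (sp & 0xF) + n
    bank = bank_a if slot < 16 else bank_b
    return bank * 16 + (slot & 0xF)
```

Under this reading R0 always lands in BankA, and the higher the alignment, the more registers counting down from R14 spill into BankB.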

Instruction Set

The top four bits of the instruction select between different groups:

0XXX - SKIP	1XXX - BRANCH	2XXX - MATH	3XXX - MATH#
4XXX - SHIFT	5XXX - SHIFT#	6XXX - LOGIC	7XXX - LOGIC#
8XXX - JUMPL	9XXX - JUMPL#	AXXX - STB	BXXX - STB#
CXXX - LD	DXXX - LD#	EXXX - ST	FXXX - ST#
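
A decoder for this first level is just a table lookup on the top nibble. The Python below is a minimal sketch with invented names; it splits the remaining 12 bits as 4+8 for SKIP and BRANCH and as 4+4+4 for everything else, following the formats given in the sections for each group.

```python
GROUPS = [
    "SKIP", "BRANCH", "MATH", "MATH#",
    "SHIFT", "SHIFT#", "LOGIC", "LOGIC#",
    "JUMPL", "JUMPL#", "STB", "STB#",
    "LD", "LD#", "ST", "ST#",
]


def decode(instr):
    """Return the group name followed by the raw fields of a 16 bit instruction."""
    group = GROUPS[(instr >> 12) & 0xF]
    field = (instr >> 8) & 0xF
    if group in ("SKIP", "BRANCH"):
        # condition plus one 8 bit field (skips or offset)
        return group, field, instr & 0xFF
    # operation or base, then two 4 bit fields
    return group, field, (instr >> 4) & 0xF, instr & 0xF
```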

BRANCH

15 to 12	11 to 8	7 to 0
0 0 0 1	condition	offset

The following conditions are encoded:

11 to 9	condition	name for bit8=0	name for bit8=1
0 0 0	Z	not equal	equal
0 0 1	C	not carry	carry
0 1 0	N	positive or zero	negative
0 1 1	V	no overflow	overflow
1 0 0	N^V	greater than or equal	less than
1 0 1	Z=1 or N^V	greater than	less than or equal
1 1 0	Z=1 or C=0	higher unsigned	lower or same unsigned
1 1 1	true	never	always

The offset is a signed 8 bit value which is added to PC if the condition is true.
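
The condition table can be written down directly as code. This Python sketch uses invented function names and takes the flags as separate 0/1 arguments; it only returns the signed offset, since the text does not say whether the offset counts bytes or words, nor which PC value it is added to.

```python
def condition_true(cond, n, z, c, v):
    """Evaluate the 4 bit condition field (bits 11..8) against the flags."""
    base = (
        z,               # 000: Z
        c,               # 001: C
        n,               # 010: N
        v,               # 011: V
        n ^ v,           # 100: N^V
        z | (n ^ v),     # 101: Z=1 or N^V
        z | (c ^ 1),     # 110: Z=1 or C=0
        1,               # 111: true
    )[(cond >> 1) & 7]
    wanted = cond & 1    # bit 8: 1 selects the condition, 0 its inverse
    return base == wanted


def branch_offset(instr, n, z, c, v):
    """Return the signed 8 bit offset of a taken branch, or None if not taken."""
    cond = (instr >> 8) & 0xF
    if not condition_true(cond, n, z, c, v):
        return None
    offset = instr & 0xFF
    return offset - 256 if offset & 0x80 else offset
```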

SKIP

15 to 12	11 to 8	7 to 0
0 0 0 0	condition	skips

The conditions are exactly the same as for the branch instructions. If the condition is true, then the eight skip bits in the instruction are ORed with their respective bits in STATUS. This makes it easy to have some instructions skipped when condition1 OR condition2 are true - you just have to take into account how the bits in STATUS will have been shifted between the first skip instruction and the second one.

To skip when condition1 AND condition2 are true, make the first skip test the inverse of condition1 and set the bit corresponding to the second skip instruction, then make the second skip test condition2 and set the bit for the actual instructions. The second skip then only takes effect when condition1 holds. If you want to execute some instructions depending on some conditions instead of skipping them, then you have to invert everything using De Morgan's rule.
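
A small simulation makes the bit bookkeeping easier to follow. The Python below is a sketch under two assumptions: the top bit of the skips field (0x80) refers to the instruction right after the SKIP, and a SKIP that is itself skipped does nothing. In this model the AND case needs the first skip to test the inverse of condition1, because a skip that fires cancels the second skip. All names are invented for the example.

```python
def run(program):
    """Run a list of ("skip", condition, mask) and ("op", name) entries.

    Returns the names of the ops that were actually executed.
    """
    skip = 0
    executed = []
    for kind, *args in program:
        shifted_out = skip & 0x80          # old skip0
        skip = (skip << 1) & 0xFF          # shift at the start of each instruction
        if shifted_out:
            continue                       # this instruction became a NOOP
        if kind == "skip":
            condition, mask = args
            if condition:
                skip |= mask               # OR the instruction's bits into STATUS
        else:
            executed.append(args[0])
    return executed


def skip_if_or(c1, c2):
    # The first skip is two instructions before the target (0x40),
    # the second one is right before it (0x80).
    return [("skip", c1, 0x40), ("skip", c2, 0x80), ("op", "target"), ("op", "after")]


def skip_if_and(c1, c2):
    # The first skip cancels the second skip unless condition1 holds.
    return [("skip", not c1, 0x80), ("skip", c2, 0x80), ("op", "target"), ("op", "after")]
```

For example, run(skip_if_or(True, False)) executes only "after", while run(skip_if_and(True, False)) executes both "target" and "after".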

MATH

15 to 12	11 to 8	7 to 4	3 to 0
0 0 1 #	K A N C	dest	source

Bit # indicates that source is not a register but a 4 bit immediate value. In the assembly source this is indicated by prefixing the source operand with "#" (and not by adding the character to the instruction name, though that is done here since the operands are missing). When an immediate value of 14 (0xE) or 15 (0xF) appears in the source field, the actual value is taken from the following word (16 bit immediates) or following two words (32 bit immediates) respectively. The PC is updated to skip over such extended values and they don't count as instructions when interpreting the SKIP bits.
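
The length rule for immediates is simple enough to sketch in Python. The function name is invented, and the sketch assumes the same 0xE/0xF rule applies to every group whose name ends in "#"; the text only states it here, although the JUMPL section also talks about 16 and 32 bit immediate offsets.

```python
def extension_words(instr):
    """Number of extra 16 bit words that follow this instruction (0, 1 or 2)."""
    group = (instr >> 12) & 0xF
    # Groups 3, 5, 7, 9, B, D and F are the "#" variants; SKIP and BRANCH
    # (groups 0 and 1) have no source field.
    has_immediate = group >= 2 and (group & 1) == 1
    if not has_immediate:
        return 0
    # 0xE: one following word (16 bit immediate), 0xF: two words (32 bit)
    return {0xE: 1, 0xF: 2}.get(instr & 0xF, 0)
```

A fetch unit would advance PC by 1 + extension_words(instr) words, and only the first of those words would consume a skip bit.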

Bit A indicates that the previous value of dest is added to the result before it is saved back in dest. Bit N indicates that source is bitwise inverted before being used. Bit C indicates the value of the carry in bit. For two instructions, add with carry=1 and add inverted with carry=0, the resulting operation is so rarely useful that the value of the STATUS bit C is used as the carry in instead.

A N C	instruction	name
0 0 0	dest := source	MOV
0 0 1	dest := source + 1	INC
0 1 0	dest := not(source)	NOT
0 1 1	dest := not(source) + 1	NEG
1 0 0	dest := dest + source	ADD
1 0 1	dest := dest + source + C	ADC
1 1 0	dest := dest + not(source) + C	SBB
1 1 1	dest := dest + not(source) + 1	SUB
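
All eight rows of the table fall out of a single adder, which the Python sketch below tries to show. The function name is invented; it returns the 32 bit result together with the carry out and leaves the N, Z and V flags out for brevity.

```python
MASK = 0xFFFFFFFF


def math_op(anc, dest, source, carry_flag):
    """Compute a MATH instruction given the A, N, C bits as a 3 bit number.

    carry_flag is the current value of the C bit in STATUS.
    Returns (result, carry_out).
    """
    a, n, c = (anc >> 2) & 1, (anc >> 1) & 1, anc & 1
    operand = (~source if n else source) & MASK   # bit N inverts the source
    carry_in = c
    if anc in (0b101, 0b110):                     # ADC and SBB use STATUS.C instead
        carry_in = carry_flag
    total = ((dest & MASK) if a else 0) + operand + carry_in
    return total & MASK, total >> 32
```

For instance math_op(0b111, 10, 3, 0) computes 10 + not(3) + 1, which is the usual two's complement way of getting 10 - 3.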

Cascades

The basic architecture is what we call a "two address machine". Normally two registers are indicated in the instruction with the first being both one of the sources and the destination of the operation.

 dest := dest OP source

The instructions with the "K" bit set are the cascade instructions. They have the same format and operation as the corresponding instructions without the "K" except that the first register is just a source, with the result forwarded ("cascaded") directly into the following instruction.

 dest2 := (dest1 KOP source1) OP source2

It is probably not worth allowing two or more cascade instructions in a row. In that case their dest field would go unused and the whole point of cascades is allowing a more flexible data flow (they aren't faster than normal instructions), so a narrow "pipe" wouldn't be very useful.

When the instruction following the cascade is of an incompatible type (branch, skip, jumpl or memory instructions) then the result is simply lost. The status bits, however, will have been set to the correct values so that a KSUB can have the role of a CMP in other processors when it is followed by a branch or skip.
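
The data flow of a cascade can be sketched with a toy interpreter in Python. Everything here is invented for the example: only four operations are modelled, registers are a plain dictionary, the status bits are ignored, and two cascades in a row are rejected because the text leaves that case open.

```python
MASK = 0xFFFFFFFF

OPS = {
    "ADD": lambda d, s: (d + s) & MASK,
    "SUB": lambda d, s: (d - s) & MASK,
    "AND": lambda d, s: d & s,
    "NOT": lambda d, s: ~s & MASK,      # one-operand: ignores the first input
}


def run(program, regs):
    """Execute (name, dest, source) tuples; a leading "K" marks a cascade."""
    forwarded = None
    for name, dest, src in program:
        cascade = name.startswith("K")
        op = OPS[name[1:] if cascade else name]
        # A forwarded value replaces the dest register as the first operand.
        first = regs[dest] if forwarded is None else forwarded
        result = op(first, regs[src])
        if cascade:
            if forwarded is not None:
                raise ValueError("back-to-back cascades are not modelled")
            forwarded = result          # dest is left untouched
        else:
            regs[dest] = result
            forwarded = None
    return regs
```

With regs = {1: 0, 2: 10, 3: 5, 4: 3}, the pair KADD R2,R3 ; SUB R1,R4 leaves (10 + 5) - 3 = 12 in R1 without changing R2.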

SHIFT

15 to 12	11 to 8	7 to 4	3 to 0
0 1 0 #	K M S L	dest	source

Bit M indicates that a multiply/divide instruction is executed instead of a shift.

Bit S indicates that the operation should be signed. Bit L indicates the direction of the instruction: left (or multiply) when 1 and right (or divide) when 0. For shifts, the combination L=1 and S=1 doesn't make sense (a signed left shift gives the same result as an unsigned one), so it is used to indicate a rotation instead. Note that rotations can only be expressed indirectly in C:

 dest = (dest << source ) | (dest >> (32 - source))

This has caused designers for some recent processors to eliminate rotation instructions, but gcc is supposed to be able to compile the above expression into one when it is available.

The names of the shift instructions are: SHR, SHL, ASR, ROT, MUL, DIV, MLS and DVS.
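
The four shift variants (M=0) can be sketched in Python as below. The function name is invented, the rotation is a left rotation as in the C expression above, and reducing the shift amount modulo 32 is an assumption since the text does not say what happens with larger amounts. Multiply and divide are left out.

```python
MASK = 0xFFFFFFFF


def shift_op(s_bit, l_bit, value, amount):
    """Apply SHR, SHL, ASR or ROT to a 32 bit value."""
    amount &= 31                       # ASSUMPTION: only the low 5 bits count
    value &= MASK
    if (s_bit, l_bit) == (0, 0):       # SHR: logical shift right
        return value >> amount
    if (s_bit, l_bit) == (0, 1):       # SHL: shift left
        return (value << amount) & MASK
    if (s_bit, l_bit) == (1, 0):       # ASR: arithmetic shift right
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK
    # S=1, L=1 -> ROT: rotate left
    return ((value << amount) | (value >> (32 - amount))) & MASK
```

In Python a rotation by 0 works with this formula, but the C expression shifts by 32 in that case, which is undefined behaviour for a 32 bit operand, so C code usually masks the shift count.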

LOGIC

15 to 12	11 to 8	7 to 4	3 to 0
0 1 1 #	K A N M	dest	source

The bits from dest and source are logically combined according to the rule specified by A, N and M and the result is saved in dest. The source can be an immediate value if bit # is set.

Bit A=1 indicates that the basic logic operation is AND, otherwise it is OR. Bit M=1 modifies that so A=1 means ANDINVERT (destination is inverted before being used as an operand), otherwise it is XOR. Bit N=1 means that the results should be inverted before being stored in the destination.

 result := N xor [A ? (src and (dest xor M))
                    : (M ? src xor dest : src or dest)]

The names of the logic instructions are: IOR, XOR, NOR, EQV, AND, ANI, NND and NAI.

There are 16 possible logic operations between the source and destination, which can be implemented as (where Rx means the result will be the same no matter what register is used):

rule	operation	RISC42 code	rule	operation	RISC42 code
0	d := 0	MOV Rd,#0	8	d := d and s	AND Rd,Rs
1	d := not(d) and not(s)	NOR Rd,Rs	9	d := not(d xor s)	EQV Rd,Rs
2	d := not(d) and s	ANI Rd, Rs	10	d := s	MOV Rd,Rs
3	d := not(d)	NOT Rd,Rd	11	d := not(d and not(s))	KNOT Rx,Rs ; NND Rd,Rd
4	d := d and not(s)	KNOT Rx,Rs ; AND Rd,Rd	12	d := d	MOV Rd,Rd
5	d := not(s)	NOT Rd,Rs	13	d := d or not(s)	NAI Rd,Rs
6	d := d xor s	XOR Rd,Rs	14	d := d or s	IOR Rd, Rs
7	d := not(d and s)	NND Rd,Rs	15	d := not(0)	NOT Rd,#0
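
The table can be cross-checked mechanically. The Python sketch below implements the result formula and computes, for each of the eight LOGIC instructions, the rule number it corresponds to, where bit (2*d + s) of the rule is the output for the input bits d and s. It assumes the instruction names are listed in the order of the A N M encoding (IOR = 000 up to NAI = 111); the helper names are invented.

```python
MASK = 0xFFFFFFFF
NAMES = ["IOR", "XOR", "NOR", "EQV", "AND", "ANI", "NND", "NAI"]


def logic(anm, dest, src, mask=MASK):
    """result := N xor [A ? (src and (dest xor M)) : (M ? src xor dest : src or dest)]"""
    a, n, m = (anm >> 2) & 1, (anm >> 1) & 1, anm & 1
    if a:
        result = src & (dest ^ (mask if m else 0))
    elif m:
        result = src ^ dest
    else:
        result = src | dest
    if n:
        result ^= mask                  # invert the result
    return result & mask


def rule_number(anm):
    """Build the 4 bit truth table of one instruction from single-bit inputs."""
    rule = 0
    for d in (0, 1):
        for s in (0, 1):
            if logic(anm, d, s, mask=1):
                rule |= 1 << (2 * d + s)
    return rule


RULES = {NAMES[anm]: rule_number(anm) for anm in range(8)}
```

RULES comes out as IOR=14, XOR=6, NOR=1, EQV=9, AND=8, ANI=2, NND=7 and NAI=13, which matches the single-instruction rows of the table; the other eight rules need MOV, NOT or a cascade.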

LD/ST

15 to 12	11 to 8	7 to 4	3 to 0
1 1 W #	base	reg	offset

Bit W indicates that reg should be written out to memory (STore) rather than read from it (LoaD). There are three options for base addresses:

 00bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
 11bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

 01ttttttttoooooooooooooooooooooo

 10tttttttttttttttooooooooooooooo

For the first kind the "b" indicates a byte base address (the second variation indicates a negative integer which isn't interesting as a base) and the offset field is added to that. The result is the lowest 32 bits of the 64 bit address, where the top bits indicate the byte vector of the currently executing task (they are extracted from the STACK register).

In the other two options the "t" indicates the task and the "o" the object within that task. There can be up to 32K tasks of which 256 can have 4M objects instead of just 32K ones. Object zero of each task is the task byte vector while object one is the object table for that task. The top bit of "t" is 1 for read-only tasks. The 64 bit address has the base value in the top 32 bits and the offset in the low 32 bits - no addition is performed in this case.

It is an error to ST to an object that is not from the current task or to LD from an object from a read/write task that is not the current one.
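
The three base formats can be told apart by their top two bits. The Python below is a sketch with invented names; task_byte_vector is a hypothetical parameter standing for the 32 top address bits that the hardware would extract from the STACK register, and the access checks described above are not modelled.

```python
def classify_base(base):
    """Decode a 32 bit base value into a byte address or a task/object pair."""
    top = (base >> 30) & 3
    if top in (0b00, 0b11):
        return {"kind": "bytes", "base": base}
    if top == 0b01:      # 8 bit task, 22 bit object (up to 4M objects)
        task, obj, task_bits = (base >> 22) & 0xFF, base & 0x3FFFFF, 8
    else:                # 15 bit task, 15 bit object (up to 32K objects)
        task, obj, task_bits = (base >> 15) & 0x7FFF, base & 0x7FFF, 15
    return {
        "kind": "object",
        "task": task,
        "object": obj,
        "read_only": bool(task >> (task_bits - 1)),  # top bit of "t"
    }


def effective_address(base, offset, task_byte_vector):
    """Form the 64 bit address for a LD/ST style access."""
    if classify_base(base)["kind"] == "bytes":
        # offset is added to the base; top bits come from the current task
        return (task_byte_vector << 32) | ((base + offset) & 0xFFFFFFFF)
    # object reference: base on top, offset below, no addition
    return (base << 32) | (offset & 0xFFFFFFFF)
```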

STB

15 to 12	11 to 8	7 to 4	3 to 0
1 0 1 #	base	reg	offset

Only the lowest 8 bits in the indicated register are saved to memory. This is done by replicating those 8 bits four times in the data lines and selecting the appropriate enable line based on the two lowest bits of the address.

There is no corresponding LDB instruction, but a LD in which the lowest two bits of the address are not 00 will shift the value before saving it to the indicated register. It is up to the software to clear the top bits or extend the sign as the case may be.
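
A possible software view of byte accesses is sketched below in Python. The text does not give the byte order or the direction of the LD shift, so this example assumes a little-endian layout where the loaded word is shifted right by 8 bits for each byte of misalignment; the function names are invented.

```python
def stb_lines(reg_value, address):
    """Return (data, lane) for a STB: the byte replicated on all four lanes
    and the index of the one byte enable that is asserted."""
    byte = reg_value & 0xFF
    data = byte * 0x01010101           # same 8 bits in all four byte lanes
    lane = address & 3                 # two lowest address bits pick the enable
    return data, lane


def ld_shifted(word, address):
    """Value a LD would deliver for a possibly unaligned address.

    ASSUMPTION: right shift by 8 * (address & 3), little-endian view.
    """
    return (word & 0xFFFFFFFF) >> (8 * (address & 3))


def zero_extend_byte(value):
    """Software step: clear the top 24 bits."""
    return value & 0xFF


def sign_extend_byte(value):
    """Software step: treat the low 8 bits as a signed byte."""
    byte = value & 0xFF
    return byte - 256 if byte & 0x80 else byte
```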

JUMPL

15 to 12	11 to 8	7 to 4	3 to 0
1 0 0 #	base	savePC	offset

This works essentially the same as the memory instructions except that the calculated address is stored in PC. The previous value of PC is stored in the register savePC, which is why this is a "jump and link" ("call") instruction.

SEND

When either the base or the offset has its top bits indicating an object address, the instruction is interpreted as a SEND rather than a JUMPL. The base is used as the receiver and the offset as the selector. The stack is adjusted so that after the send the receiver is register A0, and the top four bits of the PC are used to save the original address of this register so that return can restore it. An immediate selector must be 32 bits long or it couldn't have its top bit set. A SEND doesn't really get executed but instead invokes the PICMode, where an instruction cache entry indexed by the receiver's type and the address of the SEND instruction is used. On an L2 cache miss (or when the receiver belongs to a read/write task different from the current one) software is used to implement the full semantics of message sending.

Extended Instructions

When a JUMPL uses an immediate offset that is not 32 bits long, that offset is interpreted as an extended instruction. These are normally implemented as coprocessors but the programmer can think of them as calls to built-in subroutines.

The values of the base and dest (savePC) registers are sent to the coprocessor at the beginning of the instruction while the dest register is updated from the result generated by the coprocessor at the end of the instruction. When the offset is 16 bits long, some of these bits can be used to indicate registers within the coprocessor itself to be used for the instruction.

The 14 "short" extended instructions should be reserved for the most common operations. Certainly the four basic floating point instructions (FADD, FSUB, FMUL and FDIV) qualify. Tagged addition and subtraction instructions (TADD and TSUB) are also important, even though they ended up never being used on the SPARC.

Another common instruction would manipulate the register frames and the stack. This could be done with regular instructions if all registers are addressable (and has to be done with regular instructions when a "no more frames" exception happens), but having a single instruction adjust SP and reload BankA and/or BankB as needed can improve the performance and make the code smaller. This instruction is different from all others since it uses the register addresses rather than their contents to do its job. After it is finished, the register which was originally named by the source address becomes named by the destination address. So "STK R5,R0" slides the stack down (exposing previously hidden registers) while "STK R0,R4" slides the stack up (hiding registers before a send, for example).

The return instruction copies a value from a named destination register to the proper register after restoring the stack to the situation before the send instruction. The source register holds the old PC value/stack change.

9XX0 - FADD	9XX1 - FSUB	9XX2 - FMUL	9XX3 - FDIV
9XX4 - STK	9XX5 - RTN	9XX6 -	9XX7 -
9XX8 -	9XX9 -	9XXA -	9XXB -
9XXC - TADD	9XXD - TSUB	9XXE - 16 bit extends	9XXF - JUMPL#
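
Decoding this table is again a lookup on the low nibble. The Python sketch below uses invented names and reports the unassigned slots as "unassigned"; it does not distinguish a plain JUMPL# from a SEND, which depends on the 32 bit value that follows.

```python
SHORT_EXTENDED = {
    0x0: "FADD", 0x1: "FSUB", 0x2: "FMUL", 0x3: "FDIV",
    0x4: "STK", 0x5: "RTN",
    0xC: "TADD", 0xD: "TSUB",
}


def classify_jumpl_immediate(instr):
    """Classify a 9XXX instruction by its 4 bit offset field."""
    if (instr >> 12) & 0xF != 0x9:
        raise ValueError("not a JUMPL# instruction")
    offset = instr & 0xF
    if offset == 0xF:
        return "JUMPL#"                 # 32 bit immediate offset follows
    if offset == 0xE:
        return "16 bit extended"        # 16 bit extension word follows
    return SHORT_EXTENDED.get(offset, "unassigned")
```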

Tasks

The processor has 16 tasks, each of which has its own PC, STATUS and STACK. The set of R bits in all the STACK registers indicates which tasks are ready to run. The highest numbered task from 8 to 15 which has its corresponding bit set is always selected to run its next instruction. If none of these are ready then one between 0 and 7 is selected in a round robin fashion. So tasks 0 through 7 all have the same priority and are called the "software tasks" while tasks 8 through 15 have increasing priority and are the "hardware tasks" since they normally become ready due to external events or timeouts.

Task switching is allowed after any instruction except for a cascade and doesn't take up any clock cycles.
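
The selection rule can be sketched as a small Python function. The names are invented, the sixteen R bits are passed in as one bit mask, and the round robin is assumed to start with the task after the software task that ran last, a detail the text does not specify. The rule that no switch happens right after a cascade is not modelled.

```python
def pick_task(ready, last_software):
    """Choose the next task to run.

    ready is a 16 bit mask of the R bits (bit t set means task t is ready);
    last_software is the software task (0..7) that ran most recently.
    Returns the task number, or None if nothing is ready.
    """
    # Hardware tasks: strict priority, highest number first.
    for task in range(15, 7, -1):
        if ready & (1 << task):
            return task
    # Software tasks: round robin. ASSUMPTION: start after the last one run.
    for step in range(1, 9):
        task = (last_software + step) % 8
        if ready & (1 << task):
            return task
    return None
```

After reset only task 15 has its R bit set, so pick_task(1 << 15, 0) returns 15.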

BankA and BankB as well as the stack should be set up so that each task has a separate set of registers. For hardware tasks that don't need 15 different registers, some overlap can save physical registers. But since register 15 is actually written to and also used in cascades, this overlap is a little more tricky than might seem at first glance.

After reset only task 15 is ready. It has access to the register memory and all PC/STATUS/STACK groups as well.
