RNASmalltalk is about objects and messages, so a hardware implementation of it should be about such things as well.An early exploration of a messages all the way down was a paper design of a sequential implementation. The most intersting feature is that code is represented as a tree of objects (similar to AST - Abstract Syntax Tree) and not as bytecodes. This tree is called a "template" since the process of execution involves copying it. ![]() The left shows the object graph for a simple expression in Smalltalk for the 1984 design linked above. There is a lot of indirection and the low level action mostly happens in the form of #at: and #at:put: style array manipulation. The right of the figure shows the same expression written in the Self dialect of Smalltalk with the low level action expressed in the form of named getter/settter methods. Though the simplification seems dramatic (with the graph being far more readable and closer to the original source) the essence is the same and so the old design still offers insight into what is described on this page. Cells and SlotsEach cell stores three bits, which we will call M (message), N (name) and V (value). A match between M and N can be detected. Data can be transfered between M and N as well as between M and V. The value of M can be swapped with the M in one of the two neighboring cells. A version of the cell that moves M between neighbors would take up about 15 transistors in CMOS technology, while one that swaps M (see more below) would be little larger.Several cells, for example 32, together with some control logic and some extra status bits form a slot. Each slot is connected to two neighbors via the M swap circuit (it can also see some of the status bits in its neighbors), and a very large number of slots (at least in the hundreds of thousands or millions) form a single ring. Obviously this architecture couldn't be implemented before we could fit a hundred million or more transistors on a single chip. And it doesn't make sense to try to implement it using FPGAs since so few slots would fit even on a very large one. ObjectsObjects are stored in the N and V parts of a sequence of slots, with the N of the first one holding an unique ObjectID.
Looking at this fragment of the ring, we can see two objects in the N and V columns. Each starts with a header slot with its object ID in N and some flags in V (which aren't shown here). Going down we see a few more slots with the last one holding the special "delegate" marker in N and a object referece in V. Here this reference is shown as "traits array" (or "traits point") implying a Self-like organization, but that need not be the case. Also note that the array indexes start at 1 like in Smalltalk-80, but again things don't have to be this way. Below the point object we see two free slots. Supposing this is actually the start of a longer run of free slots, then we could allocate a new object there as long as that run is at least as long as the object. MessagesLooking at the M column we see two messages. Their header word is shown in red and their tail, indicating where to return the result, is shown in green. While the objects are thought of going from the top down, the messages go from the bottom up. The most common operation of the machine is to make the messages flow down the ring. This is done by swapping the contents of M in two neighbors, where the lower one is empty. So the header word of one of the messages would go down a slot and there would then be an empty slot between the header and the second word in the message. That is not a problem, for messages don't have to take up strictly neighboring slots like objects do. The next step would be for the second word in the message to move down and leave an empty slot between it and the third word. Eventually the whole message will be one word lower than initially and the process can start over. In practice, the message tends to spread out so that there is an empty slot between each word in the message and that allows it to flow at the fastest rate.The reason why message flow has been described in terms of swapping M and not just overwriting values (which should be simple since one is always empty) is that swapping is a reversible operation and can be designed to reuse electrical charge as much as possible, greatly reducing the power required just to move messages around. Message flow is not a productive operation so any power used for it is just wasted. Matching headersWhen the top message shown moves down three slots, we get a very interesting situation: both the M and N of the header slot of the point object will have the exact same value. This is the key operation in the machine. In this particular case the message will be slightly changed. The second word will become the new header and the original header will be moved back. Now when the changed message goes down two more slots, we will once again have a match since both M and N will have the bits of the object ID of the 'y' string object (or symbol). The message will be changed once more: the original tail of the message will beccome the new header and that will be followed by the V in the matching slot (a reference to the integer 4 object). This is a reply message.When a template tree is copied, the leaves become messages and start to flow around the ring as described above. Eventually they come back transformed into a reply message and the value they carry is stored into the node above the original message. When all subnodes return, then the intermediate node also becomes a message and starts to go around the ring. This is very similar to second generation dataflow architectures. In the example above, the message found a regular object in the V of the slot it was looking for. If it had been a reference to a template tree instead (code) then that would have been copied as described in the previous paragraph. The example had a message selector that matched one of the slots in the receiving object, but if that had not been the case then after the first transformation the message would reach the last slot of the object without any matches. The delegation slot is special and would have transformed the message by adding a new header with the reference in its V. This would have caused the message to seach for the object "traits point" and upon reaching its header it would once again start searching for its selector there. For multiple inheritance the scheme is only slightly more complicated than the one described here. PrimitivesThe circuits described so far do handle all of the message passing features of Smalltalk, but they don't do any actual work. That is done by replacing fragments of the ring by hardwired objects. When a message they understand goes through one of them, the replace it with a reply including the result. So there could be a number of integer adder/subtractors along the ring, for example. Or floating point units. Or any other low level and time critical operation.These units look like regular objects to the rest of the system and normally respond to messages with either a specific object ID (like ALU87) or a generic one (like ALUxx). Normally the generic one is used since you don't care which of the several identical circuits does the operation, but in cases where some state is stored in the unit you might need to send a set of messages to a single one. LayoutA giant ring is not a very efficient way to use silicon, so it is best to fold it into a 2D space filling curve, like the Hilbert Curve. Simpler alternatives, like two or three levels of zig-zags, can also be used.Hyper networkThe whole idea of this architecture is that while a single message send might take a very long time (since it is implemented as a linear search over the whole memory!) we can have such a large number of messages at the same time and the overall throughput of the machine will be better than any Von Neuman alternative. This does not mean that message sending shouldn't be made as efficient as possible. While in theory the average message lookup will be half of the number of slots in the hardware, in practice messages to objects just before the one that generated it will be common making the average far worse.Very early in the development of massively parallel machine it became obvious that the number of dimensions in the hardware should be greater or equal to the number of dimensions in the problem being solved. This led to the popularity of Hypercube designs. Later it was found that by making "smart networks" that didn't use cycles from the processor in each node to do the routing the effect of higher dimensions could be obtained with far less wiring, so 2D and 3D torus networks became dominant. Each slot in an object can be thought of as a dimension, so it is obvious that the one dimensional ring architecture isn't the best match for a typical object graph. An interesting solution is to complement the ring with some kind of hypercube by inserting routing circuits along the ring (every 64 or 128 slots, for example). So messages flowing along the ring can wormhole through this Hyper Network and pop back into the ring very close to their destination in a very small fraction of the time it would normally take them. Given the unidirectional design of the ring, the saving is specially great when the message needs to go to an object right before where it currently is. SoftwareWhile the idea is that existing Smalltalk code should run just fine on this new architecture, the programming style will have a very different impact on performance than on a current PC. Even simple expressions can have a suprisingly large amount of parallelism implicit, as the tinySelf 1 study showed. But a program with low level loops (C/Fortran style) will probably be far less parallel than one using higher level constructs like #select: and #collect: (APL style, which is particularly well developed in the FScript dialect of Smalltalk) and so wouldn't perform as well. On a conventional machine the reverse is true - since the higher level constructs are implemented in terms of the lower level ones, using the latter might give you better control of the processor and so better performance. |