Incremental Porting of the Self/R Virtual Machine
=================================================

Jecel M. de Assumpcao Jr.
Merlin Computers
jecel@lsi.usp.br

ABSTRACT

Porting system software to different platforms is hard work, requiring skills
and experience beyond what many programmers have. One of the main reasons for
this is that a large number of different things must all be working together
before any results at all can be obtained. We will show how a combination of
a reflective architecture and a distributed object system makes porting the
virtual machine for the Self/R system an incremental process, making it
possible for a large number of developers to attempt it.

INTRODUCTION

While the traditional "Waterfall" development model is still useful in some
domains, with its style of having large chunks entirely written before they
are tested for the first time, the "rapid prototyping" model is becoming more
popular. The key is to obtain results as early as possible and use them as
guides for course corrections - programming is a learning experience. Rich
and dynamic programming environments such as Smalltalk or Self, with their
sophisticated browsers and debuggers, make this a much more practical
alternative. Such tools are generally unavailable when the programs being
developed are compilers or operating systems, however, and virtual machines
combine the complexities of a language implementation and a platform, further
compounding the problem.

Squeak [Ing97] represented one solution for this problem. Its virtual machine
(actually two different ones later on) was written in Squeak itself.
Bootstrapping was possible using another, closely related, Smalltalk
implementation. The development phase has access to all the rich tools of the
Smalltalk world and when the system is satisfactorily debugged the virtual
machine (VM) is automatically translated to C and compiled to machine code.
The only limitation is that the VM must only use a small subset of Smalltalk
in order to keep the Smalltalk to C translator as simple as possible. When the
machine code is executed for the first time, of course, all of the
difficulties of dealing with low level debugging tools crop up again, but the
code should be so close to finished that the pain won't be too great.

The next sections explain why implementing a system in itself is so
productive, then how the ideas of remote debugging can be adapted to a
distributed object system.

SNARFING MAKES THE STEPS LESS STEEP

Paul Wilson [Wil99] defines "snarfing" as the act of stealing features from
the underlying system to implement another. For example, instead of a LISP
written in C implementing recursion using arrays and explicit stack pointers,
it could simply snarf the recursion already available in C. This makes porting
harder since it creates dependencies between the implemented and implementing
systems. But it makes development much easier since the programmer can put
off thinking about a lot of details until later and still have working code.

An example of just how far snarfing can go is tinySelf 1, an implementation
of Self that runs on top of Self 4.0. It snarfed from the underlying Self the
whole range of primitive operations (except for '_Restart'), the parser,
memory allocation and garbage collection. Even though it added to Self a
complete concurrent object model with three modes of parallel execution,
the full interpreter is less than 300 lines long.

One important point is that tinySelf 1 has no more access to the Self 4.0
objects that implement it than Self 4.0 has to the C++ code of its virtual
machine (that is to say, they both have very limited access through mirror
and activation objects). In contrast, Self/R has full access to the set of
meta-objects that implement it using normal message passing.
In this reflective architecture, the meta-objects are also written in Self/R
and share most of the resources of the base-level objects. This system relies
heavily on adaptive compilation [Ass94][Höl94] to optimize away the implied
levels of interpretation in order to achieve high performance.

Since Self/R is written almost entirely in Self (with some short fragments
in assembly language), it could run on top of itself. This is more complicated
to do than for Squeak or tinySelf 1 because this system does not have an
interpreter, only compilers, but it certainly is possible. In that case, it
might be interesting to set up a way for messages to flow between objects in
the implementing Self (we will call it the "mother VM") and the implemented
Self (the "baby VM"). When an object in the baby VM talks to its meta-objects,
it will be reflection if those meta-objects reside in the baby VM itself but
simply snarfing if they are executed by the mother VM.

MOVING OBJECTS OVER THE UMBILICAL CORD

We can imagine a system like the one just described, but where the baby VM
is not running directly inside the mother VM but in an entirely separate
computer instead. The communication channel between the two VMs (we can call
it the "umbilical cord") is no longer purely software, but involves some
kind of network or serial link. That has no impact in practice, as experience
with remote debuggers shows.

How complete does the baby VM have to be before we can start using it? The
answer is that it only needs to include enough code to get the umbilical cord
working. Self/R has a distributed object system where objects running in
separate VMs can have references to each other, send messages to each other
and even migrate from one VM to another.
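
The idea of messages crossing between two VMs, and of an object migrating
from one to the other, can be illustrated with a small sketch. It is written
in Python rather than Self/R, and every name in it (VM, connect, send,
migrate, Allocator) is hypothetical: it models the umbilical as a direct
reference between two in-process objects and is not taken from the actual
Self/R implementation.

```python
class VM:
    """A toy virtual machine that hosts objects by name (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.objects = {}   # objects hosted locally
        self.peer = None    # VM at the other end of the umbilical
        self.log = []       # messages that were executed on this VM

    def connect(self, other):
        """Set up the umbilical in both directions."""
        self.peer, other.peer = other, self

    def send(self, target, selector, *args):
        """Deliver a message locally if possible, else over the umbilical."""
        if target in self.objects:
            self.log.append((self.name, target, selector))
            return getattr(self.objects[target], selector)(*args)
        if self.peer is not None and target in self.peer.objects:
            return self.peer.send(target, selector, *args)
        raise LookupError(f"no object named {target!r} on either VM")

    def migrate(self, target):
        """Move an object, with its state, from this VM to its peer."""
        self.peer.objects[target] = self.objects.pop(target)


class Allocator:
    """Stand-in for a memory allocation meta-object (assumed for the example)."""

    def __init__(self):
        self.count = 0

    def allocate(self):
        self.count += 1
        return self.count


mother, baby = VM("mother"), VM("baby")
mother.connect(baby)
mother.objects["allocator"] = Allocator()

first = baby.send("allocator", "allocate")    # executed by the mother VM
mother.migrate("allocator")
second = baby.send("allocator", "allocate")   # now executed by the baby VM
```

In this sketch the sender never changes how it addresses the allocator; only
the place where the message is executed changes after the migration, and the
allocator keeps its state across the move.
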
The umbilical supports all these features in addition to a redirection service
that allows the mother VM to control resources in the baby VM's host.

There are two dimensions to porting Self/R: the new machine can have a new CPU
for which the system has no compiler and/or it can have a different
input/output interface (Self/R can run on top of other operating systems as
well as bare hardware, so this interface can be a software API as well)
requiring new device drivers. A cpuDescription object is created for each CPU
and includes information that the back end of the compilers needs to generate
machine code. A platformDescription object encapsulates the drivers for each
style of I/O. A particular cpuDescription object and platformDescription
object are combined in a host object that is adapted to each computer.

The first step in porting Self/R is to create a new host object for the target
architecture (possibly creating a new cpuDescription or platformDescription in
the process). When this is done, the compiler has sufficient information to
generate an executable file for the baby VM side of the umbilical. This is
"booted" on the target machine and connects to the mother VM. Initially, all
meta-objects run in the mother VM and the very first application level test
objects are loaded into the baby VM. Any attempt at any system level operation
(allocating a new object, for example) is intercepted and sent over to the
mother VM for actual execution.

If a new cpuDescription object was created then all of the early tests are
meant to validate the output from the compilers. Once that is done, all time
critical meta-objects are migrated over to the baby VM one at a time and
tested. These are the memory allocation and garbage collection meta-objects,
virtual memory and process switching and, finally, the compilers themselves.

In the early stages, no I/O is attempted on the baby VM except for that used
by the umbilical.
That hardly matters since objects on the target computer can easily
communicate with displays, keyboards and so on via the mother VM. After the
low level stuff is running natively on the baby VM, however, each function in
the platformDescription object is tested and eventually all of the local
peripherals on the target machine are used. At this point an "image" can be
saved directly from the baby VM and an attempt is made to restart it from this
image. If that works out, the umbilical is cut (it is deselected from the
preference menu) and the port is finished.

Each meta-object represents a single aspect of the virtual machine
implementation. Moving it from the mother VM to the baby VM is a discrete and
isolated step in the porting process. The programmer doesn't have to worry
about objects that have been moved before (they have been fully tested and
should work fine) nor about those still to be moved (which have been working
up to now over the umbilical). Thus, the largest hurdle in porting VMs (the
need to make everything work at once before debugging can start) has been
removed.

CONCLUSION

Incremental porting is both possible and very desirable with a system that
has a reflective architecture and a distributed object system that includes
both remote invocation and object migration.

REFERENCES

[Ass94] Jecel Assumpcao Jr, "Adaptive Compilation in the Merlin System for
Parallel Machines" in WHPC'94 (IEEE/USP International Workshop on High
Performance Computing), March 1994, pages 155-166

[Höl94] Urs Hölzle, "Adaptive optimization for Self: Reconciling High
Performance with Exploratory Programming" PhD Thesis, Stanford University,
1994

[Ing97] Dan Ingalls, Ted Kaehler, John Maloney, Scott Wallace and Alan Kay,
"Back To The Future: The Story of Squeak, A Practical Smalltalk written in
Itself" in OOPSLA'97

[Wil99] Paul Wilson, "An Introduction to Scheme and Its Implementation"
draft of the book is available online at
ftp://ftp.cs.utexas.edu/pub/garbage/cs345/schintro-v14/schintro_toc.html