Background

When Smalltalk was chosen as the foundation for Merlin in 1984, its particular system of persistent storage was considered the major obstacle to making a commercial system. The combination of image/source/change files had a simplicity that made it well suited for the laboratory, but several flaws made it unreasonable to expect normal users to put up with it:

So the answer seemed to be to base Merlin on a distributed persistent object store.

Persistent Objects

File systems are second nature to computer experts and power users but are one of the largest obstacles for new users. To someone who is just learning about computers, it is not obvious that you have to load a file from disk to memory to work on it and save it back when you are done. In fact, the very ideas of memory and disk are often confused.

A virtual memory system can help the user interface present a collection of objects which can be directly manipulated by the user without the normal complications. You should be able to turn on the computer, change a text or drawing and turn it off again knowing that tomorrow it will be exactly as you left it, as normally happens in the real world.

The technology to make this work (in particular to make the object structure as fault tolerant as possible) is not very simple. And it can't always stay hidden from the user - if some objects must be moved to a floppy so they can be carried to another system, then obviously location transparency must go.

Programming

A major advantage of a persistent store is a huge simplification for the programmers. Objects can be directly created and used, rather than writing code to do so when the application starts or coming up with some weird file (the "poor man's persistent store"). Self's object model is very concrete and meshes very well with a transparent storage system.

With persistent objects, the programmer can make full use of all available tools to get the job done. He might make a picture with one or two bitEditors and then execute "Hero pict: Form fromUser" to make it a part of the application.

Implementation

Two different persistent storage systems have been designed for Merlin. They are very similar, but the first one uses "direct pointers" while the second depends on the indirection of an object table. Only this second one will be described here.

All information in the system is stored in a single, world-wide "object system". This is divided into a number of "regions" (or segments, or soups, or...), which introduces a two level addressing scheme. Objects in some region refer to each other by their local addresses, while objects in other regions are referred to by inter-region pointers (explained in more detail below).

Each region has a unique global ID. It is composed of the creator's public key plus a time stamp indicating when the region was created. The chances of two different regions having the same ID is essentially nil. Regions may be replicated to enhance performance and availability, which is why the ID gives no indication of where to find it. When Merlin runs on top of another OS, it stores regions as files with the .MRL extension in that system's native file format. These local files are scanned when Merlin starts up and a table associating global IDs to local file names is created (and the boot region is loaded in automatically). If a region whose ID is not found in this table needs to be loaded, then all Merlin systems in the local network are polled to see if they have a replica. If they don't, then a world-wide directory is consulted (much like the DNS system) and a replica is brought over the net. The sum of all regions in the world forms the equivalent to a single image shared by everyone, but since not all replicas might be equally up to date it will rarely (if ever) be totally consistent.

The creator of a region can issue "keys" to allow other users to have read and/or write access to the objects in that region. These keys identify the user, the region, the permissions, a time limit for which the key is valid and a hash value to prove that the key is authentic. Future implementations will use public key cryptography for this, but initially it will just be a one way hash of the key contents plus a secret number stored in the region itself.

The very first object in every region is the top level of a RegionObjectTable. For small regions, a single level table is enough. As stored on disk, objects point to others in the same region by including their object table index. Each object table entry also includes a generation number which is incremented whenever the entry is reused to detect obsolete pointers (which should never happen). To point to an object in another region, the local region must include a valid key to that region as a local object. The inter-region pointer has the index of this key plus the index of the object in the other's region RegionObjectTable. It is possible for the key on which a pointer depends to expire and render the pointer invalid. Before that happens, the system can request that the key be renewed. This request lists which objects in the remote region will be referred to using the new key. RegionObjectTable entries always keep track of the latest expiration date of any remote pointers to the object so it can perform a kind of very slow global garbage collection.

When an object in memory tries to send a message to an object in a "new" region, this region is either mapped into the local address space (with pointer swizzling) if no one in the local network has it opened with a key that has write permission. Otherwise the message is sent to that workstation via the network. Things are much more complicated when users in different networks update the same region at the same time. These changes propagate to all replicas of the region with time, eventually causing a conflict. Automatic conflict resolution is a topic for future research.

Regions are swizzled in one object at a time. First, the page table pages pointing to each object must be brought into memory (if they aren't there already) in a non movable position. For each page table page loaded, an additional area is allocated for containing the physical addresses in RAM (or a special marker for objects which haven't been loaded). Each pointer (intra or inter-region) is patched to point to the object's entry in this additional area, so access to loaded objects is always done via a simple indirection no matter how many levels the object table has. Every time access is attempted on an object that is not yet loaded, the object tables are followed to find the right pages on disk (or RAM, if that page was already loaded due to another object).

Given a memory (indirect) pointer to an object, its page table entry can easily be found. This allows us to find the object's region and disk address, which is needed during unswizzling. This process sometimes leads to the creation of new inter-region pointers (by the way, this creation of new inter-region pointers is the major unsolved problem in the "direct pointers design", but it doesn't cause any complications with object tables).

All changes to objects are saved in a "write log". That way, a crash right in the middle of a disk write is not catastrophic. This makes it possible, but not easy, to roll the system state back to the most recent consistent state before the crash. The write log plus the fact that the original region data remains intact makes it possible to recreate previous states in the current session, allowing a limited systemwide undo. When an user logs out, any regions that were changed are "commited" and garbage collected at the same time. If this local garbage collection frees and object that still has inter-region pointers to it, then the object is tagged for migration to the next region that "touches" it. This eventually collapses inter-region cycles, making the local garbage collector reclaim it when it becomes garbage.

RAM

All this "regions" stuff is for old objects. New objects exist only in memory - only by placing a reference to them in an already persistent object will they be moved to some region and get saved to disk. New objects are dealt with by an in-memory GC.

On each node in a network, there is one system execution context (called the "boot session") and one or more application execution contexts (called "user sessions"). All sessions on a node share a single 32 bit address space, but each one has different read/write permissions for segments or pages. Only the boot session can directly change the memory map and access the disk based regions. User sessions must request service from the boot session directly (by sending a message to a boot session object) or indirectly (by causing a page fault) in order to manage memory.


see also:
| persist | | reflect | | selfdiff | | selfuse | | gmodel | | gui |
back to:
| intro |
| merlin | | LSI | | USP |
please send comments to jecel@lsi.usp.br (Jecel Mattos de Assumpcao Jr)