Merlin's Technologies

Although the Merlin Project is about people, it is only possible thanks to the development of a series of new technologies. This page lists the technologies used, or being developed for this project. The first thing people normally think when looking through this list is "this is the most ambitious computer project in the world! It is totally impossible." Certainly a lot more resources than currently available will be needed to transform all this into reality. But that is the idea - the items near the end of the list will not be present in the first commercial releases of Merlin, but will have to wait until the right people can be hired to work on them. The important thing is that their place is guaranteed from the start - Merlin won't have to be stretched and patched to make them fit later.

"Old" technologies are also used, of course, for no operating system can do anything without drivers, network protocol stacks (with standards like TCP/IP and HTTP), boot managers, file format conversions and so on. We will just take them for granted and concentrate on what Merlin brings to the party.

Object Oriented Model

From the very start, Merlin was designed as an object oriented operating system. Treating every software component as an object was the only way to create a system capable of evolving in response to the user's needs. The language used initially, Smalltalk-80, was already well developed back in 1984. The Self programming language, developed at Stanford University and Sun Microsystems, is closely related to Smalltalk but is even simpler. Merlin's implementation of Self, however, differs in a few aspects from the Stanford/Sun Self.

There are many different programming languages, and each has its range of applications for which it is best suited.

While the Self language is less well adapted than some of these specialized languages for their particular applications, it is an excellent "average" language. It is used exclusively in the Merlin project to break down the traditional barriers between novice users, power users, application programmers, system programmers and "specialized" users. The use of a common language also opens up the possibility of combined solutions using what are normally incompatible parts.

The uniform use of such a flexible system and model as Self for everything, from device drivers to user interface issues, is the best way to obtain a transparent system - one where the user's model of the system and the implementation are closely matched. Nearly all OSes "fool" the users with complex translation layers that tend to break down, leaving people stranded. Self's simple object model can be directly understood even by novice users. Learning more is a matter of "progressive disclosure", not revelation. It is not a good idea to design a system by asking people not to pay any attention to the old file system behind the icons!

Parallel Processing

One obvious difference between objects in a computer and objects in the real world is that real objects all work at the same time. When programming in a traditional object oriented language, it is hard to ignore that the single CPU is the only real source of activity and at most one object can do anything at a time. On the other hand, we can use things like preemptive multitasking to create the illusion of more CPUs at the cost of introducing great complexity in the form of locks, semaphores and other kinds of concurrency controls.

In the real world, the ideas of object and activity are closely linked: if I have one microwave oven, I can cook one thing at a time, but with two ovens I can cook two things. Concurrent objects, each one with a single virtual CPU, can bring this simple model to the world of software. By linking such objects with messages using the "wait by necessity" model, applications can take advantage of parallelism even if they were written using the sequential execution model.
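The "two ovens" idea can be sketched with futures (in Python rather than Self, with invented names): a message send returns at once, and the sender only blocks when it actually uses the result.

```python
from concurrent.futures import ThreadPoolExecutor

def cook(dish):
    # stands in for a long-running method on a concurrent object
    return dish + " ready"

pool = ThreadPoolExecutor()

# both "ovens" start working at once; neither call blocks the sender
a = pool.submit(cook, "soup")
b = pool.submit(cook, "rice")

# execution continues sequentially until a result is actually used -
# only then does the caller wait ("by necessity")
print(a.result(), "/", b.result())
```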

One key benefit of transparent parallelism is the ability to automatically exploit additional processing power in new multiprocessor computers, or even use idle nodes on a network to help handle heavy duty information processing (making a network operate as a virtual mainframe).

Reflective Architecture

Up to 1992, the operating system was based on the micro-kernel structure that has become so popular recently. In spite of experience with this technology dating back to 1983, it was obvious that it was not very well matched to dealing with objects. The Apertos OS, developed at Sony, showed an alternative: a reflective structure based on object/meta-object separation.

Most modern OSes frustrate users by making them reboot (or worse: reinstall!) for changes in configuration to take effect. That is mainly due to the lack of a reflective model - the system doesn't know much about itself; only the boot or installation program does. Merlin's goal is to be able to undergo a major upgrade while running continuously. This is very important for servers, but also allows cheap Network Computers to boot an older version from ROM and then fetch updates from the network while the user is getting things done.

This reflective structure allows a separation of concerns. Code for an application can be entirely written at a level with location transparency, for example, and later a separate meta-code used to tweak object migration for better performance. Program logic and implementation details should not be mixed or the system becomes hard to evolve and adapt.

Distributed Shared Virtual Memory

File systems and databases seem strange and scary to novice computer users. Merlin replaces these concepts with persistent objects using virtual memory. Simply put: Merlin's objects live on disk and always exist, even when the computer is turned off. During normal operation they are brought into the machine's memory and sent back to disk without the user noticing it. This simplifies both using and programming the computer tremendously (most of the code and time in traditional systems is spent in loading and saving data to files).
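A rough analogue of persistent objects can be sketched with Python's shelve module (an illustration only; Merlin's virtual memory works below the language, with no visible open/close step):

```python
import os
import shelve
import tempfile

# the shelf file plays the part of the disk image of the object world
path = os.path.join(tempfile.mkdtemp(), "world")

with shelve.open(path) as world:          # "power on" for the first time
    world["counter"] = {"value": 41}      # the object now lives on disk

with shelve.open(path) as world:          # "power on" again later
    obj = world["counter"]                # brought into memory on demand
    obj["value"] += 1
    world["counter"] = obj                # written back; survives power-off

with shelve.open(path) as world:
    final = world["counter"]["value"]

print(final)
```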

This virtual memory is distributed throughout all of a network's disks. All of a network's resources can be equally shared by all of the users, with the needed protection included in Merlin's multiuser facilities. In fact, the Internet allows all Merlin computers to share (in theory) a single global memory. To cope with limitations in communications, however, the system must replicate most objects locally. This introduces many problems in keeping the replicas consistent with each other. These are solved at the system level whenever possible, but sometimes user intervention is required.

Powerful, but Simple to Use User Interface

Oz, the 3D Graphical User Interface, is built on top of a 3D object oriented graphics model. The user sees an "Artificial Reality" of interconnected 3D "rooms" where Self objects can be placed. All objects have one or more representations which can be directly viewed and manipulated. The rooms, and even the user, are objects that can be moved or cloned (copied) like any other object.

Several users can view the same room and collaborate with each other in various tasks. On the other hand, a single user's view can span several computer screens (on one or more machines) for applications where one monitor isn't enough (even though the 3D views make a much better use of screen space than traditional windowing systems).

Slices

Selections are a common way of indicating objects to be affected by commands in a graphical user interface. Slices are a generalization of this idea to simplify the implementation and enhance the usefulness of GUIs.

In an object oriented graphics framework, on-screen objects are normally the result of a complex structure of basic shapes combined with "wrappers" (modifier objects). If the structure can be an arbitrary graph (or limited to a non-cyclic one), then a single object might control the appearance of several distinct on-screen objects. Sharing a color wrapper among three objects, for example, would ensure that every time the color of one is changed the others automatically change to match. In a drawing showing several PCs connected in a network, as another example, a change to one of the PCs might make it different from the others, or the change might affect all sub-drawings in one swoop. In the first case the PCs would be several copies of an original drawing (three distinct nodes in the object graph), while in the second we would have a single drawing being shown on the screen through several distinct translation wrappers.
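The shared-wrapper case can be sketched like this (in Python, with invented names):

```python
class ColorWrapper:
    # a modifier object shared by several shapes in the object graph
    def __init__(self, color):
        self.color = color

class Shape:
    def __init__(self, name, wrapper):
        self.name = name
        self.wrapper = wrapper           # shared, not copied
    def appearance(self):
        return self.name + " drawn in " + self.wrapper.color

red = ColorWrapper("red")
pcs = [Shape("PC" + str(i), red) for i in range(3)]  # three PCs, one wrapper

red.color = "blue"                       # change the shared wrapper once...
print([s.appearance() for s in pcs])     # ...and all three change to match
```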

This is roughly like the difference between clones and traits or classes and instances. In the object-based spirit of Self, the Morphic graphics framework doesn't work like this, but makes each object responsible for all information related to it (color, position, etc). This makes it impossible to do the kinds of operations described above. It also makes it hard to extend the functionality of the framework - rather than just adding a new rotation wrapper we would have to include a rotation copy-down slot in the basic morph and (possibly) have to rewrite many methods. On the other hand, morphic is more concrete and easier to understand than most other frameworks.

One problem with traditional frameworks is that the kinds of operations that can be done on an object are defined (and limited) by its invisible structure. In the PCs example, we can either change a single part of the drawing or all instances at once, and it is not always clear by just looking at the image which one is the case. If we need to do something different from what the current structure allows, we first have to change that structure and then do the operation. Imagine that we wish to modify two instances of the drawing in the same way but leave the rest of them alone - neither structure supports that directly. So expected operations were made easier at the cost of making unexpected operations much harder. And since the most common expected operations are handled reasonably well by morphic's embedding structure, what we need is a way to make unexpected operations as easy as possible.

The Sinclair ZX81 Basic had a very interesting feature called "slices" (APL had a much more powerful implementation of the idea). It allowed you to operate on a whole subset of an array in a single operation: you could clear the middle three elements in a short vector by executing A(3 to 5) = 0. The idea of working with a subset of a collection of objects is a common one in graphical user interfaces. You might select two paragraphs in a long text to change their font to a larger size, or you might create a carpet morph in Kansas to move several objects at once or to dismiss them. Rather than having these kinds of operations be ad hoc additions to the GUI, it is better to have them be instances of a more general "slices" concept.

A slice is simply a collection whose elements are a subset of the elements of another collection. So saying "slice do: [something]" is simply a convenient shorthand for doing something to some elements in the original collection but not to the others. The slice must remember what collection its elements are from in case the something you wish to do is removing them from that original collection. Actually, the elements in a slice might come from several distinct collections.
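A minimal sketch of this idea (in Python, with invented names) - a slice that remembers where each of its elements came from:

```python
class Slice:
    # a lightweight subset whose elements may come from several
    # distinct collections; each element's source is recorded
    def __init__(self):
        self.members = []                # (source_collection, element)

    def add(self, source, element):
        self.members.append((source, element))

    def do(self, action):
        # "slice do: [something]" - act on the members, not the rest
        for _, element in self.members:
            action(element)

    def remove_all(self):
        # removal needs the source collection, which the slice recorded
        for source, element in self.members:
            source.remove(element)
        self.members = []

text = ["a", "b", "c", "d"]
drawing = ["circle", "square"]

s = Slice()
s.add(text, "b")
s.add(text, "c")
s.add(drawing, "square")                 # elements from two collections

s.remove_all()
print(text, drawing)
```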

Graphically, a slice is a kind of "lightweight" object. It must be very easy to create or destroy one. You might create one by just dragging the mouse across some text and destroy it by simply creating another one. You might create a slice as a result of an operation (allImplementers, for example) and it might persist until explicitly dismissed. When you pop up a menu on a slice, it creates a menu which is the intersection of the menus of all of its elements. So you can choose an operation that can be done on all the elements (make a slice that includes a drawing and some text, for example, and the menu might limit you to changing their color).

It is interesting to note that the combination slices/persistent objects makes a very nice poor man's database. It has the functionality of the "find" command that all modern OSes have copied from the Mac. You can use slices to choose if a change to a slot should affect only a given object or all of its clone family (this is even more flexible than Kevo or the debugger/outliner distinction in Self 4).

The ability to create and manipulate lightweight slices of objects allows one-time structuring of objects and is a convenient replacement for the more static structuring of conventional graphics frameworks. It also has many non graphical applications, being an interesting addition to Self's collection "types".

Adaptive Compilation

An important aspect of the Self system is the use of "multiple customized compilation" to deliver the highest performance of any virtual machine implementation. This virtual machine approach to portable operating systems means that users will have binary compatibility no matter what processor their machine uses. Unlike source compatible portable systems (like Unix, Windows NT or even the new Rhapsody and BeOS) users will not be dependent on software houses' willingness and ability to offer "upgrades" to the newest platforms. This idea is hardly new, but was unknown to most people until Sun released its Java platform.

Virtual machines have always been known for their weak performance, but that is because they were implemented as simple interpreters. Just In Time (JIT) compilation can improve things, but for dynamic object oriented programs adaptive compilation is the key to speed. Developed for Self, this technology is being adapted for Java where it is called "HotSpot". Only code that is actually called is compiled to native machine code and it is instrumented to gather information about execution dynamics. When such code is called frequently (when it is determined to be a "hotspot" in the system) it is recompiled with a second, more sophisticated compiler which uses the accumulated runtime information to generate highly optimized code. The result is even higher performance than what would be possible with traditional static compilation. The only drawback is that the compilers become part of the runtime "image", which increases memory requirements.
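The call-counting part of this scheme can be caricatured in a few lines (a toy sketch in Python; a real virtual machine gathers profile data and generates optimized machine code at this point):

```python
HOT_THRESHOLD = 3          # invented value; real VMs tune this carefully

class Method:
    def __init__(self, body):
        self.body = body
        self.calls = 0
        self.optimized = False

    def invoke(self, *args):
        # count calls to find "hotspots"
        self.calls += 1
        if not self.optimized and self.calls >= HOT_THRESHOLD:
            # the second, more sophisticated compiler would run here,
            # using the accumulated runtime information
            self.optimized = True
        return self.body(*args)

square = Method(lambda x: x * x)
results = [square.invoke(i) for i in range(5)]
print(square.optimized, results)
```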

The code generated by optimizing compilers has typically been unable to make use of complex CPU features. Mutation scheduling, as developed in the EVE compiler, keeps track of instruction alternatives during the whole code generation phase and is able to move code non-incrementally to explore non-obvious options. This is what is required for complex super-scalar or VLIW CPUs, and it has proved successful with such exotic features as the pipelined instructions in the Intel i860, which were previously restricted to hand-assembled code.

One interesting feature of Self's object model is "data parents". An object can inherit from (or delegate to) a set of arbitrary objects, indicated via its parent slots. These parent slots are normally constant, but they need not be. If a data parent is assignable, then which objects a given object inherits from can change at runtime. This opens up some interesting possibilities (such as "viewpoint programming") but has never been extensively explored due to the poor implementations of this feature up to now.

So Merlin introduces "map switching" to make data parents practical. Though objects are presented to programmers as totally independent from each other, the implementation actually groups them together in "clone families" (represented by map objects, which serve as concrete types as well as a space saving trick). Map switching extends this system by associating objects that have data parents with a linked list of up to five maps (or concrete types). For each of these maps, all Self optimizations such as inlining and customization can be applied. A write to a parent slot must be intercepted and may switch the object's current map if the new parent's map is different from the previous one's. All compiled methods in children of objects with data parents must check that object's parent each time, since an object isn't notified of a change in its "data grandparent".
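The effect of an assignable data parent can be sketched with ordinary delegation (in Python; Self's maps and Merlin's map switching live inside the virtual machine, not at this level, and all names here are invented):

```python
class Obj:
    # a tiny prototype object: named slots plus one assignable parent slot
    def __init__(self, parent=None, **slots):
        self.parent = parent
        self.slots = dict(slots)

    def __getattr__(self, name):
        # lookup order: the object's own slots, then the parent chain
        if name in self.slots:
            return self.slots[name]
        if self.parent is not None:
            return getattr(self.parent, name)
        raise AttributeError(name)

formal = Obj(greeting="Good day")
casual = Obj(greeting="Hi")

speaker = Obj(parent=formal, name="merlin")
print(speaker.greeting)        # inherited via the parent slot

speaker.parent = casual        # assign a new value to the parent slot
print(speaker.greeting)        # inherited behavior changed at runtime
```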

Cache Manager

One feature that distinguishes programs by experts from those of novices is the use of caching as a performance enhancement. Saving results for later reuse greatly decreases source code readability, unfortunately, obscuring program logic and making debugging much harder. Reflection allows us to move caching to a separate implementation layer in a cache manager object. So the application can be written and debugged "naively" and, after it works, can be annotated to use the cache manager at critical points to significantly improve performance without having to write a new version.

This is only possible because Merlin uses message passing "at the bottom" and includes reflective access to such a fundamental operation. In other words, user applications are never made up mainly of "big black boxes" which the OS can do nothing about. Even a simple math expression such as '(x * 2) + y' is entirely built from messages that are (in theory - the compiler actually eliminates most of the overhead) handled by a set of meta-objects. So all that the system has to do when the user annotates an expression as cacheable is to replace the standard send meta-object with one that looks up its arguments in a table (cache) and returns a previously calculated answer if one is found there. Otherwise it works exactly like a normal send meta-object.

An example of how this works is in rendering text. A given font's glyphs might be given as sets of control points for Bezier curves describing their outlines plus some "hints" for adjusting these points when scaling. We could then draw a character from a font called myFont on aCanvas with the expression:

   aCanvas drawPixmap: ((myFont glyphFor: 'a' Size: 12)
                                asVectors asPixmap)

This should work perfectly, but will be unacceptably slow. Each time a character must be shown on the display, its points must be scaled by the 'glyphFor:Size:' method, then the control points must be rendered as short vectors approximating the indicated Bezier curves ('asVectors') and finally these vectors must be used to fill in a picture ('asPixmap') which can then simply be blasted on the screen for the final result. By marking each of these messages as cacheable, the next time 'glyphFor:Size:' is sent to myFont with exactly 'a' and 12 as arguments it will return the same list of control points without executing the method again. Sending a cacheable 'asVectors' message to the same list of points as before will fetch the same list of vectors as was created the first time, and sending 'asPixmap' to that results in the same character image without actually invoking the filling method once more. So we have replaced three very complex and slow calculations with three simple table lookups. If you think that even that is too much, you are right: the cached control point lists and short vector lists are not really needed. The cache manager can do nothing about that by itself, but the user can move the multiply cached expression into its own method like this:

  pixmapForFont: f Char: c Size: s =  ( (f glyphFor: c Size: s)
                                          asVectors asPixmap).
   ....
   aCanvas drawPixmap: (pixmapForFont: myFont Char: 'a' Size: 12)

Now we can make only the 'pixmapForFont:Char:Size:' method cacheable if we want. This will save the final pixmaps without also storing the intermediate (and useless to us) results. This did involve rewriting application code, but actually made it a little more readable, unlike when caching is "hand coded".
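The cache manager behaves much like memoization, which can be sketched in Python (invented names; the real mechanism replaces send meta-objects rather than wrapping functions):

```python
import functools

rendered = []                            # records each real rendering

@functools.lru_cache(maxsize=None)       # "annotate the message as cacheable"
def pixmap_for(font, char, size):
    # stands in for the slow glyph -> vectors -> pixmap pipeline
    rendered.append((font, char, size))
    return "<pixmap %s/%s/%d>" % (font, char, size)

pixmap_for("myFont", "a", 12)            # computed the first time...
pixmap_for("myFont", "a", 12)            # ...then answered from the cache
print(len(rendered))                     # the pipeline actually ran once
```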

Abstract Types: Interfaces or Protocols

Self has no abstract types and hides concrete types from the user. The idea of "protocols" as abstract types has been used in several languages (Objective C and Java). This idea could work even better in Self and help to weaken the coupling between the different parts of an application, as well as between the application and the objects in frameworks or the basic system. As programmers advance separate parts of the system independently, things will start to break unless some formal agreement is made to keep interfaces constant.

It would be easy to add protocol objects to reify this idea without any changes at all to the virtual machine. A protocol object could have, for example, a list of known implementors and a set of attributes describing each one.

A protocol object would implement at least these methods:

'verify:' this takes an object and returns true if it implements this protocol, false otherwise. If it returns true, the object is added to the implementors list (if it wasn't already there) and its attributes are set to the appropriate values

'fetchNew' this chooses one of the implementors at random and asks for a clone of it, which is returned.

'fetchExisting' like the previous method but causes the implementor to return itself.

'fetchNewChoosing:' the arguments are a set of constraints on the attributes which select which implementor should be used to create a new instance.

So where we now write

x: orderedCollection copyRemoveAll.

we could say

x: (protocol named: 'list') fetchNewChoosing: (fastInsertMiddle & true).

Of course, in this case we might not get an orderedCollection since there are likely to be other objects which implement the protocol we need and are better at inserting elements.
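A sketch of such a protocol object (in Python, with invented names echoing 'verify:' and 'fetchNewChoosing:'):

```python
import copy

class Protocol:
    def __init__(self, name, required):
        self.name = name
        self.required = set(required)    # message names the protocol demands
        self.implementors = []           # (object, attributes) pairs

    def verify(self, obj, **attributes):
        # like 'verify:': check the interface, then register the implementor
        ok = self.required.issubset(dir(obj))
        if ok:
            self.implementors.append((obj, attributes))
        return ok

    def fetch_new_choosing(self, **constraints):
        # like 'fetchNewChoosing:': clone an implementor whose
        # attributes satisfy the given constraints
        for obj, attrs in self.implementors:
            if all(attrs.get(k) == v for k, v in constraints.items()):
                return copy.deepcopy(obj)
        raise LookupError("no implementor matches")

class LinkedListy:
    # a hypothetical collection that is good at middle insertion
    def add(self, x): pass
    def remove(self, x): pass

list_protocol = Protocol("list", ["add", "remove"])
assert list_protocol.verify(LinkedListy(), fastInsertMiddle=True)

x = list_protocol.fetch_new_choosing(fastInsertMiddle=True)
print(type(x).__name__)
```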

It is important to note that this is not at all like the static type declarations in other languages. A new application can be developed without taking protocols into consideration. During the exploratory phase of programming it will probably change considerably, so a protocol framework would only get in the way. As the application matures and the interactions between objects become more or less "frozen", protocols can be created and used in strategic places in the code to decouple the various "modules" and even to verify correctness before shipping to customers. This certainly seems like the best of both worlds.

Agent Platform and Programming by Example

Programming in Self isn't for everyone. Yet the computer's power is severely restricted if a user is limited to using "tools" created by others. Everyone's needs are unique, so unique programs must somehow be created. A key part of Merlin's event system is called the "Agent Platform". It supports the creation and use of special packages called "agents".

While each agent can have its own learning engine, they all share a single system-wide rules+execution engine, also known as the agent platform. The illusion of agents as "little people inside the machine" to both system software and the real users has the advantage of leveraging people's social skills, and makes for a more friendly user interface than the current tool based ones.

Augmented Reality

Many people are now familiar with Virtual Reality, where a person goes "inside" a world created by the computer. This has a lot of interesting applications, but to have the computer add virtual objects to the existing world (which we will call Augmented Reality) is even more practical.

The 3D objects depicted by Merlin's GUI can fit right in the real world. A camera allows the computer to "see" the external environment from the user's viewpoint (it must be an extremely lightweight camera attached to some kind of headphones that the user wears). This allows the machine to detect the user's rough position and orientation within the environment (without costly magnetic tracking devices) in order to adjust the computer generated objects so they appear to "hover" in a single place in space. The computer also carefully tracks the user's hand(s) and recognizes gestures (especially pointing to an object) as a natural replacement for the mouse. There should be no need to wear cumbersome "cyber gloves" or anything similar.

The ability to gesture to the computer to scan in any image you can see will greatly expand the number of applications a computer can be used for. You will be able to "save" the faces and names of people you meet for future reference. Not only will you have a convenient virtual shopping list when going to the supermarket, but simply pointing to a product's price can give you a running total of what you have bought so far and a calculation of price-per-weight relative to the other alternatives around you. The computer can act as an electronic magnifying lens to allow you to read those tiny letters at the bottom of a contract.

Voice Input

A keyboard is not always an option for entering text into a computer. The "no hands" applications described above make voice input for text (and in some cases where your hands are busy with other things, command entry) a requirement. Even so, it is not always a practical solution. It wouldn't do to have a whole class full of students dictating to their computers all at the same time! But when it is convenient, voice should be an integral part of the system and work in all applications instead of being limited to some special or patched programs.

Machine Translation/Universal Text

Only a fraction of the world's population speaks English. The situation is much better among computer programmers, but some still have trouble with languages like Basic or C because of their English based nature. This is nothing, however, compared to languages with "large vocabularies" like Smalltalk, Forth and Self. A version of Self that can be used by people who don't know English is needed.

Most programming languages have very limited vocabularies, so the fact that they are based on English isn't much of a problem. In 1984 (when I first translated Smalltalk-80 into Portuguese) Forth and Smalltalk were the two exceptions. Since then most other languages have acquired huge libraries and APIs, which mostly levels things.

Translating Smalltalk proved to be quite a challenge. The word "self" was the toughest of all. I settled on "auto", which was really awkward. Another guy did a Smalltalk-like language in Portuguese and used three separate words for "self"!

English is gender neutral, so you can say "tree new" and "car new" without any guilt. But I couldn't have "arvore nova" and "carro novo", so I substituted adjective method names with verb-based ones:

   arvore crie.   "crie == create, make or raise"
   carro crie.

That led me to another problem - some verbs and adjectives are ambiguous in English. Is "boat copy" a command to copy the boat, or an adjective describing a copy of a boat? In Portuguese I had to choose between a verb ("copie") and an adjective ("copia"), neither of which worked in all contexts.

Even class names gave me headaches. We have OrderedCollection and SortedCollection but it is practically impossible to find two separate words to express these ideas in Portuguese. String and Array don't translate well either (so I used Vector for the latter).

Once I had finished all this work, there was still a long way to go. You couldn't fileIn/fileOut stuff between a translated version of Smalltalk and an original one, and that was not acceptable.

So I came up with the idea of MultiSymbols. They are easy to implement - several different strings return the exact same symbol when sent #asSymbol, while sending #printString to a multiSymbol will return different strings depending on the value of the global variable PreferedLanguage. So you could fileIn some code written in another natural language and read it in your own.
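MultiSymbols can be sketched in a few lines (in Python, with invented names standing in for #asSymbol and #printString):

```python
class MultiSymbol:
    # one symbol object with several spellings, selected by language
    def __init__(self, names):
        self.names = names               # language -> spelling

    def print_string(self, preferred_language):
        # plays the role of #printString consulting PreferedLanguage
        return self.names[preferred_language]

symbols = []

def as_symbol(string):
    # every spelling answers the exact same symbol object, like #asSymbol
    for sym in symbols:
        if string in sym.names.values():
            return sym
    raise KeyError(string)

symbols.append(MultiSymbol({"en": "copy", "pt": "copie"}))

assert as_symbol("copy") is as_symbol("copie")   # one symbol, two spellings
print(as_symbol("copie").print_string("en"))
```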

This was an easy fix for the built-in classes and methods, but what about stuff programmers wrote on top of that? It wouldn't be reasonable to expect a programmer in Russia to think of names for his classes and methods in English, French, German... I would like an automatic solution for this (even though doing it manually proved to be so hard for me), especially if it would handle the other strings in the system as well.

Machine translation systems which use an intermediate notation could be described as:

source ---> interlanguage ---> target

The first step is by far the hardest: you have to eliminate a lot of ambiguity from the source. If the interlanguage is rich enough, generating target language texts is easy (as long as you don't expect Nobel winning prose...). So my idea is to let the programmer do the first step (mostly) manually and distribute texts and symbols always in the interlanguage. The users would only need the code and dictionaries to generate their chosen language from this, so it wouldn't be too much of an overhead. The programmer could check the quality of the first step, as even texts in his native language would be generated from that information, allowing him to catch at least some "bugs".

One day things will advance enough so that the first step could be done by machine as well, but this proposal is doable right now.


please send comments to jecel@lsi.usp.br (Jecel Mattos de Assumpcao Jr)