A couple of months ago, at the Lightweight Languages Workshop 2002, Matthew Flatt put forward a premise in his talk: operating systems and programming languages are the same thing (at least “mathematically speaking”). I find this interesting, and I think there is a lot of truth in it. Both OSes and PLs are platforms on which other programs run. Both virtualize the machine. Both make it easier for people to write applications (by providing APIs, abstractions, frameworks, etc.).
The difference between the two, Matthew continued, is that an OS focuses more on non-interference, or isolation between OS processes. The main task of a multiuser OS is to let several users use the computer simultaneously. Thus, it is important that no user can take over the machine or use up its resources permanently. Also, no process should be able to terminate other processes, peek into their resources, or do anything else that violates privacy, unless the OS security policy permits it.
On the other hand, a PL focuses on expressiveness and cooperation. A PL provides high-level constructs and facilities so that one can write programs in less time and with less effort: 10 lines of code in a higher-level PL might be equivalent to 100 to 1000 lines of machine code or lower-level language code. Additionally, a PL provides means for people to share reusable code through modules, shared libraries, components, etc.
As time progresses, OSes are becoming more like PLs, and vice versa. OSes now provide more and more ways for cooperation and sharing: IPC, threads, COM, etc. PLs now provide ways to do isolation: sandboxing, processes, etc.
However, none of the programming languages I currently use (Perl, Python, Ruby) was designed from the ground up to do isolation. As a result, none of their isolation mechanisms works really well.
This article will focus on the three languages above. It would certainly be interesting to also discuss Scheme, Smalltalk, Java, and Erlang; however, since I’m not adequately familiar with any of them, I’ll leave it to readers to give feedback on these.
Why Isolation In PL?
As people construct more and more complex systems, the need for isolation becomes apparent. Complex systems usually run untrusted user-level code that needs to be restricted. Several examples follow.
- Database systems usually provide some sort of stored procedure. A remote client can connect to the database and trigger a stored procedure. It is important that if the stored procedure crashes or loops, other clients can continue to use the database.
- Business applications usually allow users to specify business rules or constraints. Both are basically simplified high-level code. Users might specify these rules incorrectly, and the application must ensure that those errors have no unwanted impact.
- Web application servers usually allow pages/templates to contain code. Since the interpreter itself (e.g. Perl or PHP) is generally exposed to execute that code, the application must somehow ensure that no template can crash the application.
- Other applications might allow users to specify regular expressions. Regular expressions are actually a language, though a mini one. Overly complex regexes, whether specified accidentally or on purpose, can make the regex engine run practically forever doing backtracking.
So, in essence, complex applications are usually platforms in themselves, running subprocesses/subprograms (within a single OS process). This requires the PL to have isolation mechanisms beyond those provided by the OS, such as restricting a piece of code from accessing a certain part of the filesystem, from using more than a specified amount of memory/CPU time, or from accessing certain functions/modules/variables. Unfortunately, most PLs don’t have enough of them.
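When the language offers no such limits, the usual fallback is the OS. Here is a minimal sketch in modern Python 3, assuming a Unix system (it relies on the resource module) and assuming the untrusted code can be run in a separate interpreter process; run_with_cpu_limit is a hypothetical helper, not a standard API. It caps CPU time with setrlimit(RLIMIT_CPU, ...), the kind of per-process limit an OS can enforce but a single in-process interpreter cannot.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(source: str, cpu_seconds: int = 1) -> subprocess.CompletedProcess:
    """Run `source` in a child Python process whose CPU time is capped by the OS."""

    def apply_limits() -> None:
        # Runs in the child after fork(), before exec() (Unix only).
        # Past the limit the kernel sends SIGXCPU, which terminates the child.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        # Avoid leaving a core file behind when the child is killed.
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))

    return subprocess.run(
        [sys.executable, "-c", source],
        preexec_fn=apply_limits,
        capture_output=True,
        text=True,
        timeout=30,  # wall-clock backstop enforced by the parent
    )
```

Ordinary code runs normally (run_with_cpu_limit("print(6 * 7)").stdout is "42\n"), while run_with_cpu_limit("while True: pass") is killed by a signal and returns a non-zero (on Linux, negative) return code. The price is a whole OS process per untrusted snippet, plus the lack of portability mentioned in the conclusion.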
Perl
The two main security models in Perl are tainting and safe compartments. Tainting is mainly for tracking untrusted data, so I will not discuss it here.
In Perl 5.6/5.8 there are about 400 bytecode-level instructions,
called opcodes. All Perl code will eventually be compiled to these
opcodes. print is actually a single opcode. So are
open, sysopen, mkdir, rmdir,
fork, gethostbyname, etc. To see the complete list of
Perl opcodes, see the Opcode documentation.
Two things are apparent. One, Perl opcodes are higher level than machine-level instructions or even Java bytecode instructions. Two, Perl is a monolithic beast. Many facilities (like directory manipulation and even DNS-related functions) are built into the language. Perl5 is monolithic for historical reasons. Perl6 will reportedly also be monolithic, for speed reasons.
Every single opcode can be enabled or disabled. This is done in the compilation step: if the compiler encounters a forbidden opcode, it refuses it and compilation fails. This has the advantage of speed: once the code has been checked, the restriction has no run-time cost at all. The disadvantage: one must be careful with code that is compiled at run time (e.g. by a string eval); otherwise untrusted code could end up compiled with dangerous opcodes in it.
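Python has no opcode mask, but the same compile-time idea can be sketched with the standard dis module. This is a minimal sketch under stated assumptions: FORBIDDEN_OPS, forbidden_ops and masked_compile are hypothetical names, and the deny-list only covers import statements. It compiles the source, walks every nested code object, and refuses the code if a forbidden opcode appears, so accepted code runs with no extra cost.

```python
import dis
import types

# Hypothetical deny-list, loosely analogous to a Safe.pm opcode mask.
FORBIDDEN_OPS = frozenset({"IMPORT_NAME", "IMPORT_FROM"})


def forbidden_ops(code: types.CodeType) -> set:
    """Forbidden opcode names used by `code` or by any code object nested in it."""
    found = {ins.opname for ins in dis.get_instructions(code) if ins.opname in FORBIDDEN_OPS}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):  # bodies of functions, lambdas, classes
            found |= forbidden_ops(const)
    return found


def masked_compile(source: str) -> types.CodeType:
    """Compile `source`, refusing it if any forbidden opcode appears.

    The check happens once, before execution, so running the accepted
    code object costs nothing extra.
    """
    code = compile(source, "<untrusted>", "exec")
    bad = forbidden_ops(code)
    if bad:
        raise SyntaxError(f"forbidden opcodes: {sorted(bad)}")
    return code
```

masked_compile("import os") raises, and so does an import hidden inside a function body. However, masked_compile("__import__('os')") compiles fine, because calling the builtin __import__ function uses no import opcode at all. Masking opcodes is only as good as the list of ways to reach a capability.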
Safe.pm is a standard Perl module that allows a piece of code to be compiled with a specified opcode mask (a list of opcodes that are to be forbidden). In addition, Safe.pm does a “namespace chroot”: it makes Safe::Root0 (or Safe::Root1 for the second compartment, and so on) the code’s main:: namespace. This means that code in the compartment cannot access variables in the original main:: namespace, so global variables like $/ are not shared with code outside the compartment (some variables, like $_ and the _ filehandle, are shared, though).
That’s basically what Perl offers us for security. In practice, Safe.pm is of limited use. Choosing a reasonable set of “safe” opcodes is not always straightforward: an opcode like open can range from “rather safe” to “extremely dangerous”.
Perl’s open is very powerful and does many things: it can open a file for reading or writing, execute programs, open a pipe, duplicate a filehandle, etc. You can’t, for instance, make Perl allow open only for reading. Overriding open() doesn’t make it safe, because code in the compartment can always refer to the builtin version as CORE::open(). Moreover, Perl can be told to read/write files without using any opcode at all (for example, through $^I). Thus it is not possible to prevent untrusted Perl code from accessing the filesystem. To do this, one must resort to an OS facility (like Unix’s chroot or BSD’s jail).
The show-stopper for Safe.pm: most modules, DBI for example, don’t work under it. Embperl 1.x uses Safe.pm, but the 2.x versions drop it. Virtually no other web application server uses Safe.pm these days. Even Perl experts say that Safe.pm is too broken.
Conclusion: Perl has some sort of sandbox, but it works only at the compilation step. It’s not very flexible and not very useful. Perl is also monolithic, with many functions built into the interpreter, which makes it harder to isolate functionality.
Python
The Python language design is very simple and clean. Of the security models of the three languages, Python’s is the one I like the most. Python’s security model is capability-based: if you don’t want a certain piece of code to be able to do something, you don’t give it a reference to the module/function that provides that something. Python is also much more modular: its core functionality is much smaller than Perl’s. For example, OS-specific services like unlink and rmdir are located in the os module. This means we can restrict access to those services more easily, by preventing the code from importing the appropriate modules.
Here’s Python’s execution model: each piece of code runs in a frame (“a context”). In a frame there are two namespaces: the local and the global namespace. A namespace is a mapping between names and objects, and you get references (that is, capabilities) to objects from a namespace. Every time a variable/function/object/module name is mentioned, Python looks for it in the namespaces: the local namespace is searched first, then the global one. If the name is found in neither, Python raises a NameError exception.
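The lookup order can be observed directly in modern Python 3 (where exec is a function rather than a statement) by passing our own dictionaries to exec(). The names global_ns and local_ns below are just illustrative.

```python
# Globals and locals are plain dictionaries supplied by the caller.
global_ns = {"greeting": "hello", "who": "global"}
local_ns = {"who": "local"}  # shadows the global "who"

# "greeting" is found only in globals; "who" is found in locals first.
exec("result = greeting + ', ' + who", global_ns, local_ns)
print(local_ns["result"])  # hello, local

# A name found in neither namespace (nor in the builtins) raises NameError.
try:
    exec("undefined_name", global_ns, local_ns)
except NameError as exc:
    print("NameError:", exc)  # NameError: name 'undefined_name' is not defined
```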
We can manipulate a namespace easily, since it is available as a dictionary. We can even execute code and give it our own dictionaries to use as its local and global namespaces. This way, we can limit which objects are available to the code. That’s basically how the security model works in Python.
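A minimal sketch of the capability idea in modern Python 3: the untrusted snippet receives only the objects we place in its globals. run_untrusted is a hypothetical helper. On its own this is not a sandbox: exec() still supplies the normal builtin namespace, so the snippet can import whatever it likes.

```python
def run_untrusted(source: str, capabilities: dict) -> dict:
    """Execute `source` with a globals dict containing only `capabilities`.

    Illustration only: exec() adds the real builtins (including __import__)
    to this dict, so this is NOT a secure sandbox by itself.
    """
    namespace = dict(capabilities)  # copy, so the caller's dict is untouched
    exec(source, namespace)
    return namespace


messages = []
# Grant exactly one capability: appending to our message list.
run_untrusted("log('hello from untrusted code')", {"log": messages.append})
print(messages)  # ['hello from untrusted code']

# No reference to os was handed out, so this fails with NameError.
try:
    run_untrusted("os.remove('important.txt')", {"log": messages.append})
except NameError as exc:
    print("no capability:", exc)

# ...but the builtin namespace is still reachable, so the capability can be re-acquired.
escaped = run_untrusted("import os\ncwd = os.getcwd()", {})
print("escaped" if "cwd" in escaped else "contained")  # escaped
```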
Actually, there’s a third namespace that is searched when a name is not found in the local and global namespaces: the builtin namespace. It contains basic functions like open, exit, and execfile. Most of Python’s builtin capabilities are provided through this namespace. The rest are constructs like print and exec, which (in Python 2) are statements, not functions/objects.
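In modern CPython the builtin namespace can be replaced for a single exec() call through the special __builtins__ key of the globals dict. A minimal sketch, with a hypothetical whitelist SAFE_BUILTIN_NAMES:

```python
import builtins

# Hypothetical whitelist: the only builtins the untrusted code may use.
SAFE_BUILTIN_NAMES = ("len", "range", "sum", "min", "max", "print")


def restricted_globals() -> dict:
    """Fresh globals whose builtin namespace contains only whitelisted names."""
    safe_builtins = {name: getattr(builtins, name) for name in SAFE_BUILTIN_NAMES}
    return {"__builtins__": safe_builtins}


g = restricted_globals()
exec("total = sum(range(10))", g)
print(g["total"])  # 45

# open is not in the custom builtin namespace, so the lookup fails.
try:
    exec("f = open('/etc/passwd')", restricted_globals())
except NameError as exc:
    print("blocked:", exc)  # blocked: name 'open' is not defined
```

Because __import__ is also missing from the whitelist, import statements fail too (CPython raises ImportError). This is still not a real security boundary: code can reach other objects through introspection, for example starting from ().__class__.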
rexec is the standard Python module for sandboxing. It basically does what is explained above: it runs the sandboxed code with custom local and global namespaces. Additionally, rexec creates a custom builtin namespace and provides safer substitutes for functions like open and __import__. This way, we can tell rexec to forbid the untrusted code from opening a file in write mode, or from importing dangerous modules.
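rexec itself was disabled in Python 2.3 and removed in Python 3, so what follows is only a sketch, in modern Python 3, of the same approach: a custom builtin namespace whose open and __import__ are replaced by restricted substitutes. safe_open, safe_import, ALLOWED_MODULES and sandbox_globals are hypothetical names, and like rexec this can be escaped; it is not a real security boundary.

```python
import builtins

ALLOWED_MODULES = {"math", "json"}  # hypothetical import whitelist


def safe_open(file, mode="r", *args, **kwargs):
    """Like open(), but refuses any mode that could write, append, or create."""
    if any(flag in mode for flag in "wax+"):
        raise PermissionError(f"mode {mode!r} is not allowed in the sandbox")
    return builtins.open(file, mode, *args, **kwargs)


def safe_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Like __import__(), but only for whitelisted top-level modules."""
    if level != 0 or name.partition(".")[0] not in ALLOWED_MODULES:
        raise ImportError(f"import of {name!r} is not allowed in the sandbox")
    return builtins.__import__(name, globals, locals, fromlist, level)


def sandbox_globals() -> dict:
    """Fresh globals with a minimal builtin namespace plus the substitutes."""
    safe = {name: getattr(builtins, name) for name in ("len", "range", "print", "str")}
    safe["open"] = safe_open
    safe["__import__"] = safe_import  # used by every import statement
    return {"__builtins__": safe}
```

With g = sandbox_globals(), exec("import math\nroot = math.sqrt(16)", g) works and sets g["root"] to 4.0, while "import os" raises ImportError and "open('x.txt', 'w')" raises PermissionError before any file is touched.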
rexec is pretty flexible and has indeed been used successfully in several applications. Guido’s web browser Grail, for instance, allows running Python applets. However, rexec does not seem to be flexible or fine-grained enough, because Zope chooses not to use it. Instead, Zope uses its own home-grown module to do restricted execution.
There are several things that rexec can’t do. Resource limiting, for example: for that you need to resort to the OS (e.g. Unix’s setrlimit). Also, since Python does not have private attributes, you can’t give an object to untrusted code without fearing that the code will use Python’s reflection mechanisms to “peek into the guts” of your object (and from there gain references to other objects). There are two separate solutions to the last problem: the Bastion module and the mxProxy C extension, which essentially provide private attributes.
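The “peek into the guts” problem is easy to reproduce in modern Python 3. In this hypothetical example we intend to hand out only a read-only capability, first as a bound method and then wrapped in a closure (a poor man's proxy, in the spirit of Bastion), and both leak the underlying object through standard introspection attributes.

```python
class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance

    def get_balance(self) -> int:
        return self.balance


# Attempt 1: hand out a bound method, meaning "you may read the balance".
account = Account(100)
read_balance = account.get_balance
# Every bound method exposes its object, so the holder can write to it too.
read_balance.__self__.balance = 1_000_000
print(account.balance)  # 1000000


# Attempt 2: hide the object inside a closure.
def make_reader(obj: Account):
    def read() -> int:
        return obj.get_balance()
    return read


hidden = Account(100)
reader = make_reader(hidden)
# CPython still exposes captured variables through the __closure__ cells.
reader.__closure__[0].cell_contents.balance = 0
print(reader())  # 0
```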
Conclusion: Python has a nice and simple security model. However, rexec cannot provide every kind of isolation one might need, resource limiting being one example. Guido has also said that rexec is not tested enough and might contain security holes.
Ruby
One of the main goals of Ruby seems to be “to replace Perl”. In that respect, it has copied many Perl features, tainting among them. In Perl there are two running modes: tainting on (-T, or when running setuid) and off (no -T). Ruby extends this concept a bit by providing several “safe levels”, numbered 0 to 4 and indicated by the global variable $SAFE. The safe levels are as follows.
Safe level 0 (default mode): no checks are performed on tainted data.
Safe level 1: tainted data cannot be used for potentially dangerous operations.
Safe level 2: in addition to the level 1 restrictions, program files cannot be loaded from globally writable locations (e.g. from /tmp).
Safe level 3: in addition to the level 2 restrictions, all newly created objects are considered tainted.
Safe level 4: in addition to the level 3 restrictions, the running program is effectively partitioned in two, and nontainted objects may not be modified. Typically, this is used to create a sandbox: the program sets up an environment using a lower $SAFE level, then resets $SAFE to 4 to prevent subsequent changes to that environment.
It’s evident that, as with tainting, the safe levels are primarily concerned with data security and are not very sandbox-like (in the sense of “isolating subprocesses from one another”). Matz confirmed this on the ruby-talk mailing list, saying that Ruby currently does not have a sandbox. Running code at safe level 4 is usually too restrictive to be practical, and it still does not provide enough isolation.
The problem with isolation in Ruby is that all objects are accessible from any code through the ObjectSpace facility (including code running at safe level 4). This is in direct conflict with the capability concept, in which you don’t hand out a reference/capability unless necessary. Ruby does, however, protect an object’s attributes, and it has a #freeze method to make an object read-only.
Conclusion: Ruby doesn’t have a sandbox (yet).
Other PLs
Java has a sandbox security model and a bytecode verifier. Tcl basically has the same. Erlang is evolutionarily more advanced in providing isolation, in that it has a notion of “PL-level processes” (each process is isolated in all respects from the others).
Conclusion
As people construct more and more complex applications in PLs, PLs are required to have adequate security/isolation mechanisms. Current mainstream PLs do not, so programmers are often forced to fall back on facilities provided by the OS. This has drawbacks, such as lack of portability and reduced efficiency. Perhaps new PLs will be designed with isolation as one of their main goals, or current PLs will be improved/redesigned, so that this requirement of a “multiuser PL” is fulfilled in the future.
About the Author:
Steven is a software developer residing in Bandung, Indonesia.
Programming languages have already become OS more than 20 years ago… Remember the LISP Machines? Or Alan Kay’s DynaBook?
I feel the author should have investigated these languages, too (even tho I understand he focused on lightweight languages).
There is also a project, Croquet, that aims to create a 3D Operating Environment using Squeak Smalltalk on top of Windows/Linux/MacOS. You may find more info here: http://www.opencroquet.org/summary.html
… to see how languages are evolving and going to evolve in the near future. Also, like Valkadesh pointed out, the merge between the “OS” and the programming language has been done in the past.
I still doubt, and almost hope, that PLs will not entirely become “OSes”, and that “OSes” will not become available only through a single PL.
Up to now it has been possible to write applications for almost any OS using the language of your choice, be it assembly, C, C++, Cobol, Fortran, Pascal or any of myriad others. Maybe there is a way to make the concepts and commonly known methods of programming evolve so as to provide more isolation while still allowing programmers to choose the language they develop in.
Squeak/Smalltalk anyone? Damn creepy eyes….. I wish I could turn them off :-p
Guido is removing rexec from Python due to security problems. See this message:
http://mail.python.org/pipermail/python-announce-list/2003-January/…
Don’t forget FORTH.
Don’t bash what you don’t know. I am the first to admit that Squeak’s user interface is… primitive at best, but Squeak (and Smalltalk) remains one of the most productive environments I’ve ever used. Do yourself a favor, download Cincom VisualWorks (free for non-commercial), give it a try, and you’ll find that the whole VS.Net vs Eclipse/NetBeans thing is just a childish quarrel.
Well, ain’t FORTH a language for creating OSes? ;-P
This article provides a good summary of why PLs should act more like OSes. Thanks!
I think it covers only a small corner of the features a PL should provide, though. In particular, does Perl, Python, or Ruby offer a way to terminate a “process”? That crucial feature is usually missing. (The Java specifiers famously revoked the ability to kill threads in Java.)
The Lisp machine and Squeak are about providing a higher-level interface to application programmers, and I’m a fan of both. However, I believe they provide only Python-like control over “processes” through lexical scope; again, no termination. The implementors of those systems had the right (IMHO) bias toward cooperation, but didn’t provide convenient isolation for those cases where it’s needed.
The article commendably mentions a priori resource limits as a problem. I’ll point out that the cited LL2 talk doesn’t address that problem either, because it’s an area of ongoing research in our world. (It’s easy to limit resource consumption in a system biased toward isolation, not so easy in one that is biased toward cooperation, at least not without reversing the bias. See KaffeOS for a Java-based example of the former.)
BASIC was The WORD!
OS and Language, hehe.
A language that uses a VM could become an OS if you just make the OS the VM, but how is that different from the old microcomputers of the ’70s that ran on a BASIC ROM? It’s the same thing as running a machine on a Java VM or a Python interpreter, etc.
Some archives of broadcasts from last year’s workshop are available here:
http://technetcast.ddj.com/tnc_catalog.html?item_id=1295
<blockquote>Programming languages have already become OS more than 20 years ago… Remember the LISP Machines? Or Alan Kay’s DynaBook?
I feel the author should have investigated these languages, too (even tho I understand he focused on lightweight languages).</blockquote>
Thanks, Valkadesh. Perhaps the title should have been “PL Will Become Multiuser OS” or something like that. My focus is on security/isolation.
Hi Matthew, nice to see you here.
<blockquote>The Java specifiers famously revoked the ability to kill threads in Java.</blockquote>
Yes, I’ve heard. Anyone knows the story behind this?
I’d like to add that it seems Perl6 will replace Safe.pm with something à la VMS (with some emphasis on the ability to do resource limiting).
Not quite an OS, not quite a programming language (FRED), not quite an office suite… very powerful and easy to use.
Runs on HP200LX or a P4 but you can’t order it on the Web!
Go check it out:
http://www.framework.com
I wouldn’t say Squeak’s interface is primitive per se, but in a lot of ways it is different from what people are used to. I’d agree that it looks visually primitive: it doesn’t have the polished widgets and window decorations that Mac OS X has, for example. However, the GUI system is quite advanced and lets you do a lot of things that, to my knowledge, you can’t do in any other OS that is as readily available.
But yeah, Squeak is an OS. Most people run it hosted on top of another OS, but it has the facilities of any OS. Hardware drivers were added as part of the Squeak OS, drivers written entirely in Smalltalk. Imagine the ease of debugging!
Yes, Perl or Python could be turned into OSes, but they aren’t as natural a choice as Lisp, Smalltalk or Forth, which were designed to be in such a position. Perl and Python (and many other “little” languages) were designed to be part of a greater environment (Unix, Windows, etc.), and as such may require a lot more work to be made into an OS. And when that work is done, it still may not feel like it was the best thing to do…
I’ve often thought that (similar to Smalltalk) objects should be fully protected processes. This doesn’t make sense for small objects, so some would be inlined as threads, and some even smaller ones (Integer, Array, etc…) would be inlined straight to object code.
Instead of having objects and protection orthogonal (the current system), combine the two and you’ve got instant protection from most multithreading bugs, because you’ve defined one object to be one thread of execution. Any cross-object calling would be automatically protected, because the compiler could insert locks automatically.
What I meant was: in Smalltalk, when you want a process, you subclass Process. I don’t think it is possible to create a process any other way, which is sort of similar to what I was proposing…
I think the article is good in the sense that it addresses the fact that most applications need OS-kernel-like functionality and most languages simply don’t offer an easy way to implement it; and unlike with real OS kernels, the enormous manpower to overcome these issues isn’t in place.
Some of the examples were simply false. Databases like Oracle most certainly do offer the features he spoke of in terms of preventing loops and bad code from creating problems. While Safe.pm may not offer sandboxing, YACC itself can run in any reasonable VM and thus so can Perl; and message passing between Perls is not hard (they can pass arbitrarily complex data structures using Data::Dumper, for example).
Has everyone forgotten about Oberon? Anyway, I don’t agree with much of anything that is said in the “read more…” section.
A programming language is just a language. An operating system manages a computer’s resources for programs and users. A programming library provides commonly used functions for programmers. A program performs a specific function for end-users.
And generally, it shouldn’t get more complicated than that.
I think the author is mistaken in requiring protective features. For a long time Forth has been used not to protect itself, but rather to run safe code. For an open-source OS provided by Forth, such “paranoia” is not a good thing. I think there have been many changes since DOS’ time, and there is no need to protect oneself from viruses. All that protection now seems like an illness.
Why should an OS protect itself from running safe code? That only adds useless code and complexity. Of course, where it is needed (where unknown buggy code can appear), safe compilation and execution can be provided by supplying a VOCABULARY with safe versions of words.
“Is it an OS, that is frightened by itself!” (about muKernels).
I think the following PLs are good choices for a single-PL (SPL) OS: 1) Forth; 2) Smalltalk; 3) LISP; 4) ML-class PLs; 5) others, see the requirements below.
The PL used in an SPL OS should be highly extensible and easy to control, and it should be interpreted in any case (to be useful to the operator/user as a shell). It should be able to control hardware. For a start, that is all.
BASIC is not extensible, so I don’t take it into account anymore.
I have not heard of Oberon being interpreted.
I’d say that “algoloids” like C, Pascal, and PL/1 or PL/M are not a choice for an SPL OS.
You don’t subclass Process to create one. You instantiate Process.
In Smalltalk, the easiest way to make a process, that is, to instantiate Process with some body of code, is to send the #newProcess message to a block.
That is, if you want a thread to sit in the background and print “hello”, pausing a second in between, you could say:
| aBlock p |
aBlock := [
    [true] whileTrue: [
        Transcript show: 'hello'; cr.
        (Delay forSeconds: 1) wait]].
"newProcess creates the process in a suspended state; it runs once resumed"
p := aBlock newProcess.
"... or ..."
"fork creates the process and schedules it to run in the background right away"
p := aBlock fork.
"Lower the process's priority"
p priority: Processor userBackgroundPriority.
"To start, pause or kill the process:"
p resume.
p suspend.
p terminate.
“<blockquote>The Java specifiers famously revoked the ability to kill threads in Java.</blockquote>
Yes, I’ve heard. Anyone knows the story behind this?”
I think it was because resources like semaphores, open files, sockets, etc. aren’t tracked by thread, and so when a thread is killed, those things aren’t freed at the same time. This can lead to resource leaks, and deadlocks (as locked semaphores may never be unlocked).
Operating systems spend a lot of time tracking program resources to ensure they are released when the program exits. Programming languages could potentially do that too, but nothing (OS or language) can keep track of things it doesn’t know about. Duplicating this in both the OS and language runtime can get a little heavy, and error prone if one doesn’t match the other.
It might be better to increase integration between languages and OSes (e.g. Java threads are sometimes mapped to OS threads).
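A language runtime could do some of this bookkeeping itself. Here is a minimal sketch in Python 3 using the standard contextlib.ExitStack: a hypothetical Compartment class records every file opened on behalf of a subprogram and closes them all when the subprogram is terminated. It only tracks resources opened through it, which is exactly the caveat above: nothing can track resources it doesn't know about.

```python
import contextlib
import os
import tempfile


class Compartment:
    """Hypothetical per-subprogram resource registry, loosely like an OS process."""

    def __init__(self) -> None:
        self._resources = contextlib.ExitStack()

    def open(self, *args, **kwargs):
        # Open a file on behalf of the subprogram and remember it.
        return self._resources.enter_context(open(*args, **kwargs))

    def terminate(self) -> None:
        # Release everything the subprogram acquired, most recent first.
        self._resources.close()


with tempfile.TemporaryDirectory() as workdir:
    sandbox = Compartment()
    log_file = sandbox.open(os.path.join(workdir, "log.txt"), "w")
    log_file.write("subprogram output\n")
    sandbox.terminate()
    print(log_file.closed)  # True
```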
<blockquote><blockquote>The Java specifiers famously revoked the ability to kill threads in Java.</blockquote>
Yes, I’ve heard. Anyone knows the story behind this?
</blockquote>
Killing a thread could leave an object in a locked state with
no way to unlock it. For much code, this is nothing new, because
a program can always grab a lock and loop forever. But killing a
thread could also leave system objects (normally only manipulated
by “trusted” code) locked, and the system had no way to protect
against this problem. I believe that’s the reason that thread
termination was removed from Java.
The alternative is to add a kernel/user distinction to the run-time system, where thread kills only happen while executing user code. (Kernel code always returns “quickly” to avoid delaying a kill.) Adding a kernel/user distinction is one way that Godmar Back’s KaffeOS improved Java, so that Godmar was able to support thread kills.
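The locking argument can be loosely illustrated in Python 3, which, like Java, offers no way to kill a thread. In this hypothetical sketch a kill is only a request, and the worker honors it only at a safe point between critical sections, so it can never die while holding the shared lock. This is plain cooperative cancellation, much weaker than what KaffeOS does, and is shown only to make the reasoning concrete.

```python
import threading
from typing import Optional


class KillableWorker:
    """Hypothetical worker whose 'kill' is delivered only at safe points."""

    def __init__(self, shared_lock: threading.Lock) -> None:
        self.shared_lock = shared_lock
        self.kill_requested = threading.Event()
        self.steps = 0
        self.thread = threading.Thread(target=self._run)

    def _run(self) -> None:
        # Safe point: the kill flag is checked only while no lock is held.
        while not self.kill_requested.is_set():
            # "Kernel" section: always runs to completion and releases the lock.
            with self.shared_lock:
                self.steps += 1

    def start(self) -> None:
        self.thread.start()

    def kill(self, timeout: Optional[float] = None) -> None:
        self.kill_requested.set()
        self.thread.join(timeout)


shared_lock = threading.Lock()
worker = KillableWorker(shared_lock)
worker.start()
worker.kill(timeout=5)
print(worker.thread.is_alive())  # False
print(shared_lock.locked())      # False: the lock was never left held
```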
Look at Emacs. If that’s not an OS, I don’t know what is.
As many of you pointed out, several languages are already OSes. And most OSes are written in PLs: take Windows and Linux, for instance, both written largely in C. I believe this entire situation is more of a marriage than an either/or situation. We can’t have one without the other.
I think the author should switch to a real programming language like Delphi, C++ or Java instead of focusing on the lightweight languages Perl, Python and Ruby, and then rewrite the article in about a year. All three of those languages have excellent threading models (although Delphi’s encapsulates C++) and lend themselves to isolation and compartmentalization very well.
I live for Perl, but I’ve heard that .NET has a bunch of security features governing what code is and is not allowed to do. If you check out http://www.gotdotnet.com/terrarium, you can see an example of this in the form of a game.
Terrarium is a screensaver/game where you write AI code to create plants or animals, and then compile it. Your compiled code is then sent via P2P networking to other machines. Part of the point is to demonstrate that if you try to put anything ‘dangerous’ in your code, it won’t be accepted and can’t be run. That, and there’s code signing, something to do with public key encryption. I wish I could talk about it more intelligently; I guess I have some reading to do.
You end with a remark that Tcl has a sandbox. A strong one, btw. It also has a virtual file system, and the ability to run multiple threads in total isolation, each with an event loop for things like file/socket I/O and user interfaces. One could argue that these bring it closer to a full-blown OS than what you describe.
Your bias shows in stopping at Perl, Python, and Ruby.
I haven’t seen this mentioned yet, but Java is getting exactly the features you describe:
http://jcp.org/en/jsr/detail?id=121
“…All conformant implementations must guarantee at least isolation of Java state (i.e. logically disjoint heap spaces, per application mutable static fields, etc). Additional forms of isolation possible include separation of JNI state and separation of process state…Research by Sun and IBM has demonstrated these additional forms of isolation and sharing…”