Way back in 2002, MIT decided it needed to start teaching a course in operating system engineering. As part of this course, students would write an exokernel on x86, using Sixth Edition Unix (V6) and John Lions’ commentary as course material. This, however, posed problems.
The biggest problem is that while the course focuses on teaching students to write an exokernel for the x86 architecture, Sixth Edition Unix and Lions’ commentary focus on the PDP-11 – a completely different kind of system. Students complained about these differences, as well as about the limited relevance of learning how to code for the PDP-11. On top of that, Sixth Edition Unix is written in a dead dialect of C (pre-K&R C).
So, MIT took a drastic approach: they decided to rewrite Sixth Edition Unix for x86 in ANSI C. Along the way, they improved it as well by adding SMP support. “Xv6’s use of the x86 makes it more relevant to students’ experience than V6 was and unifies the course around a single architecture,” the project page details. “Adding multiprocessor support requires handling concurrency head on with locks and threads (instead of using special-case solutions for uniprocessors such as enabling/disabling interrupts) and helps relevance. Finally, writing a new system allowed us to write cleaner versions of the rougher parts of V6, like the scheduler and file system.”
While this project was released in 2006, I had never heard of it (until yesterday), and I’m pretty sure many of you haven’t either. John Lions’ commentary and V6 are of course incredibly famous, but I’m not sure if the same applies to Xv6.
In any case, the code is out there for all to see and use (git clone git://pdos.csail.mit.edu/xv6/xv6.git), under an MIT license. It typically doesn’t run on real hardware; in fact, MIT runs it in QEMU.
I’ve run it in QEMU before; it’s fairly neat. Make sure you get up-to-date sources: when I first tried it out I got an old tarball that had some bugs, which I learned how to fix, but they had already been fixed upstream.
I slapped together an ELF cross-compiler, and set it up so that you can cross-compile the kernel from Windows with a simple ‘build’ command…
http://vpsland.superglobalmegacorp.com/install/xv6.7z
I’ve seen patches that include VM and a basic TCP/IP stack… I guess all that is missing is a functional libc, shared memory, shared libraries…
It is amazing how quickly it compiles! … With MinGW or Cygwin, dd will work correctly and you can build the whole thing much more easily.
What makes you so sure shared libraries are important or even desirable?
http://9fans.net/archive/2008/11/142
bogomipz,
Regarding shared libraries, I’ve often wondered this myself. Shared libraries are the cause of all dependency problems, but is there really that much of a net benefit?
I think maybe when RAM was extremely tight, the answer may have been yes. But these days they may be of less value; we really ought to test that hypothesis.
Consider that shared libraries can’t be optimized across API calls. It might take 30 bytes to call a libc function, which just shuffles the arguments around again to do a syscall. In a static application, the compiler could theoretically optimize away all the glue code and do the syscall directly, saving space.
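To make the glue-code point concrete, here is a minimal sketch, assuming x86-64 Linux and GCC-style inline assembly: the first function goes through the libc write() wrapper, while the second issues the same syscall directly, which is the kind of shortcut a fully static, aggressively optimized build could in principle take.

```c
#include <unistd.h>

/* Ordinary route: call the libc glue, which shuffles the arguments into the
 * right registers and then traps into the kernel. */
static void hello_via_libc(void)
{
    write(1, "hello\n", 6);
}

/* Direct route: the same syscall with no wrapper at all.  On x86-64 Linux,
 * SYS_write is 1 and the syscall instruction clobbers rcx and r11. */
static void hello_direct(void)
{
    static const char msg[] = "hello\n";
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1),               /* rax = SYS_write     */
                        "D"(1),               /* rdi = fd 1 (stdout) */
                        "S"(msg),             /* rsi = buffer        */
                        "d"(sizeof msg - 1)   /* rdx = length        */
                      : "rcx", "r11", "memory");
    (void)ret;
}

int main(void)
{
    hello_via_libc();
    hello_direct();
    return 0;
}
```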
Obviously we have to look at bigger libraries too, like libjpg, but even there I wonder how much space would be wasted if it were statically compiled.
This isn’t to preclude the use of shared libraries for applications which are genuinely related and deployed together. But I do see an awful lot of application-specific libraries under /lib which have to be managed in lockstep with their associated application; why should these be shared libraries at all?
Library versioning and symbol versioning are a solved problem. It’s only when developers do not follow the standards that they introduce problems. In a properly managed library, dependency issues are non-existent. Glibc is the obvious example here: since Glibc 2 (libc.so.6) was introduced on Linux, the library has remained both forwards and backwards compatible.
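As a concrete illustration of symbol versioning (the names foo, VERS_1.0 and VERS_2.0 are made up; the mechanism is the GNU .symver directive plus a linker version script), this is roughly how one shared object can keep old binaries working while giving newly linked programs new behaviour:

```c
/* Sketch of GNU symbol versioning.  Build roughly as:
 *     gcc -shared -fPIC -Wl,--version-script=foo.map foo.c -o libfoo.so
 * where foo.map defines the version nodes:
 *     VERS_1.0 { };
 *     VERS_2.0 { } VERS_1.0;
 */

/* Old behaviour, kept only so binaries linked against foo@VERS_1.0 keep working. */
int foo_old(int x) { return x + 1; }
__asm__(".symver foo_old, foo@VERS_1.0");

/* New behaviour; the double '@@' marks the default version that newly linked
 * programs will bind to. */
int foo_new(int x) { return x * 2; }
__asm__(".symver foo_new, foo@@VERS_2.0");
```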
The glue code is either needed, or it is not needed. If it’s needed, then you can’t simply “optimize it away”. If it’s not needed, then you can just remove it from the syscall shim.
Remember that in many cases the syscall shim will also attempt to avoid a syscall. It’s much, much better to do the sanity checking and return early if the arguments are bad before making the syscall. It may require a few more instructions before the syscall happens, but that’s still far less expensive than making the syscall only for it to return immediately because the arguments are wrong.
In some cases it’s also entirely possible for the syscall shim to satisfy the caller entirely from user space, i.e. it doesn’t even need to call into the kernel.
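A hypothetical shim along those lines might look like this; it is not real libc code, just an illustration of doing the sanity checks, and occasionally the whole job, in user space before paying for a kernel transition:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Illustrative write() shim: validate cheaply in user space, handle the
 * trivial case without the kernel, and only then make the real syscall. */
static ssize_t checked_write(int fd, const void *buf, size_t count)
{
    /* A few user-space instructions are far cheaper than a kernel round trip
     * that immediately fails with EBADF or EFAULT. */
    if (fd < 0) {
        errno = EBADF;
        return -1;
    }
    if (buf == NULL && count > 0) {
        errno = EFAULT;
        return -1;
    }

    /* Satisfied entirely from user space: nothing to write, nothing to ask
     * the kernel for. */
    if (count == 0)
        return 0;

    return syscall(SYS_write, fd, buf, count);
}

int main(void)
{
    return checked_write(1, "checked write\n", 14) > 0 ? 0 : 1;
}
```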
Now this is where I do a 180 and agree with you! Shared libraries are overused and in a large percentage of cases are used inappropriately. As a simple rule of thumb, I’d prefer that system libraries are shared, and any third-party libraries required by an application should be static. That’s not foolproof, but it’s a starting point.
Vanders,
“The glue code is either needed, or it is not needed. If it’s needed, then you can’t simply “optimize it away”. If it’s not needed, then you can just remove it from the syscall shim.”
I agree with most of your post, but the problem with shared libraries is that often times they just add a layer of indirection to the syscall without adding much value. If you use a shared library to perform a function, then you cannot optimize away the glue code used to call the shared library.
On the other hand if we’re willing to internalize the code into a static binary, the glue code becomes unnecessary (I’m not sure that GCC/LD will do this kind of optimization, but the potential is certainly there).
That’s not how shared libraries work. There is no more code required (at run time) to call a function in a shared library than there is in calling a function within the executable.
Vanders,
“That’s not how shared libraries work. There is no more code required (at run time) to call a function in a shared library than there is in calling a function within the executable.”
You’re kind of missing my point, though. I know that shared library functions are mapped into the same address space as static functions, and can be called the same way. But the fact that a function belongs to a shared library implies that it must abide by a well-defined calling convention, and subsequently translate its internal variables to and from this interface. There are optimizations that can take place in a static binary that cannot take place with a shared library.
For example, we obviously cannot do inter-procedural analysis and optimization against a shared library function (since the shared library function is undefined at compile time). Theoretically, using static binaries, an optimizing compiler could analyze the call paths and eliminate all the glue code. Trivial functions could be inlined. Calling conventions could be ignored since there is no need to remain compatible with external dependencies.
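A small illustration of that point, assuming GCC or Clang with link-time optimization (e.g. -O2 -flto): if the wrapper below lives in a shared library, every call is an out-of-line call through the PLT, but statically linked with LTO the compiler is free to inline it and drop the calling-convention glue entirely. The names are illustrative.

```c
#include <string.h>

/* Imagine this living in a library: a trivial wrapper around strcmp(). */
int str_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

int main(int argc, char **argv)
{
    /* Statically linked and built with LTO, this call can collapse into a
     * bare strcmp(); across a shared-library boundary it cannot. */
    return (argc > 1 && str_equal(argv[1], "xv6")) ? 0 : 1;
}
```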
In the ideal world, object files would be an intermediate representation like Java class files or .NET assemblies. Not only would the run-time compilation optimize for the current platform, but it could also perform inter-procedural optimization that might eliminate all the costs currently associated with glue code.
I can’t help but feel that the time and effort needed to do that well would be significant, yet only save a tiny fraction of the load and link time for a binary that uses classic shared libraries.
Vanders,
“I can’t help but feel that the time and effort needed to do that well would be significant, yet only save a tiny fraction of the load and link time for a binary that uses classic shared libraries.”
I actually think that a sequential read of a (possibly larger) static binary on disk could be much faster than many fragmented reads from many shared libraries.
I agree that this would not be worthwhile to change under Linux, which is already heavily invested in its current implementation. However, I think there is merit in considering alternate methodologies for new platforms.
All well and good until you find a critical bug in that DNS client library that every network capable program you have installed uses, and you now have to recompile (or relink at least) every single one of them.
Shared libraries may or may not give memory usage benefits, and may cause DLL hell in some cases, but from a modularity and management point of view, they’re a godsend.
Performance poor? That’s an implementation detail of using lookup tables. There’s nothing stopping the system implementing a full run time direct link, at the expense of memory and start up performance.
The argument that updating a library will fix the bug in all programs that dynamically link said library goes both ways; breaking the library also breaks all programs at the same time.
And if security is a high priority, you should be aware that dynamic linking has some potential risks on its own. LD_LIBRARY_PATH is a rather dangerous thing, especially when combined with a suid root binary.
I’ll take being able to easily fix everything (even if that means being able to easily break everything) over not being able to fix anything.
The LD_LIBRARY_PATH suid root binary security hole is one that, if you know about it, you can avoid. It’s not something that means you should throw the whole system out.
Update: Looks like it’s protected against anyway.
http://en.wikipedia.org/wiki/Setuid
“The invoking user will be prohibited by the system from altering the new process in any way, such as by using ptrace, LD_LIBRARY_PATH or sending signals to it”
christian,
“All well and good until you find a critical bug in that DNS client library that every network capable program you have installed uses, and you now have to recompile (or relink at least) every single one of them.”
Your point is well received.
However this person has a slightly different suggestion:
http://www.geek-central.gen.nz/peeves/shared_libs_harmful.html
He thinks applications shouldn’t use shared libraries for anything which isn’t part of the OS. This would largely mitigate DLL hell for unmanaged programs.
I realize this answer is gray and therefore unsatisfactory.
A better solution would be to have a standardized RPC mechanism to provide functionality for things like DNS. The glue code would be small, and could always be linked statically. This RPC would be kernel/user space agnostic, and could be repaired while remaining compatible. I think the shift from shared libraries to more explicit RPC interfaces would be beneficial, but it’d basically need a new OS designed to use it from the ground up – now that Linux hosts tons of stable code, it’s unlikely to happen.
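Purely as a sketch of the shape of that idea (every name here, rpc_call(), “net/dns/resolve”, is invented; nothing like this exists as described): the application would link a tiny static stub and depend only on a service’s wire contract, not on a resolver library loaded into its address space.

```c
#include <stdio.h>
#include <string.h>

/* The stub an application would link statically.  In a real system the body
 * would marshal the request and hand it to whatever transport the OS
 * provides (socket, message port, kernel IPC); here it is a placeholder so
 * the sketch compiles on its own. */
static int rpc_call(const char *service, const void *req, size_t req_len,
                    void *reply, size_t reply_len)
{
    (void)service; (void)req; (void)req_len; (void)reply; (void)reply_len;
    return -1; /* "service unavailable" in this stand-alone sketch */
}

/* What a DNS lookup could look like: the caller depends only on the
 * "net/dns/resolve" contract, which can be fixed or replaced behind its back. */
static int resolve_hostname(const char *name, unsigned char addr_out[4])
{
    return rpc_call("net/dns/resolve", name, strlen(name) + 1, addr_out, 4);
}

int main(void)
{
    unsigned char addr[4];
    if (resolve_hostname("example.org", addr) != 0)
        fprintf(stderr, "resolver service not reachable in this sketch\n");
    return 0;
}
```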
Shared objects aren’t really about saving space any more (much of Windows’ bloat is having a massive sea of common DLLs that might be needed, and multiple versions of them, for both x86 and AMD64). It’s about abstraction and updates. You get the benefits of shared code from static libs, but taking advantage of new abstractions or updates with static libs requires rebuilding. That’s a lot of rebuilding. Check out the dependency graph of some apps you use sometime; they are often massive. To keep those apps up to date would require constant rebuilding, and then the update system would have to work in deltas on binaries, or else you would be pulling down much, much more with updates. With shared objects you get updated shared code with nothing but the shared object being rebuilt. Easy deltas for free.
Having to rebuild everything would have a massive impact on security. On a closed platform this is even worse, because the vendor of each package has to decide whether it’s worth updating, and often each vendor has their own update system that may or may not be working. Worse, on closed platforms you already end up with things built against many versions of a lib, often needing separate shared object files (which defeats part of the purpose of shared objects; the manifest system is crazy with its “exact” version scheme). Static libs would make this worse.
With shared objects you get not only simple updates but abstraction. Completely different implementations can be swapped in; plugins are often exactly that: the same interface to the plugin shared objects, but each adds new behaviour. Also, put something in a shared object with a standard C interface and many languages can use it.
With an open platform and a single update system, shared objects rock. You can build everything to a single version of each shared object. You update that single version and everything is updated (fixed/secured). You can sensibly manage the dependencies. You remove shared objects if nothing is using them. You only add shared objects something requires. This can be, and is, all automated. This does save space, and I would be surprised if the install weren’t quite a lot bigger if you built everything statically, unless you have some magic compressing filesystem which sees the duplicate code/data and stores only one version anyway. But space saving isn’t the main reason to do it.
Any platform that moves more to static libs is going in the wrong direction. For Windows, it may well save space to move to having static libs for everything because of the mess of having so many DLLs that aren’t actually required. But it will make the reliability and security of the platform worse (though not if it already has an exact-version system; then it’s already as bad as it can be).
In short, you can take shared objects only from my cold dead hands!
Haha, nice
I agree with those that say the technical problems introduced with shared libraries and their versioning have been solved by now. And I agree that the modularity is nice. Still, the complexity introduced by this is far from trivial.
What if the same benefits could have been achieved without adding dynamic linking? Imagine a package manager that downloads a program along with any libraries it requires, in static form, then runs the linker to produce a runnable binary. When installing an update of the static library, it will run the linker again for all programs depending on the library. This process is similar to what dynamic linking does every time you run the program. Wouldn’t this have worked too, and isn’t this the natural solution if the challenge was defined as “how to avoid manually rebuilding every program when updating a library”?
What you’re describing is basically prelinking (or prebinding). It’s worth mentioning that Apple dropped prebinding and replaced it with a simple shared library cache, because the cache offered better performance.
Prelinking exists to revert the slowdown introduced by dynamic linking. I’m talking about not adding any of this complexity in the first place, and just using xv6 in its current form to achieve the same modularity.
(Well, xv6 apparently relies on cross-compiling and does not have a linker of its own, but I would expect a fully functional version to include C compiler and linker.)
I don’t see how your system of doing the linking at update time is really any different than doing it at run time.
Dynamic linking is plenty fast enough, so you don’t gain speed. (Actually, dynamic linking could be faster on Windows; it has this painful habit of checking the local working directory before scanning through each folder in the PATH environment variable. On Linux, it just checks the /etc/ld.so.cache file for what to use. But anyway, dynamic linking isn’t really slow even on Windows.)
You have to compile things differently from normal static linking to keep the libs separate so they can be updated. In effect, the file is just a tar of the executable and the DLLs it needs, a bit like the way resources are tagged on the end now. Plus you then need some kind of information about which libs it was last tar’ed up against, so you know when to update it or not.
What you really are searching for is application folders. http://en.wikipedia.org/wiki/Application_Directory
Saves the joining up of files into blobs. There is already a file grouping system, folders.
The system you might want to look at is: http://0install.net/
There was even a article about it on osnews:
http://www.osnews.com/story/16956/Decentralised-Installation-System…
Nothing really new under the sun.
I grew up on RiscOS with application folders and I won’t go back to them.
Accept dependencies, but manage them to keep them simple. One copy of each file. Fewer files, with clear, searchable (forwards and backwards) dependencies.
Oh and build dependencies (apt-get build-dep <package>), I >love< build dependencies.
Debian has a new multi-arch scheme so you can install packages alongside each other for different platforms. Same filesystem will be able to be used on multiple architectures and cross compiling becomes a breeze.
The difference is that the kernel is kept simple. The complexity is handled by a package manager or similar instead. No dynamic linker to exploit or carefully harden.
If you don’t see any difference, it means both models should work equally well, so no reason for all the complexity.
What do you mean by this? I’m talking about using normal static libraries, as they existed before dynamic linking, and still exist to this day. Some distros even include static libs together with shared objects in the same package (or together with headers in a -dev package).
I may have done a poor job of explaining properly. What I meant was that the program is delivered in a package with an object file that is not yet ready to run. This package depends on library packages, just like today, but those packages contain static rather than shared libraries. The install process then links the program.
No, just the normal package manager dependency resolution.
No, to the contrary! App folders use dynamic linking for libraries included with the application. I’m talking about using static libraries even when delivering them separately.
Zero-install is an alternative to package managers. My proposal could be implemented by either.
Not really a kernel problem as the dynamic linker isn’t really in the kernel.
http://en.wikipedia.org/wiki/Dynamic_linker#ELF-based_Unix-like_sys…
When something is statically linked, the library is dissolved: whatever is not used, the dead stripper should remove. Your system is not like static linking; it’s like baking dynamic linking.
Then you kind of lose some of the gains. You have to have dependencies sitting around waiting in case they are needed, or you have a repository to pull them down from…
That was my point.
Yes.
As I said before, it’s not really static, it’s baked dynamic. Also, if you have dependencies separate, you either have loads kicking about in case they are needed (Windows) or you have package management. If you have package management, all you get out of this is baked dynamic linking. For no gain I can see…
It’s quite different as it’s decentralized using these application folders. Application folders are often put forward by some as a solution to dependencies.
Sorry, I should have said that the process of loading the binary is kept simple.
Yes, this is why dynamic linking does not necessarily result in lower memory usage.
This is where I do not know what you are talking about.
Creating a static library results in a library archive. When linking a program, the necessary parts are copied from the archive into the final binary. My idea was simply to postpone this last step until install time, so that the version of the static library that the package manager has made available on the system is the one being used.
This way, the modularity advantage of dynamic linking could have been implemented without introducing the load time complexity we have today.
Still, you lose the benefits of plugins, unless you adopt some form of IPC mechanism, like sandboxing in Lion.
Yes, you are right. dlopen() and friends are implemented on top of the dynamic linking loader.
Although sub-processes and IPC are more in line with the Unix philosophy, plugins are definitely useful.
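For reference, a minimal dlopen()/dlsym() plugin load looks roughly like this; the plugin path and the plugin_init symbol name are made-up conventions, and on glibc you link with -ldl:

```c
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* Load a plugin at run time; "./myplugin.so" is a hypothetical path. */
    void *handle = dlopen("./myplugin.so", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    /* Look up an entry point agreed on by convention between host and plugin. */
    int (*plugin_init)(void) = (int (*)(void))dlsym(handle, "plugin_init");
    if (!plugin_init) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    printf("plugin_init returned %d\n", plugin_init());
    dlclose(handle);
    return 0;
}
```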
At the cost of making updating or substituting more complicated, and making run-time substitution impossible or more complicated.
Only if you have one thing using the lib, or everything is barely using the lib and the dead stripper gets most of it. It would be a very, very rare case when it uses less disk or less RAM.
Linking isn’t just libs, it’s all the object files. A static lib is just a collection of object data, and linking is putting all of this into an executable. With static linking it doesn’t have to care about keeping stuff separate; you can optimize however you like. Your system would mean that you need to keep the lib distinct. Think about when a function is called: you can’t just address the function directly, because the lib sizes will change, and thus the file layout, and thus the address layout. So you call through a jmp which you change when you bake the libs. You either do that, or you have to store every reference to every function in every lib and update them all at baking time, which isn’t what you would do, as it’s insane. You would do the jmp redirection and keep the lib distinct. Your system is more like dynamic linking than static linking.
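A crude C rendering of that indirection (the names are illustrative): keeping the baked-in lib replaceable means call sites go through a patchable slot rather than a direct address, which is more or less what the dynamic linker’s PLT/GOT already provides.

```c
#include <stdio.h>

/* Stand-in for a function inside the 'baked' library. */
static int add_v1(int a, int b) { return a + b; }

/* The indirection table.  Re-baking against a new library version means
 * rewriting these slots, not patching every call site in the program. */
static int (*lib_add)(int, int) = add_v1;

int main(void)
{
    /* Every call pays one extra indirect jump, which is exactly the kind of
     * overhead the thread is arguing about. */
    printf("%d\n", lib_add(2, 3));
    return 0;
}
```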
Right, I’ve done a quick test using lsof and Python, totalling up the file sizes of shared object files, both counting every use and counting each unique file once. This gives a rough idea how much memory the current system uses and how much yours would use.
Current system: 199 meg
Your system: 1261 meg
Disk will be worse because it will be everything, not just what is running. Might still not be much by disk standards, but by SD standards….
So although I still don’t think the biggest thing is space used (it’s management and simplicity), it certainly shouldn’t be discounted.
Very smart people have been working on this kind of thing for many decades and there is a reason things are like they are.
jabjoe,
“This gives a rough idea how much memory the current system uses and how much yours would use.
Current system: 199 meg
Your system: 1261 meg
Disk will be worse because it will be everything, not just what is running. Might still not be much by disk standards, but by SD standards…. ”
Could you clarify specifically what you are measuring?
It’s not really fair just to multiply the size of shared libraries into every running binary which has a dependency on them, if that’s what you are doing. This is certainly not what I’d consider an optimized static binary.
“Very smart people have been working on this kind of thing for many decades and there is a reason things are like they are.”
Possibly, but on the other hand our tools may have failed to evolve because we’ve focused too much on shared libraries. They are the metaphorical hammer for our nails. Not that I view shared libraries as necessarily evil, but assuming they did not exist, we would have undoubtedly invested in other potentially much better solutions to many problems.
I’m using lsof to list all the open files. Those that are .so (shared object) files I’m noting. To guess your system’s count I am just adding them together each time they appear. For the current system count I count each just once.
I said ‘rough idea’ and that it is. I’ve tried to explain a few times: your system would not allow for much more optimization than the existing dynamic lib system, because you need to keep the lib distinct. I’ve tried to explain why you would need to keep the lib distinct. The only optimization your system might have is that you could dead-strip what is not used when baking the binary. It will make a difference, but nowhere near enough.
We evolved to shared libs. At one point daemons were used just for shared code.
It does happen, but this is not one of those. The only problem with shared libs is dependencies (resolving them and distributing them), and that is solved with system wide package management and open source.
You are wrong. The history is longer and more diverse than you seem to think. The people involved were/are smarter than you and I.
“I’m using lsof to list all the open files. Those that are .so (shared object) files I’m noting. To guess your system’s count I am just adding them together each time they appear. For the current system count I count each just once.”
So, you would add the entire libc every time it is referenced? If so, I don’t think this is a valid metric at all.
“We evolved to shared libs. At one point daemons were used just for shared code.”
Yes, as I said, this may have come at the expense of other evolutionary paths. Evolution may have simply reached a local maximum.
“It does happen, but this is not one of those. The only problem with shared libs is dependencies (resolving them and distributing them), and that is solved with system wide package management and open source.”
Moving the problem upstream is acceptable to end users, but those of us who have tried to compile projects that are heavy in dependencies (something like asterisk/callweaver) still suffer from major dependency hell. The problem is not solved for us.
“You are wrong. The history is longer and more diverse than you seem to think. The people involved were/are smarter than you and I.”
Please, speak for yourself
It’s all I can do to (easily) measure. As I said, even with dead stripping, I don’t think as much as you clearly hope would be saved. You are proposing a lot of duplication in RAM and on disk (only a problem on small disks, like SD cards). But this isn’t where the problem is so much as in the complexity and inflexibility added. It being badly inefficient as well doesn’t help, of course.
The other paths failed. Some problems change with progress of Moore’s law. Others scale with it. This appears to be one that scales with it. If anything, as the dependency tree grows, managing it properly becomes more important.
If it’s in the repository, it is solved. “apt-get build-dep <package>” and you are away. I have spent over a day sometimes trying to get stuff building on Windows that has taken seconds on Linux. Why? Dependencies. On Linux I get all the dependencies ready to go in a single line. Even if the thing itself isn’t in the repositories, normally its dependencies are. Normally the worst you hit is that you have to run “./configure” a few times, installing a dependency each time, and then you are done. Some projects have a bootstrap script to install all the dependencies, ready for apt or yum systems. On Windows you have to track down the right version of each and set it up in the right way. Each dependency can bring in ‘n’ more. It’s a nightmare. This is one reason Windows often gets second-class support from projects: it’s more work to support.
Your scheme won’t make this go away as it’s only a runtime (and update time) thing. The current system, for open Unix systems, solves build dependencies as well as solving runtime dependencies.
If you think you are coming up with something better than all these smart people, and decades of evolution, I probably should just give up on you because nothing I explain or present you with is going to help.
jabjoe,
“It’s all I can do to (easily) measure. As I said, even with dead stripping, I don’t think as much as you clearly hope would be saved.”
Clearly the difference is in glue code and dead code elimination, and like you I can’t measure it very easily because most tools just don’t do it effectively. So why don’t we leave it as an open question?
“If it’s in the repository, it is solved. ‘apt-get build-dep <package>’ and you are away….”
You’re not actually contradicting anything I said, so we are in agreement here. But you’ve said nothing about solving dependency problems for people working on bleeding edge code or out of the repositories.
“Your scheme won’t make this go away as it’s only a runtime (and update time) thing. The current system, for open Unix systems, solves build dependencies as well as solving runtime dependencies.”
It’s funny that you think you know my scheme, since I haven’t really outlined one.
“If you think you are coming up with something better than all these smart people, and decades of evolution, I probably should just give up on you because nothing I explain or present you with is going to help.”
What do you have against progress? I won’t deny the accomplishments of the past, but they made mistakes and many of those are still with us… is it shameful to admit that? Revisiting historical developments is just a natural part of the ongoing drive to improve ourselves. The moment we take the attitude that our predecessors are better than us is the moment it becomes pointless to be a computer scientist.
Maybe I am wrong and you do believe in progress, but then what basis do you have to eliminate me from the set of people who can make it happen? It’s just a simple discussion about shared libraries, why do you need to make it personal?
Because it is not. Even if most things don’t use much of a lib, and your dead stripper would catch that, you are still doing an awful lot of duplication not currently done. The only question is exactly how bad it is. Without dead stripping it is awful. Even if dead stripping generally throws away half, it’s still awful. Plus it adds complexity.
At work right now I’m actually moving stuff from static to dynamic. I’m changing a project from statically linking some libs to dynamically linking them from a single DLL (my department’s DLL). This isn’t to save memory, because it won’t: the DLL is only for that application (for now; the Python guys are interested, though), and there is only one of that application. The reason is to free us to change our stuff without the application requiring to rebuild. It also frees us from having to use the same libraries as them, and to re-implement things without it affecting them (previously there were no clear public/private interfaces between the project and the libs).
I am running bleeding edge code at home. XBMC with the PVR extension, rpcemu and python-espeak, all built from the source. As I said, most, if not all (all in my case) build dependencies are in the repository. If I need dependencies more up to date, I can mix in some Debian unstable or Debian experimental. But I’ve never had to. I don’t know how many things I’ve downloaded and built from source, grabbing the build dependencies from the repository as I’ve gone. Even that old Unix Jurassic Park 3D UI just for fun. 😉
As far as I can see, it is just baking the dynamic dependencies at update time, hopefully dead-stripping away at least some, and then at run time you have no dependencies. For apps to be updated, you have the repository system track what needs updating with what.
Not going to bother to answer because it’s a bad question.
Nothing wrong with revisiting stuff. But pick stuff it’s sane to revisit. Saying everything is always questionable, though strictly speaking correct, is also the argument creationists use to say evolution is just a theory and should be constantly questioned. Question what you have good reason to question. Don’t waste time questioning what was long since all but proven beyond doubt, or we will never make any real progress because we will spend our whole time in loops.
The idea itself. It’s a stinker, but I think you need to go and find that for yourself. Try and implement it and get some real data.
Don’t mean to be personal, but I’m trying to make it clear that you need to look at where we have been and understand why we are where we are before you talk about charging “forwards”, because to me it looks like ignoring history and repeating mistakes of the past.
Some of the worst mistakes we have to live with are exactly because people didn’t understand the past before they went and repeated mistakes. They dismissed an older design for their ‘new’ design without realising their ‘new’ design looked a lot like the older design’s predecessors… This could lead on to an anti-Windows rant seamlessly… but I know this doesn’t just apply to OSes, or even just software in general. Humans have this bad habit. “Those who cannot learn from history are doomed to repeat it.”
The book that taught me a lot about OSes, and how little has really changed, was “Lions’ Commentary on UNIX 6th Edition”. If you get nothing else from this discussion, get this book.
jabjoe,
Not only do you lack creativity, but you condemn those who have it.
How do you like this argument? See how unfair it is? It doesn’t prove you wrong any more than your arguments prove me wrong; it’s just a personal attack, and you chose to start launching them from the get-go.
I’m a computer scientist because I love it, I work harder than most to understand how things work and to find creative solutions. And whether you acknowledge it or not I am pretty good at it. I’m open minded even to the possibility that your points may be right, but you’ve forced me to be so one sided because you’ve been so dogmatic yourself. You seem keen on making assertions about me that are not only irrelevant, but that you know nothing about. So please, lay off the patronizing attitude.
Now back to the topic at hand:
“Because it is not. Even if most things don’t use much of a lib, and your dead stripper would catch that, you are still doing an awful lot of duplication not currently done.”
In some cases yes, in other cases no. Many functions, particularly wrappers for other functions, benefit from being inlined. Shared libraries make inlining impossible, and they make inter-procedural optimization impossible.
“At work right now I’m actually moving stuff from static to dynamic….The reason is to free us to change our stuff without the application requiring to rebuild.”
That’s fine, but it doesn’t preclude other solutions.
“I am running bleeding edge code at home. XBMC with the PVR extension, rpcemu and python-espeak, all built from the source. As I said, most, if not all (all in my case) build dependencies are in the repository.”
Some projects will build against older *stable* libraries; other times we’re not so lucky. Solving dependencies manually is a hair-wrangling experience when versions are out of sync. Maybe these cases are becoming less common, but I’ve had enough trouble to be unwilling to dismiss the matter altogether.
“As far as I can see, it is just baking the dynamic dependencies at update time, hopefully dead-stripping away at least some, and then at run time you have no dependencies. For apps to be updated, you have the repository system track what needs updating with what.”
I did say above that shared libraries aren’t strictly bad, it’s just that they implicitly result in some types of inefficiencies which could be solved by running well optimized static binaries.
But I’m more concerned with how shared libraries have affected software design. If *nix software had evolved around a solid/integrated RPC mechanism instead of around shared libraries, I think we’d be better off today. Pipes, as ingenious as they are, are too one dimensional. RPC could have offered even more flexible ways to have processes interact. Many libraries we have today could be replaced with ad-hoc RPC services. The DNS example offered above would be a perfect example of something that would benefit from having an RPC interface instead of being a shared library. The RPC services could be categorized into an organized namespace instead of being dumped into a global library heap in the file system. Running shared libraries in a process’s address space has negative security consequences, and adds hidden dependencies on subtle things like memory allocators and signals.
In fact programming languages themselves might be better today had RPC been considered a primitive. Any application could call any RPC service, which could be provided through any application, possibly across networks.
“The idea itself. It’s a stinker, but I think you need to go and find that for yourself. Try and implement it and get some real data.”
You are being too quick to jump to conclusions without taking the time to consider the potential of alternative evolutionary paths.
Evolution does not inherently find the best solutions.
(Long article about the evolution of vision, so you may want to jump to the last page)
http://www.scientificamerican.com/article.cfm?id=evolution-of-the-e…
There will always be ways to improve operating systems, however we need to be willing to work with new ideas.
This is also interesting:
http://www.nordier.com/v7x86/
Thix is a Unix-like OS that implements almost all of the POSIX.1 standard:
http://www.hulubei.net/tudor/thix/
http://thix.eu/
No tab completion. What kind of OS is this?
Kidding, of course. It’s actually really neat.
The last time I used HP-UX, AIX and Solaris, you didn’t get tab completion with the default shells either.
I know that Minix is a more complex and feature-rich OS (not least because it’s capable of production use), but other than this, can someone tell me what differences it has architecturally compared to Xv6?
Also, I’m wondering why they didn’t just use an older, smaller version of Minix since Tanenbaum *did* write it originally as a teaching OS.
I think because of the rich commentary available with V6 UNIX (i.e. the Lions book), and because V6 is very simple.
In fact, I’d go so far as to say that the V6 kernel would probably make a reasonable base for a micro-kernel with a bit of work.
But as it is, V6 is still a monolithic kernel, with all the OS services linked into kernel space, whereas Minix provides only critical services that cannot operate in user space, leaving the rest to user space servers.
I’m torn on the micro versus monolithic kernel debate. Some services are not really restartable without hacks that obviate the benefits of a micro-kernel in the first place (how would you restart the filesystem server if you can’t read the filesystem server binary from the filesystem? You’d have to link it with the kernel blob, meaning it couldn’t be changed at run time).
The original Minix version was 16-bit and probably not well integrated with development tools like GCC and GDB.
It was a pleasure just typing “make qemu-gdb”, attaching a debugger in another window, and stepping through the kernel as it did its work. I guess it’d take a lot of work to get Minix into that state.
Christian,
Thank you for your reply. It was very informative.
Cheers!