So you have taken the test and you think you are ready to get started with OS development? At this point, many OS-deving hobbyists are tempted to go looking for a simple step-by-step tutorial which would guide them through making a binary boot, doing some text I/O, and other “simple” stuff. The implicit plan is more or less as follows: any time they think of something which in their opinion would be cool to implement, they’ll implement it. Gradually, feature after feature, their OS would supposedly build up, slowly becoming superior to anything out there. This is, in my opinion, not the best way to get somewhere (if getting somewhere is your goal). In this article, I’ll try to explain why, and what I think you should be doing at this stage instead.
The importance of design
First, heavy reliance on tutorials is bad for creativity. More often than not, we end up blindly following the thing without fully understanding the verbose paragraphs describing what we’re doing, no matter how many big red warnings the author has put up against this practice. What do we get this way? Many low-level tutorials on the web describe how to create a UNIX clone mainly written in C, with a monolithic kernel and a simple console shell. Remind you of something? That’s normal: we already have tons of these around. In my opinion, it’s quite sad to spend all this time on OS development only to get yet another UNIX-like kernel with more limited hardware support than the existing ones, though you might think otherwise.
Another reason for not following this approach is that having our head stuck in code makes us short-sighted. There are many pitfalls which can be avoided by spending some time designing our thing before coding. Simple examples: asynchronous I/O and multiprocessing. There’s nothing intrinsically hard about supporting those, given sufficient documentation on the subject, if you’ve made the decision to use them early enough and stick with it. It’s just a different way of thinking. However, if you start without them and try to add them to your codebase later, the switching process will be quite painful.
Finally, the last reason why I think that design is important even for hobby projects is that their teams are small: generally one or two developers. This is not necessarily a bad thing: in the early stages of an OS project, you generally don’t want endless arguments and heaps of code snippets written by different people in different ways. The obvious drawback, however, is that unless you focus on a limited set of things and try to do them well, you’re not going very far. This is why I think at this stage you should not be writing code yet, but rather defining what the project’s goals are and where you want to go.
Exploring motivations
To begin with, forget that you’re targeting a specific computer, and even that you’re writing code. You, as a team of one, two, maybe more human beings, are designing something which will provide some kind of service to other human beings (who may simply be the members of the team). It all begins here. What is this service which you want to provide? When will you be able to say that your project is successful? When will you be able to say that you’re not on the right track? To encounter success in a hobby OS project, as in many other kinds of personal projects, the first step is to define that success clearly.
So go ahead, take a piece of paper or some other means of text storage (in my case it was a blog), and explore your motivations. If you’re writing an OS because you have some gripe with the existing ones you know of, spend some time finding out what you don’t like about them. Try to understand why they are the way they are, explore their evolution over time, read books on the subject if you can find them. Even if a design decision turns out to have been a random choice, you’ve found something: the importance of not making that choice randomly yourself, and of paying particular attention to its design. If you’re writing an OS in order to learn something, try to define what you want to learn in more detail. Maybe you want to know where the process abstraction comes from? Or through which steps one goes from a complex and dumb electrical appliance which does calculations to something which can actually be used by human beings without a high level of qualification?
In all cases, take as many notes as possible without disturbing your thinking process. There are two reasons for this: first, you can review said notes later; second, translating thoughts into text forces you to be more rigorous than you would be if you could stick with ideas so vague that they can’t even be directly expressed in English (or whatever language you fluently speak). Try to be as precise as if these notes were to be published and read by other people who are not in your head.
Defining a target audience
After having defined why you’re doing this, try to quickly define what your target audience is. An operating system is an interface between hardware, end users, and third-party software developers, so you have to define all three to some extent: what hardware is your operating system supposed to run on? Who is supposed to use it? Who is supposed to write software for it, and what kind of software? You don’t have to be overly precise at this stage, but there are some things which you’d like to decide now.
On the hardware side, you should at least know how users will interact with the hardware (keyboard? pointing devices? which sort of each?), what range of screen sizes you wish to cover, how powerful the oldest CPU you want to target is, and how much RAM is guaranteed to be there. You might also want to know right away whether your OS requires an internet connection to operate properly, what sorts of mass storage devices you’re going to use, and how third-party software (if there is any) will be installed.
Quickly check that the kind of hardware you want to code for is homebrew-friendly. Most desktops and laptops are (with maybe the upcoming ChromeOS notebooks as an exception), video game consoles are a mixed bag (Nintendo’s often are, others generally aren’t), and products from Apple and Sony aren’t until they are deprecated. A word of warning about cellphones: though they may sound like an attractive platform to code for, their hardware is generally speaking extremely poorly documented, and you should keep in mind that even when you find one which does allow homebrew OSs to run and has sufficient documentation available on the web, the next generation is not guaranteed in any way to offer the same comfort.
The end user is perhaps the hardest thing to define precisely, except when the developers work only for themselves. You should probably keep around descriptions of a number of stereotypes representing the kind of people you want to target. Some examples: “70-year-old grandma who has been given a computer by her grandchildren for Christmas. Never used one. Pay special attention to her sight problems.”, “40-year-old bearded sysadmin on a company’s network. Won’t use a computer if it can’t be operated in CLI mode with a bash-like syntax. Wants to automate many tasks which are part of his job, so think about powerful scripting engines. Needs multi-user support, and a special account to rule them all which only he may access.”, “20-year-old creative girl who draws things using a pen tablet and various free software. A computer is a tool like any other for her; it shouldn’t get in the way any more than a pad of paper does.”
Third-party developers: the first thing to note is that if you want them to get interested in your project at some point, you’ll first have to attract them as end users anyway. People don’t write code for a platform which they wouldn’t like to use, except when they are paid for it. On the other hand, devs have a special status in that, to state the obvious, they create software. One of your tasks will hence be to define what kind of software you let them create.
It can range from them writing vital parts of your operating system for you (as frequently happens in the world of FOSS desktop operating systems) to all software being written in-house and third-party devs being non-existent (which has grown out of fashion these days, except in some categories of embedded devices). Between these extreme situations, there’s a wide range of possibilities. Here are a few…
- The design/management of OS parts is proprietary, but the spec is disclosed, so that third-party developers are free to implement or re-implement parts of it the way they want
- Most of the operating system itself is made in a proprietary and undisclosed fashion, but you’re open to third-party drivers and low-level tools (Windows 9x, Mac OS)
- Same as above, but you ask that applications needing access to some low-level or otherwise “dangerous” functionality go through some approval or signing process (recent releases of Windows, Symbian)
- You don’t tolerate third-party low-level apps and will do everything you can to prevent their very existence, but you offer a native development kit for user applications with generous access to the system APIs
- Same as above, but native third-party user apps are considered dangerous and must go through some signing/approval process (iOS)
- No native third-party apps; everything goes through managed apps which only have restricted access to system capabilities (Android, Windows Phone 7, most feature phones)
With all this information in mind, you should start to have a fairly good idea of where you’re heading, and be ready for the next step.
Goals and frontiers
Now that the idea is there, try to put it on paper. Describe what your OS is supposed to achieve, for whom, on which hardware, who will code what, and when you’ll be able to tell that the project is successful according to its initial goals, that it has reached the 1.0 release, so to speak. By defining the project’s goals you define criteria for objectively measuring its success and failure, which will prove to be a very valuable resource later, if only as a means to avoid feature bloat and other ways of wasting your very precious development resources.
In the end, look at the global view of the project you now have and ask yourself this simple question: “Do I think I can make it?”. If you think you should maybe lower your expectations a little bit, now is the right time, because the later you do it, the greater the damage. Play with all this stuff you have gathered, polish everything until you have a project you’re satisfied with (or maybe drop the idea of writing an OS altogether if you don’t feel like it any more), and then you’ll be ready for the next step, namely designing your kernel.
I appreciate that the author took time to write this article, but I’m sorry to say it is obvious and mundane. There is really nothing interesting or insightful, just a list of motivational steps.
From reading the article I just get the feeling that the author does not know much about operating systems design, but tries to look clever and give some advice to would-be operating system developers from a lame user’s perspective.
You’d be surprised how many people don’t stop to consider the obvious. Nobody gets taught to do that in school. Thinking of the obvious is not an obvious thing for many.
Exactly. I’ve got a nice book about website usability where, if you only read the author’s twelve principles on the subject, you’d think that she’s just stating the obvious and wonder why you bothered buying the book at all.
Then, if you take the time to read the rest of the book, you discover many, many examples of famous websites which don’t follow these “obvious” rules.
Afterwards, you can consider that all website designers are idiots. Or admit that even when something sounds obvious, it’s not necessarily so.
This, plus some time spent on OSdev’s forums, is why I felt it was a good idea to include this part in my tutorial’s plan. The “throw random features at a raw educational kernel and hope it sticks” attitude is much more prevalent than one would spontaneously think.
If an OS is meant to be used, considering it from the user’s point of view first is a truly vital step, because it helps guide further design decisions and avoid making something which tries to do everything at once and ends up sucking in every area.
Consider your perspective now in comparison to your perspective when you first started with computers. With that in mind, consider the perspective of someone who’s never worked on creating an OS before & compare that to the perspective of someone who has not only worked on one, but has also released one. That’s quite a large gap in perspectives between the new guys & the guys who’ve been there and done that, correct? I’m sure that you’ll agree that hindsight is 20/20 & there are many things that are obvious now, but once weren’t.
“their OS would supposedly build up, slowly getting superior to anything out there. This is, in my opinion, not the best way to get somewhere (if getting somewhere is your goal)”
It is the ONLY way of getting somewhere, and the sole motivation of any OS dev hobbyist getting anywhere.
The author is a moron who tries to make us write yet another “unix like” system.
There’s a ton of ways to be superior to anything out there, and everyone in this business needs the audacity to believe it.
So, when are you reading the rest of the article?
The more comments like this I read, the more cynical I become about the human race.
So where is your awesome, mind blowing non-unix OS?
Could you possibly be more wrong? That’s exactly what the author’s trying to avoid. Did you even read the whole article or are you just spouting nonsense?
Your target audience should be World of Warcraft ;-)
The distro would jump to the number three spot in no time
On a more serious note: Microsoft’s monopoly on the DX graphics API. One of the last deal breakers for other OS adoption?
It’s important, but I think the biggest problem for now is that most computers are sold bundled with Windows, to the point where for the average Joe, PC = Windows. This is the main reason why everyone would expect DX games to work everywhere, and is pissed off when it doesn’t happen.
Look at a traditionally multi-OS environment like the mobile space: the life of newcomers is much easier, because people are used to seeing lots of devices with similar HW capabilities but incompatible software. You mostly choose a phone based on hardware and experience of the brand; you don’t expect all phones to work in exactly the same way and run the same apps…
You know, DX isn’t the only API used for game development. It’s probably the most popular on Windows, but none of the other OSs have it & there are still quite a few games out there that run on more than just Windows.
Food for thought.
Frankly, when I look at things like Haiku and Wayland, I’m not 100% sure that the decisions behind them are very rational...
For Haiku, not starting from a FreeBSD or Linux kernel is probably a severe case of NIH (*); for Wayland, I still don’t understand why they didn’t create a major version change of the X protocol instead, but I’m not yet 100% sure that it’s only NIH syndrome.
* Apple and Google have shown that you can reuse a kernel core for something totally different in userspace...
Haiku did not write their kernel from scratch, they forked NewOS, so it’s not a clear cut case of NIH.
I do agree, however, that using Linux or kFreeBSD would mean less work and better hardware support. Video drivers are an exception, though, because I doubt they would have kept X11. Whether the devs would be happy with the result is another question, and since they didn’t choose this road, I suspect they decided it was too much of an architectural compromise.
Before Haiku started to show signs of success, there was a project aiming to recreate the BeOS APIs on top of Linux. This basically was the rational approach you seem to wish for, but it failed: http://blueeyedos.com/
Still, you may be right that it would be saner to start with a widespread kernel.
I don’t know what has happened in Haiku’s case, but there are rationales behind not using Linux/FreeBSD’s kernels, if your design goals don’t match theirs.
Examples: if you want something customizable and reliable, you’d probably want a microkernel. If you want something good for highly interactive tasks, there are better options than Linux and the BSDs out there too (just look at the huge lot of RTOS projects).
Besides NIH, there’s also the “there’s so much to fix, it’s just better to start over” aspect of things. This doesn’t prevent reusing some code either, as an example Haiku reuses some Linux driver code.
Just because it is a microkernel doesn’t mean it is more customizable or reliable. The only difference is that if a driver runs in userland and dies, it might not bring down the entire system. The system is as reliable as the quality of its drivers.
And if by interactive task you mean real-time task, then you shouldn’t compare Linux with an RTOS, since their goals are different; Linux until recently couldn’t even run real-time tasks.
By customizable, I meant that putting a process boundary between things makes sure that they are much more independent from each other than they would be if they were part of the same codebase. The microkernel model enforces modularity by its very nature.
Better reliability is enforced because a much more fine-grained security model can be used, where even drivers have only access to the system capabilities they need (and thus are prevented from doing some sorts of damage when they run amok). As you mention, putting drivers in the userspace also allows them to crash freely, without taking the rest of the OS with them.
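As a rough illustration of that kind of fine-grained model (a hypothetical sketch, not how any particular microkernel actually does it), each driver could carry a capability mask, granted at load time, that the kernel consults before handing out access to a sensitive resource:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical per-driver capability flags (names made up for illustration). */
enum driver_caps {
    CAP_IO_PORTS   = 1u << 0,  /* may touch x86 I/O ports         */
    CAP_PHYS_MAP   = 1u << 1,  /* may map physical memory ranges  */
    CAP_IRQ_HANDLE = 1u << 2,  /* may register interrupt handlers */
};

struct driver {
    const char *name;
    uint32_t    caps;          /* granted when the driver is loaded */
};

/* Kernel-side check: a driver that runs amok can only misuse what it was granted. */
static bool driver_may(const struct driver *drv, uint32_t wanted)
{
    return (drv->caps & wanted) == wanted;
}

int request_io_port(struct driver *drv, uint16_t port)
{
    if (!driver_may(drv, CAP_IO_PORTS))
        return -1;             /* denied: capability not granted */
    /* ... actually grant access to the port here ... */
    (void)port;
    return 0;
}
```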
To the contrary, that’s precisely the point.
Not every OS project should be based on Linux, because it’s only good for a limited set of things. If you have RTOSs or desktop reactivity in mind, Linux is a very poor base to start from. It’s still good to take a look at its source when coding drivers, though, due to its wide HW support.
“By customizable, I meant that putting a process boundary between things makes sure that they are much more independent from each other than they would be if they were part of the same codebase. The microkernel model enforces modularity by its very nature.”
The lack of modularity in linux is a serious problem. We continue to have problems with graphics support in new kernels due to factors which are completely out of the user’s hands.
Driver writers blame kernel developers for constantly changing interfaces (the total lack of a kernel ABI whatsoever), meanwhile kernel developers blame driver writers for not releasing source code. The ideological battle, which is rational on both sides, causes end users to suffer.
Even when everyone plays fairly (by releasing source code), there’s a great deal which is rejected by linux mainline (let’s use AUFS as an example of something that linux users want, but kernel maintainers reject).
This means the driver developers need to either release a binary compiled against every single kernel/distro variant the users might be using, or the end users are forced to compile their own kernel and hope that the mainline is compatible with the patches they want to use.
The situation gets potentially much worse if the user wants to install additional patches.
These problems stem directly from the lack of modularity in linux. Modularization would be an excellent topic to iron out in the initial OS design rather than doing a ginormous macro-kernel.
As much as I despise ms for using DRM to lock open source developers out of the kernel, I have to say they did get the driver modularization aspect right.
In the ideal world, driver interfaces would be standardized such that they’d be portable across operating systems. Not that this is likely to happen, windows kernel DRM means open source devs are not welcome there any longer. And linux maintainers don’t love the idea of defining ABI interfaces because it enables driver writers to distribute binaries easily without source.
“Not every OS project should be based on Linux, because it’s only good for a limited set of things.”
Nobody wants another *nix clone which works on less hardware than the original.
Well, I think I disagree. Linux/BSD are very capable kernels, but they are not optimized for desktop use (interactivity), and while they certainly could be rewritten for such a purpose, from a programmer’s perspective I would rather spend the time necessary for rewriting an existing kernel on making a new one better suited for the task instead. Either way they didn’t have to start from scratch, as the previously mentioned NewOS kernel was available (written by an ex-BeOS engineer).
As for the talk of microkernels, I’d like to point out that Haiku is not a microkernel. Hardware drivers will bring down the system if they fail (same goes for networking and the filesystem, which reside in kernel space iirc). However, it does have a stable-ish driver API, which means updating the kernel won’t break existing drivers.
Microkernels offer stability at the expense of performance; RTOSes offer fine-grained precision at the expense of performance. There are lots of places where these characteristics are worth the loss of performance, but the desktop isn’t one of them.
Are you sure that microkernels wouldn’t be worth it?
AFAIK, desktop computers have had plenty of power for years (just look at the evolution of games). The problem is just to use that power wisely.
On a KDE 4 Linux desktop, having some disk-intensive task in the background is all it takes to cause intermittent freezes. This, simply put, shouldn’t happen. If the computer is powerful enough to interact smoothly with the user when no power-hungry app is running, it should be powerful enough when there’s one around. I think what is currently needed is not necessarily raw firepower, but wise and scalable resource use. Given that, I bet that microkernels could provide a smooth desktop experience on something as slow as an Atom.
This has nothing to do with monolithic vs hybrid vs micro. It has to do with the Linux kernel being optimized for throughput rather than responsiveness (as in, not really optimized for the desktop). You can have the exact same optimization for throughput on hybrid and micro kernels. There have been patches around for ages which helps alleviate this problem, iirc the latest ‘200 lines blabla’ patch is supposed to be included in the kernel.
What a microkernel offers is separation between its components, so that if one fails the rest of the system will continue to function. This results in components having to pass messages around to intercommunicate, which is much slower than accessing memory directly, hence the loss of performance.
Now the likelihood that, say, my keyboard driver would malfunction while I’m using my computer is very low; in fact it has never happened during my entire lifetime. If it did happen, though, it would bring my system down with it and it would be a bummer for sure. But I still don’t feel the need for a microkernel just so that IF this happened I’d be able to save whatever I was working on, particularly since it comes with a definite performance penalty.
However, if I was sitting in the space shuttle, and the same unlikely thing happened and a malfunctioning keyboard driver would take down my computer then it would be a disaster. So in this case, yes I’d certainly be willing to sacrifice performance so that in the unlikely event this happened the whole system would not shut down.
Obviously it depends on what you are doing, but why would anyone bother? The current operating systems are not so unstable that microkernels are needed for mainstream usage. I’d rather take well-written drivers and the performance, thankyouverymuch.
Well, it’s a choice. Myself, I’d rather take something guaranteed to be rock-solid and not to crash simply due to some buggy NVidia/AMD driver. I do only few things which require performance on my computer (compilation and image editing), and none of these would be much affected by a kernel->microkernel switch.
(Sorry for the misunderstanding, I meant that the overhead of a microkernel was nothing compared to the amount of power “lost” due to inefficient scheduling)
“Microkernels offers stability at the expense of performance…There are lots of places where these characteristics are worth the loss performance, but the desktop isn’t one of them.”
I’d say stability problems such as corruption and overflow stem more from the choice of highly “unsafe” languages than from the choice of micro-kernel/macro-kernel.
Your argument in favor of a macrokernel in order to achieve performance is somewhat dependent on the assumption that a microkernel cannot perform well. However, I think there are various things that could boost the performance of a microkernel design.
The microkernel does not have to imply expensive IPC. If modules are linked together at run or compile time, they could run together within a privileged CPU ring to boost performance; a rough sketch of what such a module table could look like is given below.
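(A minimal sketch of that idea in C, with hypothetical names rather than any existing kernel’s API: modules linked into the kernel image expose a plain table of entry points, so a “message send” degenerates into an ordinary function call within the same privileged ring.)

```c
#include <stddef.h>

/* Hypothetical module interface: each module exposes a table of entry points. */
struct km_module {
    const char *name;
    int  (*init)(void);
    void (*exit)(void);
    int  (*handle)(int request, void *buf, size_t len);  /* "IPC" entry point */
};

/* Modules linked into the kernel image register themselves in a static table,
 * so calling into a module is just an indirect call, not a context switch.   */
#define MAX_MODULES 32
static const struct km_module *modules[MAX_MODULES];
static size_t module_count;

int km_register(const struct km_module *mod)
{
    if (module_count >= MAX_MODULES)
        return -1;
    modules[module_count++] = mod;
    return mod->init ? mod->init() : 0;
}

/* "Sending a message" to a module is a direct call within the same ring. */
int km_call(size_t module_id, int request, void *buf, size_t len)
{
    if (module_id >= module_count)
        return -1;
    return modules[module_id]->handle(request, buf, len);
}
```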
As for stability and module isolation, there are a few things we can try:
1. Code could be written in a type-safe language under a VM such as Java or Mono. The calls for IPC could be implemented by exchanging data pointers between VMs sharing a common heap or memory space without changing CPU rings. Individual modules would never step on each other despite existing in the same memory space.
Not only is this approach plausible, I think it’s realistic given the current performance and transparency of JIT compilers.
2. Segmentation has been declared a legacy feature in favor of flat memory models, but hypothetically memory segmentation could provide isolation among microkernel modules while eliminating the need for expensive IPC.
3. User-mode CPU protections may not be necessary if the compiler can generate binary modules which are inherently isolated even though running in the same memory space. Therefore, the compiler rather than the CPU would be enforcing module isolation.
As much as people hated my opinion on up front performance analysis, I’d say this is an instance where the module inter-communications interface should be performance tested up front. Obviously, as more of the kernel modules get built, this will be very difficult to change later on when we notice an efficiency problem.
I’ve spent hours arguing on this precise subject with moondevil, I won’t start over. In short, I’ll believe that it’s possible to write a decent desktop OS in a “safe” language when I see it.
In the meantime, microkernels offer the advantage of greatly reducing the impact of failures and exploits, when there are some. A buggy process can only have the impact it’s authorized to have.
That’s not what I said. My take on the subject is that microkernels can obviously not have the same performance as a macrokernel (some optimization is only possible when kernel components share a common address space), but that they can have sufficient performance for desktop use.
Then you do not have a microkernel, but a modular monolithic kernel. Putting components in separate processes is afaik a defining characteristic of microkernels.
As said before, I’ll believe it when I see it.
Note that microkernels are not incompatible with shared memory regions between processes, though. It’s one of the niceties which paging permits. In fact, I believe that they are the key to fast IPC.
Segmentation is disabled in AMD64 and non-existent in most non-x86 architectures, so I’m not sure it has much of a future. Besides… how would you want to use it? If you prevent each process from peeking into other processes’ address spaces, then they need IPC to communicate with each other. But perhaps you had something more subtle in mind?
But then hand-crafted machine code and code from other compilers than yours could bypass system security… Unless you would forbid those ?
It is possible to stress-test inter-module/process communication after implementing it and before implementing modules, or even while implementing it. The problem is to determine what is good enough performance at this early stage. Better make code as flexible as possible.
“That’s not what I said.”
Sorry, I responded to your post quoting something which was from someone else.
“I’ve spent hours arguing on this precise subject with moondevil, I won’t start over.”
Fair enough, but it’s not really adequate to dismiss my argument, there isn’t even a citation.
“As said before, I’ll believe it when I see it.”
It doesn’t exist yet, therefore you don’t believe it could exist?
Neolander, I appreciate your view, but I cannot let you get away with that type of reasoning.
All of today’s (major) kernels predate the advent of efficient VMs. With some original out of the box thinking, plus the benefit of the technological progress in the field in the past 15 years, a type safe efficient kernel is not far-fetched at all.
Per usual, the main impediments are political and financial rather than technological.
“Segmentation is disabled in AMD64 and non-existent in most non-x86 architectures, so I’m not sure it has much of a future.”
That’s exactly what I meant when I called it a legacy feature. However, conceivably the feature might not have been dropped if we had popular microkernels around using it.
“But then hand-crafted machine code and code from other compilers than yours could bypass system security… Unless you would forbid those ?”
You need to either trust your binaries are not malicious, or validate them for compliance somehow.
If we’re running malicious kernel modules which are nevertheless “in spec”, then there’s not much any kernel can do. In any case, this is not a reason to dismiss a microkernel.
“It is possible to stress-test inter-module/process communication after implementing it and before implementing modules, or even while implementing it.”
I am glad we agree here.
Okay, let’s explain my view in more details.
First, let’s talk about performance. I’ve been seeing claims that interpreted, VM-based languages can replace C/C++ everywhere for some time now. That they are now good enough. I’ve seen papers, stats, and theoretical arguments for this to be true. Yet when I run a Java app, that’s not what I see. As of today, I’ve seen exactly one complex Java application which had almost no performance problems on a modern computer: the Revenge of the Titans game. Flash is another good example of a popular interpreted language which eats CPU (and now GPU) time for no good reason. It’s also fairly easy to reach the limits of Python’s performance; in that case I’ve done it myself with some very simple programs. In short, these languages are good for light tasks, but still not for heavy work, in my experience.
So considering all of that, what I believe now is that either the implementation of current interpreters sucks terribly, or that they only offer the performance they claim to offer when developers use some specific programming practices that increase the interpreter’s performance.
If it’s the interpreter implementation, then we have a problem. Java has been here for more than 15 years, yet it would still not have reached maturity? Maybe what this means is that although theoretically feasible, “good” VMs are too complex to actually be implemented in practice.
If it’s about devs having to adopt specific coding practices in order to make code which ran perfectly well in C/C++ run reasonably well in Java/Flash/Python… then I find it quite ironic, for something which is supposed to make developers’ lives easier. Let’s see if the “safe” language clan one day manages to make everyone adopt these coding practices; I’ll believe it when I see it.
Apart from the performance side of things, in our specific case (coding a kernel in a “safe” language that we’ll now call X), there’s another aspect of things to look at. I’m highly skeptical about the fact that those languages could work well at the OS level AND bring their usual benefits at the same time.
If we only code a minimal VM implementation, ditching all the complex features, what we end up with is a subset of X that is effectively equivalent to C, albeit maybe with slightly worse performance. Code only a GC implementation, and your interpreter now has to do memory management. Code threads, and it has to manage multitasking and schedule things. Code pointer checks, and all X code which needs lots of pointers sees its performance sink. In short, if you get something close to the desktop language X experience, and get all of the usual X benefits in terms of safety, your interpreter ends up becoming a (bloated) C/C++ monolithic kernel in its own right.
Then there are some hybrid solutions, of course. If you want some challenge and want to reduce the amount of C/C++ code you have to a minimal level, you can code memory management with a subset of X that does not have GC yet. You can code pointer-heavy code with a subset of X where pointer checks are disabled. And so on. But except for proving a point, I don’t see a major benefit in doing this instead of assuming that said code is dirty by its very nature and just coding it in C/C++ right away.
Yes, but you did not answer my question. Why would they have used segmentation instead of flat segmentation + paging? What could segmentation have permitted that paging cannot?
Again, I do not dismiss microkernels. But I do think that forcing a specific, “safe” compiler in the hand of kernel module devs is a bad idea.
“First, let’s talk about performance. I’ve been seeing claims that interpreted, VM-based languages, can replace C/C++ everywhere for some times….”
Firstly, I agree about not using interpreted languages in the kernel, so lets get that out of the picture right away.
Secondly, to my knowledge, the performance problems with Java stem from poor libraries rather than poor code generation. For instance, Java graphics were designed to be easily portable rather than highly performant, therefore they’re very poorly integrated with the lower-level drivers. Would you agree this is probably where it gets its reputation for bad performance?
Thirdly, many people run generic binaries which aren’t tuned for the system they’re using. Using JIT technology (actually, the machine code could be cached too to save compilation time), the generated code would always be for the current processor. Some JVMs go as far as to optimize code paths on the fly as the system gets used.
I do have some issues with the Java language, but I don’t suppose those are relevant here.
“I’m highly skeptical about the fact that those languages could work well at the OS level AND bring their usual benefits at the same time.”
Can you illustrate why a safe language would necessarily be unsuitable for use in the kernel?
“If we only code a minimal VM implementation, ditching all the complex features, what we end up having is a subset of X that is effectively perfectly equivalent to C, albeit maybe with slightly worse performance.”
‘C’ is only a language, there is absolutely nothing about it that is inherently faster than Ada or Lisp (for instance). It’s like saying Assembly is faster than C, that’s not true either. We need to compare the compilers rather than the languages.
GNU C generates sub-par code compared with some other C compilers, and yet we still use it for Linux.
“Code only a GC implementation, and your interpreter now has to do memory management. Code threads, and it has to manage multitasking and schedule things.”
I don’t understand this criticism, doesn’t the kernel need to do these things regardless? It’s not like you are implementing memory management or multitasking just to support the kernel VM.
“Code pointer checks, and all X code which needs lots of pointers see its performance sink.”
This is treading very closely to a full-blown optimization discussion, but the only variables which must be range checked are those whose values are truly unknown within the code path. The compiler can optimize away all range checks on variables whose values are implied by the code path. In principle, even an unsafe language would require variables to be range checked explicitly by the programmer (otherwise they’ve left themselves vulnerable to things like stack overflow), which should be considered bugs and thus an unfair “advantage”.
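A small sketch of the kind of check elimination I mean (hypothetical code, assuming a reasonably smart bounds-checking compiler):

```c
#include <stddef.h>

#define BUF_LEN 64
static int buf[BUF_LEN];

/* Case 1: the loop bound already implies 0 <= i < BUF_LEN, so a bounds-checking
 * compiler can hoist or drop the per-access check entirely.                    */
int sum_all(void)
{
    int total = 0;
    for (size_t i = 0; i < BUF_LEN; i++)
        total += buf[i];        /* no runtime check needed */
    return total;
}

/* Case 2: the index comes from outside (user input, an IPC message...), so its
 * value is truly unknown and a check is required either way; in C you must
 * write it yourself, in a "safe" language the compiler inserts it for you.     */
int read_at(size_t i)
{
    if (i >= BUF_LEN)           /* explicit check: the same cost a checked
                                   language would pay here                      */
        return -1;
    return buf[i];
}
```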
“Why would they have used segmentation instead of flat seg + paging ? What could have segmentation permitted that paging cannot ?”
In principle, paging can accomplish everything selectors did. In practice, though, switching selectors is much faster than adjusting page tables. A compiler could trivially ensure that a kernel module didn’t overwrite data from other modules by simply enforcing the selectors except in well-defined IPC calls, thus simultaneously achieving good isolation and IPC performance. Using page tables for isolation would imply that well-defined IPC calls could not communicate directly with other modules without an intermediary helper or mucking with page tables on each call.
Of course, the point is moot today anyway with AMD64.
In fact, that was what most of my post was about
Probably. Java is often praised for its extensive standard library, so if said library is badly implemented, the impact will probably be at least as terrible as if the interpreter is faulty since all Java software is using it.
Does it have that much of an impact? I’m genuinely curious. I didn’t play much with mtune-like optimizations, but I’d spontaneously think that the difference is about the same as between GCC’s O2 and O3.
That was what the rest of the post was about.
Don’t know… I’d say that languages have a performance at a given time, defined by the mean performance of the code generated by the popular compilers/interpreters of that time.
Anyway, what I was referring to is that interpreted languages are intrinsically slower than compiled languages, in the same way that an OS running in a VM is intrinsically slower than the same OS running on bare metal: there’s an additional bytecode re-compilation overhead. They don’t have to be much slower, though: given an infinite amount of time and RAM, compiled and interpreted code end up having equal speed in their stationary state, and interpreted code can even be slightly faster due to machine-specific tuning. The problem is the transient phase, and situations where only little RAM is available.
Sure, but if the kernel’s VM ends up doing most of the job of a kernel, what’s the point of coding a kernel in X at all ? The VM, which is generally coded in a compiled language, ends up being close to a full-featured kernel, so I don’t see the benefit : in the end, most of the kernel is actually coded in whatever language the VM is written in.
Take a linked list. When parsing it, a process ends up looking at lots of pointers without necessarily knowing where they come from. This is the kind of code which I had in mind.
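For instance (a hypothetical sketch of the kind of code I mean), a plain C traversal dereferences an unchecked pointer at every step, while a checked language would have to validate each hop unless the compiler can prove it safe:

```c
#include <stddef.h>

struct node {
    int          value;
    struct node *next;
};

/* In C, each n->next dereference is a raw pointer access with no check.
 * In a checked language, every hop would need validation (null/ownership),
 * unless the compiler can prove the reference is always valid.             */
int list_sum(const struct node *head)
{
    int total = 0;
    for (const struct node *n = head; n != NULL; n = n->next)
        total += n->value;
    return total;
}
```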
It’s possible to have overhead only at process load and first call time with paging and tweaked binaries, but I need a better keyboard than my cellphone’s one to explain it.
“Anyway, what I was referring to is that interpreted languages are intrinsically slower than compiled languages”
Yes.
“in the same way that an OS running in a VM is intrinsically slower than the same OS running on the bare metal”
But, most VM implementations do run apps on bare metal (within user space). We’re talking about making a VM run on bare metal in kernel space.
A VM guarantees isolation, but beyond that requirement there is absolutely no reason it has to “interpret” code. It’s genuine machine code which runs directly on the processor.
You said you’ve read the Java/C benchmarks, which is why I didn’t cite any, but it sounds like you’re doubting the results? Why?
“Sure, but if the kernel’s VM ends up doing most of the job of a kernel, what’s the point of coding a kernel in X at all ? The VM, which is generally coded in a compiled language, ends up being close to a full-featured kernel, so I don’t see the benefit”
Ok I understand.
We shouldn’t underestimate what can be done in ‘X’ (as you call it). C’s memory manager can be written in C, why rule out using X to do the same thing? It’s only a question of bootstrapping.
More importantly though, the vast majority of code running in kernel space (whether micro/macro), is device drivers. In a micro-kernel design, the core kernel should be very small and do very little – like switch tasks and help them intercommunicate. If this very small piece cannot be implemented in pure ‘X’, then so be it. It’s like peppering ‘C’ with assembly.
Even for a macro-kernel design, I’d say a safe language could be beneficial.
Personally, I actually like ‘C’, but the lack of bounds checking is something that developers have been struggling with since its inception.
It has other shortcomings too: the lack of namespaces causing library collisions, ugly header file semantics, a very weak macro/template system, a lack of standardized strings, etc.
I’m not saying we should not use C, but if we do, then get ready for the “usual suspects”.
“Take a linked list. When parsing it, a process ends up looking at lots of pointers without necessarily knowing where they come from. This is the kind of code which I had in mind.”
One approach could be to adapt the way many safe languages already handle references (which, let’s face it, are “safe” pointers). All references could be dereferenced safely without a check; any other pointers (let’s say coming from user space) would need to be validated prior to use.
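A minimal sketch of that split (hypothetical helper names; real kernels such as Linux have their own equivalents of this pattern): references handed out by the kernel itself are trusted, while anything arriving from user space is range-checked against the user portion of the address space before being dereferenced:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: user space occupies [0, USER_TOP). */
#define USER_TOP 0xC0000000u

/* Pointers originating in the kernel are trusted; pointers coming from user
 * space must be validated against the caller's address space first.         */
static bool user_range_ok(const void *uptr, size_t len)
{
    uintptr_t start = (uintptr_t)uptr;
    return len <= USER_TOP && start <= USER_TOP - len;   /* no overflow */
}

/* Hypothetical syscall helper: copy a buffer handed in by user space. */
int copy_in(void *kdst, const void *usrc, size_t len)
{
    if (!user_range_ok(usrc, len))
        return -1;              /* reject before any dereference */
    memcpy(kdst, usrc, len);    /* a real kernel would also handle page faults */
    return 0;
}
```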
Sorry for the confusion. I was talking about the VirtualBox/VMware kind of virtual machine there : software which “emulates” desktop computer hardware in order to run an OS in the userspace of another OS.
So you say that it could be possible to envision “safe” code that’s not interpreted? What differs between this approach and the way a usual kernel isolates processes from each other?
What I’ve read showed that for raw computation and a sufficiently long running time, there’s no difference between Java and C, which means that JIT compilation does work well. On the other hand, I’ve not seen benchmarks of stuff which uses more language features, like a comparison of linked list manipulation in C and Java or a comparison of GC and manual memory management from a performance and RAM usage point of view. If you have such benchmarks at hand…
I use X when I think that what I say applies to all “safe” programming languages.
The problem is that in many safe languages, memory management and other high-level features are taken for granted, as far as I know, which makes living without them difficult. As an example, GC requires memory management to work, and it’s afaik a core feature of such languages.
There we agree… Except that good micro-kernels try to put drivers in user space when possible without hurting performance.
I think I agree.
Don’t know… I hated it initially, but once I got used to it it only became a minor annoyance.
Fixed in C++
I totally agree there, C-style headers are a mess. The unit/module approach chosen by Pascal and Python is imo much better.
Fixed in C++
Fixed in C++, but if I wanted to nitpick I’d say that char* qualifies.
Low-level code must always be polished like crazy anyway.
Are these working in the same way as the C++ ones ? If so, are they suitable for things like linked lists where pointers have to switch targets ?
“Sorry for the confusion. I was talking about the VirtualBox/VMware kind of virtual machine there”.
Well, that would skew the discussion considerably.
For better or worse, the term “virtual machine” has been overloaded for multiple purposes. I intended to use the term as in “Java Virtual Machine”, or dot net…
“So you say that it could be possible to envision ‘safe’ code that’s not interpreted ? What differs between this approach and the way a usual kernel isolates process from each other ?”
It would depend on the implementation, of course. But theoretically, a single JVM could run several “virtual apps” together under one process such that they are all virtually isolated. Each virtual app would pull from a unified address space but would have its own mark/sweep allocation tree (for example). This would enable the JVM to kill one virtual app without affecting the others.
Take this design, and apply it to the kernel itself using virtually isolated modules. This is different from the ‘process’ model, where CPU protections and page tables enforce isolation.
As an aside, I have nothing against the ‘process’ design, but the overwhelming concern that we always talk about is the IPC cost.
“What I’ve read showed that for raw computation and a sufficiently long running time…”
Honestly I haven’t used Java in a while, I’m using it here since it is the most popular example of an application VM in use today, another might be microsoft’s CLR. If you have benchmarks showing slow Linked Lists in Java, I’d like to see them.
“The problem is that in many safe languages, memory management and other high-level features are taken for granted, as far as I know, which makes living without them difficult.”
You have to implement memory management in any OS. Language ‘X’ would simply have to implement it too, what’s the difference whether it’s implemented in ‘C’ or ‘X’?
“As an example, GC requires memory management to work, and it’s afaik a core feature of such languages.”
Yes, but we need to implement memory management anyway. Malloc is part of the C spec, and yet malloc is implemented in C. Obviously the malloc implementation cannot use the part of the spec which is malloc. Instead, malloc is implemented using lower-level primitives (i.e. pages). I have written my own; it’s not so bad.
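For illustration, here is a deliberately naive sketch of that bootstrapping idea: a malloc-like allocator built directly on a lower-level page primitive (the alloc_page() name is hypothetical, there is no free list, and nothing here is production-ready):

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical lower-level primitive: hand out one page of memory.
 * In a real kernel this would come from the page frame allocator.         */
extern void *alloc_page(void);

/* Naive bump allocator: carve small allocations out of pages, never free.
 * A real malloc adds free lists, size classes, multi-page requests...     */
static uint8_t *cur;
static size_t   left;

void *kmalloc(size_t size)
{
    size = (size + 15) & ~(size_t)15;    /* keep 16-byte alignment */
    if (size == 0 || size > PAGE_SIZE)
        return NULL;                     /* naive: no multi-page support */
    if (size > left) {
        cur = alloc_page();              /* grab a fresh page when exhausted */
        if (!cur)
            return NULL;
        left = PAGE_SIZE;
    }
    void *p = cur;
    cur  += size;
    left -= size;
    return p;
}
```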
I’m on the fence with pure Garbage Collection. It seems really lazy to me to let unused objects float around on the heap. I’d lean towards having a delete operator.
Let me just state the biggest reason NOT to go with ‘X’: we’d have to design and implement the whole language, and that would be a lot of work – we don’t want to get off track from developing the OS.
So it makes sense to choose something that’s already out there, and the tradition is to go with ‘C’ – it just sucks that it’s such an unsafe language.
An alternative might be the ‘D’ language, which sounds like promising ‘C’ replacement.
“Are these working in the same way as the C++ ones ? If so, are they suitable for things like linked lists where pointers have to switch targets ?”
C++ references are for the benefit of the programmer, but add no functional value. Safe languages often say they don’t have “pointers”, they have “references” (I hate this distinction since they refer to the same thing).
In Perl, you can do anything you would in C with a pointer except for arithmetic. In principle, a safe language could support arithmetic so long as it is range checked against an array before being dereferenced. However, Perl has its own primitive arrays, so pointer arithmetic is not supported.
I know. I used it without thinking, because it’s what I’m usually talking about when mentioning virtual machines. Sorry again.
Wouldn’t it be much cheaper for the MMU to enforce isolation than for software ?
I think there are lots of ways to optimize it, but until my kernel is grown up enough to try them all, I have no way to say whether, once all those optimizations are applied, the result is good enough from a performance perspective.
I use it as my main example for exactly the same reason : much less data is available on CLR&dotnet, probably due to them being much younger technology than Java.
I have no benchmark showing slow linked lists, in fact I have no benchmark showing heavy pointer manipulation and memory allocation/liberation at all. All the benchmarks I’ve seen so far are about raw computation, and all it takes to get good computational speed in an interpreted language is a good JIT compiler. It doesn’t test the other potential performance bottlenecks. That’s what I was complaining about.
That users of X are used to several programming practices which implicitly require the use of memory allocation ? In C, you know what you’re doing anytime you write a malloc() call, but AFAIK higher-level languages allow several implicit memory allocations constructs, like e.g. defining an array of size a where a is an integer whose value is not known in advance, the goal being that the developer does not have to care about memory allocation at all as it is an automated process. This would probably make the life of a seasoned X developer writing a kernel in X harder.
I’d say this is a hot debate, although I totally agree with you that it’s really not that hard to write a free() anytime you write a malloc() and a delete anytime you write a new…
Wait… Why ? We have to design and implement the interpreter/VM, sure, but if we decide to go with Java or C# the language is already there, right ?
Been hearing about it for some time, but I think that it’s too obscure of a language for OS development at the moment. When we start to call functions from assembly and to call assembly functions, or something equally dirty, it’s better to use a language whose internals have been dissected and discussed on the web many times. The D inventor has made the choice to totally drop C++ link-time compatibility, so we’re in unknown territory there.
So the only difference between a safe language’s “reference” and a C-style pointer is the inability to do pointer arithmetic ? As long as there are ways to replace it (e.g. having a way to parse an array of unknown length from a known physical memory location), that doesn’t sound like that much of an issue.
“Wouldn’t it be much cheaper for the MMU to enforce isolation than for software ?”
It depends. If we have a type-safe language with safe references, then the VM could enforce isolation between modules quite naturally without an MMU. Simply put, structures belonging to one module don’t have direct references to structures belonging to another module, so no checks need to be performed to dereference existing references.
The only time a check is needed is when a new pointer is generated (ie via pointer arithmetic), this is the only case the VM would need check for permission – but generally speaking, a bounds check is needed after any pointer arithmetic anyways.
In other words, a VM with a safe language could theoretically achieve object isolation for free.
There is no cost for using the MMU either, however this puts constraints on the IPC used, which could be less efficient. I guess it would be a good point to discuss IPC pros/cons… but I’ll let someone else start that discussion.
“if we decide to go with Java or C# the language is already there, right ?”
Yes, but I’m sure there are going to be many things that need to be worked out in the kernel. Does anyone know the extent to which Java depends on userland runtime libraries?
If we wanted to attempt the “virtual isolation” rather than MMU based isolation, then we’d need to make the necessary changes to allow the JVM to tear down individual modules without touching others – this is not a native Java capability.
“So the only difference between a safe language’s ‘reference’ and a C-style pointer is the inability to do pointer arithmetic?”
Yes, most safe languages don’t support pointer arithmetic (consider JavaScript). But I’d assert that even arbitrary pointers could be used safely if they are bounds checked first. I can’t find a fundamental reason that safe language references need to be checked more frequently than safe, bug-free pointers in C. However, this is totally dependent on the quality of the compiler; there’s a good chance that C compilers are more mature than anything else.
Of course, it would always be possible to optimize the compiler later.
Yes, this is often stated as a PRO for JIT compiled code, but since it is JIT (just-in-time) the actual optimizations that can be performed during an acceptable timeframe are VERY POOR.
Well, assembly allows more control than C, so given two expert programmers, the assembly programmer will be able to produce at least as good and often better code than the C programmer, since some of the control is ‘lost in translation’ when programming in C as opposed to assembly. Obviously this gets worse as we proceed to even higher-level languages, where the flexibility offered by low-level code is traded in for generalistic solutions that work across a large set of problems but are much less optimized for each of them.
Which compilers would that be? Intel Compiler?
While both a GC and manual memory management have the same cost in the actual allocating and freeing of memory (well, almost: a GC running in a VM will have to ask the host OS for more memory should it run out of heap memory, which is VERY costly; also, in order to reduce memory fragmentation it often compacts the heap, which means MOVING memory around, again VERY costly though hopefully less costly than asking the host OS for a heap resize), a GC adds the overhead of trying to decide IF/WHEN memory can be reclaimed, which is a costly process.
I’ve been looking forward to seeing a managed code OS happen because I am very interested in seeing how it would perform. My experience tells me that it will be very slow, and that programs running on the OS will be even slower. Last I perused the source code of a managed code OS, it was filled with unsafe code, part of which was there for accessing hardware registers but a lot of it also for speed, which is a luxury a program RUNNING on said OS will NOT have.
Hopefully we will have a managed code OS someday capable of more than printing ‘hello world’ to a terminal, which might give us a good performance comparison, but personally I’m sceptical. I think Microsoft sent Singularity to academia for a reason.
The problem is, an efficient implementation of a microkernel needs a lot of optimization planning up front, which, if you’ve read the previous article in the series, is a very unpopular notion.
Therefore, most programmers will start writing the OS the easiest way they know how, which more often than not means modeling it after existing kernels. Unfortunately this often results in many new operating systems sharing the same inefficiencies as the old ones.
I was trying to highlight some areas of improvement, but of course people are highly resistant to any changes.
Now that I have a better keyboard at hand…
As I said, it’s possible to have IPC cost and page table manipulation only when a process is started, and maybe at the first function call. Here’s how:
- Functions of a process which may be called through IPC must be specified in advance, so there’s no harm in tweaking the binary so that all these functions and their data end up in separate pages of RAM, in order to ease sharing. I think it should also be done with segmentation anyway.
- The first time a process makes an IPC call, part or all of these pages are mapped into its address space, similar to the way shared libraries work.
- Further IPC calls are made directly in the mapped pages, without needing a kernel call or a context switch.
If we are concerned about the first call overhead (even though I’d think it should be a matter of milliseconds), we can also make sure that the mapping work is done when the second process is started. To do that, we ensure that our “shared library” mapping process happens when the process is started, and not on demand.
What’s the problem with this approach?
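For what it’s worth, here is a rough sketch of what that lazy mapping could look like on the kernel side (entirely hypothetical helper names; locking, permissions and error handling omitted):

```c
#include <stddef.h>

/* Hypothetical kernel-side structures for the scheme described above. */
struct process;

struct ipc_export {             /* one per function a process exposes via IPC */
    void  *phys_pages;          /* separate pages holding the exported code   */
    size_t page_count;
    size_t entry_offset;        /* offset of the function inside those pages  */
};

/* Hypothetical primitive: map physical pages into a process's address space,
 * much like mapping a shared library, and return the virtual address chosen. */
extern void *map_into(struct process *p, void *phys, size_t pages);

/* Called on the first IPC call from 'caller' to this export. The returned
 * function pointer is then cached by the caller (or its C library), so later
 * calls are plain function calls with no kernel entry or context switch.     */
void *ipc_bind(struct process *caller, const struct ipc_export *exp)
{
    char *base = map_into(caller, exp->phys_pages, exp->page_count);
    return base ? base + exp->entry_offset : NULL;
}
```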
“As I said, it’s possible to have some IPC cost and page table manipulation only when a process is started….What’s the problem with this approach ?”
This is one possible approach, however it limits the type of IPC which can be used.
In particular, this approach forces us to copy data into and out of the shared memory region. Furthermore, if the shared memory region contains any critical data structures, they could be clobbered by a module pointer bug which could cause other modules to crash.
The segmentation approach would allow the well defined/safe IPC code to access any data explicitly shared through the IPC protocol without any copying.
All other code in the module is isolated since the compiler would not allow it to change the segments.
Therefore, selectors do have different protection semantics than a static page table.
Why ? We stay in the same process, just calling code from the shared memory region.
Indeed, that’s why I think global/static variables should be avoided like the plague in shared functions. State in shared code is a disaster waiting to happen.
Can you describe in more detail how it would work for some common IPC operations ? (sending/receiving data, calling a well-defined function of another process…)
“Why ? We stay in the same process, just calling code from the shared memory region.”
I was assuming that you intended to pass all data around through a shared data region. But it sounds like you want to use protected IPC functions to copy data directly between isolated module heaps?
That’s not bad, though I guess it’d need to be profiled. But that single copy disturbs me; a macrokernel wouldn’t need to do it.
“Can you describe in more details how it would work for some common IPC operations ?”
What I had in mind was basically this: assuming we had the virtual isolation capabilities of a kernel VM, an IPC mechanism could transfer object references between modules. The objects themselves would remain in place, and there’d be no copying.
To do this efficiently and safely, the source module could hold only one reference to the object being transferred, but that isn’t unreasonable. Also, presumably any references within the object being transferred would be transferred recursively as well.
I don’t have time to discuss it further right now.
I’m not sure we’re talking about the same thing. If we’re talking about transferring data between process A and process B, here’s how I’d do it :
-> Process A allocates that “shareable” data in separate page(s) of physical memory to start with, using a special variant of malloc. Sort of like having several, isolated heaps. When process A is ready to send that data to process B, it calls the kernel.
-> The kernel maps the page(s) of physical memory in process B’s virtual address space and un-maps them from process A’s virtual address space, then gives a pointer to that data to B.
-> Data transfer is done. No copy required, only one context switch to do the mapping and pointer generation work (a rough user-space analogy is sketched below).
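Here is what I mean, as a user-space sketch (memfd_create is Linux-specific, glibc >= 2.27; in the real scheme the kernel would additionally unmap the pages from process A): the payload changes hands by mapping its pages, not by copying bytes.

/* User-space sketch (Linux): hand pages to another process by mapping them,
   not by copying the bytes they contain. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    size_t len = 4096;                      /* one page of "shareable" data  */
    int fd = memfd_create("ipc_page", 0);   /* anonymous, page-backed object */
    if (fd < 0 || ftruncate(fd, (off_t)len) < 0) return 1;

    /* "Process A" fills the page through its own mapping. */
    char *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (a == MAP_FAILED) return 1;
    strcpy(a, "hello from A");

    if (fork() == 0) {
        /* "Process B" maps the same physical pages; nothing is copied. */
        char *b = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        printf("B sees: %s\n", b == MAP_FAILED ? "(map failed)" : b);
        _exit(0);
    }
    wait(NULL);
    return 0;
}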
Now, let’s assume that we want to do fast remote procedure calls : process A wants to call a function of process B as if it were a shared library.
-> When process B is loaded, special care is taken to ensure that the code which may be called by external processes is in separate pages of physical memory, isolated from the rest of the process’ code.
-> When process A wants to call a function of process B, the kernel maps that page of code in process A’s address space and gives A a pointer to the function.
-> Process A then calls the function as if it were its own code.
That’s what I had in mind with my paging tricks too, using the MMU and separated heaps for shared objects as an isolation mechanism. But you were talking about segmentation…
Indeed, when I think about IPC, pointers are always the tricky bit. An IPC system which does transfer pointers/references automatically without any help from the developer would be something fantastic, but that’d be pretty hard to code too…
Alright, see you later then
“-> Process A allocates that ‘shareable’ data in separate page(s) of physical memory…When process A is ready to send that data to process B…The kernel maps the page(s) of physical memory in process B’s virtual address space and un-maps them from process A’s…”
This would work, but remapping page tables from A to B is rather expensive since we need to manipulate/flush/reload the page entries in the CPU. Let me know if I am still misunderstanding your idea.
If you’re transferring large buffers, then the overhead to remap pages probably won’t be bad, but for small IPC requests I am pretty sure it’d be cheaper to just copy the data outright – especially if that data is in cache.
The virtual isolation I described wouldn’t have any copy or remap overhead. A pointer could be passed between modules just like a macro-kernel and the language VM semantics would ensure safety.
Another difference, which you mentioned, is that MMU based IPC would necessarily need 4K data pages (or 2MB on large page systems).
Obviously micro-kernels are traditionally implemented via MMU isolation.
I just wanted to point out that theoretically a safe language could offer isolation without an MMU and achieve IPC which is as speedy as a macro-kernel.
OS devs will probably go with C anyways, but it does seem like a plausible way to make a micro-kernel design perform like a macro-kernel.
No misunderstanding. This approach is only valid for data transfers which are large enough that the TLB/cache flush and the memory lost to separate heaps are worth it. For smaller data transfers, a traditional shared buffer between both processes would probably remain the best solution. The exact boundary between the two should probably be set based on some testing (a rough sketch of such a size-based cutoff is below).
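Something like this, where ipc_copy()/ipc_remap() are made-up stubs and REMAP_THRESHOLD is an arbitrary placeholder for whatever value the benchmarks would suggest:

/* Hypothetical sketch of the size-based cutoff discussed above. */
#include <stddef.h>
#include <stdio.h>

#define REMAP_THRESHOLD ((size_t)(16 * 4096))   /* assumption: a few pages */

static void ipc_copy(const void *buf, size_t len)  { (void)buf; printf("copied %zu bytes\n", len); }
static void ipc_remap(const void *buf, size_t len) { (void)buf; printf("remapped %zu bytes\n", len); }

static void ipc_send(const void *buf, size_t len)
{
    if (len < REMAP_THRESHOLD)
        ipc_copy(buf, len);    /* small payloads: cheaper to copy, often already in cache   */
    else
        ipc_remap(buf, len);   /* large payloads: remapping beats copying despite TLB costs */
}

int main(void)
{
    static char small[256], big[1 << 20];   /* static: keeps the big buffer off the stack */
    ipc_send(small, sizeof small);
    ipc_send(big, sizeof big);
    return 0;
}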
Indeed. What I wanted to point out on my side is that MMU-based isolation can be optimized a lot, maybe enough to reduce its impact to an acceptable level. But still, your idea should really be implemented too, so that we may compare both approaches.
Heh, this tutorial has an emphasis on developer creativity; I’ll only put C-specific code towards the very end, so maybe you have a chance to get someone interested in your idea if that someone is reading our talk… Or you can write an article on the subject, too.
“so maybe you have a chance to get someone interested in your idea if that someone is reading our talk… Or you can write an article on the subject, too.”
Haha, I can’t imagine that being too popular. Most people seem to perceive my opinions as terribly uninformed, though I don’t quite understand why.
Because they sound crazy, maybe… The way I handle crazy things myself is to challenge the author until he proves able to handle all my concerns reasonably well; afterwards I (usually) admit he might be right. But other people might do it another way.
Swing UI Performance is not the same as Java Performance, which is probably what you are complaining about.
“In short, I’ll believe that it’s possible to write a decent desktop OS in a “safe” language when I see it.”
http://programatica.cs.pdx.edu/House/
http://web.cecs.pdx.edu/~kennyg/house/
Now bend down and praise the Lords…
Kochise
This thing has an awful tendency to make one of my CPU cores run amok; I wonder if it uses timer interrupts properly… But indeed, I must admit that apart from that it does work reasonably well.
*bends down indeed, impressed by how far people have gone with what looks like a language only suitable for mad mathematicians when browsing code snippets*
However, I wonder : if Haskell is a “safe” language, how do they manage to create a pointer targeting a specific memory region, which is required e.g. for VESA VBE ? Or to trigger BIOS interrupts ?
“However, I wonder : if haskell is a ‘safe’ language, how do they manage to create a pointer targeting a specific memory region, which is required e.g. for VESA VBE ? Or to trigger BIOS interrupts ?”
A safe compiler can assure us that all pointers are in bounds before being dereferenced. Most of these bounds checks would be “free” code since the values are implied within the code paths.
The compiler might track two pointer variable attributes: SAFE & UNSAFE.
A function could explicitly ask for validated pointers.
This way, the compiler knows that any pointer it gets is already safe to use without a bounds check.
void Func(attribute(SAFE) char *x) {
    // x has already been validated: safely dereference it without further checks
}

for (char *p = (char *)0xa0000; p < (char *)0xaffff; p++) {
    Func(p); // no extra bounds check needed: the code path implies the pointer is in a valid range
}

char *p;
scanf("%p", (void **)&p); // yucky dangerous pointer
Func(p); // here the compiler is forced to insert an implicit bounds check, since the function requests a SAFE pointer
This is not a performance penalty, because code which does not validate the pointer is a bug waiting to be exploited anyway.
The SAFE/UNSAFEness of pointers can be tracked under the hood and need not complicate the language. Although if we wanted to, we could certainly make it explicit.
Developers of safe languages have been doing this type of safe code analysis for a long time. It really works.
Unfortunately for OS developers, most safe languages are interpreted rather than compiled, but the JVM and CLR show that compilation is possible.
This is about as impressive as my own attempt to do something similar, though using existing kernel code (Minix 3) and a functional language (Erlang).
The biggest problems so far are that Minix 3 is only “self-compiling” : you can only develop and hack Minix FROM Minix (no cross-development is possible, due to hackish code targeting their own C compiler, ACK, no pun intended), and that Erlang depends heavily on GCC-specific extensions.
C portability has never been so debatable…
On a brighter note, I have also seriously considered this thesis project as a programming interface :
http://www.csse.uwa.edu.au/~joel/vfpe/index.html
It is written in Java, and might scale better with Haskell than with Erlang, yet I find some “operations” to be more intuitive using emacs (plain text editing) than creating “tempo” objects to fit a functional language’s laziness :/
My 0.02€ :p
Kochise
When I read “visual programming”, the image which pops up in my brain is that of LabView, and my hand spontaneously starts looking for weapons; I just can’t help it ^^
Don’t feel so ticklish
Kochise
Playing with complex LabView programs has really been a traumatizing experience…
I’d say it’s something like debugging an old-fashioned BASIC program (you know, those with GOTOs everywhere), but with an added dimension of space so that it feels even more messy.
Well, I for one, consider that a *decent desktop OS* needs to be able to run:
-HW-accelerated games such as Doom3.
-a fully featured web browser
-a fully featured office suite (say LibreOffice).
Wake me up when they reach this point..
And then there is also the issue of hardware support..
If they can support PCI and VESA, they can support HW-accelerated graphics as well; it’s only a matter of writing lots of chipset-specific code, which is a brute-force development task.
Again, porting webkit/gecko or an office suite to a new architecture is a brute-force development task once some prerequisites (like a libc implementation and a graphics stack) are there. I sure would like to see some complex applications around to see how well they perform, but Kochise’s example does show that it’s possible to write a simple GUI desktop OS in Haskell.
Again, that’s not the point of such a proof-of-concept OS. They only have to show that it’s possible to implement support for any hardware, by implementing support for various hardware (which they’ve done), the rest is only a matter of development time.
Of course, I’d never use this OS as it stands. But it does prove that it’s possible to write a desktop OS in this language, which is my original concern.
I used to consider this a plausible approach, too. However, any shared-memory approach will make the RAM a bottleneck. It would also enforce a single shared RAM by definition.
This made me consider isolated processes and message passing again, with shared RAM to boost performance but avoiding excessive IPC whenever possible. One of the concepts I think is useful for that is uploading (bytecode) scripts into server processes. This avoids needless IPC round-trips and even allows server processes to handle events like keypresses in client-supplied scripts instead of IPC-ing to the client, avoiding round-trips and thus being more responsive.
The idea isn’t new, though. SQL does this with complex expressions and stored procedures. X11 and OpenGL do this with display lists. Web sites do this with Javascript. Windows 7 does it to a certain extent with retained-mode drawing in WPF. There just doesn’t seem to be an OS that does it everywhere, presumably using some kind of configurable bytecode interpreter to enable client script support in server processes in a generalized way.
Example: a GUI server process would know about the widget tree of a client process and would have client scripts installed like “on key press: ignore if the key is (…). For TAB, cycle the GUI focus. On ESC, close window (window reference). On ENTER, run input validation (validation constraints), and send the client process an IPC message if successful. (…)”
There you have a lot of highly responsive application-specific code, running in the server process and sending the client an IPC message only if absolutely needed, while still being “safe” due to being interpreted, with every action checked (a minimal sketch of such a server-side script follows).
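A minimal single-process sketch of that idea, with hypothetical opcodes standing in for a real bytecode: the “uploaded” script lets the server handle TAB and ESC locally, and only the ENTER case would cross the process border.

/* Minimal sketch: a GUI server interpreting a client-supplied event script.
   The opcodes are hypothetical; only OP_NOTIFY_CLIENT would actually cross
   the process border in a real system. */
#include <stdio.h>

enum opcode { OP_IGNORE, OP_CYCLE_FOCUS, OP_CLOSE_WINDOW, OP_NOTIFY_CLIENT };

struct rule { int key; enum opcode op; };

/* "Uploaded" once by the client at startup, then kept inside the server. */
static const struct rule script[] = {
    { '\t', OP_CYCLE_FOCUS   },   /* TAB: handled locally, no IPC        */
    { 27,   OP_CLOSE_WINDOW  },   /* ESC: handled locally, no IPC        */
    { '\n', OP_NOTIFY_CLIENT },   /* ENTER: the only case that needs IPC */
};

static void handle_key(int key)
{
    for (size_t i = 0; i < sizeof script / sizeof script[0]; i++) {
        if (script[i].key != key)
            continue;
        switch (script[i].op) {
        case OP_CYCLE_FOCUS:   puts("server: focus moved to next widget"); return;
        case OP_CLOSE_WINDOW:  puts("server: window closed");              return;
        case OP_NOTIFY_CLIENT: puts("server: -> IPC message to client");   return;
        default:               return;
        }
    }
    puts("server: key ignored");   /* no matching rule: default is OP_IGNORE */
}

int main(void)
{
    handle_key('\t');   /* no IPC round-trip       */
    handle_key(27);     /* no IPC round-trip       */
    handle_key('\n');   /* exactly one IPC message */
    return 0;
}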
That would be a more elegant way to do the same as could be done with paging. On 64-bit CPUs the discussion becomes moot anyway. Those can emulate segments by using subranges of the address space; virtual address space is so abundant that you can afford it. The only thing you don’t get with that is implicit bounds checking, but you still can’t access memory locations which the process cannot access anyway.
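For what it’s worth, here is a small POSIX sketch of that “segments as sub-ranges” idea on a 64-bit system: reserve a big region up front, then commit one sub-range per module, so stray accesses outside a module’s range hit PROT_NONE pages and fault. As noted above, that only gives coarse, page-granular protection, not implicit bounds checking.

/* Sketch: emulate segments with sub-ranges of a 64-bit address space. */
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#include <stdio.h>
#include <sys/mman.h>

#define REGION_SIZE (1UL << 32)    /* 4 GiB of reserved virtual space (64-bit only) */
#define MODULE_SIZE (1UL << 20)    /* 1 MiB "segment" per module                    */

int main(void)
{
    /* Reserve address space only; nothing is usable yet. */
    char *base = mmap(NULL, REGION_SIZE, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* "Segment" for module 0: make its sub-range readable and writable. */
    char *module0 = base;
    if (mprotect(module0, MODULE_SIZE, PROT_READ | PROT_WRITE) != 0) {
        perror("mprotect");
        return 1;
    }

    module0[0] = 42;                          /* fine: inside the sub-range     */
    printf("module0 base = %p\n", (void *)module0);
    /* module0[MODULE_SIZE] = 1;                 would fault: outside the range */

    munmap(base, REGION_SIZE);
    return 0;
}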
If used for “real” programs, this argument is the same as using a JVM or .NET runtime.
On the other hand, if you allow interpreted as well as compiled programs, and run them in the context of a server process, you get my scripting approach.
Morin,
“I used to consider this a plausible approach, too. However, any shared-memory approach will make the RAM a bottleneck. It would also enforce a single shared RAM by definition.”
That’s a fair criticism – the shared RAM and cache coherency model used by x86 systems is fundamentally unscalable. However, considering that shared memory is the only form of IPC possible on multicore x86 processors, we can’t really view it as a weakness of the OS.
“This made me consider isolated processes and message passing again, with shared RAM to boost performance but avoiding excessive IPC whenever possible. One of the concepts I think is useful for that is uploading (bytecode) scripts into server processes.”
I like that idea a lot, especially because it could be used across computers on a network without any shared memory.
Further still, if we had a language capability which could extract and submit the logic surrounding web service calls instead of submitting web service calls individually, that would be a killer feature of these “bytecodes”.
“That would be a more elegant way to do the same as could be done with paging. On 64-bit CPUs the discussion becomes moot anyway.”
See my other post as to why this isn’t so if we’re not using a VM for isolation, but your conclusion is correct.
I was referring to the shared RAM and coherency model used by Java specifically. That one is a lot better than what x86 does, but it still makes the RAM a bottleneck. For example, a (non-nested) “monitorexit” instruction (end of a non-nested “synchronized” code block) forces all pending writes to be committed to RAM before continuing.
If you limit yourself to single-chip, multi-core x86 systems, then yes. That’s a pretty harsh restriction though: There *are* multi-chip x86 systems (e.g. high-end workstations), there *are* ARM systems (much of the embedded stuff, as well as netbooks), and there *are* systems with more than one RAM (e.g. clusters, but I’d expect single boxes that technically contain clusters to be “not far from now”).
Morin,
“There *are* multi-chip x86 systems (e.g. high-end workstations), there *are* ARM systems (much of the embedded stuff, as well as netbooks), and there *are* systems with more than one RAM…”
Sorry, but I’m not really sure what your post is saying?
Then I might have misunderstood your original post:
I’ll try to explain my line of thought.
My original point was that shared RAM approaches make the RAM a bottleneck. You responded that shared RAM “is the only form of IPC possible on multicore x86 processors”.
My point now is that this is true only for the traditional configuration with a single multi-core CPU and a single RAM, with no additional shared storage and no hardware message-passing mechanism. Very true, but your whole conclusion is limited to such traditional systems, and my response was aiming at the fact that there are many systems that do *not* use the traditional configuration. Hence shared memory is *not* the only possible form of IPC on such systems, and making the RAM a bottleneck through such IPC artificially limits system performance.
As an example to emphasize my point, consider CPU/GPU combinations with separated RAMs (gaming systems) vs. those with shared RAM (cheap notebooks). On the latter, RAM performance is limited and quickly becomes a bottleneck (no, I don’t have hard numbers).
I wouldn’t be surprised to see high-end systems in the near future, powered by two general-purpose (multicore) CPUs and a GPU, each with its own RAM (that is, a total of 3 RAMs) and without transparent cache coherency between the CPUs, only between cores of the same CPU. Two separate RAMs means a certain amount of wasted RAM, but the performance might be worth it.
Now combine that with the idea of uploading bytecode scripts to server processes, possibly “on the other CPU”, vs. shared memory IPC.
“my response was aiming at the fact that there are many systems that do *not* use the traditional configuration. Hence shared memory is *not* the only possible form of IPC on such systems, and making the RAM a bottleneck through such IPC artificially limits system performance.”
“As an example to emphasize my point, consider CPU/GPU combinations.”
I expect the typical use case is the GPU caches bitmaps once, and doesn’t need to transfer them across the bus again. So I agree this helps alleviate shared memory bottlenecks, but I’m unclear on how this could help OS IPC?
Maybe Nvidia’s CUDA toolkit does something unique for IPC? That’s not really my area.
I’m curious, what role do you think GPUs should have in OS development?
“I wouldn’t be surprised to see high-end systems in the near future, powered by two general-purpose (multicore) CPUs and a GPU, each with its own RAM (that is, a total of 3 RAMs) and without transparent cache coherency between the CPUs, only between cores of the same CPU.”
This sounds very much like NUMA architectures, and while support for them may be warranted, I don’t know how this changes IPC? I could be wrong, but I’d still expect RAM access to be faster than any hardware on the PCI bus.
“Now combine that with the idea of uploading bytecode scripts to server processes, possibly ‘on the other CPU’, vs. shared memory IPC.”
I guess that I may be thinking something different than you. When I say IPC I mean communication between kernel modules. It sounds like you want the kernel to run certain things entirely in the GPU, thereby eliminating the need to run over a shared bus. This would be great for scalability, but the issue is that the GPU isn’t very generic.
Most of what a kernel does is IO rather than number crunching, it isn’t so clear how a powerful GPU is helpful.
> This sounds very much like NUMA architectures, and
> while support for them may be warranted, I don’t
> know how this changes IPC?
NUMA is the term I should have used from the beginning to avoid confusion.
> > “As an example to emphasize my point, consider CPU/GPU combinations.”
> I expect the typical use case is the GPU caches
> bitmaps once, and doesn’t need to transfer them
> across the bus again. So I agree this helps
> alleviate shared memory bottlenecks, but I’m
> unclear on how this could help OS IPC?
It seems that my statement has added to the confusion…
I did *not* mean running anything on the GPU. I was talking about communication between two traditional software processes running on two separate CPUs connected to two separated RAMs. The separate RAMs *can* bring performance benefits if the programs are reasonably independent, and for microkernel client/server IPC, data caching and uploaded bytecode scripts improve performance even more and avoid round-trips.
The hint with the GPU was just to emphasize the performance benefits of using two separate RAMs. If CPU and GPU use two separate RAMs to increase performance, two CPUs running traditional software processes could do the same if the programs are reasonably independent.
Not to say that you *can’t* exploit a GPU for such things (folding@home does), but that was not my point.
> I could be wrong, but I’d still expect RAM access to
> be faster than any hardware on the PCI bus.
Access by a CPU to its own RAM is, of course, fast. Access to another CPU’s RAM is a bit slower, but what is much worse is that it blocks that other CPU from accessing its RAM *and* creates cache coherency issues.
Explicit data caching and uploaded scripts would allow, for example, a GUI server process to run on one CPU in a NUMA architecture, and the client application that wants to show a GUI run on the other CPU. Caching would allow the GUI server to load icons and the like once at startup over the interconnect. Bytecode scripts could also be loaded over the interconnect once at startup, then allow the GUI server to react to most events (keyboard, mouse, whatever) without any IPC to the application process.
The point being that IPC round-trips increase the latency of the GUI (though not affecting throughput) and make it feel sluggish; data transfers limit both latency and throughput, and in a NUMA architecture you can’t fix that with shared memory without contention at the RAM and cache coherency issues.
“I did *not* mean running anything on the GPU. I was talking about communication between two traditional software processes running on two separate CPUs connected to two separated RAMs.”
I would have left out the discussion of GPUs entirely, but I understand what you mean now.
“Access by a CPU to its own RAM is, of course, fast. Access to another CPU’s RAM is a bit slower, but what is much worse is that it blocks that other CPU from accessing its RAM *and* creates cache coherency issues.”
You are right, it throws another variable into the multi-threaded programming mix. So far though (to my knowledge), all NUMA configurations produced by Intel are still cache coherent – I don’t know if they ever plan on changing that.
Obviously without cache coherency, we’d need explicit synchronization and a mechanism to invalidate cache. This is very tricky to get right since the cache entry size may not correspond to the size of the variables the programmer wishes to synchronize. This could break all existing multi-thread code.
So the ultimate scalable paradigm is more likely to be highly parallel clusters rather than highly parallel multi-threading (NUMA or otherwise).
As you hinted earlier, some type of SQL-like remote programming instructions would be nice to have in a cluster computing environment.
We could also take a lesson from the IBM mainframe era by solving problems using job queues. Mainframe jobs are designed to minimize the tight couplings/dependencies which add bottlenecks. They are fast, elegant, simple, and easy to migrate to different processors (with no shared memory).
Linux doesn’t really have a good equivalent, perhaps it should.
Well, it depends on your goals: if you want your OS to be used by a lot of people, then the time to write/adapt the large number of drivers needed is probably much more than the time to adapt the FreeBSD or Linux kernel (*)..
Remember that even with Linux, the number one criticism is that there are still not enough good drivers for it
(as seen with the recent discussion about Firefox and HW acceleration)!
For a toy OS/specialised OS, sure, writing your own kernel or reusing a small one makes sense, but the number of usable HW configurations will probably stay very small.
*For example, Con Kolivas maintains his own scheduler for better ‘desktop usage’; there was BeFS for Linux but it’s obsolete (2.4).
If that’s your stance, then you’ve missed the whole point of the BeOS & why Haiku wants to recreate it. Be’s motto was a play on Apple’s motto: Don’t just THINK different, BE different. There are plenty of ‘would-be’ OSs using the BSD & Linux kernels. That’s not exactly BEing different, now is it? Haiku’s kernel is based off of an OS that was started by an ex-Be software engineer (NewOS – http://newos.org/ ). It has its own heritage, & it’s a heritage that’s just as valid as any other OS’s heritage.
Not really; there are plenty of distributions (which mostly configure the applications), but as for OSs that are really different while using the BSD kernels|Linux, I can only think of MacOS X (not really BSD, it’s a Mach kernel) and Android (Linux).
Uhm, the webpage doesn’t give any specific reason why it would be as good as the *BSD or Linux kernels, which have a lot more drivers.
And frankly, marketing BS like ‘BE different’ is for fanboys not for developers..
More BSD peeks out from under the MacOS X hood than Mach. Mach is less exposed to the parts of the kernel that actually reach out into the userland. & though it does have Mach inside of it, most people (developers, users, Apple, & even journalists) tally MacOS X in the BSD column. Also, a distribution doesn’t constitute a different OS; all Linux distributions are basically the same OS. And why is that? Because the system calls, memory management, & other kernel parts didn’t change & the basic userland framework didn’t change. Slapping on new paint & a different package management system isn’t enough to call it a new OS.
Nobody said that NewOS was better; better is relative. The key point is that it’s not the same rehashed bs that keeps getting praised simply for the sake of being the same rehashed bs. To be honest, purely from a desktop perspective, Be was more successful on the desktop in the 90’s than Linux is now. Though Be’s OS would never have had the slightest chance in the server world. And just to give you a hint, it’s not about who has more drivers, it’s about whose drivers are better & which drivers are available. No one cares if NetBSD or Linux are ultra-portable with thousands of drivers if they don’t have the drivers for the particular hardware that they are using at that particular time.

Also, on the topic of marketing, there’s only one company that’s a marketing genius & that’s Microsoft. They’ve constantly found a way to shove their OSs down most people’s throats whether they want it or not. With that in mind, surely you understand that marketing isn’t a measure of how good a system is ‘technically’. However, great marketing usually means that you’ll be around a lot longer regardless of how good or bad your OS is. Fanboys are inevitable regardless of the OS & with or without marketing. So, if you think that ‘Be different’ is just marketing bs that’s for fanboys & not developers, bring your development skills to the table. Let’s see you create a better OS than the BeOS from scratch & come up with a better motto.
“First, heavy reliance on tutorials is bad for creativity.” – but good for arcane technical bits!
After you’ve got your very own Hello World OS running, you have the bare minimum platform with which to experiment. Don’t be afraid to make lots of these, and see how they work; compiling a bad idea is cheap, designing for a bad idea is expensive.
(Maybe this is obvious too? I dunno.)
Just a quick “Thanks!” to Hadrien for the information and needed “prodding” to continue with my own OS! It’s good to know that others “suffer” as I do and share the same concerns and constraints. Regardless of other comments, I greatly appreciated the article. Peace! Fritz