How is this different from this document?
http://people.freebsd.org/~kris/scaling/7.0%20Preview.pdf
Edited 2008-03-16 22:09 UTC
There were no significant performance changes made from 4.8 to 4.11. Even if there were, if dfly never picked them up then that's still a problem.
Well, it's different in the sense that it's not the same :-)
In the PDF you link to I have a graph comparing freebsd to dfly 1.8 running mysql, but in the current tests I compared freebsd 7.0 with dfly 1.12 and freebsd 4.11 on a wider variety of tests.
http://leaf.dragonflybsd.org/mailarchive/users/2008-03/msg00025.htm...
There is also a discussion on the mailing list.
DF BSD is a fork of FreeBSD 4.8, and a lot of work has been done in FreeBSD to remove the GKL (? Giant Kernel Lock), which AFAIK was not ported to the DF kernel because Matt Dillon believed that this was the wrong approach. I wonder what he has to say in regards to recent benchmark improvements by FreeBSD?
I believe it would probably be something along the lines that DragonflyBSD is really still in alpha (while it may be usable for some people for daily use, it has not yet implemented all its promised features of a transparent cluster). And that as long as the focus is on completing the core feature set, performance improvements are not being worked on beyond an as-needed basis.
Though I wonder: with these performance issues, how will DF deal with them once the clustering is complete? Or perhaps it doesn't matter? Maybe the future of DF is in its niche of clustering where perhaps it will be unchallengeable, and then be adequate at the rest?
After all, OpenBSD isn't used for super-computing but has managed to find itself a nice niche in security conscious jobs.
Edited 2008-03-17 01:42 UTC
"DF BSD is a fork of FreeBSD 4.8, and a lot of work has been done in FreeBSD to remove the GKL (? Giant Kernel Lock), which AFAIK was not ported to the DF kernel because Matt Dillon believed that this was the wrong approach."
They inherited the MP lock from FreeBSD 4.8, which is basically where SMP for any OS gets started (one giant lock protecting the entire kernel, so only one thread runs in it at a time). Matt has / had issues with the way that threading was being implemented in FreeBSD, and more so with the all-out (over)use of fine-grained locks sprinkled about the kernel to replace the Giant lock.
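To make the two locking styles concrete, here is a minimal user-space sketch in Python. It is only an illustration of the general idea, not kernel code and not how either FreeBSD or DF implements it; the class names, the `hammer` helper, and the "vfs"/"net" subsystem labels are all made up for the example.

```python
import threading


class GiantLockTable:
    """All subsystems share one lock, so only one thread makes progress at a time."""

    def __init__(self, subsystems):
        self._giant = threading.Lock()
        self._counts = {name: 0 for name in subsystems}

    def bump(self, name):
        with self._giant:  # every caller serializes here, whatever it touches
            self._counts[name] += 1

    def value(self, name):
        with self._giant:
            return self._counts[name]


class FineGrainedTable:
    """Each subsystem has its own lock, so unrelated work does not contend."""

    def __init__(self, subsystems):
        self._locks = {name: threading.Lock() for name in subsystems}
        self._counts = {name: 0 for name in subsystems}

    def bump(self, name):
        with self._locks[name]:  # only callers touching `name` wait on each other
            self._counts[name] += 1

    def value(self, name):
        with self._locks[name]:
            return self._counts[name]


def hammer(table, subsystems, threads_per_subsystem=4, bumps=1000):
    """Run several threads per subsystem and return the final counts."""

    def worker(name):
        for _ in range(bumps):
            table.bump(name)

    threads = [
        threading.Thread(target=worker, args=(name,))
        for name in subsystems
        for _ in range(threads_per_subsystem)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {name: table.value(name) for name in subsystems}
```

Both tables end up with the same counts; the difference is only in who has to wait for whom. In CPython the interpreter lock hides any speed difference, so treat this as a picture of the structure rather than a benchmark.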
"I wonder what he has to say in regards to recent benchmark improvements by FreeBSD?"
From what I've seen of his writings WRT DF scalability ATM, he would not be surprised.
"I believe it would probably be something along the lines that DragonflyBSD is really still in alpha (while it may be usable for some people for daily use, it has not yet implemented all its promised features of a transparent cluster). And that as long as the focus is on completing the core feature set, performance improvements are not being worked on beyond an as-needed basis."
Basically zero optimization has occurred in DF to date, as much of the kernel still requires the MP lock, and in the release notes for 1.12 he claims that the part of the kernel that most needs attention for SMP is anything I/O related. Large parts of the kernel have been mostly MP safe for a while (for example large parts of the network stack), but they end up needing to grab the MP lock because of the non-MP safe code.
"Though I wonder: with these performance issues, how will DF deal with them once the clustering is complete? Or perhaps it doesn't matter?"
Optimization isn't likely to be a big issue until the kernel is mostly MP safe and the core clustering work is done. However, issues like the possible namecache flakiness encountered by Kris would definitely be dealt with as soon as the problem can be tracked down.
Before release 1.10, the DF folks spent a bit of time yanking out mounted USB memory sticks and fixing bugs they found in so doing. That isn't related to either SMP or clustering, and I'm offering it only as an example of the fact that they will take the time to correct obvious problems.
"Maybe the future of DF is in its niche of clustering where perhaps it will be unchallengeable, and then be adequate at the rest?"
Well, DF is a general purpose OS, that has an additional goal to allow native SSI clustering at the kernel level. Clustering is still a ways off, but I find the system usable in the general purpose sense.
That said, as much as I like the DF project, I don't ever see it being widely deployed. It's just a case of Windows and Linux etc. being "good enough" for most people.
"After all, OpenBSD isn't used for super-computing but has managed to find itself a nice niche in security conscious jobs."
Perhaps DF will come to fill such a niche role. Time will tell.
I did it for you
Writing a fine grained kernel is hard, especially when you need to support some functionality that's not so friendly for large MP systems.
Maybe Dillon's approach is right. Instead of trying to coordinate actions between processors in a lock free or finely grained manner while trying to maintain tons of shared state, why not treat your large MP machine as many smaller ones that are coordinated at a higher level?
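As a toy picture of "many smaller machines coordinated at a higher level", here is a small Python sketch. It is an assumption-laden illustration of the general idea only, not DF's design: the `node` and `run_cluster` names are invented, each thread stands in for one small machine that owns its private state, and the only interaction is through message queues.

```python
import queue
import threading


def node(inbox, results):
    """One 'node': owns a private total that no other thread ever touches."""
    local_total = 0
    while True:
        msg = inbox.get()
        if msg is None:  # shutdown message from the coordinator
            results.put(local_total)
            return
        local_total += msg


def run_cluster(values, n_nodes=4):
    """Coordinator: hands work to nodes by message, then aggregates their answers."""
    inboxes = [queue.Queue() for _ in range(n_nodes)]
    results = queue.Queue()
    threads = [threading.Thread(target=node, args=(ib, results)) for ib in inboxes]
    for t in threads:
        t.start()
    for i, v in enumerate(values):
        inboxes[i % n_nodes].put(v)  # round-robin distribution
    for ib in inboxes:
        ib.put(None)
    for t in threads:
        t.join()
    return sum(results.get() for _ in range(n_nodes))
```

Note that the synchronization has not disappeared here: it has moved into the queues, which are themselves locked internally. Whether that trade scales better on a real machine is exactly what is being argued about.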
The Google cluster is a great example of this. They don't really need a scalable OS at all: a simple kernel that efficiently manages I/O and gets out of the way of the one executing task on a particular node is all that's needed. Everything else is coordinated from their aggregation servers and their distributed namespace/locking system.
"Writing a fine grained kernel is hard, especially when you need to support some functionality that's not so friendly for large MP systems."
"Maybe Dillon's approach is right. Instead of trying to coordinate actions between processors in a lock free or finely grained manner while trying to maintain tons of shared state, why not treat your large MP machine as many smaller ones that are coordinated at a higher level?"
This is the McVoy cache coherent cluster approach
http://www.bitmover.com/cc-pitch/
It has always seemed like handwaving to me. The problem with this is: "what higher level?"
What does it buy you to program a multiprocessor system as a set of communicating cluster nodes, that you can't do as a monolithic kernel? As far as I can see, it only serves to place restrictions on the ways you can communicate and interact between CPUs.
So why do proponents of this approach think they can just assert that a system of communicating nodes -- that *must* still be synchronized and share data at some points -- can scale as well as a monolithic kernel? I don't think that is a given at all.
Well, they use Linux, and actually they are pushing some pretty complex functionality into the kernel (to do resource control, for example).
And I don't know what its requirements are, but you can bet it's nothing like a regular UNIX filesystem / POSIX API.
Also, I don't see why you think Google is a great example of this. Google does not have a large MP machine. It has a big, non-cache-coherent cluster. So there is only one way to program it -- like a cluster.
I remember Matthew Dillon was pretty harsh and insulting about how the FreeBSD Project was trying to solve the BSD SMP problem. I don't know whether Dragonfly will scale better than FreeBSD in the future; however, two things are now certain. 1) As witnessed by the very promising scalability results we are now seeing, the FreeBSD Project was right to push forward and play it safe. 2) As witnessed by the utter lack of progress on SMP scalability in DragonflyBSD, while the road the FreeBSD project took was hard, the road Matthew Dillon was proposing was no walk in the park. I really think he should own up and apologize to the project now.
Maybe everyone should just work on their own projects and drop the petty bickering and benchmark-waving. Using benchmarks, and benchmark comparisons with other projects, as tools to guide development is one thing. But using them as a way to taunt other projects is something else, and is substantially less constructive.
Hear! Hear!
Any benchmark of this sort is fairly useless across different systems for the simple reason that everyone has a different set of optimized cases and weak areas.
For instance, people used to use select() on UNIX and find that it performed worse on NT because NT was designed for a different approach as encompassed in the IoCompletionPort API.
Additionally, any application with highly specialized performance requirements, like a database, would probably go to great lengths to avoid paths within the host OS that are necessary in the general case but bad for the specialized application. Hence you see raw disk mode in Oracle and SQL.
I think the Linux-CFS vs. FreeBSD-ULE scheduler numbers should be taken with a grain of salt. It may just be that MySQL (or at least the benchmark) needs to be reoptimized for the new Linux scheduler to get the formerly high throughput numbers. The scheduler shouldn't matter to a well-written database because there should always be not much more than 1 work item per cpu being processed at a time.
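The "about one work item per CPU" idea can be sketched as a worker pool sized to the core count. This is a generic Python illustration, not how MySQL or any real database schedules work; `run_queries` and its `handler` argument are hypothetical names for the example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_queries(queries, handler, workers=None):
    """Process queries with at most one in-flight work item per CPU by default."""
    # os.cpu_count() can return None, so fall back to a single worker.
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Extra queries wait in the pool's queue instead of becoming
        # extra runnable threads for the OS scheduler to juggle.
        return list(pool.map(handler, queries))
```

With the backlog held in the application's own queue, the kernel scheduler rarely has more runnable threads than cores to choose from, which is why its policy should matter less.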
IIRC Matt had some harsh things to say about FreeBSD5.. which is about the time he forked (and forked from FreeBSD4). FreeBSD6/7 was more or less a departure from 5's way of SMP. So in that respect, he was pretty much right on.. it just so happens that the FreeBSD team corrected the 'mess' that was FreeBSD5 before he could get his DFly work out the door in a state that satisfies him.
In any case, all the BSDs have bright people working on their teams and, since it's all open source, we should all benefit from the different ideas eventually.
No, you are quite wrong. FreeBSD 5 was the first release that had the bits of SMPng. The reason FreeBSD 5 has been (quite unfairly) derided is that SMPng simply was not fully completed: some parts were missing, and it was not in any way focused on performance. FreeBSD 5 (5.0 and 5.1, which BTW were NOT PRODUCTION (as in stable) releases, which is often forgotten) was rather about providing a correct implementation to build upon for future releases. IIRC, what M. Dillon criticized was the basic SMP synchronization model with fine-grained locks/mutexes. That is something that has NOT changed, so no, he was not right on. If anything, FreeBSD 7 has shown that he was wrong about the basic architecture behind SMPng. Now he himself has to prove that he was at least partly right by showing that DragonflyBSD scales properly and has decent performance (it doesn't need to win on performance; good enough is good enough). So far that has not materialized, as too many things are still missing. It sure is an interesting experiment that he has embarked upon, but it is nowhere near being deployed on production servers. Entry-level servers today have 4 cores and most standard servers ship with 8 cores, and for that you need an SMP implementation that scales. FreeBSD 7 has clearly shown that it can scale and perform.
There was no "mess" to correct. It was just a matter of refining and optimizing, and that is still ongoing, so expect new advancements in performance for FreeBSD.
Yes, they do. Just look at jemalloc and tmpfs as recent examples of BSD-licensed code that has "migrated" outside its original *BSD.
Why?
Read the interview with Matt from 2002 and you will see
http://kerneltrap.org/node/8
Maybe just politics ...






