OSU Open Source Lab wanted to take the concept of benchmarking a little bit further with the Beaver Challenge 2004. In this competition, a community of experts in each OS will be allowed to tweak their configurations to ensure maximum performance. And these aren’t wimpy machines they are testing on: each is a Dell PowerEdge 2650 with dual 2.8GHz Xeons, 2GB of RAM, and five 36GB U320 disks at 10,000 RPM configured as a RAID0 stripe. The following systems will be tested in the competition: Debian GNU/Linux, Fedora Linux, FreeBSD, Gentoo Linux, NetBSD, OpenBSD, Red Hat Linux, Slackware Linux, SuSE GNU/Linux. Rules and additional information can be found here, and remember this is a community event: all are welcome on the forums and on IRC.
I am assuming that the winners will be asked to disclose what they actually did. If so, then this is good for open source as it will allow the best tweakers to push performance to the limits for everyone.
I like it
Gentoo.
It’ll smoke the competition IMO. :)
Like the Oscars but with the following awards:
* Best desktop
* Best package management system
* Best hardware detection
* Best new (cutting/bleeding edge) hardware support
* Shortest boot time
* Best installer
* Best security response
and so on…
I think this benchmark has been sponsored by Dell and Intel.
Mario,
The Beaver Challenge 2004 is not sponsored by Dell or Intel. The Challenge was solely the idea of some people here at the Open Source Lab and is receiving no financial sponsorship from any outside firms.
Jason McKerr
Operations Manager
The Open Source Lab
Oregon State University
In response to the earlier comment: “I am assuming that the winners will be asked to disclose what they actually did.”
Yes.
The teams will be required to record their tweaks. Once the results for the benchmarks are in, the tweaks and results will be published on http://osuosl.org/benchmarks/bc.
This will be very interesting. I hope it is very comprehensive, and gets a good representation from all communities.
I think a few people might be in for a shock, but as usual they’ll find some way to explain it away.
Looking forward to seeing the results.
Does the award go to the overall fastest final benchmark, or the overall greatest “delta” (improvement) of the tweaks over the baseline install? It seems this would be an important distinction. The first rule favors pre-optimized distros such as Gentoo. You aren’t going to squeeze much out of a baseline P4 GRP install. But the second rule favors default 386 distros. I believe Debian and RH still have default 386 binaries, so recompiling from source might make for a sizeable delta in performance, particularly on a P4, which I seem to recall does not optimize 386 code on the fly.
I was surprised to see two RH-family offerings on the list (Fedora/RH), but no Mandrake.
Re: Gentoo, I meant to say you aren’t going to squeeze much MORE out of a P4 GRP install with tweaks. I’ve found the P4 optimized Gentoo to perform quite well.
Only Linux and the BSDs? I know most other open source OSes are not as mature as those two common families, but… I wish to see benchmarks of other OSes as well.
Is stability going to be taken into consideration or is this just a flat-out speed contest? I am curious to find out what the trade-off is for speed/stability on these systems. Speed is nice, but if it means the system can’t stay up for more than 5 minutes, then I’m not going to be too interested in using their “tweaks” on any machine that I run.
> does not optimize 386 code on the fly
No processor (that I know) optimizes your code.
If you mean the internal reordering of instructions, I don’t think there should be any difference between 386 and 686 code, after all, it’s still the same architecture.
> I wish to see the benchmarks of other OSes as well
Which OSs do you mean in particular?
Hurd comes to mind, but benchmarking Syllable or other hobby OSs would be interesting.
If you want to volunteer, [email protected] .
Speed alone means nothing in a production environment. If you need more out of your server, it is just a matter of getting better/more hardware. Stability and security are far more important. The test should also take these into consideration. What use is it for someone to win this award by adding unstable features and getting rid of security checks?
>> does not optimize 386 code on the fly
>
> No processor (that I know) optimizes your code.
> If you mean the internal reordering of instructions, I don’t think there should be any difference between 386 and 686 code, after all, it’s still the same architecture.
I remember reading somewhere, perhaps on Usenet (I know, I know), that P4 pipeline stalls are more frequent with 386 or 486 code because of how that code is generated. This could easily be tested by comparing the speed gains of 386 vs. 686 optimization on an Athlon vs. a P4. If the P4 indeed has this problem, you’d expect a greater improvement on the P4, given its greater difficulty with 386 code. I think? Lemme dig out that old Athlon XP laptop…
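Something along these lines would do as a quick-and-dirty test: compile the same trivial workload twice, once for 386 and once for 686, and time both binaries on each CPU. This is only a sketch of what I’d try (the loop body, iteration count and flags are arbitrary), nothing rigorous:

/* bench.c - rough 386-vs-686 codegen comparison.
   Build twice and time each binary on each machine:
     gcc -O2 -march=i386 -o bench386 bench.c
     gcc -O2 -march=i686 -o bench686 bench.c */
#include <stdio.h>
#include <time.h>

int main(void)
{
    volatile double acc = 0.0;   /* volatile keeps the loop from being optimized away */
    clock_t start, end;
    long i;

    start = clock();
    for (i = 0; i < 50000000L; i++)
        acc += (double)(i ^ (i >> 3)) * 1.000001;   /* arbitrary integer/FP mix as a stand-in workload */
    end = clock();

    printf("acc=%f  cpu time=%.2fs\n", (double)acc,
           (double)(end - start) / CLOCKS_PER_SEC);
    return 0;
}

If the P4 really does choke on 386 code, the i686 build should show a noticeably bigger relative speedup on the P4 than on the Athlon.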
Last I heard, the SMP code was very immature and was the project of a technical school or something. Does NetBSD have a mature SMP implementation?
OpenBSD is likely at a disadvantage, then. It hasn’t fared well in prior widely publicized benchmarks anyway. A secure, stable and fast OS is a utopia when you think about it. I don’t fault OpenBSD for devoting their energies to the security aspect. The right tool for the right job…
Anyway, this is a great idea. With the usual caveats in mind I look forward to the results!
Cheers
It’s an interesting idea for fun, but I fail to see the point of testing largely the same code base. The only thing of value I could take from this would be to see how the BSDs stack up to Linux. Otherwise it’s a soapbox derby.
In the Linux case I agree: it is almost the same codebase (same GCC, same kernel, etc.).
But in the *BSD case that is not true; the kernels are pretty different.
And yes, it will be interesting to see how the *BSDs will outperform Linux.
Having 2 GB favors BSD. That’s an amount that blocks the small-memory optimizations Linux has, while not taking advantage of the huge-memory features.
Try two boxes, one with 768 MB and the other with 32 GB or more.
Some Linux distros contain custom patches and tweaks. Red Hat kernels were often significantly modified with either development backports or various tweaks.
Despite sharing the same codebase, there is no guarantee that they will perform equally. I would be surprised if there were major differences, but you never know.
I think a lot will depend on who each team has in it, unless they share tips.
I’m not a pro at Darwin myself but I would imagine there is a lot of interest in Apple’s BSD OS.
This was just what I proposed a few days ago, except I wanted to see it in a massive public forum such as Comdex, including Microsoft and Sun.
Darwin x86’s hardware support is quite limited. If it were possible, then I’d agree – throwing Darwin into the mix would be very interesting indeed.
It surely matters between (GNU/)Linux distributions, and not only because of the kernel: the shipped kernel version and patches differ. The (default?) filesystem used, the (default?) compilation optimisations, the compiler used, and much more also matter.
A while ago there was a Mandrake Linux vs. Debian GNU/Linux vs. Gentoo Linux benchmark. The one that performed best in most cases was NOT the optimized Gentoo Linux. Isn’t such an analysis interesting? I think it is.
PS: I’d like to know how Debian GNU/KFreeBSD performs versus FreeBSD, how Debian GNU/KNetBSD performs versus NetBSD, and how both Debian GNU/K*BSDs perform versus Debian GNU/Linux. Even though neither Debian GNU/K*BSD is production ready, neither is GNU(/Hurd).
But then you could say that you are favouring Linux by not making the hardware focus on where *BSD shines. The truth is that each OS is best suited for a niche (e.g. OpenBSD and security) and is geared towards/favours certain hardware. For example, I’m sure NetBSD could increase its performance by adding hacks to its code, but that’s not what they are about. Any hardware choice is a bias. Hell, choosing x86 and not PPC or MIPS or Motorola 68K is a bias. After all, when viewing any benchmark/review, remember that bananas make poor hammers.
Really Mandrake ought to be included.
What would be REALLY interesting is if Microsoft would send a representative to tune Windows Server 2003 and compete in the tests. If they are so confident in their ‘facts’ ( http://www.microsoft.com/mscorp/facts/default.asp ) they should win handily…
josh,
Take a look at the name of this thread:
Benchmark Competition for Open Source Operating Systems
None of Microsoft’s products is open source.
-sathish
I think this is a great idea and I look forward to seeing the results, although a couple of things bother me…
Why isn’t Mandrake included and why does Red Hat get to include both Fedora Linux and Red Hat Linux?
My choices for the Linux distributions would have been:
-Debian
-Gentoo
-Slackware
-SuSE
-Mandrake
-Fedora
For the BSDs:
-FreeBSD
-OpenBSD
-NetBSD
And finally for the last entry (although technically not open source), I think they should all be pitted against those SCO bastards with their weak SCO Unix, so they can see just how crappy their operating system is compared to quality OSes!!!
I’ve seen the Mandrake vs. Gentoo vs. Debian comparison as well, and the results were significantly and surprisingly in favor of Mandrake, IIRC. I also saw a review of Linux vs. Solaris vs. BSD on the same hardware for (again, IIRC) file and web serving, and the Linux distro came out on top, by a sizable margin over Solaris, which trumped the BSD distro. I believe this was in SysAdmin magazine perhaps a year ago.
Up until the review of the Linux packages, I was leaning towards picking up Gentoo. But after reading the results, simply picking up Mandrake seemed like the smartest thing, especially since I simply wanted to get to work and not have to “emerge” an entire system, if that is the correct term.
If these previous results hold true (a big if), one might expect a pre-packaged distro put together by experts and tweaked a bit to trump Gentoo and BSD. We’ll see!
I know at least one of those OSes will use only one processor. Isn’t this like a 100m dash where the non-SMP OSes have their legs tied together? For a real competition, use single-processor hardware, or make a separate test with just one processor. Otherwise (for some) this test has already been lost.
Mandrake most likely can enter if they like… I would think that they would only need to contact the OSU Open Source Lab and ask.
There are only 4 machines, so only 4 OSes can be tested at a time, so I would think that a team will have a machine for a period of time and then, after they are finished, another team gets the machine.
It even says on the site ( http://osuosl.org/benchmarks/bc/methodology/ ) that:
This list is not final and if people want to ante in to try this with their favorite distro, let us know at bc2004 at osuosl dot org or in #beaverchallenge on the Freenode.net IRC network.
> I know at least one of those OSes will use only one processor. Isn’t this like a 100m dash where the non-SMP OSes have their legs tied together? For a real competition, use single-processor hardware, or make a separate test with just one processor. Otherwise (for some) this test has already been lost.
Not at all … it is a benchmark of performance, not security and ease of use.
I’m sure you would say the same for a speed benchmark between a Corvette and a VW Beetle…
Well, you have eight cylinders, so to make it fair, unplug 4 of your spark plug wires… but that wouldn’t take into account the fact that the Corvette is heavier (just as an example).
It will be a great test, with exactly the same hardware. And when looking at the comparison, one will know that OS umptyscratch isn’t SMP-enabled.
Maybe they can run a non-SMP compiled kernel test as well…
DragonFly BSD should really be included too!
The SCSI and Ethernet hardware are also a huge source of trouble. One “fair” method of benchmarking, if there can be such a thing, is to let the participants buy the hardware. Give a timeframe for purchase, money limits, and a selection of places where the purchase may be made. Do this for a few different money limits:
$500
$3000
$18000
This takes care of architecture choice too. Most would choose Intel x86, but Darwin would choose PowerPC and Linux might choose AMD x86-64.
Unfortunately, this contest requires serious funding.
The list of current benchmarks has been added to the methodology page at http://osuosl.org/benchmarks/bc/methodology/ (see the section on Benchmarks).
Please let us know if you don’t like the choices or if more benchmarks should be added.
Thanks!
-Kaite Rupert
The Open Source Lab
Oregon State University
Well, DragonFly isn’t ready for this yet. If this benchmark can still be held in the next 4 to 6 months, then DragonFly should be ready by that time.
SCO’s Juergen Kienhoefer tells us that by mapping clone processes directly onto UnixWare’s native threads, huge performance gains can be realised. “Basically thread creation is about a thousand times faster than on native Linux,” he said. The performance boost could particularly benefit applications such as Domino, according to Kienhoefer. Other gains could be made by using UnixWare libraries, and he reckons that SETI at home shows a 4x improvement over native Linux, as it uses UnixWare’s own maths libraries.
http://www.theregister.co.uk/content/archive/12733.html
SCSI is a huge problem?? What? 2GB is a huge problem?? HUH? That hardware is pretty much the standard for *good* servers. Few servers have >= 32GB of RAM or <= 768 MB. Few servers use IDE as well, although maybe some are using SATA these days.
If a given operating system can’t function well with this hardware, then we’d like to know. That’s the point of this benchmark! Equal ground here, and common hardware. Besides, if Linux is really not optimised for the *very* common RAM amount of 1-2GB, then I’d say that’s a design flaw!
I guess my 2GB server is just… “uncommon”
:)
How much is SCO paying you?
SCSI is a huge problem because there isn’t a single standard SCSI controller. One OS has a great driver for FooMatic SCSI, and the other OS has a great driver for Bar3000 SCSI. Well, which SCSI card should we benchmark with?
The same goes for Ethernet. Is your Tigon3 driver good, or would you prefer Intel parts?
Linux not being optimized for 1 to 2 GB is one way to put it, a very negative way indeed. I can be unkind too; FreeBSD doesn’t perform well on small systems and FreeBSD crashes on really large systems. So there! :) It’s a good thing that Linux has special optimizations for low-memory systems; FreeBSD is lacking this.
> SCSI is a huge problem because there isn’t a single standard SCSI controller. One OS has a great driver for FooMatic SCSI, and the other OS has a great driver for Bar3000 SCSI. Well, which SCSI card should we benchmark with?
If you are worried about these kinds of performance issues, then you are interested in a benchmark for the wrong reasons. Constant-time improvements due to driver code really are irrelevant. What *is* interesting is how the different OS designs and algorithms work. Is one algorithm constant and another O(n)? How does the OS handle load? SMP?
Besides, since you are worried about Linux performance vs. BSD, consider this. The BSD license allows BSD drivers to be integrated into Linux. So, if BSD really has drastically better SCSI drivers, it’s only the Linux developers’ fault that Linux does not have them. They can simply port those great SCSI drivers. And if they *can’t* port those drivers due to kernel architecture, then the fact that the Linux drivers are slow may reflect the kernel architecture and is therefore valid criticism of Linux.
And as for over 32GB of memory: again, x86, which is what we are benchmarking here, has no real support for over 4GB. Sure, there are hacks, but those are not at all common. And as for small amounts of memory, it’s not relevant for most normal servers, which is what we care about here. This isn’t an “embedded OS” benchmark, nor is it a “64-bit CPU based” benchmark.
Besides, why are you so concerned with Linux’s performance? The benchmark hasn’t even been done yet! Are you *that* worried that Linux will somehow be far behind that you need to preemptively attack the benchmark?
Finally, your solution of buying hardware is just silly, if I may say so. The benchmark would then become “who can choose fast hardware for a low price” and “which hardware is faster”. Software benchmarks are meaningless unless they are run on the same hardware.
A benchmark of OSes including Syllable, SkyOS (5.0 if possible) and maybe Plan 9 (I dunno if it supports SMP, though) would be interesting.
If it weren’t an open source OS benchmark, Windows, Solaris, UnixWare, BeOS and several others would be included.
For a benchmark on a single processor I wish MenuetOS were included as well, to see how much an OS developed fully in assembly language outperforms OSes developed in C or C++.
Just my thoughts. If you don’t like my ideas, please just ignore this comment.
> SCSI is a huge problem because there isn’t a single standard SCSI controller. One OS has a great driver for FooMatic SCSI, and the other OS has a great driver for Bar3000 SCSI. Well, which SCSI card should we benchmark with?
> The same goes for Ethernet. Is your Tigon3 driver good, or would you prefer Intel parts?
Well, why stop there? How about chipset drivers etc.
You pick a common platform and you benchmark on it.
And in response to your BSD trolling: if they have a better BSD driver, go ahead and integrate it.
> SCO’s Juergen Kienhoefer tells us that by mapping clone processes directly onto UnixWare’s native threads, huge performance gains can be realised. “Basically thread creation is about a thousand times faster than on native Linux,” he said.
That’s just amazing: thread creation on my Linux PIII-1000 here takes 31920 nanoseconds (it’s an SMP kernel; UP should be faster). That is for full POSIX thread creation, not just a clone, mind you.
So this Juergen chap reckons he’s got it down to about 32 nanoseconds. Hmm… 32 clock cycles, eh? Well, that’s very, er, “impressive”. This clown^Wnice fellow is a wonderful diplomat for his company. It brings a tear to my eye to see him doggedly standing up for the honor and values of his company against all facts and reason. Juergen… Did you ever know that you’re my hero…
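For anyone who wants to check my number on their own box, something like the loop below is roughly the sort of measurement I mean. It is only a sketch (the iteration count is arbitrary, and it times create+join rather than bare creation, so it slightly overstates the cost):

/* tcreate.c - rough pthread creation cost; build with: gcc -O2 -o tcreate tcreate.c -lpthread */
#include <pthread.h>
#include <stdio.h>
#include <sys/time.h>

#define ITERATIONS 1000

static void *empty_thread(void *arg)
{
    return arg;   /* do nothing; we only want the create/join overhead */
}

int main(void)
{
    struct timeval start, end;
    pthread_t tid;
    double elapsed_us;
    int i;

    gettimeofday(&start, NULL);
    for (i = 0; i < ITERATIONS; i++) {
        if (pthread_create(&tid, NULL, empty_thread, NULL) != 0) {
            perror("pthread_create");
            return 1;
        }
        pthread_join(tid, NULL);   /* join so threads don't pile up */
    }
    gettimeofday(&end, NULL);

    elapsed_us = (end.tv_sec - start.tv_sec) * 1e6 + (end.tv_usec - start.tv_usec);
    printf("%.0f ns per create+join\n", elapsed_us * 1000.0 / ITERATIONS);
    return 0;
}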
And Bucky mate, while you’re waiting for their next release of UnixWare, don’t forget to pay this nice company $699 for your continued use of Linux.
Though the internal 10K RPM drives are U320, the Dell 2650 is probably using the on-board PERC 3/Di hardware RAID card, which is just U160. That doesn’t matter for this test, since all are on the same platform, but it might be useful if you are comparing to other performance numbers…
>> The same goes for Ethernet. Is your Tigon3 driver good, or would you prefer Intel parts?
>
> Well, why stop there? How about chipset drivers etc.
>
> You pick a common platform and you benchmark on it.
I don’t see why. Do you normally buy some random piece of hardware before you choose the OS to use? I hope not. It is normal to choose the budget first, then pick an OS-hardware combination that works well together. (Let Windows compete too, with the retail OS cost coming out of their budget!)
If you pick a common platform for benchmarking, then you’re partly examining hardware support. That’s a fine thing to test, but it shouldn’t be mixed up with general performance. Testing for hardware support is best done differently, with a variety of potentially troublesome machines:
eMachines desktop
HP 8-way Xeon server
Toshiba laptop
Sony laptop
Dell workstation
IBM server
…
It’s not just that I worry Linux may be at a disadvantage. It could go the other way too. If so, the BSD fans may rightly complain that they would have done better on other hardware. Victory would not be as solid.