Andrew Morton recently posted some interesting benchmarks comparing the current 2.4 IO scheduler, a “hacked” version of the deadline IO scheduler in 2.5.61, the CFQ scheduler, and the anticipatory scheduler. Offering a succinct “executive summary” of his results, Andrew said, “the anticipatory scheduler is wiping the others off the map, and 2.4 is a disaster.” Indeed, in many of the tests the other IO schedulers were measured in minutes, whereas the anticipatory IO scheduler was measured in mere seconds. Read the report at KernelTrap by Jeremy Andrews.
If there’s one place Linux truly shines, it’s its schedulers, and by the looks of this patch, things are only looking up in 2.5.
I did find this interesting, however:
Andrea Arcangeli responded to these tests pointing out that they do not in any way highlight the benefits of the CFQ scheduler, which instead is designed to maintain minimal worst case latency on each and every IO read and write. Andrea explains, “CFQ is made for multimedia desktop usage only, you want to be sure mplayer or xmms will never skip frames, not for parallel cp reading floods of data at max speed like a database with zillon of threads.” This led to an interesting discussion in which Andrew suggested that such programs employ a broken design which should be fixed directly, rather than working around them in the IO scheduler.
I have noticed that Linux keeps getting tuned primarily for desktop use (e.g. preempt patches), and there isn’t quite as much attention on raw throughput. The last time I ran DBench on an SMP system with FreeBSD 5.0/UFS2 vs. Linux 2.4/XFS, FreeBSD was winning by a considerable margin, despite the relatively incomplete state of SMPng.
This seems to be Andrea’s concern… and I’m curious whether it’s warranted. Sometime I’ll have to download a 2.5 kernel, run DBench on it, and compare the numbers to FreeBSD.
If IBM and SGI really want to put Linux on their big iron, I think I/O throughput should be a major concern. I’m hoping Andrea will see to it that this is carried out properly.
Well, if you do get around to running DBench on 2.5 I’d definitely like to see the results. Any chance of also factoring in FreeBSD 4.7 (or is 4.8 out?) so that we get a broad enough analysis of the differences?
As for the schedulers, it might be an idea to include both and let one or the other be compiled in by the distro. That way, say, Red Hat AS could compile in the anticipatory scheduler, while Lindows or some other distro aimed more at the desktop could use CFQ, as in the sketch below.
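To make that concrete, the choice could be exposed as an ordinary kernel build option. The config symbols below are purely illustrative (nothing like them exists in 2.5.61 today); they are just meant to show how a server distro and a desktop distro might diverge:
# hypothetical .config fragment for a server-oriented distro kernel
CONFIG_IOSCHED_AS=y           # illustrative name: anticipatory scheduler built in
# CONFIG_IOSCHED_CFQ is not set
# ...and a desktop/multimedia distro might flip it the other way:
# CONFIG_IOSCHED_AS is not set
CONFIG_IOSCHED_CFQ=y          # illustrative name: CFQ for low-latency audio/video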
The last time I ran DBench on an SMP system with FreeBSD 5.0/UFS2 vs. Linux 2.4/XFS, FreeBSD was winning by a considerable margin, despite the relatively incomplete state of SMPng.
When did you run this benchmark? With 5.0-RELEASE or with -CURRENT? I’m just curious, because I’ve noticed -CURRENT has been getting a lot better lately.
If you run 5.0-CURRENT, there’s a new scheduler you can use. -RELEASE does not have it afaik. You must add the following to your kernel config.
options SCHED_ULE
It’s still in the development stages, like many other things in 5.0, but it might be worth benchmarking.
The new scheduler “demonstrates processor affinity, HyperThreading and KSE awareness”
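For anyone who wants to try it, the rebuild is the usual custom-kernel dance; this is only a rough sketch, assuming 5.0-CURRENT sources in /usr/src and a config file named ULE (the name is my own) copied from GENERIC:
cd /usr/src/sys/i386/conf
cp GENERIC ULE
# edit ULE: remove the existing scheduler option (SCHED_4BSD in GENERIC, if I recall
# correctly) and add 'options SCHED_ULE'
cd /usr/src
make buildkernel KERNCONF=ULE
make installkernel KERNCONF=ULE && reboot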
Cheers,
-JD-
Sorry for the lack of details. I dug my results out of an old post. Here they are:
DBench [ http://samba.org/ftp/tridge/dbench/ ] numbers for Linux 2.4.20 (XFS) versus FreeBSD 5.0-RELEASE (UFS2), on a dual 1.53GHz Athlon MP system with 512MB RAM, IBM UltraStar 18ZX. Tests were conducted in single user mode after a fresh boot.
Linux:
Throughput 17.5915 MB/sec (NB=21.9893 MB/sec 175.915 MBit/sec) 64 procs
FreeBSD:
Throughput 31.9033 MB/sec (NB=39.8792 MB/sec 319.033 MBit/sec) 64 procs
Obviously not a very extensive test… also, DBench has drawn criticism regarding its validity as a real-world test of I/O performance; see http://mail.nl.linux.org/linux-mm/2001-07/msg00126.html
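For anyone wanting to reproduce or extend the numbers above, dbench itself is trivial to run. Something like the following, assuming dbench and its client.txt load file are installed and the filesystem under test is mounted at /mnt/test (the mount point is just an example):
cd /mnt/test      # run it on the filesystem you actually want to measure
dbench 64         # 64 simulated clients; prints a "Throughput ... MB/sec" line at the end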
“DBench [ http://samba.org/ftp/tridge/dbench/ ] numbers for Linux 2.4.20 (XFS) versus FreeBSD 5.0-RELEASE (UFS2), on a dual 1.53GHz Athlon MP system with 512MB RAM, IBM UltraStar 18ZX. Tests were conducted in single user mode after a fresh boot.”
I do not doubt that FreeBSD does indeed have a lead here, but Linux is set up with a journaled file system, whereas FreeBSD is not (correct me if I’m wrong, but I think UFS2 is not journaled).
Having this done with Linux and ext2 would perhaps be more interesting.
Sorry for the lack of details. I dug my results out of an old post. Here they are:
Would that be the magical Athlon MP that takes ten times as long as a comparable system to launch javac? Pardon me if I disregard your results… @_@
Needless to say, I’ve got a more comprehensive set of results which tells a completely different story, especially on multi-spindle machines. I’ll post them when I’m in the office next week.
I have noticed that Linux keeps getting tuned primarily for desktop use (e.g. preempt patches)
Uh, no.
Lots of work has been done on the desktop end of the spectrum, but many of those things (e.g. preempt) are compile-time options, and plenty of work has been done on the high end too, starting with a massive block IO rewrite early in 2.5.x.
Andrea is doing the opposite of what you think: the CFQ scheduler is something different from the anticipatory scheduler. It [CFQ] is designed to minimise latency on disk IO operations and is useful for multimedia applications. The anticipatory scheduler is a shot at maximizing throughput and isn’t tuned for interactive response.
Whups.
I do not doubt that FreeBSD does indeed have a lead here, but Linux is set up with a journaled file system, whereas FreeBSD is not (correct me if I’m wrong, but I think UFS2 is not journaled).
Uhm… UFS2 uses SoftUpdates (if enabled) instead of metadata logging, so the comparison is indeed valid, since the functionality is almost the same. See http://www.usenix.org/publications/library/proceedings/usenix2000/g… for a discussion of the merits of the different techniques.
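For what it’s worth, SoftUpdates is a per-filesystem flag toggled with tunefs. Roughly like this, with the device name being only an example and the filesystem unmounted (or the box in single-user mode):
umount /dev/ad0s1f               # example device; you can't tune a mounted filesystem
tunefs -n enable /dev/ad0s1f     # turn SoftUpdates on for that filesystem
mount /dev/ad0s1f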
Please try back later. This server is currently slashdotted. (If you’re looking for the anticipatory scheduler benchmarking story linked from Slashdot, a static copy can be viewed here.)
http://www.kerneltrap.org/node-592.html
Not to diss FreeBSD or anything, which, even though I’ve never used it, I regard as a top-class OS (I like to keep well read), but is it altogether a fair comparison to pit a brand new release (FreeBSD 5) against what is basically a kernel that hasn’t had any major changes in it since the rmap VM was ripped out?
Now, I’m not saying there hasn’t been a fair bit of work done on 2.4 since then, but no major changes to the guts have happened (that I know of).
As Phil says, a lot of work has been done on the high end as well: block I/O, VM, NUMA. The great thing about BitKeeper is that you can see through the web interface all the patches that have been applied.
Here’s the link for the last 48 hours:
http://linus.bkbits.net:8080/linux-2.5/ChangeSet@-2d?nav=index.html
How about a sysctl to switch schedulers on a running Linux system? It would be nice to have 3 or 4 specialised schedulers and let the user choose whether they’re a high-bandwidth server today or a low-latency multimedia box tomorrow.
I prefer this option to one jack-of-all-trades scheduler.
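Something purely hypothetical along these lines (no such knob exists in 2.4 or 2.5 today; the name is made up just to show the idea):
sysctl -w vm.io-scheduler=cfq    # hypothetical knob: low-latency multimedia box today
sysctl -w vm.io-scheduler=as     # hypothetical knob: high-bandwidth server tomorrow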
I ran ubench on Solaris, Linux, and FreeBSD on the same system. FreeBSD performed the worst but gained the most from SMP, and Solaris did the best but gained the least from SMP…
I’d like to add that Solaris was still the best, Linux still in the middle, and FreeBSD still the worst when only using one CPU.
(This was with a 2.4 kernel, FreeBSD 4.2 or thereabouts, and Solaris 8.)
So I don’t know how far I’d trust these benchmarks…
Linux:
Throughput 17.5915 MB/sec (NB=21.9893 MB/sec 175.915 MBit/sec) 64 procs
FreeBSD:
Throughput 31.9033 MB/sec (NB=39.8792 MB/sec 319.033 MBit/sec) 64 procs
This is expected and known. Linux 2.4’s IO scheduler is quite dumb, and isn’t anything like optimal for either throughput or latency. Those of you who like beating up Linux for its poor NFS performance will already know this; the primary cause is the same: poor IO throughput.
Hence the IO scheduler has already been replaced once in Linux 2.5, and it looks like it will be again.
I also think it’s a slightly unfair test given that Linux 2.4.20 is a solid, stable release and FreeBSD 5.0 most certainly is not at this time. It may say -RELEASE on the box, but there’s as much heavy and invasive development work going on inside it as there is in Linux 2.5. The -RELEASE tag is just a checkpoint to say ‘we’ve added enough new features, now let’s finish them off and fix the bugs until it can become -STABLE.’ Which, coincidentally, is about where Linux 2.5 development is at, given that it’s meant to be in feature-freeze. No-one in their right mind is using FreeBSD 5.0 on production servers, but there are plenty using Linux 2.4.20.
If you want to compare apples to apples, and oranges to oranges, compare Linux 2.4.20 to FreeBSD -STABLE or compare Linux 2.5.61 to FreeBSD 5.0. FreeBSD -STABLE is likely to comfortably win at dbench throughput against Linux 2.4.20 anyway, if that’s the point you’re trying to make. I wouldn’t put bets on FreeBSD 5.0 vs Linux 2.5 however.
I have been running Debian for a year (now a testing one). I use xmms to listen to my MP3s, and each time I do something heavy on the system (apt-get update to refresh the package listing, launching Mozilla, etc…), xmms skips some frames.
Can this kind of issue come from the scheduler (I use a 2.4.20 kernel)? Can I use a 2.5.* kernel without too many problems for desktop/multimedia use?
Everyone seems to have a bone to pick with the benchmark I posted.
Well let me say this: You’re all completely right. It’s a pointless, meaningless benchmark.
It’d be much more interesting if I benchmarked a Linux 2.5 kernel against 5.0-CURRENT.
Some people were just curious about a comment I had made in a previous post, and I decided to give them details.
That is indeed the scheduler.
You can cure the symptoms though. I run KDE and renice the xmms and the artsd processes to -20. I have never experienced it missing a beat after that.
If you don’t run KDE then the sound daemon will be something other than artsd. You do need to be root when you increase a process’s priority.
You can of course do this to all applications that you don’t want to lag. I don’t know if debian does it, but Red Hat also runs X at a fairly high priority.
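Concretely, that is just a couple of renice calls as root; the pidof lookups assume the processes are already running:
renice -20 -p $(pidof xmms)      # highest priority (lowest nice value) for the player
renice -20 -p $(pidof artsd)     # and the same for the KDE sound daemon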
(This was with a 2.4 kernel, and freebsd 4.2 abouts and solaris 8)
FreeBSD 4.7+ runs a lot better than 4.2, except when it comes to SMP. Of course, that’s what FreeBSD 4.x is lacking, but it’s being worked on in the 5.x series.
I wouldn’t put much stock in those dbench benchmarks as a measure of disk I/O throughput. The kernel developers usually treat it as a load-testing tool more than anything else, which should tell you a lot about how much faith they put in it as an I/O-benchmarking tool.
For example, when Linus ripped out the VM in 2.4.10-pre and hammered in Andrea Arcangeli’s in its place, I remember Andrew Morton noting a twofold improvement in dbench throughput with the new VM. Nobody believed that disk I/O had magically become twice as fast. Which goes to show that dbench is quite a quirky beast, liable to be disproportionately affected by seemingly unrelated stuff.
Can this kind of issue come from the scheduler (I use a 2.4.20 kernel)? Can I use a 2.5.* kernel without too many problems for desktop/multimedia use?
I’d suggest you apply Robert Love’s kernel preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml/preempt-kernel/v2…
It makes a lot of difference wrt interactive usage.
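Roughly, applying it looks like this; the patch filename is illustrative, and you then turn on the new “Preemptible Kernel” option (CONFIG_PREEMPT) when configuring:
cd /usr/src/linux-2.4.20
patch -p1 < /path/to/preempt-kernel-rml-2.4.20.patch   # filename is illustrative
make menuconfig        # enable 'Preemptible Kernel' under Processor type and features
make dep bzImage modules modules_install               # then install and reboot as usual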
Also make sure your disks are using DMA transfer mode.
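Checking and enabling DMA on an IDE disk is an hdparm one-liner; /dev/hda is just the usual first IDE disk, so adjust to taste:
hdparm -d /dev/hda     # shows whether using_dma is currently on
hdparm -d1 /dev/hda    # switch DMA on if it isn't (needs root)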