OSDL has been running performance tests with hackbench to measure the improvement of the scheduler, compared to Linux 2.4.18. A write-up of these results with graphical plots is posted here.
The result is clear… consistent improvements. I’m so happy!
I have no idea what those graphs mean, but 2.6 looks impressive =p
Who says Linux can’t scale? My god, can you imagine 3.0?
😀
I think the best way to test this is to try it yourself. I’m running test-10 at the mo’, it’s looking fine. KUTGW!
This is probably the best thing usability-wise to come to Linux so far. There’s nothing worse than skipping audio or mouse cursor.
“There’s nothing worse than skipping audio or mouse cursor.”
Sure there is, BOTH could be skipping
I just hope it keeps improving. I’m really not in the mood to upgrade my hardware to run Longhorn or whatever. I need to use this box for another 4 years. Oh, it’s an Athlon. Don’t get excited.
Whatever they did to the process scheduler has made for some impressive performance, but I thought it was an O(1) scheduler; those graphs make it look more like O(log n), not that that’s shabby or anything.
Are those results using glibc with NPTL or Linuxthreads (On 2.6. NPTL isn’t available for 2.4 AFAIK, no TLS)? Or does the test not use threading?
I only ask because NPTL is supposed to be MUCH better than the Linuxthreads solution especially when dealing with a mutliprocessor system. I guess I have difficulty in accepting the huge improvements are solely due to the scheduler.
Excellent question – I’m sure that the threading libraries will make a significant difference. I must remember to dig up the PDF I downloaded that shows the relative performance of LinuxThreads, NGPT and NPTL on single and multi-CPU systems.
So, I have to ask : How does this compare to other platforms?
I’m most interested in data on Win/2K/XP/2K3 and FreeBSD and Solaris (Sparc and x86)
Yes, it’d be nice to see it compared with a BSD system!
Some other test here on osnews show nat especialy FreeBSD performs good.
Nice to see some hardcore evidence showing how Linux is improving. Now if they were only to ditch X, agree on an audio standard and file structure, then I might look back at it again…
…show nat especialy…
show that especially…
The vanilla 2.6-test6 is not working as well as 2.4.23 with lowlatency+preempt patches for audio. I’m getting the occasional xrun with 2.6 if I play back a load of tracks and mess with plugins, or drag windows around. Has anyone got better results with a more recent 2.6 kernel?
You also still have to be root to get SCHED_FIFO; softRR does not appear to have made it into 2.6.
@Bustarhymes
“So, I have to ask : How does this compare to other platforms?”
Have a look at this. It compares mmap, fork etc., so it is not the same benchmark, but it does have results from Linux 2.4, 2.6 and a few BSDs.
http://bulk.fefe.de/scalability/
“Sure there is, BOTH could be skipping”
My OR was inclusive, not exclusive
I agree with BustaRhymes, how does this compare to other platforms?
Without this comparison, these results are meaningless.
Truly they are; they should be compared with FreeBSD and the commercial Unixes.
No, it’s definitely O(1). If you look at the graphs, 2.6 is a straight line, and as you get out towards infinity the line is basically flat. That’s O(1). If it were O(log n) it would start off steep and then level off, but it’s always the same slope. On the other hand, 2.4 looks almost like O(n^2) to me. Either way, we have made MAJOR improvements. Kudos to the kernel team.
Actually it should be compared with SCO unix, because as we all know they are the premier UNIX on Intel :B
I don’t think these results are meaningless. They show that Linux has improved over itself, and could now run loads that would have killed it before. Considering that these tests were meant to show that the kernel has improved, and they show that, I think these results hold plenty of meaning.
It WOULD be very interesting to see 2.6 and 2.4 compared to other *nixes (I’d like to see how they scale, especially Windows), but that was not something this test was trying to show. For its purpose, it’s fine.
I have compiled several of the test kernels / -mm patchsets. The latency is different with each test kernel. For me, test6 was a step back from the previous test kernels. The Morton (-mm) patchset (http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/) helps test6. test9 was an improvement over versions prior to test6.
test9 and test11, for my system, have lower latency than any of the 2.4.x kernels + (Kolivas OR lowlat and preempt) patches. Unlike the patched 2.4.x kernels, test9 and up even allowed the virtual terminals to be switched without skips in audio!
A question for you: I found the 2.4.18 and .19 + lowlat and preempt to have lower latency than the 2.4.20, .21, and .22 + lowlat and preempt kernels. Was 2.4.23 + patches an improvement over .20 – .22?
It’s always great to see this kind of progress in any project.
Does Linux support different priority levels for processes and threads?
You have to note that these curves are not the scheduler latency. Instead, it’s runtime per process vs. the number of processes running. As more processes share the CPU, the runtimes *should* be getting longer. Ideally, doubling the number of processes will double the runtime for each process, because each process gets half the CPU. So with a perfect O(1) scheduler, your curve will be perfectly linear. 2.4 shows some inefficiencies in the scheduler, because there is a lot of curvature in its graphs. 2.6, on the other hand, is very linear.
“Does Linux support different priority levels for processes and threads?”
Yep, AFAIK. It wouldn’t be very good if it didn’t…
To me the amazing thing about these graphs is how straight the lines are in 2.5. Probably a lot of it is due to the O(1) scheduler… (Maybe they explained that in the text…)
“Does Linux support different priority levels for processes and threads?”
In Linux processes and threads are the same thing except that threads share memory and processes don’t.
You can set the priority of a process manually with the nice() command, but ideally you shouldn’t have to.
The Linux kernel has code to separate processes that need to react quickly (low latency) from processes that take a lot of CPU time. This code had to be completely reworked when the O(1) scheduler was included. It also had to be reworked because there was a huge emphasis on low latency in the 2.5 kernel. People want the user interface to still be responsive while compiling a kernel in the background etc.
I’m running test9, but I don’t notice that much of a difference over 2.4.22 with the pre-empt patches. I can still get audio glitches while doing heavy 3D work in Houdini, but I’m thinking that’s more to do with Nvidia’s driver?
I am on a dual-processor machine, which I would’ve thought would help alleviate such problems.
I don’t know why you would like X ditched. X is actually a killer feature of the Linux desktop. It’s very nice to be able to control remote machines as if you were sitting in front of them, or use it in thin-client situations. Not to mention that it gives you similar facilities to Windows XP’s rapid desktop switching, except that in Linux this feature works over a network.
Linux’s value for the corporate desktop would greatly diminish if X were ditched. I can’t see why we should abandon a good and widely used standard just because XFree86 isn’t as well implemented as one would wish.
As for sound, Linux 2.6 brings ALSA as the standard on the low-level side. But I guess you were thinking of sound servers like aRts or esound that can mix sound from different applications and work well over a network. If that was the case, I truly agree.
X is extremely slow. It should be rewritten to allow local video buffers to be accessed locally and remote ones remotely. Performance should get higher priority than network transparency.
When people speak of low latency, what part(s) of the OS are they referring to? Exactly what is effected and how does this make for an overall better system?
Also, does Linux or any other OS (BSD, Windows, Solaris) support the idea of prioritized I/O (memory, Disk, Network, etc)? I would think that there would be a benefit to allow users/administrators to set different priority levels for different kinds of operations. For example, giving one process a higher priority over disk I/O then another. Does this type of thing exist? Is it necessary?
If it was O(1), I’d expect it to be constant no matter how many processes were on the system, but instead as the number of processes rises, so does the time it takes to schedule them. Yes it does look like a straight line, but it’s still increasing, which sounds more like O(n) to me. But then I’m probably just not understanding the benchmarks or something…
Err, folks. Stop confusing algorithmic scalability (which Fefe measured) with CPU scalability, which is what is being measured in this benchmark.
I’m sure the new scheduler has played a significant role in the performance improvements, but I need to ask if there are other areas of the OS that may also contribute to the improvements in the test results. Since the test being run is a network app, is it not possible that any new networking code may also contribute to the completion times of each test cycle?
There doesn’t seem to be a significant difference between the 4 and 8-way system in the Observation #2 test. I’m curious to know at which point the difference would be negligible and where you would see diminished returns (I know that this is also a function of the hardware).
Also, I would like to know what the exact cause of the failures were in the Observation #3 tests. The 8-way machine had 8GB of RAM and plenty of disk space for virtual memory, so why the memory error?
X is a protocol. It is neither fast nor slow. XFree86 is an implementation of that protocol. XFree86 has used local buffers for local video display for a long time. It also does not require that you use TCP/IP to connect; normally you use either shared memory or a Unix domain socket. XFree86 feels slow due to widget sets doing things in a less than optimal way. People are currently working on different ways to fix that.
>>>>>
X is extremely slow.
<<<<<
That’s interesting. Do you have any hard data?
How exactly have you meassured this? Could it possibly be that it isn’t X that’s slow but the desktop environment you are using?
Does performance increase if you use, say fvwm2?
>>>>>
It should be rewritten to allow local video buffers to be accessed locally and remote ones remotely.
<<<<<
Isn’t that what DRM (Direct Rendering Manager) and DRI (XFree86 Direct Rendering Infrastructure) is for?
To quote from http://dri.sourceforge.net/cgi-bin/moin.cgi/
>> The Direct Rendering Infrastructure, also known as the DRI, is a framework for allowing direct access to graphics hardware in a safe and efficient manner. <<
“When people speak of low latency, what part(s) of the OS are they referring to? Exactly what is effected and how does this make for an overall better system?”
Latency can be dealt with in the kernel and in userland. Kernel ~ allowing interruption of system I/O (preemption)
Userland ~ making sure important functions (e.g. writing recorded audio to disk) are done before less important functions (e.g. drawing the scrollbar)
Briefly, latency work affects audio and GUI responsiveness because processes share the CPU in a more responsive way. The downside is that CPU- and I/O-intensive tasks take longer because the CPU is shared more.
The audio community has been pushing low latency for some time. This link is more comprehensive:
http://www.linuxdj.com/audio/lad/resources.php3
“Also, does Linux or any other OS (BSD, Windows, Solaris) support the idea of prioritized I/O (memory, Disk, Network, etc)? I would think that there would be a benefit to allow users/administrators to set different priority levels for different kinds of operations. For example, giving one process a higher priority over disk I/O then another. Does this type of thing exist? Is it necessary?”
Briefly (because “I do not know”), the XFS filesystem (Linux and SGI/IRIX) has a real-time feature allowing certain files (not processes? The SGI/IRIX version allows for processes) to have guaranteed I/O. Linux XFS has real-time subvolumes (for files) (http://oss.sgi.com/projects/xfs/todos.html) whereas IRIX XFS has Guaranteed Rate I/O (GRIO) (http://www.sgi.com/software/xfs/overview.html#guaranteed).
Perhaps you all are right, but still. I’m using straight Motif widgets and comparing to W2K (dual boot). I honestly don’t care about all the technologies available (i.e. DRI, DRM, renderers, etc.) if they aren’t being used.
And a point about widgets/toolkits: they only exist because the default Motif widgets suck.
Now – the protocol.
You would be right if the protocol was designed well, but it isn’t. It has some brilliant ideas behind it, but it is slow by design.
1) People are confused here over what “scalability” means. Scalability doesn’t just mean that the operating system “detects” that there are x number of CPUs installed, but whether or not it is able to spread tasks over them in a “fine-grained” manner.
With each CPU added, all things being equal, there should be an equivalent “boost” in throughput if the operating system is fine-grained. However, if it isn’t, you will see what happens to the 2.4 series once it hits 8 CPUs. There are increasingly diminishing returns for each CPU, and in some cases, as with FreeBSD, adding a CPU can actually have a negative result.
People here point to SGI and their 512-CPU monster; the fact remains that this “box” is specialised and thus only actually uses a very small subset of the whole kernel: the core, a few drivers and a couple of services. As long as they’re tuned, that’s it. If one were to use it for something more “general purpose” with a much larger array of features required, you would quickly find the performance hit would be high, and as a result it would perform worse than Solaris on the same “large” configuration (using SPARC64).
2) People here are surprised over these results. As I said previously, the Linux community has NEVER lacked people who are technically focused. What it does lack are people who are “user” focused, that is, people who can put themselves in the place of the user and work out what needs to be done to make something not only “technically” elegant but also easy to use for the end user.
As I also said previously, IBM would have been much better off working WITH existing projects and porting their middleware to Linux. The fact remains, scalability would have improved even without IBM. What the Linux community CAN’T do is get ISVs on board. Had IBM said that it would spend $500 million on porting their whole middleware (desktop and server) natively to Linux, then today we could possibly see Adobe and Macromedia considering it.
IBM and other “contributors” should stop trying to duplicate things and actually start working on the parts that are either not getting fixed or are neglected. Better X drivers would be one; more ISVs for the desktop, a unified HIG for desktops, a standardised “look ’n feel” for GTK, Qt and other toolkits, improvements to X synchronisation and scalability (IIRC, X isn’t multi-threaded, however, I may be wrong).
SUN actually SEES the problems and has actually done something about them. They SAW the technical limitations of Xft/Fontconfig and as such created STSF. MAS is supported by SUN as a way of replacing the numerous sound servers with something endorsed by the X.org consortium. These are important things; scalability, on the other hand, can improve without the input of the IBMs of the world.
IBM contributes as much to Linux as SUN does, if not more. I think it is unsafe to put all your eggs in one basket. IBM has decided to focus on the kernel side of things while SUN and co. focus on other aspects of open source.
As it stands today, SUN is unclear about its business strategies regarding Linux, or so it seems based on their love-hate relationship with it. IBM, on the other hand, has shown open and public commitment to Linux in terms of manpower and funds, and has even stood up to the legal threats beclouding the state of Linux development at present.
IBM’s contributions to Linux have been valuable and appreciated, and so have SUN’s. But one can’t help but observe SUN’s disoriented, unclear and unfocused disposition in their dealings with Linux and open source in general.
I mean, the Java Desktop environment is an embarrassment. And if these are the products we are to expect from a supposedly experienced and professional commercial Unix vendor, then I’d much rather place my bets on IBM and their quality contributions to the kernel.
SUN’s contributions are welcome, but they could do more in terms of complete commitment and quality offerings. Not half-arsed attempts, as seems to be the case. And definitely not products like the Java Desktop Environment. Come on, even stock GNOME is better than that crap.
>>>>>
I’m using straight motif widgets and compare to W2K (dual boot).
<<<<<
Why do you want to compare a toolkit to an entire operating system?
If you want to compare, then do it fairly and compare Motif against the WinAPI or against MFC.
>>>>>
I honestly don’t care for all technologies available (i.e. DRI, DRM, Renderers, etc.) if they aren’t being used.
<<<<<
Unless you are using some quite archaic (or historic) XFree86 version, your system should make use of those extensions.
>>>>>
And point about widgets/toolkits. They only exist because default Motif widgets suck.
<<<<<
About Motif and performance: at the point in history when The Open Group took Motif out of the ashes, Microsoft was still using its non-graphical 16-bit extension to an 8-bit operating system.
Well, a new implementation of the X protocol is being done at freedesktop.org, which aims to be a hell of a lot faster and leaner than XFree86, with real transparency etc.
Can’t wait!
Yesh. XFree86 is actually extremely fast. Just do the freaking benchmarks if you don’t believe me. It uses shared memory and very fast IPC locally. The new NX implementation makes things extremely fast remotely.
The problem is that X architecture makes writing applications that *feel* fast more complicated. A number of different components (the window manager, the toolkit, the X server) all have to synchronize precisely for things to appear fast.
The problem isn’t so bad for something like window expose handling. In Qt/KDE (3.2), it is sometimes faster (resizing one Konqueror window above another gives you no visible redraw on my 2GHz P4, while resizing one IE window above another gives you a very bad lag while the window below repaints), and sometimes it’s slower (Kopete doesn’t redraw as fast as AOL AIM).
Window resizing is another story. For simple windows (the KDE search dialog) you can’t see any redraw. For complex windows (Konqueror) redraw can be very noticeable. This is definitely a synchronization problem. For a while, the new kwin in CVS made KDE extremely slow to use. At best, resizes became slow and laggy like gtk2 apps, and at worst the user lost complete control of the mouse for several seconds. After a single patch, the problem went away, and now the new kwin is almost as fast as the old one. BTW, this problem is being directly addressed in the new freedesktop.org X server. Eventually, it’ll do like OS X and explicitly synchronize everything.
Um, Motif isn’t the default toolkit. It’s missing some critical performance features, like double buffering, etc.
Is the main problem really the X protocol or the server?
In Win32 I can make a GUI app that is not very responsive and redraws everything in my display buffer, or I can write an app that is very responsive and will only redraw the parts of my display that need redrawing. My point is that some of the performance problems may actually be caused by the applications themselves, and not the underlying protocol or X server. Unless you have hard data (profiler results) that clearly identify the X server as the source of the problems, it is probably not a good idea to make such assertions.
Is XFree86 Multi-threaded?
From the profile data collected by the X developers, the problems are in the toolkits.
X isn’t multithreaded, but it’s just the graphics portion of the GUI. There was a project that wrote a multithreaded X server (one of the developers was Keith Packard). However, they found that there was no performance gain, because the server spent most of its time serialized holding the framebuffer lock.
X apps, on the other hand, can be multithreaded. This was a problem before, but most apps have fixed it now. Even apps that aren’t multithreaded often don’t suffer from unresponsive behavior, because they use asynchronous I/O. For example, all KDE apps use KIO, which has a completely asynchronous API, to do file handling.
“IBM contributes as much to Linux as SUN does, if not more. I think it is unsafe to put all your eggs in one basket. IBM has decided to focus on the kernel side of things while SUN and co. focus on other aspects of open source.”
And who is going to concentrate on getting the two most important things going: ISV support and better drivers? It seems that the hardest problems are being deliberately ignored and instead companies are working on the sexy but not entirely necessary parts.
“As it stands today, SUN is unclear about its business strategies regarding Linux, or so it seems based on their love-hate relationship with it. IBM, on the other hand, has shown open and public commitment to Linux in terms of manpower and funds, and has even stood up to the legal threats beclouding the state of Linux development at present.”
Stop spreading lies. SUN has said numerous times: Solaris on the server and Linux on the desktop. How many times must they shout this into your ear for you to finally get it? They will supply a server with Linux; however, they have ALSO emphasised that their flagship product, the one they put their weight behind, is Solaris on the server.
“IBM’s contributions to Linux have been valuable and appreciated, and so have SUN’s. But one can’t help but observe SUN’s disoriented, unclear and unfocused disposition in their dealings with Linux and open source in general.”
How on earth can they be “disoriented” when they donate a $50 million programme to the community, OpenOffice.org, and license it under the LGPL? Please, again, stop spreading lies and stick to the topic at hand.
“I mean, the Java Desktop environment is an embarrassment. And if these are the products we are to expect from a supposedly experienced and professional commercial Unix vendor, then I’d much rather place my bets on IBM and their quality contributions to the kernel.”
Why does IBM need to contribute when technical expertise has NEVER been a problem for the open source community? All of what we see today could have been achieved without the help of IBM.
“SUN’s contributions are welcome, but they could do more in terms of complete commitment and quality offerings. Not half-arsed attempts, as seems to be the case. And definitely not products like the Java Desktop Environment. Come on, even stock GNOME is better than that crap.”
Again, stop making up stories and stick to the topic. They’re gaining customers, not losing them. They’ve already won over Telstra in Australia; it is only a matter of time before the Microsoft-loving CIO of ANZ sees the writing on the wall and makes the change.
Make a statement, but stop lying and making up fabricated and exaggerated stories to back up how much IBM has really done. Even SGI has done more within the limited time frame they have been involved.
I’ve used the gentoo kernel prepatched 2.4.20-8, and 2.4.23-ck1. They seem about the same on my box. I think the gentoo kernel is quite heavily patched, so it’s not as far from 2.4.23-ck1 as the version number makes it look. Both kernels work fine for multitrack and synth stuff.
I normally use 256 samples, which is about 6 ms latency, so I’m not really pushing it. As soon as I can get my damn USB mouse working in X in 2.6.0-test11 I’ll give it a workout; if it’s better than a patched 2.4 I’ll be most impressed!
“They SAW the technical limitations of Xft/Fontconfig and as such created STSF.”
What technical limitations are we talking about specifically?
Actually, I think STSF existed before Xft got off the ground. Looking at the comparison, I don’t really see the difference. STSF has more features, such as a text layout engine, but current toolkits already have their own. Then there are a few differences in how the two handle remote display and how they handle multiple X servers. Nothing really significant, not compared to the cost to switch.
Hey cheezwog, add ‘psmouse_resolution=200’ to the append section in your lilo config. I believe you can also use modprobe psmouse to get it to work!
“They SAW the technical limitations of Xft/Fontconfig and as such created STSF.”
“What technical limitations are we talking about specifically?”
Hey, don’t shoot me, I’m only the messenger passing on what the SUN people said. If you want to rip into someone, rip into them.
Why are you folks discussing X and its related toolkits in a thread about Linux kernel performance?
I have done some reading on the low-latency and preemption patches for the 2.4 kernel. From what I have read and the code I have seen, they were hacks to work around the kernel’s non-preemptive design. Is this correct, and if so, has 2.6 been redesigned to address the problems, or is the code from both patches still being used? The low-latency patch in particular seems to scream out for a kernel redesign.
How is 2.6 addressing these issues?
>>From what I have read and the code I have seen, they were hacks to work around the kernel’s non-preemptive design.
No, that’s wrong.
There were 2 sets of patches that people were tossing around for 2.4 to address latency issues. One was the preempt patch, which added preemptibility to the kernel, and the other was the low-latency patch, which located specific points that caused high latency and broke them up with calls to schedule().
Both patches are worthwhile.
The preemptibility patch is riskier because there are places that assume the kernel is not preemptible. Each of those places needs to be found and changed, and it’s hard to search for all of them. I found some possible bugs recently in some sound drivers caused by preemption.
On the other hand, preemption is cool. People have been using the code without problems for some time. Robert Love who maintains the patch is an awesome maintainer and has been good about fixing bugs right away.
The other patch is also worthwhile. Even with preemption, there are still points where locks are held for too long. Someone needs to go through and change all those places. That was a major focus in 2.5.
>> Also, does Linux or any other OS (BSD, Windows, Solaris) support the idea of prioritized I/O (memory, Disk, Network, etc)? I would think that there would be a benefit to allow users/administrators to set different priority levels for different kinds of operations. For example, giving one process a higher priority over disk I/O then another. Does this type of thing exist?
On modern computers most of this type of communication can be done without going through the CPU. Sort of “in the background” type of thing. So when you read from a disk, you can also read from another disk at the same time and also from the network at the same time.
On the other hand, it would be nice to be able to say “My mp3 player gets to read from the disk before anything else.” I don’t think Linux has this capability right now, but there are patches that can let you do it. The problem with letting certain processes have a high priority is that they can DoS your system if you’re not careful. It’s tricky work.
There was a lot of work done on the IO scheduler in 2.5 and there were a couple competing implementations. I didn’t follow closely which code went in.
>>There doesn’t seem to be a significant difference between the 4 and 8-way system in the Observation #2 test. I’m curious at to know at which point the difference would be negligible and where would you see diminished returns (I know that this is also a function of the hardware).
It really depends on the software you want to run. A four-CPU Xeon can easily cost $20,000 depending on the drives and RAM etc., so you want to do some research before you buy it.
Some software will only ever use one CPU. Some software could be handled better by a Beowulf cluster.
>> Also, I would like to know what the exact cause of the failures were in the Observation #3 tests. The 8-way machine had 8GB of RAM and plenty of disk space for virtual memory, so why the memory error?
There may be places in the kernel with fixed sizes for certain things. It wouldn’t be hard to figure out the problem, since it’s easy to reproduce. Probably you can just change a constant in the kernel or something. You can hire people to do this kind of kernel tuning if you don’t have someone in-house to do it.
Odd that everyone is reporting better performance with 2.6. I’ve run 2.6.0-test[9,10,11] for a while and got the impression that it’s quite the opposite on my laptop. For example with 2.6 xmms had dropouts when mozilla was loading pages whereas under 2.4 it *never* does.
Use `top` to make sure that X isn’t reniced. A lot of distros renice X to -10. 2.6 takes nice values more seriously than 2.4 so renicing X is probably a bad idea.