“Anyone who’s ever set out to perform Linux benchmarks quickly realizes the difficulties involved in such an undertaking, not only with the availability of quality benchmarks (or lack thereof), but also in the way the test system(s) are configured. Most of the Linux benchmarks that I see on hardware review sites are simple things like kernel compiles or povray… maybe a game benchmark or two.” 2CPU.com gets serious about Linux benchmarking on the web.
He shouldn’t have used -O3. -O2 or -Os is generally considered better among the devs and experienced users.
/me sighs
He’s forgiven anyway; he’s new to Gentoo. :)
> He shouldn’t have used -O3. -O2 or -Os is generally considered better among the devs and experienced users.
It’s funny you mention that… you can read more about that in this OSNews story: http://osnews.com/story.php?news_id=6136
All about optimization in GCC, I think, too.
Yup, not to mention -O3 further increases code size and sometimes produces buggy code. However, -O3 is great for multimedia applications.
But I expected the author of the article to use flags like -fomit-frame-pointer, -pipe and -s for these benchmarks. Almost all experienced Gentoo users have these in their make.conf file (see the sketch below).
Anyway, it’s a good overview overall. Now, I say, we need cross-platform benchmarks: Solaris vs *BSD vs Linux vs Windows vs OS X. Oh boy, I can see the flamewars already.
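For reference, a minimal make.conf sketch along those lines (the -march value and putting -s in LDFLAGS are just illustrative choices; adjust for your own CPU):

    # /etc/make.conf (illustrative values only)
    CFLAGS="-O2 -march=pentium4 -pipe -fomit-frame-pointer"
    CXXFLAGS="${CFLAGS}"
    LDFLAGS="-s"   # strip symbols at link time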
Your comments about -O3 vs. -O2 and other flags may or may not be valid, but that’s really not what this article is concerned with. The author states his intentions, and then offers relevant benchmarks. He was only concerned with percentage increase or decrease in performance with and without hyper-threading enabled.
If this article were about comparing compilation flags on a system to see their effect, then these comments would be valid. However, he’s trying to look at how a particular processor feature affects the performance of certain programs.
I understand that. I guess I was just nitpicking.
It’s OK. You (probably) use Linux (the kernel), which is free (as in GPL, not necessarily as in cost), so such nitpicking (referencing your original post) is to be expected (not to be too general).
His benchmarks are way off. First, he is trying to test CPU performance with an I/O benchmark. The only thing those benchmarks really state is that the two chipsets are pretty close in performance. Instead of focusing on the CPU, he should rewrite and focus on the platform and chipset. He brags about how tough his benchmarks are, but he really only shows his ignorance of x86 system architecture.
I’ve since moved on to Slackware, but my experience with -O3 on gentoo was that there tended to be a considerable amount of breakage with it. You could always go back and recompile those apps with -O2, but who knows when buggy code will bite you.
-O3 only adds two optimization flags: -finline-functions and -frename-registers. The latter is pretty useless on the x86 architecture, but the former is actually recommended by AMD and probably by Intel. It doesn’t mean that GCC will produce sane code, though… Then again, developers *need* users to test the flag; otherwise it’ll always be broken. That said, I never encountered a major problem with Gentoo and -O3. Perhaps I’m just lucky. However, I do use -O2 on my server, as you never know.
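If that equivalence holds for your GCC 3.x, these two invocations should produce roughly the same code (foo.c is a hypothetical source file):

    gcc -O3 foo.c
    gcc -O2 -finline-functions -frename-registers foo.c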
These results are consistent with Intel’s own white papers showcasing hyperthreading. When you have two physical processors in the system, both with hyperthreading, there is frequently a degradation in performance. The OS doesn’t know that two of the processors are virtual processors. You therefore end up having them tasked with the same priority as the physical processors. What you then get is load imbalance across the physical processors in lieu of what the OS sees as load balancing across four processors. I frequently ran across this on my DP Xeon machine and therefore had to turn hyperthreading off to boost performance back to normal.
“The OS doesn’t know that two of the processors are virtual processors. You therefore end up having them tasked with the same priority as the physical processors.”
Linux 2.6 is supposed to know the difference. If it isn’t using physical processors in preference to virtual processors, it is broken.
I was speaking of WindowsXP.
As always, it seems he’s running this in a test bed type setting. If the machine is to do one thing and one thing only forever I guess this is fine. However I use my work desktop for a lot of things, like playing ogg files and doing test runs while compiling.
One thing a P4 totally sucks at is context switching. Loading and unloading that damn long pipeline just kills it.
Every Athlon system I’ve ever used has ALWAYS felt far more responsive than the equivalent P4 system.
And all the benchies I see are geared towards doing one thing and one thing only at a time. Doesn’t anything simulate what a normal power user might do ???
Hasn’t anyone ever kicked off an emerge -u world or a divx encode and then gone and played quake3 or some other game at the same time? I’m far more interested in how a system handles tasks like this than crap like how long it takes to perform batch processing tasks.
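A crude way to time that kind of mixed load, sketched in shell (sample.wav is just a placeholder; substitute your real workload):

    # kick off a background compile/update
    emerge -u world &
    # then time a foreground task, e.g. an Ogg encode
    time oggenc -q 5 sample.wav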
“One thing a P4 totally sucks at is context switching. Loading and unloading that damn long pipeline just kills it.”
Yeah. I think this is the reason Intel added Hyperthreading and made a big deal about it. The second thread can keep going while the pipeline refills for the branch or context switch on the other virtual processor.
I use dual Athlon MPs myself, running Gentoo and Linux 2.6. It remains absolutely responsive while under 100% load. I agree with you about Athlons.
My friends with Hyperthreading P4’s claim Windows “feels” better with Hyperthreading turned on. I have no idea how a benchmark could measure this feeling.
And Intel’s icc is generally considered better than gcc…
There’s no difference between the 4 “virtual” CPUs. HT isn’t like you have 2 “real” CPUs and 2 “virtual” ones. They are all “virtual”, but clustered in pairs.
The OS knows how they are clustered. Or at least it’s supposed to.
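One way to check what the kernel sees (these /proc/cpuinfo fields show up on HT-capable 2.6 kernels):

    grep -E 'processor|physical id|siblings' /proc/cpuinfo
    # virtual CPUs that share a "physical id" are siblings on one package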
I’m worried that the make -j2 test on the Xeons is significantly slower when HT is enabled. This tells me that the kernel doesn’t make the tasks “stick” enough to their physical CPU (they can switch between virtual CPUs within a physical CPU at no cost). It would be very interesting to see if the scheduler has an arbitrary “stickiness” parameter and how this parameter affects performance.
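You can fake some of that stickiness from userspace; a sketch using taskset, assuming a box where logical CPUs 0 and 2 sit on different physical packages (the numbering is an assumption; check /proc/cpuinfo first):

    # restrict the build to one logical CPU per package (mask 0x5 = CPUs 0 and 2)
    taskset 0x5 make -j2 bzImage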
I suppose “better” in this case should be qualified. It produces faster code (at least so I’ve heard), but gcc is very cross-platform. There are sacrifices to be made, but even Apple uses it, I’m sure.
AFAIK Linux 2.6.3 doesn’t know how to differentiate a virtual processor from a physical one. There are patches for it (they’ve been in the -mm tree for a while) and 2.4 has had them for ages too. But not 2.6 (they’ll get merged eventually).
Woulda-coulda-shoulda is what I keep hearing from those who claim that the OS is able to correctly schedule between the physical processors. XP doesn’t do it. Maybe it should do it, maybe it would do it, but it doesn’t do it. Don’t give me benchmarks (which also show I’m right), I’ll give you real-world experience. Run four completely independent processes that each dominate a processor. Time the resultant test with a 1x, 2x and 4x run. Do the same test with HT turned off. Did your performance improve or not? In each case I had performance degradation on builds, numerical integrations and other computational tasks. HT stays off under XP. If the Linux benchmarks are accurate, they seem to reflect the same thing I’m seeing under XP. HT helps single-processor brethren but not us MP people.
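In shell terms, that test looks roughly like this, with cpu_task standing in for whatever hypothetical CPU-bound job you use (repeat the whole thing with HT toggled in the BIOS):

    time ./cpu_task                                                          # 1x
    time sh -c './cpu_task & ./cpu_task & wait'                              # 2x
    time sh -c './cpu_task & ./cpu_task & ./cpu_task & ./cpu_task & wait'    # 4x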
Why he would build a kernel with preempt enabled for server workloads is beyond me. Preempt is a tradeoff – interactive performance versus throughput. Sure, for a workstation or personal desktop, preempt is a great thing, but if you want to squeeze maximum throughput from a server, then preempt is not what you want.
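For reference, the relevant 2.6 kernel config bit; a sketch of the two builds:

    # throughput-oriented server build:
    # CONFIG_PREEMPT is not set
    # latency-oriented desktop build:
    CONFIG_PREEMPT=y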
Hi
Technically it does, but how it manages processes is different from the -mm and CFQ trees. So yes, it does recognise the differences, but it schedules them in such a way that they are treated as peers.
regards
J.Hack
as does this one.
I don’t really believe in benchmarks. If I want to check whether a new kernel is faster than the old one, I compile my standard 2.4.oldages kernel a couple of times, see how long it takes, and compare it to previous runs.
If there is one thing that stresses your CPU, it’s compiling, and the kernel is large enough to see a difference.
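That routine, sketched out (the source path and -j value are just examples):

    cd /usr/src/linux
    make clean
    time make -j2 bzImage   # compare the wall-clock times across kernels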
> I’ve since moved on to Slackware, but my experience with -O3 on gentoo was that there tended to be a considerable amount of breakage with it. You could always go back and recompile those apps with -O2, but who knows when buggy code will bite you.
It really has nothing to do with Gentoo. -O3 turns on -finline-functions, and that can cause problems. Setting -finline-limit=N higher than its default of 600 helps in certain cases. In other cases GCC just chokes on complex inline functions when -O3 is used.
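So if you’re set on -O3, something like this in your CFLAGS may work around it (1200 is an arbitrary example value, not a recommendation):

    CFLAGS="-O3 -finline-limit=1200"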
> He shouldn’t have used -O3. -O2 or -Os is generally considered better among the devs and experienced users.
> /me sighs
> He’s forgiven anyway; he’s new to Gentoo. :)
I wouldn’t say that necessarily. That kind of setup might benefit from -O3. The binaries may be larger, but that is less of an issue on a higher-spec machine.
FYI, I have run my entire system with NPTL for a few days now. Actually, two systems – P4 3.06 with HT and Athlon 900. No odd delays, no extra CPU time, etc. that I have encountered.
Anyone who has that should try a full reboot. I had some oddities in responsiveness until I did that and booted the whole thing fresh with NPTL enabled.
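If you’re unsure whether NPTL is actually in use, one quick check on glibc 2.3+:

    getconf GNU_LIBPTHREAD_VERSION
    # prints e.g. "NPTL 2.3.2" with NPTL, or "linuxthreads-0.10" without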