Preliminary Pentium 4 numbers are here. Scott R. Ladd extended the tables, keeping the Pentium III numbers for comparison purposes. You’ll find specifications for both test systems. The new tests show that ICC is the choice if you need Pentium 4 optimizations for your applications, while GCC is a good free alternative for the rest of the CPUs. ICC also seems to compile up to twice as fast in most cases.
I’d be curious to see Athlon numbers from the Intel compiler. At work we do some very heavy computational stuff using lots of doubles. Of course, we went from running dual P3s to running dual Athlons. Stability (under Linux) and price/performance (!!) have been unbeatable with the Athlon systems compared to the P3 systems we have.
…since I have a P4. Pity that Athlons haven’t been benched.
Are there things for GCC to learn from this?
Keep up this kind of article 🙂
The source and scripts for his benchmarks are all there, just download them and run them on your Athlon boxes.
Things for GCC to learn? I’m not sure about that. From what I understand, Intel tried to contribute to the GCC project, but was disappointed because the changes they submitted were either not added, or not added in a timely manner. The optimizations for the bleeding-edge Intel processors were not available quickly enough, so Intel created their own compiler whose releases they could control. GCC also has to worry about being cross-platform, and many OSes use it to build themselves, so I think it is good that they don’t rush patches (yes, I know they have a stable/unstable series).
Pat
In the article it is mentioned that the Intel compiler now supports hyperthreading. I thought hyperthreading just shows 2 processors for 1 physical processor and is only meaningful for OS developers. Anybody have a hint?
Most people say a top-down compiler is good, but the real truth is that a bottom-up compiler is much better!!! I think a bottom-up compiler is more brain-friendly. Only harmful people lie.
Ah! All by myself!
Don’t believe things so easily until you test them yourself!
“In the article it is mentioned that the Intel compiler now supports hyperthreading. I thought hyperthreading just shows 2 processors for 1 physical processor and is only meaningful for OS developers. Anybody have a hint?”
The following assumes you know a little about CPU design, SMP and multi-threading.
Hyper-Threading does indeed simulate a second CPU on the same die. Because resources are shared with the ‘first’ CPU, there are some very unique advantages and disadvantages. Advantages: threads can share data much, much more quickly. Also, there is less ‘wasted space’ on the CPU, as it is doing more work than a non-HT-enabled CPU. Disadvantages: resource contention between two threads on the same CPU: if both threads are trying to use the same pipeline paths… you might have a problem. Current versions of HT show that Intel is getting better at dealing with this, though.
Hyper-Threading is best suited to running two threads from the same process, while dual processors are a little better suited to running two separate processes (one on each physical CPU).
Hyper-Threading can help speed up any process which is multithreaded. The heavier the multithreading, the better it works with Hyper-Threading. If you use the new Intel compiler, you’re even better off, as the Intel C++ Compiler 7.0 does a very good job of checking for resource contention between threads.
On average, if you have a multithreaded application, you can expect a 10-20% performance boost just by turning Hyper-Threading on. At worst, most recent benchmarks show that HT doesn’t slow anything down.
…but I don’t have an Athlon system!
Don’t get me wrong — I’d *love* to have an Athlon MP system to play with. I just can’t justify one on my current budget. I’m just a po’ writer, you see…
..Scott
Wow, then how could you afford a P4 system? Or was your choice between a single-proc P4 and a dual Athlon (about the same price for each system)?
I needed to know more about hyperthreading, so a P4 was my only option. It was a business decision, not a political one. I already have SMP systems in-house; buying an Athlon MP would not answer questions my clients are asking.
I’ve posted a very basic set of new benchmarks, re:HT, on the front page of my web site:
http://www.coyotegulch.com
Is HT a benefit? Yes.
Does it replace MP? Definitely not.
Do I plan to buy an Athlon MP system? Yes, as soon as business permits.
..Scott
You are patently mistaken, Pat (I couldn’t resist). Intel has had their own compiler for eons, and did not just create it because they couldn’t work with GCC.
As for things to learn, these data indicate that GCC and ICC aren’t that far apart in general. And the faster system dramatically exacerbated the OOPACK/Complex performance discrepancy for GCC, which probably means there is something fundamentally wrong somewhere.
Some of the OOPACK numbers look strange; several of the results are identical.
Also, btw, I bet the author could get more reliable results out of the Stepanov test by supplying more iterations, e.g. $ ./stepanov 100000 rather than the default.
…and the benchmark crashed. With both compilers.
As the current review states: This is a work in progress, and there is definitely something suspicious about the OOPack numbers. They’re just too darned “neat”.
I’ve run SMP boxes for years (I’m a dual-CPU freak), and have noted something that might help with your HT tests:
I know that, on average, SMP boxen work best with 2 threads per CPU. You might try upping to “-j 4”. I’m not certain, but I suspect it might help speed up the HT’d compiles.
I’m afraid I’m going to have to agree with you; I can’t find any sources for what I mentioned. Oh well..
That is really weird. I’ve been running the stepanov test on gcc since 2.95.* and on icc since 5.0 and haven’t had one crash yet. (Hmm, maybe it was only bombing on your more-experimental-2.5.x system?)
Another thing that stood out for me, which I’ve mentioned before, is that “-funroll-all-loops” is used for gcc, but icc doesn’t do that (it uses heuristics to decide unrolling by default, unless told to use a threshold via -unroll=[n]), and the gcc manual states that -funroll-all-loops usually slows things down. Hmmm..
Ditto on the OOPack numbers looking too “neat.”