GeekPatrol puts their new cross-platform benchmarking tool to work and compares 12 machines including the new iMacs, PowerMacs, an Athlon64, a P4c, and a Xeon. “The PowerPC G5 is still a good processor. In fact, it’s still a great processor. Apple isn’t switching to Intel chips because Intel chips perform better, but rather because a G5 would melt through the bottom of a laptop. The Athlon 64 edged out the Pentium 4c on all the CPU tests, while the Pentium 4c edged out the Athlon 64 on all the memory tests. It seems to me that Intel and AMD have had their different strengths all along, so I don’t find this surprising. The Intel Core Duo is a great processor. It performed as well or better than the PowerPC G5 at similar clock speeds (1.8GHz and 2.0GHz), and has nowhere to go but up.”
…is there anywhere that listed the OS they used in each test? One would think that some of the benchmarks would be affected by the OS.
Which is my way of saying that conclusions from this test are questionable in terms of raw performance. The answer should really be “this OS did this, using this test”; even though on the face of it most of these tests are benchmarking raw hardware performance, it seldom works out that way in practice.
We’ve put together a preview of Geekbench, our cross-platform benchmarking program, for Mac OS X (as a universal binary) and Windows. While Geekbench isn’t terribly pretty (it’s just a command-line application), we hope some people will find it useful.
Unless the article was edited recently, the author does state which systems used Windows XP and which used OS X, though the author didn’t clarify whether the 64-bit AMD processor was running Windows XP x64 or the 32-bit version. Also, it wasn’t clear whether the Xeon processor used in the test is the older 32-bit version or the newer one supporting EM64T and Hyper-Threading. Was Hyper-Threading even enabled? While a nice attempt at benchmarking, I would have preferred more detail, as well as the use of a more detailed benchmark such as those performed by http://www.gamepc.com/labs/index.asp or one that uses SPECviewperf.
Anyway, I would be interested in seeing how Intel’s Merom (a Core Duo successor with EM64T and Hyper-Threading) does, when released, against AMD’s X2 mobile part. I don’t know what AMD’s next dual-core mobile processor will be called, but it’s definitely going to be an interesting year. Hopefully laptop manufacturers using either mobile processor will offer more memory options to take advantage of the larger memory capacity 64-bit allows.
Edited 2006-01-31 03:12
Also, it wasn’t clear whether the Xeon processor used in the test is the older 32-bit version or the newer one supporting EM64T and Hyper-Threading.
From the article:
“12. Intel Xeon Dual 3.2 GHz HT”
“… (the Xeon has two hyper-threaded processors, …”
“… the Xeon (which has four logical processors) …”
It seems clear enough, unless they added that later?
I meant for each test. Since OSX runs on Intel hardware now, I’d like to see the OS listed for each system they tested.
I wonder what they mean about universal binary, too.
Geeks don’t like vague tests. We want crunchy data and most of us need lots of fibre.
Since OSX runs on Intel hardware now, I’d like to see the OS listed for each system they tested.
OSX only runs on… Macs… And Windows only on generic non-Mac Intel machines… Do the math. What’s so hard to grasp? :/
I wonder what they mean about universal binary, too.
Geeks don’t like vague tests.
How has living under a rock been treatin’ ya? Universal binaries wrt OSX are binaries that run on both OSX/PPC and OSX/Intel.
Wow. Thom++. Thom with attitude!
Universal binaries are the new name for what OSX has supported for ages under the name “fat binaries.”
Fat binaries, though, while OSX supported them, have not been very commonly used for more than the PowerPC target. You can think of a fat binary as multiple executable files linked into one (fat) file. Depending on what sort of machine you’re running on, OSX picks the one it needs.
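To make that concrete (a minimal sketch, not anything from the article; the file names are made up, while the -arch flags, lipo, and the __ppc__/__i386__ macros are the standard Apple GCC conventions): the same C source is compiled once per architecture, and the slices end up in one fat Mach-O file.

/* universal.c: trivial program whose fat binary contains one slice per
 * architecture.  Build it in one step with Apple's GCC driver:
 *   gcc -arch ppc -arch i386 -o universal universal.c
 * or build each slice separately and merge them with lipo:
 *   gcc -arch ppc  -o universal_ppc  universal.c
 *   gcc -arch i386 -o universal_i386 universal.c
 *   lipo -create -output universal universal_ppc universal_i386
 */
#include <stdio.h>

int main(void)
{
#if defined(__ppc__) || defined(__ppc64__)
    puts("running the PowerPC slice");
#elif defined(__i386__)
    puts("running the Intel slice");
#else
    puts("running on some other architecture");
#endif
    return 0;
}

Running "file universal" (or "lipo -info universal") lists both architectures in the one file, and the loader simply picks the matching slice at launch time.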
Not much of a geek if you have to ask what a “Universal Binary” is, when discussing MacOS X.
A “Universal Binary” is a MacOS X application that will run on MacOS X, whether it be on a PowerPC-based or Intel-based system.
Hence, the article tells you what OS is being used: MacOS X on the PowerPC and Core Duo systems, and Windows on the P4 and AMD systems.
Some of these numbers look sketchy. The G5, which has the highest-latency memory controller of any machine in the test, does better than the Athlon64, which has the lowest-latency memory controller in the test? The use of the standard library functions for the memory tests is also a bit sketchy; you’re basically testing the speed of the C library, not the processor. Also, the Core Duo comes out looking massively faster than an equally clocked Athlon64, which conflicts with Anandtech’s recent benchmarking of that CPU.
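To be concrete about the “testing the C library” point, a bandwidth test built on the standard library ends up looking roughly like the sketch below (their source isn’t published, so this is just the usual shape of such tests, not their code). Whatever number it prints depends heavily on how each platform’s libc implements memcpy (SSE2, AltiVec, or plain loads and stores), not only on the memory subsystem.

/* membench.c: rough sketch of a memcpy "bandwidth" microbenchmark. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE   (64 * 1024 * 1024)   /* 64 MB, far larger than any cache */
#define ITERATIONS 16

int main(void)
{
    char *src = malloc(BUF_SIZE);
    char *dst = malloc(BUF_SIZE);
    clock_t start;
    double seconds, mb;
    int i;

    if (!src || !dst)
        return 1;
    memset(src, 1, BUF_SIZE);           /* fault the pages in before timing */

    start = clock();
    for (i = 0; i < ITERATIONS; i++)
        memcpy(dst, src, BUF_SIZE);
    seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    mb = (double)BUF_SIZE * ITERATIONS / (1024.0 * 1024.0);
    printf("copied %.0f MB in %.2f s -> %.1f MB/s\n", mb, seconds, mb / seconds);

    free(src);
    free(dst);
    return 0;
}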
These benchmarks are useless!
Even comparing numbers across the same family (P4c vs. dual Xeon), we can see that there are some huge inconsistencies. Why would a DP Xeon 3.2 GHz score 4x higher than a P4c 2.4 GHz in a four-thread integer test? Two physical cores plus a 33% clock advantage should put the Xeon somewhere around 2.5-3x, even giving Hyper-Threading some credit, not 4x. It doesn’t make much sense.
What takes the cake are the memory scores for the P4 vs. Athlon64. Anyone with half a brain knows the Athlon64 screams past any P4 when it comes to both memory bandwidth and memory latency. The graphs show the Athlon64 getting its ass handed to it.
Let’s run some real benchmarks on these systems, please.
What takes the cake are the memory scores for the P4 vs. Athlon64. Anyone with half a brain knows the Athlon64 screams past any P4 when it comes to both memory bandwidth and memory latency. The graphs show the Athlon64 getting its ass handed to it.
I’m just a noob when it comes to these kinds of things, but I believe it’s ‘known’ that Intel has better quality memory.
> I’m just a noob when it comes to these kinds of things
Then why share your opinion?
> but I believe it’s ‘known’ that Intel has better quality memory
Whaaaa? Intel doesn’t make memory. What are you going on about?
I wasn’t talking about sticks of RAM and who makes them. I was talking about memory controller implementations, memory bandwidth, and memory latencies as a result of those implementations.
If the AMD Athlon 64 3200+ in this case uses a single-channel memory configuration, the Xeon will definitely beat it in synthetic memory benchmarks.
If the Athlon 64 is using a single-channel memory configuration, then the benchmark engineer ought to be *slapped*.
Hell, why even bother using a Socket 939 platform if you’re going to put one stick of RAM in it. 😕
To the best of my knowledge they didn’t even specify if the 3200+ was a 754-pin or a 939-pin model.
Oh my … :’C I just took a look at their hardware list again. It says “Athlon 64 3200+ (2.2 GHz)”. I assumed that they were benchmarking current technology, not a discontinued Athlon 64 line. I was sorely mistaken.
This is definitely a S754 CPU because the 3200+ in S939 is a 2.0 GHz model. S754 3200+ is 2.2 GHz.
Ah, I didn’t even catch that when I looked at it the other day. Having looked again I noticed that user-supplied scores for a Pentium D and X2 seem to defy plausibility. That benchmark really seems retarded.
Not only do the benchmarks seem sketchy, but the common desktop x86 chips used are not anything near representative of current generation PC hardware. The Athlon 64 and Pentium 4 used were available, what… a year or two ago? (not bothering to look up exact date).
Both Intel and AMD have dual core chips readily available which would match up better to the high end Macs in those benchmarks. While I don’t doubt that the G5 is still competitive, using these benchmarks to justify it doesn’t mean much because the competition is outdated.
A comparison, where you specify what you tested, is still a comparison. Don’t cry so hard.
Maybe Apple actually had a good horse in the race after all. This is going to crush the die-hard x86 fans.
But, life is about Learning.
“A comparison, where you specify what you tested, is still a comparison. Don’t cry so hard.”
Wasn’t saying it’s not a comparison. Was just saying the comparison doesn’t mean much. I could compare the speed of my car with the speed of a child’s tricycle, but that would not give me much indication of how my car compared to other vehicles of its class.
When the article states that the “G5 is still a good processor” I take that to mean it compares well to its current competition… which it is not benchmarked against. Again, I don’t doubt that the G5 may stack well against current x86 desktop chips, but this article doesn’t help me reach anything conclusive.
Edited 2006-01-30 21:38
Maybe Apple actually had a good horse in the race after all. This is going to crush the die-hard x86 fans.
Right. This benchmark will “crush” them, despite what SPEC, Anandtech, independent benchmarkers, and hell, even Apple developers say about the G5 versus the x86 competition.
Yep, I’m crushed. I must reverse my previous opinion about my G5 versus my Athlon64. I was just imagining that it ran most programs I threw at it (POV-Ray, Blender, SciMark, GCC, LaTeX, Firefox, Matlab, etc.) more slowly than my Athlon64.
Edited 2006-01-30 23:32
> How has living under a rock been treatin’ ya?
Oddly, I have two Macs and have only run into “universal binary” WRT 680x0 binaries that run on PowerPC and vice versa.
> Universal binaries wrt OSX are binaries that run on
> both OSX/PPC and OSX/Intel.
OK. Well, with Intel OSX only weeks old and me just back from a Caribbean vacation, that rock doesn’t feel all that heavy.
A universal binary makes these numbers utterly meaningless, I think. Imagine the system libraries (primarily C) doing intercepts, emulations, etc. etc. to provide support for a universal binary. Unless you mean that there is actually an Intel binary instance that is in the same package.
Guess I’ll have to read up about OSX on Intel now.
You guessed correctly:
Unless you mean that there is actually an Intel binary instance that is in the same package.
Right, but it’s more like comparing the performance of the same engines in two different cars without listing the weight of the two vehicles. A 180hp motor will push a 2400lb car with a lot more authority than a 3400lb car, to use a metaphor.
Likewise, an older NetBSD without any GUI will likely run these simple tests somewhat faster on the same hardware.
Because there flat-out isn’t going to be parity between different operating systems on this same hardware, it would be much more useful to know what OS is running for each test. I don’t think that’s all that hard to see but I’ve been wrong before.
It’s also worth considering that these are very early days for OSX on the Intel platform, whereas Windows has run on Intel for something like 20 years.
…a “Forbidden….”
Face it, no test will come close to your own personal criteria.
[sarcasm]
Since Mac OS X now runs on Intel hardware, it should be possible to get two machines running on practically identical hardware (minus the extra stuff Apple used to lock OS X to their hardware) on distinct operating systems. That would make it easier to tell how much the OS was affecting the benchmark. Of course, if Windows, Linux, or some other OS were installed on Apple hardware, that would make a direct comparison very trivial… Oh wait! Linux has been installed plenty of times on Apple hardware.
[/sarcasm]
Sarcasm aside, it really is much better when comparing different hardware running on more than one OS to have at least one system that is benchmarked with the OS being the only variable. I would like to see benchmarks run on both OS X and Linux (since it isn’t possible with Windows XP) on a G4, a G5, and a Core Duo system to see how well they could each take advantage of the processor architectures.
The only thing a benchmark tells you is how fast a system will run that benchmark. Since a chain is only as strong as its weakest link, you can’t benchmark a CPU without at the same time benchmarking the compiler and the source code of the benchmark itself.
You see, whenever you test something, the idea is to block out all the other variables to test the precise things that you are after. It’s that simple.
This test failed to do that. Ergo, it cannot be used as a fait accompli.
I’m not sure how useful and accurate these types of microbenchmarks really are. I prefer to see actual apps being tested, although that makes it difficult to test cross-platform.
No processor is good at everything, they’ll all act differently according to what you throw at them as they all have different strengths and weaknesses.
These particular tests turned out to be G5 and even G4 friendly; you could no doubt find tests which are the exact opposite if you looked hard enough. The same goes for applications, and optimisation will accentuate the differences.
That’s a cop-out answer. That basically says “there is no point in trying to achieve an ordering because it’s too hard”. It’s strictly correct, but not a very useful result. In practice, it’s fairly easy to find benchmarks that run faster on an Opteron than a G5, and fairly difficult to find benchmarks that run faster on a G5. It is accurate to say that the Opteron is a faster CPU, in that if you take a random piece of C code and compile it on both platforms, the Opteron will likely run it faster. It’s quite a conditional statement, and in some problem domains, such as signal processing, it won’t hold true, but at least it is more useful than saying “oh, each one has its strengths and weaknesses”.
Dear lord why are these people who do these things always tards!?
It is too much to hope that someone out there with enough skillz will install linux on all of these and use the same benchmarks for all the tests.
First rule of benchmarking….CHANGE ONE THING AT A TIME.
This lame test breaks that rule 1000x different ways.
This benchmark gets added to the list of ones that, for whatever reason, do not benchmark against the Opteron. The power of the Opteron is truly amazing. The X2 Athlons should be more competitive than the 3200+.
Personally, I found those tests particularly interesting, even if they’re not the whole truth about performance, because anyone can show that processor A outperforms processor B with a given test (for example, a floating-point test), while at the same time someone else will find another test that measures the same thing and shows processor B to be more powerful than processor A.
I found those tests interesting because they show, for example, that a G4 can be more powerful than an Athlon 64 in some integer benchmarks. It shows that the Athlon is not the performance champion so many people seem to think it is (it is beaten very often by the Core Duo, which is clocked lower and isn’t 64-bit…).
I don’t give more credence to Anandtech (those guys don’t know what a thread is), who tested the floating-point performance of the G5 and Opteron with only a single test, which basically says nothing, than to GeekPatrol (though I notice they run basically four different floating-point tests). So I guess it’s better not to say either of them is right or wrong, but rather to be open to a large set of tests to better find out where a given platform is strong.
When Apple says that the Core Duo is more powerful than the PowerPC, that’s true for the G4, and it’s true compared to a single-core G5. Apple says the iMac Core Duo is faster than the single-core iMac G5, which is true according to those tests; they don’t say anything about comparing the Core Duo to the PowerMac.
Performance measurement is not a well-defined metric that always produces the same result, so care has to be taken…
I found those tests interesting because they show, for example, that a G4 can be more powerful than an Athlon 64 in some integer benchmarks. It shows that the Athlon is not the performance champion so many people seem to think it is (it is beaten very often by the Core Duo, which is clocked lower and isn’t 64-bit…).
Yet if you run real integer code on either processor, you can immediately see which one runs faster. The Core Duo does beat out the Athlon64, but note:
1) They compare two chips several generations apart (the 3200+ is 2003 technology);
2) The Athlon64 is available at substantially higher clock speeds than the Core Duo.
Also note that in the tests, the Athlon64 was handicapped because it was running 32-bit code. 64-bit code gives the Athlon64 another 10-15% on integer benchmarks. On Blender, for example, it gives an approximately 20% speedup.
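To illustrate where a gain like that can come from (a hedged example, not code from the benchmark or from Blender): anything that works on 64-bit integers has to synthesise each operation from pairs of 32-bit instructions when compiled for plain x86, but gets them natively under x86-64, and the compiler also picks up eight extra general-purpose registers.

/* mix64.c: a 64-bit mixing step of the kind hashing and crypto code is
 * full of.  Compiled with "gcc -O2 -m32", the 64-bit multiply and shifts
 * are emulated with several 32-bit instructions; with "-m64" they map to
 * single instructions, and the extra registers reduce spills to memory. */
#include <stdint.h>
#include <stdio.h>

static uint64_t mix(uint64_t x)
{
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;   /* one imul in 64-bit mode */
    x ^= x >> 33;
    return x;
}

int main(void)
{
    uint64_t acc = 0;
    uint64_t i;

    for (i = 0; i < 100000000ULL; i++)
        acc ^= mix(i);
    printf("%llu\n", (unsigned long long)acc);
    return 0;
}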
I don’t give more credence to Anandtech (those guys don’t know what a thread is)
This coming from a guy who thought you needed special declarations in your code to use 64-bit?
Also note that in the tests, the Athlon64 was handicapped because it was running 32-bit code. 64-bit code gives the Athlon64 another 10-15% on integer benchmarks. On Blender, for example, it gives an approximately 20% speedup.
The performance benefit or penalty of 64-bit programs in AMD64 environments is pretty variable. This is even pretty obvious in the CINT2000 subtasks. It’s more so in larger applications. I’m far too lazy to track down what precisely in Blender sees a 20% performance increase in integer processing.
The Athlon64 is probably most handicapped in the Blowfish part of this benchmark. I don’t know, though; since this benchmark doesn’t appear to include source code, I don’t care enough about its results to do much more than urinate in its general direction.
This coming from a guy who thought you needed special declarations in your code to use 64-bit?
The OS X benchmarking done by Anandtech was deficient in a few ways. And I suppose if you’re talking about making full use of a 64-bit general-purpose register, using the LLP64 model would require changes to the codebase. That’s sort of the point of using that model, though. If you just mean using a larger address space then that’s confusing.
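For anyone lost in the data-model jargon: under LLP64 (64-bit Windows) “long” stays 32 bits, so code that assumed “long” is pointer- or register-sized has to be touched before it can exploit the 64-bit registers, while under LP64 (64-bit Linux and other Unixes) “long” grows to 64 bits. A trivial probe like this (just a sketch) makes the difference visible when built for each target:

/* datamodel.c: print the basic C type sizes.
 * ILP32 (any 32-bit target): int, long and pointers are all 4 bytes.
 * LP64  (64-bit Linux/Unix): long and pointers are 8 bytes.
 * LLP64 (64-bit Windows):    pointers are 8 bytes but long is still 4. */
#include <stdio.h>

int main(void)
{
    printf("int       : %u bytes\n", (unsigned)sizeof(int));
    printf("long      : %u bytes\n", (unsigned)sizeof(long));
    printf("long long : %u bytes\n", (unsigned)sizeof(long long));
    printf("void *    : %u bytes\n", (unsigned)sizeof(void *));
    return 0;
}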
I’m going to test my wife’s iMac 1.8 G5 running OS X against my son’s NetVista 1.8 P4 running Edubuntu. The apps I will be testing are:
Mac Mail vs. Evolution
Safari vs. Epiphany
Pages vs. OpenOffice.org
I expect the Mac to come out on top. I’ll post the results later.
…I’d be using “Measuring Cup” as the benchmarking application. It’s actually a recipe program, but I think I can tweak it to get the results I need by typing “date” before and after I start it while the other programs are running.
… later …
Hmmm, I found that opening and closing Measuring Cup showed that the Mac applications were much slower than I thought they would be, partly because I couldn’t get “Measuring Cup” to run on Edubuntu. Here are the results:
x o x
x o o
x x o
As you can see, this graph clearly shows that “x” has achieved “tic-tac-toe”. This is very difficult to achieve and often results in thermonuclear war.
WinXp, OSX and where is Linux???
Yes, we know the G4 and G5 can do great pure math ops. We knew this back when they came out. The only moderately non-synthetic benchmarks were the emulation ones, which, surprise surprise, reduced the differences seen between them all.
I’m fine with not having the latest and greatest CPUs to test with (a 3800+ would be nice to stick in there, but somebody has to buy it or donate it, right?), and even using common OSes, rather than sticking a common Linux on them; but the tests really didn’t do a decent job of comparing the systems.
There are audio and video encoders for all those platforms, aren’t there? File compressors and decompressors? These types of things can give a real clue, and are very easy to time. These are also the kinds of tasks that generally make you really wait for the system. I haven’t done much massive square-rooting of fractals on my box, recently, or know anyone that regularly does…
Not in dispute with your general arguments, but AV encoding/decoding seems a very variable feast in my experience, even on the same machine using the same codecs but different front-ends. I agree that it probably would lead to a more meaningful comparison if sufficiently standardised, though.
I believe, however, that square-rooting of fractals and similar operations – Fourier FP transforms etc. – play a large part in Gimp or Photoshop type filters and plug-ins for image transformations. Agreed, not a lot of people do these, but a bit of speed is nice (some operations can take up to half an hour or more on large hi-res images).
Anecdotally my experience is that the G5 Macs are the quickest in photoshop (Mac), with AMD64 (photoshop Windows under Wine on 64-bit Ubuntu) or (photoshop Windows 32-bit on AMD64) a little slower. Gimp on 64-bit Linux is roughly as fast as the G5s, but the filters are differently constructed, I should think, so not a valid comparison.
This may explain why the Mac is still the weapon of choice for photographers/designers – in the UK at any rate (apart from its stunning looks – sorry modded-case fanboys!).
I don’t disagree, but testing an actual, and common, workload would be better for making real comparisons. You can’t compare GIMP directly to Photoshop, but it should be ‘close enough’, as long as it’s GIMP vs. GIMP. Maybe it will be a bit slower, or faster, but if it isn’t at least 15% one way or the other, considerations other than performance should be more important (like how loud it is, how much power it uses, how much it costs, how much upgrades and/or maintenance will cost, available and preferred OSes, etc.) for most uses.
Overall, I like the idea of a common cross-platform test suite, because it’s hard to gather performance info on anything but the newest x86 parts. With a test suite like this doing actual app tests, you could figure out how much difference there may really be, even if it turns out basically insignificant.
IMO, people like the Macs because OS X has the best vertical integration of anything. It often annoys folks like me (on Windows or Linux, I can have any audio player up and working in a minute or so… not so in OS X, even after I get porting software working), but if you just need to get work done, it’s quite nice, almost rivaling my own Windows desktop (if I had to use Explorer for a shell, I’d not run Windows for anything but games). The performance gap has pretty much closed at this point.
SORRY – got posted twice by accident – Can somebody nice remove the first one please? (and this?)
Why the heck does anyone pay any attention to closed-source synthetic benchmarks? They’re totally useless for comparisons across platforms.
There’s no telling how well their code is suited and/or optimised for each platform. You can’t even be sure that the code for each platform actually does the same thing.
And what exactly keeps those geekpatrol people from publishing the source code and the compiler settings for their “benchmark”?
Every time an article like this pops up, it just irritates the hell out of me. If you care remotely about getting the best performance for an application, you use an optimised compiler for that platform, as well as some well-chosen libraries. That means you don’t use the standard libraries which come with your compiler – with a few exceptions they simply won’t be taking advantage of the hardware features of the processor.
There is nothing wrong with this sort of “oooh look the Xeon is thrashed on Integer performance by a Mac Mini”, but I just wish it wasn’t put across as a serious test.
The problem with such an approach is that it is highly unrealistic. I’m assuming that you mean the Intel C++ compiler when you talk about optimizing compilers. How many people actually use that compiler? How many use GCC? That alone makes using these esoteric compilers questionable.
Let’s assume that using GCC, the G5 beats the Xeon, but when using ICC, the Xeon wins. What does this tell you? That the Xeon is better than the G5? No, it doesn’t. All it tells you is that ICC generates better code on the Xeon than GCC. When benchmarking, standardize as many factors as possible (i.e. compilers, OS, etc). Using esoteric compilers is out of the question.
It’s not just that. The claim “anybody remotely concerned with performance uses a specialized compiler” is crap. What does Id use to compile Doom? Hello! Visual C++! What does every Cocoa app on OS X use, regardless of how performance-sensitive it is? GCC!
The quality of Intel C++ versus XLC, etc, is largely academic. Very few applications that users will encounter are compiled with those compilers.
OK let me clarify myself here – I was really speaking about a much lower level than applications like Doom. The examples they are giving here are broad brush comparisons of Int, FP and Memory performance. Now of course talking about which compilers are being used when someone is using Doom as their benchmark tool is pretty pointless. But when they are using benchmarks to measure and compare some fairly fundamental aspects of the CPU, it would make sense to try and get the best performance out of each CPU. Hence the comment about using an optimised compiler.
If you are going to quote me, please do so correctly:
“If you care remotely about getting the best performance for an application, you use an optimised compiler for that platform…”
The reason I’m nitpicking is because (and I didn’t make this clear enough – duh me), I am NOT referring to general-purpose day-to-day computing. When I talk about an application, I mean one you wrote yourself and now you’re off hunting for a few systems to run it on. I really didn’t make myself clear, and you called me on it, bah. Obviously this is all nonsense for day-to-day computing, but to compare integer performance between processors you need to optimise for each different architecture. Imagine if you were comparing floating-point performance between a Cell and a Mac mini, but didn’t bother using the VPUs; how could you conceivably say that the result you got was worthy of comparison?
I guess what wound me up was seeing a Mac mini beat a DP Xeon on Int, which even the biggest Apple fanboy must see is complete rubbish. There is something very broken with their benchmark, and I would rather assume it’s their compiler than inept programming.
GCC by the way has a lot of code contributed by IBM, so of course they will have optimised it for PowerPC, although the Visual Age compiler would give better performance for these benchmarks (but then of course they would need to be run under Linux…).
Bottom line is, it’s really very hard to tell which processor is “best”. What you need to do is get the code you want to run, work with the vendors to get it running as fast as possible on their platform, and then see which ones give you the best results.
But when they are using benchmarks to measure and compare some fairly fundamental aspects of the CPU, it would make sense to try and get the best performance out of each CPU.
The purpose of the benchmark is to measure the prospective performance of a CPU running real-world code. If it is known that the real-world code will be compiled using mediocre compilers (as most desktop/workstation code is), then benchmarking it with a super compiler doesn’t make a whole lot of sense. It tells you the theoretical limits of a design, but it doesn’t tell you anything about what you can expect from it in practice. CPU design involves a trade-off between compiler complexity and CPU complexity. Depending on your target market, the details of the tradeoff have to change. In the scientific computing market, you often get to run highly optimized code generated by special-purpose compilers, so putting more of a burden on the compiler is okay. This is the trade-off IBM made in the POWER4/5 design, which the G5 inherited. However, in the desktop/workstation realm, you’re usually dealing with people who are concerned about portability, standards compliance, and availability more than absolute performance. There, you must expect code to be compiled with commodity compilers. Thus, the appropriate tradeoff lies towards having the CPU pick up some slack from the compiler.
Understanding this tradeoff is critical in practice. While a 20-30% difference doesn’t matter a huge deal in reality, a 30% performance delta will kill you marketing-wise. It will mean that your $500 chips will be at the performance level of $250 chips from your competitors, and that’s something that nobody can sustain for very long. One of the things that made the Opteron so successful is that its designers realized they were in a hostile software environment. They knew the code they’d be expected to run was going to be optimized for the Pentium III and Pentium 4, and if they were 30% slower in benchmarks because of that, Intel would kill them in the market.
“If you care remotely about getting the best performance for an application, you use an optimised compiler for that platform…”
By your logic, Id doesn’t care about getting the best performance from Doom. That seems like a silly statement to me.
When I talk about application – I mean one you wrote yourself and now you’re off hunting for a few systems to run it on.
That doesn’t mean I won’t use GCC on those platforms. Programmers don’t like to switch around compilers, for various reasons. They get used to the debuggers in their existing ones, they are familiar with the existing ones’ quirks and defects, and their code is often dependent on the level of standards compliance in their compiler. Switching compilers is often a lot of pain for a developer, and usually not worth the extra 10-20% in performance you could gain from the switch.
Imagine if you were comparing floating point performance between cell and a mac mini, but didn’t bother using the VPU’s, how could you conceivably say that the result that you got was worthy of comparison?
That could be entirely justifiable. When a CPU designer makes a “weird” architecture like the Cell, they run the risk of people not using its extra features. If your job is to write portable scientific code, testing the Cell without the VPUs is a perfectly legitimate thing to do — it shows that the Cell is crap as a general-purpose CPU, which is the truth. Hardware exists to serve software. Performant hardware that software can’t readily take advantage of is near useless.
Your idea that you get code to run as well as possible on each platform is becoming outmoded. Platforms are becoming commodity, and software is becoming platform-agnostic. Nobody is going to super-optimize Mozilla or OpenOffice, and even things like Matlab and CATIA are only optimized for the major platform. More and more code will never see hand optimization, and will be dependent on whatever optimizations a JIT compiler can bang out in 100 milliseconds. The time is rapidly approaching when if your CPU can’t run generic code fast, nobody will pay attention to it. That’s exactly what happened with the G5.
Edited 2006-01-31 19:13
The purpose of the benchmark is to measure the prospective performance of a CPU running real-world code.
I read it as more of a comparison of the processors’ overall performance.
By your logic, Id doesn’t care about getting the best performance from Doom. That seems like a silly statement to me.
Hell, even I’m not that stupid! id has to sell copies of the game that run on a wide variety of platforms, hence there is no point in limiting the potential market by shipping code only 2% of people can use.
That doesn’t mean I won’t use GCC on those platforms. Programmers don’t like to switch around compilers, for various reasons.
True, although how hard is it to learn a new tool, and if your code is that far from the standards to begin with you gotta wonder…
That could be entirely justifiable. When a CPU designer makes a “weird” architecture like the Cell, they run the risk of people not using its extra features. If your job is to write portable scientific code, testing the Cell without the VPUs is a perfectly legitimate thing to do — it shows that the Cell is crap as a general-purpose CPU, which is the truth. Hardware exists to serve software. Performant hardware that software can’t readily take advantage of is near useless.
Please don’t mistake me for a Cell fanboy; I was using it as an illustration, and I am well aware of the Cell’s uselessness as a general-purpose processor. My point was rather that if you are comparing the best possible performance of a processor, you should use all the features available, i.e. AltiVec, SSE1/2/3, 3DNow!, etc. Some of the specialised compilers will be able to auto-vectorise some of your code, which of course will give you a performance boost.
I think we are now debating general-purpose vs. specialised applications. My point was that to compare the integer performance of one processor with another, it would be appropriate to compare the best performance you could get out of each processor, regardless of how you run the code.
There are many benchmark articles on the net that cover real-world comparisons, but this one is comparing Int, FP, and memory performance across processors; I think they should have made more of an effort to accurately measure the actual maximum performance of these aspects of the processors reviewed.
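To put something concrete behind the auto-vectorisation point (a hedged sketch; the flags are the usual GCC 4.x and ICC ones, and nothing here is known to match what the benchmark actually does): a dense loop like the one below is exactly the kind of code a vectorising compiler can turn into SSE or AltiVec instructions on its own.

/* saxpy.c: y = a*x + y over float arrays, the classic vectorisable loop.
 * GCC 4.x can vectorise it with "-O2 -ftree-vectorize -msse2" (or
 * -maltivec on PowerPC), and the Intel compiler does it at -O2, after
 * inserting a runtime check that x and y don't overlap.  Without
 * vectorisation the loop processes one element per iteration. */
void saxpy(int n, float a, const float *x, float *y)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

Whether a benchmark’s inner loops are written so that this can happen is precisely the sort of thing you need the source code to check.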
More and more code will never see hand optimization, and will be dependent on whatever optimizations a JIT compiler can bang out in 100 milliseconds. The time is rapidly approaching when if your CPU can’t run generic code fast, nobody will pay attention to it. That’s exactly what happened with the G5.
I happen to think that in a few years’ time (well, 5 to 10…) we will see a return to processors where there is a higher priority put on having a good compiler – e.g. EPIC (it’s crap now, but it’s going to be around for a long time yet – companies as big as HP and Intel don’t throw money away without figuring out how they will get it back). Either that or quite a change in how processors work, and I don’t mean more cores.
I promise that’s not flame bait… anyway, I’m off to the pub.
Edited 2006-01-31 20:08
Am I the only one that remembers that id shipped the Linux port of Doom 3 without the hand-optimized SSE2 code in the Windows version? Or that the Linux version shipped with a 10-25% performance difference to the Windows version?
This benchmark is useless.
– it doesn’t specify the compiler or settings used to compile the source code
– it isn’t clear that the source code they’re compiling really is the same across the various platforms; they should probably just publish it and be done with it
– the performance differences are very very large and suggest that there is a software, rather than hardware, issue which is causing problems.
– the tests are needlessly synthetic with tiny code sizes, and were mostly tested on Macs. This suggests the developers were mostly acquainted with Macs and might mean the test does things in an idiomatic (and thus well-optimized) fashion for Macs, but not for PCs.
– the tests flatly contradict other such test results, which are much more convincingly executed, suggesting further research is necessary to guarantee there’s nothing fishy going on (probably just innocent mistakes, but still)
– the platforms that were tested are very, very badly defined, and badly chosen. There’s no mention of motherboards or chipsets, memory timings, and other details which probably don’t have much impact but should be verified for fishiness. Furthermore, very similar processors aren’t present; i.e. the Core Duo in the Mac wasn’t tested under Windows even though Windows runs on that processor as well. Had they released source code and compiler settings, it would be trivial for others to verify the accuracy of the test results across those two platforms by themselves…
– They’re changing so many variables between platforms, that you really can’t say why the benchmark runs faster on one system than another. Furthermore, since this is a completely synthetic, and highly unrealistic testset, you can’t really consider this to be a platform test either because they aren’t testing the platforms as any user would ever use them.
In summary: this benchmark is unverifiable, returns results which are very surprising given other benchmarks, doesn’t test platforms because it’s running a completely synthetic testset, and doesn’t allow isolation of causes, meaning you can’t identify which component in a system causes performance differences.
Useless!