Geekpatrol benchmarks Rosetta performance, and concludes: “I’m impressed with Rosetta; Geekbench performance running under Rosetta is 40% to 80% of what it is running natively. Plus, running Geekbench under Rosetta is comparable to running Geekbench natively on a Power Mac G5 1.6GHz (our baseline system), at least in the single-threaded tests.”
The good side of control-freak Apple is that they plan their moves way ahead and don’t go to market with half-baked technologies. Quite impressive.
Huh? Since when was Apple a “control freak”? The parent post should be voted down, but OSNews is still broken.
Apple likes to control everything they do.
iTunes only supports iPods, iTMS only works with iTunes, only Apple can make MacOS-compatible systems, etc.
I’m not saying that any of this is bad, not saying it’s good either, just pointing out how they like to control.
It’s true, Apple do like to have a lot of control. What we have to understand is that the reason we Mac users get a great user experience is that Apple does a lot of integrating on behalf of the end user, so it all nicely “just works”. This is their business model. In Apple’s defence, they have a small market share, so they have to do a lot of the “parenting” for the user. If and when Apple’s market share increases, there will be a lot more companies out there putting their own spin on things, and it will be much harder for Apple to do this so-called “controlling” (e.g. another iTMS rival for the Mac, an iPod killer, an iPhoto rival).
Microsoft’s business strategy, on the other hand, is to hand more control over to other companies. This is good for business but bad for the user, since it leads to a lot of broken and inconsistent approaches to interfaces. (Ironically, Microsoft gets told off for being more controlling than Apple, but that’s another story.)
Both strategies have good and bad points. I’m not arguing for either side.
For better or worse, Apple is very controlling. Almost totally controlling.
I’d much rather have an Apple that still used the Gx processors than the dead-horse-being-flogged x86 thing called Core Duo (or whatever CISC we are going to see). This article is just ‘the bitter coffee of sympathy’ for those who own Gx-based Macs. Too little, too late anyway. I’ve been told that with one of the latest patches for OS X, Apple actually crippled performance in some areas for Gx Macs. Even if such things haven’t happened yet, they soon will. And I believe that similar universal-binary-related performance hits will become more and more evident on Gx Macs as time goes by. I don’t mean to rant, but Apple is using a very ‘nice’ way to talk a Mac owner into the new x86 architecture. Very persuasive ways. Not sleazy at all.
Oh, don’t start the whole CISC vs. RISC thing. It’s pretty much a dead argument by now.
As someone who owns both a PowerPC and Intel Mac, I’m going to disagree entirely with you.
The Intel Mac slaughters it in responsiveness. Seriously.
Safari feels so fast it’s like IE on Windows… as do most other things.
This thing about responsiveness *might* be true – that I’ll give you. But have you ever asked yourself ‘why?’ You don’t prove *anything* by example. Apple was given all of Intel’s processor specs and made *all* possible optimizations for x86. ALL OF THEM. They kicked the hell out of the processor to achieve the result you are seeing. In the case of the Gx processors, all binaries were optimized for memory footprint (!) and they still ran surprisingly well. Do you know how a binary runs on Intel processors when it’s optimized for memory footprint? Believe me when I tell you, mate… I wish I didn’t. Very few ‘out-of-the-box’ binaries were taking advantage of the full potential of the Gx processors (AltiVec, etc.). As if that wasn’t enough, we get benchmarks where the Intel processors are faster, and everybody overlooks that the machines those Intel processors run on have faster memory and so on. Let’s optimize the heck out of a binary for the most recent G processor (IBM has rolled out one more), give it fast memory, and _then_ let’s see ‘who is who’. Don’t buy into what Apple and ‘the world’ tell you. Gx processors are not worse *performance*-wise – far from it. They are worse *marketing*-wise (marketing rules, engineering does not). And yes, the Gx processors had their quirks and problems (memory latency, etc.), but the fact remains that in spite of the lack of R&D funding (which doesn’t come close to Intel’s R&D budget), they could look any x86 processor straight in the eye. Also compare BIOS and Open Firmware. Compare the motherboards that Intel processors run on with those of the Gx. Compare all those silly limitations that x86 processors have. Why should we do everything the hard way? Don’t you think you are being _forced_ to put up with third- and fourth-class quality stuff here?
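For what it’s worth, here’s a minimal sketch (my own toy example, not anything shipped by Apple) of what AltiVec-aware code looks like, using the standard altivec.h intrinsics. An ordinary scalar loop never touches the vector unit unless the compiler happens to auto-vectorize it, which compilers of that era rarely did:

#include <altivec.h>

/* Adds two float arrays four elements at a time on the AltiVec unit.
 * Assumes a, b and c are 16-byte aligned and n is a multiple of 4. */
void vec_add_arrays(float *c, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i += 4) {
        vector float va = vec_ld(0, &a[i]);   /* load 16 aligned bytes */
        vector float vb = vec_ld(0, &b[i]);
        vec_st(vec_add(va, vb), 0, &c[i]);    /* add and store 4 floats */
    }
}

Plain C compiled with default flags does this one float at a time, which is the kind of untapped potential I mean.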
Jeez – don’t forget your tin foil hat.
No one says that PPCs are worse performance-wise in general, but the PPCs you can actually put into Mac minis, laptops and so on are.
Intel Macs don’t use BIOS anyway.
“Intel Macs don’t use BIOS anyway. ”
Thanks for giving me a chance to clarify this… I was trying to contrast ‘what most of us could have’ with ‘what most of us get’. Yes, Intel Macs don’t use a BIOS, but they still suffer from the quirks and hilarious limitations of the x86 processors.
Anyway, the whole point I was trying to make is that sometimes I get the feeling we accept whatever we’re given by Intel/IBM/Apple/whoever, all happy and excited, without question or doubt. I don’t mean to preach or crusade, but it all sounds like compromising and closing our eyes to a truth that may be hard to face. But that could just be me, of course.
If you think you know how to build a better processor than IBM or Intel, by all means do so. I will look forward to the improved performance in my next computer. After all, “Only those who attempt the absurd can achieve the impossible.”
Please release us from the shackles of dogma. I am ready to open my eyes and face the terrible TRUTH.
P.S. Will you deign to enumerate, oh enlightened one, the “hilarious limitations of the x86 processors?”
Apple is moving to Intel, end of story. This whole G-whatever vs. Intel debate really isn’t productive. Intel’s overall strategy is different from IBM’s PPC one. Sure, IBM may have some of the deepest pockets on earth, but Apple looked at IBM’s long-term roadmap and overall it did not fit with theirs. Remember the low-power G5 for portables? Neither do I. Let’s move on from the whole Intel vs. PPC debate; it’s done, it’s finished. If you want PPC, get an IBM server with Linux. For me, it’s Mac OS X that sells the Mac. Sure, Apple has optimized it, and why not? When the other developers read their Intel manuals and use Intel’s compilers, expect more optimized native apps that use the full power of the Core Duo. It’s a new chipset for these guys; give them time and we’ll ask these questions again 12 months down the line. As for non-optimized apps on the G5: yes, the G5 is a very nice chip, but in the long term it did not fit with Apple’s goals, so they switched over, can now use the full power of Intel, and can finally get some decent apps that use the processor.
This really shouldn’t come as a surprise if you think about it.
OS X as we know it is, for all intents and purposes, NeXT OpenStep 6 or 7. OpenStep was heavily optimized for x86 (and SPARC) well before it adopted the PowerPC and the rather questionably performing GCC compiler for PPC.
The thing that gets overlooked here is Mach and the assumptions Mach makes about the processor’s calling conventions. Considering that Mach was designed in the days of the Motorola 68030 and the Intel 8086, it’s logical that it made assumptions based on those designs.
Those assumptions are invalid on PowerPC, and performance suffers accordingly. PowerPC, however, was where Apple was, and this was a transition that had clearly been in the works since NeXT took over Apple (yes, I know that’s not the correct way to say it, but it’s more or less the truth of what happened), and now you are seeing some of the reasons. The PowerPC is a faster and more efficient chip, that’s not really in question. The problems surrounding Mac performance on the PowerPC are at several layers, and are difficult to quantify reliably. But keeping in mind that the raw performance numbers of the x86 designs are inferior to those of the equivalent Power designs, the design assumptions and compiler technologies add up to a ‘better’ end-user experience, which is ultimately the core principle of the Mac OS.
None of this changes the fact that there will be a group of people who feel left out in the cold by this transition; there always are. It also doesn’t change the fact that PowerPC compilers will get better, and that systems with more easily changeable foundations, like Linux, will continue to prosper on the PowerPC; it’s just that OpenStep, or Mac OS X as we now know it, really wasn’t suited to getting the most out of the PowerPC. The truth is that for raw server performance it will probably lag slightly behind Linux on x86 as well. It’s unlikely that any X Window-based user interface will ever outpace the Mac user interface. What will be interesting is to see how the battle of the eye candy plays out between Vista and Leopard over the next year, though.
Expect the video chipset vendors to be in for a rich couple of years cashing in on these GPU-hungry systems, and watch how the two vendors work to balance speed and system integrity with their video and network drivers and interfaces…
The PowerPC is a faster and more efficient chip, that’s not really in question.
PowerPC isn’t a chip. It’s an instruction set. There are low-performance PowerPC chips and high-performance ones. Same for x86. It’s just that the highest-performance x86 chips are faster than the highest-performance PowerPC chips.
The problems surrounding Mac performance on the PowerPC are at several layers, and are difficult to quantify reliably.
Not really. The problems surrounding Mac performance can be isolated to 3 major points:
1) The G5 has architectural features that give it rather low IPC on integer code.
2) The G5’s architectural features make it hard to optimize for, and GCC isn’t up to it.
3) The G5 is coupled with a relatively poor northbridge that has very high memory latency.
But keeping in mind that the raw performance numbers of the x86 designs are inferior to those of the equivalent Power designs
The raw performance of x86 designs is superior to that of equivalent PowerPC designs. POWER5 still has a leg up in SPECfp, but that’s about as relevant to mainstream PowerPC chips as Itanium is to mainstream x86 chips.
I guess I didn’t make that clear enough: the Mac platform has performance issues beyond the raw performance of the PowerPC chip design, issues that are the result of choices at layers above the chip design.
I also maintain that the Power family of chip designs are superior designs to equivalent x86 designs, but that superiority is irrelevant as it doesn’t translate to user experience performance. For that matter, the Alpha was superior to both of the aforementioned designs as well, and yet it’s no longer even in production.
Being better in raw numbers doesn’t translate into a positive user experience, which is the heart of the matter where the Mac platform is concerned.
The items you mentioned are minor contributors, and I think they can even be legitimately discredited by comparing the performance numbers of roughly equivalent x86 and PowerPC hardware running Linux on both platforms.
The issues that concern Mac users aren’t those numbers; the relevant data for the Mac user comes from comparing the user-level performance of an Intel Mac to that of a Power Mac, and that’s where things get interesting. I hope to get some hard numbers together this weekend, but I’ve mentioned in other posts that my subjective evaluation is that an iMac Core Duo 1.6 with its rather crappy 5200RPM 120GB Seagate is eating my Power Mac G5 (dual 1.8) with its 7200RPM 120GB Seagate for lunch, and leaving leftovers for dessert. Admittedly, that’s subjective, and based on development and light testing of an application that I’m making Universal.
This is where things get quite interesting. Under Rosetta, the Core Duo’s performance is roughly that of the 1.8 with one processor turned off (e.g. my quick test runs finish within a second or so of each other), while running natively the iMac usually sits idle for about 35 seconds (of a roughly 5-minute process) before the Power Mac, with both processors on, finishes.
What does this have to do with SPECint and SPECfp? Not a whole lot, but it has everything to do with user experience.
I hope that clarifies the point I’m trying to make. I’m not the best of writers when it comes to getting my point across; I guess I should go back to writing code, which I do much better than this :-).
I guess I didn’t make that clear enough: the Mac platform has performance issues beyond the raw performance of the PowerPC chip design, issues that are the result of choices at layers above the chip design.
I think you’re wrong on this count. The fact that NeXT was optimized for 68k doesn’t mean it runs particularly well on x86. The main aspect of the Mach-O ABI that’s optimized for 68K is the assumption of IP-relative addressing, something which neither PowerPC nor x86 has. AMD64 has it, but OS X doesn’t run on that yet. The other various performance pitfalls of OS X are kernel-specific, not processor-specific. Mach context-switching will remain slow, whether it’s running on a CISC or a RISC. Moreover, the big chunk of code that is responsible for most of the “feel” of the user interface, Aqua, was rewritten for OS X, which means it was presumably optimized for PowerPC from the beginning.
I also maintain that the Power family of chip designs are superior designs to equivalent x86 designs, but that superiority is irrelevant as it doesn’t translate to user experience performance.
The POWER family is irrelevant here. We’re talking about PowerPC. Today, that means the G4 and G5. Neither design is superior to competing x86 chips.
The items you mentioned are minor contributors, and I think they can even be legitimately discredited by comparing the performance numbers of roughly equivalent x86 and PowerPC hardware running Linux on both platforms.
The various benchmarks that have been conducted comparing the two architectures under Linux have shown the current generation of x86 chips to be generally superior. SPEC backs up this conclusion. A look at the internal architecture validates the benchmarks. Let’s compare the Core Duo and the G5, piece by piece:
1) The Core Duo is 3-issue, the G5 is 4+branch. However, the Core Duo’s effective issue width is higher, because each of those operations can be a fused micro-op (e.g. memory + ALU). The practical issue rate of the G5 is lower, because those 5 total issue slots can only be filled according to complex group-formation rules. One reason why POWER5 performs much better than POWER4 on integer code is that those formation rules were tweaked heavily.
2) Both processors have two integer units. The G5’s are pseudo-symmetric, while the Core Duo’s are asymmetric. However, the G5’s have two-cycle latency while the Core Duo’s have one-cycle latency. This is a substantial disadvantage. It means that to get full integer performance out of the G5, your code has to exploit 4-way parallelism. On the Core Duo, it only has to exploit 2-way parallelism to get the same performance (see the sketch just after this list). On most integer code, anything past 3-way parallelism is asking too much. There is also the fact that the G5’s two units are statically load-balanced (two dispatch-group slots go to one unit, the other two go to the other unit), while the Core Duo’s two units are dynamically load-balanced. The net result of all this is that in the worst case – code with 2-way parallelism whose instructions end up in the even-numbered dispatch slots – the G5’s integer units can do only 1/4th the work of the Core Duo’s.
3) The Core Duo has a shorter pipeline (12-14 stages versus 16 stages) and a superior branch predictor. The Core Duo’s branch predictor was designed to keep the P4’s 20+ stage pipeline full, and has features like indirect branch prediction and loop detection that are very helpful for certain types of code.
4) The Core Duo’s memory latency is about 2/3 that of the G5’s, and its cache line size is half as large. On floating-point code, the high memory latency of the G5 means peak performance is hard to achieve except for “streaming media” type computations. On integer code, the large cache-line size doesn’t mesh well with the small objects typical of such code.
5) The G5’s dual FPUs are superior to the Core Duo’s single FPU in almost every way, save one: their latency is 50% higher. Well-scheduled FPU code can still take advantage of the theoretical power of the G5’s FPU, which shows up in benchmarks, but unoptimized FPU code will have a much harder time.
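To make point 2) about integer latency concrete, here’s a tiny, purely illustrative C sketch (made-up function names, nothing from any real benchmark). With two-cycle ALU latency, the first loop is limited by the single dependency chain through its accumulator; the second exposes four independent chains, which is roughly the amount of parallelism a G5-style core needs:

/* One accumulator: every add depends on the previous one, so with
 * 2-cycle ALU latency the loop can at best retire one add every two
 * cycles, no matter how many integer units the chip has. */
long sum_serial(const long *a, long n)
{
    long total = 0;
    for (long i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Four accumulators: four independent dependency chains, the kind of
 * parallelism needed to keep two 2-cycle-latency integer units busy.
 * (Assumes n is a multiple of 4.) */
long sum_parallel(const long *a, long n)
{
    long t0 = 0, t1 = 0, t2 = 0, t3 = 0;
    for (long i = 0; i < n; i += 4) {
        t0 += a[i];
        t1 += a[i + 1];
        t2 += a[i + 2];
        t3 += a[i + 3];
    }
    return t0 + t1 + t2 + t3;
}

On a one-cycle-latency design like the Core Duo, two accumulators would already be enough, which is the gist of the 2-way versus 4-way point above.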
None of these factors, individually, are deal-breakers. Their impact is perhaps 5-10% apiece. All together, however, they account for the 40% gap between the per-clock performance of the Core Duo and G5 in integer code.
All of this makes sense if you think about it. Why are the Core Duo Macs reported to be snappier? Because the UI is almost completely integer code, and the Core Duo is just plain better at it. Why does Safari render faster on the Core Duo? Because KHTML is all integer code, and spends a lot of time making random accesses to a large graph structure that doesn’t fit in cache! Basically, because UI-type code hits all the weaknesses of the G5, and takes advantage of none of its strengths!
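Just to illustrate (this is a made-up toy, not actual KHTML code), that kind of workload looks roughly like walking a pointer-heavy tree: branchy integer code where nearly every step is a dependent load, so high memory latency and large cache lines hurt, and the FP/vector units sit idle:

#include <stddef.h>

/* A made-up DOM-ish node: small objects linked by pointers,
 * scattered across the heap. */
struct node {
    int tag;
    struct node *first_child;
    struct node *next_sibling;
};

/* Counts nodes with a given tag. Each step chases a pointer produced
 * by the previous step, so the processor spends most of its time
 * waiting on cache misses; wider FP units don't help at all here. */
int count_tags(const struct node *n, int tag)
{
    int count = 0;
    while (n != NULL) {
        if (n->tag == tag)
            count++;
        count += count_tags(n->first_child, tag);  /* recurse into children */
        n = n->next_sibling;
    }
    return count;
}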
One of the most excellent posts I’ve seen on G5 vs. Core Duo. Hats off to you, rayiner – I’ll save your post as soon as I hit ‘submit’ here. Just a quick question. The two processors have a 40% gap, right? How much R&D funding and how many years of development did Intel put into its processors to achieve this, and how difficult would it be for the G processors to close that gap (if not surpass it)? I think you know the answer as well as I do. You mentioned:
“One reason why POWER5 performs much better than POWER4 on integer code is that those formation rules were tweaked heavily.”
This is what I am trying to say: with every new generation the performance skyrockets. Really, how difficult would it be to improve RISC processors like these? My guess: ‘(relatively) not at all’. Sticking with a processor that is inherently hard to improve is what I find suspicious and what bothers me.
You seem to believe that PowerPC processors are RISC and x86 processors are CISC. This is not true. RISC and CISC are textbook idealizations that do not exist in pure form in any modern processor.
All advanced processor designs are “inherently hard to improve.” What evidence do you have that x86 designs are inherently harder to improve than PowerPC designs? The actual rate of recent processor improvement suggests the opposite.
This is what I am trying to say: with every new generation the performance skyrockets. Really, how difficult would it be to improve RISC processors like these? My guess: ‘(relatively) not at all’. Sticking with a processor that is inherently hard to improve is what I find suspicious and what bothers me.
That’s the thing. x86 chips aren’t “inherently hard to improve”, at least not within the design envelope which most workstation/server processors occupy. These are generally aggressively out-of-order (OOO) designs with deep pipelines. Once you incur the complexity of OOO and a long pipeline, a few extra stages to handle ISA translation isn’t that bad. Indeed, it’s enough of a win that even RISC chips like the POWER4/5 are doing it now, because PowerPC isn’t quite RISC-y enough (not all instructions are 2-src 1-dst).
So once you’ve got a few pipeline stages devoted to translating x86 or PPC to the internal ISA, improvements to the chip are decoupled from the limitations of the ISA*. At that point, you’re competing on the quality of the internal architecture and the performance of the process on which the chips are fabbed. Like with most things, these get better the more money you throw at them, and the x86 world simply has more money to throw at them.
Now, outside the world of highly OOO chips, things are different. A shallow-pipeline, in-order x86 simply wouldn’t perform as well as a shallow-pipeline, in-order RISC. Things like Niagara would likely not be possible using x86 cores. However, for at least the foreseeable future, highly OOO chips will remain the standard for the desktop/workstation/server market.
* To be fully accurate, it’s not completely decoupled. The need for the processor to be able to reconstruct the original ISA machine state puts some limitations on the internal ISA. These limitations are fairly minor, however, and usually hit paths that are slow relative to the speed of the core (e.g. exception handling, interrupt handling, memory access, etc.). At the u-op level, x86 doesn’t look much different from a plain-jane RISC with fancy memory addressing modes.
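As a purely illustrative sketch (the exact micro-op breakdown here is my assumption for illustration, not published decoder documentation), here’s roughly how one “memory + ALU” C statement maps onto the two ISAs and then onto the internal operations the OOO core actually schedules:

/* Hypothetical illustration of ISA translation, not actual compiler output. */
int add_from_memory(int x, const int *p)
{
    return x + *p;          /* one "memory + ALU" operation in the source */
}

/* On x86 this can be a single instruction, e.g.   add eax, [edx]
 *   - the decoder cracks it into a load micro-op plus an add micro-op
 *     (and on the Pentium M/Core line the pair can travel fused through
 *      much of the pipeline: the "fused micro-op" mentioned above).
 *
 * On a load/store ISA like PowerPC it is already two instructions:
 *     lwz r0, 0(r4)
 *     add r3, r3, r0
 *
 * Either way, what the out-of-order core ends up scheduling is the same
 * pair of simple internal operations; the translation stage is what
 * decouples the core design from the quirks of the external ISA. */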
A shallow-pipeline, in-order x86 simply wouldn’t perform as well as a shallow-pipeline, in-order RISC.
Sure? The 486 and the Pentium were shallow-pipeline in-order and compared alright to their competitors at the time, except for the tweaked-to-the-last-gate Alpha perhaps.
Of course the core itself would be quite a bit bigger, but it makes up for that in reduced instruction cache/memory requirements. Where x86 really falls down for embedded stuff is the increased power consumption that comes with the more complex core.
The 486 was about half as fast as competing MIPS chips, while the Pentium was almost as fast in integer, and about half as fast in FP.
From everything I’ve read over the years, POWER and PowerPC are derivations of the same core chip design, though POWER traces its lineage more to the PPC620 design while the PowerPCs up to the G3/G4 traced more to the 601/602 designs. I could be wrong, but my understanding differs from your assertion that POWER and PowerPC are completely different families of processors. The POWER4 and POWER5 are chips optimized for different tasks, but the instruction set is still essentially that of the PowerPC 620. The major difference between the PowerPC 970 and the POWER4, as I understood it, was that the 970 was a single-core implementation of the POWER4 with ‘AltiVec’ extensions added to the die. As such, they share the same core design and implementation, correct?
Anyway, about the 68k: although NeXTStep ran on m68k chips when it was on the black hardware, it went x86 when it became OpenStep, and it has been improved and enhanced in that environment for at least 10 years now, while it’s had less than half that time on the PowerPC.
Now, your assertions about Quartz not leveraging the FPU are probably fairly accurate, but your point only helps make mine. If FPU performance doesn’t directly affect the user experience, then users probably won’t care, which is really the crux of my point of view. Users really only care about whether it’s faster for them. You also suspect that Aqua/Quartz was optimized for the PowerPC. I don’t think it was. It’s still essentially a Postscript rendering engine, which is what the old NeXT presentation layer was (Display Postscript), which I don’t think was particularly optimized for any platform, and relied upon the underlying technologies for speed. Quartz Extreme, on the other hand, is optimized, but not for the CPU – its optimizations are for the GPU.
That does raise some interesting questions about how the new Mac minis will perform with the Intel GPUs, but that’s a discussion for another time.
I look forward to your comments, and corrections of my understanding.
As such, they share the same core design and implementation, correct?
Absolutely correct. POWER versus PowerPC is, for IBM anyway, a matter of marketing, not ISA. That difference, however, is a very key one. While the G5 and POWER5 are very similar cores, there are two key differences:
1) POWER5 fixes a lot of deficiencies in the POWER4/G5 design. The G5 is very much a chip with untapped potential. By modifying the group dispatch rules and increasing the number of rename registers, IBM improved the integer performance of the POWER5 to Core Duo levels (per clock), while offering class-leading FP performance.
2) POWER5 is a very high-end chip, and has a very high-end memory subsystem to accompany it. Part of the reason it gets such stellar benchmarks is because it has 16 GB/sec of memory bandwidth through an integrated memory controller, and 36MB of L3 cache.
At the end of the day, however, the G5 is a G5, not a POWER5. IBM didn’t consider the desktop market important enough to keep the G5 updated with the POWER5 technologies, and they could never afford to put a POWER5-class memory subsystem in a G5-class system anyway. So while both the POWER5 and G5 are PowerPC chips, only the latter is really relevant to Apple and thus to this discussion.
You also suspect that Aqua/Quartz was optimized for the PowerPC. I don’t think it was. It’s still essentially a Postscript rendering engine, which is what the old NeXT presentation layer was (Display Postscript), which I don’t think was particularly optimized for any platform, and relied upon the underlying technologies for speed.
First, Quartz isn’t based on the old NeXTStep code — it’s a rewrite. Also, Aqua, and Quartz in particular, was very heavily optimized on PowerPC. Software-rendered Quartz got 5x faster in Tiger, according to ArsTechnica’s benchmarks. That’s why Apple’s perceived UI performance has been getting better every release since 10.0, despite the addition of features.
Interesting to note that they don’t appear to run the test a second time.
Apparently Rosetta will cache the translated code and make it appear “snappier” second time around.
That would be a very interesting real-world test: whether Rosetta really does cache the translation and show increased performance the second time through. Maybe we’ll see that in a future benchmark.
The reason the PowerPC apps were compiled with memory optimization was that they had to run on machines with only 128 MB of memory – they would actually run faster when not optimized for speed if it meant not having to page to disk. The newer machines are different; they don’t have that low-memory factor to take into account.
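As a simplified illustration of that trade-off (toy function, made-up name), the same C compiled with GCC’s size flag versus its speed flags behaves quite differently:

/* Toy example of the size-versus-speed trade-off described above.
 * Built with "gcc -Os" the compiler keeps code small: the loop stays
 * rolled and small helpers are rarely inlined. Built with "gcc -O2"
 * or "-O3" it will happily unroll and inline (and, on -O3, try to
 * vectorize), trading binary size and memory footprint for raw speed. */
#include <stddef.h>

void scale_buffer(short *buf, size_t n, short factor)
{
    /* -Os: a compact rolled loop; tiny code, more loop overhead.
     * -O3: typically unrolled, several times larger, but noticeably
     *      faster on a machine with RAM to spare. */
    for (size_t i = 0; i < n; i++)
        buf[i] = (short)(buf[i] * factor);
}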
Most if not all apps are optimized for *size*. If you wish, search MSDN and you can find an article about MS Windows being optimized for size!
Of course I agree with you. They are optimized for size. MS compilers (just like all compilers other than icc) don’t know any better. None but Apple has the specs from Intel to know how to optimize an executable to run well on Intel processors. So it’s an ‘I can’t’ rather than an ‘I don’t want to’ case for compiler vendors when it comes to optimizing for speed on Intel processors. Optimizing for performance using a compiler other than icc won’t get even close to the full potential of an Intel processor (see the benchmarks of any compiler against icc). The _only_ OS out there that is _fully_fully_fully_ optimized for performance on Intel processors is OS X. Maybe GCC with its latest release produces better executables (in terms of execution speed) for Intel processors, but from what I’ve heard it’s nothing too fancy compared to the past (I could be wrong – in a month or so we’ll know for sure).
None but Apple has the specs from Intel to know how to optimize an executable to run well on Intel processors.
Nonsense. Intel’s optimization reference manual is publicly available. Sure it won’t have every detail in it, but what exactly do you think Apple could do with the information if Intel did really give them special access? They don’t even write their own compiler!
And with out-of-order execution the effects of optimising for a particular processor are limited anyway. icc outperforms gcc on AMD just as much as it does on Intel, which means its advantage isn’t due to Intel-specific optimisations. It simply produces better code.
The _only_ OS out there that is _fully_fully_fully_ optimized for performance on Intel processors is OS X.
You mean really really really fully? Funny that, considering that Apple uses gcc to compile MacOS, because icc doesn’t even support Objective-C/C++ (yet?).