At the MicroProcessor Forum, Dr. Brad McCredie of IBM continued to tease out particulars regarding the POWER6. The presentation discussed a lot of general microarchitecture features, but did not reveal many specific details; a full revelation of the microarchitecture will likely have to wait till ISSCC, next February. However, from the details that were revealed, it is clear that the POWER6 inherited many characteristics from its predecessors, yet made substantial improvements in others.
One more reason to dislike closed source: you become limited in your choice of hardware architectures because your binary blobs won’t work. It doesn’t matter if POWER6 costs $3.95, is ten times faster than a Core 2 Duo, and makes a mean café mocha with whipped cream. Will it run nVidia’s or ATI’s binary drivers? Nope. Will they port them to it? Nope.
Is this trolling? Maybe. I didn’t intend it to be, but that is what this article called to my mind.
…or will this mean a significant speed hike in their MPP machines? Desktop machines aren’t everything.
Aren’t Intel and AMD processors closed source as well? Are you talking about what is accepted and is the norm, i.e. x86 vs. PPC? Because that is why x86 is dominating the market, not because those processors are open source, which they aren’t.
What I meant was that you can’t just recompile software for PPC and run it on a new architecture, because you don’t have the code to recompile. We’re stuck on x86 because of the need to run programs that were only compiled for the one platform. With the code, it could be ported to any newer, better architecture without too much fuss.
I could have posted the same thing if the article was about Sun’s GPLed CPU design. It wasn’t meant to be about the openness (or not) of the hardware itself.
Like I said, it may have been a troll post, but that’s what came to mind upon reading it.
You can actually run PPC code on x86 or vice versa using binary translation, dynamic recompilation or emulation. You get about a 4x slowdown on average with dynamic recompilation; that would be less if the code were recompiled ahead of time with binary translation. In fact, in some tests, binary translating from x86 to x86, with hotpath detection and some simple optimisations, gives a modest speed increase (maybe 5-25%), as in the DynamoRIO system.
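The hotpath idea mentioned above can be sketched in a few lines of Python. This is a toy model with made-up names (`ToyTranslator`, `HOT_THRESHOLD` and friends are all hypothetical): cold blocks are interpreted, execution counts are tracked, and once a block runs hot it is "translated" once (here just pre-parsed into a closure) so later executions skip the per-instruction decode. Real systems like DynamoRIO or FX!32 emit native machine code instead of closures, but the control flow is the same shape.

```python
# Toy dynamic binary translator: interpret "guest" blocks, and once a
# block runs hot, cache a pre-translated version so later executions
# skip the slow decode step. All names here are made up for
# illustration; real translators emit native machine code.

HOT_THRESHOLD = 10  # promote a block after this many interpretations


class ToyTranslator:
    def __init__(self, program):
        self.program = program  # {block_name: [(op, arg), ...]}
        self.counts = {}        # execution counts per block
        self.cache = {}         # "translated" blocks, stored as closures

    def run_block(self, name, regs):
        if name in self.cache:  # fast path: already translated
            return self.cache[name](regs)
        self.counts[name] = self.counts.get(name, 0) + 1
        if self.counts[name] >= HOT_THRESHOLD:
            self.cache[name] = self.translate(name)
            return self.cache[name](regs)
        return self.interpret(name, regs)

    def interpret(self, name, regs):
        # Slow path: decode every instruction on every execution.
        for op, arg in self.program[name]:
            if op == "add":
                regs["acc"] += arg
            elif op == "mul":
                regs["acc"] *= arg
        return regs

    def translate(self, name):
        # "Compile" the block once into a closure; decode cost paid once.
        ops = list(self.program[name])

        def compiled(regs):
            for op, arg in ops:
                regs["acc"] = regs["acc"] + arg if op == "add" else regs["acc"] * arg
            return regs

        return compiled


prog = {"loop": [("add", 2), ("mul", 3)]}
t = ToyTranslator(prog)
regs = {"acc": 1}
for _ in range(20):
    regs = t.run_block("loop", regs)
print("loop" in t.cache)  # True: the hot block was promoted to the code cache
```

The 4x-slowdown figure quoted in the post corresponds to the slow path here; the whole point of the hot-block cache is that long-running code spends nearly all its time on the fast path.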
Basically, x86 is garbage. It hangs around like a bad smell, causing misery to programmers and holding people back from real improvements. It’s a wonder Intel keeps up; their CPUs are basically RISC-like underneath, with a shockingly bad x86 architecture thrown on top. I’d imagine that if a better, more reliable architecture turned up and everyone suddenly jumped onto it, Intel could probably rewrite the microcode on their existing chips to run it in a matter of weeks, if they wanted to support it.
The main issue with legacy code isn’t the architecture; it’s the API and OS calls that make it a problem. Look at Wine: it has taken years to rewrite most of the Windows APIs. Executor got a good chunk of the way with the MacOS APIs. GNUstep is a good effort at OpenStep.
The fact is the different OSes refuse to play ball with each other, API- or driver-wise. EFI may resolve some of the low-level hardware problems, but as other people (like Linus et al.) are saying, it’s too hard because it reminds them of something they didn’t like (a long, long time ago… maybe in a galaxy far, far away).
Guys like SciTech Solutions (http://www.scitechsoft.com/products/dev/sdk_home.html) are doing a pretty good job of cross-platform drivers (notably video and audio). If a small company can get the job done, just think what Microsoft, Apple and GNU/Linux developers could do with the right incentives.
Basically, x86 is garbage.
You know, people keep saying x86 is crap and PPC allows for cheaper/faster/more efficient designs, but it doesn’t look like AMD or Intel are slowing down to listen.
And? What’s your point?
It’s true that x86 won the desktop and server CPU war, but that still doesn’t make it a more elegant ISA: it’s still crap.
To make an analogy: you can make *any* object fly with engines powerful enough, but that doesn’t make it as elegant as a beautifully designed plane.
x86 hasn’t won the server CPU war yet…
http://www.tpc.org/tpcc/results/tpcc_perf_results.asp
It is creeping upwards, but isn’t going to get to the top for quite a while, if it ever does.
Benchmarks are only part of the picture: Alpha had very good benchmarks most of the time, and it’s still dead.
What x86 was missing against PPC/Itanium is:
– high-end RAS (Reliability, Availability, Serviceability) features; x86 is getting more and more of these.
– a bulletproof OS: if Linux isn’t enough for some, Solaris is good enough for most.
One remaining problem with current x86s is the memory: FB-DIMM memory has power problems and will be dumped, so getting a box with many GB of RAM can be a problem (well, FB-DIMM works, but do you want to use a dead-end solution?). But a replacement is coming.
It is a more efficient design. PowerPC powers a significant number of embedded devices (games consoles being a biggie), which, if I might say, work a lot more reliably than the average PC these days. IBM handles the high end with POWER5 and POWER6. Freescale (née Motorola Semiconductor) handles the embedded market. Just because PowerPC doesn’t keep up with the Joneses (i.e., AMD and Intel) doesn’t mean it’s a bad chip; it just means more astute people use it for what they need it to do.
What real improvements is x86 holding CPUs back from, anyway?
the power platform is open…
Well…I don’t think you are trolling.
However, I do think you might be mistaken about the true nature of the problem.
You _can_ recompile code for a different processor architecture. It happens all the time. Take a look at gcc some time.
The problem you are talking about, that of using hardware drivers (aka binary blobs) on a different processor architecture — this is partially a problem of recompiling code, but it is far and away MORE a problem of writing the drivers to fit the driver model of the operating system at hand.
Talk about a nightmare: the driver models for X Window/Unix systems, Windows XP, Mac OS X, and now Windows Vista (which has a completely new, built-from-the-ground-up driver model for Windows) are all vastly different. So the real problem is how you port a driver from one OS to the next.
That said, it _IS_ very much a problem that proprietary hardware vendors make it difficult for developers to get the information they need to develop drivers for their products — for _ANY_ operating system.
This is, I suppose, an unfortunate consequence of fierce competition in markets such as graphics processors. In that regard, it’s the hardware vendors such as NVidia that make life difficult (probably) out of necessity. CPU makers like IBM and Intel are generally forthcoming with specs for their CPUs.
i think you’re right.
Will they port to it? Nope.
Uhmm, so there’s no Linux on PPC? Uhmm, so there’s no OS/AIX/Linux on PPC with working video drivers? Uhmm, so we now should port video drivers to a processor architecture, and not compilers and kernels instead? Uhmm, do you smell like a bad morning foss rant? Uhmm, do you even have anything worthy to say about POWER6?
What exactly do you mean? You don’t seem to know much about what you’re talking about, or have not read the entry you are ranting about, or have not understood what it says.
There is already Linux on PPC, and compilers and kernels are already ported. There are also working video drivers for either ancient video chipsets or Intel video chipsets which have open source drivers available. It is the availability of open source drivers that made it possible to port them to a different architecture than x86.
What’s not available is nVidia or ATI drivers for Linux on PPC, because nVidia and ATI provide only binary blobs for x86, which cannot be ported to another architecture.
Anything worthy to say about power6? Please, flood me with your wisdom. The post just said that even if it was 10 times better, it sadly was no competition for the x86 in the absence of good video drivers. A bad morning foss rant? Sure, that is exactly what it was. What was yours? An inarticulate babble about nothing.
> no competition for the x86 in the absence of good
> video drivers
If you need to play games, buy PS3.
If you need serious hardware, usually you don’t care about GF7950 support!
What you DO care about is reliability, etc.
>>Will it run nVidia or ATI’s binary drivers? Nope. Will they port to it? Nope.<<
I don’t think IBM particularly cares about ATI/Nvidia; the POWER6 is a big-iron chip. Basically they are going to migrate all of their big-iron systems to it (i/p/z series), and most of those systems don’t even have a graphics head, let alone an accelerated 3D card. Other than those systems, they’ll use it in their Unix (AIX) servers. They aren’t designed for use in workstations, so complaining that nVidia et al. won’t port their drivers is a bit beside the point, don’t you think?
You wouldn’t want to run nVidia’s or ATI’s drivers on it, because it’s not built for gaming. It’s built for fast servers.
Whether or not it is good for gaming is not something we will ever be able to find out.
The old Alpha blew away the then current x86 chips in floating point, if I’m remembering correctly. It would have been great for gaming. Actually, you could play an id game or two with it.
and I quote
“That said, it _IS_ very much a problem that proprietary hardware vendors make it difficult for developers to get the information they need to develop drivers for their products — for _ANY_ operating system”
./agree
Another electron based CPU…I am still waiting for the next generation photon based CPU, that will put all CPUs to shame.
This CPU will benefit only a very narrow spectrum of users with servers or very specialized workstations. Not only that, but only the AIX OS, and no other OS, could run on it and benefit from it. So they will keep it locked in its cave rather than expose it to the public, show how good it is, and make even more money out of it.
I still prefer Sun’s CPUs to IBM’s because they opened their Solaris OS. The further they are from Open Source Software, the further they are from our hearts and minds!
you are wrong on basically all your points.
– the entire power/powerpc community will benefit from this. the technology will migrate down through the pipe, and a rather large range of different companies will benefit, and so will you as the end consumer.
– it will run several operating systems, aix and linux just being a couple.
– locking it in rather than exposing it? go take a look at power.org and see for yourself how available power technology is.
– and solaris will run on powerpc too. besides that, ibm does a lot for open source, in the shape of both technology and software.
/stone
i’m impressed, it seems a good piece of hardware.
i’d also like to see a ppc implementation (like power4 to ppc970), especially used as a new ppe in the cell…
Sure, x86 might be really ugly, but the old hairy byte-based instruction encoding has about a 33% size advantage over PowerPC, which means better instruction cache utilization. It’s not a big factor, and it doesn’t completely make up for the ugliness of the ISA, but it helps even things out.
And as Intel and AMD have proven again and again, ISA isn’t really important anymore. The size of the nasty instruction decoders, as a percentage of the chip area, is shrinking fast and is already almost negligible.
x86 won and will not go away. It might be augmented with on-die SIMD stream processors much like Cell in the future, but it’s here to stay as the main cores of consumer PCs.
> Sure, x86 might be really ugly, but the old hairy
> byte-based instruction encoding has about a 33%
> size advantage over PowerPC, which means better
> instruction cache utilization. It’s not a big
> factor, and it doesn’t completely make up for the
> ugliness of the ISA, but it helps even things out.
Not to mention that the variable length of instructions in the x86/x86_64 ISA allows loading an immediate value into a register with only one instruction (even a load-immediate of a 64-bit integer into a 64-bit register on x86_64 is done with one instruction). Try the same and load a 64-bit immediate into a 64-bit register of a 64-bit PowerPC CPU.
No, I do not like the typical answer, which is “load immediate is evil”.
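The contrast being described can be sketched concretely. The Python below (illustration only; the helper names are made up) emits the classic five-instruction PPC64 sequence for materialising an arbitrary 64-bit constant from 16-bit chunks, next to the single `movabs` that x86-64 needs. It is a simplified sketch: real compilers shorten the sequence when the constant fits in fewer bits.

```python
# Sketch: how many instructions it takes to materialise an arbitrary
# 64-bit immediate. x86-64 does it in one (10-byte) movabs; classic
# PPC64 code generation builds it from 16-bit chunks in five
# instructions. Simplified: real compilers emit shorter sequences
# for constants that fit in 16 or 32 bits.

def ppc64_load_imm(rd, imm):
    # Split the constant into four 16-bit halfwords, high to low.
    hw = [(imm >> s) & 0xFFFF for s in (48, 32, 16, 0)]
    return [
        f"lis    {rd}, {hw[0]:#x}",        # bits 63:48, shifted into place
        f"ori    {rd}, {rd}, {hw[1]:#x}",  # bits 47:32
        f"rldicr {rd}, {rd}, 32, 31",      # shift the assembled half left 32
        f"oris   {rd}, {rd}, {hw[2]:#x}",  # bits 31:16
        f"ori    {rd}, {rd}, {hw[3]:#x}",  # bits 15:0
    ]

def x86_64_load_imm(rd, imm):
    return [f"movabs {rd}, {imm:#x}"]      # one instruction, 10 bytes encoded

print(len(ppc64_load_imm("r3", 0x1234_5678_9ABC_DEF0)))   # 5
print(len(x86_64_load_imm("rax", 0x1234_5678_9ABC_DEF0)))  # 1
```

Five fixed 32-bit instructions is 20 bytes of instruction stream versus 10 for the single x86-64 instruction, which is exactly the density trade-off the thread is arguing about.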
Does the actual instruction set even matter any more? Do any of the desktop chips that used Power technology actually run faster, cheaper or cooler?
As a former .asm programmer I’m sad to admit it but throughput is probably more important than an elegant front end for the low-level programmer.
It’s not just that either. From a programmer’s point of view, x86 really is easier to deal with. It’s not particularly elegant, but it’s not complex and opaque either. Even from a systems programmer’s point of view x86 is pretty decent. For example, if you’re writing a linker, handling relocations on x86 is much nicer than handling relocations on, say, SPARC, because x86 instructions can have full-width immediate values.
x86 is somewhat ugly to generate code for, but so what?
… the old hairy byte-based instruction encoding has about a 33% size advantage over PowerPC
So? So what? Let’s remind ourselves that chip real estate is practically worth its weight in gold (I haven’t checked exactly, but it’s pretty expensive). For that 33% size reduction, Intel has to throw significant amounts of transistors into:
1) Complex instruction decoders to micro-ops
2) Significant space to support caching of decoded micro-ops
3) Complex in-order to out-of-order micro-op rewriters
4) Significant space to support out-of-order execution units and micro-ops dispatch systems
5) Heavy penalties for cache misses (and bottlenecks on instruction decoders if branches are mispredicted)
6) Complicated L1, L2 instruction and data cache systems due to unpredictable data and instruction boundaries (which vary from anywhere from 8 to 64 bits)
Well, you get the idea; there’s probably more. All that extra baggage chews up space that PowerPC (and other maverick CPU designers) have found ways to use more efficiently. If you read the article on POWER6, I note that they expect it to clock from 4 to 5 GHz. Good thing Intel and AMD fans aren’t chanting “clock speed is everything” any more, eh?
ISA isn’t really important anymore.
It is to some people, but to the majority of people – ignorance is bliss. Try telling an embedded video player engineer they should be using Intel or AMD desktop chips. I don’t think so. (If they’re smart, they’d use Faroudja or Sigma Designs custom solutions, or something else that’s out there. Maybe an FPGA or two until they can get the ASICs nailed).
The size of the nasty instruction decoders … is shrinking fast and is already almost negligible.
“Almost” is the key word there. I recall reading that back in the good old days (over 25 years ago), a certain engineer who prided himself on component minimisation managed to out-engineer, out-price and out-design the single largest computer engineering corporation on the face of the planet, and he did it in a fraction of the time it took that corporation to fail to accomplish the same task.
The engineer was Steve Wozniak. The product was the Apple II disk drive. Time taken was 1 month for the initial design. The corporation was IBM. Now, the important fact is that Woz managed to make a design so simple and elegant that if records are to be believed, Apple made $415 profit on a drive that cost $495. Woz himself rationalised the simplicity by realising that even though IBM had massive economies of scale, no circuitry is still less expensive than some. (Paraphrased from Fire in the Valley).
So, by all means, please explain to me how some hardware is less expensive than no [unnecessary] hardware, made redundant by intelligent engineering, if you’re playing on a relatively even playing field. I should point out that Woz also popularised the GCR and MFM (its less popular and inferior cousin) coding techniques, which are still used today for self-synching recording and playback. Yes, ignorance is bliss, indeed.
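The self-synching property mentioned above is easy to see in MFM specifically. A minimal sketch (illustration only; it assumes the bit before the first data bit was 0, and real controllers also handle sync marks and decoding): each data bit is preceded by a clock bit, and the clock bit is 1 only when both the previous and current data bits are 0, so the recorded stream never goes too long without a transition for the read circuitry to lock onto.

```python
# Sketch of MFM (Modified Frequency Modulation) encoding, the
# self-synchronising scheme referred to above. Each data bit gets a
# clock bit in front of it; the clock bit is 1 only when both the
# previous and the current data bits are 0, which guarantees regular
# flux transitions for the drive's read clock to sync against.
# Assumption: the bit preceding the first data bit was 0.

def mfm_encode(bits):
    out, prev = [], 0
    for b in bits:
        clock = 1 if (prev == 0 and b == 0) else 0
        out.extend([clock, b])
        prev = b
    return out

data = [1, 0, 0, 0, 1, 1]
print(mfm_encode(data))  # [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
```

Note how the run of three data zeros still produces transitions (the inserted clock 1s), which is the whole point of the scheme.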
x86 won and will not go away. … it’s here to stay as the main cores of consumer PCs.
It probably won’t go away, but I expect people will come around eventually when technology outstrips the IA-32 architecture. I’m working on it in my spare time, so I do hope I succeed eventually to show everyone the way forward.
… the old hairy byte-based instruction encoding has about a 33% size advantage over PowerPC
So? So what? Let’s remind ourselves that chip real estate is practically worth its weight in gold (I haven’t checked exactly, but it’s pretty expensive). For that 33% size reduction, Intel has to throw significant amounts of transistors into:
I hate to tell you but the latest PowerPC ISA specs include multi-length instructions. The ability to include smaller instructions means they not only take up less room in RAM but also less bandwidth and less room in cache – this has a direct performance benefit.
It’s done in a completely different way from x86 so I doubt the decoders will be increased hugely. Certainly nothing like x86.
Not that I could tell from the PowerPC architecture books (v 2.02). Post a reference (it may be a custom ISA add-on for one of the PowerPC variants; they add and remove instructions rather frequently).
So? So what? Let’s remind ourselves that chip real estate is practically worth its weight in gold (I haven’t checked exactly, but it’s pretty expensive). For that 33% size reduction, Intel has to throw significant amounts of transistors into:
All chip real-estate isn’t created equally. The effectiveness of the L1 instruction cache is a major performance driver in most designs.
1) Complex instruction decoders to micro-ops
Which Power4/5/6 have as well. Because even PowerPC isn’t quite simple enough for the execution core.
2) Significant space to support caching of decoded micro-ops
The only x86 chip that caches decoded micro-ops is the Pentium-4.
3) Complex in-order to out-of-order micro-op rewriters
Are you talking about the issue queue and dependency scan logic? Any out-of-order design has that, because both RISC and x86 are conceptually in-order.
4) Significant space to support out-of-order execution units and micro-ops dispatch systems
Any good RISC devotes significant space to OOO execution too. There is little reason to believe that OOO for x86 takes more space than OOO for any other architecture.
5) Heavy penalties for cache misses (and bottlenecks on instruction decoders if branches are mispredicted)
And how are these any worse on x86 versus PowerPC?
6) Complicated L1, L2 instruction and data cache systems due to unpredictable data and instruction boundaries (which vary from anywhere from 8 to 64 bits)
The data boundaries aren’t any more unpredictable on x86 than on any RISC that supports unaligned loads (including PowerPC). You do have a point that the L1 instruction cache is complicated by the variable-length encoding.
If you read the article on POWER6, I note that they expect it to clock from 4 to 5 GHz.
There aren’t a lot of details on Power6, but the initial indications are that the microarchitecture is a lot like Power5. If that’s the case, Power6 is hitting 4-5 GHz because Power5 is a narrower, shallower, and longer design than Core 2. It’s got one or two more pipeline stages, and one fewer integer unit. It’s got the same issue width and reorder buffer size (4+1, and ~100), but in practice both are substantially less because of the group dispatch scheme. It’s also got a much higher TDP. Core 2 could probably hit 4+ GHz quite easily if it didn’t have to scale down to sub-35W laptop chips.
In practice, they’re expecting Power6 to get comparable IPC to Power5+. That means about 800 SPECint/GHz. At 4 GHz, the integer performance will be very good, but will probably not beat the top-end Xeon at the time (probably 3.33 GHz?). It’ll get obscenely high SPECfp scores, just like Power5+ does today, but what do you expect with 75 GB/sec of bandwidth on tap?
Try telling an embedded video player engineer they should be using Intel or AMD desktop chips.
That’s not a function of ISA, but of what the chip design is targeted at. Intel’s desktop chips aren’t appropriate for a video player not because they’re x86, but because they’re desktop chips…
All chip real-estate isn’t created equally.
True. I just look at FPGAs and think they’ve got a pretty good deal on yield: the transistors are all quite regular, and if you’re willing to sacrifice some logic slices/CLBs/LUTs and ship with a certain number disabled, you can get some damn good yields compared to a standard CPU or complex ASIC.
To follow on from your comment on the L1 cache, I’d say the micro-op execution trace cache has a lot more to do with filling execution units than the L1 cache does. (Though you need a fast L1 cache to keep the decoder units going, to keep the trace cache full, to keep the execution units going.)
Because even PowerPC isn’t quite simple enough for the execution core.
Of course not. When I wrote about complexity, I was talking about the possibility of an IA-32 instruction being anywhere between 1 byte and 16 bytes depending on the prefixes, the instruction itself and the accessory data floating behind it. So, if you had to look at a raw stream of data and process it, what’s easier: a standard 32-bit instruction (PPC) that you can easily decode, or an instruction set that varies from 1 byte to 16 bytes, forcing you to decode in order? (If you can? I think the best solution off the top of my head is to decode the length first, or predict it and flush if you got it wrong.)
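The serialisation problem being described can be shown with a toy encoding (deliberately made up, not real x86, where length determination involves prefixes, opcode maps and ModRM/SIB bytes): with variable-length instructions you cannot know where instruction N+1 starts until you have at least length-decoded instruction N, whereas fixed 32-bit words give you every boundary up front.

```python
# Toy illustration (made-up encoding, NOT real x86): finding instruction
# boundaries in a variable-length stream is inherently serial, because
# each instruction's length must be decoded before the next one can
# even be located. Fixed-width encodings make every boundary known
# up front, so decoders can attack many instructions in parallel.

# Made-up ISA: the first byte's low 2 bits encode total length (1-4 bytes).
def var_length_boundaries(stream):
    offsets, i = [], 0
    while i < len(stream):
        offsets.append(i)
        i += (stream[i] & 0b11) + 1  # must decode the length before advancing
    return offsets

def fixed_width_boundaries(stream, width=4):
    # Every boundary is a multiple of the width: trivially parallelisable.
    return list(range(0, len(stream), width))

code = bytes([0b10, 0, 0, 0b00, 0b01, 7, 0b11, 1, 2, 3])
print(var_length_boundaries(code))    # [0, 3, 4, 6]
print(fixed_width_boundaries(code))   # [0, 4, 8]
```

This is why real x86 front ends either predecode lengths into the instruction cache or predict boundaries and flush on a wrong guess, exactly the two options suggested above.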
The only x86 chip that caches decoded micro-ops is the Pentium-4.
And AMD doesn’t? The Athlon does. Every Intel processor since the Pentium 4 has one, from what I can tell. If you got rid of it, you could pretty much say goodbye to Intel leading the performance pack. It’s needed to avoid slugging the instruction decoder units with all the work while the execution units sit around doing nothing; it’s a telling sign when decoding instructions is almost more important than executing them.
Are you talking about the issue queue and dependency scan logic?
That’s right. I’d estimate the PowerPC is probably better off with a simpler ISA, but I’m curious to see how the Cell goes with its in-order execution; by how much PowerPC leads over IA-32 is difficult for me to determine at this point in time. I’d say in-order execution will be a big thing in the future: it pushes what used to be in hardware into software, where optimisation is a lot cheaper to accomplish.
Any good RISC devotes significant space to OOO execution too.
Have you looked at the IA-32 ISA lately? Can you tell me that all the interesting tricks Intel pulled off to get it working effectively with the evolved architecture and register renaming are really as simple as a good RISC chip that actually has a decent number of registers to begin with? All that logic and compiler futzing to get code to run well is really a white elephant nowadays. It’s high time to ditch it and use something that works, not something that was designed to run pocket calculators (the 8088).
And how are these any worse on x86 versus PowerPC?
Basically, if the inbuilt hardware of the processor somehow stuffs up and mispredicts, the IA-32 ISA fully supports all sorts of exciting ways to jump wherever, drag data up from anywhere, and generally assume it’s the good old days and it’s talking directly to the RAM. If your pipeline is long, you can get some hefty delays flushing it and getting back on track. Even if it executed both paths simultaneously, that’s a lot of execution units and logic devoted to guessing that could be used for something else.
PowerPC skips over this issue (though it’s not the be-all and end-all of the problem) by providing convenient link registers, counter registers and branch-prediction hints so programmers can tell the CPU what to do. Generally, programmers and compilers know better than the CPU what they’re really trying to do (unless you’re a really bad programmer). The CPU tries its best to make silk purses out of sows’ ears, most of the time.
AMD and Intel’s solution? Try to use conditional moves and conditional sets instead of branches. Yup. Now, this wouldn’t be a problem if you were running new code, but if you’re running legacy code that doesn’t believe in that… well, you see the problem. It’s high time binary translation kicked in and cleaned up the mess.
You do have a point that the L1 instruction cache is complicated by the variable-length encoding.
I made a mistake: it’s actually 1 to 16 bytes. A bit of a problem; IA-32 can also mix data and instructions quite a bit without a problem. Good for the programmer, hell for decoding, caching and keeping track of everything. PPC tries to minimise misaligned data where possible. You don’t get that option on IA-32, depending on the quality of the code.
Core 2 could probably hit 4+ GHz quite easily if it didn’t have to scale down to sub-35W laptop chips.
I don’t see them scaling to 4+ GHz anytime soon, especially as they tried and failed with the Pentium 4. They might be able to do it on the new Xeons, though. They can toss some pretty good cooling and other enhancements at the problem, just like IBM with POWER6.
Intel’s desktop chips aren’t appropriate for a video player not because they’re x86, but because they’re desktop chips…
Some people on this forum could have fooled me, what with the rampant misconceptions floating about and the blind faith that IA-32 is the be-all, end-all solution. I’m a strong believer in using the correct tool for the job.
You’re a smart guy; isn’t it your job to re-educate people not to follow the crowd and accept misconceptions at face value? If it isn’t, well, no problem there either. Let’s face it: computing and society are still in the dark ages, IMHO. If we’re still wasting time on trivial stuff like PPC vs. IA-32 or Xbox 360 vs. PlayStation 3, it’s likely to stay there for quite a while yet while bigger issues go wanting.
Anyhow, if you want to discuss further you know where to contact me.
>the old hairy byte-based instruction (x86) encoding has about a 33% size advantage over PowerPC
About PowerPC, yes, but not about the ARM Thumb-2 ISA.
So it’s quite possible to have a “RISC”-like architecture with an instruction density comparable to x86.
Note that the real problems of x86 were not so much its byte-length instructions, but not enough registers, stupid segmentation, lack of decent paging (NX protection), a non-orthogonal ISA, the brain-dead ‘FP stack’, the stupid reuse of FP registers for MMX, etc.
Some problems are fixed now (NX protection, 16 registers, flat address space) and others have workarounds (SSE instead of FP or MMX operations), but x86 is still an ugly ISA.
I realized that in my second post I said that code simply isn’t a recompile away from working on a new architecture. If someone didn’t read the original post, as many did not, judging by their comments, then it would not be apparent that I was talking about closed source, as that was the context of the original comment. If, say, Lobotomik had actually read both comments he was responding to, which he obviously didn’t (ironic given his own comment), he couldn’t have posted what he did without being shown up as a dimwit.
Closed source *can* be recompiled for new architectures by the company. But if it isn’t, and it usually is not, then what does it matter to you what the new architecture can do when you need the old software?
We’re stuck on x86 for the foreseeable future because we need it for those binary blobs we have to run. Had the source been opened (even if not free), there would be more competition among CPU companies, as you could pick between x86, PPC, MIPS, UltraSPARC… What would things look like now had the DEC Alpha been viable as a desktop CPU years and years ago, when x86 was at 233 MHz while the Alpha was at 500?
Differences between OS API’s were remarked upon and are, of course, quite valid. But that simply returns to my comment about closed source…
… we need it for those binary blobs we need to run.
Drivers, yes. Applications, not really. It depends on the application and how timing-sensitive it is (most apps aren’t, apparently; not anymore). Drivers, as I mentioned earlier, are still a huge sticking point, as they are generally not platform- or OS-agnostic, and given that EFI seems to have acquired a bad stigma, that option doesn’t seem likely anytime soon either. Basically, any well-behaved application can be subject to binary translation or dynamic recompilation.
From this, you’ll get about a 4x slowdown with dynamic recompilation (the favoured technique of Transmeta, Transitive/Rosetta and numerous other systems). Hybrid binary translators like Digital FX!32 and DynamoRIO show promising (albeit very old) results. Put simply, if a new CPU were at least 4x as fast as the fastest popular CPU, with better power savings and significant software support, it could sway quite a few people away from IA-32.
Essentially, the better the emulation environment, the faster it’ll run. Emulating an entire PC is a huge slowdown; that’s why VMware and other virtualisation companies provide drivers so their environments run optimally. If you only needed to emulate the application (and not the hardware or OS APIs), you’d see some pretty nifty speed gains over flat emulation alone.
What would things look like now had the DEC Alpha been viable as a desktop CPU … when x86 was 233 MHz, while Alpha was at 500?
Well, the DEC Alpha was viable as a desktop CPU. An Alpha running at 500 MHz with Windows NT (Alpha) and Digital FX!32 could run as fast as, if not faster than, a 200 MHz Pentium Pro at the time (in practice it averaged the equivalent of about 150-200 MHz, which was not bad). Unfortunately, Alphas cost a lot, and they didn’t get economies of scale. You wouldn’t see the average person running out to buy a DEC Alpha unless they wanted to do some serious number crunching; Windows NT (Alpha) with Digital FX!32 was a good effort (and one I consider unmatched today) to make the hardware even more of a good value proposition.
They have DEC Alpha emulators today to run really old OSes like OpenVMS on emulated DEC 3000s or AlphaStations. You’d be surprised how important they are to making sure things keep working properly (especially when they drive hardware that costs several million dollars or more). So yes, the problem is starting to crop up now on all those old workstations that are starting to fail (or whose vendors no longer exist).
Differences between OS API’s were remarked upon and are, of course, quite valid. But that simply returns to my comment about closed source…
Well, I agree closed source is an issue. The Open Source community isn’t exactly fantastic about documenting or co-operating to make open source usable (think about all those hand-coded Linux drivers that don’t work anywhere else and have no real documentation). If it can get more professional and develop the right black-box reverse-engineering tools, like the Samba team has, it could have more clout with companies.
For example, if a new bit of hardware comes out and the vendor is unwilling to release the required specifications, a well-positioned OSS development company with a co-operative legal team and a set of programmers can basically reverse engineer it anyway. (Read: they had better have a good legal team. From memory, the DMCA permits reverse engineering for interoperability, with some legal restrictions.)
Once companies realise it is futile to try to hide whatever they think is important in binary blobs, they’ll come to the party and figure out how best to deal with the situation. Otherwise, you can always recommend people buy from OSS-friendly companies instead…
We’re stuck on x86 for the foreseeable future because we need it for those binary blobs we have to run.
This argument is just plain silly.
IBM sells POWER & PowerPC workstations; how do you think they do graphics?