A Peek at Faster Power6, Cell Chips

Thom Holwerda 2006-12-29 IBM 30 Comments

Judging by details revealed in a chip conference agenda, the clock frequency race isn’t over yet. IBM’s Power6 processor will be able to exceed 5 gigahertz in a high-performance mode, and the second-generation Cell Broadband Engine processor from IBM, Sony and Toshiba will run at 6GHz, according to the program for the International Solid State Circuits Conference that begins February 11 in San Francisco.

30 Comments

2006-12-29 10:20 pm
Nublu
I realize that the Cell processor is a powerhouse for gaming. I would love to explore its capabilities in the realm of music, from sound design through production, recording and live performance.
Wonderfully powerful, running quiet, and with a low electricity bill.
A fragrant flower, maybe not a Lotus, but sweeeeet.
2006-12-29 10:49 pm
CowMan
Let’s hope they succeed on these grounds; we heard the same about the NetBurst architecture, and obviously it did not scale as Intel had hoped.
Still, with the launch of the Cell-based PS3, Apple’s switch to Intel, and the launch of the Core 2 Duo, it’s been a very exciting time for computer processors, and it seems it’s just the start.
2006-12-30 1:07 am
hobgoblin
If they come out with a Cell that can work with more conventional RAM than what the PS3 uses, things may get interesting. That and a port of AmigaOS ;)
Btw, does anyone know the heat output and wattage of the Cell? As in, what’s the potential for using the Cell in a laptop or similar?

2006-12-30 9:20 am
Flatland_Spider
According to Wikipedia’s Cell entry (http://en.wikipedia.org/wiki/Cell_microprocessor_implementations):
Voltage (V)   Frequency (GHz)   Power (W)   Die temp (°C)
0.9           2.0               1           25
0.9           3.0               2           27
1.0           3.8               3           31
1.1           4.0               4           38
1.2           4.4               7           47
1.3           5.0               11          63
IBM does use the Cell in blades, but I’m not sure how easily that would translate to a laptop. How expensive and how obscure it would be is another question.

2006-12-30 6:13 pm
rayiner
That’s the power dissipation for an SPE, not a whole Cell chip
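A rough sketch of why the per-SPE distinction matters: multiply the table’s per-SPE power by the eight SPEs on a Cell die. It assumes the table values really are per SPE, as stated above, and it deliberately leaves out the PPE, the EIB, the memory interface and I/O, so the results are a partial figure rather than a whole-chip estimate.

```python
# Per-SPE operating points from the table above:
# (voltage V, frequency GHz, power W, die temp C)
SPE_POINTS = [
    (0.9, 2.0, 1, 25),
    (0.9, 3.0, 2, 27),
    (1.0, 3.8, 3, 31),
    (1.1, 4.0, 4, 38),
    (1.2, 4.4, 7, 47),
    (1.3, 5.0, 11, 63),
]

NUM_SPES = 8  # SPEs on a Cell die (assumption: count all eight as active)


def spe_array_power(points, num_spes=NUM_SPES):
    """Return (GHz, W) pairs for num_spes SPEs only.

    Ignores the PPE, EIB, memory controller and I/O, so this is a
    partial figure, not a whole-chip power estimate.
    """
    return [(ghz, watts * num_spes) for _, ghz, watts, _ in points]


for ghz, watts in spe_array_power(SPE_POINTS):
    print(f"{ghz:.1f} GHz: {watts} W for {NUM_SPES} SPEs (PPE, EIB, I/O excluded)")
```

Even this partial figure passes 50 W at 4.4 GHz, which is why the table on its own says little about laptop suitability.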

2006-12-30 1:39 am
mattv427
Someone at Apple is eating a shotgun as I write this.

2006-12-30 4:34 am
rayiner
You think Apple didn’t already know about Power6? Power6 represents the application of the advantages of the POWER system architecture to the design of the CPU. The chip is designed around an insane amount of memory and I/O bandwidth, huge caches, etc, to compensate for lower IPC from a simpler core. A Power6 CPU is going to be several times bigger than an Intel CPU, an order of magnitude more expensive, and require massive supporting infrastructure. The damn thing is going to require more than a dozen channels of DDR2-667 memory just to provide full memory bandwidth!
It’s a huge honking server chip designed for huge honking servers. Apple doesn’t sell huge honking servers, what it sells are laptop and desktop machines that need high performance with lower power dissipation and with very cheap supporting infrastructure. Intel’s Core provides that, in a way no Power6 derivative is going to.
Power6 is what you get when you tell the designers they can have 350mm^2 of die space, 6000 contact pads, a huge external cache, industrial-strength cooling, a multi-thousand dollar per piece budget, not to mention a dozen channels of RAM and fat inter-CPU links on an extremely expensive multi-layer MCM. Core 2 is what you get when you tell designers that 150mm^2 of die space is pushing it, that they have to fit into an existing 775-pin socket, that the core has to scale from a 4 lb laptop to a 50 lb workstation, from a 5W ULV chip to an 80W workstation chip, and from a $500 desktop to a $50,000 server. Apple needs the latter, not the former.
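The “more than a dozen channels” figure can be checked with back-of-the-envelope arithmetic. This sketch assumes an aggregate target of about 75 GB/sec (the number given later in this thread), peak DDR2-667 rates on a standard 64-bit channel, and no allowance for ECC, refresh or bus efficiency, so the channel count it produces is a lower bound.

```python
import math


def channel_bandwidth_gbs(mega_transfers_per_sec: float = 667.0, bus_bytes: int = 8) -> float:
    """Peak bandwidth of one DDR2 channel in GB/s (1 GB = 10**9 bytes).

    DDR2-667 performs ~667 million transfers per second; a standard
    channel is 64 bits (8 bytes) wide.
    """
    return mega_transfers_per_sec * 1e6 * bus_bytes / 1e9


def channels_needed(target_gbs: float, per_channel_gbs: float = channel_bandwidth_gbs()) -> int:
    """Smallest whole number of channels whose peak bandwidth meets target_gbs."""
    return math.ceil(target_gbs / per_channel_gbs)


per = channel_bandwidth_gbs()
print(f"One DDR2-667 channel: {per:.2f} GB/s")
print(f"75 GB/s needs {channels_needed(75.0)} channels")
```

At roughly 5.3 GB/s per channel, 75 GB/s works out to a little over 14 channels' worth, so rounding up gives 15, consistent with “more than a dozen”.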

2006-12-30 7:58 am
raynevandunem
I’m a tad ignorant concerning the RISC vs. CISC debate.
If Apple is moving to x86 for their desktop systems (they have server systems, though) just like all their other desktop competitors, does this mean that RISC processors (SPARC, ARM, POWER/PPC/Cell, etc.) are meant more for every other hardware system (embedded, server, console, etc.) except for the desktop?
If so, is it because of the processors being RISC, or because of the maker (IBM, Sun, etc.) being geared toward explicitly non-desktop systems from the get-go?
I read that Apple left PowerPC because IBM couldn’t push the processor past a certain clock speed for their desktop systems, while Intel could for Dell and their other clients.
And does this mean that one can’t use a…say…SPARC processor for their desktop system and still be as productive as a counterpart system with x86?
I’m just wondering about this because Apple, prior to the big switch, was pimping their “Think Different” image even in their hardware architecture in a “PPC is us, x86 is them” kind of way. The infamous bunny suit commercial comes to mind.
In fact, that was impressed so deeply and for so long into the Apple brand that it became part of the very difference between a Mac and a PC (“G4” has a certain “rad” essence to it).
This is why the diehard Mac fans have been so wary about the switch. These people bought the candy-colored iBooks and iMacs with (today, the much-derided) Mac OS 8 and 9 on it. These people bought the uber-expensive G4 Cube and terribly-glassy Bondi Blue PowerMacs.
Fercrissakes, these were the folks with the “X” tats on their chests in “The Cult of Mac” (that was a good book, btw)!
The different processor was Apple’s ultimate symbol of being different from the rest for the longest time, more so than the operating system.
What does this switch mean for all the other processor architectures and makers out there, at least those who may want to target desktop systems? Are there any others left?

2006-12-30 8:39 am
rayiner
It is very important in this discussion to separate instruction sets from microarchitectures. The instruction set is how the processor exposes operations to software. The microarchitecture is how those operations are implemented. RISC and CISC are general design principles of the instruction set, not the microarchitecture.
Back when RISC chips were introduced, this distinction was less meaningful. CPUs implemented instruction sets in a very direct way, and so the instruction set largely dictated what the microarchitecture looked like. Since RISC was created, a great deal of complexity has moved into the microarchitecture. Things like superscalar execution, pipelining, out-of-order execution, etc., all have a major effect on the microarchitecture, and are affected only indirectly by the instruction set (and even more indirectly by whether the chip is RISC or CISC).
You also have to remember that Core 2 CISC is not like 8086 CISC (and PowerPC RISC never was that RISC anyway). Modern x86s are in some ways RISC, because they translate x86 code to internal RISC operations. In other ways, they are like souped-up CISC, because they take advantage of things like memory operands to improve performance.
The suitability of a particular chip for a task is directly a function only of the microarchitecture. On a complex chip like POWER6 or Core 2, this microarchitecture is largely independent of the instruction set, so in this realm the question really becomes “which microarchitecture is more suitable, Power6 or Core 2”, without consideration of x86 versus PPC or RISC versus CISC. As CPUs get simpler, and microarchitectural features are removed, the instruction set increasingly drives the microarchitecture, and thus becomes a bigger deal. Simple embedded CPUs can probably save a few tenths of a watt by using an easier-to-decode RISC instruction set than a more complex-to-decode CISC one. At the very bottom of the ladder, you have microcontrollers, whose microarchitectures are almost completely driven by their instruction set. Interestingly, most of these are CISC chips, because of the code-density advantages of CISC versus RISC.
At the level of Power6 versus Core 2, instruction set doesn’t play a huge role either way. Core 2 leverages x86’s CISC-y memory-operand model a good bit to do some instruction dispatching optimizations, and Power6 benefits from PPC’s floating-point multiply-accumulate instruction, but essentially, Core 2 is suitable for workstations because Intel designed the microarchitecture for that role, while Power6 is suitable for servers because IBM designed the microarchitecture for that role, while Cell is suitable for consoles because IBM designed the microarchitecture for that role. All of these chips could’ve been designed with a different ISA without dramatically changing their performance characteristics.
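The point that modern x86 chips translate instructions into internal RISC-like operations can be illustrated with a toy decoder. This is a hypothetical Python sketch, not how any real decoder is implemented: it splits an instruction with a memory operand into separate load, ALU and store micro-ops, the way a load/store (RISC) ISA would spell them out explicitly. The bracket operand syntax and the `tmp` register are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    op: str      # "load", "store", or an ALU operation such as "add"
    dst: str     # destination register or memory operand
    srcs: tuple  # source operands


def is_mem(operand: str) -> bool:
    """Toy convention: operands written as "[...]" are memory references."""
    return operand.startswith("[") and operand.endswith("]")


def crack(op: str, dst: str, src: str) -> list:
    """Split a toy two-operand, x86-style instruction into load/store-style micro-ops.

    "tmp" stands in for a hidden register the decoder would allocate.
    As in x86, at most one operand may be in memory.
    """
    if is_mem(dst) and is_mem(src):
        raise ValueError("at most one memory operand is allowed")
    if is_mem(src):
        # add eax, [ebx]  ->  load tmp <- [ebx]; add eax <- eax, tmp
        return [MicroOp("load", "tmp", (src,)), MicroOp(op, dst, (dst, "tmp"))]
    if is_mem(dst):
        # Read-modify-write: add [ebx], eax  ->  load, add, store
        return [
            MicroOp("load", "tmp", (dst,)),
            MicroOp(op, "tmp", ("tmp", src)),
            MicroOp("store", dst, ("tmp",)),
        ]
    # Register-to-register forms already look like a single RISC operation.
    return [MicroOp(op, dst, (dst, src))]


for uop in crack("add", "[ebx]", "eax"):
    print(uop)
```

A read-modify-write `add [ebx], eax` becomes load, add, store, while a register-only `add eax, ebx` passes through as one micro-op. The sketch does not model micro-op fusion, which Core 2 uses to keep some of these pairs together through dispatch.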

2006-12-30 6:07 pm
Nicholas Blachford
At the very bottom of the ladder, you have microcontrollers, whose microarchitectures are almost completely driven by their instruction set. Interestingly, most of these are CISC chips, because of the code-density advantages of CISC versus RISC.
Low end Microcontrollers are liable to be 8 or even 4 bit CPUs which pre-date the RISC / CISC debate and as such, don’t really count as either.
If you move up into 32 bit embedded controllers CISC ISA based processors are pretty much nowhere to be seen, it’s dominated by ARM which is a very RISC ISA.
but essentially, Core 2 is suitable for workstations because Intel designed the microarchitecture for that role, while Power6 is suitable for servers because IBM designed the microarchitecture for that role, while Cell is suitable for consoles because IBM designed the microarchitecture for that role. All of these chips could’ve been designed with a different ISA without dramatically changing their performance characteristics.
That is probably true for Core2 probably not in the case of POWER6 and definitely not in the case of Cell. POWER6 is rumoured to be quite a bit simpler than POWER5 for out-of-order execution, this is more likely to hurt x86 performance than PowerPC.
In the case of Cell, its power comes from the SPEs; these are very much RISC designs and are highly dependent on their ISA, and making them decode and execute something like x86 code would just plain hurt.
I see your point but it’s only true for highly complex CPUs, they do do a very good job of hiding the “internal” ISA. Better examples would be POWER5, Core2 and Opteron.
It’s a huge honking server chip designed for huge honking servers. Apple doesn’t sell huge honking servers, what it sells are laptop and desktop machines that need high performance with lower power dissipation and with very cheap supporting infrastructure. Intel’s Core provides that, in a way no Power6 derivative is going to.
They had to build a whole new processor when they made the 970, with POWER6 it’s designed to be scalable so building a cut down cooler version is pretty much a case of putting the same chip in a smaller box.
POWER7 is even more aggressive in that direction – a version of it will fit into an Opteron socket.
2006-12-30 6:55 pm
rayiner
Low end Microcontrollers are liable to be 8 or even 4 bit CPUs which pre-date the RISC / CISC debate and as such, don’t really count as either.
By your logic, all the CISC architectures that precipitated the RISC design don’t count as either because they pre-date the RISC / CISC debate! Classic 8-bit microcontrollers like the Zilog Z80 and the Motorola 6800 are most definitely CISC chips. The only one I can think of that doesn’t really count as either is the PIC, and then only because it’s in some ways RISC-y (single-cycle fixed-length instructions), and in some ways CISC-y (accumulator-based register model with memory operands).
If you move up into 32 bit embedded controllers CISC ISA based processors are pretty much nowhere to be seen, it’s dominated by ARM which is a very RISC ISA.
Those aren’t really “micro” controllers as such — I lumped them into my embedded category.
That is probably true for Core2 probably not in the case of POWER6 and definitely not in the case of Cell.
I wasn’t implying that you could do an x86 Cell properly. You really wouldn’t, because you really don’t want to do an in-order x86. Of course, for the PPE’s role in the chip, I don’t think an in-order anything is a particularly good idea.
POWER6 is rumoured to be quite a bit simpler than POWER5 for out-of-order execution, this is more likely to hurt x86 performance than PowerPC.
It is doubtful that POWER6 has a simpler OOO core than, say, the Pentium Pro. IBM is espousing 2x the performance of POWER5, and at 4-5 GHz, Power6 will have to retain comparable IPC to POWER5 to meet that goal. That level of OOO is likely enough to make up for any deficiencies of x86. Sure, you’ll have to make the pipeline a couple of stages longer to decode x86 efficiently, but that’s not going to change your performance drastically.
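The “comparable IPC” argument rests on the first-order model performance ≈ IPC × clock frequency. A minimal sketch of that arithmetic, assuming a POWER5 clock of 2.0 GHz purely for illustration (substitute the actual part’s clock), shows what IPC Power6 would need relative to POWER5 for a 2x gain at 4 to 5 GHz.

```python
def required_ipc_ratio(speedup: float, old_clock_ghz: float, new_clock_ghz: float) -> float:
    """IPC the new core needs, relative to the old one, to reach `speedup`.

    First-order model: performance = IPC x clock frequency, so
    IPC_new / IPC_old = speedup * f_old / f_new.
    """
    return speedup * old_clock_ghz / new_clock_ghz


ASSUMED_POWER5_GHZ = 2.0  # illustrative assumption, not a quoted specification

for power6_ghz in (4.0, 4.5, 5.0):
    ratio = required_ipc_ratio(2.0, ASSUMED_POWER5_GHZ, power6_ghz)
    print(f"{power6_ghz:.1f} GHz: needs {ratio:.2f}x POWER5 IPC for a 2x gain")
```

Under that assumption the required IPC falls between roughly 0.8x and 1.0x of POWER5’s, which is the sense in which Power6 can give up little per-clock performance.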
In the case of Cell, its power comes from the SPEs; these are very much RISC designs and are highly dependent on their ISA, and making them decode and execute something like x86 code would just plain hurt.
Yep. Entertainingly, PowerPC is apparently not RISC enough for the SPEs (which is another reason why this RISC versus CISC thing is so silly to talk about).
They had to build a whole new processor when they made the 970, with POWER6 it’s designed to be scalable so building a cut down cooler version is pretty much a case of putting the same chip in a smaller box.
Every engineering design is a point in the design space. That point is decided via numerous trade-offs which are made to achieve a particular final result based on particular given specifications. You can’t move a design to a radically different point and still expect it to perform as well as another design that’s targeted for that specific point.
POWER6 has a specific design point: 100W+ TDP, 32MB+ external L3, 75GB/sec memory bus. It is designed to that specification. The circuits are designed for high clockspeed, not low power consumption. The large L3 cache puts a lower burden on the OOO core to cover memory latency, allowing it to be simpler. The huge memory bandwidth influences the design of the prefetch algorithms. Core 2 is designed to a different point: 35W TDP, no external cache, 10GB/sec memory bus. Its circuits are designed for low power consumption over ultimate clockspeed, it has a deeper OOO core to cover memory latency, and it has to be more judicious about its prefetching.
POWER6 is simply not going to scale down to Core 2’s design point while performing competitively with Core 2. That’s just not how things work. What Apple rightly realized was that the design point they needed was precisely the one Intel was targeting with its processors. They could get a chip that was actually designed for the tasks they needed, instead of having to use a chip that was drastically scaled up or down to fit their market, with sub-optimal results.
2006-12-30 7:25 pm
andrewg
Hi Rayiner. Not going to try and contradict you but I read through the link (http://realworldtech.com/page.cfm?ArticleID=RWT101606194731) posted by Dubhthach which was very good. It went over a presentation on Power6 by IBM.
Some interesting points:
1. The Power6 does seem to have been designed to allow a lot of configurability.
“From the start, IBM has designed the POWER6 systems to be extremely configurable. The intra-node busses, which normally operate on 8 bytes/cycle can be chopped down to 2 bytes/cycle for low-end systems, and the inter-node busses can also operate at 4 bytes/cycle. Similarly, the two integrated memory controllers can both operate at half-width, and one of them can be removed entirely. The external L3 caches are optional, and are available either in the MCM, or in an external configuration. …”
Now I realise that does not mean it would scale to a consumer level notebook but it is interesting.
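Peak bus bandwidth is simply bytes per cycle times bus clock, so cutting a bus from 8 to 2 bytes/cycle cuts its bandwidth to a quarter at the same clock. The sketch below uses the widths from the quoted passage; the 2.0 GHz bus clock is a hypothetical value for illustration, not a POWER6 specification.

```python
# Bus widths in bytes per bus cycle, as described in the quoted passage.
BUS_CONFIGS = {
    "intra-node, normal": 8,
    "inter-node, reduced": 4,
    "intra-node, low-end": 2,
}

# Hypothetical bus clock for illustration only; not taken from the article.
HYPOTHETICAL_BUS_GHZ = 2.0


def bus_bandwidth_gbs(bytes_per_cycle: int, bus_ghz: float) -> float:
    """Peak bandwidth in GB/s = bytes per cycle x billions of cycles per second."""
    return bytes_per_cycle * bus_ghz


full = BUS_CONFIGS["intra-node, normal"]
for name, width in BUS_CONFIGS.items():
    bw = bus_bandwidth_gbs(width, HYPOTHETICAL_BUS_GHZ)
    print(f"{name}: {width} B/cycle -> {bw:.1f} GB/s ({width / full:.0%} of full width)")
```

Whatever the real clock is, the ratios hold: the 2 bytes/cycle configuration is a quarter-width bus.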
2. It appears that Power6 will be better at out-of-order execution thanks to a change in the configuration of pipeline stages:
“The basic pipeline for the POWER6 is the same number of stages as the POWER5, but they have been rebalanced across the different phases. Most significantly, dependent ALU operations now can execute back to back, eliminating a vexing kludge in the original POWER4/5 architecture. This makes the out-of-order scheduling easier, and is probably the reason that the instruction issue/dispatch phase uses 2 cycles in the POWER6 (compared to 4 in the POWER5).”
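Why back-to-back execution of dependent ALU operations matters can be sketched with a simple timing model: a strict chain of dependent operations cannot overlap, so its length in cycles is the number of operations times the effective latency between them. The 1-cycle and 2-cycle latencies below are assumptions standing in for “back to back” and “one bubble between dependent ops”; they are not published POWER5 or POWER6 figures.

```python
def chain_cycles(n_ops: int, alu_latency: int) -> int:
    """Cycles for a chain of n_ops ALU ops, each depending on the previous result.

    A dependent op can issue only once its producer's result is available,
    so a strict chain cannot overlap regardless of issue width:
    total = n_ops * alu_latency (pipeline fill and drain ignored).
    """
    if n_ops < 0 or alu_latency < 1:
        raise ValueError("n_ops must be >= 0 and alu_latency >= 1")
    return n_ops * alu_latency


# Assumption: latency 1 models back-to-back execution; latency 2 models
# one bubble between dependent ops.
for latency in (1, 2):
    print(f"latency {latency}: 10 dependent adds take {chain_cycles(10, latency)} cycles")
```

Halving the effective latency halves the length of such chains, and the scheduler no longer has to hold a dependent op for an extra cycle, which fits the quoted remark that out-of-order scheduling gets easier.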
2006-12-30 8:27 pm
rayiner
There is a really informative new presentation on Power6 here:
http://www2.hursley.ibm.com/decimal/IBM-Power-Roadmap-McCredie.pdf
It reveals some details that I haven’t seen published before, specifically the fact that the core isn’t really any narrower than Power5+: Power6 has 2 integer units, 2 FPUs, one branch unit, presumably 2 load-store units because of the dual-ported data cache, and is 7-issue over two threads and 5-issue on one thread.
In response to your point, you’re right that Power6 seems very scalable for IBM’s server line. It looks like it’s going to go from blade systems all the way up to very large servers. However, the thing to keep in mind is that even the cut-down configuration of Power6 puts it in the high-end Opteron/Xeon range from a system architecture point of view. A half-width memory bus on one controller is still in quad-channel FB-DIMM territory, and even a quarter-width elastic I/O bus is still in HyperTransport territory. And of course the core is still huge, with 4MB of L2 per core, and on-chip L3 directories, etc. Such a system is pushing it even for a hypothetical $5000-range PowerMac, much less a $1500 iMac.
2006-12-30 9:14 pm
andrewg
Maybe smaller still. If I remember correctly it can be configured without L3 and the L2 cache is twice the size of the admittedly big 2 meg per core cache found on the current Core 2 Duo chips – some of them anyway.
I realise that Apple really had nowhere to turn: they had to go with x86 or end up using chips designed with other purposes in mind. Actually, they had already been doing that for years, and it was starting to hurt and getting worse.
But it is also interesting to see that the Power6 comes with Altivec. Does Altivec have much use in servers or more accurately server application software?

2006-12-30 6:38 pm
flywheel
The RISC/CISC debate is a tad more complicated, since in the IBM world RISC does not strictly imply a reduced number of instructions; it also includes composite instructions. I once saw an alternative IBM acronym for RISC, but I can’t remember where.
And does this mean that one can’t use a…say…SPARC processor for their desktop system and still be as productive as a counterpart system with x86?
The differences are at a low level: whether you use a SPARC or an AMD64 hardware platform doesn’t matter.
2007-01-02 1:02 pm
renox
> does this mean that RISC processors (SPARC, ARM, POWER/PPC/Cell, etc.) are meant more for every other hardware system (embedded, server, console, etc.) except for the desktop?
The reason why embedded CPUs are RISC is that they are new CPUs: every “new” CPU is RISC. Even Itanium has RISC features (in addition to its VLIW characteristics): lots of registers, an orthogonal ISA, load/store instructions, easy instruction decoding.
In the desktop space, software compatibility has proven more important than CPU performance alone: the large number of x86 chips sold drove down their price, so x86 CPUs had a very good performance/price ratio even when they were less powerful than RISCs, and x86 won the desktop space.
As it happens, x86 chips are CISCs.
It doesn’t mean that RISCs are ‘bad for desktops’; it just means that in the desktop space, software compatibility matters more than in the embedded space.

2006-12-30 7:17 am
Rayz
Well, Apple would still have to pay for the custom work they needed to make PPC derivatives run in laptops. The Mac market just isn’t big enough for IBM to do the work for free (I think MS pays for the custom chip work on the Xbox 360). They would also still have to design the motherboards and such, which they don’t have to do with x86; they use any bog-standard motherboard.
And of course, x86 is still the best way to run Windows, which has certainly helped with the Mac’s popularity.
2006-12-30 6:22 pm
flywheel
Hardly: the only 64-bit PPC development IBM has done in the last year has been Cell-related.
At this moment and in the years to come, Apple will have no use for either Cell or Power.
Even though I love the PPC, I think the x86 change was a wise step (I would have preferred an AMD/ATI connection instead).

2006-12-30 1:52 am
helf
No. Power is the server chip. Just ’cause it is getting speed boosts does not mean there would ever be a consumer ‘G6’ chip, especially since Apple was pretty much ‘the’ customer for the G5. Also, the Cell might be fast, but it only has one general-purpose core. The rest are more or less DSPs, so they would let you boost the speed of some things, but only stuff written especially for them.
correct me if I’m wrong.
In performance and heat/power use scores, the new x86 chips rock. you have to admit that even if you don’t like the architecture.
In the long run, the switch to x86 was a good idea, even if I didn’t like it.

2006-12-30 2:34 am
SamuraiCrow
On the forums at Power.org they’re talking about merging the POWER and PowerPC lines into one Power architecture.

2006-12-30 4:36 am
rayiner
That’s from an ISA point of view, not from a uarch point of view. A desktop chip and a server chip might use the same instruction set, but the cores would be completely different.
2006-12-30 4:46 am
nathanw
They are already mostly merged. I think the last time POWER and PowerPC had different instruction sets was with the POWER2 back in 1996 or so (I think it was a hardware sqrt instruction).
2006-12-30 4:48 am
Yoda
Since nobody uses PowerPC anymore, this merge shouldn’t be too difficult anyway

2006-12-30 10:13 am
Doc Pain
“Since nobody uses PowerPC anymore, this merge shouldn’t be too difficult anyway”
Please be careful not to make universally quantified claims, because they can be falsified with a single counterexample.
If you’d said, “Nobody buys new PowerPC-based systems today”, that would be correct. PowerPC-based systems are surely still in use; not widespread, but the fact is, they are used. If I merely said, “I still use one PowerPC system”, you would already be proven wrong.
Apple Power Mac G5 (Quad): IBM PPC970MP 2.0 / 2.3 GHz
Pegasos
IBM RS/6000 pSeries
IBM Blade JS20
Motorola PowerStack
Nintendo GameCube
Nintendo Wii
MICROS~1 Xbox 360
I’m sure nobody uses them anymore. 🙂
Just for correctness. But if your “;)” indicated sarcasm, I didn’t say anything. 🙂

2006-12-30 3:22 am
mcmv200i
I once read something in a German newspaper along the lines of what is written here: http://www.itmanagersjournal.com/articles/8552?tid=78
Hofstee said that IBM envisions a Cell platform that is as open and as popular as IBM’s Power. “I would say we plan to open it up to the same level that the Power architecture is opened up, which is very open. There are really no secrets we intend to keep.”
What does “Cell is going to be Open Source Hardware” mean?
1.) Is it only the interface specification to the hardware, which is open?
2.) Or is the whole hardware design (including Verilog source code etc.) publicly available?
3.) Or is the hardware design even licensed under a free license that fulfills Stallman’s four freedoms or the OSI Definition?
2006-12-30 11:12 am
peskanov
This power table comes from a good source (it appears in IBM docs), but it only refers to an SPU, one of the nine cores of a Cell.
I think one Cell running at 3.2 GHz consumes less than 80 watts.
2006-12-30 5:01 pm
Dubhthach
David Kanter over at realworldtech did an overview of Power6 back in October; it’s the best I’ve found on the net.
http://realworldtech.com/page.cfm?ArticleID=RWT101606194731
2006-12-30 6:54 pm
Wes Felter
Open hardware isn’t necessarily open source hardware. In the case of Power/PowerPC/Cell, “open” just means that you can get documentation, like for x86, MIPS, ARM, SPARC, etc. Some over-enthusiastic journalists don’t understand this.
2007-01-01 12:46 am
bousozoku
IBM’s Power series has been running large machines and professional applications such as banking, healthcare, and manufacturing for quite a while in the zSeries (mainframes), iSeries (midrange), and xSeries (UNIX) lines.
I believe the largest of those was up to 32 processors before the multiple cores were available and I recently read something about 128 cores being utilised. The Power6 will be a great processor where lots of money is available for lots of cooling.
Obviously, desktop machines can’t cost $7000 and be sold to consumers. Even the Mac-o-lytes won’t buy them. Intel doesn’t really sell to IBM’s market, so they can produce better desktop processors. The PowerPC 604e was quite a performer compared to the P3 but it didn’t go above 350 MHz for consumer machines.
IBM never considered clock speed a priority when they could just add more or special function processors. However, they had 3 GHz G5 samples and we know that those were never repeated in production.

2007-01-01 6:39 pm
foobar
IBM’s Power series has been running large machines and professional applications such as banking, healthcare, and manufacturing quite a while in the zSeries (mainframes), iSeries (midrange), and xSeries (UNIX) lines.
zSeries only has embedded powerpc. z9 has ppc 4×0 in the channels, and service processors. None of the zSeries machines have ever had a Power3/Power4/Power5. The latest mainframe processors are codenamed “bluefire”, and they are very distinct from Power.
Lots of z9 info. will be published soon…
http://www.research.ibm.com/journal/rd51-12.html
xSeries are AMD and Intel only. xSeries is different from the Blade Center and IntelliStation brands.
DS8x000 storage does use Power5, and Power5+.
I believe the largest of those was up to 32 processors before the multiple cores were available and I recently read something about 128 cores being utilised. The Power6 will be a great processor where lots of money is available for lots of cooling.
The biggest machines had…
Power3 = 16 chips/16 cores/16 threads
Power4/+ = 16 chips/32 cores/32 threads
Power5/+ = 32 chips/64 cores/128 threads
In IBM speak, 1 core = 1 processor.