Designed for the PlayStation 3, Sony, Toshiba and IBM’s new “Cell processor” promises seemingly obscene computing capabilities for what will rapidly become a very low price. This 5-part article looks at what the Cell architecture is and at the profound implications this new chip has, not just for the games market, but for the entire computer industry. Has the PC finally met its match?
They said the same thing with the Emotion Engine.
Now one just needs a matching language.
Not unless Microsoft supports it with Windows and it’s too late for that.
We’ve seen MIPS, ARM and Alpha all fall by the wayside – and Microsoft tried to support at least two of those. Then there were SPARC, PA-RISC, etc., etc.
All of them were years ahead of comparable x86 systems when they were introduced, but the combination of Windows and Intel, along with backward compatibility (bye, Itanium), has so far proved unassailable.
And no, I didn’t RTFA
A very interesting article, and it looks like a very powerful processor. But after observing the computer industry for a couple of years, I would say the words of Public Enemy are applicable:
“Don’t believe the hype.”
PS: Any one interested in this sort of thing might want to check out the madness that is the console forum on beyond3d.
Not unless Microsoft supports it with Windows and it’s too late for that.
That’s where you are wrong. XBox 2 will be using the Cell Processor. So, guess what…
MIPS and ARM both enjoyed usage in PDAs (particularly PocketPC 2000 devices). When Microsoft dropped support for MIPS in PocketPC 2002, it dropped almost entirely out of use. However, ARM still remains the dominant platform for all handhelds: MS, Palm and others.
“That’s where you are wrong. XBox 2 will be using the Cell Processor. So, guess what…”
XBox2 won’t be using cell processor.
UPDATE
There’s also an index here:
http://www.blachford.info/computer/Cells/Cell0.html
—
They said the same thing with the Emotion Engine.
Not unless Microsoft supports it with Windows and it’s too late for that.
Read part 4…
[quote]That’s where you are wrong. XBox 2 will be using the Cell Processor.[/quote]
That’s where *you* are wrong. The Xbox2 will be using a 3-way variant of the G5 processor that is used by Apple. The Cell processor is a very different animal, geared more toward Linux. Interesting stuff; not earth-shattering, but definitely one to watch.
There was a third embedded CPU in that picture: SH3.
Microsoft dropped both MIPS and SH3 in favor of StrongARM. Why? Well I can only speak for myself, I develop PocketPC software for a living. I got tired of having to compile/test on 3 different platforms, then have to make sure my users install the right one. The latter obstacle is a lot harder than one would imagine, even with an install wizard.
The PocketPC market isn’t as grand as everyone would hope for, so I wouldn’t cry for MIPS or Hitachi. They have business elsewhere.
As video gamers, I’m sure some people have gotten tired of keeping up with all the different consoles. Kudos to Sony for keeping backwards compatibility through all these years. Well as much as we, the consumers, hate swapping consoles, I bet the developers get tired of having to relearn new platforms. That’s big money going into training and the overall cost to make one title (on multiple platforms) goes up.
Make it simpler and cheaper for developers to make games, and we might see games for cheaper. Oh but piracy will hurt sales and those companies still lose money. Hey, there are people out there that buy games. Maybe if it was cheaper to produce a game, those companies will see wider margins.
No, the Xbox 2 will not be using CELL… do you think SONY would actually let M$ leech off of something they invested in (heavily), especially for a competing product against the upcoming PS3? I think not…
http://www.wired.com/news/games/0,2101,61065,00.html?tw=wn_tophead_…
the big difference with Linux though is that it is cross platform.
Isn’t any software portable? 🙂
Porting Linux would require first writing a GNU compiler for the Cell architecture, rewriting almost every line of code (OK, 99%) in the kernel regarding memory management, rewriting all the drivers, …
That Wired article leaves room for the possibility that it is Cell in the next Xbox: Cells are PowerPCs and are in IBM’s family of chips (since they are involved with Cell).
No?
PS I still don’t think cell will be in XBox 😉
and take it with a grain of salt…
IBM sold its PC business because they wanted to free themselves of the licensing contracts they had with Microsoft. They have this Cell processor due out very soon that will “replace” x86 simply because the performance differences are on different magnitudes. IBM is betting cell processing will be the next “PC”, and they want Linux to run on it instead of Windows.
All conjecture of course.
If the Cell processor can really “accelerate” the emulation of other hardware, then this would be a great competitor for Intel and AMD.
If it really is so fast, even in emulation, what’s to stop MS from releasing a Cell version of Windows, that has a Native OS core, and provides emulation for legacy apps while providing a Cell native .NET compiler (since .NET is their big new technology and happens to be hardware independent)?
What this means is that if you do need to run a specific piece of software you can emulate it. This would have been impossibly slow once but most PC CPUs are already more than enough and with today’s advanced JIT based emulators you might not even notice the difference.
I don’t know much about Cell except what I read in this article, but if this author is suggesting that emulating other architectures on Cell (which, if I read correctly, is still partially based on the PPC architecture) will be easy, he doesn’t seem to know how extremely difficult it is to emulate PowerPC code on x86. Granted, emulating x86 on PPC is another story, but ffs, the Cell doesn’t even use local memory the same way. x86 will stay with us for a looong time, so get used to it.
As the author ‘almost’ mentioned in the article – Cell vs x86 is history repeating itself (Amiga vs PC).
I think the author was talking about emulating x86 on Cell.
Porting Linux would require first writing a GNU compiler for the Cell architecture, rewriting almost every line of code (OK, 99%) in the kernel regarding memory management, rewriting all the drivers, …
Now that’s where you’re slightly wrong.
“writing a GNU compiler” for the cell architecture requires adding a new backend to gcc.
Porting the kernel doesn’t require anything to be rewritten. Low level architecture memory and task management, boot code, and possibly platform specific drivers need to be written. It can be entirely done in < 20000 lines (depending on the details) without touching any core code at all. Much of that 20000 lines is template style code that goes along the lines of “fill this in with your architecture’s method to do an atomic increment”, “test and set”, “flush TLB”, “switch page tables”, etc.
Lastly, none of the drivers have to be rewritten. Or even modified.
They wanted to sell the PC business because they wanted to free themselves of MS’s licensing contracts, sell off an unprofitable division, and gain a partner in the Chinese market (which up till now has been reluctant to let in a foreign corporation the size of IBM).
IBM has also been wanting a way to pay back MS for what happened with OS/2. I know it’s tinfoil-hat talk, but I’m not stupid either. As a result, they want patents to hang around, thus the reason they’ve been supporting patents and acquiring new ones. So when cell processing comes out, Linux will be in the clear, but MS will have to deal with IBM to make use of the technology or face IBM’s patents. I don’t think IBM is going to cut MS out, but I think they’re going to make them pay for OS/2.
Cell processing is a sound technology. It’s not the be-all and end-all, but it’s like a Beowulf cluster vs a single server, except with processing cores instead of machines.
Backward compatibility was a must in the past because most apps were proprietary and closed source. Open source means the conversion from one architecture to another will be relatively quick, due to collective action from the community.
Linux and cell processing may help IBM come out on top if they play their cards right and deliver what they promise.
There is more than Windows support deciding success or failure; Windows isn’t the deciding factor. What is, is whether third-party vendors jump on board and make their software available for the platform.
Linux could easily survive without Office or Windows compatibility; the problem is the lack of high-profile third-party vendors like Adobe, Macromedia, Corel/Procreate, 4D, Quark, etc., etc.
Maybe I’ll be proved wrong, but I really doubt it. New architectures are fine, but there are a lot of things wrong with this. There is NO WAY they could fit 8 APUs per cell and 4 cells on a system with 4 4.6 GHz pipelined FPUs in each plus integer processors. That’s totally impossible. Even with a 500 million transistor chip, it’s not going to happen.
Also, the memory bandwidth problem is not addressed. OK, sure, 8 8MB memories attached to the system… where? What about system memory? Last I checked, 8×8 = 64 MB, and that’s not enough, so there must be some high-latency system memory somewhere. Also, does he realize that high bandwidth/low latency doesn’t happen? Nothing works that way without being hugely parallel. No motherboard designer in his right mind would ever want a 1024-bit memory bus.
Also, if this processor is so amazing, how come Sony contracted NVIDIA to do the graphics for the PS3?
This guy really has no clue what he’s talking about. I’m eager to find out more about the Cell, but not from this guy.
OK, this is great, but I thought Internet Explorer’s user agreement forbids the use of IE on systems other than Windows.
Has it changed?
The Cell is an exciting development; it heralds the availability of a platform on which everyone (not just the NSA) can put together truly distributed, parallel applications.
The comparison of the Cell against x86, SPARC, MIPS, ARM etc. doesn’t stand up because the architecture is truly different: more like comparing apples and toilet-roll than apples and oranges!
When I look at the Cell, and where Cells would fit into a mesh of linked Cells, I’m reminded of the Transputer.
To truly get the most out of the Cell architecture, people need to be thinking about the types of OS and types of software they want to run on it.
The whole concept of an “application” fails to apply … start thinking of objects/agents/avatars (apulets or whatever), each allocated to a Cell, communicating in different ways.
x86 (or other) will still be there … running MS Word on the Cell machine makes no real difference.
Didn’t the Original BeBox have 4 ‘cells’ with the load distributed cleverly by BeOS..?
>>OK, this is great, but I thought Internet Explorer’s user agreement forbids the use of IE on systems other than Windows.
PPC has IE (OS X)
There is NO WAY they could fit 8 APUs per cell and 4 cells on a system with 4 4.6 GHz pipelined FPUs in each plus integer processors. That’s totally impossible. Even with a 500 million transistor chip, it’s not going to happen.
You’re probably thinking of standard desktop CPUs which are huge complex beasts. The Cell, and the APUs in particular are very simple.
Last I checked 8×8 = 64 MB and that’s not enough,
That seems to be PS2 specific, there’s no mention of Cells being limited to 8MB per bank.
Also, if this processor is so amazing, how come Sony contracted NVIDIA to do the graphics for the PS3?
Graphics processors are much more specialised.
This guy really has no clue what he’s talking about. I’m eager to find out more about the Cell, but not from this guy.
Go read the patent then, or the Microprocessor Report article (I’d imagine it’s similar).
Microsoft dropped both MIPS and SH3 in favor of StrongARM. Why? Well I can only speak for myself, I develop PocketPC software for a living. I got tired of having to compile/test on 3 different platforms, then have to make sure my users install the right one.
That’s Microsoft you’re talking about. Does that really impact other PocketPC developers who are skilled on both MIPS and SH3? It seems you use development tools optimized for the Windows version of PocketPC.
As video gamers, I’m sure some people have gotten tired of keeping up with all the different consoles. Kudos to Sony for keeping backwards compatibility through all these years.
Sony was not the first to provide backward compatibility. AFAIK, the Nintendo Gameboy Advance is the only handheld console that can support both the old Gameboy and the Gameboy Color. Therefore the kudos goes to Nintendo.
Well as much as we, the consumers, hate swapping consoles, I bet the developers get tired of having to relearn new platforms.
Au contraire, developers love to learn new platforms. Otherwise we would still be in the Atari era, by your logic.
As for swapping, you have a choice to do so or not. In my case, I chose not to get a new console.
Make it simpler and cheaper for developers to make games, and we might see games for cheaper.
Middleware applications like Maya or other similar software are there to fill the gap.
Although the CELL architecture is “part of the PPC family”, it is not really comparable… the only part that is really comparable is the central “general” processor, which won’t really be used by programs etc. designed for the CELL architecture…
personally I can’t wait to get my hands on this technology ^_^
I’ve been waiting since the early rumors(even around the release of the PS2)!
Doesn’t the Cell architecture have binary compatibility with the PPC? If so, porting to all 3 consoles next generation will be much easier, and they will all use fairly standard PC-like Nvidia and ATi GPUs. Renderware will be singing all the way to the bank.
The catch is that you need a highly parallel algorithm to make it useful. Cell processing seems to be great for a very limited set of problems and is going to require very, very good libraries to get a start on taking over the PC. The Cell will never be able to kill x86/Win32 until that happens.
Remember that most programmers are not that smart and that writing parallel code is hard. Writing code for Win32 will continue for a long time yet.
So the combination of Cells and Grid computing should be quite phenomenal. I’ve read quite a lot about GRID, and they seem to be taking the same philosophy (somewhat) into the design of the PC and processor internals.
I’m interested in the hardware, but am even more interested to know what OSs they have lined up for these machines and the PS3… Is there a Linux port underway inside IBM? Under the terms of the GNU GPL, they don’t have to tell anyone. When they distribute it they merely have to provide the sourcecode with it…
oOOOoooo… and seeing that the Amiga is running on PPC hardware then the next generation of the Amiga hardware and Amiga OS4 can take advantage of it too…
It’s riddled with spelling and grammar mistakes, but the wealth of content and insight make up for these shortcomings.
this is a great article. while much of it is speculation like some of his articles, i did find it a good read. only time will tell. i’ll take a cell-based computer.
Let’s try reading the article before posting, as opposed to just responding to the OSNews blurb.
So…
Now there are:
IBM Servers/Workstations
Macintoshes
Game Cube
Playstation
XBox
Amiga
All using PowerPC CPUs. Wouldn’t it be nice if Apple could give away, or even license, the Cocoa (OpenStep) framework?
See, Apple doesn’t need to open source Quartz or the Aqua interface. With Java and a native Cocoa port, Linux, Amiga OS and many others (embedded market?) could run the same applications without needing to recompile!
This could pave the road for more and more PowerPC desktops!
There is NO WAY they could fit 8 APUs per cell and 4 cells on a system with 4 4.6 GHz pipelined FPUs in each plus integer processors.
Not necessarily. If you’ve got a transistor budget of 500 million, you could probably pull it off. The Cells are exceedingly simple. They probably have very simple dynamic instruction scheduling, and they have no cache, which saves transistors not just on memory but on stuff like the hardware to do cache lookup, handle cache misses, etc. They have no MMU, thus no huge associative memories for TLBs, etc. The MPC7450 (G4e) has, discounting its 256KB of L2 cache and 64KB L1 caches, about 17M transistors. With this budget, it has 4 integer units, an FPU, and four vector units. The G5 has about 55M transistors. 4 G5s plus 32 cacheless G4e’s would take about 750M transistors. This number is a gross overestimate, since it doesn’t take into account transistors saved on cache tags, TLBs, MMUs, etc. Given a 500M transistor budget (again, not unreasonable, given the GeForce 6’s 220M transistors), these numbers look doable.
No motherboard designer in his right mind would ever want a 1024 bit memory bus.
The memory bus is 128 bits. 6.4GHz effective (800MHz x 8 transfers per clock cycle) with 16 bytes per cycle ~ 100GB/sec. The 1024-bit bus is internal to each PE, not external on the motherboard. This is again not extreme: the EE that appears in $149 PS2s has a 2560-bit eDRAM bus. Even old Celerons have 256-bit internal busses to their caches.
Also, if this processor is so amazing, how come Sony contracted NVIDIA to do the graphics for the PS3?
NVIDIA is doing the graphics backend for the PS3. For something like triangle setup, it’s much cheaper to dedicate silicon than to waste an overly general vector FPU on the task.
“Maybe I’ll be proved wrong, but I really doubt it. New architectures are fine, but there are a lot of things wrong with this. There is NO WAY they could fit 8 APUs per cell and 4 cells on a system with 4 4.6 GHz pipelined FPUs in each plus integer processors. That’s totally impossible. Even with a 500 million transistor chip, it’s not going to happen.”
http://pcweb.mycom.co.jp/news/2004/11/29/011bl.jpg
No, it will be very hard. The PS3 is going to be a unique processor to program for. It’s not a 32-way multithreaded machine. It’s a 4-way machine with 32 dedicated stream processors. The stream processors work in batch mode: they are submitted a job, and continue to run the job (without preemption) until they are done. Also, the stream processors have no cache, so you can’t access memory like you normally would. Instead, you need to explicitly copy often-needed data to the local storage. So programming the PS3 is going to be very different from programming a regular multithreaded machine. Meanwhile, the Xbox2 is a much more traditional architecture. That’s going to make porting between the architectures very hard.
Like he said, these could compete at reasonable prices because there is a lot of fabrication capacity behind it, not just good design. This is distributed computing taken to its extreme.
He goes on about how it compares to PC CPUs, but as others have pointed out, the x86-compatible CPU is probably not going anywhere. What about a Cell or Cell-like GPU? Doesn’t that make more sense?
Modern CPU + Cell based GPU would beat plain old Cell any day yes?
PC will ALWAYS win. At least for the next 20 years or so.
Doh, as I read on, I guess he did touch on it. (Yeah, I’m responding to myself.)
Still, he seems to think that just because Cell currently uses shared memory, it always has to, and so wouldn’t work well as a GPU. I find that hard to believe.
85 Celsius seems incredibly *high* to me…
Then again I’ve only dealt with PC processors..
Reading this made me laugh a lot. Cell will not “blow away” the x86 market because most applications run today cannot be parallelized to any meaningful extent. Let’s take something simple: rendering a webpage. The result of each pixel being rendered is completely independent of every other pixel, so each pixel can be broken off into a separate parallel task, an “apulet”. This is all well and good, except that color determination is not the slow part of rendering a web page. The slow part of rendering a web page is laying out each object, which is dependent on every other object’s position: the exact thing a vector processor is bad at! Cell will work well in all the places a vector processor would work well.
Additionally, statements like “If I was to write Cell code on OS X the exact same Cell code would run on Windows, Linux or Zeta because in all cases it is the hardware Cells which execute it.” are just stupid. If I write x86 code, the exact same code would run on Windows, Linux, or Zeta, because in all cases it is the CPU executing it… unless I have to talk to the operating system, that is. Cell won’t mythically make all operating systems the same, unless it implements its own abstraction on top of the OS.
I don’t believe in luck–I believe in merit–but, hey…good luck with this, IBM.
There’s no chance of it replacing AMD64 in the next ten years, though.
–EyeAm
While I totally understand the logic behind patenting it, would this not be counterproductive to the whole reason x86 is ubiquitous?
Lots of cheap,fast hardware.
IBM tried to keep the BIOS under lock and key until it was clean-room reverse-engineered by Compaq. With patents, this cannot happen, as the design rests solely with these three.
It _could_ take off, but I won’t hold my breath. It’s a shame too; it sounds like a killer architecture.
“It’s not a 32-way multithreaded machine. It’s a 4-way machine with 32 dedicate stream processors.”
I thought the PowerPC 970 was 64-bit.
WHY ARE WE GOING BACK TO 32 bits? I was hoping this would be a step up.
Well, I think we all recognized that article was a little over enthusiastic but it does suggest some interesting possibilities.
First of all, I want to say I think it is completely possible to make a processor with 8 APUs and so forth. For starters, PowerPC chips already have several separate execution units on them, and I think they use fewer transistors than Intel chips. Moreover, a huge chunk of the transistor budget goes to things like cache consistency or complicated branch prediction, which is probably not used on the much simpler APUs.
Of course, it seems like this is primarily of interest for game systems or signal processing applications (note that 4 threads with 32 stream processors is just another way of saying 4 Cell processors, each with a PPC core and 8 APUs). However, I would not be so quick to dismiss this for the PC market. While it may be true that many individual applications may not easily multi-thread, it seems we are approaching a point where the biggest complaint is not the maximum processing rate in one application but the ability to run multiple applications at once. On my computers I’m rarely if ever frustrated at the rate some program is running at, but rather at slowdown in other programs when I run a processor-intensive job or turn on a video. So while drawing a webpage may not be sped up by this processor, drawing several webpages at the same time will be, and that is the sort of thing which makes a big difference for the end user.
Also, a processor like this offers great possibilities for JIT and VM code. The main thread can dispatch instructions and threads to the APUs dynamically based on what is happening in the system. Also, I find it interesting that IBM is going the same way as Intel in pushing all the complexity onto the compiler. It makes one wonder if Itanium is really as dead as everyone thinks. Perhaps in 4 years, when AMD can’t squeeze anything more out of x86, Intel will be ready to jump in, having worked out all the bugs in their new chip.
I find it interesting that IBM is going the same way as Intel in pushing all the complexity onto the compiler. It makes one wonder if Itanium is really as dead as everyone thinks. Perhaps in 4 years, when AMD can’t squeeze anything more out of x86, Intel will be ready to jump in, having worked out all the bugs in their new chip.
Itanium
A very perceptive comment, as a multi-core Itanium is close to being a cell processor itself. Itanium would make an awesome cell processor for “general purpose” as well as specialty computing applications. With 128 integer registers, 128 floating point registers and 64 predicate registers, and six instructions per cycle at your disposal, the EPIC architecture enables algorithms in three-dimensional graphics and audio applications to go wild and get fast!
In effect, the Itanium almost becomes a variant of a Field Programmable Gate Array (FPGA) processor, since it has multiple instructions that can be executed at once (currently 6 instructions per cycle, but that could be extended to 128+). But it has an advantage over FPGA processors in that it’s easier to program, since it’s still pretty much a processor that we are familiar with.
Projecting forward: if Intel takes the Itanium design to its ultimate implementation, each chip could have 128-256+ processing units on board and execute 128-256+ instructions at the same time. Obviously most software doesn’t have that much instruction-level parallelism, so such a chip would also need to have multiple cores that share their processing units. An 8-core Itanium chip might have 16 processing units per core, for 128 processing units to be shared amongst all the cores. (For a discussion of Itanium 2’s “Execution Resources”, see page 22 of http://www.dig64.org/More_on_DIG64/Itanium2_white_paper_public.pdf ).
Other Cell Processors
There are other “cell” processors out there with more cells than “The Cell”: for example, see PACT’s eXtreme Processor Platform (XPP)( http://www.pactcorp.com/ ) and “PACT offers 80 Pentium4s on a 100Mhz chip” ( http://www.theregister.co.uk/2001/07/24/pact_offers_80_pentium4s/ ). This is an incredible FPGA chip design.
The Cell processors as described in the article are very interesting indeed, especially if the low cost is achieved with high volumes in the retail digital appliance (HDTVs, game consoles, etc.) markets.
Markets
Gaining market share is critical for any “cell” processor to “take hold” or “displace” general purpose CPUs. However, market share gained in the “appliance” market doesn’t necessarily transfer over to the general purpose market.
What is certain is that there are a lot of companies working on producing new chip designs with multiple cores and “cells” to maximize the utilization of transistors.
Generalization
What is also certain is that it’s much easier to “parallelize” general purpose software for processors such as the Itanium than for the Cell processor.
The Future
Ultimately the direction that I’d like to see are processors with tens of thousands of “itanium” capable cells on each chip. This is what is needed for visions such as Alan Kay’s Dynabook, MIT’s Project Oxygen ( http://oxygen.lcs.mit.edu/Overview.html ) and pervasive computing. A processor such as this is the ultimate system: software objects mapped to real dedicated hardware processors. Smalltalk would rock on such systems! ;–)
Winners
The market will rule. Here’s to hoping that the end users are the winners!
All the best,
peter william lount
http://64bits.net
http://smalltalk.org
Hey, the article was pretty good, but I’m not sold on the idea of the Cell killing the PC market we know today. The advantage right now is that Cell is 100% parallel in everything, and if you write apps for it they will automatically work with as many cells as you can put into your system.
But then you look at what we have on x86 so far. For starters we had Intel’s HT tech; with the OS and HT, one-core CPUs can basically take today’s apps that aren’t made to run on multiple cores, split up the tasks to an extent, and gain in performance. This doesn’t work for most apps, but HT does in most cases give you a performance boost of some degree.
Now come dual-core AMD and Intel CPUs with 2 cores and 64 bits. If the dual-core x86 CPUs work how they should, with their built-in logic and the OS working together, you could have them taking and splitting up tasks/threads, more or less making uni-processor apps work like they were written for multi-core/processor systems. In the end it’s the OS and the CPU that handle everything.
If this works out, then you will see a performance gain on today’s 32-bit apps and future 64-bit apps on the PC without any real recompiles needed for the most part; of course, if you recompile the apps to better use the multiple cores, then you’ll get an even better performance gain.
Also Intel, as the article says, is working on some big secret Project Z, which is a massively parallel CPU of their own. They might already have something in the works to fight back with; Intel isn’t stupid and neither is AMD. They won’t just sit and let Sony walk in on their market.
If the Cell does have a big performance lead when it comes out in late 2006/early 2007 for the PS3 anyway, then by then Intel and AMD might have 4-core x86 64-bit processors ready. Who knows?
@M Jared Finder: Cell will work well in all the places a vector processor would work well.
Yes, stuff like image and video compression, 3D rendering, many scientific analyses, etc. A lot of very important programs can be vectorized, and many that cannot now could be in the future. There is a lot of research going into parallelizable algorithms, and since everyone is going multicore these days, these developments will accelerate.
@Devon: Note that the Cell isn’t just a processor for the PS3. It’s a processor that IBM hopes will find its way into everything from workstations to supercomputers. Cell will definitely be faster than any PC when it is released (consoles almost always are), but PCs will catch up. However, those PCs might very well be the Cell-based DCC workstations IBM plans to release.
@Matt: I said 32-way multithreaded, not 32-bit. Cell is neither. The PowerPC PUs are 64-bit chips, while the vector processors are 128-bit chips. However, the vector processors are not used in a traditional SMP arrangement. The APUs don’t run threads from a central pool of schedulable threads. Rather, they operate in batch mode. Threads running on the PUs create a software “cell” (a bundle of code and data) and dispatch the cell. The scheduler then allocates the cell to an available APU, and the APU runs independently until it finishes the computation. As such, code written for traditional multi-way machines cannot take full advantage of the Cell architecture. Software has to be divided up into coarse-grained cells that can be batch-processed by the APUs.
“@Devon: Note that the Cell isn’t just a processor for the PS3. It’s a processor that IBM hopes will find it’s way into everything from workstations to supercomputers. Cell will definitely be faster than any PC when it is released (consoles almost always are), but PC’s will catch up. However, those PCs might very well be the Cell-based DCC workstations IBM plans to release “
I still stand by my statement that x86 will be sticking around for some time, no matter if cell is faster, or even five times faster… well, unless two things are true at the same time:
1. Cells are cheap enough at least for high end desktops.
2. Cells can emulate/interpret x86 perfectly and at least as fast as the current top line x86s available.
In that case, I would have to concede that cell would have a good chance. Still not a sure thing though.
Of course, I’m not disputing its dominance in the console market. That’s where it will crush, maim, and steamroll its competition.
And Apple would do well to consider it too, I’d say.
“I still stand by my statement that x86 will be sticking around for some time, no matter if cell is faster, or even five times faster… well, unless two things are true at the same time:”
Oh, no doubt. I don’t think Cell poses any threat to x86 at all, really, except in the DCC and scientific computing markets. I was just pointing out that it isn’t a matter of “PC vs Cell”, because Cells will be in PCs (personal computers).
Some of the comments really smack of someone who has not read the entire article; of course, it could also be that you just don’t understand it. Please at least read the article before commenting. It gets really annoying on OSNews sorting through comments that are way off to find the few good ones.
~~Go CELL Go!~~Who else is going to ditch their pc once the PS3 comes out, especially if SONY makes a linux distro for it!?! I know I am!(maybe sell my geforce 6800 and make my main pc into another server ^_^)~~
“Oh, no doubt. I don’t think Cell poses any threat to x86 at all, really, except in the DCC and scientific computing markets. I was just pointing out that it wasn’t matter of “PC vs Cell”, because Cell’s will be in PCs (personal computers).”
Well, I am really certain that when the author said “PC” he meant x86, not literally “personal computer”. Especially since he discussed the possibility of a CELL-Based Desktop emerging from some 3rd party company who licenses the architecture from IBM, Motorola, or SONY… Which would in fact, be a “personal computer”…
Looking at the structure of the Cell hardware here http://www.blachford.info/computer/Cells/Cell1.html I suddenly realized IBM’s hand in it. Just compare the architecture of IBM’s RS/6000 systems, for example, with the Cell processor:
http://www.blachford.info/computer/Cells/Cell_Arch.gif
This person is extremely overenthusiastic. It’s a fairly open-ended patent. They mention that they’d like to do 8 APUs per processor and that they’d like to get to 32 GOPS and 32 GFLOPS. They probably won’t get there with their initial implementation. Actually, given 4 FPUs per APU at 4.6 GHz (if they can even pull that off; yes, I saw the picture, but goals are VERY different from reality), that yields a theoretical peak of 18.4 GFLOPS, and, as we all know, nobody actually reaches theoretical peak, so we’re ever so slightly shy of the 32 GFLOPS stated in the patent.
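The arithmetic behind that estimate is easy to check. The one-FLOP-per-FPU-per-cycle assumption below is mine; the patent does not spell out whether a fused multiply-add would count as two operations:

```python
# checking the commenter's arithmetic: peak FLOPS = units x clock x ops/cycle
fpus_per_apu = 4
clock_ghz = 4.6
flops_per_cycle = 1          # assuming one result per FPU per cycle (no FMA)

peak_gflops = fpus_per_apu * clock_ghz * flops_per_cycle
print(peak_gflops)           # per-APU peak, short of the 32 GFLOPS in the patent

# if each FPU instead retired a fused multiply-add (2 FLOPs) every cycle:
fma_gflops = fpus_per_apu * clock_ghz * 2
print(fma_gflops)            # this would clear the 32 GFLOPS target
```

So whether the patent's 32 GFLOPS figure is reachable may hinge entirely on whether the FPUs do fused multiply-adds, which neither the article nor the comment settles.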
I’m still very dubious. It’s not that I don’t think IBM could pull off an amazing processor; it’s that I doubt they can get an order-of-magnitude speed increase, let alone the several orders of magnitude the article’s author claims. I see getting data into this system fast enough as a really big problem. DRAM just isn’t fast enough to feed it; 6.4 GB/s is not fast enough. Modern graphics cards, which are basically SIMD monsters, don’t have enough bandwidth, and the new cards are pushing 40 GB/s, almost an order of magnitude more than this system.
As for fitting it all on the die, I’m still dubious. I know that eliminating the TLB and caches is a big deal for space, but each APU is a non-trivial amount of silicon, especially if it’s going to run faster than 4 GHz.
Again, I would love to be proved wrong. If the x86 ISA would die, the world would be a better place, but something tells me this thing won’t be fast enough to kill it. We’ll see.
Because of the x86 user base, not everyone will just jump ship and move to Cell — not even 5 years after Cell becomes mainstream. There are just too many x86 machines out there. It all boils down to numbers, so Cell will probably thrive in the high-end computing market.
I have several problems with that article:
1). Cell’s main ability is in parallel operations. Unless Sony have one HELL of a compiler and developers that know how to take full advantage, they aren’t going to see anything close to the performance the article hints at.
2). Linux is nowhere near good enough to come close to touching a system running Windows on x86. Sorry, but it’s true! Even XP Embedded is oftentimes better and easier to develop for than Linux — and that’s the area where MS is most vulnerable.
3). It is HIGHLY unrealistic to expect people to suddenly switch platforms — especially given the extent to which legacy code is currently supported in x86-64 + windows. Too expensive.
Also, didn’t IBM recently sell their cpu manufacturing division?
I see people giving opinions here as if they were processor designers. Please, name one microprocessor any of you has designed that made it to market. Do you people know what Sony, Toshiba and IBM have done over the past 20 years? And you keep talking like they don’t know what they’re doing. LOL. Great designs have failed in the past because of the market, period.
>>Unless Sony have one HELL of a compiler and developers that know how to take full advantage, they aren’t going to see anything close to the performance the article hints at.
LOL. No, Sony has high-school kids to do the work for them. Please, I want to learn something in the comments, from the smart people here. That is why I read OSNews.
Unless Sony have one HELL of a compiler and developers
That problem showed up already with their Emotion Engine: people had trouble developing software for two parallel asymmetric VUs…
In fact, porting Linux seems to be a lot simpler than “—.dsl.siol.net” would imply.
Did you notice that Linux gained support for new processors (the Itanic, Itanium 2, Opteron/AMD64, and so on) in very short order?
Sure, some programs will have to be ported by hand, but that normally means changing very small parts of the code.
This article was like reading a college freshman’s paper. Arggggg, so painful.
How can anyone be taken seriously while making such serious semantic and logical errors?
Comparing a platform (the PC) to a component (the Cell is just one part of a possible platform)?
So many comments in this forum demonstrate the article’s lack of clarity (or, as someone pointed out, the fact that most people couldn’t read past the first page, which is a clear sign the article is barely readable). Sure, the content is interesting, but since it is neither clear nor exact, it is hard to understand its point or to trust the points it does make.
…
IBM seems to get into x86-killing partnerships every 10 years or so, and it seems to make the same pronouncements every 10 years as well.
I was around when the A.I.M. partnership was announced, and it proclaimed the same exact thing:
(we will surpass and beat x86 within 3 years)
Well, despite the best-laid plans, the wind still hasn’t changed and x86 is still ahead of the market curve.
Sure, PPC is arguably a better CPU (look at the performance-vs-resources curve, for example), but the x86 camp doesn’t stop improving its designs when it hears of competitors; it just keeps going. So when IBM/Motorola don’t deliver on time and with the numbers they promised, they can’t and shouldn’t keep gloating about their skills. (Where is the 3 GHz PPC 970? Why has Motorola been stuck between 500 MHz and 1.7 GHz for almost 5 years, when x86 has gone from 500 MHz to almost 4 GHz in the same time?)
If AIM had just delivered on its plan on time, it would have won out for sure.
They would have reached 1 GHz first, within a great thermal envelope, and then MS would have kept making Windows for PPC (I own a PReP machine with NT 3.5).
The thing is, like everything else from a big corporation, there is an amazing dichotomy between the real engineering and the “marchitecture”.
Marketeers are the ones we hear; engineers we don’t.
Marketeers promised a flying car by the year 2000 at the 1935 World’s Fair… where the F is my flying car, dammit?
My take on all of this is this:
Unless something truly revolutionary happens (like the 8086 combined with MS-DOS, sold by IBM at a reasonable price; or the Mac OS combined with Bill Gates, sold as Windows at a blackmailer’s price), we will not see anything change.
Right now we could use some change, and in some ways the giant machine that has been in power for so long is showing its age, so maybe something will come along to add sand to the gears and kill the machine slowly and painfully.
You would think IBM, of all companies, would know, considering how many markets it once dominated and now merely services.
Thanks for that great article! The Cell processor is a very interesting project and I predict a bright future for it.
Some people here are complaining that it is difficult to vectorize a program and that many applications can’t be vectorized at all. Of course this is true, but a vectorized MS Word would not make any sense, and people don’t buy a 3 GHz machine in order to run Word faster.
But if they are running Photoshop, or gaming, or doing serious scientific calculations, or doing audio or video processing, or if… then a Cell processor will definitely help a lot. And exactly these use cases are the reasons people need faster machines. All these examples are very processing-intensive and inherently parallel, and thus easily vectorizable.
Look at it this way: a Cell processor resembles a normal CPU coupled with a GPU for general-purpose calculations. And using the GPU for tasks other than graphics rendering is a very hot topic nowadays; Sun even suggested a special API for GPGPU programs in order to make the programmer’s life easier.
So I guess the Cell is going in the right direction; it is only the logical consequence of GPUs and the successor to SSE and MMX.
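As a concrete illustration of what “vectorizable” means here — the same independent operation applied to every element of an array — a tiny sketch in plain Python. The lane width of 4 is an arbitrary stand-in for real SIMD hardware:

```python
# a "vectorizable" workload: the same independent operation on every element,
# e.g. brightening every pixel of an image
pixels = [10, 200, 30, 250, 90, 180]

def brighten(p, amount=40):
    return min(p + amount, 255)   # saturate at white

# scalar form: one element per step, as a conventional CPU loop would do it
scalar = [brighten(p) for p in pixels]

# "vector" form: process fixed-width batches, the way SIMD units (or Cell's
# APUs) consume data; a lane width of 4 is assumed purely for illustration
LANES = 4
vector = []
for i in range(0, len(pixels), LANES):
    batch = pixels[i:i + LANES]                 # load a whole batch at once
    vector.extend(brighten(p) for p in batch)   # one "instruction", many data

assert scalar == vector   # same answer; the batched form maps onto SIMD units
```

Photoshop filters, audio effects and vertex transforms all have this shape, which is why they vectorize; Word’s control-flow-heavy logic does not.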
Kaya
Concerning the undoubted market dominance of Intel, AMD and Microsoft, I would expect Intel and AMD to jump on the same train as IBM and offer something similar to the Cell, in order to replace SSE, within 2 or 3 years.
Although I don’t believe we’ll switch over from PCs to Cell-based systems soon, I expect we’ll see similar techniques in PCs from the usual suspects (Intel and AMD).
Kaya
There is something I don’t understand; maybe some of you could help me:
It is said that the Cell doesn’t use caches or virtual memory, but rather has 8 slots of 8 MB of memory with ultra-fast access.
OK, but then how does it handle operations on very large files (larger than 64 MB, say)? Does the programmer have to do all the work of loading/unloading parts of the file between disk and memory?
There is something I just don’t get…
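For what it’s worth, the usual answer on designs without hardware virtual memory is exactly what the question suggests: the programmer (or a runtime library, via DMA) streams the file through local memory in fixed-size chunks. A sketch — the 8 MB figure is the comment’s own assumption, and the demo shrinks the “local store” to 4 bytes so the chunking is visible:

```python
import io

LOCAL_STORE = 8 * 1024 * 1024  # 8 MB per slot, per the comment above

def stream_process(f, process_chunk, chunk_size=LOCAL_STORE):
    """Explicitly page a big input through a fixed-size local store:
    load a chunk, compute on it, collect the result, repeat.
    There is no hardware virtual memory doing this behind your back."""
    results = []
    while True:
        chunk = f.read(chunk_size)     # DMA-style load into "local memory"
        if not chunk:
            break
        results.append(process_chunk(chunk))
    return results

# demo: a 10-byte "file" pushed through a 4-byte "local store"
data = io.BytesIO(b"abcdefghij")
out = stream_process(data, lambda c: c.upper(), chunk_size=4)
print(b"".join(out))   # the whole file, processed 4 bytes at a time
```

The real difference from a cached CPU is only that the load is explicit in the program rather than triggered transparently by a cache miss.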
Would BeOS be the best OS for the Cell? Its pervasive multithreading would work well with multiple Cells.
Too bad Sony didn’t buy BeOS for a song and make it its OS for the PS2, and now the PS3.
If IBM has the Cell, and if it can release it this year, and if it clocks over 2 GHz, and if it has 8 APUs and at least a large chunk of a G5, and if they keep it open enough for Linux to thrive, it might be able to compete against the PC.
What they need to do to win over the market is release the chip for build-your-own consumers, like me. We will learn how these things work, cluster them and recommend them. But if the only products are from large commercial businesses who only want to profit from IBM’s IP and The Cell’s technology, the consumer will have little incentive to feel like they really own this architecture.
I have a PC. I know my PC inside and out. I can repair it, upgrade it and rebuild it anytime. I know where to get parts. I have several vendors for motherboards, CPUs, memory, cases, fans, devices, etc. When the same can be said about the Cell, it might have a chance to do what this author suggests.
Raw performance won’t make a difference unless the price/performance ratio is considerably better than the open architecture’s. This is why Apple will never take over the PC market — though they have a chance at the PC software market with OS X, if they cared about that sort of thing.
I predict this chip will go the way of Transmeta or the G4/G5. It will be neat when it comes out, but it won’t scale with the competition and won’t be open enough to lure customers like me. But I hope I’m wrong. IBM just might “get it” and make it open, in which case I’d love to jump ship. Hell, I’d jump over to the IBM PowerPC 970 if they’d open it up. It’s much nicer than any x86, even without the price/performance numbers AMD can crank out.
One solution would be to let the OS implement virtual memory, and run all applications as managed code (.NET/Java). If there is no MMU in the Cell, and no software layer between the OS and the metal, then I assume device drivers have to do the loading/unloading (maybe aided by some system calls)?
This is a very, very interesting article.
The Cell is here to stay: it will be used in the PS3 (a guaranteed best-seller) and, through licensing, by Sony, Toshiba and others in consumer electronics.
On the PC side, both Sony and Toshiba could start a new computer line, maybe a media-centre system based on the PS3…
And if (and it is a very big IF) the APUs are an extension of AltiVec, on the Mac side it will be a bomb, especially with the ease of multithreading in Cocoa and the auto-vectorization of GCC 4 in Tiger…
And BTW: Xbox 2, like the Nintendo Revolution, will use a PPC, but not Cell.
I have several problems with that article:
1). Cell’s main ability is in parallel operations. Unless Sony have one HELL of a compiler and developers that know how to take full advantage, they aren’t going to see anything close to the performance the article hints at.
You know MS’s next Xbox has 3 SMT CPUs (6 parallel execution contexts)? Perhaps you’d better tell Microsoft and Sony and all their game developers that they don’t know how to develop software and should retire now.
2). Linux is nowhere near good enough to come close to touching a system running windows on x86. Sorry, but it’s true! Even xp embedded is often times better and easier to develop for than linux — and that’s the area where ms are most vunerable.
Do you know how elegant the POSIX APIs are when compared to that abortion called Win32? Sorry, but it’s true!
FYI, PS3 will be running some form of Linux for its OS. You’d better tell Sony how crap it is and they should be using XP embedded. Snicker.
3). It is HIGHLY unrealistic to expect people to suddenly switch platforms — especially given the extent to which legacy code is currently supported in x86-64 + windows. Too expensive.
x86-64 + windows. Excuse me? x86-64 + windows supports exactly zero software. Microsoft missed the boat again.
Also, didn’t IBM recently sell their cpu manufacturing division?
No. Have you any idea what you are talking about?
wow!
if this is half of what it seems (i have not completed the article, btw) then it’s an insane idea. it seems able to handle NUMA environments on-chip rather than in software (unless they have to go software for the networking).
hmm, i wonder if in the future one will not so much buy a new computer as buy an add-on box that you hook to the old one to help with background processing. home clustering, anyone?
hmm, the design seems like big iron on a chip, in that you have a main unit that basically acts as a traffic cop and smaller units that do the real work. this can allow for true single-chip multitasking (rather than the faked kind we have on desktops today).
i say, if this gets nice linux support and one can get motherboards that fit in a normal atx tower, then this starts to get really interesting.
this could be the jump the computing world is waiting for. but as someone has already said, a nasty inertia has built up in the computing world around x86 and windows. this, in combination with linux, may well break that inertia; at least i hope so.
cell-based computers with linux preinstalled: man, i would love to see that in shops.
But then do you realize the complexity of the OS in the case of multiple Cells?
It means it has to handle loading/unloading of data among all the Cells (a real nightmare if you want to be efficient and have data split among all the Cells)!!
That’s distributed memory management, without any hardware to help and catch mistakes, and it has to be efficient enough for real-time game computation…
They really must be sure of the talent of their engineers to start such a thing under such a heavy time constraint.
@M Jared Finder
> Lets take something simple; rendering a webpage. The
> result of each pixel being rendered is completely
> independent of each other pixel, so that means that each
> pixel can be broken up into a separate parallel task, an
> “apulet”.
The job of rendering a web-page isn’t something that can benefit much from parallel computing.
The big job for any non-trivial web page is parsing the HTML/XML and deciding where on that page each object should be rendered, at what size, with what attributes, etc. Once that has been decided, the task of actually drawing pixels on screen is relatively minor.
But never mind that. The bottleneck for rendering web pages is actually downloading the stuff from the web in the first place. Lots of processors won’t help much there.
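The point about the serial parse dominating can even be quantified with Amdahl’s law; the 80% serial fraction below is purely an illustrative assumption, not a measured figure:

```python
def amdahl_speedup(serial_fraction, n_processors):
    # Amdahl's law: the serial part limits total speedup no matter how many
    # processors attack the parallel part
    parallel = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel / n_processors)

# illustrative split for web rendering: parsing/layout serial, rasterization
# parallel; 80% serial is an assumed figure just to show the shape
serial = 0.8
for n in (1, 8, 1_000_000):
    print(n, round(amdahl_speedup(serial, n), 3))
# the speedup approaches, but never exceeds, 1 / 0.8 = 1.25x
```

So even granting the author his per-pixel “apulets”, rendering as a whole would barely move — and that is before the network download dominates everything.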
Cheers,
Rich
@EyeAm
> I don’t believe in luck–I believe in merit–but,
> hey…good luck with this, IBM.
I would have thought that anybody looking at the history of computing for the last 25 years would quickly come to the conclusion that merit – having the technically better product – doesn’t count for much.
Cheers,
Rich
@seabasstin
> unless something trully revolutionary happens –like the 8086 combined with msdos
How can you possibly think that the original IBM PC was in any way revolutionary? And, as a point of fact, due to IBM’s penny-pinching, it shipped with the 8088 processor, not the 8086 (the 8088 was an inferior, cost-reduced version of the 8086).
The PC was very much a copy of the various 8080/CP/M machines popular at the time. IBM chose the 8088 because it would be simple to port CP/M to; they ended up with a cheap knock-off of CP/M (MS-DOS) because they failed to license CP/M.
What made the PC successful initially was the IBM brand.
Once again it proves – even back then – that technical superiority counts for little in the marketplace.
Cheers,
Rich
I want one!
Game programmers could barely touch the potential of the PlayStation 2 when it was first released. Only after a few years could they cope with its very different architecture (compared to the PS1 and PC), and then it was usually only the Japanese studios. Luckily for them, there wasn’t a competent competing platform at the time (the Dreamcast had flopped badly, and the Xbox wasn’t released yet).
Cell and the PlayStation 3 may have the same effect. However, since Xbox 2 will have a traditional PC architecture (only the CPU will be PPC), the PlayStation 3 may actually lose the console war if game programmers can’t harness all the Cell’s power and the PS3 ends up looking like a PS2.5 next to Xbox 2. So here’s hoping they make a damn good library that lets programmers use the Cell processors in an easy way.
OK, so let’s say the guy is overly optimistic and Cell will be in the same ballpark as top AMD/Intel processors.
Contrary to previous challengers like the Alpha, Transputer and Crusoe processors, there will be a huge market for these processors from day one. Most important, of course, is the PS3, but if the guy is even half right, there will be HDTVs, Blu-ray players, media centres and more screaming for the technology. High volumes make for lower prices and increased R&D. The previous technologies mentioned never had this enormous advantage. Let’s not forget that Intel/AMD have to jump through hoops to make x86 do what it does; Cell technology would have more “growing space”. So Cell could become a player…
x86 will not disappear, but it might become part of a hybrid system, a supporting chip for those things that are not “vectorizable”. (Let’s face it, all of us need high-performance boxes at home for home video, 5.1 surround sound, accelerated 3D graphics etc., which are exactly the strengths of a Cell architecture. Sure, someone mentioned HTML rendering, but that hardly needs a multicore Opteron…) Are there things that cannot be optimized for a Cell architecture but that are nevertheless very CPU-intensive?
Maybe it is all hype, a storm in a teacup, but it’s sure nice to get carried away like this… (just imagine games with the visual quality of the Unreal tech demo, running at 90 frames a second…)
The Cell processor is why there will not be a G5 in a PowerBook. The G5, 970 and derivatives, are too hot for laptops. Apple and IBM cannot afford to produce two lines of processors, one for laptops and one for desktops.
Apple has hinted in the direction they are going with Core Image in Tiger. This lets the GPU take some load off of the CPU and Altivec. I think Apple is preparing for a future without the Altivec, and having GPUs and Cell APUs.
In the article you mention there are some things the Cell won’t be good at, but in those areas I would argue you don’t need a lot of performance. People don’t use all the power in their CPUs all the time, but on the occasions when they do increase the load, they want it faster. Usually those instances are in areas the Cell was designed for: graphics, sound, and math.
In my research, IBM has recently said a rack of Cell servers would do 16 teraflops. A rack of Apple’s Xserves only does 630 gigaflops currently. Apple wants the Cell.
Read my http://www.tweet2.org/wordpress/index.php?p=13 article for an in depth look at Apple and the Cell.
The Atari Falcon, when it came out, had a Motorola 56001 DSP attached to the 68030 as standard. It was wonderful to code for.
Is the Cell processor going to be just a collection of DSPs around a central G4/G5-class (PU) processor? If not, how does it differ from that model?
Uggghhhh….it’s someone with just enough knowledge of CPU architecture to make him dangerous, spewing garbage based on some specs/patents w/o any real analysis.
I really want to dissect it, but I’m not sure it is worth the time. Here is what I got from the first two pages…
Quote:
“This architecture is not fixed in any way, if you have a computer, PS3 and HDTV which have Cell processors they can co-operate on problems. They’ve been talking about this sort of thing for years of course but the Cell is actually designed to do it. I for one quite like the idea of watching “Contact” on my TV while a PS3 sits in the background churning through a SETI@home [SETI] unit every 5 minutes. If you know how long a SETI unit takes your jaw should have just hit the floor, suffice to say, Cells are very, very fast [SETI Calc]. ”
and then later at the end of the doc when I expect to see this [SETI Calc] reference…
Quote:
“[SETI Calc]
5 minutes for a SETI unit? This could be completely wrong… It is based on the difference between a 1.33GHz G4 (6 Hours / unit @ 10 GFlops) and a 250 GFlops Cell, this assumes the SETI client is using Altivec on the G4 at full speed and the PS3 has 4 Cells. I rounded up to 5 minutes to be conservative. ”
Oh, OK. I like it: “I pulled a complete number out of my ass with no real basis, but I’ll round up to be conservative”!
Quote:
“It can go further though, there’s no reason why your system can’t distribute software Cells over a network or even all over the world. The Cell is designed to fit into everything from PDAs up to servers so you can make an ad-hoc Cell computer out of completely different systems. ”
Yup, there would be no latency or concurrency issues! Memory access over Ethernet is so much better than local DDR or whatever Intel/AMD uses now.
I understand the distributing tasks aspect – big buzz about grid computing in the IT industry – but let’s not confuse that with combining PDAs and TVs and PCs into some fantasmic wonder.
Plus, I’m not sure I want Acme SpyWare Company CELL machines distributing apulets “all over the world”. I know this is not the intention, but the way he words it….
Quote:
“(This is a guess since no details have been released as yet) ”
So how about you don’t.
Quote:
“The lack of cache and virtual memory systems means the APUs operate in a different way from conventional CPUs. This will likely make them harder to program but they have been designed this way to reduce complexity and increase performance. ”
Ummm…aren’t you contradicting yourself there?
Quote:
“By not using a caching mechanism the designers have removed the need for a lot of the complexity which goes along with a cache. The local memory can only be accessed by the individual APU, there is no coherency mechanism directly connected to the APU or local memory.”
This may sound like an inflexible system which will be complex to program and it most likely is…
So remove complexity somewhere but add it somewhere else? That’s actually a reasonable statement; as a programmer I run into that a lot. Too bad he never addresses the programming complexity as a negative later in the article.
On x86….
Quote:
“If data being worked on is not present in the cache the CPU stalls and has to wait for this data to be fetched. This essentially halts the processor for hundreds of cycles. It is estimated that even high end server CPUs (POWER, Itanium, typically with very large fast caches) spend anything up to 80% of their time waiting for memory. ”
on CELL…
Quote:
“In order to do stream processing an APU reads data from an input into it’s local memory, performs the processing step then writes it to a pre-defined part of RAM, the second APU then takes the data just written, processes it and writes to a second part of RAM. This sequence can use many APUs and APUs can read or write different blocks of RAM depending on the application. If the computing power is not enough the APUs in other cells can also be used to form an even longer chain. ”
So wait: memory fetches for the PC are bad, but multiple memory fetches for the Cell are good? One for each APU? I don’t pretend to be an expert on low-level CPU architecture, but I know enough about main memory and caches, CPU vs. GPU differences, multi-CPU vs. multi-core vs. hyperthreading, and how the P4 pipeline bought higher GHz vs. the AMD approach. Yet he doesn’t explain why the Cell is better in this regard. Either his explanation is off or he is just giving the most optimistic spin on everything.
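To be fair to the article, the stream-processing chain it quotes is just a software pipeline. A minimal sketch — the stage functions are invented for illustration, and real APU stages would run concurrently on separate units rather than in one loop:

```python
# a chain of processing stages, each standing in for one APU: stage N reads
# the block stage N-1 wrote, transforms it, and writes it for stage N+1
def stage_scale(block):      # "APU 1": scale the samples
    return [x * 2 for x in block]

def stage_offset(block):     # "APU 2": add an offset
    return [x + 1 for x in block]

def stage_clip(block):       # "APU 3": clip to a range
    return [min(x, 10) for x in block]

PIPELINE = (stage_scale, stage_offset, stage_clip)

def run_stream(blocks):
    out = []
    for block in blocks:          # blocks flow through the chain in order
        for stage in PIPELINE:    # each stage's output region feeds the next
            block = stage(block)
        out.append(block)
    return out

print(run_stream([[1, 2], [3, 9]]))
```

The claimed advantage is that each stage’s reads and writes are predictable streams rather than random accesses, so prefetching the next block is trivial; whether that beats a cache in practice is exactly the question the commenter says the article never answers.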
Quote:
“To prevent problems occurring when 2 APUs use the same memory, a mechanism is used which involves some extra data stored in the RAM and an extra “busy” bit in the local storage. There are quite a number of diagrams to look at and a detailed explanation in the patent if you wish to read up on the exact mechanism used. However the system is a much simpler system than trying to keep caches up to date since it essentially just marks data as either readable or not and lists which APU tried to get it. ”
Maybe. But you don’t tell me how local memory for the PC (cache) is worse than local memory for the Cell.
BTW, I don’t want to appear as if I’m attacking the Cell; I’m just making fun of this guy and his piss-poor write-up.
Quote:
“Little is know at this point about the PUs apart from being “Power architecture” but being a conventional CPU design I think it’s safe to assume there will be perfectly normal cache and coherency mechanism used within them (presumably modified for the memory subsystem). ”
So, wait: later on you say that Cells will be cheaper, but they still have a conventional CPU in them? How is that possible? Isn’t that like adding nitro to a car and selling it for less than what you paid for it?
As for the DMAC…
Quote:
“As the DMAC handles all data going into or out of the Cell it needs to communicate via a very high bandwidth bus system. The patent does not specify the exact nature of this bus other than saying it can be either a normal bus or it can be a packet switched network. The packet switched network will take up more silicon but will also have higher bandwidth, I expect they’ve gone with the latter since this bus will need to transfer 10s of Gigabytes per second. What we do know from the patent is that this bus is huge, the patent specifies it at a whopping 1024 bits wide.
At the time the patent was written it appears the architecture for the DMAC had not been fully worked out so as well as two potential bus designs the DMAC itself has different designs. Distributed and centralised architectures for the DMAC are both mentioned.
It’s clear to me that the DMAC is one of the most important parts of the Cell design, it doesn’t do processing itself but has to content with 10’s of Gigabytes of memory flowing through it at any one time to many different destinations, if speculation is correct the PS3 will have 100GByte / second memory interface, if this is spread over 4 Cells that means each DMAC will need to handle at least 25 Gigabytes per second. It also has to handle the memory protection scheme and be able to issue memory access orders as well as handling communication between the PU and APUs, it needs to be not only fast but will also be a highly complex piece of engineering. ”
Does he ever mention this as a possible bottleneck? No – not in la-la land.
Quote:
“Each bit doubles the number of memory look-ups so the PC will be doing a thousand times more memory look-ups per second than the Cell does. The Cell’s memory busses will have more time free to transfer data and thus will work closer to their maximum theoretical transfer rate. I’m not sure my theory is correct but CPU caches use a similar trick. ”
Wait – is he deriding and praising CPU memory/cache architecture in the same statement?
Quote:
“But these are just the theoretical figures and never get reached, assuming the system I described above is used the bandwidth on the Cell should be much closer to it’s theoretical figure than competing systems and thus will perform better. ”
Does that make sense? And OSNews links to him?
Quote:
“Details of this are not known other than the individual wires will work at 6.4 GHz. I expect there will be busses of these between each Cell to facilitate the high speed transfer of data to each other. This technology sounds not entirely unlike HyperTransport though the implementation may be very different.
It’s not clear how more than 8 cells will communicate but I imagine the system could be extended to handle more. IBM have announced a single rack based workstation will be capable of up to 16 TeraFlops, they’ll need 64 Cells for this sort of performance so they have obviously found some way of connecting them. ”
Ohhhh… so you have these 6.4 GHz individual wires. I never knew wires had a speed. Some are more efficient than others in terms of heat and other aspects, and some are faster than others, but a GHz rating for a wire? This makes no sense.
Quote:
“The memory system also has a memory protection scheme implemented in the DMAC. Memory is divided into “sandboxes” and a mask used to determine which APU or APUs can access it. This checking is performed in the DMAC before any access is performed, if an APU attempts to read or write the wrong sandbox the memory access is forbidden.
Existing CPUs include hardware memory protection system but it is a lot more complex than this. They use page tables which indicate the use of blocks of RAM and also indicate if the data is in RAM or on disc, these tables can become large and don’t fit on the CPU all at once, this means in order to read a memory location the CPU may first have to read a page table from memory and read data in from disc – all before the data required is read. ”
He never once addresses whether this memory protection could be a performance issue. So wait, does this mean the Cell won’t ever support paging? Does he even understand why paging is needed on PCs?
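For what it’s worth, the sandbox scheme the quoted passage describes amounts to a bounds check plus a bit test per access. A sketch, with all field names, sizes and masks invented for illustration:

```python
# each sandbox: a start/size region of RAM plus a bitmask of which APUs may
# touch it; the DMAC checks the mask before performing any access
SANDBOXES = [
    {"start": 0x0000, "size": 0x1000, "apu_mask": 0b00000011},  # APUs 0-1 only
    {"start": 0x1000, "size": 0x1000, "apu_mask": 0b11111111},  # all 8 APUs
]

def access_allowed(apu_id, address):
    for box in SANDBOXES:
        if box["start"] <= address < box["start"] + box["size"]:
            return bool(box["apu_mask"] & (1 << apu_id))  # bit test on the mask
    return False  # address outside every sandbox: forbidden

assert access_allowed(0, 0x0100)       # APU 0 may touch the first sandbox
assert not access_allowed(5, 0x0100)   # APU 5 is masked out of it
assert access_allowed(5, 0x1800)       # but may use the shared sandbox
```

Compared with a multi-level page-table walk this is a single table lookup and a bit test — which is the simplicity the article is claiming — but it also cannot express “this data is on disc”, which is why the paging question above is a fair one.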
Quote:
“It’s not clear how this system will operate in practice but it would appear to include some adaptively so as to allow Cells to appear and disappear on a network. ”
Yes, master – the master plan is working splendidly. The world is ours.
Seriously, how about asking questions like – if none of this is centralized, which CELLs decide to delegate tasks to which other ones? How do they notify each other of their presence? Broadcasts? Configuration? etc.
Thing is, I’m totally interested but this article doesn’t offer anything besides the grandest speculations and I’m not sure even half of this is backed by the available documentation. I guess I’ll have to wait until Ars Technica writes up an analysis.
– On “concrete processing.”
My summary of this section: abstraction/layers are bad for performance but good for programming; they make software easier to write. The Cell won’t have the abstraction, so it’ll be faster. YET, on the other hand…
Quote:
“The Cell approach does give some of the benefits of abstraction though….Cell provides something similar to Java but in a completely different way. Java provides a software based “virtual machine” which is the same on all platforms, Cell provides a machine as well – but they do it in hardware, the equivalent of Java’s virtual machine is the Cells physical hardware. If I was to write Cell code on OS X the exact same Cell code would run on Windows, Linux or Zeta because in all cases it is the hardware Cells which execute it. ”
Quote:
“Cell will accelerate many commonly used applications by ludicrous proportions compared to PCs. ”
and then 5 sentences later…
Quote:
“yes many OSs will support multiple processors but many applications do not and will need to be modified accordingly – a process which will take many, many years. Cell applications will be written to be scalable from the very beginning as that’s how the system works. ”
So which is it?
Quote:
“Cell may be expensive initially but once Sony and Toshiba’s fabs ramp up it will be manufactured in massive volumes forcing the prices down, the fact it’s going into the PS3 and TVs is an obvious help for getting the massive volumes that will be required. IBM will also be making Cells and many companies use IBM’s silicon process technologies, if truly vast numbers of Cells were required Samsung, Chartered, Infineon and even AMD could manufacture them (provided they had a license of course). ”
As if there aren’t massive volumes in current CPUs and GPUs already?
Let’s compare:
“PC vendors shipped 177.5 million units during 2004, up 14.7 percent from the 154.7 million units shipped in 2003. ”
vs.
“…recently announced global figures for the PlayStation of 80 million, although world-wide sales for the Xbox and GameCube can only be estimated at around 16 to 17 million for the former, and around 15 million for Nintendo’s home console. ”
So that's 177 million PC CPUs in 2004 alone vs. 110 million for all consoles across the years 2000-2004?
So even if the CELL is used for video games and some other devices, he expects “massive volumes” to price the CELL below x86/Power CPUs even though it might contain a standard Power CPU itself?
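Checking the arithmetic behind the two quotes above (the Xbox figure is taken as the midpoint of the quoted 16-17 million estimate):

```python
pc_shipments_2004 = 177.5e6          # PCs shipped in 2004, per the first quote
console_installed_base = {
    "PlayStation": 80e6,             # cumulative, per the second quote
    "Xbox": 16.5e6,                  # midpoint of the 16-17M estimate
    "GameCube": 15e6,
}
total_consoles = sum(console_installed_base.values())
print(total_consoles / 1e6)                      # 111.5
print(round(pc_shipments_2004 / total_consoles, 2))  # 1.59
```

One year of PC shipments is roughly 1.6x the entire multi-year installed base of those three consoles, which is the point being made.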
Many things he said are total junk:
– Emulating general-purpose 80×86 software on a specialised vector architecture? The performance will be awful!
Transmeta uses a VLIW dedicated to 80×86 in order to emulate 80×86 efficiently; Cell does not, so it will really suck at emulating, for example, 80-bit floating point operations.
– One thing that I really hate in the article is its use of bits in some places just to get big numbers.
“The APU access main memory in blocks of 1024 bits” – wow, how impressive!
1024/8 = 128 bytes; while that is a bit larger than the cache line size used in general-purpose CPUs (normal for a vector processing unit), it is not that big.
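Spelling out that conversion (the 64-byte cache line is a typical 2005-era x86 figure used for comparison, not a number from the article):

```python
apu_block_bits = 1024                    # the article's headline number
apu_block_bytes = apu_block_bits // 8    # same quantity in bytes
typical_cache_line = 64                  # bytes; common on 2005-era x86 CPUs

print(apu_block_bytes)                           # 128
print(apu_block_bytes / typical_cache_line)      # 2.0
```

So the "impressive" 1024-bit block is all of two ordinary cache lines.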
I just read the article. I had heard of it before but the details were never published. Take a look at the page titled “Cell Architecture Explained – Part 2: Again Inside The Cell.” Scroll down to the section labeled “Software Cells.” I am totally stunned! The similarity to COSA cells and the “COSA cell processor” is unmistakable. The main difference is that a COSA cell can have multiple destination addresses. From the article I could not make out if IBM’s Cell Processor is a signal-based reactive processor. If it isn’t, that’s too bad. The revolution is still in the future.
The COSA Operating System:
http://users.adelphia.net/~lilavois/Cosas/System.htm
Check it out.
Louis Savain
@ Not unless Microsoft supports it with Windows and it’s too late for that.
Microsoft is not needed anymore; there's an army of developers eager to port Linux to any new architecture under the sun. If this CELL thing is that powerful, Linux will be ported to it lightning fast…
Imagine any kind of VMWARE software running on that thing!
Isn’t a GNU Hurd type OS a good match for this cell system?
Linux, being monolithic, will have its limitations, I'm sure, unless the geniuses developing the kernel come up with something totally novel.
MS tried porting its OS and software in the past… porting the OS didn't seem worth it, and even porting the software seemed like a huge job, with a lot more work to support it afterwards…
From the moment MS brought out .NET, there has been a new target: redo the old stuff too, make it easily portable (by just porting the "software platform" and some libraries), and offer an easy-to-develop solution aimed at a more modern development paradigm (software needs to talk to other software, no matter where it is, among lots of other things…).
Is easy portability already available? Well, not yet (at least from Microsoft…). But with platforms currently expanding (in terms of access) to customers, it will probably be available soon. We already have x86-64 software, we have the next Xbox with a PPC architecture (no need for complex software on the machine, just good use of that software, as most of the work will be done by the VPU, and so on…), and there's also the XNA initiative from MS, bringing easy portability to high-performance software across several (supported) platforms.
It looks like MS doesn't want to be jailed to particular platforms when expanding its programs/solutions…
…and going back on topic a little: in the future, IMHO, there's an even better chance of x86 and its legacy being replaced by PPC and similar, more advanced platforms… well, let's just wait… =]
Yo Babiec, good way of pointing out the really compromised Logic of the article.
Oh and Drummond, that is what I was saying as well.
The Wintel world (MS + x86) didn't succeed because of real innovation or a technological advantage over other architectures or platforms.
It has succeeded solely because of superior marketing and market vision.
I can count on more than two hands the much better technologies that "should have" beaten Wintel but didn't.
(Amiga, OS/2, BeOS, MacOS, Alpha, PPC, and on and on.)
The thing is, technology alone is useless, as we have seen time and time again.
It doesn't matter how many GFLOPS the Cell processor does if it is not sold/supported properly.
(Alpha anyone?)
Like Bill, my first thought was ‘Atari Falcon’ (which had a 68030 CPU with 56k DSP). Just with more DSPs, and integrated into one chip. There’s nothing wrong with that, and it can be very useful for all applications that are using MMX/SSE/AltiVec today. But it’s no revolution, it just speeds up some applications.
Now one just needs a matching language.
OCCAM++ 🙂
@Anonymous: Since the Cells aren’t multithreaded, BeOS would not be the ideal OS to use here. I believe Sony will use an in-house RTOS for Cell, because the architecture is different enough that a lot of the concepts in mainstream OSs (eg: multithreading), can’t really be adapted to Cell.
@Bill: Yep, that’s a much better way of looking at how Cell works. The main difference is that from a programmer’s perspective, the dispatching to the vector processors is done automatically. However, individual APUlets must be packaged in a way that would allow a DSP-style parallel execution.
@Rayiner Hashem
I’m interested in this automation of which you speak… you mean automated by the OS, and then the OS would merely provide an interface for this.
Thanks for responding, Rayiner Hashem. So would QNX be a better choice for something like the PS3?
And if the PS3 isn't multithreaded but the Xbox 2 is multicore (a 3-core PowerPC), wouldn't BeOS be a better fit for the Xbox 2?
Considering IBM's current capacity for producing the PPC 970, and its current clock speeds, I highly doubt this thing will be any good for general-purpose computing as he says. I mean, it has no memory protection! That alone screams special-case vector processing, which is fine. I see this thing using a much scaled-down PU. Workstations will be built with a general-purpose CPU and the Cell as a coprocessor for DSP, audio, video, compression, etc. This guy makes it sound like IBM developed the POWER5 and is developing the POWER6 for nothing.
Now, I have no intention of defending this article, but I think the above criticisms went a little far. In particular, expecting him to offer detailed explanations for everything simply isn't reasonable; for instance, he really can't be expected to perform a detailed analysis of how the memory protection mechanism might work. An article has limited depth, so just because he doesn't explain how paging protection might be worked around doesn't mean an explanation doesn't exist.
Since this is descended from the POWER architecture, I'm guessing that the memory protection mechanism is something like that in 32-bit G4/G5s (I don't know about 64-bit PPC). Instead of (or perhaps in addition to; if anyone can verify this I would be grateful) page faults on wrong page access, each application can only see certain areas of memory based on what is loaded into its segment lookup table. Given the description (if accurate) of the memory protection model, it sounds like something similar might be at work. Each APU, and probably the central unit, could have several registers which dictate what parts of memory are visible (the containers mentioned).
Now, nothing says that paging isn't also present; it's just that no page-level protection is implemented. This means a memory access need only check a bit to see whether the page is loaded, without performing more expensive permission checks.
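A toy sketch of the distinction being guessed at here: per-page permission checks need a table lookup on every access, while segment/range registers reduce protection to a couple of comparisons. This is purely illustrative; the table contents, register counts, and sizes are invented, not taken from any Cell or PowerPC documentation.

```python
PAGE_SIZE = 4096

# Per-page protection: every access consults a permissions table
# (in hardware this walk is cached by a TLB, but it is still per-page state).
page_perms = {0: "rw", 1: "r"}            # page number -> permission string

def paged_check(addr, write=False):
    perms = page_perms.get(addr // PAGE_SIZE, "")
    return ("w" in perms) if write else ("r" in perms)

# Segment/range protection: a handful of (base, limit) registers per unit;
# a check is just two comparisons per register, with no table to walk.
segments = [(0x1000, 0x2000), (0x8000, 0x9000)]

def segment_check(addr):
    return any(base <= addr < limit for base, limit in segments)
```

If the "containers" in the patents work like the range registers above, that would explain how Cell could drop per-page protection without dropping paging itself.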
All in all, I think it's a good guess that the Cell processor isn't nearly as revolutionary as this kid thinks. It sounds to me a lot like IBM has removed a bunch of front-end hardware (instruction dispatcher, cache synchronisation) from its POWER architecture to make room for more computation units. It's an interesting idea, but many of the concerns the above analysis mentioned are potential shortcomings.
Andrew,
This sort of attack just makes you look like an insensitive idiot who doesn't know what he's on about. If you're going to belittle someone's hard work, you could at least put some effort into making proper criticisms. For example:
You quote the author as saying "It's clear to me that the DMAC is one of the most important parts of the Cell design, it doesn't do processing itself but has to contend with tens of gigabytes of memory flowing through it at any one time … it needs to be not only fast but will also be a highly complex piece of engineering."
You then say “Does he ever mention this as a possible bottleneck? No – not in la-la land”. What? Are you really telling me that his quote does not suggest to you that the DMAC might be a potential bottleneck if they don’t focus a lot of effort on its design? Why do you think he’s mentioned that it will need to be a highly complex piece of engineering? Does he really have to spell it out for you?
Then he says "(This is a guess since no details have been released as yet)" and you say "So how about you don't". What? So he's not even allowed to guess about something? A lot of the article is speculative, and I thought the author made a fair attempt at stating when he was speculating based on incomplete knowledge. Given that he was dissecting patent documents, which tend to be very general so as to stake out as much territory as possible, I think that's understandable.
I could go on and mention other attacks you made that were a result of your misinterpreting/misreading of the article but I don’t have the time or the inclination. The fact is that even *if* you were correct in all your criticisms, you could still have made an effort to make them constructively and with respect for the author. It’s really not that hard. Not doing so just makes you look childish.
I learnt more from 5 lines of Rayiner Hashem’s comments than 100 of yours. You should take a leaf out of his book.
@ Not unless Microsoft supports it with Windows and it’s too late for that.
Microsoft is not needed anymore; there's an army of developers eager to port Linux to any new architecture under the sun. If this CELL thing is that powerful, Linux will be ported to it lightning fast…
Imagine any kind of VMWARE software running on that thing!
It is impossible to port VMware to a non-x86 system. VMware is virtualisation software, not emulation software. You could run Bochs, but then the typical performance of Bochs is less than 1/100th of your CPU speed.
Emulation won't work on these systems anyway; by the sounds of it, the APUs would be totally useless for emulation. As for the PU, it's just a modified PPC. Basically, the author is way off on his emulation predictions.
Oh, and also, do you really think home users are going to want to run MS Office in an emulator?
These are specialised processors, useless for the vast majority of normal applications, which are I/O bound: word processing, anything internet-related, and a whole lot of other things.