First off, CELL is a PPC
Correct. What I meant by PPC was the G5. Sorry for the confusion.
a bit of a contradiction for a PPC to be competing against itself.
No contradiction - related products compete with each other all the time. It is why the Celeron is so completely (and artificially) crippled, because otherwise it would compete against Intel's high end and take away more expensive sales. A new Ford truck is going to be competing against the old one if it isn't discontinued, etc.
Second, IA64 is geared towards the server/high end markets, these blades have limited local memory and are used mostly as DSP appliances.
These blades, yes, but I see the chip in general competing in that high end market rather than the low end like these blades. I could be wrong about that though.
Huh? How is CELL competing against PPC and Itanium? First off, CELL is a PPC... a bit of a contradiction for a PPC to be competing against itself.
This argument would suggest, then, that Intel and AMD are not competitors. I mean, why would i386 compete with itself?
I may be wrong, but I believe the original poster was suggesting that the PPC/IA64 segment of the market is an area where purchasers are in the habit of either A) running very specific applications rather than "popular" ones, or B) having significant in-house development facilities.
These sorts of people should not be compared to the average Joe, who can't get his favourite application working.
Though I may be wrong, and the poster meant something entirely different.
Second, IA64 is geared towards the server/high end markets, these blades have limited local memory and are used mostly as DSP appliances.
I have to admit, at this point things are going way over my head, and I'm not in a position to really talk too much.
However, just because something is currently being developed towards the blade area, does that mean it is exclusively capable of dealing with that form of application?
Considering that the CELL system is, I think, a fairly new technology, one could think that maybe they are just starting? As I said, I don't really know, just wondering.
Maybe I need to clear it up for some of you: IA64 is like an apple, CELL is more like an orange. Get it?
Marvelously witty, did you think it up all on your own? ;P
Hi,
now IBM has proved that Cell can be used as a single processor and does not need a host processor like the blades from Mercury. So the logical consequence would be to offer workstations equipped with this chip as a replacement for the PPC970 (aka G5) processor currently used in the cheapest IBM workstation offerings. But the price has to come down: more than $18,000 for a single blade? Come on.
Anton
Yes, Cell is not very suitable for normal PCs. It has nothing to do with the floating-point support, though you are right in that Cell's DP floating-point is much slower than its SP floating-point. The primary problem with Cell for desktops/workstations is that its integer performance sucks. There are two types of processors in the Cell, an SPE and a PPE. The SPEs are completely unsuitable for general-purpose code, since they can only directly address 256KB of memory. Any code utilizing the SPEs must be specially written to fit their memory model.
With the SPEs out of the picture, much of Cell's potential performance disappears. What's left is a very simplistic 2-issue in-order PPE. To make things worse, the PPE also has extremely high cache latencies, and a very long pipeline. All these sacrifices were made so the PPE could be clocked as highly as the even simpler SPEs without using large amounts of power. Added together, these inefficiencies compound, resulting in a processor whose general-purpose integer performance is probably at the level of a sub-1GHz PIII (and that's being optimistic).
Edited 2006-09-14 00:12
The Cell SPEs can stash and fetch chunks of their local memory to and from interleaved memories like RAMBUS memory very quickly. Just think of the Local Store memory as a software-controlled cache, much like a disk cache. You can traverse a linked list by double- or triple-buffering the nodes, which makes linked-list traversal much faster.
BTW, what slows down the SPEs is pipeline stalls, a result of pipelining them so deeply for vector usage. Replacing search trees with search tries will speed things up and make up for the difference.
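To make the double-buffering idea concrete, here is a toy model in Python. It is only a sketch of the general technique, not real SPE code: `dma_get`, `CHUNK`, and `sum_double_buffered` are made-up names, and the "DMA" is an ordinary list slice. The point is the ordering: the next chunk is requested before the current one is processed, so on real hardware the transfer and the computation could overlap.

```python
# Toy model of double buffering on an SPE-like core. Nothing here is real
# Cell SDK code; dma_get and CHUNK are hypothetical names for illustration.

CHUNK = 4  # elements per transfer; a real local store buffer would be far larger


def dma_get(main_memory, offset, size):
    """Stand-in for an asynchronous DMA transfer into local store."""
    return main_memory[offset:offset + size]


def sum_double_buffered(main_memory):
    """Sum main_memory while holding only two CHUNK-sized buffers locally."""
    buffers = [None, None]
    total = 0
    current = 0
    # Prime the pipeline: fetch the first chunk.
    buffers[current] = dma_get(main_memory, 0, CHUNK)
    offset = CHUNK
    while buffers[current]:
        # Request the next chunk before touching the current data.
        # On real hardware this transfer would overlap with the work below.
        buffers[1 - current] = dma_get(main_memory, offset, CHUNK)
        offset += CHUNK
        total += sum(buffers[current])
        current = 1 - current  # swap the roles of the two buffers
    return total
```

Calling `sum_double_buffered(list(range(10)))` walks the data in chunks of four while never holding more than two chunks at once. Triple buffering would add a third slot so a result can be written back while the next input is still arriving.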
If you are old enough to remember programming a RAM expanded Commodore 64 or a 286 with XMS memory then you'll have the hang of programming the SPEs as well in no time. It's just the same with some pipelining thrown in for speed.
In a Commodore 64 or 286 with expanded memory you have to DMA to/from conventional memory as well. How is this any different from what you've said? Apparently you don't consider a 286 to be a real PC or something, because this kind of workaround is commonplace outside of the 32-bit realms. It is unusual for a 128-bit processor like the SPE to use such a workaround, but I'm not concerned because the DMA process is fast for large chunks of memory.
IMHO, the days of readable source code without workarounds/threading are drawing to a close anyway. The days of optimized code are coming back.
Supposedly the IBM XLC compiler can do code partitioning and provides software cache support (e.g. accesses to variables, arrays, etc. ensure that the memory is present when it's needed), with code dynamically swapped in and out of the SPU as needed. Here are more documents about it:
http://www-128.ibm.com/developerworks/edu/pa-dw-pa-cbecompile5-i.ht...
http://cag.csail.mit.edu/crg/papers/eichenberger05cell.pdf#search=~...
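For anyone wondering what "software cache" means in practice, here is a rough Python sketch of the general idea: a small direct-mapped cache that pulls a whole line from main memory on a miss. This is an assumption-laden illustration, not the interface or behaviour of IBM's compiler runtime; `SoftwareCache`, `LINE_SIZE`, and `NUM_LINES` are invented names.

```python
# Toy direct-mapped software cache. LINE_SIZE, NUM_LINES and the class name
# are hypothetical; this is not the interface of IBM's compiler runtime.

LINE_SIZE = 4   # elements per cache line
NUM_LINES = 2   # lines that fit in the pretend local store


class SoftwareCache:
    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.tags = [None] * NUM_LINES    # which memory line each slot holds
        self.lines = [None] * NUM_LINES   # the cached data itself
        self.misses = 0

    def read(self, address):
        line_number = address // LINE_SIZE
        slot = line_number % NUM_LINES
        if self.tags[slot] != line_number:
            # Miss: fetch the whole line (a DMA transfer on real hardware).
            start = line_number * LINE_SIZE
            self.lines[slot] = self.main_memory[start:start + LINE_SIZE]
            self.tags[slot] = line_number
            self.misses += 1
        return self.lines[slot][address % LINE_SIZE]
```

Every access goes through `read`, so the program pays a tag check even on a hit. That overhead is the price of making ordinary-looking memory accesses work out of a small local store.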
There is a very long distance between initial results in a research paper and a usable, mature implementation. The delay is only a matter of years, if you're lucky. There is nothing in the article that indicates anything like this is ready to ship for the foreseeable future.







