A new paper from a group at Lawrence Berkeley National Laboratory, “The Potential of the Cell Processor for Scientific Computing” [.pdf], explores the performance of IBM’s Cell processor on some specific types of code commonly found in high-performance computing applications. The paper compares Cell’s performance on these kernels to the performance of the Cray X1E, AMD’s Opteron, and Intel’s Itanium2. The idea here is that Cell will be a commodity processor (at least that’s what the authors and IBM hope), so it’ll be a viable HPC alternative for the cost-sensitive academic research market. This paper represents the first formal academic attempt to decide if Cell hardware is something that researchers will want to invest in. So how does Cell stack up in comparison to these three competitors? In a word, it screams.
What a hideous-looking research paper. I wonder whether they used OpenOffice or Word.
What a hideous-looking research paper.
I think it looks clean. I prefer two columns, as I really detest reading long lines of text. I start reading at the wrong line when my eyes move from the end of a line to the next one; this is because I read at literally lightning speed (it’s scary sometimes, seriously).
No way, literally?
Heh. Look at the abstract. Every other line is hyphenated. That crappy H&J didn’t come from TeX, I’m sure of that.
Yeah, superficially it looks like a typical academic paper prepared in TeX, not a word processor. But you’re right, there are a lot more hyphenated lines than you would expect.
To check this I downloaded a random recent paper from astro-ph on arXiv. I made sure it had been prepared in TeX by downloading the source as well as the PDF. It had a lot fewer hyphenated lines than the LBNL paper (though the astro-ph paper had more than I expected) and looked indefinably better.
Anyone got an idea what it was really prepared in ?
Obviously LaTeX – it uses CM fonts for a start. Yes, it is ugly, but that is not the authors’ fault, but rather that of whoever designed the ACM macros (http://www.acm.org/sigs/pubs/proceed/template.html)
The research article looks quite okayish to me.
It was obviously written with (La)TeX.
The many hyphens come from the fact that the lines are very short due to the two columns.
What bothers me more is the sentences sticking out of the columns, and the lack of vertical space, which makes the paragraphs very hard to digest.
It seems from their template there:
http://www.acm.org/sigs/pubs/proceed/template.html
that they are concerned about the number of pages of their papers (which isn’t surprising for research papers)…
So, in my opinion, this paper isn’t as ugly as was said, if you consider that they traded aesthetics for a smaller number of pages.
The authors could have worked a bit more on their sentences to avoid (weird) hyphenation… that’s true, but research paper authors sometimes only care about the facts.
Lakedaemon
It was obviously written with (La)TeX.
Indeed, this is what pdfinfo gives:
Title: CF06_final1.dvi
Creator: dvips(k) 5.95b Copyright 2005 Radical Eye Software
Producer: AFPL Ghostscript 8.51
I have to say I’m really surprised that somebody managed to make LaTeX output look this bad.
What a hideous-looking research paper. I wonder whether they used OpenOffice or Word.
They clearly preferred function(s) above fashion.
Two columns are fine and standard — I was more referring to line/paragraph spacing and fonts.
.. & aesthetics 8) – Cool.
But yeah, that paper won’t win any beauty contests, IMO.
The headline makes all this sound like a surprise – isn’t this exactly what this multi-CPU Cell thing was designed for?
Are there any other processors of similar design to Cell out there? Because – I don’t know about Cray – AMD & Intel have the problem of having to stay backwards-compatible x86 chips, so they can’t just go mad on advances in hardware development.
The Cell is brand new, without any backwards compatibility, PC software, etc. that it has to carry along.
Cell uses a modified core based on the Power architecture.
There are already a lot of Opteron server systems headed to HPC land, and the Cell/PS3 has not yet proved itself to be a viable or cost-effective platform; who knows what the yields will be.
Recently we saw the introduction of FPGA coprocessors for the Opteron socket, a brilliant idea, I believe, made possible by the recent opening up of the HT bus.
I could see a similar opportunity for a Cell as coprocessor to Opteron in that same second socket. I am not sure if that makes complete sense, or even if IBM has a compatible HT link. Such a module would have just the Cell CPU, some Rambus XDR RAM (for which it has special interfaces), and possibly an FPGA HT bridge if needed.
At least such a solution would allow Opteron systems to continue on, and the risk of investing in the Cell platform would be much reduced. This makes the same sense as not designing special-purpose Opteron+FPGA boards the way Cray/OctigaBay did, and allows the customer to choose from various Opteron server boards.
The same will likely happen with ClearSpeed as the other FPU candidate.
Most high-end supercomputing platforms run some Unix variant so the processor used would only be as critical as its need to run the software: function over form.
The Cell processor contains a fully functional PowerPC processor core (substituting IBM’s brand of hyperthreading for out-of-order execution), so it will already run Linux. Why use Opteron as a host processor when the PowerPC Cell is self-hosted?
It is more likely that ClearSpeed will come out as a similarly self-hosted Opteron spin-off as a competitor to the Cell. Due to endian issues the two systems cannot easily be mixed.
Here’s an article on ClearSpeed for the uninitiated:
http://www.reed-electronics.com/electronicnews/article/CA6316147?ni…
Why use Opteron as a host processor when the PowerPC Cell is self-hosted?
Because the performance is very different.
Hmm, one quad Opteron + 3 co-processors with 16 “pipelines” each gives us 4 massively fast CPUs with 48 coprocessor units. Compare that to 4 Cells which gives us 4 slow CPUs and 32 coprocessor units (possibly 28).
The Cell is not, and will not be, a silver bullet, but rather the PS2 on steroids (and we all know what a pain the PS2 is to code for…).
“one quad Opteron + 3 co-processors with 16 “pipelines” each gives us 4 massively fast CPUs with 48 coprocessor units. Compare that to 4 Cells which gives us 4 slow CPUs and 32 coprocessor units (possibly 28). ”
Cell, coprocessor…
Stupid x86 fanboy.
“and we all know what a pain the PS2 is to code for…”
What an idiot.
Here you go Einstein:
http://research.scea.com/research/html/CellGDC05/index.html
Read it and save yourself from making a fool of yourself in the future.
Only one way to find out:
Let’s see if it stands the test of time. If IBM/Mercury/etc. manage to make the Cell a ‘commodity’ processor, then perfect.
As it stands now, the GFLOPS/$ ratio isn’t terribly attractive, and although I welcome diversity, I do not see the Cell succeeding on a large scale (e.g. medical imaging, oil and gas, research/scientific community).
It will be equally interesting to see the GPGPU approach take off. The GPU is already a commodity.
-CEO
Actually, the GFLOPS/$ ratio should be excellent with the 65nm shrink, which will come sometime next year. The chip will be small (~120mm^2), produced in large quantities, and have its development cost subsidized by the PS3.
Back in the old days in supercomputing, we called results like these “not to be exceeded” numbers, because they always assume the best performance possible from the system.
It’s funny to see them in the future perfect tense though.
One problem with the Cell is that it is single-precision only, which is not too useful in HPC applications.
A problem with the results from the paper itself is that the Cell tests were hand-tuned to extract the best numbers possible. That’s something you’re not likely to spend that much time on in a real-world application, where you have time constraints (time to implement) to take into account as well.
“One problem with the cell is that it is only single-precision ”
Bzzzt!!!
Your competence to comment on the topic is not that impressive.
“Something that you’re not likely to spend that much time on in a real world application”
You mean all of us console companies, defense contractors, medical computing, media companies are all wasting our time on Cell based systems???
Oh no!
OK, so I forgot to mention that Cell does have double-precision support as well… Maybe I forgot because the double-precision performance stinks.
You mean all of us console companies, defense contractors, medical computing, media companies are all wasting our time on Cell based systems???
I’m only aware of a single console company (Sony) that is playing with the Cell.
I’m not aware of any defense contractors, medical computing or media companies that are building anything around Cell.
1) The article was about Cell’s double-precision performance. While the article tested a simulation rather than hardware, it showed that Cell’s double-precision performance could be quite usable as well.
2) Raytheon is working with IBM to use Cell in defense applications, Mercury Computer Systems is releasing a Cell-based blade server for industrial and medical computing, and Toshiba is going to use Cell in HDTVs.
Maybe I forgot because the double-precision performance stinks.
If by “stink” you mean “only marginally above average as opposed to eye-wideningly above average.”
Not that a CPU is purely for 3D, but it does show its potential; I saw a demo of a Cell-powered machine running a 3D flight sim at a conference this week, pitched against a PPC, and the framerate was 8-10 against 50 or so. Pretty impressive.
Are you by any chance referring to the prerecorded flight-sim video of the Cell vs the PPC at EAGE in Vienna?
… and that is a CPU, or even a well-documented FPGA (though a CPU, even something like Cell, is better known and understood by most) on a PCI card, with tools included to program and use it.
If this Cell CPU is capable of over 200 GFLOPS as the linked Ars Technica article says, why not throw just a few MB of SRAM onto a PCI card, put a Cell chip on it, and have a darned Cray-killer-on-a-PCI-card (sounds almost like the “pogo-on-a-stick” from Space Quest 3, doesn’t it? :) ).
I know I could most certainly have made very good use of such a thing, had it been priced correctly of course, when I wrote and ran some RSA-576 factoring code a few years back – heck, I even considered buying a Xilinx kit, due to general-purpose CPUs being so horribly slow on large-integer math (they have to do it sequentially, at that time 32 bits at a time, while an FPGA clocked at a mere 100 MHz could do it all in parallel and literally do a 576-bit * 576-bit multiplication in 2-3 clock cycles).
What did worry me about the Cell, however, was this from the Ars article: “… and Cell still manages to trounce the other guys at performance/watt”. This was in reference to the “fact” that the Cell was (according to those numbers) able to churn out 204.7 GFLOPS, while the Cray X1E stopped at 29.5.
I don’t know about you, but if you multiply the power consumption of a Cray X1E by just 6 (not even reaching the Cell’s alleged capability) and the Cell still comes out on top for performance/watt, it could in theory mean the Cell sucks down more than 6x the power of a Cray X1E. We’re talking about shipping a CPU with an integrated power plant and cooling solution in a container now – just for one CPU! :)
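To put numbers on that worry, here’s a quick back-of-the-envelope sketch. Only the 204.7 and 29.5 GFLOPS figures come from the article; the 120 W figure for the X1E is a made-up placeholder, not a real spec:

```python
# Sanity-checking the performance-per-watt worry above.
# GFLOPS figures are from the article; the wattage is illustrative only.

cell_gflops = 204.7
x1e_gflops = 29.5

# If Cell merely TIED the X1E on GFLOPS/watt, its power draw could be
# up to the raw GFLOPS ratio times the X1E's, i.e. roughly 6.9x:
max_power_ratio = cell_gflops / x1e_gflops
print(f"worst-case power ratio at equal perf/watt: {max_power_ratio:.1f}x")

# With a hypothetical 120 W for the X1E processor, "winning on perf/watt"
# by itself still permits anything under ~833 W for the Cell:
x1e_watts = 120  # made-up figure for illustration
cell_watts_bound = x1e_watts * max_power_ratio
print(f"Cell power bound at equal perf/watt: {cell_watts_bound:.0f} W")
```

So the Ars quote alone really does leave room for the scenario described above; you need an absolute wattage figure for the Cell (reported numbers are far below that bound) to rule it out.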
I hope someone knowing more about this can put my worries to rest.
an FPGA clocked at a mere 100MHz could do it all in parallel and literally do a 576-bit * 576-bit multiplication in 2-3 clockcycles.
No way. Perhaps at 10 MHz, if you had a big enough FPGA. But even the fattest Virtex-4 (in a >1000-pin package) has “only” 512 dedicated 18-bit multipliers (forget about doing this in regular FPGA fabric). Please correct me if I’m wrong, but to do all the partial multiplications in parallel, you’d need (576/18)^2 = 1024 of those multipliers, and then you still need enough LUTs to add them all up.
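The partial-product count above checks out as a quick back-of-the-envelope (the 18-bit multiplier width and the 512-multiplier Virtex-4 figure are taken from the comment itself, not from a datasheet I’ve verified):

```python
# Rough sizing of a fully parallel 576x576-bit schoolbook multiply on an
# FPGA with 18x18-bit hardware multipliers. Back-of-the-envelope only,
# not a synthesis result.

operand_bits = 576
mult_bits = 18

limbs = operand_bits // mult_bits   # 18-bit limbs per operand
partial_products = limbs ** 2       # one 18x18 multiply per limb pair
print(limbs, partial_products)      # 32 limbs -> 1024 partial products

available_mults = 512               # biggest Virtex-4, per the comment
passes = -(-partial_products // available_mults)  # ceiling division
print(f"needs {passes} passes over the multiplier array")
```

So a one-shot multiply doesn’t fit; you’d pipeline at least two passes over the multiplier array, or use something like Karatsuba to cut the partial-product count.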
They clearly preferred function(s) above fashion.
This sounds like you somehow have to make a compromise between the two — which you clearly do not.