Sun Microsystems has completed the design of its Niagara processor, a crucial product in the server maker’s effort to keep its own UltraSparc chip family competitive, a source familiar with the project said.
Niagara is notable for an unusual design that includes eight processing engines, or cores, each of which is capable of handling four instruction sequences called threads. Sun variously calls this approach “throughput computing” or “chip multithreading.”
Jeez, why haven’t I heard of this thing before?
Seriously, though, it sounds as if Sun is abandoning the Sparc architecture in the low end (at least, that doesn’t sound like a proc that’ll be in a desktop machine any time soon). Too bad – I’ve yet to see a machine that made removing/installing internal drives as easy as it was on a Sparc 20.
What makes you think that Niagara is abandoning SPARC?
Niagara IS a SPARC-based chip aimed specifically at the low end.
…abandoning the Sparc architecture in the low end…
Apparently you haven’t used any new Sparcs lately.
Well, I don’t know that much about the features of the Niagara chip, but here is my wish list for a new Sun processor:
– Get OOO (out-of-order) execution! My, this has been done for so long by everyone but Sun, and it’s still not in Sparcs (at least not in the UltraSPARC IIs, and probably not in the IIIs either).
– Have a “low-end” model that fully supports “standard” PC I/O, such as HyperTransport. Considering that one version of the UltraSPARC II (the IIi) already has a full northbridge integrated, it shouldn’t be too difficult to get a DDR controller into the new chip. A socket-939-compatible pinout would allow the use of Athlon64 mainboards, which in turn would lower the overall cost of the machine for end users, and thus give Sun a chance to sell quite a lot more CPUs than it does now and make a big fat profit.
– This low-end chip would have two cores, just for the hell of it (everyone’s going for it anyway), and a big shared on-die L3 cache (2MB wouldn’t be too much).
– Get the VIS2 instruction set in.
– Of course, it has to be superscalar. But for dog’s sake, keep a short pipeline! Don’t go the way the Prescott did…
– Manufacture it in 90nm SOI, although I don’t know if Texas Instruments has the facilities to do it. Well, if not, change manufacturers.
– Full Solaris 11 support, including drivers (and gcc!)
– Massive amount of free available docs for Linux and *BSD: no NDAs!
FYI, I’m writing this on an old Ultra2: 2x296MHz, 2MB cache per CPU, 1280 MB RAM, and so on. Real cool, love it!
– Get OOO (out-of-order) execution! My, this has been done for so long by everyone but Sun, and it’s still not in Sparcs (at least not in the UltraSPARC IIs, and probably not in the IIIs either).
Most OOOE CPUs are reaching their limits; the natural evolution of CPUs is multi-core SMT designs. Notice Intel announced an end to its single-core designs and also moved its high-end solutions to other architectures like EPIC.
– Have a “low-end” model that fully supports “standard” PC I/O, such as HyperTransport. Considering that one version of the UltraSPARC II (the IIi) already has a full northbridge integrated, it shouldn’t be too difficult to get a DDR controller into the new chip. A socket-939-compatible pinout would allow the use of Athlon64 mainboards, which in turn would lower the overall cost of the machine for end users, and thus give Sun a chance to sell quite a lot more CPUs than it does now and make a big fat profit.
Sounds like the UltraSPARC IIIi; it has a DDR memory controller on-chip.
Full Solaris 11 support, including drivers (and gcc!)
Hmmm. Is there a reason to suspect Solaris 11 will drop support for SPARC? gcc is already available for Solaris.
Most OOOE CPUs are reaching their limits; the natural evolution of CPUs is multi-core SMT designs. Notice Intel announced an end to its single-core designs and also moved its high-end solutions to other architectures like EPIC.
Out-of-order execution has been a pretty strong workhorse up till now. Itanium’s great performance is more a testament to its architecture and process technology than something against OOOE. *Anything* that alleviates cache-miss stalls is only going to become more important in the future.
Watch what happens when Intel adds OOOE to its IA64 chips.
Hey, where can I get a SPARC-based computer? I wanna try something different!! Please post a URL. Thx.
The Niagara chip is aimed at business workloads (servers), not scientific workloads (workstations). I guess they will use Fujitsu SPARC64 V/VI derivatives in future workstations.
The main difference between business workloads and scientific workloads is that scientific workloads generally have data parallelism and business workloads don’t. Scientific workloads are generally also more predictable, since they have a damn lot of loops with predictable numbers of iterations. The combination of data parallelism and predictability is what drives IPC (instructions per cycle). Business workloads generally feature neither data parallelism nor predictability. So OOOE (out-of-order execution), which perfectly hides level 1 cache misses but not level 2/3 cache misses, and speculative execution, which relies on predictability, don’t deliver a quantum leap in performance. The IPC remains low, damn low. To give numbers: scientific codes achieve an IPC of 1.5 or higher, business workloads 0.25 or even less.
But business servers generally run several tasks concurrently, so there is a damn lot of task parallelism to exploit, and that is what the Niagara chip is designed for. It performs 32 threads concurrently, 4 per core, which perfectly matches the IPC-below-1/4 figure.
Carsten
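That arithmetic can be put into a toy model. This is my own illustrative sketch, not anything from Sun’s documentation; the only numbers taken from the post above are 8 cores, 4 threads per core, and a per-thread IPC of roughly 0.25:

```python
def core_utilization(threads_per_core: int, thread_ipc: float) -> float:
    """Fraction of cycles a core issues an instruction, in the best
    case where the threads' stalls never compete for issue slots."""
    return min(1.0, threads_per_core * thread_ipc)

# One business-workload thread at IPC ~0.25 leaves the pipeline idle
# 3 cycles out of 4; interleaving 4 such threads per core can keep it
# busy nearly every cycle.
print(core_utilization(1, 0.25))   # 0.25
print(core_utilization(4, 0.25))   # 1.0
# 8 cores x 4 threads = 32 hardware threads at near-full utilization.
```

The model is deliberately optimistic (real threads do sometimes stall at the same time), but it shows why 4 threads per core is a natural match for an IPC around 1/4.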
*Anything* that alleviates cache-miss stalls is only going to become more important in the future.
You mean like the “Vertical Multithreading” design that Niagara is based on.
You must have misunderstood my statement to mean OOOE is dead, or I might have phrased it incorrectly. I was commenting on the fact that OOOE is not a marker by which to judge the advancement of a processor architecture.
Ebay. 😀 That’s where I got mine; you can get an old 32-bit Sparc for well under $100, and UltraSparcs can be had cheaply too. They’re not the fastest boxes around (the older Sparcs), but if you want one to play with, or to learn Solaris on its native hardware, it’s a great way to go! 🙂
You mean like the “Vertical Multithreading” design that Niagara is based on.
Yeah, that too, but you could do that *and* OOOE. In fact, OOOE is basically a no-brainer in terms of improving memory performance characteristics (though of course it adds a lot of complexity), while multithreading involves a tradeoff: you now have more threads fighting to keep their working sets in cache. Some workloads slow down with Hyperthreading enabled on a P4, for example, or with both CPUs on a POWER4 enabled. This won’t be any different for Sun’s chip.
And multithreading / multiprocessing, of course, doesn’t do anything for single threaded tasks, which are still important.
“Too bad – I’ve yet to see a machine that made removing/installing internal drives as easy as it was on a Sparc 20 “
Then you haven’t seen the SGI Indigo (early ’90s).
Well, as someone has pointed out, OOOE can help quite a lot of single-threaded programs. Of course, we’re heading for multithreading everywhere, but it’s not the revolution some people want us to believe. For example, some algorithms are inherently sequential, and pretty difficult to get to work in a parallel fashion. Then you can only rely on either parallelisation of the input data, or a better pipeline fill using execution-time prediction and OOOE.
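To make the “inherently sequential” point concrete, here is a hypothetical sketch (the function names and the toy LCG recurrence are mine, purely for illustration): a loop in which every iteration depends on the previous result cannot be split across threads, while a loop over independent elements can.

```python
def sequential_chain(seed: int, n: int) -> int:
    # Each iteration needs the previous result, so this loop cannot
    # be split across threads: it is inherently sequential. OOOE and
    # prediction are about all the hardware can do to speed it up.
    x = seed
    for _ in range(n):
        x = (x * 1103515245 + 12345) % 2**31  # toy LCG step
    return x

def independent_squares(items):
    # Every element is independent of the others, so this loop could
    # be farmed out to as many threads as there are elements.
    return [i * i for i in items]
```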
About Solaris 11, I was more talking about drivers for peripheral chips (bridges, I/O and so on). I’m still eagerly waiting for a full USB stack in Solaris; I hope it’ll be included in S10. And from what I know, gcc is not yet really efficient on SPARC CPUs… (I may be wrong on this, but considering the difference in load times between Mozilla-compiled-with-Forte and Mozilla-compiled-with-gcc, I think it’s pretty obvious.)
Lastly, the Itanium now includes OOOE, or will include it in the next revision. I can’t remember where I read that, but it seems that’s why there was such a huge performance gap between the very first version and the Itanium 2. Just have a read through the 4000+ pages of docs from Intel and HP, and you will discover that the fundamental paradigm of EPIC is a highly flawed one, and that they got it badly wrong from the beginning (the researchers officially admit it themselves…). The only ways to decently improve performance were to: 1) move away from the EPIC concept as much as possible, 2) hire the Alpha engineering team, and 3) get some huge cache on-die to allow for a deep analysis of the compiler-parallelized code, and reshuffle it using execution-time information.
I’ve read somewhere that the optimal L3 cache size for the Itaniums is around 24 MBytes. Funnily, that’s the figure that was announced some time ago for the next Itanium generation…
Well, as someone has pointed out, OOOE can help quite a lot of single-threaded programs.
Well, the paradigm so far in the computing industry has been to optimize for ILP and single-threaded programs. However, server workloads have a very bad IPC. So Sun, and also the rest of the industry (Intel and AMD are moving heavily into the server market), realised that “run the single thread as fast as you can” won’t work for servers, and probably not for tomorrow’s desktops either. With Vanderpool and virtualization on the horizon, more and more applications are going to be multithreaded.
While you are correct that single-threaded performance is important, not every market demands it, so OOOE is not always beneficial in CPU design. I think Sun’s thinking is “let’s make a great server chip,” whereas conventional thinking is “let’s make a good general-purpose chip.”
About Solaris 11, I was more talking about drivers for peripheral chips (bridges, I/O and so on). I’m still eagerly waiting for a full USB stack in Solaris,
All bridges and I/O on SPARC products are supported on Solaris; do you mean Solaris x86? And what do you mean by a full USB stack? Solaris 8 has support for USB.
And multithreading / multiprocessing, of course, doesn’t do anything for single threaded tasks, which are still important.
True, but only to certain markets. Single threaded performance doesn’t do much for server workloads.
Niagara is designed to execute many threads efficiently at the expense of individual thread performance. Modern desktop cores are heavily optimized for single-thread execution: a lot of silicon is spent on things like out-of-order execution and speculative execution, and long pipelines are used to allow for increased clock speed. Niagara’s design sits fully at the other extreme. Its cores are very simple so the chip can have a lot of them (8, I think), and the multithreaded design of each core also pushes multi-thread performance at the expense of individual thread performance. This means that Niagara will probably perform very poorly on desktop and workstation workloads, but should perform well on highly parallel server workloads.
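A back-of-the-envelope sketch of that tradeoff. The relative speeds below are made-up placeholders for illustration, not measured numbers for any real chip; only the 8-cores-by-4-threads shape comes from the discussion above:

```python
def throughput(cores, threads_per_core, per_thread_speed, runnable_threads):
    """Aggregate work per unit time when `runnable_threads` independent
    software threads are available to schedule on the chip."""
    hardware_threads = cores * threads_per_core
    return min(runnable_threads, hardware_threads) * per_thread_speed

# Placeholder speeds: a complex desktop core runs one thread at 1.0,
# a simple Niagara-style core runs each thread at a fraction of that.
desktop = dict(cores=1, threads_per_core=1, per_thread_speed=1.0)
niagara = dict(cores=8, threads_per_core=4, per_thread_speed=0.3)

# A single-threaded task: the complex desktop core wins.
print(throughput(runnable_threads=1, **desktop))   # 1.0
print(throughput(runnable_threads=1, **niagara))   # 0.3

# A server workload with plenty of runnable threads: Niagara wins.
print(throughput(runnable_threads=64, **desktop))  # 1.0
print(throughput(runnable_threads=64, **niagara))  # 9.6
```

The crossover depends entirely on how many runnable threads the workload offers, which is exactly the desktop-versus-server split described above.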
Just knocked off a Proliant with 2x 3.2 GHz Xeons…
Go, SuSE, go.. 🙂
OK, I agree with you on the non-necessity of OOOE in some markets. And you’re also probably right that Sun mainly cares about the server market: that’s where their cash comes from in the end. Well, with Solaris for AMD64, I guess they’ll announce a Sun-badged workstation running on an Opteron at some point.
About the drivers, I was still talking about my wishlist chip, so it’d be support for SPARC-on-AMD64 motherboards… And are you 110% sure that Solaris has USB 2.0 support? I downloaded the USB DDK to have a look, though I haven’t had time to play with it yet.
Shifting its focus to software (Java) and commodity computing (Opteron, Linux) was the right thing for Sun to do. Hopefully it won’t pour too many resources into pursuing a novel architecture, unless Sun thinks the future is in the crowded specialty/high-performance market (good luck taking on NEC, Cray, IBM, etc.).
Once again, market viability, not technical viability, continues to be the biggest issue for Sun.
“Out-of-order execution has been a pretty strong workhorse up till now. Itanium’s great performance is more a testament to its architecture and process technology than something against OOOE. *Anything* that alleviates cache-miss stalls is only going to become more important in the future.”
That is cool, except that OOOE is not there to alleviate cache-related (as in memory-access-related) stalls, but rather structural and dependency stalls, most of which have nothing to do with cache but with execution.
Most of the benefits that come out of OOOE can be achieved by better static scheduling, or with hinting from the compiler. It is cool and all, but the huge instruction windows and the overhead in control logic required for OOOE make it not an ideal solution by a long shot.
“Hopefully it won’t pour too many resources into pursuing a novel architecture, unless Sun thinks the future is in the crowded specialty/high-performance market (good luck taking on NEC, Cray, IBM, etc.)”
I disagree. It is all finally making lots of sense now. Solaris is becoming more and more just a kernel. Soon you won’t be able to tell the difference between their Solaris and their Linux, except that Solaris will be able to do a lot more things on the server; the interface will become the same. Sun is also abandoning SPARC for the desktop and low-end market share. They are relying on AMD for that market’s future: they will just stick to selling computers with AMD inside, running either Linux or Solaris. On the high-end server side, they are going to push Niagara. With data becoming more and more abundant (RFIDs), and companies moving more and more towards terminals again, people will start to need the parallelism of Niagara. And to integrate it all, Sun is making a complete directory service/security software, J2EE, development tools, an office suite and everything in between. It looks like Sun is finally getting it and giving people the features, performance, and integration they want.
there goes sun down the waterfall
this name is really prophetic…
That is cool, except that OOOE is not there to alleviate cache-related (as in memory-access-related) stalls, but rather structural and dependency stalls, most of which have nothing to do with cache but with execution.
Most of the benefits that come out of OOOE can be achieved by better static scheduling, or with hinting from the compiler. It is cool and all, but the huge instruction windows and the overhead in control logic required for OOOE make it not an ideal solution by a long shot.
Out-of-order execution definitely can and does help alleviate cache-miss penalties when combined with out-of-order memory access, which I believe Intel has used since the P6 core.
Now, I don’t know why you say “it is cool and all but the…”. The transistor budget to reduce cache-miss penalties is massive: if you can reduce cache-miss stalls by 10%, it is probably worth several million transistors.
A P4 or Athlon can handily beat the Itanium in many things outside the number-crunching field, despite having less than a tenth of the cache.