How many Opteron or Nocona processors can a computer system support? Good question, and one that only AMD and Intel can answer. Since they’re not saying, here are some system scalability facts you can draw upon when generating scalability guesstimates.
I honestly fail to see how this article clears up anything. The NUMA issue has as much to do with the OS as with the processor, so there’s not much to go on in the author’s speculation about the x86 platform in particular.
The SMT argument against latency makes some sense, but considered in the light of Hyperthreading benchmark results for many applications (improvements of 5–20%), it may not be enough to cover the added latency that comes with higher “NUMA-ness,” and thus may not help scalability to any significant degree.
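To put rough, purely illustrative numbers on that tradeoff (every figure below is an assumption, not a measurement), here is a tiny Python sketch:

def effective_throughput(smt_gain, remote_fraction, remote_penalty):
    # Throughput relative to a flat-memory, no-SMT baseline of 1.0.
    # smt_gain: fractional SMT improvement (0.10 = 10%)
    # remote_fraction: share of memory traffic that crosses to a remote node
    # remote_penalty: overall slowdown imposed by that remote traffic
    return (1 + smt_gain) * (1 - remote_fraction * remote_penalty)

print(effective_throughput(0.10, 0.30, 0.40))  # ~0.97: a 10% SMT gain gets eaten by NUMA cost
print(effective_throughput(0.20, 0.30, 0.40))  # ~1.06: even 20% only barely comes out ahead

If the made-up remote-access penalty is anywhere near realistic, a 5–20% SMT gain is a thin cushion.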
It seems to be a very sad pattern. Many technical writers seem to have no understanding of the issues they discuss, nor any desire to consult competent individuals.
This editorial is no exception.
While it starts off quite fine, the moment the author dives into technical specifics (the four requirements) we know he has no clue.
Too bad
If I had taken architectural classes, the architectures of the day would have been S/360 and PDP. The irony is that the technical information of which I have no clue came not from me, but directly from an engineer so clueless that he jumped ship from the PDP-11 team and designed an architectural disaster called the MicroVAX II. It was such a disaster that only 100,000+ systems were sold and the engineer received only a few patents. Guess you’re right, the individual I consulted must be truly incompetent. Too bad
Presently, HP Opteron systems support up to 4 CPUs and 64GB of memory and run 64-bit versions of Linux, and soon Windows.
looks like the latest news says not-so-soon. but that’s not your fault, the news was released after your article
i find it hard to imagine that both intel and amd would invest so much in server-grade processors that can’t scale well. or maybe they got so wrapped up in their own hype about x86-64 that the PR monster is driving the engineers. afaik, the instruction set should be fine (though i think x86, whether it be i86 or AMD64, is crappy and should have been tossed out a decade ago); perhaps the current architecture of intel and amd’s chips, which seem to be pretty similar, has to change to scale better.
The future of computers isn’t one big server, it’s clustering. Small groupings of servers working together in parallel. If you want more bang for your buck, bigger isn’t best. Divide your workload out into CPU farms and SAN arrays. No single server needs more than 4 CPUs! The bus gets too busy, even with direct access to RAM. Divide and conquer! If you have a big DB, put each table onto its own server to handle the queries! It’s a lot cheaper and just as fast, if not faster. Clustering also has redundancy, as hardware failure isn’t a problem. You lose a server, it just becomes a little slower for a while, while you get it replaced.
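A toy sketch of the “put each table on its own server” idea (the hostnames and the execute_query helper are made-up placeholders, not a real API):

TABLE_TO_HOST = {
    "orders":    "db-orders.example.com",
    "customers": "db-customers.example.com",
    "inventory": "db-inventory.example.com",
}

def execute_query(host, sql):
    # Stand-in for a real database driver call.
    return f"sent to {host}: {sql}"

def route_query(table, sql):
    # Route each query to whichever small server owns the table.
    host = TABLE_TO_HOST.get(table)
    if host is None:
        raise ValueError(f"no server assigned to table {table!r}")
    return execute_query(host, sql)

print(route_query("orders", "SELECT COUNT(*) FROM orders"))

Lose db-orders and only the orders queries suffer; the other servers keep answering.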
Why am I not surprised? Note that I didn’t write “Real Soon Now”… knowing Microsoft as I do, I realize that that Redmond is in a time zone of its very own, one that cannot be defined via GMT offset. They seem to operate on a calendar right out of Star Trek. ;-}
Anyhow, I can’t blame AMD for taking its approach to 64-bitness. And Intel would have been foolish not to respond in kind, especially since they snagged the Opteron and extensions specs right off a public AMD Web site where they were inadvertently posted. Oops…
Now, IF Opteron and Nocona can in fact scale to the enterprise level, Intel may find that maintaining parity with AMD will put a real hurting on Itanium. Not a problem I’d want to contend with, and mayhap an EPIC disaster in the making.
As for scaling, and the impact of SMT, I shall again defer to my “clueless” compadre, who was a corporate consulting engineer and a senior VP at a former computer firm formed by Ken Olsen. Sayeth the architect, “It’s not surprising that hyperthreading shows improvements of only 5-20% on many applications. Most applications are single-threaded. They won’t speed up with SMP either. Since we’re talking scalability, the proper metric is the speedup not on single applications but on the multi-application workloads typical of most servers.”
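To put the architect’s point in concrete (and purely illustrative) terms, here’s a quick Python sketch; the 1.3x combined SMT throughput figure is an assumption, not a benchmark result:

jobs = 8                      # a server running 8 independent single-threaded jobs
time_per_job = 10.0           # seconds per job on one core without SMT (assumed)

serial_time = jobs * time_per_job        # 80 s back to back, no SMT
smt_throughput = 1.3                     # assumed combined throughput of two hardware threads
smt_time = serial_time / smt_throughput  # ~62 s

print("single app with SMT: ~1.0x speedup")
print(f"8-job workload: {serial_time:.0f}s without SMT, {smt_time:.0f}s with SMT "
      f"({serial_time / smt_time:.2f}x throughput)")

The per-application number barely moves; the multi-application workload is where the gain shows up.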
I’ll leave it at that. I reported, you decide. But you can rest assured that I do the requisite research before I “write stories.” I have done so since 1970, when I was an intelligence analyst with the ASA in Viet Nam. Back then, lives depended on the accuracy of the content of the “stories” (MACV Daily Intelligence Summaries, Songbird reports, and Prairie Fire reports, actually) I wrote. In this venue, editorial gaffes or reader misinterpretations might injure one’s credibility, but they don’t create corpses!
Get a f*cking education, there is a DIFFERENCE between CMT and SMT, find out the difference then respond. Geeze.
There were three architects to the MicroVax II: Dobberpuhl, Supnik and Witek. Which one is your buddy?
BTW, MicroVax II did not evolve from the PDP-11. There was a MicroVax (that was not as good because it was a first model), but other than being the first single-chip VAX CPU and having good performance, it was not all that special.
The team was assembled in the summer of ’82, but the chip only came out in ’85. Alpha came out around the same time (as far as I remember), thus MicroVax II is hardly anything more than a solid implementation of long-ago-proven concepts.
Have your friend read you an IA-32 manual. It should provide some clues about what is needed to support a large number of CPUs on a regular board.
I was really hoping for some numbers. Perhaps artificial benchmarks on how well the bus and cache are utilized in smaller systems, and how much room they may have. I really see Opteron as scaling well with its HyperTransport links. Maybe as well as Itanium (or better?). Nocona will not do so well unless it’s a much larger change than just 64-bit extensions, because Intel still has a shared bus, which afaik starts to get bogged down with 4 CPUs. SMT seems like a secondary thing. When you have many processors, it’s more important to make sure they can all get quick access to data, making NUMA much more important than using a few spare cycles here or there (SMT).
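A crude way to see the shared-bus problem (the bandwidth figures below are assumptions for illustration, not vendor specs):

shared_bus_bw = 6.4   # GB/s total front-side bus bandwidth, split among all CPUs (assumed)
per_link_bw = 6.4     # GB/s per CPU when each has its own link/memory controller (assumed)

for cpus in (1, 2, 4, 8):
    print(f"{cpus} CPUs: shared bus {shared_bus_bw / cpus:.1f} GB/s each, "
          f"point-to-point {per_link_bw:.1f} GB/s each")

On a shared bus the per-CPU share shrinks as processors are added; with per-processor links it stays flat, at the cost of NUMA effects when one node reaches into another’s memory.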
Isn’t Opteron being used in a Cray supercomputer these days?
Unisys has a line of 32 processor IA-32 machines that you can buy today. That sounds like a decent upper end for a single machine. BTW there are plenty of applications where large scale SMP systems still work better than clusters of 2 or 4 processor machines.
Isn’t Opteron being used in a Cray supercomputer these days?
Still nothing compared to their X1, but it probably makes a pretty cool little cluster. I wonder if UNICOS runs on it for test flying software before putting it on big iron…
http://cray.com/products/systems/xd1/
I have no way of getting the aforementioned numbers; I lack the inside sources necessary for same. Correct on HyperTransport, it is very slick technology. Unfortunately HP (in the near term) won’t be doing a 4-way Nocona-based server, so we won’t be able to compare an HP 4-way Opteron-based ProLiant with an equivalent Nocona-based ProLiant from the same manufacturer (the only difference between the products would be the CPU; otherwise they would be architecturally alike and suitable for apples-to-apples benchmark testing).
Indeed there is an Opteron/Cray connection. The firms are collaborating on the development of “Thor’s Hammer,” the first version of a supercomputer system based on the Red Storm architecture, a new MPP computer design that will scale from a single cabinet and relatively few processors to hundreds of cabinets and thousands of processors. The system uses high production volume commodity processors combined with a very high performance 3-D mesh interconnect to produce high parallel efficiency on a broad spectrum of scientific and engineering applications and has an excellent price/performance ratio. Red Storm architecture and the Thor’s Hammer machine are being developed jointly by Cray, Inc., and the Department of Energy’s (DOE) National Nuclear Security Administration’s (NNSA) Sandia National Laboratories. Thor’s Hammer will be installed at Sandia National Laboratories in Albuquerque, NM, in the summer of 2004. That info is from a Sandia fact sheet. Installation is being delayed because the system is being refit with dual-core Opterons.
Specs are 10,368 compute node processors and 256 + 256 service and I/O node processors. Compute nodes are AMD Opteron processors, 10TB of DDR memory, 240TB of disk storage. Performance goal is ~40TFLOPS. There are plans to upsize the thing to accommodate as many as 40K Opterons.
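A quick sanity check on the ~40TFLOPS goal, assuming ~2GHz Opterons retiring two floating-point ops per clock (my assumption, not a quoted spec):

compute_cpus = 10_368
gflops_per_cpu = 2.0 * 2    # 2 GHz x 2 flops/clock = 4 GFLOPS peak per CPU (assumed)
print(f"~{compute_cpus * gflops_per_cpu / 1000:.1f} TFLOPS peak")   # ~41.5 TFLOPS

That lands right around the stated ~40TFLOPS goal.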
~40TFLOPS would displace NEC’s Earth Simulator as top dog on the TOP500 List. Next list (number 24) comes out in November. IBM’s Blue Gene/L, which should deliver similar levels of performance, may be operational before Thor’s Hammer.
Ironically, Thor’s Hammer costs about $90M USD. The current Number Three machine on the TOP500 list is the ASCI-Q (now just “Q”) machine at Los Alamos National Labs. It’s a cluster of 2,048 AlphaServer ES45 systems running Tru64 UNIX and containing 8,192 CPUs. Speed: about 20TFLOPS. Cost when implemented in 2002: $200M. So in two years we’ve had a 4x price/performance improvement!
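The arithmetic behind that 4x figure, using the numbers above:

q_tflops, q_cost = 20, 200e6            # ASCI Q: ~20 TFLOPS for ~$200M (2002)
storm_tflops, storm_cost = 40, 90e6     # Thor's Hammer: ~40 TFLOPS for ~$90M

improvement = (storm_tflops / storm_cost) / (q_tflops / q_cost)
print(f"{improvement:.1f}x better TFLOPS per dollar")   # ~4.4x, roughly 4x

Call it 4x, give or take, before the dual-core refit is even counted.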
More on Red Storm at my site, see “Red Storm Rising…” at
http://www.shannonknowshpc.com/stories.php?story=04/03/25/3347373
I’m a bit surprised that Cray is only shooting for 40 TFLOPS with that $90 million Opteron system. A fully decked out Cray X1 system would have a peak performance of 52.4 TFLOPS. I don’t know how much said configuration would cost though…looking forward to reading your article.
There were three architects to the MicroVax II: Dobberpuhl, Supnik and Witek. Which one is your buddy?
None of the above. System Implementer is better terminology. Supnik had finished the design for the VAX on a chip and was running VMS on the thing, but DEC marketing specialists (an oxymoron, that) saw no customer interest in a small VAX. The original plan was to implement the thing on “Aurora,” a system using the VAXBI bus. Aurora was a five-year development project.
BTW, MicroVax II did not evolve from the PDP-11. There was a MicroVax (that was not as good because it was a first model), but other than being the first single-chip VAX CPU and having good performance, it was not all that special.
Never said the MicroVAX II evolved from the PDP-11. Its predecessor, the MicroVAX I, was NOT a single-chip implementation and did not really catch on.
Jesse Lipcon, then in the PDP-11 engineering group, thought that withholding the MicroVAX II until a VAXBI implementation could be fielded was a dumb idea. PDP-11 group manager Mike Gutman agreed, and let Jesse assemble a team of engineers to build a Q-bus-based MicroVAX II. Another “midnight project” not officially sanctioned by the DEC Executive Committee, but a darned successful one. One year after Jesse announced the MicroVAX II in May 1985, the product brought in $800M USD in revenue, and continued to gain momentum as faster, cheaper versions were rolled out. Aurora never got built, which is just as well as the VAXBI bus was a proprietary disaster.
The team was assembled in the summer of ’82, but the chip only came out in ’85. Alpha came out around the same time (as far as I remember).
Alpha was announced on 10 November 1992.
Thus MicroVax II is hardly anything more than a solid implementation of long-ago-proven concepts.
That it was. The single-chip implementation started at 0.9 VUP (VAX Units of Processing, a metric based on a benchmark suite wherein the VAX 11/780 equalled 1 VUP) and ended up at around 39 VUPs as it was reimplemented in CMOS at 2.7 VUPs and subsequently shrunk, tweaked, sped up, etc. The CPU formed the basis of the VAX 6000 “Calypso” midrange product family, which is where it finally topped out at 39 VUPs (code-name-wise, it went from CVAX I in 1986/7 to CVAX II to NVAX to Mariah to Rigel). In one 39-month period, performance jumped from 13 VUPs to 39 VUPs, which was pretty impressive. In fact, less than 6 months after the VAX9000 atrocity (originally designed as the water-cooled Aquarius, then released as the declocked air-cooled Aridus), the VAX on a chip achieved or exceeded the performance of the $10B VAX9000 developed by the High Performance Systems Group under Bob Glorioso and Sultan Zia. A total of 454 VAX9000s were sold, and that number may be optimistic. That product arguably helped precipitate the DECline of Digital.
Have your friend read you an IA-32 manual. It should provide some clues about what is needed to support a large number of CPUs on a regular board.
I learned how to read 48 years ago, so I don’t need Jesse to read me anything.
One unresolved issue with IA-32: x86 with extensions supports 64-bit Linux, Solaris, and at some point Windows. It does not support HP-UX, VMS, or NSK, and whether or not it actually could is unknown, at least to HP. So HP is stuck with Itanium as its 64-bit enterprise computing engine. If feasible, porting the three aforementioned OSes to Opteron/Nocona would take several years, giving the competition a huge window of opportunity.
I believe 40TFLOPS is the published goal for the initial incarnation of Red Storm. But since the box is now being refitted with dual-core Opterons, 40TFLOPS may be conservative. I believe the Phase One system has 10 or 12K CPUs; the architectural design supports ~40K processors. So 100 TFLOPS doesn’t sound unreasonable.
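Rough arithmetic behind that guess, using the same assumed ~4 GFLOPS peak per core as above (all of it speculation on my part):

gflops_per_core = 4.0
phase_one_dual_core = 10_368 * 2 * gflops_per_core / 1000   # ~83 TFLOPS if every socket goes dual-core
design_limit = 40_000 * gflops_per_core / 1000              # ~160 TFLOPS at ~40K single cores

print(f"dual-core Phase One: ~{phase_one_dual_core:.0f} TFLOPS")
print(f"~40K-processor build-out: ~{design_limit:.0f} TFLOPS")

Either path puts 100 TFLOPS within reach, at least on paper.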
Anyone got projections on SGI’s Project Columbia system being built at NASA Ames?
http://www.shannonknowshpc.com/stories.php?story=04/07/29/7717157