“We might be ahead of Apple’s product release cycle, and we’ve probably violated our Mac Pro’s warranty, but we just had to see what the Apple Mac Pro could do when populated with a pair of Intel’s brand-new quad-core Xeon 5355 processors,” Daniel A. Begun reports for CNET. He concludes: “Unless you do work normally relegated to high-end workstations, perform massively multitasking workloads, or just want the bragging rights, eight cores is definitely overkill – at least for now.”
“Unless you do work normally relegated to high-end workstations, perform massively multitasking workloads, or just want the bragging rights, eight cores is definitely overkill – at least for now.”
Definitely bragging rights
Sadly enough, though, in a few years people will be looking at that machine thinking, “On that old piece of junk? Nah, I need the new 64-core to run this software.”
I don’t quite see that yet. Everyone knows Moore’s Law, but only programmers seem to know Amdahl’s Law.
There are only a few kinds of problems that can be parallelized almost indefinitely, like the ones GPUs handle.
Most programs won’t gain anything from 64 processors.
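A quick back-of-the-envelope with Amdahl’s Law shows why (the 90% parallel fraction here is just an illustrative assumption):

    S(N) = \frac{1}{(1-p) + p/N}, \qquad S(64)\big|_{p=0.9} = \frac{1}{0.1 + 0.9/64} \approx 8.8

Even a program that is 90% parallelizable tops out below a 9x speedup on 64 cores; the serial 10% dominates.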
Multi-core is the holy grail of desktop computing. Even at this early stage, it’s easy to see the OBVIOUS advantages.
But what will the future bring? Especially with dedicated FPUs/GPUs slaved to each core, as in Cell? And with all software designed to run on one or many cores?
No longer will Moore’s Law matter as the race for efficiently coupled multi-core processors becomes the dominant market thrust. And reading the article makes my spine tingle!
All I’m hoping is that AMD can come back with a decent competing product. It makes me wonder if AMD didn’t buy ATI to pull off the multi-core, multi-FPU/GPU hat trick.
What about the demand for efficient software? Software at the moment is horribly ill-prepared for massively multi-core processors, and it could take quite some time before your 32-core chip is used even remotely efficiently. Going by current trends, Unix and Linux users could be miles ahead of Windows in this regard simply because of better APIs and threading models.
What? That’s ridiculous. Windows has had a very nice threading model since NT was released in ’93 or so, while Unix has the bolted-on hack known as pthreads. In my experience, Win32’s threading APIs are much nicer.
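For anyone who hasn’t used both, here’s a minimal sketch of spawning and joining a thread each way (two separate translation units assumed; error handling omitted):

    /* Win32 version, using <windows.h> */
    #include <windows.h>

    DWORD WINAPI worker_win(LPVOID arg) {
        /* ... do the work ... */
        return 0;
    }

    void spawn_win(void) {
        HANDLE h = CreateThread(NULL, 0, worker_win, NULL, 0, NULL);
        WaitForSingleObject(h, INFINITE);   /* join */
        CloseHandle(h);
    }

    /* POSIX version, using <pthread.h> */
    #include <pthread.h>

    void *worker_px(void *arg) {
        /* ... do the work ... */
        return NULL;
    }

    void spawn_px(void) {
        pthread_t t;
        pthread_create(&t, NULL, worker_px, NULL);
        pthread_join(t, NULL);              /* join */
    }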
I once saw some sort of benchmark that showed pthreads being faster than Windows’ threading implementation. And anyway, the Linux kernel at least utilizes multiple cores better than XP (or older versions), though I don’t know about Vista. By this I mean that processes are distributed quite evenly across all the available cores, with the ability to control which app goes where. As far as I know, you can even reserve one or more cores for a single app. Though, note that I’m not an expert in these things.
Without knowing anything about that benchmark there’s nothing that can be said about it.
And you can easily control where threads are executed on Windows too, with SetThreadAffinityMask() and SetProcessAffinityMask(). I haven’t done any heavy performance testing on Vista 64, which I’m running right now, but I’m certainly not having any problems. Windows is well prepared for the multi-core future.
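For example, a minimal sketch of pinning the current process and thread (the masks are plain bit fields, one bit per logical CPU):

    #include <windows.h>

    void pin(void) {
        /* allow this process on logical CPUs 0 and 1 (bits 0 and 1 set) */
        SetProcessAffinityMask(GetCurrentProcess(), 0x3);

        /* restrict the calling thread to CPU 0 only */
        SetThreadAffinityMask(GetCurrentThread(), 0x1);
    }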
Just out of curiosity: if an app doesn’t itself request to be run on a certain core or cores, is there any way to make it run on a certain one? Or to limit it and its threads to a certain set of cores?
You can do it in task manager, by launching it from a process that itself is limited to a core, or by tweaking some flags in the PE header.
setting affinity for processes:
– in windows nt since 1993
– in linux [2.5 kernel] since 2003
😉
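For reference, the Linux call is sched_setaffinity(); a minimal sketch pinning the calling process to core 0 (glibc’s CPU_* macros need _GNU_SOURCE):

    #define _GNU_SOURCE
    #include <sched.h>

    void pin(void) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);                         /* core 0 only */
        sched_setaffinity(0, sizeof(set), &set);  /* pid 0 = calling process */
    }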
Got any links to back this up? Just interested in the stuff, thanks.
I ran NT on a Tyan Tiger 230 for a while… It blue-screened frequently, and the processor sharing was anything but equal, with CPU0 running hot and CPU1 barely warm. I ended up pulling the motherboard and replacing it with a single-CPU board. I found Windows SMP to be very fragile, and not “symmetric” at all. That same board ran BeOS fine for months without problems, and the CPUs tracked each other in workload and temperature. BeOS may have had other problems, but CPU load sharing worked a lot better than under NT. What good is threading when the OS doesn’t have the smarts to use it to advantage?
I ran NT on a Tyan Tiger 230 for a while… It blue-screened frequently, and the processor sharing was anything but equal, with CPU0 running hot and CPU1 barely warm. I ended up pulling the motherboard and replacing it with a single-CPU board. I found Windows SMP to be very fragile, and not “symmetric” at all.
Probably a bad board. The Tiger supported PIIIs, right? Those were smokin’!
NT ran great on a dual PPro 200 I had. Sweet.
For some reason, BeOS was generally reputed to be a lot more stable with many SMP boards than NT4/2k (including the Abit VP6). I’m not sure if it was due to BeOS being more tolerant of lousy hardware, or not stressing/fully utilizing the hardware to the same degree as NT. But I do remember a lot of comments on the topic in comp.sys.be.help back in the day.
FurryOne: in the early days, lots of drivers were not SMP-aware (because of the rarity of multi-CPU systems) and did stupid things, causing problems like that.
I didn’t have access to multi-CPU systems back then. The situation is different now.
I still recall seeing benchmarks of Novell NetWare, Windows NT 4.0 and IBM OS/2 Warp Server Advanced which showed that OS/2 did considerably better than both its opponents in file and print performance.
Excerpt from:
http://www.databook.bz/default.nsf/8525608c005e322585255d7c00545af7…
‘Warp Server’s performance is outstanding. An independent test conducted by PC Week Labs shows that OS/2 Warp Server running on a single processor outperforms both Microsoft Windows NT Server and Novell Netware 4.1 running on four-way SMP (symmetric-multiprocessing) equipped servers.
According to the tests, OS/2 Warp Server running on a single processor system had a peak performance of 56M bps (megabits per second), outperforming Windows NT Server by up to 26 percent in file and print services. Windows NT Server running on a four way SMP system performed at a maximum of 44 Mbps, while Netware 4.1 placed last. The testing was conducted with Ziff-Davis Benchmark Operation’s newly released NetBench 4.01, with file and print services running on a 100 Mbps Fast Ethernet network.’
This is not to say Linux threading and SMP capabilities are better, but rather that within this field Windows NT 4.0 Server fared less than great compared to its direct competition.
Your experience is hardly typical. I’ve worked in plenty of trading environments where Win2k and better are deployed as standard on SMP hardware without general issue, and you should consider what the almost-all-of-them majority of multicores are running now. If we want to piss on older versions prior to (say) NT 3.51, then we’d have to look at early LinuxThreads too, so let’s not go there, because it’ll get ugly all round.
Bear in mind also that the results Microsoft achieves in TPM and dynamic web benchmarks are not a fluke, and for extra points revisit the old VolanoMark results.
It is certainly the case that the XP scheduler could do better, but you have over-egged the pudding somewhat.
What good is threading if the OS isn’t very well threaded? Well, time in the OS is time not in my code anyway, and an OS which primarily uses AIO models rather than threaded calls through to the devices can work rather differently in terms of internal threading.
Win32 is a fine SMP environment for JVM and CLR apps. MS’s C runtime heap isn’t well threaded, but that can be circumvented.
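One way to circumvent it (a sketch, not the only approach): give contention-heavy threads their own private Win32 heap via HeapCreate, bypassing the shared CRT heap entirely:

    #include <windows.h>

    void private_heap_example(void) {
        /* a heap this thread keeps to itself; HEAP_NO_SERIALIZE skips
           the internal lock, safe only because nothing is shared */
        HANDLE heap = HeapCreate(HEAP_NO_SERIALIZE, 0, 0);

        void *buf = HeapAlloc(heap, 0, 4096);
        /* ... use buf ... */
        HeapFree(heap, 0, buf);

        HeapDestroy(heap);   /* releases everything at once */
    }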
>By current trends Unix and Linux users could be miles ahead of Windows in this regard simply because of better APIs and threading models
Older programmers with Windows experience often have considerable OS/2 experience, and the OS/2 and Windows models always had threading as a given (and we didn’t have a COW fork()), so we generally have more (or at least longer) experience. pthreads was around in draft form for years and years, but it was only relatively recently standardised, and the system-scope synchronisation primitives are still poorly supported.
The pthread model is lower-level than the Win32 APIs, but it’s hardly better in practice, except for the flexibility that can give in the right hands, and it is sadly lacking when it comes to waiting on a combination of descriptors and synch objects in a unified way.
In practice, both APIs are flawed, but adequate.
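To illustrate the unified-wait point: on Win32 a single call can block on a kernel event and a process handle at once, something pthreads has no single primitive for. A sketch (the two handles here are hypothetical, e.g. from CreateEvent() and CreateProcess()):

    #include <windows.h>

    void wait_either(HANDLE ev, HANDLE child) {
        HANDLE handles[2] = { ev, child };

        /* blocks until EITHER the event is signalled or the child exits */
        DWORD r = WaitForMultipleObjects(2, handles, FALSE, INFINITE);

        if (r == WAIT_OBJECT_0)          { /* event fired */ }
        else if (r == WAIT_OBJECT_0 + 1) { /* process exited */ }
    }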
Would you care to provide some evidence to support your point of view?
No longer will Moore’s Law matter as the race for efficiently coupled multi-core processors becomes the dominant market thrust.
We’re seeing multi-core processors because of Moore’s law.
Moore’s law has nothing to do with clock speed. It has to do with the number of transistors available:
“the number of transistors that can be fit onto a square inch of silicon doubles every 12 months.”
(Source: http://arstechnica.com/articles/paedia/cpu/moore.ars)
Power consumption and physics acted as barriers to clock speed, but physics has not (yet) become a barrier to the number of transistors that can be squeezed onto a silicon wafer.
So if transistors keep getting cheaper (because they keep getting smaller, so you can fit more of them within the same area), what do you do with all those extra transistors?
You either make your processors smaller, allowing you to create more processors at once (making them cheaper to manufacture), or you keep the processor the same physical size but place more functionality onto the processor.
Behold: multi-core processors — placing more functionality into the processor.
It’s all in keeping with Moore’s law.
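To put a rough number on it (using the 12-month doubling figure quoted above, purely as an illustration):

    T(n) = T_0 \cdot 2^{n} \quad (n \text{ in years}), \qquad T(3) = 8\,T_0

Three years at that rate is an 8x transistor budget: enough to turn one core into several and still have room left over.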
It’s the same story as with clock speed, and then some.
Every transistor dissipates heat, and the amount of heat cannot be less than what is needed to reliably distinguish a 0 from a 1 at a given speed.
Put more transistors into the same or a smaller volume and you meet that barrier: more heat, and heat that is much harder to remove.
There are other problems too: insulator layers inside the transistor structure become too thin, causing leakage, additional power dissipation and unreliability.
One more, related to the first: to avoid overheating with high-density packaging, you need to reduce voltage and current, which leaves fewer charge carriers doing the switching. Get down to some tens of electrons and, together with all that noise, including thermally generated noise, the circuitry becomes even more unreliable again.
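The voltage trade-off described above drops out of the standard dynamic-power approximation (a textbook relation, not tied to any particular process):

    P_{dyn} \approx \alpha \, C \, V^{2} \, f

where α is the switching activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. Power scales with the square of the voltage, which is why lowering V is the first lever pulled when packing transistors more densely.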
// “Unless you do work normally relegated to high-end workstations, perform massively multitasking workloads, or just want the bragging rights, eight cores is definitely overkill – at least for now.” //
Maybe. Bragging rights, for sure, at the moment. I think as convergence continues to move forward we’ll need this extra processing power. I think eventually we’ll have one device that’ll do all of our living-room duties (TV, video, DVD, PVR, etc.), personal computing, home automation and perhaps even home security in one.
Eventually we’ll all have two or three television receivers and encoders in one box so we can record multiple TV shows or capture video from our camcorders, and we won’t want to wait for one to be available for use.
// at least for now //
Definitely.
Eventually we’ll all have two or three television receivers and encoders in one box so we can record multiple TV shows or capture video from our camcorders, and we won’t want to wait for one to be available for use.
I think this is incorrect. I think that a far more likely scenario is that everything will be “on demand”. Why should everybody store a copy of every TV show when it can just be piped over the network when needed?
That’s not what I meant. What I mean is that people’s lives tend to get busier and busier (or so the media tells us), and we’ll want to record things like free-to-air and cable television while we’re doing other things. Also, how often have there been programs you wanted to watch on different channels at the same time? For me it’s quite often, especially on cable.
As for on-demand, yes, that’s probably true, but not for many years yet, I feel. Depending on the country, broadband penetration even at the lowest speeds still doesn’t cover millions of people, let alone at speeds high enough to support on-demand streamed television. A case in point would be rural areas, or countries with a low population density per square kilometer.
Keep in mind that in many or most households there are multiple people (children, renters, etc.) who have different views on what they want to watch and capture for later viewing, and any smart multi-receiver system should account for that.
I think eventually, when broadband speed is sufficiently high and widely available, you’ll be correct, but in any event people will still want to record.
It’s rare for me that there’s one show on at a time that I want to watch. I’m not alone.
I cannot wait to upgrade next year. By then I’ll splurge on workstation-grade hardware and run quad quad-cores with as much RAM as the board can handle.
That should keep me from having to upgrade for 10-15 more years…
(My last upgrade was to a 600MHz P3.)
Oh, and before that it was a P133…
I like to jump in my speeds…
I use an IBM PC with a 500MHz Pentium III (Katmai) processor.
Well, as a recent Mac Pro purchaser, I’m just giddy — overkill or no.
What I was looking for here is simply confirmation that, yes, indeed, the machine scales reasonably when you drop in the processors, and from the graphs shown, it does.
I don’t look at this as a Windows vs. Mac debate at all; I mean, they’re pre-production chips. And XP benchmarks are moot now anyway with Vista coming around the corner.
But it’s exciting that someday, when I simply need “mo’ powa”, I might be able to drop these honeys straight onto my MB and get a free “turbo button”.
You can see that when instruction execution is required, all 8 cores function simultaneously; thus OS X is more efficient than Windows XP or Windows Server 2003 at handling multi-core CPUs. I have many dual-core CPUs and none of them reach 100% when I benchmark them, unless I use POV; but with two quad-core chips the story differs, and the OS becomes the bottleneck due to inefficient scheduling and task slicing/feeding.
Great Review!
Now there’s a machine that would run BeOS nicely. There definitely are some exaggerated notions out there regarding BeOS’s abilities, but how efficiently it scaled to multiple processors was definitely *not* a myth. I went from a P3 550 to dual 1GHz P3s. With the single proc it was still pretty responsive, but with two it’s pretty much impossible to make it unresponsive. Even with the 550, it still *felt* significantly faster than my Athlon64-based XP box.
Have you tried Firefox 2.0 while opening 8 or more sites? I am running an ABIT with 2 Celeron CPUs at 550 MHz and the only thing that slows the ENTIRE BeOS environment down is Firefox.
Does a dual 1GHz P3 setup solve that problem? Otherwise, yes, BeOS rocks with multiple CPUs; the more the better.
I want that machine… but even though I can afford it right now, I would rather wait to see how AMD 4x4s perform compared to a single Intel quad-core. I have a feeling that as the number of sockets and cores increases, unless Intel switches to their CSI stuff, it ain’t gonna cut the mustard. 4x4 previews from AMD have already been massively impressive, albeit in controlled environments. I really wish to buy a brand-new rig right now, but I can’t, not until AMD quad-cores are out!
Why do we call it a law? That has always bothered me. If it’s a law, then it has to be true.
Shouldn’t we just say “Moore’s Observation”…