“In an announcement at the International Supercomputing Conference, Intel provided further details on the many-core chip that it hinted at earlier in the month. The first product based on the new design will be codenamed Knight’s Corner, and will debut at 22nm with around 50 x86 cores on the same die. Developer kits, which include a prototype of the design called Knight’s Ferry, have been shipping to select partners for a while now, and will ship more broadly in the second half of the year. When Intel moves to 22nm in 2011, Knight’s Corner will make its official debut.”
Would it have been any better had they named it Knight’s Fury? No, of course not, but it really would have been a better name. It would have made more sense with Knight’s Ferry being the developer version.
All in all, I’m not impressed with Intel so far. Their designs are little more than *take small cores and network them together*, which isn’t the best approach for supercomputing IMO. Cloud computing, perhaps, but not supercomputing.
Supercomputers don’t actually require complicated CPU designs, given that the instructions are fixed; all the CPU is required to do is suck in the data, crunch it, and spit it out the other side. When you use a supercomputer for number crunching you’re pushing in a sequence of equations and pumping out a result at the other end, so you can get away with stripping out branch prediction and so forth, because all you’re really interested in is raw throughput.
Have a simple CPU design, chain together those many cores, parallelise the code to buggery, add a heck of a lot of bandwidth and a clock speed going gangbusters, and you’ll be all set to party.
Instructions are always fixed; it’s called an ISA. And what you are describing seems to be some kind of stream processor. Most supercomputers need to be good at several tasks, with algorithms that can be parallelised successfully to varying degrees. Of course, even if you parallelise it to bits, there is always Amdahl’s law.
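For anyone who hasn’t run the numbers, here is a quick back-of-the-envelope sketch of what Amdahl’s law does to a 50-core part (plain C, purely illustrative):

    #include <stdio.h>

    /* Amdahl's law: p is the parallel fraction of the workload,
     * n is the number of cores. */
    static double amdahl(double p, int n)
    {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void)
    {
        /* Even a 95%-parallel job tops out at 20x, no matter
         * how many cores you throw at it. */
        printf("p=0.95, n=50  -> %.1fx\n", amdahl(0.95, 50));  /* ~14.5x */
        printf("p=0.95, n=500 -> %.1fx\n", amdahl(0.95, 500)); /* ~19.3x */
        return 0;
    }

Fifty cores buy you less than 15x on a 95%-parallel job; the serial 5% eats the rest.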
You can only get away with stripping out branch prediction (and, I presume, other niceties such as out-of-order execution) if you have a well-behaved algorithm, which you almost never have in reality. Of course some (parts of) algorithms run well on GPUs, which is what you seem to be describing here.
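To make that concrete, the sort of “well behaved” algorithm that can do without those niceties looks like SAXPY: no data-dependent branches, perfectly regular memory access. A generic C sketch, not anyone’s actual code:

    /* y = a*x + y over n elements. There is nothing here for a
     * branch predictor or out-of-order engine to win, which is
     * why kernels like this map so well onto GPUs and stream
     * processors. */
    void saxpy(int n, float a, const float *x, float *y)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

Most real programs are not this tidy, which is exactly the point.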
Again, this only works for some algorithms. Communication between processors does not scale that well for most workloads, so you’d rather have fewer high-performance cores than more low-performance cores. Scaling is not very important if your total performance still sucks.
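A toy model shows why piling on cores can even hurt once communication enters the picture (the constants here are made up purely for illustration):

    #include <stdio.h>

    /* Per-core work shrinks as work/n, but coordination cost
     * grows with the core count, so total runtime has a minimum. */
    static double runtime(double work, double comm_per_core, int n)
    {
        return work / n + comm_per_core * n;
    }

    int main(void)
    {
        for (int n = 1; n <= 256; n *= 4)
            printf("%3d cores: %7.2f time units\n",
                   n, runtime(1000.0, 0.5, n));
        return 0;
    }

With these numbers the sweet spot is around 45 cores; at 256 cores you are slower than at 16.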
If you don’t believe me, check out the Top500 supercomputer list. Almost all systems use Xeons or Opterons.
What Intel is building here is interesting. Larrabee was supposed to be a many-core x86 processor with massive vector units. The memory system was cache-coherent, using a massive ring bus. There were serious doubts as to whether it would scale well even for embarrassingly parallel workloads. This MIC might look more like Intel’s other project, the Single-chip Cloud Computer, in which there was no cache coherency; all the cores were connected by a switched network, and you had to use explicit message passing between threads in software, almost like a cluster on a chip.
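If it really is the message-passing design, programming it would feel something like this sketch, with MPI standing in for whatever on-die API Intel would actually ship (a guess at the model, not their interface):

    #include <mpi.h>
    #include <stdio.h>

    /* No shared, coherent memory: each core runs its own rank and
     * data moves only when explicitly sent. Run with at least two
     * ranks, e.g. mpirun -np 2. */
    int main(int argc, char **argv)
    {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("core 1 got %d from core 0\n", value);
        }

        MPI_Finalize();
        return 0;
    }

A cluster on a chip, programmed exactly as you would program a cluster in a room.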
I saw an Intel video about these experimental many-core chip designs, and cloud computing was specifically called out as the target. Their whole goal is to explore ways to further improve space and power efficiency in cloud computing datacenters (e.g. by having 50 cores that consume as much as a single high-end CPU and can be throttled back to a tenth of that at off-peak times).
Doubtful it will be useful in supercomputing. The current 6-core chips tend not to show nearly the performance improvement over 4-core chips that was expected, mostly because the problem isn’t CPU count, or CPU speed. The problem is the memory. Memory bandwidth is the killer, and no one seems to be offering solutions.
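The textbook demonstration of this is a STREAM-style triad; a hedged sketch (generic C with OpenMP, illustrative only):

    /* a[i] = b[i] + scalar*c[i]: roughly three memory accesses per
     * multiply-add. A handful of cores saturates the memory bus,
     * after which extra cores just sit and wait; the kernel is
     * bandwidth-bound, not compute-bound. */
    void triad(long n, double scalar,
               const double *b, const double *c, double *a)
    {
        #pragma omp parallel for
        for (long i = 0; i < n; i++)
            a[i] = b[i] + scalar * c[i];
    }

Run it on 4 cores or 50: past the point where the bus is full, the timings barely move.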
Well… Intel has eDRAM, which is more compact, which is also why Intel chips have such huge caches these days…
I would be curious to see what would happen if cores were capped at 4 and the extra die space were thrown at cache, plus a real integrated GPU design where the GPU is treated more like an FPU instead of just a device hanging off PCI-E.
Aren’t you confusing them with IBM? I’m pretty sure Intel is just really good at making small cheap SRAM.
On top of that, even eDRAM would leave them needing a royal caravan of RAM slots; it would only make a given size of cache cheaper. And eDRAM is still no performance match for SRAM.
However, even with SRAM caches, workloads that crunch on moderately sized data sets that fit into a shared cache could run very fast without driving up RAM bandwidth. If Intel needed to, I’m sure they could do 32+ MB SRAM caches on a die and still make their high margins.
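As a sketch of the kind of workload that wins there, classic cache blocking (generic C; sizes assumed to be multiples of the tile for brevity):

    #define BLOCK 64  /* tile edge sized so three tiles fit in cache */

    /* Blocked matrix multiply; c is assumed zeroed by the caller.
     * Each BLOCK x BLOCK tile is reused many times while it sits
     * in the shared cache, cutting main-memory traffic by roughly
     * a factor of BLOCK versus the naive triple loop. */
    void matmul_blocked(int n, const double *a, const double *b,
                        double *c)
    {
        for (int ii = 0; ii < n; ii += BLOCK)
            for (int kk = 0; kk < n; kk += BLOCK)
                for (int jj = 0; jj < n; jj += BLOCK)
                    for (int i = ii; i < ii + BLOCK; i++)
                        for (int k = kk; k < kk + BLOCK; k++) {
                            double aik = a[i * n + k];
                            for (int j = jj; j < jj + BLOCK; j++)
                                c[i * n + j] += aik * b[k * n + j];
                        }
    }

Fit the working set in a 32 MB cache and the RAM-slot caravan matters a lot less.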
These are more likely to be used as co-processors rather than as a replacement for the node’s primary CPU. It’s similar to what NVIDIA and ClearSpeed already do.
INVENTORS – DO NOT TRUST INTEL
I invented a CPU cooler: three times better than the best, better than water. Intel has major CPU cooling problems (“Intel’s microprocessors were generating so much heat that they were melting”, iht.com). I tried to talk to them; they send my communications to my competitor and will not talk to me.
Winners of major ‘Corporate Social Responsibility’ awards!!!
Huh!!!!
When did RICO get repealed?
INVENTORS – DO NOT TRUST INTEL!!!
BTW, I have the evidence – my competitor gave it to me.
BBTW, I am prepared to apologise to Intel if:
• They can show that the actions were those of a single individual in the company, acting outside corporate policy, and
• They gain redress on my behalf.
Although it played a major role in its facilitation, the power of the internet appears to have come as much of a surprise to Intel as it has to the Catholic Church.
Inventors – help your fellow inventors – share your experiences with companies – good and bad.
At last! Real-time ray tracing!
This “multiple low-powered core” technology is not gonna last on the desktop once people realize that only a few problems scale well across multiple cores.
For virtualization-oriented servers, on the other hand, putting that together with NUMA could do wonders. But like other people here, I think that bus bandwidth issues will kill this product.
You mean only a few software programs scale well across multiple cores. There are many problems that can be decomposed into parallel tasks; you just need to build your software from the ground up to take advantage of a large number of parallel execution units.
There are many things people do on desktop machines that benefit from multicore processors: audio/video encoding, digital photography, rendering, be it of a complex 3D scene or an office/web document. And many new problems can be created to fill the demand for such hardware.
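Take a per-pixel photo filter as the simplest case; a hedged sketch using OpenMP (real applications would use their own thread pools, but the decomposition is the same):

    #include <stdint.h>

    /* Brighten an 8-bit grayscale image. Every pixel is
     * independent, so rows split cleanly across however many
     * cores are available. */
    void brighten(uint8_t *pixels, int width, int height, int delta)
    {
        #pragma omp parallel for
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++) {
                int v = pixels[y * width + x] + delta;
                pixels[y * width + x] =
                    (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
            }
    }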
No one says that all existing software or all existing types of software should scale to multiple cores. It’s more an issue of existing software taking advantage of parallel processing for different kinds of tasks.
Software like Photoshop, 3ds Max, even web browsers (scaling JavaScript and the rendering processes) can be modified to take advantage. Audio software can benefit greatly too (run multiple virtual effects/synthesizers, each on a separate core), and of course video games (physics simulation, rendering, etc.).
So the target is to give more power to existing software, not to ask that it be rewritten…
Why the odd choice of 50 cores, though?
Can’t wait to get our hands on this chip to see how it performs! This is something we want to support in BareMetal OS (http://www.returninfinity.com) for HPC.