“The CPU design firm Venray Technology announced a new product design this week that it claims can deliver enormous performance benefits by combining CPU and DRAM onto a single piece of silicon. We spent some time earlier this fall discussing the new TOMI (Thread Optimized Multiprocessor) with company CTO Russell Fish, but while the idea is interesting, its presentation is marred by crazy conceptualizing and deeply suspect analytics.”
From linked article:
WTF is a “dedicated SQL processor”? Google doesn’t show anything.
My guess would be that it should have been “processes” instead of “processors”… A 10-CPU E7-8870 system would have 100 cores and would be capable of running 200 simultaneous threads. 168 sounds like it might be a good target for the maximum process count on such a system (just a guess, but it sounds plausible).
A company called Kickfire (now owned by Teradata) used FPGAs to do the SQL parsing pseudo-natively. They called those SQL processors. Maybe Oracle does something similar?
It is where you assign specific processor(s) to an SQL instance.
Depending on what the SQL box is doing, it normally isn’t needed. The server mentioned is probably set up in a very particular way.
This is how to do it on Microsoft SQL Server:
http://msdn.microsoft.com/en-us/library/ms187104.aspx
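For anyone who doesn’t want to dig through the docs: the knob is the “affinity mask” server option. A minimal sketch of the documented commands (check the page above for the exact syntax on your version), pinning the instance to CPUs 0-3 (bitmask 15):

EXEC sp_configure 'show advanced options', 1;  -- 'affinity mask' is an advanced option
RECONFIGURE;
EXEC sp_configure 'affinity mask', 15;         -- 0x0F = CPUs 0, 1, 2, 3
RECONFIGURE;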
EDIT: after reading again … I think it should be processors as well
http://www.osnews.com/permalink?504217
But systems-on-chip and microcontrollers already have DRAM, flash, and other stuff inside the same package as the CPU!
Here they just put in a better CPU.
Yeah, but this processor is on the same chip as the DRAM, so it has a very wide bus (4,096 bits) connecting it to the DRAM. However, I guess it’s still subject to the same latencies.
It’s a bit like the Cell’s SPEs, and I guess it will be as hard to program for.
Sounds like the GPU from 10 years ago that never materialised.
Yeah, whatever happened to those GPUs with 12 MB of eDRAM? I read Intel wants to do something similar to get their GPU up to speed.
You mean the Glaze3D from Bitboys Oy?
It’s on this Top 15 Vaporware list:
http://pcworld.about.net/od/technology/The-Top-15-Vaporware-Product…
🙂
yeah, that’s the one!
There’s also the Xbox 360’s GPU, which has eDRAM on die.
This is the SoC version in the current 360.
http://www.tgdaily.com/hardware-features/51228-microsoft-details-ne…
I need more sleep. Clearly the eDRAM is on a separate die.
Xenos has the ROPs on the eDRAM die.
ATI bought them.
It’d be interesting to know if this moves the bottlenecks around.
A big part of a modern CPU goes to legacy instruction decode and instruction/data fetch/prefetch. Removing the whole issue of board traces, DRAM interfacing/serialization, etc., together with a legacy-free instruction set, might hopefully change the rules enough to make something like this viable.
On to the benchmarks!
The 80s happened 30 years ago. Oh, and is it that hard to read the article?
The criticisms are there, but should that really change anything? So today it’s just 22k transistors; maybe next time it’s a few hundred thousand. Nothing wrong with promising technology.
Today an SoC has to have external, independent RAM. That requires either traces on a PCB or an external PoP package. The next evolution in SoCs is to put it all on one die and have a true single-chip solution. These guys are actively pursuing one approach, and I applaud them for it.
Yeah, except that is not what they are doing. At. All.
Seriously, is it that hard to read the article?
Anyhow, this is by no means a new idea. And it has always failed, because the programming models for these sorts of architectures simply aren’t there, or have never proven practical for generalized algorithms.
The big thing is that they are doing the CPU in a DRAM process. So they will probably end up being a patent factory, just like these guys’ previous startup.
Before I even read the article I was thinking about the Forth chips and Chuck Moore.
The last section, though, was pretty scary, but futurologists like Ian Pearson make it sound like pretty lame stuff.
There are DRAMs that are literally 20 or more times faster than regular DRAM: they can start full, almost random accesses every 2.5 ns, not the usual 60 ns of today’s commodity chips.
With Micron RLDRAM, you can sustain certain types of compute processing at up to 400M I/Os per second. It is based on 8 concurrent banks of 8-cycle, 20 ns latency DRAM blocks sharing a split I/O bus structure in a 1 Gbit DRAM process. It has full address and data I/O lines on dedicated pins, like an SRAM, with modern DDR pin speeds. The networking industry uses them; in fact Atiq Raza, the NexGen/AMD architect, used these RLDRAMs in a custom network processor at RMI (now NetLogic).
The question is whether you can build a useful general-purpose computer that can get 5 or so operations for each memory cycle, at 2000M ops/sec.
You can only do this with highly threaded designs, and you have to pay for making the 8 banks look like one address space.
In practice with an FPGA you have to use the slower version at 300 MHz, and the penalty for the single address space is that about 1/3 of the memory bandwidth is lost. So you are left with about 1000M ops/sec, and it takes around 20-40-odd threads, which will need some communication between them and other nodes. Such a processor can be built in FPGAs like the Virtex series, and you can effectively get 40 simple 25 MIPS cores per node, the 40 threads actually spread over 10 or so 4-way cores. A rough sketch of the bank timing follows below.
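To make the bank math concrete, here is a minimal C sketch (my own illustration, not Micron’s or Venray’s design) of why 8 interleaved banks with an 8-cycle (20 ns) busy time can sustain one new access every 2.5 ns cycle, as long as consecutive accesses rotate through the banks:

#include <stdio.h>
#include <stdint.h>

#define NBANKS    8
#define BANK_BUSY 8   /* cycles a bank is tied up per access: 20 ns / 2.5 ns */

static uint64_t free_at[NBANKS];  /* cycle at which each bank is free again */

/* Issue an access to 'addr' at 'cycle'; returns the cycle it actually
   starts, which is later than 'cycle' if the bank is still busy. */
static uint64_t issue(uint64_t cycle, uint32_t addr)
{
    uint32_t bank  = addr % NBANKS;  /* low-order address interleaving */
    uint64_t start = cycle > free_at[bank] ? cycle : free_at[bank];
    free_at[bank]  = start + BANK_BUSY;
    return start;
}

int main(void)
{
    uint64_t cycle = 0, stalls = 0;
    for (uint32_t a = 0; a < 1000000; a++) {
        /* Sequential addresses revisit each bank exactly as it frees,
           so the 20 ns latency is fully hidden: zero stall cycles. */
        uint64_t start = issue(cycle, a);
        stalls += start - cycle;
        cycle   = start + 1;  /* try to start one access per 2.5 ns cycle */
    }
    printf("1000000 accesses, %llu stall cycles\n",
           (unsigned long long)stalls);
    return 0;
}

Change the address to a * NBANKS so every access hits the same bank and throughput drops by 8x; hiding that hazard behind many hardware threads is exactly the Thread Wall trade-off.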
Is that useful to anyone? Probably not to the usual punters, but I wouldn’t mind having one. The big advantage is that there is no effective Memory Wall; you get a big Thread Wall instead. If you can deal with that, then you can also scale the system up many times, with more Thread Wall though.
If you could implement the RLDRAM and the processor on the same chip, the clock rate could go up a few times, and the whole node could be replicated as DRAM capacity allows. The processor could then get a decent FPU as well.
In the MVC analysis of the graphics apps I have written, the more complex Control part needs very few cycles and can happily run on a few MIPS; the Model part is where the cycles usually go. The View part can usually be partitioned quite nicely over dozens of small tiles; it is a question of organizing the graphics into parallel pipelined structures.
Since we already have a Thread Wall with the typical 4-core x86 processors, we might as well go whole hog. I have 2 Intel quad-core PCs, and 99.xx% of the time those spare cores are never used.
Perhaps Venray is thinking along the same lines, dunno.
This is awesome. Really.
* Take someone’s venture capital
* Make some prototypes that aren’t even close to what you want to sell.
* Create some fictional performance numbers and make some completely insane predictions.
= PROFIT!
Or, more likely, go bust and be forgotten in a year or so.
I saw the idea more than a decade ago (1996), when someone proposed a SPARC processor built in a DRAM process. It came out as fairly promising, but AFAIK nothing was ever built. See http://dl.acm.org/citation.cfm?id=232984
Whether the case has gotten better or worse in the meantime, I can’t say.
Also, ARM2 used only 24K transistors, not the 30K claimed in the article, so the TOMI is about the same size.
Nice discussion of the architecture.
An easier-to-read explanation can be found in EDN:
http://www.edn.com/article/520059-The_future_of_computers_Part_1_Mu…
http://www.edn.com/article/520499-Future_of_computers_Part_2_The_Po…
Best Regards,
Russell