I’ve often wondered why computers – be it laptops or desktops – are so relatively monolithic. Wouldn’t it make much more sense to have a whole cluster of very tiny individual computers, each with its own tiny processor, RAM, data storage, and serial ports, which power up when needed and are easily replaced when broken? Well, Liquidware thought so too, and came up with the Illuminato X Machina.
The Illuminato X Machina is a tiny 2×2″ square with a 72MHz ARM processor, a 128kb EEPROM chip for data storage, a 16kb SSD, and LEDs used for output; each of the four sides has a port into which another Illuminato X Machina can be plugged. The tiny machines are smart enough to know when they’re connected to one another, and can establish the correct power and signal wires all by themselves.
The X Machina has software-controlled switches to gate the power moving through the system on the fly, and a ‘jumping gene’ ability, which means executable code can flow directly from one module to another without always involving a PC-based program downloader. Each Illuminato X Machina node also has a custom boot loader that allows it to be programmed and reprogrammed by its neighbors, even as the overall system continues to run, explains Huynh. The X Machina’s creators hope to tie into the ardent Arduino community; many simple Arduino sketches will run on the X Machina with no source code changes, they say.
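For the curious, a ‘simple Arduino sketch’ in this context means something like the classic blink program below. It touches nothing board-specific beyond a pin number (pin 13 is just the usual Arduino default; I don’t know the X Machina’s actual pin mapping), which is presumably why this kind of code could move over unchanged.

```cpp
// Minimal Arduino-style blink sketch: the sort of simple sketch the
// creators say runs unchanged. LED_PIN is the usual Arduino default;
// the X Machina's real pin mapping may differ.
const int LED_PIN = 13;

void setup() {
  pinMode(LED_PIN, OUTPUT);       // drive the status LED
}

void loop() {
  digitalWrite(LED_PIN, HIGH);    // LED on
  delay(500);                     // wait half a second
  digitalWrite(LED_PIN, LOW);     // LED off
  delay(500);
}
```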
You can see the automated reprogramming in action here – it’s amazing.
It’s of course in a relatively early stage, but it’s cool nonetheless.
At first glance, this looks pretty much like the Inmos Transputer concept:
#1 – inexpensive, low power processing unit with a RISC model
#2 – four-way interconnection pathway from/to every unit
#3 – self-propagating code/OS through the array of processors
#4 – highly scalable design (from 1 to 4, 8, 32, ???? units in the array)
It might be worth looking through our old bins of documentation to refresh our general knowledge of the programming techniques involved with the Transputer. There was even a programming language designed specifically for the Transputer – Occam, if I remember correctly!
Yeah, it does look familiar. The complete system sounds very similar to the Meiko Computing Surface machines.
They would have to have a truly distributed OS, or at least come up with some standard method for the micro-OSes to talk to one another and share data. It would be interesting to write a living, fluid OS that spreads bits and pieces of itself to other units and receives bits and pieces in return. I am thinking of data being passed along with some sort of descriptor of what has been done to it and what still needs to be done, allowing a CPU to decide whether it should work on a given piece or pass it along to another unit, etc.
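Roughly, I imagine something like the sketch below (plain C++, every name invented on the spot): each unit of data carries a record of the steps already applied and the steps still pending, and a node either claims the next pending step or forwards the whole package to a neighbour.

```cpp
// Hedged sketch of the "data + descriptor" idea above. All types and
// function names are invented for illustration only.
#include <cstdint>
#include <vector>

enum class Step { Filter, Transform, Summarise };

struct WorkItem {
    std::vector<std::uint8_t> payload;  // the data itself
    std::vector<Step> done;             // steps already applied
    std::vector<Step> pending;          // steps still required
};

// A node claims the item only if it can perform the next pending step;
// otherwise it hands the whole package on to a neighbour.
bool claim_or_forward(WorkItem& item, Step capability) {
    if (item.pending.empty()) return false;                // nothing left to do
    if (item.pending.front() != capability) return false;  // pass it along
    item.done.push_back(item.pending.front());             // record the step
    item.pending.erase(item.pending.begin());
    // ... actually process item.payload here ...
    return true;                                           // we worked on it
}

int main() {
    WorkItem item{{1, 2, 3}, {}, {Step::Filter, Step::Transform}};
    return claim_or_forward(item, Step::Filter) ? 0 : 1;   // this node filters
}
```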
No benchmarking has been done. No workable OS. No real theoretical mathematical framework explaining why it would be better.
Don’t get me wrong, it’s still cool as a hobby, but for heaven’s sake let the thing boot before you claim it’s the second coming of sliced bread.
You have to be blind not to see the potential of this. Just imagine if these devices were smaller and more powerful. Forget dual-core or quad-core CPUs, this is a desktop cluster.
It just goes against decades of integration. It will never compete on a cost or bandwidth basis. On-die interconnect is faster than on-board interconnect, which is faster than off-board interconnect. More dies, more boards, more connectors, more cost.
It reminds me of the ENIAC tubes that you needed to replace all the time…
And clustering is a clunky band-aid “solution” for when you can’t afford SSI. I’ll keep my quad, thank you very much.
I think this is a very arrogant statement, considering that a large share of the problems we currently do calculations on are “embarrassingly parallel”. And by large, I mean well over 75%.
Now, I do recognize that for some things a uniprocessor would be best, but there might be a hybrid solution here. Say, a quad-core CPU and then a hundred of these slow ones behind it, handling the bulk of… well, anything.
Besides, there are OSes out there that mesh hardware systems into an SSI. Or does SSI only count as SSI if it’s done in hardware for you?
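To make the “embarrassingly parallel” point concrete, here is a toy C++ sketch (nothing to do with the actual board; the chunk count and work function are arbitrary) where the work splits into chunks that don’t depend on each other, so a fast front-end CPU could just as well hand them to a hundred slow nodes as to eight local threads.

```cpp
// Toy embarrassingly parallel workload: each chunk is processed
// independently, so it could be farmed out to any number of back-end
// nodes. Chunk count and the per-element work are arbitrary choices.
#include <cstddef>
#include <future>
#include <iostream>
#include <numeric>
#include <vector>

// Pretend per-element work; real workloads would be far heavier.
static long long process_chunk(const std::vector<long long>& chunk) {
    long long acc = 0;
    for (long long v : chunk) acc += v * v;
    return acc;
}

int main() {
    std::vector<long long> data(1'000'000);
    std::iota(data.begin(), data.end(), 0LL);

    const std::size_t kChunks = 8;   // imagine a hundred slow nodes instead
    const std::size_t step = data.size() / kChunks;

    std::vector<std::future<long long>> results;
    for (std::size_t i = 0; i < kChunks; ++i) {
        std::vector<long long> chunk(data.begin() + i * step,
                                     data.begin() + (i + 1) * step);
        // Each chunk is completely independent: dispatch and forget.
        results.push_back(std::async(std::launch::async, process_chunk,
                                     std::move(chunk)));
    }

    long long total = 0;
    for (auto& f : results) total += f.get();
    std::cout << "sum of squares: " << total << "\n";
}
```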
Achieving this idea is really awesome, but I don’t think this has a place in desktop computing.
People would see this on the desktop if:
1. It’s easier, i.e. cheaper, for manufacturers to design
2. It’s easier for programmers to code on
So far, I don’t think that’s possible yet.
Instead, I feel that this might lead to new developments in the concept of distributed supercomputing – probably something like “dedicated distributed supercomputing”.
I already imagine tens of iPods connected together in the same way, calculating the date when the World is going to end.
Seriously though, it looks very cool. It’s a toy at this point, but who knows.
BTW, it reminds me of Problem #13 on this page:
http://technologizer.com/2009/06/14/fifteen-classic-pc-design-mista…
I mean, that could be the pitfall if they tried to make it work on the desktop in a “user friendly” manner.
It doesn’t look to be that different from a regular cluster of computers, just on a smaller scale. Distributed computing is not new, and before you start implementing hardware you need to figure out what software tools and techniques are going to be suitable. At the moment, programmers struggle to make full use of current multicore processors. Programming languages and operating systems need to be redesigned from the bottom up before you can effectively distribute your processing over thousands of cores/processors.
I do feel that starting with tiny distributed processing cells is the wrong way to go. The cost is higher, and the overhead of communication and synchronisation will negate any gains from distributed processing. You only go distributed for large problems that can be broken down into smaller parts.
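As a back-of-envelope illustration of that overhead argument (all numbers below are made up purely to show the shape of the curve): if every node pays a fixed communication/synchronisation cost on top of its share of the compute, adding nodes stops paying off quickly for small problems, while large problems still scale reasonably.

```cpp
// Back-of-envelope model: speedup of a perfectly parallel job when each
// participating node also pays a fixed communication/synchronisation
// cost. All numbers are invented purely for illustration.
#include <cstdio>

int main() {
    const double comm_per_node = 5.0;           // cost of shipping work/results
    const double problems[] = {50.0, 5000.0};   // small vs. large total compute

    for (double compute_total : problems) {
        std::printf("problem of size %.0f:\n", compute_total);
        for (int nodes = 1; nodes <= 64; nodes *= 4) {
            // A single node pays no communication cost in this model.
            double comm = (nodes > 1) ? comm_per_node : 0.0;
            double parallel_time = compute_total / nodes + comm;
            std::printf("  %2d nodes -> speedup %5.1fx\n",
                        nodes, compute_total / parallel_time);
        }
    }
}
```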
Your analysis is correct within the framework you describe in terms of having a large problem size and applying computing power to that problem.
Alternatively, this seems like an ideal platform to run something like Plan 9, where, rather than parts of the problem being distributed, the actual functions of the computing environment are distributed: terminal server, compute server, storage/file server. From that perspective it seems like it might be very functional, even if it wasn’t the fastest computing platform from a design standpoint.
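In rough code terms (this is just my own toy C++ sketch, not anything from Plan 9 or the X Machina), each module would simply advertise a role, and requests would be routed to whichever neighbour provides the service, much like binding a remote file server into a Plan 9 namespace.

```cpp
// Toy sketch of Plan 9-style role separation across modules. Every
// name here (NodeRole, Node, find_node) is invented for illustration.
#include <iostream>
#include <vector>

enum class NodeRole { Terminal, Compute, FileServer };

struct Node {
    int id;
    NodeRole role;
};

// Route a request to the first node advertising the required role.
const Node* find_node(const std::vector<Node>& nodes, NodeRole wanted) {
    for (const auto& n : nodes)
        if (n.role == wanted) return &n;
    return nullptr;
}

int main() {
    std::vector<Node> grid = {{0, NodeRole::Terminal},
                              {1, NodeRole::Compute},
                              {2, NodeRole::FileServer}};
    if (const Node* fs = find_node(grid, NodeRole::FileServer))
        std::cout << "file requests go to node " << fs->id << "\n";
}
```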
Well, I ought to say something, since this is pretty familiar territory given my nick!
This project is clearly very similar to an array of TRAMs, each populated with one T800 Transputer and some memory and I/O. The Transputer, though, made this easy: all the I/O channel interconnect was built in, as was the DRAM controller. Here it has had to be reinvented. If the 1990 T9000 had worked properly, it should have performed at a similar level to this 70MHz ARM. Given another 20 years of development, a modern 2009 Transputer core would probably run at a GHz or more, and we would have had some pretty interesting concurrent systems by now, but that was not to be.
With only four edge connectors per module, the mechanical stability of a large plane of modules is going to be fragile, but for a half dozen modules it should be okay for educational uses.
The choice of the ARM is interesting: it was designed about the same time as the Transputer, but it went the embedded route, where only one processor was usually needed. If the 70MHz ARM were replaced by a much faster clocked processor (ARM, Atom, MIPS, etc.), the PCB engineering with signals of several hundred MHz would be much harder. I didn’t see any mention of DRAM on board?
Every module should include an FPGA for all its interconnect duties, which can be programmed on the fly to handle most common I/O tasks. FPGAs, especially the more expensive ones, have very high performance SerDes links on them that run at near-SATA speeds, at least a few Gbps per chip, but the engineering can be interesting. They also include enough LUT fabric to build a pretty decent processor along with custom hardware to suit. This suggests putting some of the end-user application into ‘soft’ hardware instead of software, if one can master that skill.
In a performance comparison with a Core Duo, it is pretty obvious a 70MHz ARM is a lame duck; not even dozens will compare, and a large array will be mechanically unstable. However, the ARM once powered real workstations, so that says something. A few ARMs should be enough to power a nice, light, BeOS-like OS; there’s no real need for glassy windows and other optical bling. One module could handle most everything except graphics, and the rest could divide up the screen area, although integrating the tiles into the video path would be interesting. Perhaps a GPU should sit on one of the modules.
Occam research still continues at U. Kent and other places, and its successors run on modern processors, so it should be available for this board. Occam is not unlike a hardware description language, where communicating processes model hardware blocks.
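For anyone who hasn’t met the model, the flavour is roughly this (a C++ imitation with std::thread, not real Occam; the Channel class is my own invention): independent processes that share nothing and talk only over blocking channels.

```cpp
// CSP-flavoured sketch in plain C++: two "processes" (threads) that
// communicate only over a blocking channel, loosely analogous to an
// Occam channel. The Channel class is invented for illustration.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

template <typename T>
class Channel {
public:
    void send(T value) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push(std::move(value));
        cv_.notify_one();
    }
    T receive() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        T value = std::move(q_.front());
        q_.pop();
        return value;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};

int main() {
    Channel<int> ch;

    // Producer "process": sends a stream of values down the channel.
    std::thread producer([&ch] {
        for (int i = 0; i < 5; ++i) ch.send(i * i);
        ch.send(-1);  // end-of-stream marker
    });

    // Consumer "process": receives until the marker arrives.
    std::thread consumer([&ch] {
        for (int v = ch.receive(); v != -1; v = ch.receive())
            std::cout << "received " << v << "\n";
    });

    producer.join();
    consumer.join();
}
```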
I stopped working on my Transputer design largely due to lack of interest in this sort of platform; perhaps I was wrong. In order to get some real performance, it was necessary to go for a highly threaded/interleaved processor and memory design, giving something like 40 × 25 MIPS per core with no effective memory latency for any of the 40 threads. A low-end FPGA would have sufficed for the processor core, but the memory interface requires 300-500MHz signals and therefore the faster FPGAs, something that was beyond my lab resources. Still, the Transputer was all about fine-grained concurrency (run a thousand ops, then switch to the next task), so a modern version would be even more threaded to hide memory latencies. The delivery would have been quite similar to this ARM board, more like a credit-card-sized TRAM holding one FPGA (Transputer inside), a bank of RLDRAM chips, and space for some other parts. One module would use spare FPGA resources to hold a VGA/DVI interface, but graphics software would be distributed amongst the modules.
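If the interleaving idea isn’t obvious, here is a toy software model of it (plain C++ with made-up parameters, not my actual design): a barrel of thread contexts issued in strict round-robin, so that as long as the memory latency is shorter than the barrel’s round-trip, a load is always back before its thread’s next turn and no issue slot is ever wasted.

```cpp
// Toy model of a barrel (interleaved) processor: thread contexts issue
// in strict round-robin, one operation per cycle. With memory latency
// below the barrel's round-trip time, no issue slot is ever wasted.
// All names and numbers here are invented for illustration.
#include <array>
#include <cstdio>

struct ThreadCtx {
    int pc = 0;        // pretend program counter / retired op count
    int ready_at = 0;  // cycle at which this context may issue again
};

int main() {
    constexpr int kThreads = 8;     // the real design aimed at 40 contexts
    constexpr int kMemLatency = 7;  // pretend DRAM latency in cycles
    std::array<ThreadCtx, kThreads> ctx{};

    int wasted_slots = 0;
    for (int cycle = 0; cycle < 800; ++cycle) {
        ThreadCtx& t = ctx[cycle % kThreads];  // strict round-robin issue
        if (cycle < t.ready_at) {              // still waiting on memory
            ++wasted_slots;
            continue;
        }
        ++t.pc;                                // "execute" one operation
        if (t.pc % 4 == 0)                     // pretend every 4th op is a load
            t.ready_at = cycle + kMemLatency;  // data returns kMemLatency later
    }

    std::printf("wasted issue slots: %d\n", wasted_slots);  // prints 0 here
    for (int i = 0; i < kThreads; ++i)
        std::printf("thread %d retired %d ops\n", i, ctx[i].pc);
}
```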
Still, I am happy to see anyone doing something similar with hardware again!
Since it is clear I’d like the TRAM farm I mentioned before, but will probably never get to see it, I’ll add that I have long thought an acceptable improvement to motherboards would be to redesign them completely differently while ending up with something that really works the same.
I propose redesigning the motherboard as two separate boards connected by one wide PCIe xN connector, laid back to back. The ATX PSU is replaced by a much simpler fanless 12V/5V PSU, or an external brick if less power is needed.
About 95% of the heat and power in a PC motherboard comes from the processor chip, the VRMs, the DRAMs, the GPU, and the ATX PSU losses. This part of the board is mostly OS-agnostic: the CPU is either Intel or AMD with 1-n cores, and the optional GPU is AMD/ATI, nVidia, or Intel. It carries whatever optional DRAM you want, plus temperature sensors, etc. This module can be replaced as and when a bump in performance is needed, and can be resold. It is also essentially dustless, since no airflow is needed to remove heat!
All the hot chips are mounted directly or socketed on the back side of the PCB and thermally coupled to a common, large heat-spreading plate with heat pipes. Heat dissipation can then be managed in one shot at this heat plate, cooled with one fan or none, and mounted directly to or through the case side. The heat plate (Cu, most likely) is then coupled to a chosen heat-dissipating system, which might be the case side, a large extruded aluminum heat sink, or something more exotic if the heat load is large enough. The heat density has already been lowered and spread evenly over the whole area of the processor board.
Since the top side of the board can be nearly flush, the remainder of the system can sit on a parallel board connected by a single PCIe xN slot. The DIMMs would also be topside, with a heat pipe connecting them to the hot plate.
The remainder of the system uses very little power, but contains the myriad permutations of possible connector features that make OS drivers so much fun. It would hold all the various ports: USB, FireWire, SATA, audio, TV, you name it. Power (12V/5V) comes through the bus, and no heat management is likely needed at all. You replace it when you want new ports.
Since the heat is managed directly at the source, the two boards can be much smaller and denser than usual. The interface board needs to be large enough to hold all the I/O ports plus optional PCI connectors. The processor board likely needs to be about 4″ per side to handle 100W or more.
Since the ATX PSU is replaced by a much simpler 12V/5V box, its heat output is also handled by a heat pipe connecting to the heat-spreading plate.
If Intel is reading this, I’m available!
This is a proof of concept. Now imagine that the CPU used were more powerful, and the onboard EEPROM held more data…
Now imagine the thing could boot an OS capable of stitching many machines together so that they look like one. AFAIK, there aren’t that many OSes out there that can do this, except VMS. However, DragonFly BSD has that on its roadmap; if I understand things correctly, this is what they’re currently working on.
So we have an OS that wants to be able to expand its resources when hardware is added, shrink them when hardware is removed, and whose goal is to present a single front out of many, many computers.
When I saw this modular approach, I immediately thought of DragonFly BSD. I certainly hope these two find each other; they seem like a perfect fit.
To visit DragonFly BSD’s page, go to http://www.dragonflybsd.org
SETI @ Home
CUDA & Tesla
FPGA
Larrabee
Hex-core + Hyperthreading
Not sure where I see this fitting in. Maybe once CPUs stop getting faster every year, this can be an upgrade path.
“I am the Sartre Of Borg. Existence is futile!”
What a great piece of work! I am really impressed. It also reminds me of some Star Trek-ish stuff – the Borg, to be precise.
Me too!!