The Codezero microkernel developers have announced support for quad-core Cortex-A9 processors in their recently released v0.3 kernel. Codezero is an open source L4 microkernel variant written in C that evolves the L4 API for security and virtualization purposes. With this announcement, the Codezero team is exploring whether its microkernel can serve as an open source option for enabling virtual rooms of execution on high-end, multi-core mobile platforms. The Cortex-A9 is the latest flagship product of UK-based ARM plc. With its high performance-to-power ratio, it is seen as the biggest rival to Intel’s Atom line of CPUs in the mobile CPU arena.
Up until now, all ARM processors have been in-order designs. An in-order CPU is much smaller and more power efficient than an out-of-order (OOO) one because it doesn’t require all the extra hardware to track outstanding instructions. But of course, OOO has the advantage of letting work continue while some instructions take a long time to execute.
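To make the in-order vs. OOO trade-off concrete, here is a rough toy model in Python. It is only a sketch under made-up assumptions: a single-issue core, each register written once, and a hypothetical five-instruction program with a 10-cycle load. The function names and latencies are mine, not taken from any real ARM or Intel pipeline.

```python
def in_order_cycles(program):
    """Single-issue in-order core: a stalled instruction blocks everything behind it."""
    ready_at = {}  # register -> cycle at which its value becomes available
    cycle = 0
    for dest, srcs, latency in program:
        # Registers nobody has written are treated as inputs, ready at cycle 0.
        start = max([cycle] + [ready_at.get(s, 0) for s in srcs])
        ready_at[dest] = start + latency
        cycle = start + 1  # the next instruction can issue one cycle later at best
    return max(ready_at.values())


def out_of_order_cycles(program):
    """Single-issue OOO core: each cycle, issue the oldest instruction whose sources are ready."""
    produced = {dest for dest, _, _ in program}
    ready_at = {}
    pending = list(program)
    cycle = 0
    while pending:
        for i, (dest, srcs, latency) in enumerate(pending):
            if all(s not in produced or ready_at.get(s, float("inf")) <= cycle
                   for s in srcs):
                ready_at[dest] = cycle + latency
                del pending[i]
                break
        cycle += 1  # advance time whether or not something issued
    return max(ready_at.values())


# Hypothetical program: (destination, sources, latency in cycles).
PROGRAM = [
    ("r1", ["r0"], 10),  # slow load, e.g. a cache miss
    ("r2", ["r1"], 1),   # depends on the load
    ("r3", ["r0"], 1),   # independent work
    ("r4", ["r3"], 1),
    ("r5", ["r4"], 1),
]

print(in_order_cycles(PROGRAM))      # 14
print(out_of_order_cycles(PROGRAM))  # 11
```

In this toy case the OOO core hides the independent work behind the slow load, which is the whole point. The bookkeeping it needs (the `pending` list and the readiness checks) stands in for the extra tracking hardware mentioned above.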
To meet the thermal design power requirements in Atom, Intel decided to go with an in-order design. It’s like going back to the original Pentium (not the Pro), except with SSE instructions and virtualization. The thorn in Intel’s side is the instruction translation front-end. It’s like instruction decode on steroids, because it translates x86 CISC instructions into RISC-like internal opcodes. With a CPU as large as the Core i7, this isn’t a big deal. But with the Atom, it takes up a rather sizable portion of the total area, and thus a similar proportion of the power.
Ironically, ARM has gone the other way. The Cortex-A9 is their first OOO design. Of course, one advantage is the lack of a translation front-end; instructions are RISC already, and the ARM instruction set is really simple to execute. The other thing they do is VERY aggressive power gating. Functional units that are not being used are actually powered down. This is important because at 45nm and smaller, leakage power (static power just from the circuit being on but not doing anything) is very high. ARM claims that going OOO is an advantage because it “keeps more functional units busier.”
And this brings up a difference between ARM’s power management philosophy and everyone else’s. Most manufacturers optimize to “leak less” by finding ways to directly limit leakage, like power gating (which Core i7 and recent Itaniums do). ARM, on the other hand, has always “leaked less” by “getting everything done faster and then switching off completely.” Now they’re combining philosophies.
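A back-of-the-envelope sketch of the “finish fast, then switch off” arithmetic, in Python. All the wattages and times here are invented for illustration, and `energy_joules` is a hypothetical helper, not anything from ARM or Intel documentation. The assumption is that gating the core while idle removes its leakage entirely, which is optimistic.

```python
def energy_joules(dynamic_w, leakage_w, busy_s, period_s, gated_when_idle):
    """Energy over one period: switching plus leakage while busy, leakage only while idle."""
    idle_s = period_s - busy_s
    # Assumption: a power-gated idle core leaks nothing at all.
    idle_leak_w = 0.0 if gated_when_idle else leakage_w
    return (dynamic_w + leakage_w) * busy_s + idle_leak_w * idle_s


# Same amount of work (0.2 J of switching energy) done in a 1 s window.
slow_always_on = energy_joules(0.2, 0.1, 1.0, 1.0, gated_when_idle=False)
fast_then_gated = energy_joules(0.5, 0.1, 0.4, 1.0, gated_when_idle=True)
fast_not_gated = energy_joules(0.5, 0.1, 0.4, 1.0, gated_when_idle=False)

print(round(slow_always_on, 2))   # 0.3
print(round(fast_then_gated, 2))  # 0.24
print(round(fast_not_gated, 2))   # 0.3
```

With these made-up numbers, finishing early only pays off if the core really is switched off afterwards. Without gating, the faster core leaks for the whole window and ends up where the slow one did.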
ARM is always doing some really interesting cutting-edge stuff that the likes of Intel seem to ignore. For instance, you normally have a “voltage safety margin”: you run the supply voltage higher than strictly needed, so the transistors are faster than the clock rate requires, and the inevitable but rare dips in supply voltage don’t cause timing-related errors. Just for fun, go read about “Razor flip-flops.” They let you detect and correct timing errors, so that margin can be trimmed. ARM actually does this kind of stuff, which otherwise just sits around in the academic literature.
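For anyone who doesn’t want to dig through the paper, here is a very simplified behavioural sketch of the Razor idea in Python. This is my own toy model, not a circuit description. It assumes a main flip-flop that samples at the clock edge and a shadow latch that samples a bit later, with a mismatch between the two flagged as a timing error. The parameter names are hypothetical.

```python
def razor_sample(old_value, new_value, arrival_ns, period_ns, shadow_delay_ns):
    """Return (captured_value, error_flag) for one clock edge.

    The main flop sees the new value only if it arrived within the clock
    period; the shadow latch gets an extra shadow_delay_ns to catch it.
    """
    main = new_value if arrival_ns <= period_ns else old_value
    shadow = new_value if arrival_ns <= period_ns + shadow_delay_ns else old_value
    error = main != shadow
    # On a mismatch the shadow value is taken as correct and restored.
    return (shadow if error else main), error


print(razor_sample(0, 1, arrival_ns=0.9, period_ns=1.0, shadow_delay_ns=0.5))  # (1, False)
print(razor_sample(0, 1, arrival_ns=1.2, period_ns=1.0, shadow_delay_ns=0.5))  # (1, True)
print(razor_sample(0, 1, arrival_ns=1.6, period_ns=1.0, shadow_delay_ns=0.5))  # (0, False)
```

The last call shows the limit of the scheme in this toy model: a signal that misses even the shadow window is captured wrong without any error flag, so the voltage can only be lowered as far as the shadow delay can cover.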
Thanks for taking the time to post that. It was a good read.
I also had a good time with that paper on Razor flip-flop design (first Google link) with the Alpha prototype from 2003. It strikes me that this is the sort of thing you would do when you have already tried or studied self-timed logic (see Amulet, another ARM project from S. Furber) and didn’t want to go that way, since letting the error correction rate determine the voltage is so much closer to standard synchronous design.
If I were to use this Razor technique myself, I think I might just use localized charge pumps to nudge up the voltage on critical node blocks and use the error correction to pump further as need be. I am pretty uncomfortable with fiddling with the entire power plane, as the VRM response is too slow, but slow fiddling with local critical path supplies would seem safer to me. One can use static timing analysis to design this bumpy power supply so that most of the logic runs with a supply just high enough. I don’t do circuit design anymore, but it’s interesting to read about and come up with other ideas. Not sure that many project managers would want to go along for the ride though!
Also, Intel has changed its circuit design strategy with the Core i7, which uses 8T RAM cells and has extensive power gating ability. I wonder if those techniques will end up in use in the Atom too.
+1 for post
The leakage differences between an ARM core and an i7 have nothing to do with “philosophy” (always a big red flag when one tries to use qualitative concepts to justify quantitative assumptions) but rather with the different power densities and feature sizes, on which leakage power directly depends.
Were an ARM core scaled up to meet the same performance target as a Core i7 part, chances are it would display a similar power/thermal/leakage envelope to the Intel part.
Also, ARM designs are actually more conservative than their Intel counterparts, since ARM does not control the fab and doesn’t provide “hard” cores for the most part. They are simply targeted at different application envelopes. ARM has the advantage that it does not need to support the same amount of compatibility and cruft as Intel does with its Atom parts.
But it is GPLv3, with copyright assignment required for contributions. Your mileage may vary, but to me this is closed rather than open.
What’s wrong with GPLv3?
EDIT: Dammit, I’m so used to the first edit field on a website being the one for the username, and now I can’t change it ^^’
Edited 2010-03-26 07:44 UTC
For a project, ARM has kindly allowed me remote access to a prototype quad-core A9 CPU to test and profile NEON code. While I cannot disclose any benchmarks right now, I can definitely say that this CPU will take the world by storm. As a SIMD design, NEON is better than both SSE and AltiVec: it’s excellent, versatile, and it’s FAST. Plus, ARM is backing it very strongly (much more than IBM/Freescale backed AltiVec). Aside from NEON, a quad-core A9 has an extremely powerful VFP unit (forget the A8 and the slow FP performance of previous ARM chips), and it consumes very, very little power.
What can I say, I’d definitely buy a laptop or even a server based on that chip. I honestly think that no matter what Intel does, they have to support so much cruft that they can’t beat ARM on power/performance ratio, and soon they’ll have a competitor on pure performance too.