AnandTech on Apple’s A7 processor:
I suspect Apple has more tricks up its sleeve than that however. Swift and Cyclone were two tocks in a row by Intel’s definition, a third in 3 years would be unusual but not impossible (Intel sort of committed to doing the same with Saltwell/Silvermont/Airmont in 2012 – 2014).
Looking at Cyclone makes one thing very clear: the rest of the players in the ultra mobile CPU space didn’t aim high enough. I wonder what happens next round.
This is one area where Apple really took everyone by surprise recently. When people talk about Apple losing its taste for disruption, they usually disregard the things they do not understand – such as hardcore processor design.
Isn’t it obvious? The next round will finally bring us:
iOS-X!
Code name will be iUnion or perhaps Singularity, or some such thing. Maybe they’ll call it oSz, eschewing the obviously problematic “y” (why) in a product name.
Personally, I’d love to see a MacBook with a touch screen, one-touch authentication, and access to Siri à la the 5S, but with a real keyboard in a clamshell form factor… I won’t hold my breath, though.
Unless they can make it shiny or paint it gold, most of their users are not likely to care.
The sad thing is that you probably actually think that.
There’s truth in it. The fact that Apple didn’t increase the amount of RAM is actually a problem – people are experiencing app crashes at a significantly increased rate, and this is almost exclusively due to out-of-memory errors.
But, hey, 64-bit is a heckuva selling point if you don’t look beyond it.
All this negativity towards the 64-bit processor reminds me of the same sort of thing happening 22 years ago when DEC came out with the Alpha series of CPUs. Sure, there were 64-bit CPUs available, but some of the competitors took to deriding 64 bits while they frantically cobbled together their own 64-bit designs.
Nowadays pretty well every x86 CPU sold into the consumer market is 64-bit.
Those who keep repeating the mantra ‘Apple doesn’t innovate any longer’ really should take the blinkers off. In my mind, the A7 is a big step forward. It answers many of the questions asked about 64-bit CPUs in a mobile device. It isn’t perfect, but nothing really is.
Well, 64-bit does come at the cost of increased memory consumption. Unless it is packaged together with instruction-set improvements, as x64 and AArch64 are, it will always be slower than 32-bit. So the only reason iOS sees ANY benefit from 64-bit is not because of the 64 bits themselves, but despite them.
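To make the memory-consumption point concrete, here is a minimal C sketch (the struct is made up purely for illustration) of how a pointer-heavy structure roughly doubles in size when pointers grow from 4 to 8 bytes:

#include <stdio.h>

/* A pointer-heavy node, typical of the linked structures apps allocate.
 * Under a 32-bit ABI each pointer is 4 bytes; under LP64 it is 8 bytes,
 * so the same structure roughly doubles in size. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
    void             *payload;
    int               refcount;
};

int main(void) {
    /* Typically prints 16 on a 32-bit ABI and 32 on a 64-bit ABI
     * (three pointers plus an int, padded to pointer alignment). */
    printf("sizeof(struct list_node) = %zu\n", sizeof(struct list_node));
    return 0;
}

Object graphs are mostly pointers, which is why the same app tends to use noticeably more RAM when rebuilt for 64-bit, and why the unchanged 1GB in the 5s makes the crashes mentioned above more likely.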
Note that even though the transition from a 32-bit to a 64-bit address space was quite boring on x86 and ARM, it doesn’t *have* to be this way: the Mill project uses the 64-bit address space to provide a single unified address space (still with memory protection), which allows for a more efficient memory subsystem ( http://millcomputing.com/docs/memory/ ). And not being restricted to a tiny TLB is very important (think about the advantages for multiple processes).
I also think it would be great if CPUs used 64-bit pointers to allow an efficient tag field in the higher bits (which has benefits over using the lower bits, since it stays compatible with packed/unaligned data) for tagging pointers (efficient GCs), tagging integers (efficient big-int implementations), etc.
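A minimal sketch of the idea in C, assuming a 64-bit target where the OS only hands out 48-bit virtual addresses (true of typical AArch64 and x86-64 setups today); the helper names are mine:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TAG_SHIFT 56  /* keep an 8-bit tag in the top byte of the pointer */

/* Store a small type tag in the otherwise-unused high bits. The tag must
 * be stripped before dereferencing (unless the hardware ignores it). */
static inline void *tag_ptr(void *p, uint8_t tag) {
    return (void *)(((uintptr_t)p & ((1ULL << TAG_SHIFT) - 1)) |
                    ((uintptr_t)tag << TAG_SHIFT));
}

static inline uint8_t ptr_tag(void *p) {
    return (uint8_t)((uintptr_t)p >> TAG_SHIFT);
}

static inline void *untag_ptr(void *p) {
    return (void *)((uintptr_t)p & ((1ULL << TAG_SHIFT) - 1));
}

int main(void) {
    int value = 42;
    void *tagged = tag_ptr(&value, 0x3);       /* e.g. tag 0x3 = "boxed int" */
    assert(ptr_tag(tagged) == 0x3);
    printf("%d\n", *(int *)untag_ptr(tagged)); /* prints 42 */
    return 0;
}

ARMv8 even has a top-byte-ignore feature that lets loads and stores skip the masking step for the top 8 bits, which makes this style of tagging cheaper than the low-bit tricks most GCs use today.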
The A7 is AArch64…
OK, the MIPS R4000 (a 64-bit single-chip microprocessor) based SGI Indigo was publicly available several months before I had to sign an NDA to join a 3-day workshop on DEC Alpha internals. That workshop was held in November 1991.
pica
Alpha is perhaps not a very good example, since it got little traction in the marketplace (overall), and its associated development costs were one of the principal issues that eventually led to DEC’s demise.
Luckily for Apple, they mainly “innovate” with regard to marketing, which in the end is the department responsible for bringing home the bacon…
Yes, it was a marketing disaster.
* the VAX arch was EOL
* the Alpha was not ready
* the MIPS R3000 was marketed by Digital as the VAX successor
Yes, the MIPS R3000 was marketed by Digital as the VAX successor, and less than a year later Digital presented Alpha-based systems. That was too much change for its customers. All the trust credit had been burned.
It was not cost, it was not the switch from the most complex CISC to the riskiest RISC; it was simply too much change.
pica
Actually, it was costs that sank DEC. Right before the acquisition, for each dollar of revenue DEC carried over 300% more overhead than Compaq did.
Even after Compaq took over, each generation of Alpha was getting more and more expensive to design and manufacture, while its market share never grew fast enough to keep up with the rising production costs.
I don’t think many people are acquainted with the economic realities of semiconductor/processor design. The tech sector is a business at the end of the day.
tylerdurden,
Very true. A “good enough” cheap commodity architecture will usually win out over an even better architecture that lacks economies of scale. That’s x86 in a nutshell: not the best processor design, but good enough.
If it weren’t for the performance arms race x86 processors found themselves in on the desktop side, I suspect x86 would have been “good enough” for mobile platforms too. However, the notorious power inefficiencies opened up a large window for ARM to take hold of the mobile market, which to me is a good thing. I’d even like to see some desktop computers running on ARM processors, but that’s a whole other barrier, since most commercial software is tethered to “wintel”.
It’s somewhat ironic: x86 started out as an embedded CPU, ARM as a desktop CPU, and they kinda switched their main roles along the way…
That is completely right, but it is a secondary effect. With the massive customer-base loss DEC suffered, its revenue also declined massively.
pica
I wonder if the move to 64-bit was more about the wider design than other design considerations. ARMv7 already had 16 general-purpose registers. While ARMv8 nearly doubles that to 31, 16 is already plenty, and all else being equal, the extra registers would have a minimal impact on performance (unlike the move from i686 to AMD64, which doubled the GPR count from 8 to 16; i686 was horribly starved for registers). The iPhone 5s doesn’t add any memory, and 32-bit integer math is rarely a limitation for the type of stuff run on a phone.
However, all else is not equal: The A7 can issue about twice as many instructions as the A6 – the extra registers would be a boon for enabling the extra ILP, and that seems to be where all the A7’s performance enhancements come from.
I always wondered why a 32-bit variant of x64 wasn’t released with the same register advances but without the 64-bit capability. Perhaps that was benevolent forward thinking in the case of x86, although it’s hard to see a use case for it in mobile. Ultimately, I don’t think that aspect really matters one way or another.
It’s kind of sad that 64-bit is seen as an Apple innovation, though; that’s just ARM’s latest standard design. The innovation in this CPU has to do with other details besides that aspect.
that’s just ARM’s latest standard design
The ARM architecture is a description of registers, instructions, and the memory model. You can download the architecture manual freely from ARM’s site.
The 32-bit ARM architecture was created by Roger Wilson back in the 80s.
What Apple did in the A7 is totally amazing. Even X-Gene (the first 64-bit ARM processor, on paper) is not ambitious enough. The A7 is wider than the latest Intel processors and has comparable internal resources, like a 192-entry ROB and massive buffers; memory bandwidth is also good (the weak point of most ARM application processors).
That is called x32 (or x86 ILP32).
I think he’s talking about a 32-bit processor with the other AMD64 features.
But the lack of interest in Linux’s x32 ABI shows why it would’ve been a waste of resources to design such a chip.
Well, the instruction format would have to be changed to accommodate extra registers – the x86 instruction format uses 3 bits to encode either a source or a destination register, which isn’t enough to select from any additional registers.
To make a 32-bit chip with the other architectural enhancements of 64-bit, you’d still have to enter a different processor mode – say, “x86+” – to execute software that takes advantage of the extra registers and the flat x87/SSE register file (x87 uses a stack), and executing older 32-bit code would still require a mode change.
IIRC, adding AMD64 capability to the Pentium 4 (well, technically Intel 64, the purposely slightly incompatible knock-off) only increased the die area by ~5% anyway, and by the time x86 was being dropped into ultra-mobile designs, the 4GB address limitation was already looming close. An x86+ design would probably have had only one generation of use…
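To make the encoding constraint above concrete, here is a rough C sketch of how the ModRM byte’s 3-bit register fields cap plain x86 at 8 GPRs, and how the AMD64 REX prefix supplies the extra bit per field (the struct and helper names are mine):

#include <stdint.h>
#include <stdio.h>

/* The x86 ModRM byte packs two register selectors into 3 bits each:
 *   mod (2 bits) | reg (3 bits) | rm (3 bits)
 * Three bits can only name 8 registers. AMD64 adds a REX prefix whose
 * R and B bits extend reg and rm to 4 bits each, giving 16 registers,
 * which is why the extra registers needed a new mode. */
typedef struct {
    uint8_t mod;  /* addressing mode */
    uint8_t reg;  /* register operand, 0-7 (0-15 with REX.R) */
    uint8_t rm;   /* register/memory operand, 0-7 (0-15 with REX.B) */
} modrm_fields;

static modrm_fields decode_modrm(uint8_t modrm, uint8_t rex) {
    modrm_fields f;
    f.mod = (uint8_t)(modrm >> 6);
    f.reg = (uint8_t)((((rex >> 2) & 1) << 3) | ((modrm >> 3) & 7)); /* REX.R extends reg */
    f.rm  = (uint8_t)(((rex & 1) << 3) | (modrm & 7));               /* REX.B extends rm  */
    return f;
}

int main(void) {
    /* The bytes 4C 89 C8 encode "mov rax, r9": REX = 0x4C, ModRM = 0xC8. */
    modrm_fields f = decode_modrm(0xC8, 0x4C & 0x0F);
    printf("mod=%d reg=%d rm=%d\n", f.mod, f.reg, f.rm); /* mod=3 reg=9 (r9) rm=0 (rax) */
    return 0;
}

That is the kind of thing the comment above means by needing a new mode: the extra registers simply don’t fit in the legacy encoding without a prefix, and the prefix bytes chosen (0x40-0x4F) were only freed up by dropping the old one-byte INC/DEC instructions in 64-bit mode.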
Could anyone explain why Apple went with a design that has a much longer pipeline? I mean, standard ARMv8 Cortex designs have 8 pipeline stages and the Apple A7 has 14. Wouldn’t that mean that when a mispredicted branch flushes the pipeline, it takes longer to refill, causing performance decreases?
They can mitigate this by improving the branch prediction.
Or perhaps they felt that most of the performance-sensitive code run on the A7 isn’t branch intensive and is well suited to a deeper pipeline.
If you mean the Cortex-A53, that is a low-power, lower-performance design.
The Cortex-A57 is the high-performance one, and it has a 15+ stage pipeline.
judgen,
This enters very “opinionated” territory.
There’s obviously a trade-off between lengthening the pipeline to increase parallelism and increasing the risk and cost of branch misprediction. The article suggests this architecture increases the misprediction penalty by 0-19% and says nothing about the misprediction frequency (which depends a lot on the software in use). The idea is for these negatives to be offset by the additional parallelism.
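As a rough, made-up illustration (not from the article) of why the misprediction frequency matters as much as the penalty: the branchy loop below pays a pipeline flush on every wrong guess, while the branchless variant trades the branch for arithmetic the compiler can usually lower to a conditional select (CSEL on ARMv8):

#include <stddef.h>

/* Branchy sum: one data-dependent conditional branch per element. On
 * random input the predictor is wrong roughly half the time, and every
 * miss costs a full pipeline refill (the deeper the pipeline, the
 * larger that cost). */
long sum_above_branchy(const int *v, size_t n, int threshold) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold)
            sum += v[i];
    }
    return sum;
}

/* Branchless sum: the comparison becomes a 0/1 value, so there is no
 * data-dependent branch left to mispredict. */
long sum_above_branchless(const int *v, size_t n, int threshold) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += (long)v[i] * (v[i] > threshold);
    }
    return sum;
}

On random data the branchy version mispredicts about half the time; on sorted or highly predictable data the predictor hides nearly all of the cost, which is exactly why the frequency depends so much on the software being run.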
I think the compiler could probably do a better job of scheduling execution units, even beyond the reach of the CPU’s pipeline and with fewer mispredictions, since the CPU is forced to do it on the fly while the compiler is far less constrained and can do a more comprehensive analysis. The transistor savings from removing this complexity would translate into either less electricity or more parallel execution units, depending on how you want to look at it. Either way it’s a win! However, a pretty big problem with this is the way we distribute software in practice: generically precompiled and expected to run unmodified on different versions of a CPU. It would leave very little room for future CPUs to add execution units and for existing code to take advantage of them, since the scheduling would be specific to a CPU model. Having competing CPUs would be problematic too, since code would be optimized for one or the other, but not both at the same time.
One way to get around this problem is to distribute all software in an intermediate form and always compile it exactly for the target machine’s execution units, using exactly the right schedule. But for better or worse, CPUs evolved into the long pipelines we have now.
Translation… steal other people’s IP, you mean.
http://www.patentlyapple.com/patently-apple/2014/02/apples-a7-proce…
Besides the fact that Apple steals your job value (hiring and salary cartel activities), they also got REALLY desperate.
They stole the IP for some researchers’ code at UW-Madison, it would appear.
It might actually drag an ARMv8 up to the per-clock speed of a Pentium II.
Now, I’m not saying a current 1 GHz ARMv8 is roughly equal to a 450 MHz PII. Oh wait, no, that’s exactly what I’m saying.
NOTE: I’m not saying that’s entirely bad, since ARM is about performance per watt, not per clock…
Though that begs the question, what do these types of changes do to power consumption?
Actually, at 1.3 GHz the A7 performs similarly to a Core 2 Duo at 2 GHz.
http://en.wikipedia.org/wiki/Begging_the_question#Modern_usage