CPUs in Apple Silicon chips are different, as they contain two different core types, one designed for high performance (Performance, P or Firestorm cores), the other for energy efficiency (Efficiency, E or Icestorm cores). For these to work well, threads need to be allocated by core type, a task which can be left to apps and processes, as it is in Asahi Linux, or managed by the operating system, as it is in macOS. This article explains how macOS manages core allocation in all Apple’s M1 series chips, in what it terms asymmetric multiprocessing (AMP, although others prefer to call this heterogeneous computing).
This design has now also made its way to x86 with Intel’s 12th Gen processors.
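On the macOS side, the steering is driven by the Quality of Service (QoS) class that a thread or dispatch queue carries, rather than by apps picking cores directly. Here is a minimal Swift sketch of what that looks like to a developer (the two function names are invented for illustration, and the exact placement policy is Apple’s, not the snippet’s):

```swift
import Dispatch
import Foundation

// Hypothetical stand-ins for real work.
func compressArchive() { /* long-running housekeeping */ }
func renderPreview() { /* latency-sensitive work */ }

// Background QoS: on Apple Silicon, macOS keeps this work on the
// E (Efficiency) cores, even when the P cores are sitting idle.
DispatchQueue.global(qos: .background).async {
    compressArchive()
}

// Higher QoS classes are eligible for the P (Performance) cores.
DispatchQueue.global(qos: .userInitiated).async {
    renderPreview()
}

// Keep the snippet alive long enough for the async work to run.
Thread.sleep(forTimeInterval: 1)
```

The point is that the app expresses intent (“background housekeeping” versus “the user is waiting on this”) and the scheduler, not the app, decides which core type services it.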
Though I understand no-one is actively claiming Apple designed the asymmetric core concept, it’s worth reiterating that ARM CPUs with “Performance” and “Efficiency” cores on one die are not a new thing. Far from it. As such, the technology required to control these cores has been in the Linux kernel for a long time, far longer than M1 Macs have been available for purchase.
https://en.wikipedia.org/wiki/ARM_big.LITTLE
Indeed, I own a 2012 Nexus 7 with a Tegra 3 chip.
https://en.wikipedia.org/wiki/Tegra#Tegra_3
https://www.notebookcheck.net/NVIDIA-Tegra-3-SoC.72804.0.html
The Tegra 3 was not a big.LITTLE design, though.
It has the same “spirit”: a “companion” core (built on power-efficient transistors, capped at 500 MHz) handles the lightweight tasks, while the other four cores can scale from one to four in parallel to do the heavy lifting.
True. The Tegra, if anything, was a slightly more “elegant” solution, since it was a homogeneous model: the companion core was the same Cortex-A9 design as the main cores, just built for low power.
The123king,
I was using these in SBC boards a decade ago. In my case I didn’t really need the efficiency cores; they were there, but I needed horsepower more. The whole idea is obviously that instead of having a CPU optimized only for performance or only for efficiency, you can have cores optimized for each and then use the most appropriate core type for a task. IMHO this is less useful for high-performance SMP use cases and more useful for a device that alternates between demanding foreground loads and light background loads. Oftentimes mobile devices fit this profile.
What’s actually shocking here is not Apple using what is basically big.LITTLE, which by now has existed on ARM for a decade. That’s even expected, and very logical.
It’s that both AMD and Intel waited until Apple did so to implement a variation of decade-old tech with OBVIOUS advantages for portables (like laptops, where Intel is big). Too much stagnation, isn’t it?
CapEnt,
You are right, this is why competition is so important. Even if you’re a fan of one company’s technology, you should welcome the competition. The lack of competition makes things stagnant. Obviously we’re talking about CPUs here but this really applies to all markets. The trend towards consolidation we’ve been seeing all around us for decades is bad.
The CPU alone means nothing if the software isn’t able to make use of it. Apple is in control of both worlds, so I am not surprised that they were the first to make it actually work in the real world and not only on paper. Microsoft was too slow to deliver the software in the mobile world (with Nokia), and Intel and AMD have been unable to produce the hardware for the desktop world.
Nobody really believed in ARM on the desktop until Apple announced that they would make it work.
sj87,
“Nobody really believed in ARM on the desktop until Apple announced that they would make it work.” (my emphasis)
Lots of companies, like Fujitsu and LG, have been incorporating big.LITTLE in their devices for a long time now.
https://www.linaro.org/news/fujitsu-semiconductor-joins-linaro/
Even Apple themselves have been using heterogeneous ARM CPUs at least as far back as the A11…
https://en.wikipedia.org/wiki/Apple_A11
I think the main reason Intel is only introducing it now is that, prior to the M1, ARM CPUs weren’t very competitive in x86’s markets. This is why competition is so important!
Actually, there were many of us who did believe in ARM on the desktop. I think we even talked about it here on OSNews, but the products honestly weren’t that compelling (mostly low-end Chromebooks). Before Apple, manufacturers weren’t pushing high-end ARM systems. I’m glad that the M1 finally breaks that mold. The problem with Apple products (for me anyway) is that they refuse to sell modular hardware to consumers. I’d still like to see a manufacturer sell a powerful, modular ARM desktop with PCIe lanes, eGPUs, DDR, M.2 flash, etc. that I can upgrade and repair on my own terms, without the manufacturer insisting on selling me a whole new system each time…ugh!
Of course I’m aware of mobile phone CPUs and how they work. My key point was that nobody in power seemed to believe in the same technology being able to scale up for “serious” use on the desktop. And sure, the ARM chips prior to Apple were actually really slow in comparison.
That’s why there was this constant debate about whether or not ARM is suitable for the desktop, due to its “constricted” performance capabilities. The minimum requirement was thought to be, at the very least, carefully optimized software, as the ARM architecture is quite different from x86.
Apple pretty much proved all of this to be nonsense by beating the x86 CPUs, sometimes even when running software in compatibility/emulation mode.
sj87,
It wasn’t a case of ARM being unsuitable per se; it was more a case of more resources getting thrown at x86, and x86 having better fabs.
Of course, but compilers have been optimizing for specific architectures (and even micro-architectures) for decades now; it isn’t really new.
The thing is, even in the past, many of us did predict that ARM CPUs would improve to the point of being able to compete with x86 even for high-end applications. We didn’t know Apple would be the one to do it, but the fact that ARM became competitive on performance in general wasn’t very unexpected.
@sj87
I think the only people surprised about Apple’s performance are the people still obsessed with the ISA wars of the 80s/90s who still haven’t gotten the memo that ISA and uarch have been decoupled for decades now.
What makes the M1 “special” is not the ISA but its uarch and, of all things, its disruptive 2.5D power delivery network. Apple was able to make a huge, fat core that they run at the optimal speed for the process. And they were able to give it tons of out-of-order resources to keep all the functional units busy. They also have an “extra” die with the PDN under the “main” SoC die, which is something that neither Intel nor AMD can do right now.
It’s fascinating how Apple has a huge, fat core that is extremely efficient, whereas AMD and Intel have these smaller cores that they have to run at high frequencies, burning much more power, in order to retire the same number of instructions. I say it’s fascinating because, technically, Apple is pitting the “mobile” core against AMD’s and Intel’s desktop/server cores.
The M1 could be running any random ISA: x86, PPC, RISC-V, or ARM… and it would still slaughter the x86 competition in IPC.
As you pointed out, Apple controlling the whole HW/SW stack made the product possible and successful. It’s been a rather impressive transition from x86, especially in the laptop space. It’s also funny how so many people just can’t wrap their heads around the shift that is happening, with the SoC taking over the discrete PC architecture.
It’s not a fair comparison, though. Apple and Intel are making the technology work in the same market segments (laptop/desktop) right around the same time.
The article does point out the weakness of heterogeneous computing: processes may be assigned to the wrong core type. Waiting longer for an archive to be processed because the task is assigned to the slow cores, even when the fast cores are idle, is less than ideal.
Yeah, I think that’s lost on a lot of people. Apple’s QoS system seems especially prone to that. Maybe the algorithm they use is really good, but I think I could write a better one for my specific application on known hardware, at least for now.
But Apple’s software designers can design once and have it just work reasonably well on everything. That’s probably a better choice than having each individual developer try to figure everything out in a flexible manner that works as well on the dev’s machine as on the cheapest, slowest version, and on the one that will come out two years from now. So hats off to Apple for making the better choice for their customers.
There should be some QoS mechanism in the scheduler that triggers a migration to a high-performance core for a task that goes above a specific timing threshold on an efficiency core, I assume.
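In user space, something like that can be approximated with GCD for work that has not started yet (migrating a task that is already mid-run is the kernel’s call). A toy sketch, with invented names and an arbitrary 200 ms threshold rather than anything macOS actually exposes:

```swift
import Dispatch
import Foundation

// Toy escalation: enqueue the work at a low, E-core-eligible QoS; if it
// has not begun within the threshold (say the E cores are saturated),
// enqueue a copy at a P-core-eligible QoS. Whichever copy starts first
// wins; the other becomes a no-op.
func submitWithEscalation(threshold: DispatchTimeInterval = .milliseconds(200),
                          block: @escaping () -> Void) {
    let lock = NSLock()
    var started = false

    // Wrap the block so it executes at most once.
    let runOnce = {
        lock.lock()
        let isFirst = !started
        started = true
        lock.unlock()
        if isFirst { block() }
    }

    DispatchQueue.global(qos: .utility).async(execute: runOnce)
    DispatchQueue.global(qos: .userInitiated)
        .asyncAfter(deadline: .now() + threshold, execute: runOnce)
}

submitWithEscalation {
    // e.g. unpack an archive the user may be waiting on
}
Thread.sleep(forTimeInterval: 1)   // let the demo finish
```

As I understand it, GCD already does something related to avoid priority inversion: when a high-QoS task blocks waiting on lower-QoS work, the waited-on work has its QoS boosted for the duration.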