Big news from ARM over the past few days. The processor architecture, once strictly an embedded affair for low-power devices, is going big. Not only has ARM announced it’s going 64bit, HP has announced it’s going to build servers with ARM processors. It seems all the pieces are now in place for ARM.
The 64bit announcement is a pretty big one – as far as I know, all other somewhat relevant processor architectures have long since made the switch to 64bit, leaving only ARM as 32bit. Of course, with its main focus on embedded devices, the switch to 64bit wasn’t as high a priority as with other instruction sets. However, now that ARM wants to move beyond all that, and now that it wants to run Windows, the time to go 64bit has arrived.
ARMv8, as it’s called, obviously builds on what came before, but it also adds, for the first time, 64bit processing. ARMv8 consists of two main execution states, AArch64 and AArch32. AArch64 introduces the new 64bit instruction set, while AArch32 is the continuation of the current ARMv7 architecture.
“With our increasingly connected world, the market for 32-bit processing continues to expand and evolve creating new opportunities for 32-bit ARMv7 based processors in embedded, real-time and open application platforms,” ARM CTO Mike Muller said, “We believe the ARMv8 architecture is ideally suited to enable the ARM partnership to continue to grow in 32-bit application spaces and bring diverse, innovative and energy-efficient solutions to 64-bit processing markets.”
Whenever someone introduces a major new revision of an instruction set, I usually turn to Ars Technica’s Jon ‘Hannibal’ Stokes, currently writing for Wired (owned by the same company which owns Ars) on a more occasional basis (at least, that’s how I understand it). His conclusion is that ARM’s 64bit extensions are purely about increasing the size of addressable memory, and not about performance per se. In addition, the increase in register size will negatively affect power usage.
“ARM’s 64-bit parts will pay a price in power efficiency for the boost to memory capacity and forward compatibility,” Stokes states, “The wider register file uses more logic and power on the die, and the wider integers and addresses will increase internal and external bus traffic. At the level of a desktop or server microprocessor, the power cost of these two items will be completely negligible. But in the sub-milliwatt regime where many of ARM’s mobile chips operate, there would be no reason to pay a power penalty for an increased memory capacity that won’t be used anyway.”
Interestingly enough, it seems the talks between Microsoft and ARM about bringing Windows NT to ARM were one of the driving forces behind the move to 64bit. As such, Microsoft made an appearance in ARM’s press release. “ARM is an important partner for Microsoft,” said KD Hallman, a general manager at Microsoft, “The evolution of ARM to support a 64-bit architecture is a significant development for ARM and for the ARM ecosystem. We look forward to witnessing this technology’s potential to enhance future ARM-based solutions.”
In the meantime, HP is (as far as I can recall) the first major OEM to build ARM-based servers – if the reports by The Wall Street Journal and Bloomberg are correct. HP’s going to cooperate with chip start-up Calxeda. In other words, HP has decided to work with a relatively unknown chip maker, instead of one of the more established names. Calxeda is, however, partly owned by ARM itself, so in essence, ARM and HP are working together pretty closely on this one.
Intel is of course not happy with this, since this is basically HP and ARM directly attacking Intel’s bread and butter. “We don’t take any threats to our server business lightly, but there are a number of challenges for the ARM architecture to be successful in the server market,” Intel spokesman Bill Calder told the WSJ, “We believe the best-performing platform will win.” AMD didn’t comment.
Like I said in the introduction, all the pieces are now in place for ARM. They flat-out own the mobile space, pretty much uncontested, and they’re now moving to servers – I’m sure other form factors, like desktops and laptops, are next.
ARMv8 also introduces hardware AES/SHA-1/SHA-256 instructions, similar to what’s on newer Intel/AMD processors.
It seems it’ll be optional, though, so some implementations may not have it.
Some ARM SoC implementations already have hardware crypto engines present; since these are primarily embedded processors, it entirely depends on what the package was designed for in the first place.
However, when it comes to lower-power non-x86 server processors, I wonder why MIPS hasn’t made more of an effort. There have been 64bit MIPS processors out there for years, so it’s a tried and tested platform with existing compiler and operating system support, and where compatible hardware for development/testing is a very cheap eBay purchase away… Whereas a 64bit extension of ARM will necessitate writing all these things, and most developers won’t even be able to get their hands on the kit for quite some time.
Just wait a few years, I suppose. Loongson is MIPS and is apparently starting to get some adoption – and China is supposedly betting on it big time, to gain technological independence.
Just by virtue of their massive internal market, and how they make… pretty much everything, Loongson adoption should “spill over” to other regions, I guess?
Cavium Octeon, RMI XLR
There are the Chinese Loongson 64bit CPUs for both laptops and desktops. However, it seems that clock frequency is low, they are built on 65nm technology, and the performance is rather poor. It seems that the Chinese aren’t pushing them far enough performance-wise. As they plan to build the fastest computer using Loongson CPUs, I think they will accomplish this by cramming a very large number of CPUs together.
Here it is. Not a world leader in speed, but it is in low energy usage:
http://www.extremetech.com/computing/102461-east-vs-west-china-buil…
More than that, it seems ARM is just coming full circle, sort of… wasn’t the development of the Acorn RISC Machine primarily about a processor for the Acorn Archimedes? (so a relatively big, desktop machine)
Perhaps even more hilariously – wasn’t x86 born from, essentially, “embedded CPUs of the 70s”? (but I kinda doubt x86 will come full circle in this case anytime soon)
PS. The involvement (with visible & vocal early support) of HP feels a bit like a revenge opportunity for PA-RISC and the Itanium mess…
I am not informed as to what ARM motherboards use as their firmware, but I am sure someone more enlightened will be along.
Is it a BIOS, UEFI or something else?
Could it be that if Microsoft succeeds in their plan of limiting non-Windows OSes by (ab)using some UEFI mechanisms, we could all simply switch to ARM?
I don’t think there is a mandated one, given that I’ve heard rumours of hardware vendors opting for UEFI at times, whilst at other times using CoreBoot with various payloads (UEFI support being one of those payloads). The development of ARMv8 not only adds fuel to the fire regarding Microsoft’s relationship with Intel (whether things are as close as they once were), but also to the rumour of Apple moving to ARM CPUs in the future – given that we’re probably at least 2-3 years away from seeing an ARMv8 product appearing in a laptop/desktop/server etc. from a tier 1 vendor:
So in the case of Apple it might be 2015 before we see an A7/A8 chip appear in laptops. Maybe HP’s involvement goes beyond just the servers the article mentions and might include desktops, laptops and so forth – the ability to maybe recover some margins and turn around their PC division.
ARM is completely non-standardized as far as the platform goes, although a lot of applications use U-Boot.
And by “completely non-standardized”, I mean, compare it to the 68000 in the 80s.
The Sun-1, 2, and 3, the Amiga, Macintosh, Sinclair QL, Atari ST, Sega Genesis, Apollo workstations, and HP 9000 workstations were all 68k-based, but (except for later Apollos and HP 9000s) none of them booted the same way, and none of them had hardware that was even close to similar, or even in similar places in the memory map.
The ARM ecosystem is similar in that respect.
Hi,
It’s not “PC BIOS” or UEFI, or OpenFirmware or anything else. It’s complete chaos. Typically the manufacturer creates/installs firmware specifically for the embedded device they happen to be making at the time.
This continues to be the single largest problem with ARM – there is no standardised firmware (and no standardised way of doing hardware auto-detection for anything that isn’t USB or PCI).
This means you can’t create one generic OS for ARM that works on all ARM systems and continues to work for future ARM systems. Instead you end up creating a special version of the OS to suit each specific ARM system (and then hoping the manufacturer installs it, because the end user can’t). It’s also why Microsoft only support about 5 of the many different ARM systems; and why you’ll never see “Windows for ARM” on a shelf in a computer shop where people can buy it and install it on whatever ARM system they happen to have.
For embedded systems it’s probably a good thing, as it reduces hardware costs and the manufacturer doesn’t want the end-user to change the software anyway. For general purpose laptop/desktop/server systems (where the manufacturer makes the hardware, and the end-user decides which OS/s to install when) it’s a show stopper.
It’s not the only problem though. For PC (desktop/server) hardware there’s a bunch of standards relating to hardware that includes motherboard form factors, power supplies, cases, etc. This means the end user can replace/upgrade any component, and also means that a computer manufacturer can get “off the shelf” parts from 20 different companies and put them together as a complete system. It’s this “componentization” that makes PC-compatible (desktop/server) hardware cheap and flexible (and not just for the initial purchase).
Until there are usable standards (for everything, not just firmware), ARM will never go beyond the embedded market (which includes phones and other disposable gimmicks), regardless of what the CPU/s themselves are capable of.
Ironically, ARM can use “PC compatible” standards to compete against the “PC compatible” market – ARM manufacturers could adopt UEFI, ACPI, the ATX form factor, the ATX power supplies, etc to become a standardised platform fairly quickly.
– Brendan
Brendan,
You bring up some excellent points. I don’t know the answers either.
I have a question: what would it take to place one or more ARM processors on an existing PCI bus infrastructure and use existing PC components for the rest of the build?
Of course the drivers would need to be recompiled or rewritten, but hardware-wise, is there a reason this would be bad? Is there something about ARM processors which rules out efficient use of existing commodity hardware like RAM/video/Ethernet/power supplies?
As for the BIOS services, it’s true that we need a way to identify/probe the hardware in ways which are specific to the mainboard. However, for the most part BIOS services are only used in the bootloader, after which modern operating-system-specific drivers take over.
It seems like it would be pretty easy for ARM manufacturers to provide standard boot-loading firmware for their hardware, which end users/operating systems would be free to use or not. Just look at how trivial x86 boot loading is (just as an example – ideally it’d be a bit more sophisticated, though).
The PCI and PCIe buses can be used with any compatible processor architecture. I’m aware of SPARC64 and PowerPC designs that support PCI devices… including ARM ones.
Intel’s IOP321 is an “XScale” ARM design that included a PCI-X bus, for example.
As Brendan said, if an ARM motherboard in ATX form factor were released, it could use the same conventional hardware, and even the same memory.
Most open source operating systems treat the PCI bus as platform independent and have glue for PCI host controllers. This allows PCI device drivers to be portable, assuming the developer takes into account endianness, alignment and 32/64-bit portability issues.
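For example, PCI configuration space is defined as little-endian, so a portable driver decodes it with explicit byte-order conversion rather than assuming the host CPU’s endianness. A minimal sketch in Python (the device ID here is a made-up example value; 0x8086 happens to be Intel’s vendor ID):

```python
import struct

# First 4 bytes of PCI config space: vendor ID then device ID,
# always stored little-endian regardless of the host CPU.
raw = bytes([0x86, 0x80, 0x20, 0x15])

# Portable: decode explicitly as little-endian 16-bit values ('<HH'),
# instead of reinterpreting the buffer with the host's native byte order.
vendor_id, device_id = struct.unpack('<HH', raw)
print(hex(vendor_id), hex(device_id))  # 0x8086 0x1520
```

A big-endian host that skipped the conversion would read 0x8680 instead, which is exactly the class of bug being described.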
So if the firmware were standardized, or at least the boot procedure, a common “ARM PC” port for many operating systems could be quite trivial.
So here’s hoping.
The x86 starts up internally by jumping to just below the end of addressable memory – a ROM address containing a jump instruction to the beginning of the BIOS code.
For instance:
The 8086 jumped to 0xffff0.
The 286 jumped to 0xfffff0.
The 386 jumped to 0xfffffff0.
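The pattern behind those addresses is simply “top of the physical address space minus 16 bytes” – just enough room for a far jump into the BIOS. A quick sketch of the arithmetic (Python used purely for illustration):

```python
# The x86 reset vector sits 16 bytes below the top of the CPU's
# physical address space: room for one far JMP into the BIOS code.
def reset_vector(address_bits):
    """Return the address an x86 CPU starts executing at after reset."""
    return (1 << address_bits) - 16

# 8086: 20-bit address bus; 286: 24-bit; 386: 32-bit.
for bits in (20, 24, 32):
    print(f"{bits}-bit bus -> {reset_vector(bits):#x}")
```

This prints 0xffff0, 0xfffff0 and 0xfffffff0, matching the three CPUs listed above.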
It would be pretty easy to write an OS loader like GRUB in this environment without any BIOS, obviously with direct I/O for things like IDE controllers.
Another idea would be to simply compile a minimal Linux kernel for this mainboard (since the device drivers are already implemented there), and then flash it into the CPU reset address. This kernel could have a basic UI and settings, but its main purpose would be to “kexec” the real OS.
This would be the ultimate solution in terms of flexibility. Much better than either BIOS or UEFI, IMO, if the source code is open (and it would be, since it’s GPL).
I’d be fairly confident I could personally implement it in a few weeks if it were on x86, but unfortunately my knowledge falls off sharply for ARM. I don’t see why ARM would be that different though. The biggest hurdle would be one of standardization, rather than implementation.
Linux as firmware has already been done, c.f. LinuxBIOS, now known as Coreboot
waynemccombe,
“Linux as firmware has already been done, c.f. LinuxBIOS, now known as Coreboot”
Yeah, but that’s x86, where unfortunately proprietary BIOS solutions have already “won” over open solutions. Coreboot doesn’t support any of my mainboards (I checked in the past).
I agree that coreboot would be an excellent candidate, but coreboot doesn’t currently support ARM, and AFAIK there aren’t any other generic/standardized bootloaders for ARM either.
http://www.coreboot.org/ARM
“coreboot on ARM is a work-in-progress. coreboot currently does not support ARM.”
This would be an excellent opportunity to establish an open firmware standard for ARM desktop systems – before proprietary implementations take over and we end up being stuck with stock firmware like in x86 world.
Hi,
Linux as firmware was attempted, and mostly failed because Linux is far too big to fit in firmware. Even if this wasn’t a problem, Linux as firmware would still be a major disaster as there’s no standardised driver interface.
Coreboot on its own is good for chipset initialisation, but doesn’t provide a standardised environment that other software can rely on, and is therefore useless on its own. Coreboot isn’t designed to be used like that. Coreboot is designed to start a “payload”, where the payload is typically a standardised environment that other software can rely on – a “PC BIOS” payload, an OpenFirmware payload, a GRUB payload or a UEFI payload. You wouldn’t want a “PC BIOS” payload on ARM (the “PC BIOS” standard/s were never intended to be portable). GRUB has no standardised environment other than multi-boot, which wasn’t intended to be portable either (“multi-boot 2”, which is intended to be portable, seems stillborn). That only leaves “Coreboot with OpenFirmware payload” and “Coreboot with UEFI payload”. In both of these cases you’d be adopting a standard (OpenFirmware or UEFI), and how it is implemented (e.g. on top of coreboot, or without coreboot) wouldn’t matter.
I don’t know much about OpenFirmware (but it looks like it’d work fine).
Microsoft would probably push for UEFI on ARM; and I’d guess device manufacturers would rather just provide one boot-time driver in “EFI byte-code” rather than one in “EFI byte-code” for 80x86 and another in “Fcode” for OpenFirmware.
I like open source, and wish some sort of open source firmware was realistic. In my experience it’s not – none of the open source projects seem capable of defining a formal standard (which is completely different to implementing something and accidentally ending up with an ad hoc “standard”).
– Brendan
Brendan,
“Linux as firmware was attempted, and mostly failed because Linux is far too big to fit in firmware.”
Maybe, but I didn’t suggest using this Linux as the primary OS, just as the equivalent of the “BIOS”. Its primary job would be to load the real OS. I think one can still build useful specialized kernels that will fit on a floppy.
“Even if this wasn’t a problem, Linux as firmware would still be a major disaster as there’s no standardised driver interface.”
I agree with you, the lack of an ABI is quite annoying. It would be very nice if the Linux firmware on my motherboard could load up my Ethernet card’s modular drivers. But due to Linux being in constant flux, this is impossible without recompiling, at a minimum. However, if the goal is primarily to initialize motherboard peripherals and then transfer control to the user’s OS, then it wouldn’t necessarily be a problem.
“Coreboot on it’s own is good for chipset initialisation, but doesn’t provide a standardised environment that other software can rely on, and is therefore useless on its own. Coreboot isn’t designed to be used like that. Coreboot is designed to start a “payload”, where the payload is typically a standardised environment that other software can rely on…”
Yes, I get all this, but as I said, most modern operating systems prefer not to use BIOS services anyway; they use their own drivers. I don’t think any BIOS interface can ever encapsulate everything everyone will want to do on their systems, therefore I don’t think it should try… Just provide enough services to load the OS (from disk/network/flash), and not much else. With x86, we kind of have to keep legacy BIOS stuff around for compatibility, but for an ARM BIOS we should really keep it basic.
“I like open source, and wish some sort of open source firmware was realistic. In my experience it’s not – none of the open source projects seem capable of defining a formal standard…”
Hey, this is what I said – it’s not a technical problem so much as a standardization one.
The reason I would like open source firmware isn’t just philosophical though, there are times I would have liked to flash my own tools into the BIOS for failover and remote control purposes. Today I cannot do it because the BIOS is closed/proprietary.
Hi,
Loading one file into RAM is a start, but it’s nowhere near being a complete/usable environment.
At a minimum you need something for initial output (e.g. generic video output, serial console, telnet, etc.) so the firmware or OS can tell the user what went wrong if/when something fails before the OS’s driver/s are started. You also need a simple “pre-boot” device driver interface for network cards and storage devices (so the firmware can load the OS, wherever it happens to be) and video cards (for generic video output). You also need something the OS can use for the detection of any hardware that isn’t covered by the PCI or USB standards (whether it’s an overcomplicated mess like ACPI or something simple like the “flattened device tree” that Das U-Boot uses), and a way of communicating various other information (e.g. power management, memory map, etc.) to the OS. Of course you also need a way for the OS’s early boot code to load more of the OS into memory – the idea of “firmware loads kernel, kernel can do everything itself” is a bad/broken idea that was only useful for monolithic kernels (before they all became modular). This is a two-part problem – you’d want a “file IO” abstraction (so an OS can ask the firmware to load files without caring if they’re coming from a standard partition or TFTP or whatever) and raw device access (e.g. so an OS can load data from file systems that the firmware itself doesn’t understand).
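To make that minimum concrete, a pre-boot interface along those lines might look something like the sketch below. This is entirely hypothetical – every name here is invented, and Python is used only as a convenient way to write down the shape of the interface:

```python
class PreBootServices:
    """Hypothetical minimal firmware interface: just enough to load an
    OS and report errors before the OS's own drivers are running."""

    def console_write(self, text):        # serial/video/telnet output
        raise NotImplementedError

    def load_file(self, path):            # "file IO" abstraction; hides
        raise NotImplementedError         # partition vs TFTP vs whatever

    def read_blocks(self, device, lba, count):  # raw device access
        raise NotImplementedError

    def hardware_tree(self):              # FDT/ACPI-style enumeration
        raise NotImplementedError

    def memory_map(self):                 # usable RAM regions for the OS
        raise NotImplementedError


class RamDiskFirmware(PreBootServices):
    """Toy implementation backed by an in-memory dict, for illustration."""
    def __init__(self, files):
        self.files = files
        self.log = []
    def console_write(self, text):
        self.log.append(text)
    def load_file(self, path):
        return self.files[path]


fw = RamDiskFirmware({"/boot/kernel": b"\x7fELF..."})
fw.console_write("loading kernel")
kernel = fw.load_file("/boot/kernel")
```

The point isn’t the specific methods – it’s that an OS written against this small surface would boot on any board whose firmware implements it, which is exactly what ARM lacks.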
Once you get past minimum requirements, you start looking at “desirable” things that make it easier for OSs and their boot code. This can include a way for an OS to find out where it booted from (so it can update itself later), some timing services (for small delays) and the ability to get a time and date in a standard way, authentication (so an OS can tell if it’s been tampered with), fault tolerance features (e.g. support for software RAID, the ability to boot an alternative if the primary OS fails to boot, etc). Then there’s generic services for compression/decompression, Unicode support, etc.
You also need to take into account things to make the end user’s life easier. The option to boot from removable media, a utility to set/change the system time and date, support for “dual boot” systems (maybe a pretty boot menu), a utility for partition management (like fdisk), a utility to do a thorough RAM test (like memtest86) and maybe other diagnostic tools, etc. Then you can start looking at remote management tools (e.g. so that people administering server farms can SSH their way in and change firmware setup, etc).
– Brendan
Brendan,
“At a minimum you need something for initial output (e.g. generic video output, serial console, telnet, etc) so the firmware or OS can tell the user what went wrong if/when something fails before the OS’s driver/s are started.”
None of this precludes the use of Linux in firmware though, so I don’t follow your criticism. The kernel which is compiled for this mainboard will clearly support the hardware components which are on that mainboard, which would include standard video and Ethernet. It could inform the OS about what it finds, but supporting external devices like PCI Ethernet cards wouldn’t be strictly necessary; the OS is going to need drivers to use them anyways – and those drivers are going to be OS-specific.
“Of course you also need a way for the OS’s early boot code to load more of the OS into memory”
Yes, that’s what a bootloader is for. I believe you are referring to the catch-22 that exists in bootloader code, where OS drivers haven’t yet been loaded. This is very similar to the problem which Linux’s ‘initrd’ solves, where the kernel is loaded but the file system isn’t. Well, there are many ways to solve this. What x86 systems do is have the BIOS load a single boot sector from the first bootable media it finds. That sector typically calls BIOS software interrupts to search for second-stage loaders in the file system, which load the full OS. We could do it this way for ARM64, but the consequences of this design result in the really contorted BIOS emulation schemes for alternate bootable media, such as the thumb drives and network booting you referred to. I’d personally want to see something more elegant; ideally the BIOS would be able to load an arbitrarily sized kernel+initrd and run it straight away, much like GRUB.
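For reference, the “single boot sector” contract really is tiny: the BIOS loads the first 512 bytes of the boot device to address 0x7C00 and jumps there only if the last two bytes are the 0x55 0xAA signature. A sketch of that check:

```python
BOOT_SECTOR_SIZE = 512

def is_bootable(sector: bytes) -> bool:
    """Mimic the BIOS check: 512 bytes ending in the 0x55 0xAA signature."""
    return (len(sector) == BOOT_SECTOR_SIZE
            and sector[510:512] == b"\x55\xaa")

# A dummy boot sector: 510 bytes of code/padding plus the signature.
sector = bytes(510) + b"\x55\xaa"
print(is_bootable(sector))       # True
print(is_bootable(bytes(512)))   # False: signature missing
```

Everything else – finding the second-stage loader, understanding partitions and file systems – has to be crammed into, or chained from, those 510 bytes, which is why x86 boot chains got so contorted.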
Assuming the BIOS were modular enough, one could even dynamically add support to fetch files from within alien file systems. Of course it is extremely unlikely to reach the level of standardization that would be required for this to work. I’m just talking possibilities.
“Once you get past minimum requirements, you start looking at ‘desirable’ things that make it easier for OSs and their boot code.”
Well sure, but the most common problems solved by traditional x86 bootloaders would disappear if the BIOS could simply load a full image into RAM in the first place, instead of being limited to a single sector (several bytes of which go to partitioning).
“…Then there’s generic services for compression/decompression, Unicode support, etc.”
Oh please no… these kinds of things should not go in the BIOS! It’s like the AAD instruction on x86 – it doesn’t belong at all, and yet it remains for compatibility. In my opinion, the BIOS should not be needed for features that the OS can do better in its own way.
“You also need to take into account things to make the end user’s life easier. The option to boot from removable media, a utility to set/change the system time and date, support for ‘dual boot’ systems (maybe a pretty boot menu), a utility for partition management (like fdisk), a utility to do a thorough RAM test (like memtest86) and maybe other diagnostic tools, etc. Then you can start looking at remote management tools (e.g. so that people administering server farms can SSH their way in and change firmware setup, etc).”
It’s interesting that you mention these things, since this is precisely what I was thinking when I was proposing linux for use as a standard ARM64 firmware.
Hi,
The kernel compiled for the mainboard would need to include drivers for all hardware that may be plugged in, including hardware that won’t exist until some point in the future. You can’t expect end users to buy a new video card, stand around waiting for Linux to support it for 2 years, then reconfigure and recompile their firmware before reinstalling it.
The alternative is a standardised driver interface, so hardware that won’t exist until some point in the future can have a driver included on the device’s ROM. Bonus points if the driver is in some form of portable byte-code and *not* native code for any specific system, so that the same “driver in ROM” can work on all systems (e.g. ARM, 32-bit 80x86, 64-bit 80x86, Itanium, etc).
Linux could inform the OS about hardware it “finds” (e.g. ACPI tables for the motherboard compiled into the firmware binary). You still need a standardised way of making that data available to the OS.
If the OS is loaded as a “huge slab of boot code, kernel and initrd”, it’d be nice if the firmware could decompress that huge slab before attempting to execute the compressed code. The generic output (used to tell the user when things went wrong before the OS’s drivers start) should support Unicode. If the firmware supports compression and Unicode anyway, nothing is lost by allowing an OS’s early boot code (that runs before the majority of the OS itself is capable of doing anything) to use it too.
Linux is firmly in the “incapable of creating a formal standard for anything” category. There’s no way any company is going to risk their future on using “we slapped it together for now but might change it completely next week if we’re bored” as the foundation for a (hopefully) long lasting standard.
– Brendan
Brendan,
“The kernel compiled for the mainboard would need to include drivers for all hardware that may be plugged in, including hardware that won’t exist until some point in the future. You can’t expect end users to buy a new video card, stand around waiting for Linux to support it for 2 years, then reconfigure and recompile their firmware before reinstalling it.”
It’s totally wishful thinking that a BIOS could have native support for all devices which could ever be plugged in. We don’t have BIOSes which can do it today; they won’t do it tomorrow either. The best you can do is have the hardware emulate legacy hardware protocols, such as the text console buffer at segment 0xB800. It’s not ideal, but like it or not, that legacy buffer implemented in hardware will never support Unicode glyphs. I reiterate my opinion that the BIOS shouldn’t be bogged down with these problems – just keep the bootup minimal and let the OS handle localization and driver issues.
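That legacy buffer is a good example of how crude the contract is: 80x25 cells, each two bytes – one code-page character byte plus one attribute byte – which is exactly why it can never carry Unicode. A simulated sketch of writing to it (real hardware would map this at segment 0xB800; here it’s just a bytearray):

```python
COLS, ROWS = 80, 25
vram = bytearray(COLS * ROWS * 2)   # simulated VGA text buffer

def putstr(row, col, text, attr=0x07):   # 0x07 = light grey on black
    """Write a string as (character, attribute) byte pairs, VGA-style."""
    offset = (row * COLS + col) * 2
    for i, ch in enumerate(text):
        vram[offset + i * 2] = ord(ch)       # one byte per glyph: no Unicode
        vram[offset + i * 2 + 1] = attr      # colour/blink attribute

putstr(0, 0, "Hi")
print(bytes(vram[0:4]))   # b'H\x07i\x07'
```

One byte per glyph limits you to 256 code-page characters – there is simply nowhere to put a Unicode code point.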
“Linux is firmly in the ‘incapable of creating a formal standard for anything’ category. There’s no way any company is going to risk their future on using “we slapped it together for now but might change it completely next week if we’re bored” as the foundation for a (hopefully) long lasting standard.”
Linux is just an extremely convenient/flexible implementation, not a specification. I think you are totally out of line if you think that a Linux firmware kernel couldn’t abide by an open booting standard.
Edit: It sounds like you want the BIOS to have a greater role than simply initializing the mainboard and booting the OS; maybe that’s where we’re getting hung up? I guess we should consider the merits of doing more than that in a standard way. Generally speaking though, I don’t think I’d be wrong in claiming that OS devs want to rely on the BIOS as little as possible. Ideally the BIOS would transfer control to the OS extremely quickly, with no interface at all unless something goes wrong.
Not so fast. The chipset is what needs fairly sophisticated initialization specific to the particular silicon. In addition, every kind of RAM since SDRAM needs some timing calibration after every bootup.
Without it nothing works at all.
dsmogor,
“Not so fast. The chipset is what needs fairly sophisticated initialization specific to the particular silicon. In addition, every kind of RAM since SDRAM needs some timing calibration after every bootup.
Without it nothing works at all.”
You are completely correct, however I don’t understand where you see a contradiction?
I meant, you would have to re-implement much of the BIOS ROM functionality (incl. the proprietary bits) to even attempt booting Linux. Linux bootloaders use the BIOS very heavily.
“I meant, you would have to re-implement much of the BIOS ROM functionality (incl. the proprietary bits) to even attempt booting Linux.”
Yes, of course, it would be nice if these init bits would evolve to be open source from the get-go on ARM-64 platforms, but it’s a long shot.
“Linux bootloaders use the BIOS very heavily.”
Actually, if the BIOS supported loading a kernel+initrd directly, then there would be no need for bootloaders at all. It wouldn’t preclude their use, but they’d be optional. The main reason bootloaders need to call the BIOS in the first place is to read more sectors – to load everything into RAM prior to the native device drivers being ready. Once things are loaded, most operating systems access the hardware directly without the BIOS.
Which really is not a problem, since nearly no users install an OS themselves. Users use whatever the PC/device comes with; they only install applications. And sometimes they even accept (vendor-) provided OS updates.
And the OS handles the hardware abstraction, making the hardware differences a non-issue for the applications.
Form factors and power supplies are trivial. And the tighter integration of ARM devices with less need for external circuitry, will make the motherboard PCB much simpler than the densely populated x86 boards.
Adding one or more standardized buses like PCI or the PCI-E variants is also fairly trivial. You can actually get loads of ARM boards with such slots today.
I’ve been wondering about this too. Like everyone’s saying: currently it’s a mess of hardware-specific boot loaders and kernel builds. Which in the embedded world isn’t necessarily a bad thing. Imagine if it took your phone as long to start up as your traditional BIOS-based desktop, which wastes time doing unnecessary things for the sake of compatibility.
Up until now there hasn’t been a real reason to standardize. End-users aren’t installing operating systems on ARM devices, so it doesn’t matter how the OS gets there.
But that’s all changing. We’re getting Windows for ARM, ARM processors are rapidly approaching general-purpose desktop performance, and with servers on the way, admins are going to want to install their Linux distros of choice.
I think, in terms of booting, we’re going to see UEFI on ARM. It’s already in the UEFI standard, and Microsoft seems to want it on PCs at least.
I used to be excited about the growing marketshare of ARM, thought it’d be a time to start over. But now I think we’re going to see more of the same crap… or worse.
I would love to have Windows Home Server & Windows Server Foundation running on these processors. Of course, Linux will be there for sure.
This is a very big threat to Intel & AMD (x86-based processors).
Should give them a lot more elbow room.
Very nice that ARM have at last developed a 64-bit ISA.
It’s a shame that there are only very few published details at this time, and no instruction set listing. The best source of info seems to be ARM’s “Technology Preview” document:
http://www.arm.com/files/downloads/ARMv8_Architecture.pdf
The highlights of AArch64 seem to be that there are now 31 general-purpose registers and an improved exception model.
There seem, though, to be two losses when going to AArch64 compared with AArch32. The first is the loss of the “M” versions of the load and store instructions, which let you store or load multiple registers with a single instruction. It makes sense that these have gone: there’s now double the number of registers, so there’s no longer space in a 32-bit opcode to encode a load/store of the complete register set. Given the speed at which modern CPUs operate, the depth of pipelines, and the cost of going out to RAM, the replacement “P” (pair) versions of the load and store instructions will likely make little if any difference in performance compared to the “M” versions. There’s just a cost in needing more instructions to save out multiple registers.
The other loss in AArch64 is described in the PDF as this:
For me, the great beauty of the AArch32 instruction set was that every instruction was conditional. That meant ARM code had far fewer branches than comparable code for other architectures, and thus fewer branch penalties on execution. This change dictates that AArch64 code will require many more branches than AArch32 code, and thus be less efficient.
I guess that change explains why ARM put in a lot of effort into branch prediction logic in the Cortex-A7, since they’d need such logic for an AArch64 CPU.
More conditional instructions means more branches, not less. So does that now make ARM uglier than other architectures?
Edited 2011-10-29 15:42 UTC
That is a confusing statement. Instruction predication generally reduces the number of branches in the code, so the parent is correct.
Predication is nice for in order micro-architectures and can reduce the size of the code, but it does not really make sense for out-of-order micro-architectures which the ARM64 ISA is presumably targeting since Cortex A9 and A15 are both OoO already.
Most CPU architectures I’ve looked at besides ARM only allow for conditions to be checked on branch instructions. Thus on such CPUs all conditional code requires a branch.
The 32-bit ARM instruction set lets every instruction be conditional. If the condition is not matched, the instruction turns into a no-op. This means that you avoid branching in a large number of cases – you only branch when you really need to.
The reasoning behind this on 32-bit ARM is that, in general, a great deal of conditional code tends to be just a few instructions long. The cost of a no-op is a single clock cycle, whilst the cost of a branch can be dozens of cycles. You’re also often saving two branches rather than one, since there’s no need to branch back.
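To make that concrete, here’s the textbook predication case written in C (just a sketch – the instructions actually emitted are compiler- and flag-dependent, so treat the assembly named in the comment as an assumption to verify, not gospel):

```c
#include <assert.h>

/* A textbook predication case: the conditional body is one instruction
 * long.  On 32-bit ARM a compiler can turn this into `cmp r0, #0`
 * followed by a predicated `rsblt r0, r0, #0` -- no branch at all,
 * so no branch penalty on either path.  On an ISA where only branches
 * are conditional, the same source needs a compare-and-branch (or a
 * conditional move, where one exists).
 * Check the real output with something like `gcc -S -O2` for an
 * arm-linux target. */
int abs_short_branchless(int x)
{
    if (x < 0)
        x = -x;     /* short conditional body: ideal for predication */
    return x;
}
```

The shorter the conditional body, the better the trade: one wasted no-op cycle on the untaken path versus a potential pipeline flush for a mispredicted branch.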
The issue with all instructions being conditional is that they end up modifying condition flags, which then creates dependencies between instructions and makes creating an out-of-order microarch difficult (perhaps this is why they reduced the number of predicated instructions?). The in-order arch with predicated instructions may well turn out to be slower than the out-of-order arch with branch cost amortized via a decent branch predictor and branch target buffer.
With the exception of direct comparison instructions, ARM instructions only update the condition flags if you include the “S” suffix in the instruction. When writing conditional ARM code you bear that in mind.
You are right – using conditional instructions creates dependencies. That’s inevitable. But aren’t such dependencies the kind of thing that chips with multiple parallel execution units have been handling since they first appeared? Part of what branch prediction units have to do? And indeed what any out-of-order architecture has to deal with?
It should be noted that the ARM Cortex-A9 and Cortex-A15 are both multi-dispatch out-of-order chips. So whilst it’s difficult, it’s not impossible. The Cortex-A9 outperforms its in-order predecessor, the Cortex-A8.
I don’t know the make-up of the 64-bit ARM instruction set since it’s not been published yet. My guess is that since the instructions are all 32 bits long, just like 32-bit ARM, and there are now double the number of registers to encode, the extra bit needed to specify a register simply means there aren’t enough bits left to make all instructions conditional.
“For me, the great beauty of the AArch32 instruction set was that every instruction was conditional.”
Yeah… I agree, I really like that. I also liked the Z80’s shadow registers, though.
These “losses” are discussed on Beyond3D:
http://forum.beyond3d.com/showthread.php?p=1593049#post1593049
Thanks for pointing me towards that discussion. Some of the people there clearly know more than they are allowed to tell. 😉
I think I now understand better the arguments involved.
Removing conditional execution still appears to me to be a loss. I understand that there are reasons why it may be a good thing to remove, with a significant pay-off in simplified logic and saved power. However, even with branch prediction and multi-dispatch out-of-order execution I’d expect a net loss of performance, since even a correctly predicted branch has an execution cost.
Thought provoking stuff.
My perspective on this is skewed from the fact that I got into ARM with the ARM2. The competition at the time was the 80386 and 68020, and of the three the ARM2 was the high performance chip. I tend therefore to be skewed more towards performance than power consumption.
Removing conditional execution still appears to me to be a loss.
I started with the ARM7 (GBA), so I had the same thoughts as you. Now, the B3D people have almost convinced me of the v8 ISA’s “goodness” =)
But the ARM ISA was very unique, and now it has lost some features and turned mainly into “generic RISC”.
Edited 2011-11-01 16:28 UTC
It seems I am the only one unhappy with seeing how another architecture adopts the “bloat” philosophy.
The ARM architecture was a marvel to behold. Then they started to add stupid instructions with the ARM9 model, got ridiculous with the ARM11 (the 32-bit mini-vectors thingie), and got messy with the Cortex-A8 (the double way to perform a float op, the VFP one mysteriously crippled, the NEON one just unnecessary).
The launch of a 64-bit architecture was a perfect opportunity to break with all the braindead decisions; just put 2 different CPUs in the same core, one pipe for old ARM32, and another for a clean, new architecture.
Just share the biggest, most expensive blocks, like the caches, the float muladds, vector permutes, and maybe the exception-handling pieces. After a few years, the 32-bit part can just be emulated.
Unfortunately ARM is following the Intel/AMD path. Like everybody followed the M$ way of doing things.
But remember, ARM: Apple hit M$ hard just by reckoning that crap is crap and doing something better. And you, ARM, hit Intel/AMD hard because of a cleaner architecture.
Somebody will come along in the future and hit you just as hard, designing something less murky than your patched-up ARM64. Hopefully…
If that is your opinion, what do you think of this effort?:
“The x32 system call ABI”
“That best-of-both-worlds situation is exactly what the x32 ABI is trying to provide. A program compiled to this ABI will run in native 64-bit mode, but with 32-bit pointers and data values. The full register set will be available, as will other advantages of the 64-bit architecture like the faster SYSCALL64 instruction. If all goes according to plan, this ABI should be the fastest mode available on 64-bit machines for a wide range of programs; it is easy to see x32 widely displacing the 32-bit compatibility mode. ”
http://lwn.net/Articles/456731/
That kind of spaghetti is not really about word size; it’s about having different instruction sets live together (32-bit x86 and 64-bit x86 are really 2 different ISAs). The problem is having 2 different ISAs living under the same OS. There are several ways to accomplish that, but none of them nice. PalmOS lived with that, and so did Apple.
The ugliest one I have known happened under PalmOS, when they started using ARMs (little-endian) in place of the old 68k (big-endian). The whole kernel & apps were emulated, but you could program directly for the ARM for speed. To call the OS, you were forced to work around the little-endian/big-endian issue, byte-swapping any argument passed to the OS call.
Yeah, this is a mighty cool idea. Do you know of any popular distro remixes compiled with that? I would love it if this could work in parallel with the x86 and x64 ABIs in a normal 64-bit Linux.
Edited 2011-10-31 10:08 UTC
I’m not aware of any right now.
AVR64 coming in a decade? (not like AVR32 displaces ARM much; OK, I don’t know how clean AVR32 is – point is, it has only superficial resemblance to the 8-bit AVR)
Or maybe XMOS. Or, if I would have to bet on something, most likely some Loongson descendant…
Thanks for pointing me to XMOS. I didn’t know the architecture, it looks interesting. Not in my line of thinking but interesting anyway.
IMHO a modern, clean ISA should be centered on better FP integration with the ALU. Floating-point types are very important today; in many CPU designs, FP appears as an afterthought.
I am undecided about the vector programming thing. The Cell SPU ISA (for example) is powerful and clean. It could be a good reference, but few programmers make the effort to vectorize their inner loops these days. In fact, few programmers even know what an inner loop is, these days.
So your proposal is to make a more expensive processor with no capabilities for running legacy 32bit applications natively?
You didn’t understand what I wrote.
There are two ways to ensure backwards compatibility: extend your old design (with the old mistakes), or include it in the same package.
Example: the Itanium included extra hardware for x86 compatibility. A revision later, x86 was entirely emulated in software, removing the need for the extra legacy crap in the design.
LOL? I find it funny, as I fail to see in what way Apple is hitting MS or ARM is hitting x86. Do you consider OS X’s 5% market share as Apple hitting MS?
More than that, Intel is coming up with Moorestown: ultra-low-power x86 microprocessors for the embedded market, phones, whatever. It seems to me that Intel is trying to hit ARM.
You seem to be living in the desktop space and rarely looking outside. Who said anything about OSX?
Apple gets their money from the iphone/ipod/ipad set. M$ has been trying to enter this space for years, and still they are totally unable to make a dollar from it.
Here, some news for you:
http://articles.businessinsider.com/2010-05-26/tech/29988890_1_ente…
About Intel trying to enter the low-power/embedded market: we will see. They are still quite far off; their designs are too inefficient and expensive, lack integration and components, and always come late to market.
All that despite Intel having the best existing fab technology.
At some point, ARM designs will be so complex that the x86 will be able to compete. At that point, a cleaner architecture should be able to blow both of them out of the water.
Although the ARM chip is now associated almost exclusively with mobile devices, it actually debuted in a desktop PC (Acorn Archimedes) in June 1987. Yes, it became the world’s first mass-produced 32-bit RISC computer and for a year or so (1987-1988) was actually the world’s best desktop PC until IBM PC/Intel caught up.
Fast-forward to April 2003 (yes, 16 years later) and we finally get 64-bit desktop (Athlon 64) and server (Opteron) chips. Intel followed just over a year later, but by this time, ARM was considered strictly a low-power (in both speed and energy) CPU for use in embedded devices. Sadly, ARM didn’t follow AMD and Intel into the 64-bit space back then, which in retrospect may have been a mistake.
Now we move to 2007, when ARM finally twigged that 64-bit might actually be useful and they belatedly start development on extending their architecture. They’ve now lost 3-4 years on their potential rivals for 64-bit servers (and maybe eventually 64-bit desktops/laptops/netbooks/phones).
Reaching the current day, ARM finally announce that they will produce a family of 64-bit chips that they started developing 4 years earlier. I was fully expecting a launch date in 2012 (making 5 years of development and being up to 9 years behind their opposition), but *no* – the earliest we’ll see them is in 2014!
It’s simply too little too late – as we all know, it’s the software that makes the CPU family successful. We currently have no 64-bit software for ARM and have very few server and desktop OS’es even running 32-bit ARM (most Linux distros have actually abandoned supporting even 32-bit ARM by now). They’re betting the farm that Windows 8 on ARM will take off, but it won’t run *any* legacy software at all (even by emulation) which was always a strong point of consecutive Windows releases.
In conclusion, I think ARM in either 32-bit or 64-bit form will remain limited to phones and maybe some netbooks. It won’t get any market share on desktops or servers – AMD and Intel are just too far ahead on performance and OS/software availability for ARM to make inroads.
Edited 2011-10-29 15:54 UTC
Consider that there were 64bit server chips (Alpha, PA-RISC, Sparc, POWER) before the Athlon, and they aren’t viable competitors anymore. ARM introducing 64bit at the same time as the Athlon may or may not have also been futile. Perhaps they really made a wise decision doing what they’re good at and capturing the mobile market in the process as they’re now more relevant than Alpha, PA-RISC…
waynemccombe,
“Consider that there were 64bit server chips (Alpha, PA-RISC, Sparc, POWER) before the Athlon, and they aren’t viable competitors anymore. ARM introducing 64bit at the same time as the Athlon may or may not have also been futile. Perhaps they really made a wise decision doing what they’re good at and capturing the mobile market in the process as they’re now more relevant than Alpha, PA-RISC…”
This move to 64bit has me a bit perplexed given ARM’s market demographic. I think ARM has a lot of catching up to do performance-wise before it will be a serious contender for the desktop (or the cluster farm), 64bit or not. Maybe ARM is doing this for forward compatibility, so they don’t have to cross that bridge later on. Frankly though, most desktop users still don’t actually benefit from 64bit registers/addressing today; the high bits are mostly just wasted space. (Yes, of course I know AMD64 introduced other significant ISA changes as well.)
That said, I can think of new OS designs which would make good use of 64bit addressing. For example, if the 64bit (or 48bit, etc.) address space were actually backed by NV-RAM behind a large cache, then one could theoretically do away with disks entirely and have all files directly addressable in system RAM. It would eliminate the need to explicitly load and save contents to disk; files would just stay in NVRAM even across power cycles. Of course we’d need a very robust OS with robust protections, but it would remove a lot of the bottlenecks currently incurred by file IO. Consider a web server/database/media player that doesn’t need to do any file IO and whose changes are persistent.
Of course virtual memory allows us to emulate this with a hard disk or flash drive, but it really doesn’t eliminate the IO bottlenecks.
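For what it’s worth, the emulation can be sketched today with POSIX mmap(2)/msync(2) – the function name and path below are invented for illustration. The file becomes directly addressable memory, and only the msync() call betrays that a block device is still underneath; with true memory-mapped NVRAM even that step would disappear:

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Sketch: a counter that "lives" in a file mapped into the address
 * space, the way data in memory-mapped NVRAM would.  The increment is
 * a plain store through a pointer -- no read()/write() calls at all --
 * and msync() is the explicit flush-to-stable-storage step that real
 * NVRAM behind the memory bus would make unnecessary. */
int bump_persistent_counter(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, sizeof(int)) < 0) {   /* ensure backing store exists */
        close(fd);
        return -1;
    }
    int *counter = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (counter == MAP_FAILED) {
        close(fd);
        return -1;
    }
    int value = ++*counter;                   /* ordinary memory access */
    msync(counter, sizeof(int), MS_SYNC);     /* still need a flush today */
    munmap(counter, sizeof(int));
    close(fd);
    return value;
}
```

Call it twice against the same path and the count survives across calls – the “file” is never explicitly loaded or saved.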
You would still need to constantly flush/sync the contents of the fast (volatile) ram to the slower nvram (which is slower than a ssd). This is essentially the same as any file io today.
Unless of course you’re saying that the nvram is memory mapped in the processor, which would really simplify file access but would be slower than loading the file to ram then saving it back to non volatile memory.
AFAIK in most Arm SoCs peripherals can be memory mapped (including external flash). Unless I’m missing something there is nothing particularly strange in what you’re proposing (having nv storage sharing the address space)
Edited 2011-10-29 23:44 UTC
_txf_,
“You would still need to constantly flush/sync the contents of the fast (volatile) ram to the slower nvram (which is slower than a ssd). This is essentially the same as any file io today.”
The problem with non-memory mapped NVRAM is that it needs to be explicitly transferred/synced before it can be used. Memory mapped NVRAM would be ready to use instantly. Having it memory mapped eliminates all the bottlenecks in transferring files all the way from the CPU to the south bridge. So maybe it’s a hundred cycles instead of thousands.
“Unless of course you’re saying that the nvram is memory mapped in the processor, which would really simplify file access but would be slower than loading the file to ram then saving it back to non volatile memory.”
I did say it could be cached. If the CPU can address the file directly on the bus, as though it were ram, then there is no need to send multiple IO bursts along the bus for every single file IO operation.
“Unless I’m missing something there is nothing particularly strange in what you’re proposing (having nv storage sharing the address space)”
I don’t think it’s a strange idea at all, I just wanted to say that it would be a good use of 64bit address space.
This is wrong, clearly you’re uninformed.
Often it’s better late to the party than never.
Aren’t the iPad and iPhone based on ARM CPUs?
Isn’t everyone trying to grab a share of the smartphone and tablet market?
Also, are there now more netbooks/notebooks sold to consumers than desktops to end-users?
From what I remember of Apple’s history, they have switched CPU architecture every 10 years or so. It would not be surprising to see Apple switch completely to 64-bit ARM within the next couple of years – it’s been almost a ten-year wedding to Intel’s x86.
Such a switch may bring an end to the Hackintosh era…
Mistake? Nobody will spend millions to design a chip that no one wants.
They lost nothing. ARM acts on the demands of its partners, and ARMv8 development started well ahead of such demand. 2013-2014 is a nice time for real products _with_ software.
It took ~5 years for x86 software makers to adopt 64bit.
Edited 2011-10-31 16:44 UTC
I wonder how much desktop ARM CPUs will struggle with latest Photoshop, 3DS Max or Cryengine, to name just a few resource intensive apps.
Windows 8 will come to ARM.
– Yes, but they also supported Alpha, MIPS, PowerPC and Itanium, which are dead or almost-dead platforms. So Windows coming to some architecture is not a guaranteed success. It may be, or it may not. It may be a success on tablets but a failure on desktops. Also consider that many popular Windows apps may or may not come to ARM on Windows. Most likely apps like Photoshop, 3DS Max, Maya, games, Visual Studio and other resource intensive apps won’t come to ARM.
ARM is coming to servers.
– Only to those servers that need heavy parallelization and where per-core performance isn’t important. Once you need performance, ARM is not a choice. So ARM will be present in some classes of servers but will not be the universal solution that x86 is.
Apple will switch to ARM.
– It will not, as for the foreseeable future Intel CPUs will be orders of magnitude more powerful than ARM CPUs.
ARM is better than x86
– Better in what way, specifically? They are better for phones and tablets, as until Intel’s Moorestown there won’t be an ultra-low-power x86 CPU. Are they better for desktop usage or servers? I fail to see why. Many people claim the ARM architecture is “superior”. I fail again to see why. Many people like ARM because “it’s not Intel.” “Being not Intel” isn’t a technical merit in itself.
I am not a fanboy and I like to see the right tool being used for the right job. Right now that means ARM for low-power devices such as phones and tablets, maybe netbooks and laptops, but x86 for desktops and servers.
Some people want ARM to come to desktops to see more competition in the desktop market. This is a good reason, but why not advocate for another x86 maker besides Intel and AMD to get some competition?
VIA, SIS, Transmeta, Rise Technology, Centaur Technology, National Semiconductor, Cyrix, NexGen, IBM, UMC and NEC all made x86 CPUs at some point. VIA still sells x86 CPUs, with limited success. I would love some strong competitors coming to the x86 market to boost both research & development and price cuts. AMD, on its own, seems to have trouble fighting Intel.
Or why not promote an entirely new, written-from-scratch CPU architecture aiming at desktop performance? Well, just from a philosophical point of view, as practically it’s just a utopia.
twitterfire,
I mentioned earlier that I didn’t think ARM was ready either (in terms of performance).
But I do think the x86 platform has a number of problems. The foremost concern on everyone’s minds is the tons of legacy cruft: it increases complexity and wastes silicon, but we still need to support it – things like MMX and a complex FP stack that overlaps with SSE, etc. Then there are all the various modes: Real, 286 Protected, Virtual Real, 386 Protected, 64bit Long Mode, VT extensions, etc. Each mode has its own way of handling interrupts, segment registers, faults, memory addressing, etc.
Then there are scalability concerns: the x86 cache coherency protocol is a severe bottleneck in large multiprocessor systems. Back in the 90s Intel decided to make synchronization implicit, but this means every single memory access needs to be implicitly synced with other cores/processors. This is easily manageable with two processors, but as more are added, implicit synchronization becomes a bottleneck in and of itself. Explicit synchronization would mean that the cores would not need to communicate with each other at all except when threads are signaling each other.
(although I see ARM just recently added cache coherency to their intercore protocol spec as well
http://www.eetimes.com/electronics-news/4216647/ARM-AMBA-cache-cohe… )
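A rough illustration of the implicit-vs-explicit distinction, from the programming-model side only (a sketch, not how any coherency fabric is actually wired – the function names are invented): C11’s memory-order model lets you mark per-access ordering as relaxed and pay for ordering at one explicit point, the join:

```c
#include <pthread.h>
#include <stdatomic.h>

/* Sketch of "explicit synchronization" in C11's memory model.  Each
 * increment is atomic but relaxed -- nothing requires the hardware to
 * order it against other accesses -- and the single explicit ordering
 * point is the pthread_join() at the end.  Contrast with x86, where
 * every access is implicitly kept coherent across all cores. */
static atomic_int hits;

static void *worker(void *unused)
{
    (void)unused;
    for (int i = 0; i < 100000; i++)
        atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
    return NULL;
}

int run_two_workers(void)
{
    pthread_t a, b;
    atomic_store(&hits, 0);
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);   /* the explicit sync points */
    pthread_join(b, NULL);
    return atomic_load(&hits);
}
```

The relaxed increments are still atomic, so the final count is exact; the point is that only the joins force any cross-core ordering.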
I’ve gotta go…maybe more on this later..
Most likely apps like Photoshop, 3DS Max, Maya, games, Visual Studio and other resource intensive apps won’t come to ARM.
Photoshop – Mac/68k
Maya – IRIX/MIPS
Why do you think they can’t migrate to WARM from WINTEL? =)
Edited 2011-10-31 21:33 UTC