Update: it’s official now – NVIDIA is buying ARM.
Original story:
Nvidia Corp is close to a deal to buy British chip designer Arm Holdings from SoftBank Group Corp for more than $40 billion in a deal which would create a giant in the chip industry, according to two people familiar with the matter.
A cash and stock deal for Arm could be announced as early as next week, the sources said.
That will create one hell of a giant chip company, but at the same time – what alternatives are there? ARM on its own probably won’t make it, SoftBank has no clue what to do with ARM, and any of the other major players – Apple, Amazon, Google, Microsoft – would be even worse, since they all have platforms to lock you into, and ARM would be a great asset in that struggle. At least NVIDIA just wants to sell as many chips to as many people as possible, and isn’t that interested in locking you into a platform.
That being said – who knows? Often, the downsides to deals like this don’t come out until years later. We’ll just have to wait and see.
“ARM on its own probably won’t make it,”
This engineer had never seen such bullshit.
This has nothing to do with Softbank not knowing what to do with them. Softbank invests money from multiple Saudi and Arab billionaires and sovereign wealth funds. WeWork lost $39B in value. Something had to be sold to make the Saudi princes who are really behind Softbank whole from that massive investment loss. I’ll leave the implied consequences of what would happen otherwise out of this post.
Agree wholeheartedly.
The free money (from oil, the Fed, or other places) distorts the markets and causes really terrible investment decisions. Or, in this case, selling ARM to a buyer that is not good for the company.
I also do not like the “leveraged buyouts”, where the small fish eats the larger one. The saddest one was Toys’R’Us, where an otherwise profitable company was milked dry to pay off private investors and banks.
Can you provide links to the Arab connection with Japan’s Softbank?
A simple Google search “softbank saudi arabia” gives plenty of links.
> That being said – who knows? Often, the downsides to deals like this don’t come out until years later. We’ll just have to wait and see.
You rather missed the biggest worry of this specific takeover: the potential for monopoly abuse in the mobile ARM device graphics market, where Nvidia is largely nonexistent at the moment (apart from Nintendo consoles).
emphyrio,
Yeah, I’d prefer for ARM to exist as an independent company. However we live in modern times when mergers, acquisitions, consolidation, etc are the name of the game. So the question really isn’t whether nvidia will be good for ARM, but whether nvidia will be better than the other suitors. That’s the rub, and personally I’d much rather see ARM go to a company whose business model is mostly just selling & licensing chips. Otherwise you run the risk of a company buying ARM technology for themselves and monopolizing it. Just think of what a company like apple could do as owner of ARM to hold the rest of the market hostage. I’d rather see it go to a more traditional chip company.
@Alfman Now that you mention that I’m shocked that Apple didn’t end up buying ARM.
rahim123,
It’s surprising to me too, but maybe they put in a bid and were just being secretive about it? I don’t know. $40B is a ton of money, but for apple it seems like a drop in the bucket. Maybe they signed an exceptionally long term deal with ARM holdings, but if not does that mean nvidia outmaneuvered apple? Anyone remember why apple was boycotting nvidia anyways? Haha.
Apple has an architecture license in perpetuity – Apple will always be able to make their own ARM designs, no matter what happens to ARM Holdings.
The only thing Apple would gain from buying ARM Holdings is if Apple wanted to operate ARM as-is, because literally anything else would run afoul of antitrust laws.
Drumhellar,
A “perpetual license” is unclear on its own. Yes, on the face of it it lasts forever; what’s not clear or implied by that is what the license actually covers. For example, I may have a perpetual license to use windows, but my perpetual license is bound to a specific version of windows; I may need a new license for future versions.
So, can anyone point me to what exactly is covered by this “perpetual license”? I’m not trying to be dense, I’m just trying to be informed.
Yeah, antitrust risks may have played a part.
Alfman,
ARM seems pretty tight-lipped about their architecture licenses. I found an announcement from 5 years ago that ARM had given a 64-bit architecture license to a 7th company, but wouldn’t say who (it was generally accepted that it was one of the 7 that own a 32-bit architecture license). Some suspected it was Intel, but ARM wouldn’t reveal it (Apple already had a 64-bit license).
Looking at articles contemporary to Apple’s acquisition of an ARM license in 2008 (they sold most of their original stake back in the ’90s), it took a couple of months before it was more than just suspected that the new licensee on the block was Apple.
No need for Apple to get involved, they have a perpetual license to the instruction set and that’s all they need to build their own custom design chips, which incidentally are better than anybody else’s designs. It’s the rest of the industry that are dependent on off the shelf ARM chip designs that should be worried.
The ISA is an evolving matter. It is not static. Apple may have a perpetual v8 license, but what about v9?
> Apple didn’t end up buying ARM.
Government (*some* government, if not US then EU) would probably have given them hell on a platter.
Agreed, people are worried about a corrupt few, but this is very bad news if it happens to the entire market, a market that dwarfs the wealth of any individual or sector.
>> That being said – who knows? Often, the downsides to deals like this don’t come out until years later.
—
Not years. Months, or even weeks.
I have zero doubt in my mind this is about hurting two rivals.
Apple, currently using AMD graphics. With the switch to ARM in full swing, all Nvidia needs to do to get back at Apple is make a tiny, tiny tweak to the licensing of ARM and wham!
—
AMD… as holder of the keys, Nvidia can make a minor change to ARM, swap out the licensing for it with something less open, and charge AMD, like Apple, to implement new features. AMD makes multiple RISC processors and ASICs, including ARM variations.
Nice game, I see: while everyone is slapping Apple around for enforcing their store policies and protecting their users, big government somehow lets this go through.
There’s a tiny itty bitty chance I’m wrong, but I’ve been in this industry long enough to tell the difference between a prize horse and a mule! This has every sign of being a disaster for all but Nvidia.
nvidia is on par with apple and microsoft in its worst years.
Sounds like a sweet deal for MIPS and RISC-V.
On the other side:
Kicking nvidia out of Macs after bumpgate.
Transitioning from Intel to ARM.
And then being forced to buy nvidia again…
Sucks to be Apple…
Apple was one of the founders of ARM when it was spun off from Acorn. I’m pretty sure their licence is ironclad.
That can be true for the 32-bit ISA, but I doubt that’s the case for the 64-bit one, which is a totally different beast and is the one Apple uses now.
Apple only licenses the ISA.
There’s a shitload of regulations that NVIDIA won’t be able to just hurdle through; a lot of people think that “making an offer” is as good as “NVIDIA’s purchase of ARM is a done deal.”
Current license yes. Any minor change Nvidia makes could use a different licensing type.
> Sounds like a sweet deal for MIPS
Not so fast. MIPS isn’t in a good place, ownership-wise, or legally:
Is MIPS dead?: https://www.cnx-software.com/2020/04/22/is-mips-dead-lawsuit-bankruptcy-maintainers-leaving-and-more/
Loose lips sink MIPS: https://www.eejournal.com/article/loose-lips-sink-mips/
MIPS is dead, and RISC-V will be mainly in the IoT space.
From ARM, Apple only licensed the ISA; they have their own microarchitecture and own GPUs. Apple’s CPUs and GPUs are significantly better than anything ARM (or NVIDIA for that matter) have to offer in the SoC space Apple is going for.
RISC-V is already moving well out of the IoT space… RISC-V server cores already exist that are competitive with last-gen ARM ones, and if big companies basically slap a RISC-V decoder on their current designs you get a RISC-V that performs as fast as anything else, with good code density.
RISC-V is pretty much what you would make with a clean-slate ARM ISA… and SiFive’s IP also has the ability to group together and mix and match any of their cores. So, just like ARM, you can do things like big cores + smallish background-task core + management core.
Who is doing RISC-V server cores?
https://www.westerndigital.com/company/innovations/risc-v
If you want to know who is working their way toward server RISC-V, it’s none other than Western Digital. They are already well along toward being able to build huge mother clusters.
https://chipsalliance.org/ is also an interesting group.
Those seem to be for embedded systems (which would make sense, given Western Digital’s intended application: their own products).
cb88,
I’m also curious. Do you know of a vendor that is selling finished products?
I haven’t been watching it too closely, but I haven’t seen anything even remotely competitive to ARM yet.
Take this for example, a mediocre processor with a high price tag.
https://hackaday.com/2018/02/03/sifive-introduces-risc-v-linux-capable-multicore-processor/
https://hackaday.com/2019/02/11/building-a-risc-v-desktop/
I would be very interested if you could point me to something that’s both competitive and available for consumer/turnkey applications (i.e. I am not interested in buying the IP or going through the process of fabricating them).
The ARM and RISC-V ISAs are quite different; RISC-V is more similar to the MIPS ISA (they came from the same designer, I think). In fact, slapping a RISC-V decoder on wouldn’t be so simple. Some differences, for example: RISC-V (like MIPS) doesn’t have status flags, but branch instructions that compare registers and branch directly somewhere else; also, on RISC-V all the instructions are very regular and follow the same logic wherever possible, which is also one of the constraints of the RISC-V ISA: you can use it freely, but if you want to add your own modules you need to follow the spirit and the letter of the core ISA. ARM, for whatever reason, lets their licensees do anything with their add-ons, and it can be bad…
edit: In a way, RISC-V (and MIPS) are a bit like “functional programming in assembly” in the sense that they don’t manage global state (status flags), which is a pain to manage in out-of-order execution…
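To make the flags difference concrete, here is a trivial C function; the assembly in the comments is roughly what a compiler tends to emit for each ISA (a hand-written sketch, not actual compiler output):

#include <stdio.h>

/* Roughly how min_int is lowered on each ISA (sketch, not real compiler output):
 *
 *   AArch64 (condition flags):          RV64 (no condition flags):
 *     cmp  w0, w1          // sets NZCV   blt  a0, a1, 1f  // compare regs, branch
 *     csel w0, w0, w1, lt                 mv   a0, a1
 *     ret                               1: ret
 *
 * ARM's compare writes global flag state that a later instruction consumes;
 * a RISC-V branch compares two registers directly, so there is no implicit
 * flag register for an out-of-order core to rename and track.
 */
static int min_int(int a, int b) { return (a < b) ? a : b; }

int main(void) {
    printf("%d\n", min_int(3, 7)); /* prints 3 */
    return 0;
}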
RISC-V does have a status register and branch instructions that check status flags. Also not sure what you mean by “functional programming in assembly” – everything in RISC-V is mutable by default, so the same caveats (locking, cache coherency) apply as in ARM or x86/x64.
“Kicking nvidia out of Macs after bumpgate.
Transitioning from Intel to ARM.
And then being forced to buy nvidia again…
Sucks to be Apple…”
Actually Apple is the player in the best position in relation to this deal, they have a perpetual license to the instruction set and that’s all they need to build their own custom design chips, which incidentally are better than anybody else’s designs. It’s the rest of the industry that are dependent on off the shelf ARM chip designs that should be worried. Apple can just watch what unfolds and get on building their new generation of ARM Macs, and iterating their chip designs to stay well ahead of the competition. Personally I am very excited about the design innovation that ARM will unleash on the Mac, especially a generation or two of silicon down the line (which in Apple’s case is just a couple of years away). Good times to be a Mac user!
Strossen,
Is there any proof that apple’s designs are superior to anybody else’s designs, or is it actually just that TSMC have better fabs than anybody else? You may not care about the distinction, but it makes a difference in terms of the accuracy of claims that “apple has better designs”. I don’t have evidence either way, but if you do I’ll ask you to submit it.
How do you know? Is their license public? I imagine their agreement covers existing ARM technology, but unless you’ve seen an agreement that says otherwise, it’s not automatically a given that it covers all future ARM technology including after a change of ownership. If nvidia adds new innovations to ARM, I’m not so sure that old ARM licenses will cover these new innovations. Again, I make no pretense that I’m privy to the details, but do you have any evidence for the claims?
Let’s say for the sake of argument the license does give apple access to all ARM technology past and future (note I’m not willing to accept this is true without evidence, but hypothetically…). Even then, it doesn’t necessarily mean nvidia couldn’t add new modules under a different trademark that isn’t covered by the license. “ARM cores with super-scale technology™” Apple: “we want that” Nvidia: “sure, sign here for a super-scale license”. Again, all fictitious, but I’m just making a broader point about making assumptions. The details do matter, especially when it comes to definitive conclusions about what apple can do under its license agreement.
Ultimately, it is better to have clear sources when making claims, agreed?
Apple’s custom cores are very very good, basically close to X86 in terms of performance per cycle.
On the ARM ecosystem, they have the best microarchitecture.
Apple as of now, has a bigger design group than ARM, so they’re going to be OK. Their main connection with ARM was on ISA/ABI ecosystem.
javiercero1,
I think that is reasonable, however Strossen’s claim went further than that and needs to be backed with some sort of evidence IMHO.
What are you basing that on?
Well, it makes for interesting speculation. If apple & nvidia refuse to be bedfellows going forward, would apple consider forking its ARM chips and breaking ISA/ABI compatibility with the rest of the industry? There’s no doubt that apple is big enough to do it, though I don’t know whether its license allows it. And if they did, how would the industry react to such a branch?
“ARM on its own probably won’t make it,”
ARM was coping OK before SoftBank took advantage of a falling pound, and they have no shortage of customers for the designs.
The real problem is that SoftBank needs to get paid, and there is a shortage of entities that can afford and want to pay the price.
“At least NVIDIA just wants to sell as many chips to as many people as possible, and isn’t that interested in locking you into a platform.”
Actually, that could be a problem. What has been important about ARM historically is that they haven’t been selling chips, but chip designs. Which means Nvidia, Samsung, Qualcomm, Apple, etc. can design their own SoCs, combining ARM cores and GPUs as they see fit.
Nvidia has skin in the game to want to only sell ARM SoCs that include an Nvidia GPU. To that extent, Microsoft, Amazon and even Google might be better, as they don’t have a reason to limit the choice of GPUs or push their own.
But any way you cut it, there is potential for there to be problems between the possible new owners and at least one of the current design licensees.
grahamtriggs,
Microsoft and google are quite dependent on partners to manufacture windows and android/chromebook devices. It’s not completely balanced, but at least it’s not completely imbalanced either. However if you transfer ARM to either of them then it significantly shifts the balance of power. Being in control over both hardware and software, google or microsoft would have a lot more power to strong arm their partners, which is bad. But at least their business model isn’t like apple’s, who might have the ability and inclination to withhold the technology from their direct competitors.
It’s why I’d prefer for ARM to go to a hardware company that isn’t tied to a platform.
I don’t see how ARM can remain a neutral party if it’s part of Nvidia, so I’m inclined to agree with the narrative that this will set the company, its technologies and the smartphone industry on a completely new course. Maybe this will precipitate a more rapid move to RISC-V, maybe Intel will have another shot at the smartphone market, or maybe Huawei has an alternative up its sleeve.
Chris Williams from the Register puts it like this:
https://twitter.com/diodesign/status/1304881920405241856
[edited to remove duplication]
flypig,
That’s the thing though, there’s no guarantee that any new owner would be neutral.
I think that would be good, but realistically it would take years to ramp up a new ISA competitor. The general trend is consolidation; it would be a risky and expensive venture with no certainty of ROI, especially on an open architecture like RISC-V with no royalties.
Alfman
That is the problem identified, it’s the never ending consolidation of smaller entities that is the real issue.
Long term, I’m not confident at all that this purchase will lead to better outcomes or lower cost processors. Low cost is not really Nvidia’s gig!
cpcf,
Indeed, I’m not at all confident that this is better for consumers either. I’d rather ARM remained independent, but then that’s not how our world works
Competition in a free market is determined by supply and demand rather than individual players. However the gotcha for free markets is that corporations have become so enormous that there’s very little competition left. All of this consolidation has resulted in oligopolies/duopolies at the top. And while many of us can see where this is taking us, not many of us have a hand on the steering wheel.
Yes, true, but different owners would be more or less neutral. I’m not a great follower of company activities (it’s the technology I find interesting), so I don’t have much insight, but simply put: Nvidia is an ARM licensee, whereas Softbank isn’t. That changes the dynamic.
Yes, I totally agree with all you say. The shift from computers to phones drove the transition from Intel to ARM, and everyone now thinks the prize is the de-facto architecture for IoT (Softbank) or AI (Nvidia). I’m not entirely convinced either of those will drive the next big architecture transition, but something will.
A move to RISC-V? Nowadays only Apple, Qualcomm and Samsung design their own cores. The rest, like Mediatek, license designs from ARM.
Only Apple has a fully custom core. Qualcomm and Samsung use “semi custom” cores, which are based on ARM’s designs and improved further.
This is going to be a regulatory shitshow.
A full move to RISC-V will take some 10 years at least.
Remember a time when ARM was mostly being used for embedded? That’s where RISC-V is currently at. Hardly even started.
Lennie, this is not simple.
–A full move to RISC-V will take some 10 years at least.–
The RISC-V foundation says 5 years.
–Remember a time when ARM was mostly being used for embedded? That’s where RISC-V is currently at. Hardly even started.–
Except when you look at the clustering work Western Digital is doing for RISC-V, it’s what happened with ARM 7 years ago, as in HP Moonshot in 2013. We are already seeing the RISC-V equivalent of ARM’s HP Moonshot, done by Western Digital, now. So is RISC-V only being worked on for embedded usage at the moment? The answer is no. Judging by what they are releasing, parties like Western Digital are absolutely working on RISC-V for very large cluster servers.
Also, when HP Moonshot was done, the Linux distribution support for the ARM platform was way worse than the current RISC-V distribution support.
Something to be highly aware of is that the RISC-V form of clustering, a direct link to the L2 cache, is not in fact possible on Intel or ARM CPU cores. So it’s going to be interesting to see how this tech plays out.
The RISC-V foundation’s 5-years-from-now forecast is in the range of the possible. I would say 5-15 years, so 5 at the least, not 10. RISC-V may not be as far off as it first appears, particularly once you start comparing RISC-V R&D with historic ARM R&D. In a lot of ways RISC-V is moving through the R&D faster; this is kind of to be expected, as they get to learn from ARM’s and Intel’s missteps.
> Except when you look at the clustering work Western Digital is doing for RISC-V, it’s what happened with ARM 7 years ago, as in HP Moonshot in 2013.
Where do I look?
All I could find is information for their RISC-V cores (like “SweRV Core EH2”), where they’re obviously intended for embedded use, and where the product brief itself says things like “This core was designed to target datapath controller applications for NAND flash.”.
Are you sure you didn’t get confused? E.g. see some chips being used for arranging data transfers in large storage area networks (where performance is irrelevant because other hardware does the data transfers) and falsely assume these chips are being used for high performance compute?
–Are you sure you didn’t get confused? E.g. see some chips being used for arranging data transfers in large storage area networks (where performance is irrelevant because other hardware does the data transfers) and falsely assume these chips are being used for high performance compute?–
You missed something
https://blog.westerndigital.com/omnixtend-fabric-innovation-with-risc-v/
Western Digital’s Uncore prototype (the OmniXtend proto CPU) does not use SweRV cores; instead it uses four 64-bit U74 SiFive RISC-V cores and one 64-bit S71 SiFive RISC-V core, as detailed last year here:
https://www.opennetworking.org/wp-content/uploads/2019/09/5.30pm-Richard-New-FINAL-1.pdf
The Uncore work here states the target is 8 cores per chip. Also, a 2-socket setup of Uncore chips can join up with each other without a chipset.
It’s interesting that this design makes the chipset a network switch chip. Also take close note of the setup photo on the omnixtend-fabric-innovation-with-risc-v page: only one module has an SD flash. That’s right, both modules are in fact booting from that single storage device; you are looking at one horribly messy single computer.
The proposed final Uncore design in the PDF has PCIe 5.0 with OpenCAPI, 100 Gbps or better networking running OmniXtend, and at least 8 of the current best SiFive cores. These are per-chip features. This kind of design does not make sense for anything other than compute.
High-performance compute, that’s a question. Really, the ARM Moonshot was not for high-performance compute either, but for mass-volume compute. The Uncore design may at first be better for massive-volume compute. Same issue as clusters.
OmniXtend, for what it is, aligns with HP Moonshot in ARM’s history. Yes, you were not aware that OmniXtend has its own core/silicon design.
Ah, OK. Western Digital are trying to promote their own fabric (so they can sell more storage for their fabric); couldn’t care less what the CPU is (as long as it helps them sell their fabric and their storage), doesn’t care if it’s a “heterogeneous” mixture of very different CPUs (e.g. a mixture of Risc-v and ARM and 80×86) as long as it helps them sell their fabric and their storage; and chose a random CPU from some other company because they don’t care what the CPU is; and somehow you think “not caring what the CPU is at all” implies “Risc-v is going to conquer the world”?
One thing we learnt from ARM is that (for desktop and server) the ISA/instruction set is irrelevant on its own, and needs to be part of a standard platform. For better or worse, ARM (ThunderX, etc) mostly just stole everything from PCs and adopted “UEFI + ACPI + PCI” as the platform so they could become (barely) relevant beyond embedded systems. Risc-v hasn’t reached this point yet (they’re mostly just CPU designers with no clue about creating a viable platform) and can’t be taken seriously until they do. That is what I’m looking for – evidence that Risc-V is becoming more than just an instruction set.
Brendan, really, you are trying to make points and completely failing.
–Western Digital are trying to promote their own fabric (so they can sell more storage for their fabric); couldn’t care less what the CPU is (as long as it helps them sell their fabric and their storage), doesn’t care if it’s a “heterogeneous” mixture of very different CPUs (e.g. a mixture of Risc-v and ARM and 80×86) as long as it helps them sell their fabric and their storage; and chose a random CPU from some other company because they don’t care what the CPU is; and somehow you think “not caring what the CPU is at all” implies “Risc-v is going to conquer the world”?–
One question: how are items not designed for the OmniXtend fabric going to connect to the OmniXtend fabric? Do notice the Uncore does have PCIe support. Also, Moonshot, which started off ARM server work in a big way, also did not care that it was a heterogeneous mix of x86 and ARM (to be correct, x86 Atoms and 32-bit EnergyCore ARM). Early x86 cluster solutions did not care about being x86-only either. There is a repeating history of a new competitor entering the large server market starting off with a fabric/cluster solution, with the solution being heterogeneous so that legacy can be slowly phased out.
Also, you have never bothered checking out what the Western Digital OmniXtend project in fact covers; it’s more than just the fabric.
–For better or worse, ARM (ThunderX, etc) mostly just stole everything from PCs and adopted “UEFI + ACPI + PCI” as the platform so they could become (barely) relevant beyond embedded systems.–
https://riscv.org//wp-content/uploads/2019/06/13.30-RISCV_OpenSBI_Deep_Dive_v5.pdf
Western Digital is already past that point, and they are not going the route of just stealing UEFI+ACPI+PCI. Yes, they already have Uboot UEFI.
–Risc-v hasn’t reached this point yet (they’re mostly just CPU designers with no clue about creating a viable platform) and can’t be taken seriously until they do. That is what I’m looking for – evidence that Risc-V is becoming more than just an instruction set.–
Western Digital is working on a complete top-to-bottom solution and is working well past RISC-V being just an instruction set. Western Digital is interested in having complete Linux systems, top to bottom, all RISC-V.
https://www.westerndigital.com/company/innovations/risc-v
Go down to “Open Source RISC-V Software”, mouse over the image in that section, and notice the alt text/title.
“OmniXtend Implementation Risc-V WesternDigital”
Western Digital, under OmniXtend, is serious about making a fully functional RISC-V platform. Western Digital has already put down more key parts than ARM (ThunderX, etc.) had at the time of Moonshot.
Using a bridge (e.g. from AMD’s hyper-transport; or Intel’s Omnipath, Quickpath or DMI; or Infiniband or PCI or Thunderbolt), or with whatever any designer of any chip felt like adding to their chip.
There’s significantly more history of “moonshot” projects failing to become viable products.
LOL. “Uboot + UEFI ported from EDK2” just means that they scraped up some open source stuff because they have nothing, couldn’t be bothered designing any specifications themselves, and couldn’t be bothered writing any software themselves. Note that Uboot is a boot loader typically used by embedded systems, and EDK2 originated as “Intel abandonware” about 20 years ago.
Don’t you think it’s strange that this is coming from a single vendor (Western Digital); and not coming from RISC-V International as an official standard for all vendors?
Erm, no?
Western Digital are a company that provides storage devices (mostly old mechanical “rotating disk” hard drives, but also enterprise scale storage); that started worrying about how they’re going to survive when everyone is switching to SSD, panicked, and have been grasping at straws (cloud? TV? IoT?) for the last 5 years or so (while trying to buy out every SSD company they can get their hands on). The only thing they’re serious about is throwing mud at the wall to see if anything sticks. Underneath their “risc-v marketing hype” is a layer of wishful thinking floating on the vapor of a long-term pipe dream.
–Using a bridge (e.g. from AMD’s hyper-transport; or Intel’s Omnipath, Quickpath or DMI; or Infiniband or PCI or Thunderbolt), or with whatever any designer of any chip felt like adding to their chip.–
To interface with AMD HyperTransport or Intel Omnipath or DMI or InfiniBand you have to pay for those patents, and with the USA restricting tech, those may not be an option at all.
The reality is a lot of fabrics are not usable by third parties. PCIe is, but it is not designed as an L2-cache-level interface.
–There’s significantly more history of “moonshot” projects failing to become viable products.–
This is true, but you cannot get to large server class without a fabric, and it’s a particular class of fabric: an L2 fabric. Intel and AMD are not going to be sharing their L2 fabrics. The original network-based x86 Linux computer clusters disappeared as well over time. But these failures started the development that led to viable products later.
–LOL. “Uboot + UEFI ported from EDK2” just means that scraped up some open source stuff because they have nothing, couldn’t be bothered designing any specifications themselves, and couldn’t be bothered writing any software themselves. Note that Uboot is a boot loader typically used by embedded systems, and EDK2 originated as “Intel abandonware” about 20 years ago.–
Uboot is also used in ARM-based servers as the common booting solution. “Typically used in embedded systems” misses that over 80% of ARM servers in production also use Uboot. Exactly, why completely redesign stuff from scratch when you don’t have to? Redoing everything from scratch would add many years before RISC-V could be production ready.
https://riscv.org//wp-content/uploads/2019/12/Summit_bootflow.pdf
The reality here is that OpenSBI had to be written from scratch for RISC-V, and it replaces ATF BL31 on ARM. Yes, there are Intel and AMD equivalents as well.
–Don’t you think it’s strange that this is coming from a single vendor (Western Digital); and not coming from RISC-V International as an official standard for all vendors?–
Not really, considering the work from Western Digital is being placed in the RISC-V foundation when it comes to the platform.
OpenSBI, with Western Digital as lead, is hosted by the RISC-V foundation and is an agreed-on solution. So what Western Digital is making does end up coming from RISC-V International as an official standard.
Western Digital is one of the core members of the RISC-V foundation. They are working on the platform. Other parties are working on making faster RISC-V CPUs.
–Underneath their “risc-v marketing hype” is a layer of wishful thinking floating on the vapor of a long-term pipe dream.–
You missed something: it’s not marketing hype when Western Digital is releasing completed parts to make the platform and then getting RISC-V International members to sign off on them. There is not an unlimited number of parts left to complete to have a production-usable platform.
It might be “less of an option” for some people; but the extra cost of licencing/patent fees (if there is any) is likely to be negligible compared to R&D, validation, marketing, etc; and doesn’t apply at all if it’s the same company (e.g. Intel wouldn’t care about paying Intel). Also; if you’re doing something that is going to improve a company’s sales/profit (e.g. you’re creating a bridge so that they can sell more CPUs) they’ll probably be willing to negotiate a “zero cost” licence anyway.
The reality is that the idea of “main memory that’s shared by many CPUs” is extremely idiotic because it means a huge amount of traffic from everywhere has to be handled by a single controller that becomes a performance disaster/bottleneck. This is why every larger system has shifted to NUMA for main memory (despite also having L3, L2 and L1 caches to reduce the traffic to/from main memory).
Western Digital’s worthless marketing nonsense does say “common memory”; but I’d interpret that as “common pool of non-volatile memory for storage”, because (even though Western Digital have no experience in compute whatsoever) it’s too hard to believe that they’re stupid enough to mean “common main RAM” (and far more plausible that a company that specializes in storage are talking about storage).
Uboot was a common booting solution for ARM embedded systems; and because of that some ARM servers still offer Uboot as an alternative to UEFI. Most seem to be using proprietary UEFI firmware from AMI.
I don’t know how you define “ARM servers in production”; and given that it’s almost impossible to find an ARM server used in production outside of “large enough to do it all themselves and ignore all standards” companies (like Amazon) it’s difficult not to be skeptical of the “80%” claim’s relevance.
OpenSBI is an open source implementation of the Risc-v Foundation’s Supervisor Binary Interface specification; which is an interface used for hypervisors/virtual machines and isn’t relevant for real hardware.
Western Digital is not releasing completed server parts.
They’re releasing parts for embedded systems (e.g. their Risc-V CPUs); and working on a fabric that might end up being used in their own “enterprise scale storage” products if it’s ever used at all (which probably would end up using embedded Risc-V CPUs to control/arrange data transfers as part of their “software defined storage” features).
Note: For some historical context; about 10 years ago HP started researching a “global memory shared by many computers over (optical fiber) fabric” architecture that they called “The Machine”. They were full of optimism (hype and bullshit) for about 10 years, then quietly scampered away to hide in a corner never to be heard from again. I got a major “Deja Vu” feeling when reading (parts of) Western Digital’s OmniXtend marketing hype.
–It might be “less of an option” for some people; but the extra cost of licencing/patent fees (if there is any) is likely to be negligible compared to R&D, validation, marketing, etc; and doesn’t apply at all if it’s the same company (e.g. Intel wouldn’t care about paying Intel). Also; if you’re doing something that is going to improve a company’s sales/profit (e.g. you’re creating a bridge so that they can sell more CPUs) they’ll probably be willing to negotiate a “zero cost” licence anyway.–
Nvidia and many other parties have tried to negotiate for access to the Intel and AMD low-level fabrics. This is what led to OpenCAPI on POWER, and it is still not supported by AMD and Intel. Both AMD and Intel treat their own low-level fabrics as something to be heavily protected; they cannot have the other one using the improvements they have made to their fabrics. This is an area where you cannot just buy from AMD or Intel, as you will get zero cooperation.
–The reality is that the idea of “main memory that’s shared by many CPUs” is extremely idiotic because it means a huge amount of traffic from everywhere has to be handled by a single controller that becomes a performance disaster/bottleneck. This is why every larger system has shifted to NUMA for main memory (despite also having L3, L2 and L1 caches to reduce the traffic to/from main memory).–
Exactly where does this design say things are handled by a single controller? OmniXtend is based on TileLink, and TileLink avoids that single-controller bottleneck. OmniXtend is built around the same idea as TileLink: multiple controllers.
Even a NUMA system needs a fabric to share memory between many CPUs. NUMA is an optimization to reduce fabric usage, but it does not remove the need for a good fabric. You have not considered atomic memory operations between CPUs either; yes, even a NUMA system needs these. Do you want the fabric for atomic memory operations connected to L3, or do you want it connected to L2 (L1 is out, of course)?
Intel and AMD fabrics today in x86 NUMA systems are level-2 cache fabrics, not level-3. Please note that being an L2 fabric does not mean you lose the L3 cache. An L2 fabric catches the L2 miss, then checks whether it is also going to miss L3; if it is going to miss L3, it starts the request out to the fabric early. And for atomic operations between CPUs you don’t really want to have to travel to L3 before you can perform a memory atomic operation. This early request to the fabric is important because the fabric is not exactly fast, so you have latency to start off with.
https://www.anandtech.com/show/14525/amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome/11
The AMD one is noted here: L3 is being filled by L2 victim requests, not by L3 misses. If your fabric is at L3, it works on L3 misses. So to have an Intel- or AMD-competitive fabric, it has to be connected to the L2 miss, i.e. an L2-level fabric; same with the atomic operations, they are not at L3, and you need to spread atomic operations between cores.
–it’s too hard to believe that they’re stupid enough to mean “common main RAM” (and far more plausible that a company that specializes in storage are talking about storage).–
The functional prototype of Western Digital’s OmniXtend is common cache. OmniXtend is an extension of TileLink, and yes, TileLink is purely about CPU caches. So just because Western Digital is a storage company does not mean everything they design is storage.
–Uboot was a common booting solution for ARM embedded systems; and because of that some ARM servers still offer Uboot as an alternative to UEFI. Most seem to be using proprietary UEFI firmware from AMI.–
Uboot is also used as the common way of doing UEFI on ARM. So proprietary UEFI is not that common.
–OpenSBI is an open source implementation of the Risc-v Foundation’s Supervisor Binary Interface specification; which is an interface used for hypervisors/virtual machines and isn’t relevant for real hardware.–
https://github.com/riscv/opensbi
1. A platform-specific firmware running in M-mode and a bootloader, a hypervisor or a general-purpose OS executing in S-mode or HS-mode.
2. A hypervisor running in HS-mode and a bootloader or a general-purpose OS executing in VS-mode.
That is, OpenSBI is not for a hypervisor or virtual machines. Really, it’s like saying a BIOS firmware for x86 is for hypervisors because hypervisors interface with the BIOS at start-up. M-mode is not the mode hypervisors or OSes run in; M-mode is machine mode on RISC-V. S-mode is what a general OS runs in, and HS-mode is what a hypervisor runs in.
Other parts Western Digital is working on are for hypervisors/virtual machines.
–Note: For some historical context; about 10 years ago HP started researching a “global memory shared by many computers over (optical fiber) fabric” architecture that they called “The Machine”. They were full of optimism (hype and bullshit) for about 10 years, then quietly scampered away to hide in a corner never to be heard from again. I got a major “Deja Vu” feeling when reading (parts of) Western Digital’s OmniXtend marketing hype–
HP’s The Machine was an operational prototype 3 years ago, in 2017. The fabric there is doomed; it’s not even a level-3 fabric. With The Machine, in 2015 they dropped the fiber and dropped the massive processing side of the platform because they could not work out how to do it. Yes, HP started with a lot of optimism and unproven tech and ran into a huge stack of IP walls. HP had the idea that they would demo the concept and then ARM, Intel and AMD would come in. HP had not been in the custom silicon game in any major way since before they started The Machine project. This is a major difference with Western Digital: for all their hard drives, they have been doing custom silicon chips.
The functional RISC-V TileLink that OmniXtend is based on is where it gets interesting. TileLink has a lot of the same objectives as The Machine had at the start, so you should see some similarities, of course without the massive wishful thinking on tech. TileLink is designed to avoid ARM’s cache sync issues. There have been many functional TileLink RISC-V chips produced. OmniXtend is about taking what the TileLink solution achieved on silicon and allowing it to extend between chips.
https://www.globenewswire.com/news-release/2018/09/24/1574672/0/en/Barefoot-Networks-Teams-Up-with-Western-Digital-and-Universit%C3%A0-della-Svizzera-italiana-to-Showcase-In-Network-Consensus-for-Use-with-Storage-Class-Memory-and-P4.html
You are partly right that Western Digital, being a storage company, would have started with storage. Yes, the early prototype was not L2-cache-integrated; it was more like how to do RAM-based swap over the network.
HP never made a functional prototype of their optical idea with many cores on shared main memory storage. Western Digital has a functional prototype of OmniXtend on FPGA, and the stuff to program the FPGA is on GitHub. So if you have the FPGAs you can download it and try it out. Western Digital is not marketing hype alone, as they have really provided a functional prototype, so it’s now at the prototype-to-production stage.
Of course there is no point taking a platform into production without software to run on it. So Western Digital has crossed the first major barriers.
oiaohm,
Brendan is right, shared memory is a fundamental bottleneck. You can only scale so much using shared resources, beyond which you’re just chasing marginal returns for exponential effort and cost. We need to change our ways and stop this pretense that memory ought to be sharable between nodes of arbitrarily numerous cores. Of course we can and should tell software developers to deliberately avoid shared resources in order to improve scalability, but then if we do that in earnest, then what’s the point in putting more and more resources into the fabric to provide shared resources to huge numbers of cores? Especially now that we’re looking at CPUs with hundreds of cores on the horizon, the technological and financial costs of such fabric are too high. The hardware should encourage more scalable designs, which fundamentally requires locality.
IMHO a lot of this investment should be placed into low latency network technology instead without insisting on shared memory designs that can’t scale. I’m sure you’ll disagree with my opinion, and that’s fine, we can just agree to disagree.
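To illustrate what I mean by shared versus local resources, here’s a toy C sketch (pthreads; the thread and iteration counts are arbitrary, and you’d want to time it yourself, ideally built with -O0 so the local loop isn’t folded away):

/* build: cc -O0 -pthread contention.c
 * Each thread bumps a counter ITERS times, either hammering one shared
 * atomic (all cores fight over the same cache line) or using a private
 * local counter that is merged once at the end. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NTHREADS 8
#define ITERS    10000000L

static atomic_long shared_counter;

static void *bump_shared(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++)
        atomic_fetch_add(&shared_counter, 1);   /* cross-core traffic on every add */
    return NULL;
}

static void *bump_local(void *arg) {
    (void)arg;
    long local = 0;
    for (long i = 0; i < ITERS; i++)
        local++;                                /* stays in one core's cache */
    atomic_fetch_add(&shared_counter, local);   /* one shared update per thread */
    return NULL;
}

static void run(void *(*fn)(void *), const char *label) {
    pthread_t t[NTHREADS];
    atomic_store(&shared_counter, 0);
    for (int i = 0; i < NTHREADS; i++) pthread_create(&t[i], NULL, fn, NULL);
    for (int i = 0; i < NTHREADS; i++) pthread_join(t[i], NULL);
    printf("%s: total=%ld\n", label, atomic_load(&shared_counter));
}

int main(void) {
    run(bump_shared, "shared atomic");  /* typically far slower under contention */
    run(bump_local,  "local + merge");
    return 0;
}

Both variants compute the same total; the difference is how much of the work has to cross between cores, which is the whole point about locality.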
That’s the Apple curse right there: The moment Apple commits to a new architecture it somehow becomes poisoned. Fortunately they have a backup plan, which is to keep Intel.
Funny thing is that Apple jumped into the Intel wagon the moment the Core 2 Duo processors got released and AMD started faltering, thus giving Intel a near-monopoly, especially in the all-important laptop segment, and now they are exiting Intel just when competition is about to heat up for good.
Yeah, x86 really went down the drain in 2006, oh wait…
x86 didn’t go down the drain in 2006, but it became much more expensive thanks to Intel’s soon-to-be dominance.
“That’s the Apple curse right there: The moment Apple commits to a new architecture it somehow becomes poisoned. Fortunately they have a backup plan, which is to keep Intel”
That comment unfortunately bears no relation to reality. Apple has a perpetual license to the ARM instruction set and designs its own custom silicon, and is thus unaffected by this deal except in so far as it might make life difficult for Apple’s competitors. Because Apple not only designs its own silicon but does so producing competition-beating performance on an annual iteration schedule, chip-wise things have never looked better for Apple. Once they really hit their stride in desktop ARM chip and system design they will leave Intel in the dust. Every time my Mac’s fans come on while rendering 4K video I yearn for a nice quiet cool ARM system. Not long to wait now.
Strossen,
That comment unfortunately bears no relation to reality. Apple has a perpetual license to the ARM instruction set and designs its own custom silicon, and is thus unaffected by this deal except in so far as it might make life difficult for Apple’s competitors.
Out of curiosity, have details about apple’s ARM agreement been made public or is this an assumption? Can you link to it?
I’ve been watching the benchmarks; while single-threaded is great, the multithreaded performance has been rather poor so far, at least compared to other high-end CPUs. This may change, but I find it’s best to wait for the data before making conclusions.
Well, Apple’s current SoCs only have 2 big cores, so they’re not going to fare well against 6+ core parts.
But the single-thread performance is there, and MT scaling is pretty straightforward with increases in # of cores these days.
I’m really curious to see what will come from the Chinese ARM IP licensees like Rockchip, Huawei and Allwinner.
For the rest, nothing will change. NVIDIA is not that stupid to mess with a team that is winning.
They are very aware that if Apple really pulls off the trick of matching x86-64 performance, a whole new market will appear, and they will fulfill their ages-old dream of manufacturing a desktop CPU, since Intel always refused to license x86 IP to them, and even pushed Nvidia’s x86 chipsets out of the market with litigation after litigation.
Seems very risky.
Why would anyone buy a company? Three reasons come to mind:
a) For future earning potential
b) To gain knowhow and a working team
c) To gain customers
nVidia is doing fine on its own, and even though ARM is profitable, I am not sure ARM is more profitable than nVidia, so (a) is debatable at best.
nVidia already had chip development knowhow, and targets many markets. And if they wanted to acquire teams, buying out several small startups would be a better idea. So (b) is definitely out.
So that leaves (c) as the likely cause. They would very much benefit from cross-sales to existing ARM licensees. They already have an ARM SoC (Tegra) and could easily improve sales by replacing the GPUs other vendors pair with their ARM cores. That is what makes the entire deal dangerous. It could easily lead to a mono-culture on the mobile GPU front.
The thing is that in the mobile space, NVIDIA’s GPUs kind of suck compared to Apple and Qualcomm.
I think NVIDIA just wants a nice IP business, and they will not mess with a good thing.
But their big push is in automobile and data center, so they may want to have a full CPU/GPU stack for there. If anything the one vendor that should be concerned is intel or AMD, as that would allow NVIDIA to ditch their dependence on x86 systems for their data center products.
javiercero1,
I don’t do much in the mobile space, but on the PC side I don’t have many complaints about the hardware. On the other hand I absolutely hate nvidia’s proprietary driver situation. The datacenter prohibition (with an exception for blockchain) in the license agreement sucks.
I agree, this acquisition will help nvidia round out its portfolio but I don’t see it fundamentally changing their business model.
The data center clause is pretty standard in the industry.
If you’re a business and you can’t afford actual carrier-grade stuff in your data center, you have some other issues to worry about.
There’s still a metric boatload of regulatory hurdles to clear, so I don’t know if this acquisition will be a done deal.
javiercero1,
Some customers need titan cards, great…let them buy ’em. But not everyone needs or can afford that level of hardware. It’s clear that the consumer cards were getting too powerful for nvidia’s liking such that they felt the need to add the no-data center clause, but make no mistake it doesn’t negate the technical merits of using lesser cards in a datacenter application. These artificial license restrictions are all about protecting nvidia’s titan market and not about the suitability of the hardware.
For medical and life-saving applications, sure, that needs to be certified, but otherwise it’s kind of ridiculous. The fact that nvidia specifically approves of 24/7 blockchain applications that push the cards to the extreme in a datacenter setting proves that it’s not about nvidia’s concern for the hardware… this is entirely about employing license restrictions to preempt high-end consumer cards taking over titan marketshare.
Of course I understand why nvidia does it, but it’s a case where more competition would help curb nvidia’s dominance.
A datacenter that has to operate with consumer GPUs like that is not a “data center” as much as a 2-bit operation. So you can run whatever you want; it’s not like NVIDIA is going to check.
As I said, these clauses are very standard. It’s about covering their ass from litigation. Consumer GPUs lack things like ECC, and are not certified for those use cases.
javiercero1,
You may not realize how many smaller companies like mine colocate our servers in a datacenter. And while I know NVIDIA’s not going to check, it’s still against the license.
You say that, but by and large most hardware does not have data center prohibitions. If you need ECC, then buy it, if need RAID, then buy it, if you need SAS, then buy it, if you don’t then don’t. Nobody cares including most hardware vendors. Even with nvidia chips you are allowed to do it, rather it’s the software license that started to prohibit data center applications (except for blockchain, which is allowed in data centers even with consumer cards). That’s why it’s so ridiculous.
Correction: I keep saying “titan” when in fact nvidia only allows “tesla” cards to be used in enterprise deployment. The license explicitly allows any card to be used in data center deployments as long as it’s used for blockchain processing.
https://www.digitaltrends.com/computing/nvidia-bans-consumer-gpus-in-data-centers/
What many people do not get in this situation is the economics of consolidation.
Corporate giants look at some income stream and realize they can make a killing at the end user’s expense just by adding 1c per part here or there. Let’s say they make a “gadget” and then gain control of a supply stream covering hundreds or even thousands of companies. They announce 1c-per-part price rises here or there and it goes mostly unnoticed. But you have to consider that when they have that level of consolidation, that 1c is an indivisible unit, so as the barest minimum, 1c on every part becomes tens or hundreds of dollars of income from a single assembly line.
How many bits and pieces get the 1c price rise is like a throttle on the company’s earnings, but ultimately it comes straight out of your pocket!
This is why Nvidia’s buying of Arm is bad as an example of consolidation.
Alfman
–Brendan is right, shared memory is a fundamental bottleneck.–
He is right and wrong. This solution shares at the cache level, not direct main memory per CPU.
–You can only scale so much using shared resources, beyond which you’re just chasing marginal returns for exponential effort and cost. We need to change our ways and stop this pretense that memory ought to be sharable between nodes of arbitrarily numerous cores. Of course we can and should tell software developers to deliberately avoid shared resources in order to improve scalability, but then if we do that in earnest, then what’s the point in putting more and more resources into the fabric to provide shared resources to huge numbers of cores?–
The problem here is Amdahl’s law, as there is a limit to how far a software developer can parallelize the code. Amdahl’s law means you have no choice but to deal with shared memory/data between cores.
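To put a number on that limit, here is a tiny C sketch of the Amdahl’s law bound (the parallel fractions are just example values, not measurements):

/* Amdahl's law: with parallel fraction p and N cores,
 * speedup(N) = 1 / ((1 - p) + p / N).
 * The serial fraction (1 - p) caps the speedup no matter how many cores you add. */
#include <stdio.h>

static double amdahl(double p, double n) { return 1.0 / ((1.0 - p) + p / n); }

int main(void) {
    const double fractions[] = {0.50, 0.90, 0.95, 0.99};
    for (int i = 0; i < 4; i++) {
        double p = fractions[i];
        printf("p=%.2f  64 cores: %5.1fx  1024 cores: %6.1fx  limit: %6.1fx\n",
               p, amdahl(p, 64), amdahl(p, 1024), 1.0 / (1.0 - p));
    }
    return 0;
}

Even at 95% parallel the limit is 20x, which is why the serial, shared part of the work matters so much.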
–Especially now that we’re looking at CPUs with hundreds of cores on the horizon, the technological and financial costs of such fabric are too high. —
Your numbers are an order of magnitude out, at least. A 1024-core 64-bit RISC-V chip was taped out in 2016. Yes, that 2016 chip was a RISC-V TileLink design with some very interesting features. It was a token-ring-style TileLink, and for 1024 cores it only had 1 L3. It turns out it did not perform badly. TileLink between L2 and L3 is a multicast solution; this means that if 2 cores’ L2 caches need the same information from L3 at the same time, under TileLink it is sent once from L3 and then multicast. Yes, newer TileLink has a switch on the silicon that does very much the same job as a network switch, just for traffic between L2 and L3.
Yes, the TileLink design reduces the silicon area you need for the same performance and reduces stress on the MMU.
–IMHO a lot of this investment should be placed into low latency network technology instead without insisting on shared memory designs that can’t scale. I’m sure you’ll disagree with my opinion, and that’s fine, we can just agree to disagree.–
OmniXtend is using low-latency network technology; that is what your P4 switches are. This reduces the amount of custom silicon and removes the need for custom vendor cables. Yes, it’s possible that the chipset chip on a future OmniXtend system is just a P4 switch. And yes, using “shared memory cannot scale” as a reason not to do it ignores the fact that Amdahl’s law says you have to.
If you do not share memory for atomics at L2 as OmniXtend does, you are then forced to do it over L3, possibly over the PCIe bus, or worse, over a network card on the PCIe bus. We are talking latency increases of over 3 times in the worst cases. So we are not talking “marginal returns” doing something like OmniXtend.
https://en.wikipedia.org/wiki/Intel_QuickPath_Interconnect
Yes, Intel QuickPath Interconnect is one of the competitors to OmniXtend, and QuickPath is really not great. Your atomic operations have to go through the MMU and the L3. So you have nicely NUMA’d your workload with multiple processes and threads, yet you still have the atomic operations that you must do because of Amdahl’s law’s limit on how much can be parallel.
The more you NUMA your workload to reduce main memory sharing, the more the Amdahl’s-law bottleneck on the stuff that cannot be parallel comes up in your face: you need atomic operations, and these need to share memory quickly between CPUs. Information atomically shared between CPU caches is normally so volatile that it is pointless sending it to the L3 or the MMU if you don’t have to. The fun part is that with RISC-V being TileLink, sending from something connected to L2 to the MMU is in fact straightforward: just push it into the TileLink with a “go to MMU” directive.
So for MMU-to-MMU copies between nodes, OmniXtend is competitive with Intel QuickPath in a token-ring configuration. Remember, OmniXtend has a switched option; Intel QuickPath does not. And for the atomics you use to lock operations between boards, OmniXtend with RISC-V is superior.
The fun part is that Intel QuickPath means custom cables and that you have to use Intel. The big thing about OmniXtend is how much of it is stock off-the-shelf parts, as in network switches and network cables… the custom stuff will be in the boards, not in your pile of spare cables or switches.
Please also note that QuickPath mandates you have an MMU with memory. OmniXtend can technically function without RAM connected to the MMU on a node (yes, playing with the FPGA); not the best speed of course, but it will work. Yes, you can share main memory over the OmniXtend link, not that it is recommended. Can you get more performance with some workloads running that way? Yes. This might be important if your supercomputer gets a supply of dead RAM and you don’t have enough to go round.
oiaohm,
If it wasn’t already obvious, shared PCI buses don’t scale just as shared memory doesn’t scale; those are fundamental bottlenecks. Massive scalability depends on highly localized resources, and no amount of yapping is going to change that.
Amdahl’s law is valid, however it doesn’t negate the fact that shared resources don’t scale well and you are getting marginal gains for exponential effort. Also, it’s a tad ironic to bring up Amdahl’s law when you are the one arguing for hundreds or thousands of cores, haha.
Amdahl’s law focuses on the limits of sequential programming, but it’s important not to overstate it given that so many of our programming challenges today actually fit in the embarrassingly parallel category, which is the exact opposite. We’re only scratching the surface of parallel programming’s potential in areas like AI. Heck the human brain itself is an example of massively parallel computation. You can forget about that level of parallelism using shared memory CPUs. We need to design our future processors with an eye to what’s going to physically scale and not to insist on legacy crutches like shared memory across all cores.
So again, let’s just agree to disagree
–If it wasn’t already obvious shared PCI busses don’t scale just as shared memory doesn’t scale, those are fundamental bottlenecks. Massive scalability depends on highly localized resources and no amount of yapping is going to change that. —
You are just not getting that we don’t have a choice but to do shared memory. You don’t have the same number of MMUs as cores.
A TileLink setup still has highly localized resources in L2.
–Heck the human brain itself is an example of massively parallel computation. —
The human brain is a mix of parallel and single-threaded tasks, as MRI studies have found. Interestingly enough, when you look at how the human brain transfers data around, the brain multicasts. So there is more to the human brain than massive parallelism. Part of the human brain is a highly optimized multicast fabric.
Having an L3 per 4 cores in a 64-core processor gives you 16 different L3s, and say you have 8 MMU channels. See the problem: you can now have 16 different L3s fighting over 1 MMU. Now take that up to 1024 cores: that’s 256 L3s, while about 8 MMU channels is what you can practically do per chip, so you now need to share that resource between a massive number of parties.
TileLink is a fairly major breakthrough; it runs counter to the highly-localized-resources idea. Highly localized resources are what led to the 4-cores-per-L3 ratio. That idea stops scaling when you cannot add more MMU units to keep the MMU-to-L3 ratio sane. By properly implementing a fabric, the 1024-core TileLink RISC-V chip needs less L3 to get the same performance, and you do not need things to be as highly localized as it first appears. Also, multicasting is absolutely important.
There is more to massive scalability than just highly localized resources. Scalability also depends on being efficient in transport; this holds whether you are talking about a computer or a factory. Yes, a factory can just duplicate production lines and end up not gaining performance because it is bottlenecked moving supplies around.
Heck, if you picture a computer as cubicles where each core is a cubicle, and you are delivering paperwork around the office, does it make any sense to have 1 person per 4 cubicles doing that? That’s right, you have the mail cart driving around the cubicles picking stuff up and dropping stuff off (that’s pretty much what TileLink is up to with de-duplication in multicast).
oiaohm,
Sorry man, but I do get it. What you aren’t getting is that it won’t scale well regardless, you’re going to need exponential resources just to get marginal gains. Them’s the facts
Feel free to cite some papers if you want to discuss this. Obviously the brain is a neural network that has both width and depth. It’s worth noting that brain neurons are relatively slow, so evolution would tend to prefer faster reaction times with wide parallel networks over deep sequential ones. To have a 1/4 second reaction time, the sequence couldn’t be much over 50 neurons deep (roughly 250 ms at a few milliseconds per synaptic step).
http://thephenomenalexperience.com/content/how-fast-is-your-brain/
In principle you can throw more and more resources at the fabric to implement massive end-to-end shared memory processors, but 1) actually using it is going to be a bottleneck regardless, and 2) more and more of the die needs to be dedicated to solving this problem, with less doing useful work. Many people like you don’t want to change, but like it or not, it doesn’t matter what you say because physics is going to be the ultimate decider.