Intel claims other chips also affected by design flaw

Thom Holwerda 2018-01-03 Intel 77 Comments

Update: Google’s Project Zero disclosed details about the vulnerability a week ahead of schedule due to growing concerns, and they indeed confirm AMD and ARM processors are also affected:

The Project Zero researcher, Jann Horn, demonstrated that malicious actors could take advantage of speculative execution to read system memory that should have been inaccessible. For example, an unauthorized party may read sensitive information in the system’s memory such as passwords, encryption keys, or sensitive information open in applications. Testing also showed that an attack running on one virtual machine was able to access the physical memory of the host machine, and through that, gain read-access to the memory of a different virtual machine on the same host.
These vulnerabilities affect many CPUs, including those from AMD, ARM, and Intel, as well as the devices and operating systems running them.

Intel just published a PR statement about the processor flaw, and in it, it basically throws AMD and ARM under the bus. According to Intel, reports that only its own processors are affected are inaccurate, namedropping specifically AMD and ARM just to make it very clear who we’re talking about here. From the statement:

Recent reports that these exploits are caused by a “bug” or a “flaw” and are unique to Intel products are incorrect. Based on the analysis to date, many types of computing devices – with many different vendors’ processors and operating systems – are susceptible to these exploits.
Intel is committed to product and customer security and is working closely with many other technology companies, including AMD, ARM Holdings and several operating system vendors, to develop an industry-wide approach to resolve this issue promptly and constructively. Intel has begun providing software and firmware updates to mitigate these exploits. Contrary to some reports, any performance impacts are workload-dependent, and, for the average computer user, should not be significant and will be mitigated over time.

More is surely to come.

77 Comments

2018-01-03 9:11 pm
Artem S. Tashkinov
If we take AMD’s response into consideration (that their CPUs are not affected) then Intel should expect a slander civil lawsuit. Someone in Intel’s PR department should be taught not to throw baseless accusations at your competitors.
Yes, ARM64 is also affected but the extent of the problem is not known yet.

2018-01-03 9:37 pm
Vanders
The statement had the intended effect: Intel’s share price rallied a little (after dropping this morning) and AMD’s dropped (after climbing all day).
Now all Intel have to hope is that the statement is entirely factual and that the bug isn’t as bad as everyone now seems to think, because otherwise that could be a misleading statement…
2018-01-03 10:05 pm
Delgarde
Actually, they state that processors from many other vendors are susceptible, but they’ve not actually named any of them in the context of that statement – AMD and ARM are specifically mentioned only in the second paragraph, talking about other companies Intel are working with to resolve the issue.
So quite a neatly worded statement, really. They carefully avoid making any claims about AMD and ARM vulnerability, but by mentioning them in the statement, they encourage people to make the association themselves.

2018-01-04 5:44 am
flanque
Yes, this was quite clever and obvious, though on the other hand, I thought I missed something with Thom’s under the bus comment. I guess not.
2018-01-04 4:26 pm
galvanash
I would also stress to people reading this – it isn’t an issue of Intel vs AMD or x86 vs ARM or anything like that…
Yes, Intel processors exhibit a rather aggressive form of speculation that AMD processors do not in a very specific usage scenario that makes them more susceptible to a very specific form of this attack. This particular behavior is not the root cause of the problem though – the root cause is simply a result of how all modern processors work.
In hindsight, and probably due more to luck than intent, AMD ended up with a slightly more resilient implementation of a very very specific thing. Problem is, deep down all processors end up doing what is really causing the problem – they execute code speculatively and they currently can’t hide all of the effects of this. The flaw in Intel’s design is not the only way to crack this egg, there are many and more will surface over time…
Everyone is going to have to go back to the drawing board so to speak and work this out… It is a very big problem and it affects the entire industry, not any particular vendor. I think it can be fixed in the long term, and future CPUs will address it on a fundamental level and correct it, but for the time being it’s all going to be duct tape and bubble gum for everyone…
Hiding kernel page tables just addresses one specific (and very dangerous) form of attack, it doesn’t actually fix anything long term…

2018-01-05 6:37 pm
dionicio
Thanks Galvanash…
Knew six months amounts to bubble gum at this fundamentally wrong path on multi-tasking architecture.
Erlang language could spark better ideas, if translatable to hardware.
On the chipset this philosophy already being inspiration. Hubs, they’re called.
2018-01-05 6:55 pm
dionicio
“- the root cause is simply a result of how all modern processors work. ”
How Big Is This Issue With Cell Architecture?

2018-01-05 11:58 pm
JLF65
Not very. The types of speculation that cause most of the problems for other processors just cause pipeline stalls on the PPE. The only attack that looks possible on CBEA is gaming the branch address prediction. The PPE has a link stack (4 entries) to help speed branch to link instructions, so it MIGHT be possible to do one form of spectre. However, the link stack can be disabled via the HID1 register, so this attack can be rendered null quite easily.
2018-01-07 4:59 pm
dionicio
Thanks JLF65. Even if the final solution is to totally disable speculation on critical servers, the performance hit will not be that big.
2018-01-07 5:18 pm
zima
Is Cell even used in any notable numbers at “critical servers”?… Anyway, more likely to perform server tasks (and to be hit by this bug) is the main PPC core, not PPEs.
2018-01-07 9:46 pm
JLF65
The PPE IS the main PowerPC core. The SPE is the vector accelerator. They stripped a lot of the complexity from the PPE to get the gates to make the SPEs, stripping most of the out-of-order support, as well as speculation. Instead, the compiler is supposed to handle ordering the instructions to fill pipes, provide hints in the instructions for branch prediction, and stall the pipes rather than speculate. It made the PPE much simpler, and it could be clocked at a rather high rate (for the time). Of course, it made efficient code much harder to write – you were back to an era where programmers hand-coded critical routines with a careful eye on filling the pipes and what would cause stalls.
But you do have a point – while there were/are companies making CBEA blades for servers, I doubt they’re a significant percentage. But hey, maybe IBM can interest people in an updated CBEA with all the on-going issues over spectre/meltdown.
2018-01-07 11:01 pm
zima
Oops, I confused the terms / I guess it’s been too long since Cell was in the spotlight.

2018-01-04 5:12 am
galvanash
If we take AMD’s response into consideration (that their CPUs are not affected) then Intel should expect a slander civil lawsuit. Someone in Intel’s PR department should be taught not to throw baseless accusations at your competitors.
As stated elsewhere, Intel never even mention AMD by name and were careful in how they worded their statement – good luck with that lawsuit…
Regardless, it doesn’t matter. AMD processors are definitely affected, and considering the technical details of the attack, any processor that implements any form of speculative execution (which is basically anything remotely modern) is probably susceptible to some form of this attack if it has a cache or some other resource that can be used to perform timing checks. It’s just a matter of time really.
Worst-case scenario – Every byte of address space in a computer system could be read at any time by a low privilege process. Memory protection is effectively dead. The only real fundamental fix is to eliminate speculative execution entirely (and that would be very very very VERY bad for performance)…
This is a doozy folks. The fix everyone is concerned about doesn’t even begin to cure this issue, it is just emergency treatment for the worst of the immediate bleeding. This is probably going to get worse over time as the bad guys figure out a bazillion ways to exploit this.

2018-01-04 6:19 am
Alfman verbose=1
galvanash,
Worst-case scenario – Every byte of address space in a computer system could be read at any time by a low privilege process. Memory protection is effectively dead. The only real fundamental fix is to eliminate speculative execution entirely (and that would be very very very VERY bad for performance)…
This is a doozy folks. The fix everyone is concerned about doesn’t even begin to cure this issue, it is just emergency treatment for the worst of the immediate bleeding. This is probably going to get worse over time as the bad guys figure out a bazillion ways to exploit this.
These problems do run deep, but if we try to tackle the problems one at a time and gut some of the conventions that got us here in the first place, it may be salvageable.
x86 caches are problematic because the same cache lines are shared across security boundaries. That needn’t be the case. Furthermore the whole motivation for merging kernel address space into user address space had to do with the fact that it avoided expensive TLB invalidation on every syscall, but this is a limitation of x86 (and other) CPUs, not something that’s strictly necessary. The SPARC architecture offered an alternative design that did not require invalidation across context switches.
http://www.informit.com/articles/article.aspx?p=1218201&seqNum=4
The SPARC has a cache of virtual to physical mappings, just as x86 does. This is called a translation look-aside buffer (TLB). In the case of the SPARC, each entry has a process ID associated with it, so the buffer doesn’t have to be flushed when a new process runs. Unlike x86, the SPARC is unaware of the structure of the page tables – they’re entirely the operating system’s responsibility. Whenever an address is accessed that isn’t in the cache, a page fault is issued, and the OS must provide the correct mapping.
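To make the quoted idea concrete, here is a minimal Python sketch of a context-tagged TLB next to an untagged one that must be flushed on every switch. It is a toy model, not SPARC or x86 behavior: the class names, the `run` helper, and the page-table contents are all made up for illustration.

```python
class TaggedTLB:
    """Toy TLB whose entries are keyed by (context_id, virtual_page)."""

    def __init__(self):
        self.entries = {}  # (context_id, vpage) -> physical frame
        self.misses = 0

    def translate(self, context_id, vpage, page_table):
        key = (context_id, vpage)
        if key not in self.entries:
            # Miss: the OS supplies the mapping (software-managed refill).
            self.misses += 1
            self.entries[key] = page_table[vpage]
        return self.entries[key]


class UntaggedTLB:
    """Toy TLB with no context tag; it is flushed on every context switch."""

    def __init__(self):
        self.entries = {}  # vpage -> physical frame
        self.misses = 0
        self.current = None

    def translate(self, context_id, vpage, page_table):
        if context_id != self.current:
            self.entries.clear()  # flush on context switch
            self.current = context_id
        if vpage not in self.entries:
            self.misses += 1
            self.entries[vpage] = page_table[vpage]
        return self.entries[vpage]


def run(tlb):
    """Alternate between two hypothetical contexts and count TLB misses."""
    tables = {1: {0: 100, 1: 101}, 2: {0: 200, 1: 201}}
    for _ in range(3):
        for ctx in (1, 2):
            for vpage in (0, 1):
                frame = tlb.translate(ctx, vpage, tables[ctx])
                assert frame == tables[ctx][vpage]
    return tlb.misses


print("tagged misses:  ", run(TaggedTLB()))
print("untagged misses:", run(UntaggedTLB()))
```

In this model the tagged TLB only misses on the first touch of each (context, page) pair, while the untagged one pays for a refill after every switch.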
I think the oracles used in “spectre” style attacks will be the most invidious because the statistical analysis can be applied on so many levels. Still, they depend on the ability to accurately measure fast events as well as reproducible results. Reducing userspace clock resolution could help as well as adding more noise to the side channel “signals”. I suspect x86 architectural changes will be unavoidable, but I also wonder how much running something like “folding@home” in the background could help add timing noise in the interim?
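As a rough illustration of the clock-resolution idea, the sketch below rounds a clock down to a coarser tick and checks whether a cache hit and a cache miss still look different to the measuring code. The latencies and function names are hypothetical placeholders, not measurements from any real CPU.

```python
CACHE_HIT_NS = 40    # hypothetical latency of a cached access
CACHE_MISS_NS = 200  # hypothetical latency of an uncached access


def coarse_clock(true_time_ns, resolution_ns):
    """Return the time as seen through a clock that only ticks every resolution_ns."""
    return (true_time_ns // resolution_ns) * resolution_ns


def measured_duration(start_ns, duration_ns, resolution_ns):
    """Duration of an event as observed with the coarse clock."""
    end = coarse_clock(start_ns + duration_ns, resolution_ns)
    return end - coarse_clock(start_ns, resolution_ns)


def distinguishable(resolution_ns, start_ns=0):
    """True if a single measurement tells a hit apart from a miss."""
    hit = measured_duration(start_ns, CACHE_HIT_NS, resolution_ns)
    miss = measured_duration(start_ns, CACHE_MISS_NS, resolution_ns)
    return hit != miss


print(distinguishable(1))                  # fine-grained clock
print(distinguishable(1000))               # coarse clock, measurement starts on a tick
print(distinguishable(1000, start_ns=900)) # coarse clock, measurement starts near a tick edge
```

The last case hints at why this is mitigation rather than a cure: a measurement that happens to straddle a tick still leaks, so an attacker who can repeat the experiment many times may recover the difference statistically.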
Edited 2018-01-04 06:23 UTC

2018-01-04 4:58 pm
galvanash
Interesting duct tape and bubble gum:
https://stackoverflow.com/questions/48089426/what-is-a-retpoline
https://support.google.com/faqs/answer/7625886
Quick and dirty mitigation for spectre style attacks on GCC compiled code (and LLVM from what the Google article said). The Linux kernel will apparently start getting compiled to use this in the near future.
Favorite quote from the Google Article:
If it brings you any amusement: imagine speculative execution as an overly energetic 7-year old that we must now build a warehouse of trampolines around.
ps. We are going to see a lot more stuff like this. Yuck. Hopefully the silicon heroes will figure out a better way in the long term…
Edited 2018-01-04 17:13 UTC

2018-01-04 5:37 pm
galvanash
Forgot to add… If you read this carefully you’ll see that what they are doing here is effectively disabling speculation and thus neutering prediction in the return on an indirect call.
The way it is done incurs little or no performance overhead over non-speculative execution, but since it effectively disables speculation it is definitely not “free” either, and in fact is probably quite expensive for code where speculation was a performance win, unless you provide hints at compile time.
In other words this mitigates Spectre attacks at the expense of making the call unpredictable… Yuck times infinity. I expect this will be used sparingly in specific places where prediction was either not helpful anyway or is so trivial that hints can be used (or where PGO is effective).
Edited 2018-01-04 17:40 UTC
2018-01-04 9:41 pm
Alfman verbose=1
galvanash,
Forgot to add… If you read this carefully you’ll see that what they are doing here is effectively disabling speculation and thus neutering prediction in the return on an indirect call.
…
In other words this mitigates Spectre attacks at the expense of making the call unpredictable… Yuck times infinity. I expect this will be used sparingly in specific places where prediction was either not helpful anyway or is so trivial that hints can be used (or where PGO is effective).
Yea, maybe that’s what it’s going to take to “fix” it. It’s definitely a conundrum. I keep trying to think of solutions that leave it in place but attack the signal to noise quality, but it would invariably have to cause some slowdown to the caller so that they cannot distinguish between fast and slow speculative paths.
I think the CPUs could help by measuring the amount of entropy caused by speculative branches. If it’s insufficient the kernel would need to add more.
Theoretically the kernel could do some useful work before returning to the caller.
Edit: These ideas would be contrary to the goals of a low latency kernel though
Edited 2018-01-04 21:55 UTC
2018-01-04 9:54 pm
galvanash
Some further good info… Seems work on this has been happening for a while now on the down low…
https://lkml.org/lkml/2018/1/4/174
and of course would not be complete without a classic Linus tongue lashing…
https://lkml.org/lkml/2018/1/3/797
2018-01-04 10:09 pm
bassbeast
And you’ll note that Linus specifically says patches should be written with “not all CPUs (meaning non Intel) are crap” because from what I’ve read it’s a HELL of a lot harder to use this attack on ARM and AMD.
Basically it’s the difference between robbing a family safe and robbing a Brinks truck from what I’ve read: both are “doable”, but the second is gonna take someone with serious skills while the former, thanks to Intel making their CPUs hyper aggressive when it comes to speculation, is gonna end up being able to be packaged in malware kits and used by kiddies.
2018-01-04 10:31 pm
galvanash
And you’ll note that Linus specifically says patches should be written with “not all CPUs (meaning non Intel) are crap” because from what I’ve read it’s a HELL of a lot harder to use this attack on ARM and AMD.
His point was that it should be controlled with a flag, i.e. you can turn it off if you want to. It is a little too early to let AMD and ARM entirely off the hook…
His anger was more about Intel being so flippant as to release a finger pointing PR paper seemingly claiming no fault in the situation – and I agree with him. “Everyone else sucks too” is a pretty shitty way to respond to this thing…
That said, AMD processors seem to be immune to one specific attack vector demonstrated thus far – but they are not immune to the actual problem. The fix in question protects a critical part of the OS from the entire class of attacks by simply hiding it (i.e. you can’t leak memory that isn’t mapped anymore). Most of the kernel team seems to be of the mind that it is better to be safe than sorry and for now (and until more research is done) hiding the KPTs by default is the right thing to do.
ps. There is more to this fix than just that btw – there are all kinds of mitigations going into plugging every hole they can think of created by this – this was just the biggest piece of the pie so to speak. The other mitigations definitely do affect AMD and are thus needed anyway (the parts that are not about KPT exposure).
2018-01-06 11:29 pm
bassbeast
Again it is how DOABLE an attack is. If you want to get technical every.OS.on.the.planet. is easily hackable…if a pile of extremely rare conditions all line up exactly for the attacker, but what are the odds of this happening?
Oh and Linus has now put out a diatribe specifically aimed at Intel. Not ARM, not AMD, not MIPS, he just gave Intel a giant middle finger for their design choices. Now what does that tell you? And it looks like the lawyers have also figured out Intel is the one that screwed the pooch as they are now getting hammered with class actions. Again not ARM Holdings, not AMD, just Intel.
https://it.slashdot.org/story/18/01/06/014226/linus-torvalds-says-in…
https://yro.slashdot.org/story/18/01/06/0131251/intel-hit-with-three…
2018-01-07 1:58 am
Alfman verbose=1
bassbeast,
Again it is how DOABLE an attack is. If you want to get technical every.OS.on.the.planet. is easily hackable…if a pile of extremely rare conditions all line up exactly for the attacker, but what are the odds of this happening?
I’ve also been asking this question myself.
Oh and Linus has now put out a diatribe specifically aimed at Intel. Not ARM, not AMD, not MIPS, he just gave Intel a giant middle finger for their design choices. Now what does that tell you? And it looks like the lawyers have also figured out Intel is the one that screwed the pooch as they are now getting hammered with class actions. Again not ARM Holdings, not AMD, just Intel.
I’ll admit this event has confused my moral compass more than usual. The trouble is that x86 specs say absolutely nothing about side effects. Without a spec that guarantees that there will be no side effects, is it reasonable for us to assume their absence? Why? The truth is side effects have always had security implications.
Take for example how monitoring the CPU power consumption can reveal a private RSA key. CPUs have a side effect of using ever so slightly more or less energy based on the high bits of the key. Do we blame Intel (and other vendors) for that? Most would say no, it’s pretty obvious that the longer a computation takes, the more energy it will pull. It was deemed a software problem, and we’ve modified software to deal with it.
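A small Python sketch of the kind of software fix being alluded to: textbook square-and-multiply does extra work for every 1 bit in the exponent, so the amount of work (and hence time and energy) depends on the key, while a Montgomery-ladder style loop does the same number of multiplications per bit regardless. This only models operation counts; Python itself is not constant-time, and the function names and the tiny modulus are illustrative assumptions rather than anything from a real RSA library.

```python
def modexp_naive(base, exponent, modulus):
    """Left-to-right square-and-multiply; returns (result, multiplications done)."""
    result = 1
    ops = 0
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus  # always square
        ops += 1
        if bit == "1":
            result = (result * base) % modulus  # extra multiply only for 1 bits
            ops += 1
    return result, ops


def modexp_ladder(base, exponent, modulus):
    """Montgomery ladder; one multiply and one square per bit, whatever the bit is."""
    r0, r1 = 1, base % modulus
    ops = 0
    for bit in bin(exponent)[2:]:
        if bit == "1":
            r0 = (r0 * r1) % modulus
            r1 = (r1 * r1) % modulus
        else:
            r1 = (r0 * r1) % modulus
            r0 = (r0 * r0) % modulus
        ops += 2
    return r0, ops


# Two hypothetical 8-bit "keys" with very different numbers of 1 bits.
sparse_key, dense_key = 0b10000001, 0b11111111
for name, fn in (("naive", modexp_naive), ("ladder", modexp_ladder)):
    _, ops_sparse = fn(7, sparse_key, 1009)
    _, ops_dense = fn(7, dense_key, 1009)
    print(name, ops_sparse, ops_dense)
```

With the naive loop the two keys cost a different number of multiplications, which is exactly the sort of signal a power or timing measurement can pick up; with the ladder the counts match.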
Spectre would likely blow over quickly if only there were an easy software fix, but this time there isn’t. The lack of a fix, more than anything, is why we’re angry. We want to blame Intel, but there seems to have been a massive failure across the security community outside of Intel too. For some reason nobody was questioning the implications of this problem even though many of us were aware of the existence of caching side effects.
Heck, even me personally, I’ve used many things without really questioning them. You know what though, if some company wants to pay me to do it, I can question things all day long
2018-01-07 8:05 am
bassbeast
Well according to the engineers at AMD when it comes to doable? Yeah not very likely.
https://www.reddit.com/r/Amd/comments/7o2i91/technical_analysis_of_s…
It appears the one you have to watch for is the “Rogue Data Cache Load”, which is the only one where you don’t have to have a bazillion things all line up perfectly for the attacker to pull it off (the other two are trivially easy to patch without real impact and the odds of pulling them off are insanely small). And this bug, which is the one that is gonna cripple performance? ONLY AFFECTS INTEL CHIPS. If you have an AMD or ARM CPU? You are completely unaffected.
So you can see there is a reason why Linus and the lawsuits are all targeting Intel, because owners of AMD and ARM systems can have theirs patched without getting royally screwed performance wise, but Intel chips are gonna be boned. On a personal note I’m gonna LMAO if it turns out the performance hit is the same as what Intel did to AMD with the cripple compiler…wouldn’t that just be some delightful karma if Intel did to themselves what they did to AMD during the Netburst debacle?
2018-01-04 10:42 pm
Alfman verbose=1
galvanash,
and of course would not be complete without a classic Linus tongue lashing…
Haha.
It also looks like they’re trying to make the kernel run-time patchable, but GCC would need to be modified, or they suggested using perl. I found this dialog humorous, but it highlights how even our toolchains are not prepared either.
https://lkml.org/lkml/2018/1/3/828
> > > It should be a CPU_BUG bit as we have for the other mess. And that can be
> > > used for patching.
> >
> > It has to be done at compile time because it requires a compiler option.
> >
> > Most of the indirect calls are in C code.
> >
> > So it cannot just patched in, only partially out.
>
> You can replace the pushl ; jmp with an alternatives section (although
> there might be a lot of them). Even if gcc isn’t smart enough to do that
> perl is.
So you say, that we finally need a perl interpreter in the kernel to do
alternative patching?
Thanks,
tglx
Edited 2018-01-04 22:46 UTC
2018-01-05 8:10 pm
dionicio
OpenBSD going to say -again- Told You.

2018-01-04 2:40 pm
BlueofRainbow
I think you framed the situation extremely well in brand-name neutral fashion:
any processor that implements any form of speculative execution (which is basically anything remotely modern) is probably susceptible to some form of this attack if it has a cache or some other resource that can be used to perform timing checks.
The actual implementation of the exploit will likely be specific to a given processor family.
Anyways, it would be interesting to hear what retired professor Niklaus Wirth would say about the situation. After all, he has sought simple solutions to complex hardware/software design constraints. Speculative execution, while attempting to improve performance, has introduced a security flaw. The additional code complexity required to patch the hole will in turn reduce performance and may in fact introduce further security flaws.
Maybe the next generation of processors should not implement any form of speculative execution?

2018-01-04 6:55 pm
JLF65
Maybe the next generation of processors should not implement any form of speculative execution?
Given how threaded software is today, instead of speculating, just switch threads. Speculation is WASTING hardware that could be running other threads any time the speculation fails… which could be quite often. While switching threads means the previous thread is delayed a little when speculation could have eliminated that delay, you’re still doing work that the program overall needs done by all the threads. Getting rid of speculation only slows single-threaded apps.
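A toy model of that trade-off, purely illustrative: a core issues one instruction per cycle, never speculates, and a thread that hits a branch must wait a few cycles for it to resolve. With one thread those cycles are idle; with a second thread they can be filled. All parameters and the fixed-priority scheduling are assumptions made for the sketch, not a description of any real microarchitecture.

```python
def cycles_to_finish(num_threads, instrs_per_thread, branch_every, resolve_cycles):
    """Cycles until every thread has issued all its instructions, without speculation."""
    remaining = [instrs_per_thread] * num_threads
    ready_at = [0] * num_threads  # first cycle at which each thread may issue again
    issued = [0] * num_threads
    cycle = 0
    while any(remaining):
        # Issue from the first thread that still has work and is not stalled.
        for t in range(num_threads):
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                issued[t] += 1
                if issued[t] % branch_every == 0:
                    # Branch: this thread waits until the branch resolves.
                    ready_at[t] = cycle + 1 + resolve_cycles
                break
        cycle += 1
    return cycle


one = cycles_to_finish(1, 8, branch_every=4, resolve_cycles=3)
two = cycles_to_finish(2, 8, branch_every=4, resolve_cycles=3)
print("1 thread, 8 instructions: ", one, "cycles")
print("2 threads, 16 instructions:", two, "cycles")
```

In this model two threads finish in fewer cycles than running them back to back, but the single thread on its own is no faster, which is the caveat about single-threaded apps.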

2018-01-07 11:11 am
bhtooefr
That’s the approach that the Bonnell microarchitecture Atoms, as well as UltraSPARC T1 (and I think T2?) used.
It hasn’t proved to be sufficiently performant, though – even if you can maintain multithreaded performance, single thread still matters.
2018-01-07 10:00 pm
JLF65
True, but not nearly as much as even just five years ago. Watch any Doom 2016 benchmark that shows the cpu core usage – single threading isn’t an issue at all. Doom really does a bang up job of splitting the game over all available cores evenly. No single core sits at 100% – they all fluctuate almost lockstep in usage. This is becoming more common every year. If CPUs moved to threading over speculation, it would force everyone to do this or fall to competitors who do. No one is making cpus faster, they’re making them more cycle-efficient and with more cores. Where two or four cores were mainstream 5 to 10 years ago, now it’s all 8 to 16 cores… or more! While single-threaded apps can count on a new processor making a thread perhaps 5% faster, a multi-threaded app can count on the same new processor making it anywhere from 30% to 500% faster.
Edited 2018-01-07 22:02 UTC
2018-01-07 10:49 pm
Alfman verbose=1
JLF65,
True, but not nearly as much as even just five years ago. Watch any Doom 2016 benchmark that shows the cpu core usage – single threading isn’t an issue at all. Doom really does a bang up job of splitting the game over all available cores evenly. No single core sits at 100% – they all fluctuate almost lockstep in usage. This is becoming more common every year. If CPUs moved to threading over speculation, it would force everyone to do this or fall to competitors who do. No one is making cpus faster, they’re making them more cycle-efficient and with more cores. Where two or four cores were mainstream 5 to 10 years ago, now it’s all 8 to 16 cores… or more! While single-threaded apps can count on a new processor making a thread perhaps 5% faster, a multi-threaded app can count on the same new processor making it anywhere from 30% to 500% faster.
That’s true, modern games don’t typically push the CPU as much as the GPU. This has been the case for a long time and even an older system is perfectly playable with a good GPU. However for most of my workloads single threaded performance still dominates. Take the gimp and inkscape, they both satisfy my graphic editing needs but unfortunately their effects haven’t been optimized for multicore. It’s a shame because they’d be good candidates for it.
For one project where I worked on software defined radios, my personal computer (4C/8T) actually outperformed a far more expensive 12C server in one of the conversions because my computer had a single threaded performance advantage.
It’s still common for software developers to disregard optimizing because CPUs are already fast enough and multithreaded code is hard. So outside of certain niches I find a whole lot of software doesn’t really target multicore, for better or for worse.
Maybe now that the problems with speculative execution have come to the forefront, things may change going forward?
2018-01-07 3:36 pm
dionicio
How about keeping all security, comm related code blocks digested through non-speculative paths?
2018-01-07 3:42 pm
dionicio
Hybrid architectures keeping speculation only to those LOCAL tasks performance hungry.

2018-01-05 4:50 pm
zima
Worst-case scenario – Every byte of address space in a computer system could be read at any time by a low privilege process. Memory protection is effectively dead.
Hm, “funny” how we could end up back in a wild west situation WRT memory protection, like where, say, the Amiga was 3 decades ago (and still is)…

2018-01-03 9:12 pm
rmeyers
In the blurb printed above, Intel never actually names AMD or ARM. It says ‘many types of computing devices – with many different vendors’ processors…’ and when it references AMD and ARM it says that it ‘is working closely with many other technology companies, including AMD, ARM Holdings…’.
There is a statement from AMD (as seen on the Techdirt website) where they claim “AMD processors are not subject to the types of attacks that the kernel page table isolation feature protects against.”

2018-01-03 9:17 pm
Kochise
If a javascript exploit is already available, it will be easy to actually check if AMD and ARM are affected by the bug/feature.
2018-01-03 10:07 pm
Delgarde
Yes, it’s an excellent piece of weasel-wording – it never actually claims that AMD or ARM are vulnerable (the “many different vendors” are unnamed), but by mentioning them as working with Intel on the issue, it encourages readers to associate them with the problem.
2018-01-04 7:07 am
sj87
Obviously AMD stock prices coming down due to this malicious statement by Intel is proof that Intel pretty much went for this effect and that it worked well. They would not release a dubious statement without dubious intents.
They called out a company in a certain context for certain effect. There was no other reason to name AMD just to mention that they are “working together” to solve an unnamed issue.
Edited 2018-01-04 07:07 UTC

2018-01-03 9:27 pm
Alfman verbose=1
Intel is committed to the industry best practice of responsible disclosure of potential security issues, which is why Intel and other vendors had planned to disclose this issue next week when more software and firmware updates will be available. However, Intel is making this statement today because of the current inaccurate media reports.
Check with your operating system vendor or system manufacturer and apply any available updates as soon as they are available. Following good security practices that protect against malware in general will also help protect against possible exploitation until updates can be applied.
I understand why Intel is doing this (to give MS, Apple, Amazon, and large Linux distros a chance to address the flaw themselves before pulling the curtains), however most independent service providers like me who build custom kernels are really getting screwed by it since we don’t get the same privilege and can’t even begin to assess the scope of this flaw until they officially choose to publish more information.
Edited 2018-01-03 21:46 UTC

2018-01-03 10:07 pm
Bill Shooter of Bul Platinum Prime
Yup, you’re getting hosed.
Hopefully, your customers will understand the advantages of your service over the security of going with a kernel that does get advanced notice of large bugs.
Really, having maintained a linux kernel briefly, the best bet is to be based off a large vendor, if possible, making integrating any fixes as easy as possible.

2018-01-03 11:51 pm
Alfman verbose=1
Bill Shooter of Bul,
Yup, you’re getting hosed.
Hopefully, your customers will understand the advantages of your service over the security of going with a kernel that does get advanced notice of large bugs.
I hear that, but as I’m sure you can appreciate, reality always brings about nuanced problems regardless of one’s approach. Customers, for their part just want things to work, which is a reasonable expectation. However there are countless times I’ve performed “stable” updates on commodity distros only to find out it broke some dependencies for one customer or another.
I try to wash my hands of it by letting them run whatever they need inside their own VMs, but updates can still break. Not a kernel issue, but over the years PHP has been particularly problematic. In one instance a 3rd party dev imposed strict requirements for a magento component that was only compatible with an older version of magento that was only compatible with an older version of PHP that was replaced in the official distro. I only discovered all of this after getting complaints after applying official ubuntu updates.
That’s my experience with mainstream distros, if you have no special needs, updates typically go pretty smoothly, but any custom configuration or locally installed software can still be hell at times.
Really, having maintained a linux kernel briefly, the best bet is to be based off a large vendor, if possible, making integrating any fixes as easy as possible.
Yeah, the key is fully automating as much as you can.
Edited 2018-01-03 23:52 UTC

2018-01-04 12:31 am
JLF65
The statement from google doesn’t confirm anything. Other than this statement “These vulnerabilities affect many CPUs, including those from AMD, ARM, and Intel, as well as the devices and operating systems running on them.” there’s not another word about AMD processors, and they even state they know of no ARM processors that are affected. Talk about contradictions!
2018-01-04 12:57 am
Alfman verbose=1
Thom Holwerda,
Update: Google’s Project Zero disclosed details about the vulnerability a week ahead of schedule due to growing concerns, and they indeed confirm AMD and ARM processors are also affected:
Another link to project zero’s blog contains tons of technical info! Turns out our discussion ideas here on osnews, although the late night pseudo code was buggy/incomplete/sub-optimal, we were heading on the right track
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-me…
[q]We are posting before an originally coordinated disclosure date of January 9, 2018 because of existing public reports and growing speculation in the press and security research community about the issue, which raises the risk of exploitation. The full Project Zero report is forthcoming (update: this has been published; see above).[/q]
This is extremely presumptuous, but what are the chances an osnews discussion may have factored into google’s decision to publish early? Thom, maybe you could track down google visitors in the server logs, haha.
On a serious note though this confirms an extremely bad outcome. The critical vulnerabilities from this are very widespread…damn it.
Edited 2018-01-04 01:04 UTC

2018-01-04 1:56 am
adkilla
It appears to only impact Linux and FreeBSD users using an AMD CPU with the manually turned on BPF JIT. Intel is affected regardless.

2018-01-04 3:29 am
Alfman verbose=1
adkilla,
[q]It appears to only impact Linux and FreeBSD users using an AMD CPU with the manually turned on BPF JIT. Intel is affected regardless. [/q]
From here on out we really need to clarify that there are different types of attacks with different attack surfaces:
https://spectreattack.com/spectre.pdf
https://meltdownattack.com/meltdown.pdf
I just finished reading these…whew, intense. Both use the speculative behavior existing in all modern CPUs.
The “meltdown” attack executes in userspace and relies on the kernel pages being mapped into the userspace page table. This attack works just as we speculated yesterday by using restricted kernel memory to produce cache side channels. The fact that the supervisor bit is set on these pages doesn’t matter to the speculative code pipeline. So by the time the page fault is generated, it’s too late: the side effects have already been introduced into the caching subsystem.
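To make the ordering problem concrete, here is a toy simulation in Python. It is only a sketch: ToyCPU, its methods and the two latency constants are all made up for illustration, no real hardware timing or privilege check is involved, and it is not taken from the paper. It just models "the dependent load touches a cache line before the fault is raised" and then recovers the byte from which probe line is fast.

```python
# Toy model only: ToyCPU and both latencies are assumptions for illustration.
CACHE_HIT_NS = 10     # assumed latency of a cached probe line
CACHE_MISS_NS = 200   # assumed latency of an uncached probe line


class ToyCPU:
    def __init__(self, kernel_memory):
        self.kernel_memory = kernel_memory  # bytes userspace must not read
        self.cached_lines = set()           # probe-array lines currently cached

    def flush_probe_array(self):
        self.cached_lines.clear()

    def speculative_kernel_read(self, address):
        # The dependent load runs first and pulls one probe line into the cache...
        secret = self.kernel_memory[address]
        self.cached_lines.add(secret)
        # ...and only afterwards is the access reported as a fault.
        raise PermissionError("page fault: supervisor-only page")

    def time_probe(self, line):
        return CACHE_HIT_NS if line in self.cached_lines else CACHE_MISS_NS


def leak_byte(cpu, address):
    cpu.flush_probe_array()
    try:
        cpu.speculative_kernel_read(address)
    except PermissionError:
        pass  # the fault is handled, but the cache footprint remains
    timings = [cpu.time_probe(line) for line in range(256)]
    return timings.index(min(timings))  # the one fast line encodes the byte


def leak_bytes(cpu, start, length):
    return bytes(leak_byte(cpu, start + i) for i in range(length))
```

In this model the attacker never receives the value from the faulting read itself; everything comes out of the timing loop in leak_byte.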
The authors were unable to use the meltdown attack against AMD processors, but leave open the possibility that it might be possible.
We also tried to reproduce the Meltdown bug on several ARM and AMD CPUs. However, we did not manage to successfully leak kernel memory with the attack described in Section 5, neither on ARM nor on AMD. The reasons for this can be manifold. First of all, our implementation might simply be too slow and a more optimized version might succeed. For instance, a more shallow out-of-order execution pipeline could tip the race condition towards against the data leakage. Similarly, if the processor lacks certain features, e.g., no re-order buffer, our current implementation might not be able to leak data.
The kernel patch making its rounds removes kernel pages from user space and effectively fixes the MELTDOWN vulnerability. As predicted, it fixes both the KASLR leaks and this new vulnerability (for a performance cost):
[q]the KAISER countermeasure that has been developed to mitigate side-channel attacks against KASLR which inadvertently also protects against Meltdown.[/q]
The spectre attack relies on the same speculative branch execution but instead of running in local address space it runs in the address space of a target victim. Spectre instead relies on an oracle within the target address space and exploiting the CPU’s speculative branch behavior to produce covert side channel leaks. As such, the KAISER patch does not fix the spectre attack! Assuming a suitable oracle is found in the victim, the attack can work across unrelated processes and even VMs.
You are correct that the kernel’s BPF JIT feature can create an explicit code pattern for the attacker to use with the spectre attack, however, in theory suitable candidates can be found in dozens of megabytes of naturally occurring code instead of having to be explicitly generated by the attacker. So that’s why even disabling these features can’t completely close the vulnerability outright.
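A rough Python sketch of the spectre idea, as I understand it from the description above. ToyVictim, its 1-bit predictor and spectre_leak are hypothetical names for a toy model, not real code from the paper. The victim's own bounds-checked lookup plays the role of the oracle: the attacker trains the branch, then asks for an out-of-bounds index and reads the result from the cache footprint instead of the return value.

```python
class ToyVictim:
    """Hypothetical victim whose bounds-checked lookup acts as the oracle."""

    def __init__(self, public, secret):
        self.memory = bytes(public) + bytes(secret)  # secret sits after the public data
        self.public_len = len(public)
        self.predict_in_bounds = False  # 1-bit toy branch predictor
        self.cached_lines = set()       # stand-in for probe lines in the cache

    def lookup(self, index):
        in_bounds = index < self.public_len
        if self.predict_in_bounds and index < len(self.memory):
            # Speculated path: the load runs before the bounds check resolves.
            self.cached_lines.add(self.memory[index])
        self.predict_in_bounds = in_bounds  # predictor learns the real outcome
        return self.memory[index] if in_bounds else None


def spectre_leak(victim, offset):
    victim.lookup(0)             # train the predictor: "in bounds"
    victim.cached_lines.clear()  # flush the probe lines
    victim.lookup(victim.public_len + offset)  # architecturally returns None
    # In a real attack this step would be a timing measurement.
    return next(iter(victim.cached_lines))
```

Note that nothing here depends on kernel pages being mapped into the attacker's address space, which is why a KAISER-style fix doesn't touch this class of leak.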
This is so bad…2018 may become a crappy year for sysadmins.
Edited 2018-01-04 03:40 UTC

2018-01-04 3:55 am
hdjhfds
AMD processors named by projectzero are bulldozer variants. I wonder what’s the case with zen, given that it is a whole new design, built from scratch.

2018-01-04 8:26 am
Adurbe
This will become a textbook event in my view. What we don’t know is what the ongoing impact will be. It will almost certainly have a GDP level effect on some economies, will that perpetuate a loss in confidence in the sector? We will have to wait and see!

2018-01-04 10:53 am
Alfman verbose=1
Adurbe,
This will become a textbook event in my view. What we don’t know is what the ongoing impact will be. It will almost certainly have a GDP level effect on some economies, will that perpetuate a loss in confidence in the sector? We will have to wait and see!
We always knew monocultures were bad, this is what we get for ignoring it. At least there’s two implementations with intel and amd, but we really need a healthy market for alternatives to x86. Many of us here have been asking for that, but will society ever learn?
On the one hand we should remind everyone that at least it’s a read-only vulnerability, but on the other hand much of the infrastructure that could be targeted contains SSL certificates, user credentials, SSH authentication keys, private vpn keys, etc, so the potential for damages is huge and the expense of replacing all vulnerable hardware at once is inconceivable.

2018-01-04 11:25 am
Adurbe
I don’t disagree. A monopoly has meant this issue is not something that is easily contained and countless systems will not be in a position to be updated. If you thought the WannaCry attack on the UK’s NHS was bad, this has a terrifying potential.
Maybe it’s time for Oracle to bring back the SPARC and POWER to get a decent kick again.
On a plus note, nice cheap servers will be on Ebay soon!
2018-01-04 12:23 pm
Vanders
We always knew monocultures were bad, this is what we get for ignoring it. At least there’s two implementations with intel and amd, but we really need a healthy market for alternatives to x86.
Spectre also affects ARM. It potentially affects any architecture with speculative branch execution.

2018-01-04 8:33 pm
Alfman verbose=1
Vanders,
Spectre also affects ARM. It potentially affects any architecture with speculative branch execution.
Yes potentially, but we’ve already seen that AMD is harder to exploit and more secure than intel on the very same x86 architecture due to pipeline differences between them. If we had a healthy diversity in architectures, we’d be increasing the odds that some would be resistant.
If you don’t mind an analogy, it’s similar to how bananas and coffee beans have been inbred so much that we’ve rendered the world supply highly vulnerable to a single strain of disease that is wiping them out.
https://www.nytimes.com/2014/05/06/business/international/fungus-cri…
Scientists are scrambling to add genetic diversity to add resistance, and this is what we must do with computers too.
Upon reflection, there are some shortcomings of the x86 design that exacerbate the threat. Like sharing cache across security boundaries. But fixing this isn’t enough as spectre can measure timing from oracles inside the target itself. It remains to be seen if we can just increase the noise to damage the signal to noise ratio that these timing attacks fundamentally depend on.
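As a back-of-the-envelope check on the noise idea, here is a small Python sketch. The function name, the Gaussian noise assumption and the default z are all mine and only a rough model: if the attacker averages n timing samples, the noise shrinks to sigma / sqrt(n), so added noise raises the sample count quadratically rather than closing the channel.

```python
import math


def samples_needed(gap_ns, noise_sigma_ns, z=3.0):
    """Rough number of timing samples needed to tell a cache hit from a miss.

    Assumes independent Gaussian noise. Averaging n samples shrinks the noise
    to sigma / sqrt(n); we ask for the hit/miss gap to span 2*z of those.
    """
    return max(1, math.ceil((2 * z * noise_sigma_ns / gap_ns) ** 2))
```

Under these assumptions, doubling the injected noise only quadruples the number of measurements the attacker has to average.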
Since spectre requires the presence of a specific type of code pattern in the target, one solution might be for CPUs to automatically detect this code pattern and to report it to the OS so that it can mitigate it. There are a lot of ideas that could potentially work, but the key is never having all our eggs in too few baskets since that’s an unmitigated disaster.
Edited 2018-01-04 20:35 UTC

2018-01-04 10:53 pm
dsmogor
The same verification could be done statically by the os in the ELF loader
2018-01-05 1:18 am
Alfman verbose=1
dsmogor,
The same verification could be done statically by the os in the ELF loader
It could be somewhat problematic because of the CISC nature of x86 opcodes and its von Neumann architecture which blurs the distinction between code and data.
An ELF code scanner could scan the binary for problematic patterns, but it could pick up byte patterns that never end up being executed and miss others that do.
To illustrate:
Assuming many megabytes of potentially vulnerable code.
db A B C D E F G H I J K L M N O P…
This scanner could start at A and go to Z, but some of those bytes may be data instead of code. F-J could say “hello”, which by sheer coincidence could trigger a vulnerable pattern even though it never gets executed. Second of all you have no idea where an instruction starts due to indirect jumps (common with some software design patterns). So FGH may represent a vulnerable opcode pattern while GHI doesn’t, but the code scanner has no idea which, if either, might be executed ahead of time without evaluating every possible branch of code under every possible input. To say nothing of DEG and HIJ, etc…
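The same illustration as a small Python sketch. PATTERN is a hypothetical stand-in (real gadgets aren't a fixed byte string), and the fixed 3-byte instruction width is a simplification, since x86 instructions are variable length. It only shows that a byte-level scan and a decode from a given entry point can disagree.

```python
# Hypothetical "vulnerable" byte pattern, for illustration only.
PATTERN = b"FGH"


def scan_all_offsets(image, pattern=PATTERN):
    """Report every offset where the pattern occurs, ignoring instruction boundaries."""
    hits = []
    start = image.find(pattern)
    while start != -1:
        hits.append(start)
        start = image.find(pattern, start + 1)
    return hits


def scan_aligned(image, entry, width=3, pattern=PATTERN):
    """Decode fixed-width toy instructions from one entry point and report matches."""
    return [off for off in range(entry, len(image) - width + 1, width)
            if image[off:off + width] == pattern]


image = b"ABCDEFGHIJKLMNOP"
print(scan_all_offsets(image))  # byte-level scan flags offset 5
print(scan_aligned(image, 0))   # decoding from A never sees FGH as an instruction
print(scan_aligned(image, 2))   # decoding from C does
```

Whether offset 5 matters depends entirely on which entry point actually gets executed, and that is exactly what the scanner can't know ahead of time.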
The CPU on the other hand does know, because it is actually executing the code in question, so it can just tell the OS at that time that a potential side channel leak is occurring. It would even work with JIT code compilation. I concede this would add complexity to both the CPU and the OS, which I’m not a fan of at all especially if there are better ideas that can work.
Edited 2018-01-05 01:29 UTC
2018-01-05 7:12 am
dsmogor
Perhaps then a JIT like crawling verification could be implemented that would interpret the code on the 1st pass instead of passing it straight to the CPU. I’d take a slight hit on 1st pass but that’s still better than flushing the TLB on each syscall.
2018-01-05 10:46 am
Vanders
AMD is harder to exploit with the Intel code; there’s no reason somebody can’t develop an AMD specific version that works just as well as the Intel version.
Also static analysis isn’t going to help: there’s almost certainly more than one way to exploit this, and it doesn’t help with JIT languages at all (which is why we’re seeing patches arrive to perform process isolation in web browsers: it seems likely it can be exploited in Javascript!)
2018-01-05 7:35 pm
Alfman verbose=1
Vanders,
[q]AMD is harder to exploit with the Intel code; there’s no reason somebody can’t develop an AMD specific version that works just as well as the Intel version. [/q]
It has nothing to do with “intel code”. When you read the papers you can see that the engineers were struggling to attack AMD as well. The only attack they were able to achieve was when they had some control over code inside the kernel (via netfilter’s dynamic code compiler). The different design characteristics in AMD’s speculative execution make it more resistant. This doesn’t mean the attacks are impossible on AMD, but the side channel leaks are weaker even in pristine lab conditions and that’s an advantage over intel.
[q]Also static analysis isn’t going to help: there’s almost certainly more than one way to exploit this, and it doesn’t help with JIT languages at all (which is why we’re seeing patches arrive to perform process isolation in web browsers: it seems likely it can be exploited in Javascript!)[/q]
I haven’t recommended static analysis at all. dsmogor mentioned it earlier, but in my response I pointed out why static analysis would be very challenging.
However if the CPU were to collect timing entropy information in real time as a result of the branch predictor, then that could help the OS mitigate the problem before returning to the caller.
2018-01-06 10:26 am
Vanders
It has nothing to do with “intel code”. When you read the papers you can see that the engineers were struggling to attack AMD as well.
The exploit, as presented, works well on Intel CPUs but not AMD CPUs, for the reasons you give. That doesn’t mean that it can’t be done just as effectively on AMD; it’s just that the precise details are going to be slightly different, and we haven’t worked out yet how to do it because the researchers focused on Intel components first. So you have an exploit that works well on Intel (but poorly on AMD), and you’ll probably have one that works well on AMD (but poorly on Intel): “Intel code” and “AMD code”.
2018-01-06 11:29 am
Alfman verbose=1
Vanders,
The exploit, as presented, works well on Intel CPUs but not AMD CPUs, for the reasons you give. That doesn’t mean that it can’t be done just as effectively on AMD; it’s just that the precise details are going to be slightly different, and we haven’t worked out yet how to do it because the researchers focused on Intel components first. So you have an exploit that works well on Intel (but poorly on AMD), and you’ll probably have one that works well on AMD (but poorly on Intel): “Intel code” and “AMD code”.
I just take issue with your “just as effectively” claim since AMD doesn’t speculate as deeply as intel. It doesn’t necessarily mean spectre can’t work on AMD with a naturally occurring code pattern, but it does mean attackers are statistically less likely to find vulnerable side channels that work for AMD. I guess time will tell if there are any spectre attacks that only work on AMD, but given what I know right now I have every reason to believe that every attack that works on AMD will also work on intel but not necessarily the reverse.
Anyways it doesn’t matter, this whole thing has me flustered. It’s only a matter of time before it starts getting exploited in the wild. Although it may have been exploited by spy agencies for a long time and we would have never known it!
2018-01-06 10:11 pm
Vanders
Sure, I get what you’re saying. I think it’s fair to say though that when it comes to Spectre “easy” is kind of relative. We’re basically talking about the difference in threading a needle with your hands behind your back, and threading a needle with your hands behind your back blindfolded. One is harder, and more impressive, but not by as much as it seems.

2018-01-04 2:01 pm
Megol
Adurbe,
This will become a textbook event in my view. What we don’t know is what the ongoing impact will be. It will almost certainly have a GDP level effect on some economies, will that perpetuate a loss in confidence in the sector? We will have to wait and see!
We always knew monocultures were bad, this is what we get for ignoring it. At least there’s two implementations with intel and amd, but we really need a healthy market for alternatives to x86. Many of us here have been asking for that, but will society ever learn?
On the one hand we should remind everyone that at least it’s a read-only vulnerability, but on the other hand much of the infrastructure that could be targeted contains SSL certificates, user credentials, SSH authentication keys, private vpn keys, etc, so the potential for damages is huge and the expense of replacing all vulnerable hardware at once is inconceivable.
Unless with “monocultures” you mean processors with speculative execution I think you are mistaken.
This isn’t an x86 problem. It could be construed as an operating system problem but that is far-fetched as even a clean microkernel could be vulnerable.
The problem is that of shared caches and shared branch predictor state(s):
The attacker primes the branch predictor in order to make the victim have a branch mispredict at a selected instruction sequence.
The attacker then makes the victim execute that instruction sequence.
The information gained is read by either direct access of a shared memory area or indirectly by statistical analysis trying to detect if the code sequence has flushed some data that was resident before.
So:
Shared branch predictor state means arbitrary information can be read.
Shared caches means that hit/miss timing can be used to transfer the information.
A clean microkernel with copying semantics (avoiding shared memory) makes the leakage part much harder but still possible.

2018-01-04 9:11 pm
Alfman verbose=1
Megol,
[q]This isn’t an x86 problem…. [/q]
[q]So:
Shared branch predictor state means arbitrary information can be read.
Shared caches means that hit/miss timing can be used to transfer the information.
A clean microkernel with copying semantics (avoiding shared memory) makes the leakage part much harder but still possible.[/q]
Shared state is definitely the problem, and the solutions going forward will invariably have to remove shared state and to reset branch prediction in certain code paths.
But I already pointed out how sparc was ahead of x86 in this regard since it already did separate page table entries by process. The worldwide consolidation to x86 has had negative repercussions in terms of making the vast majority of the world’s servers share the same security flaws. To be clear, I’m not picking on intel here, having a monoculture of ANY given architecture is bad in and of itself.
Edited 2018-01-04 21:27 UTC

2018-01-04 9:26 pm
galvanash
Shared state is definitely the problem
Isn’t it always?
I kid, but seriously, it amazes me after 25 years of computer work just how often the pendulum swings back around on this one…
Just have to figure out how to apply the Actor Model to hardware
Edited 2018-01-04 21:31 UTC
2018-01-05 12:52 pm
Megol
Megol,
This isn’t an x86 problem….
So:
Shared branch predictor state means arbitrary information can be read.
Shared caches means that hit/miss timing can be used to transfer the information.
A clean microkernel with copying semantics (avoiding shared memory) makes the leakage part much harder but still possible.
Shared state is definitely the problem, and the solutions going forward will invariably have to remove shared state and to reset branch prediction in certain code paths.
But I already pointed out how sparc was ahead of x86 in this regard since it already did separate page table entries by process. The worldwide consolidation to x86 has had negative repercussions in terms of making the vast majority of the world’s servers share the same security flaws. To be clear, I’m not picking on intel here, having a monoculture of ANY given architecture is bad in and of itself.
Don’t mistake the brain-dead Intel bug with x86 – it’s not x86 specific.
Don’t mistake a specific OS design choice as the only choice. A SASOS wouldn’t be affected AFAIK, not using a shared pagetable in order to optimize performance wouldn’t be affected AFAIK.
AMD doesn’t have the page table problem as they don’t have a brain-dead bug. Which makes them as safe as the SPARC.
The other problem is one more general but one that can be solved in several ways. Tagging branch prediction entries, flushing branch prediction state at context switch etc.
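A minimal Python sketch of those two options, under the assumption that the predictor can be modelled as a simple lookup table. ToyPredictor, the context names and the branch address are hypothetical; real predictors are far more complex and index on partial address bits and history.

```python
class ToyPredictor:
    """Toy branch target table; tagged=True keys entries by context as well."""

    def __init__(self, tagged=False):
        self.tagged = tagged
        self.table = {}

    def _key(self, context, branch):
        return (context, branch) if self.tagged else branch

    def train(self, context, branch, target):
        self.table[self._key(context, branch)] = target

    def predict(self, context, branch):
        return self.table.get(self._key(context, branch))

    def flush(self):
        # What a flush at context switch would do in this model.
        self.table.clear()


shared = ToyPredictor()
shared.train("attacker", 0x40, "gadget")
print(shared.predict("victim", 0x40))   # attacker's training steers the victim

tagged = ToyPredictor(tagged=True)
tagged.train("attacker", 0x40, "gadget")
print(tagged.predict("victim", 0x40))   # None: the entry belongs to another context
```

Either tagging or calling flush() on every switch removes the cross-context influence in this model; the trade-off is extra tag bits in one case and lost predictions in the other.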
But as long as there are caches there will always be unintended sharing of information, while flushing the L1 data cache on a context switch may be reasonable (but expensive) flushing L2, L3, L4 etc. would not be.
And even without caches one could theoretically gain information from DRAM access times – recent DRAM pages are faster to access.
2018-01-05 8:15 pm
Alfman verbose=1
Megol,
[q]Don’t mistake the brain-dead Intel bug with x86 – it’s not x86 specific.
Don’t mistake a specific OS design choice as the only choice. A SASOS wouldn’t be affected AFAIK, not using a shared pagetable in order to optimize performance wouldn’t be affected AFAIK.
AMD doesn’t have the page table problem as they don’t have a brain-dead bug. Which makes them as safe as the SPARC. [/q]
Yes and no, the thing is the two go together hand in hand. Because of the x86 design, operating systems were encouraged to use shared page tables for performance reasons. You are technically correct that they didn’t have to use shared page tables. In fact the countermeasure for meltdown is to stop using shared page tables, but the motivation for doing that in the first place was to reduce the overhead caused by not doing it on x86. That is an x86 problem, one that sparc did not have. Who knows, in the future x86 may adopt sparc’s page table model. x86’s forward/backward compatibility requirements would give x86 even more complex legacy baggage though. (I’m tempted to do away with x86 altogether and do a “reboot” as they say in hollywood to get rid of all the legacy complexities).
AMD doesn’t have the page table problem as they don’t have a brain-dead bug. Which makes them as safe as the SPARC.
Sure, regarding the meltdown attack, we can fault intel for not respecting security boundaries in speculative execution paths, something amd apparently isn’t vulnerable to.
The other problem is one more general but one that can be solved in several ways. Tagging branch prediction entries, flushing branch prediction state at context switch etc.
But as long as there are caches there will always be unintended sharing of information, while flushing the L1 data cache on a context switch may be reasonable (but expensive) flushing L2, L3, L4 etc. would not be.
There actually is a way to eliminate the side channel leaks via cache usage in speculative code, and that’s to produce the same side effects regardless of which path was taken. Then all the timing information would remain ambiguous. Unfortunately this approach doesn’t scale well. And it can’t handle indirect branching.
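To sketch what "same side effects regardless of path" means, here is a toy Python model. The function names and the "line_A"/"line_B" labels are made up, and it assumes each path touches its own distinct cache line. It also shows why I say it doesn't scale: covering every combination of nested two-way branches doubles the footprint at each level.

```python
def leaky_branch(secret_bit):
    """Cache lines touched when only the taken path runs."""
    return {"line_A"} if secret_bit else {"line_B"}


def balanced_branch(secret_bit):
    """Touch the lines of both paths, so the footprint ignores the secret."""
    return {"line_A", "line_B"}


def lines_to_touch(nested_branches):
    """Footprint size if every combination of nested two-way branches is covered."""
    return 2 ** nested_branches


print(leaky_branch(0) == leaky_branch(1))        # False: the footprint reveals the bit
print(balanced_branch(0) == balanced_branch(1))  # True: nothing to measure
print(lines_to_touch(10))                        # 1024
```

An indirect branch has no fixed, small set of targets to balance against, which is why this approach has no answer for it.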
[q]And even without caches one could theoretically gain information from DRAM access times – recent DRAM pages are faster to access.[/q]
I never knew that. Do you have links about this?
Edited 2018-01-05 20:29 UTC
2018-01-07 4:13 pm
dionicio
“But as long as there are caches there will always be unintended sharing of information,..”
Not On an Actor Model. Retaking transputer path?
https://en.wikipedia.org/wiki/Transputer
But I would refer those who were able to keep their copies to the excellent Byte magazine.
Edited 2018-01-07 16:30 UTC
2018-01-07 4:39 pm
dionicio
Isn’t the controller of a hybrid hard disk a transputer? If not, it should be.

2018-01-04 11:01 pm
dsmogor
In fact the immediate fix to the problem will remove many performance advantages monolithic kernels possess. Could it be a window of opportunity for microkernels?

2018-01-04 5:00 pm
subsider34
AMD issued a statement addressing each of the Variants in turn. According to them, their processors are only really vulnerable to Variant One.
http://www.amd.com/en/corporate/speculative-execution
Other sources are reporting that AMD’s mention of a “near zero risk of exploitation of [Variant Two]” is in reference to how it is only exposed when the Kernel’s BPF JIT compiler is enabled.
http://www.zdnet.com/article/google-reveals-trio-of-speculative-exe…
AMD seems to be coming out best in this. Their choice to implement speculative execution only when it doesn’t have the potential to cause memory access violations (unlike Intel) is proving prophetic. Out of Intel, AMD, and ARM processors, AMD is the only one immune to the Meltdown exploit.
https://www.anandtech.com/show/12214/understanding-meltdown-and-spec…
Edited 2018-01-04 17:00 UTC
2018-01-04 9:07 pm
MrMosis
Is it time to get my Itaniums back out?
2018-01-04 9:19 pm
laffer1
According to Redhat, POWER 8 and 9 are affected as well by spectre. It’s not limited to ARM, AMD and Intel.
2018-01-04 10:26 pm
psychicist
Is Itanium also affected? What about SPARC and mainframe processors?

2018-01-05 10:53 am
Vanders
SPARC is probably affected since the SPARC T4 (i.e. the first CPU to support Out of Order Execution)
Itanium isn’t as it doesn’t support Out of Order Execution: the very thing that made it hard to write for effectively. I don’t know if there’s some irony there or not, but the two remaining users of Itanium can celebrate I guess?

2018-01-04 10:37 pm
gld59
And… patches for Windows 10 and for Firefox have already started rolling out. (They hit my computer 30 to 60 minutes ago.)
2018-01-05 5:37 am
Munchkinguy
I knew I should have stayed with my MIPS processor!
2018-01-05 6:59 am
dsmogor
One important fact about the Google Project Zero article is how they successfully reverse engineered the branch predictor behaviour using timing characteristics.
With branch predictors being one of the most closely guarded topics of CPU companies, it’s not unreasonable to expect Chinese CPU companies (incl. ones sponsored by the govt) to have tried this a long time ago. If they tried this hard, it doesn’t take a genius to draw the security conclusions, which means it’s likely they have been in possession of the technique for quite a long time.
Another repercussion of this is that it weakens all private memory based DRM schemes, so I’d expect efforts to hack PS4 / Xbox to get a solid shot in the arm.
2018-01-05 7:54 pm
dionicio
All quotes are from Wikipedia:
“Jaguar does not feature clustered multi-thread (CMT), meaning that execution resources are not shared between cores..”
But in-core exploit still plausible:
“Out-of-order execution and speculative execution”
Successor Puma Architecture a little better shielded:
“Support for ARM TrustZone via integrated Cortex-A5 processor”
Where Cortex-A5:
“Single-issue, in-order microarchitecture with an 8-stage pipeline”
This Later Arch could be, effectively, software patched.

2018-01-07 11:20 am
bhtooefr
Successor Puma Architecture a little better shielded:
“Support for ARM TrustZone via integrated Cortex-A5 processor”
Where Cortex-A5:
“Single-issue, in-order microarchitecture with an 8-stage pipeline”
This Later Arch could be, effectively, software patched.
That’s… not how this works.
The Cortex-A5 is being used as a security coprocessor, to do things like hold security keys and the like. The Cortex-A5’s pipeline has no bearing on what arbitrary code running on the actual x86 CPU cores is able to do, other than providing isolation of those keys.

2018-01-07 4:48 pm
dionicio
Thanks Bhtooefr. Until this moment I thought of it as a Security PREprocessor.