Colin Percival, a FreeBSD committer and security team member, has found a local exploit against the current implementation of Intel’s Hyper-Threading Technology. “Hyper-Threading, as currently implemented on Intel Pentium Extreme Edition, Pentium 4, Mobile Pentium 4, and Xeon processors, suffers from a serious security flaw,” Colin explains. “This flaw permits local information disclosure, including allowing an unprivileged user to steal an RSA private key being used on the same machine. Administrators of multi-user systems are strongly advised to take action to disable Hyper-Threading immediately.”
So far all of the examples he seems to have offered are BSDs, and while this is completely understandable (BSD seems to be his area of expertise), perhaps he should’ve taken the time to address Windows concerns more explicitly.
It would be interesting to see if an analogous problem exists for G/Power series processors.
“The flaw affects all operating systems, and for a secure multi-user environment essentially requires that Hyper-Threading be disabled. More information can be found on Colin’s web page on the topic.”
http://www.daemonology.net/hyperthreading-considered-harmful/
#1 Do I need to worry about my home computer?
Probably not. This security flaw is primarily a problem for servers.
Therefore, most users aren’t concerned.
From what I’ve read, the security problem concerns the HT implementation on some Intel processors. He reported the issue to the OS vendors, who are the ones who should investigate the extent to which they’re affected.
So, I think it’s up to the Windows, Linux, etc. security teams to address the issue on their OS, exactly like the *BSDs have promptly done.
Kudos to Colin Percival, from FreeBSD’s security team. This is a great service, both to FreeBSD users and to the users of other OSes sharing a similar implementation – from what I understand, Windows and Linux are included.
“So, I think it’s up to the Windows, Linux, etc. security teams to address the issue on their OS, exactly like the *BSDs have promptly done.”
what makes you think it affects any OS other than BSD’s which have copied from each other to make their implementation a security hole
I know you’re just trolling, but to clarify – it’s a problem with the *hardware*. So it will affect all OSes on HTT boxes.
what makes you think it affects any OS other than BSD’s
This sentence from CP makes me think they *might* be affected:
“Hyper-Threading, as currently implemented on Intel Pentium Extreme Edition, Pentium 4, Mobile Pentium 4, and Xeon processors, suffers from a serious security flaw.”
Also, the kerneltrap article:
“The flaw affects all operating systems, (…)”
… which have copied from each other to make their implementation a security hole
I would slow down on that.. It’s universally acknowledged that the BSDs are among the most secure OSes in the world.
This is just one study:
http://www.mi2g.com/cgi/mi2g/press/021104.php
“The world’s safest and most secure 24/7 online computing environment – operating system plus applications – is proving to be the Open Source platform of BSD (Berkeley Software Distribution) and the Mac OS X based on Darwin.”
http://www.daemonology.net/papers/htt.pdf
How complicated is the exploit? I have seen a lot of these problems surface that, although they may be a problem, are not practical to exploit.
“I know you’re just trolling, but to clarify – it’s a problem with the *hardware*. So it will affect all OSes on HTT boxes.”
That’s NOT clear at all.
“I would slow down on that.. It’s universally acknowledged that the BSDs are among the most secure OSes in the world.
This is just one study:
http://www.mi2g.com/cgi/mi2g/press/021104.php ”
It would be very nice if you people stopped quoting Laura DiDio from Yankee Group and mi2g. These two are at the top of the list of unreliable sources. I am not claiming that BSD is not secure, by the way, just that you guys should stop assuming that every OS is affected.
>these two are the top on the list of unreliable sources
Well.. actually, I’d say the top of that list is occupied by the people who decide not to show their IP.
“Well.. actually, I’d say the top of that list is occupied by the people who decide not to show their IP. ”
It’s a matter of privacy really, and the fact that I don’t control my DHCP gateway.
How did you do that?!?
if you read the source you’ll see the repeated need for the rdtsc instruction which can only be executed with a cpl of zero, if you had that privilege level you could just core dump the entire process.
if you read the source you’ll see the repeated need for the rdtsc instruction which can only be executed with a cpl of zero, if you had that privilege level you could just core dump the entire process.
The purpose of this exploit is to create a covert channel. Causing the process you’re spying on to dump core is a sure give away and doesn’t lend to gathering keys covertly.
rdtsc can be executed by non-privileged processes on most OSes that I’m aware of:
#include <stdio.h>
#include <sys/types.h>
#include <machine/cpufunc.h>
int main(int argc, char **argv)
{
    /* rdtsc() comes from FreeBSD's <machine/cpufunc.h> and returns a 64-bit count */
    printf("tsc is %lld\n", (long long)rdtsc());
    return 0;
}
Works on FreeBSD as a non-privileged user. mph must be mistaken that it requires special privileges to run.
This is a hardware problem. The shared cache of the HTT is what the problem is.
Intel recommends that, for performance reasons, one schedule from the same address space on the different HTTs. However, all OSes that I’m aware of schedule different processes (address spaces) on the different HTTs. In fact, I believe that HTTs show up as CPUs unless the OS takes measures to notice that they are just virtual CPUs. This is why OpenBSD users need to disable HTT in the BIOS, for example.
Linux definitely does scheduling in an unsafe manner for HTT processors.
Confirmed. Runs outside of ring 0.
What the hell is FreeBSD Core doing wasting their time here :-P.
The info about rdtsc being restricted to ring 0 was from the Intel docs, which say it generates a GPF if executed outside of that ring. The method used to call rdtsc from within the OS appears to be a function call, which in itself would add latency to the procedure, rendering it useless in its current form.
First, you should look at the generated code for the program I posted. The following code:
uint64_t x1=rdtsc(); uint64_t x2=rdtsc();
produces:
rdtsc
movl %eax, %esi
movl %edx, %edi
rdtsc
movl %eax, %ecx
subl %esi, %ecx
Modifying the program that I posted here before, I get a resolution of 84 cycles in the values returned from rdtsc(). This corresponds to ~25ns in resolution. If there are cache effects, they are minimal. Running the program 20 times in a row gives me 84 every single time.
Do you have other HARD NUMBERS to back up your claim that the TSC is useless for timing from userland? These simple experiments appear, on the surface, to support Colin’s claims.
I’m sure that if this guy spent three hard-working months investigating and contacting people about this, he wouldn’t have overlooked the fact that rdtsc was ring 0 (if that was the case).
I didn’t claim that the TSC was useless from userland. The point that I was trying to get across was about the use of the rdtsc instruction within ring 3, where I believe it should be disabled (via CR4). In order to retrieve an rdtsc value one must then use a call gate, which would increase the latency of the instruction.
Perhaps even beyond the limited 100 cycle window, depending on the implementation.
… someone asked if he/she should worry about his/her home computer. This flaw only matters if multiple people use the same machine, and those people want privacy from each other. Given two users, Alice and Bill, Alice might be able to run a program that can steal Bill’s private RSA key.
But if Bill is the only user of his machine, he can hyperthread to his heart’s content, even if there’s a way that he can spy on himself.
Also, for a home machine, there are other, easier ways to defeat security. So even for shared home machines this doesn’t matter.
One possible workaround is to modify kernels so that two processes can’t share the same physical CPU, in different hyperthreads, unless both are run by the same user and neither is privileged. But this might not be necessary for all processes; we might want to mark processes with an attribute that says whether we care about this kind of information leak.
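The rule proposed above can be written down as a small predicate. This is only a sketch of the idea, not code from any real scheduler: the `proc_info` structure, its fields, and the `leak_sensitive` attribute are all hypothetical names made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical per-process record; a real kernel keeps this in its own structures. */
struct proc_info {
    unsigned int uid;      /* owning user */
    bool privileged;       /* runs with elevated rights */
    bool leak_sensitive;   /* hypothetical attribute: "we care about this kind of leak" */
};

/*
 * Returns true if the two processes may run at the same time on the two
 * hyperthreads of one physical CPU.
 */
bool may_share_core(const struct proc_info *a, const struct proc_info *b)
{
    /* If neither process is marked as caring about the leak, allow sharing. */
    if (!a->leak_sensitive && !b->leak_sensitive)
        return true;

    /* Otherwise require the same user and no privileged process on either thread. */
    return a->uid == b->uid && !a->privileged && !b->privileged;
}
```

With this rule, Alice’s and Bill’s sensitive processes would never be co-scheduled on one physical CPU, while two processes belonging to Alice still could be.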
How many Windows users run true multi-untrusted-user machines? I’ve seen it occasionally, but you can only run so many remote desktop sessions on a single OC3 line.
Setting the TSD flag is not the way to fix this problem. There are obviously legitimate uses for rdtsc outside of exploits.
#1 Do I need to worry about my home computer?
Probably not. This security flaw is primarily a problem for servers.
Therefore, most users aren’t concerned.
Until your favourite website goes down :p
>Modifying the program that I posted here before, I get a
>resolution of 84 cycles in the values returned from rdtsc()
>This corresponds to ~25ns in resolution. If there are cache
>effects, they are minimal. Running the program 20 times in a
>row gives me 84 every single time.
Hopefully you didn’t just run the timing ONCE per program invocation? If so, yes, you got “lucky”: the program executed its handful of instructions within its timeslice, and you got a similar time each time.
Now consider doing the timings in a loop a few million times, and comparing the times. Or trying to time something that takes a while, and think you can measure it with some nanoseconds resolution…
What if the scheduler kicks in, and schedules another process to run for some milliseconds. Or the OS decides to spend some time processing network packets. Or …
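The experiment suggested above (take the back-to-back timing many times and compare) could look roughly like the sketch below. It assumes GCC or Clang on x86, where `__rdtsc()` is available from `<x86intrin.h>`; on other platforms it falls back to the C11 `timespec_get()` clock, which counts nanoseconds rather than cycles. The names `read_ticks`, `delta_stats` and `sample_deltas` are made up for this example.

```c
#include <stdint.h>
#include <time.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
/* Read the time stamp counter directly from user level. */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Fallback for non-x86 builds: nanoseconds from the C11 wall clock. */
static inline uint64_t read_ticks(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

struct delta_stats {
    uint64_t min;   /* smallest gap seen between two back-to-back reads */
    uint64_t max;   /* largest gap seen, e.g. when the scheduler interrupted us */
};

/* Take n pairs of back-to-back readings and record the smallest and largest gap. */
struct delta_stats sample_deltas(unsigned long n)
{
    struct delta_stats s = { UINT64_MAX, 0 };
    for (unsigned long i = 0; i < n; i++) {
        uint64_t t0 = read_ticks();
        uint64_t t1 = read_ticks();
        uint64_t d = t1 - t0;
        if (d < s.min) s.min = d;
        if (d > s.max) s.max = d;
    }
    return s;
}
```

Calling `sample_deltas(1000000)` and printing both fields shows how far apart the best and worst cases are. A large `max` next to a small, stable `min` would suggest that interruptions are occasional outliers rather than something that ruins every measurement, but the actual numbers depend on the machine and its load.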
From the documentation:
———————————————————–
RDTSC Read from Time Stamp Counter
Copies the contents of the Time Stamp Counter (TSC) into EDX:EAX. (The Pentium maintains a 64-bit Time Stamp Counter (TSC) that is incremented every clock cycle.) When the Current Privilege Level is 0, the state of the TSD bit in CR4 does not affect the operation of this instruction. When the CPL is equal to 1, 2, or 3, the TSC may be read only if the TSD bit in CR4 is 0. Only a supervisor level program may modify the value of the TSC.
Flags: No flags affected.
Encoding: 00001111 00110001
Syntax Example Clock Cycles
RDTSC rdtsc 6, 11
———————————————————–
So as the above indicates, rdtsc is allowed to run from user level if the OS permits it. No exception is generated. I don’t know of an OS that does not permit the use of rdtsc by user level code.
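The access rule in the quoted documentation boils down to one condition. The function below only models that rule as plain C, for illustration; `rdtsc_allowed` is a made-up name and it does not read CR4 or execute rdtsc itself.

```c
#include <stdbool.h>

/*
 * Models the documented rule: at CPL 0 the TSD bit in CR4 is ignored;
 * at CPL 1, 2 or 3 the TSC may be read only if TSD is 0.
 */
bool rdtsc_allowed(unsigned int cpl, bool cr4_tsd)
{
    return cpl == 0 || !cr4_tsd;
}
```

So `rdtsc_allowed(3, false)` is true, which is the case the FreeBSD test program above runs into, and `rdtsc_allowed(3, true)` is false, which is the case an OS would create by setting TSD.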
HTT: threads are shared throughout the pipeline, no thanks!
The attack would be more effective if the x86 CPUs could not reorder instructions – it might be more effective on Itanium.
With respect to worrying about home computers:
You _should_ worry if you are not 100.00% sure there is no spyware or trojans on your computer and you use the computer for financial transactions (home-banking etc).
On the other hand, the spyware and trojans should make you worry already in that scenario, so this just gives them another tool with which to screw you.
The important thing here is _not_ if the computer is single user, but if you can trust all the code that runs on it or not.
Poul-Henning
In one of the Pentium programming books from 1997, “Inner Loops” by Rick Booth, he goes to great lengths to show us how to optimize asm code and time it to see that the optimization was worth doing.
After seeing that I used it right away; it is far more accurate, down to the cycle, than the system clock. I suspect it’s widely used today and taking it away would be a bad idea.
If you know how the cache works and want to snoop its activity then RDTSC could certainly tell you indirectly what it is doing. If this fella hadn’t thought of it, somebody else would have sooner or later.