“After struggling with this issue for well over a year and really pushing hard to track it down in the last two months, I was finally able to come up with a test case running ‘cc1’ from gcc-4.4 in a loop and get it to fail in less than 60 seconds. Prior to finding this case it would take anywhere up to 2 days with 48 cores fully loaded to reproduce the failure. AMD confirms. ‘…it isn’t every day that a guy like me gets to find an honest-to-god hardware bug in a major cpu!’”
The question is: how do you patch a CPU?
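For anyone curious what "running a command in a loop until it fails" looks like in practice, here is a minimal sketch. It is not the actual test case from the post: the function name `run_until_failure` and its parameters are made up for illustration, and the real reproduction ran gcc's `cc1` on inputs not shown here.

```python
import subprocess
import sys
import time


def run_until_failure(cmd, time_limit_s=60.0, max_runs=None):
    """Run `cmd` repeatedly until it fails, the time limit expires,
    or `max_runs` iterations have completed.

    Returns (runs, returncode) for the first failing run, or
    (runs, None) if no run failed. On POSIX a negative return code
    means the process was killed by that signal (e.g. -11 for SIGSEGV).
    """
    deadline = time.monotonic() + time_limit_s
    runs = 0
    while time.monotonic() < deadline and (max_runs is None or runs < max_runs):
        runs += 1
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            return runs, result.returncode
    return runs, None


if __name__ == "__main__":
    # Hypothetical stand-in command; substitute the real workload here.
    runs, rc = run_until_failure([sys.executable, "-c", "pass"], time_limit_s=5.0)
    print(f"runs={runs} first_failure={rc}")
```

A bug like this one only shows up under load, so in practice you would start many copies of such a loop in parallel, one per core.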
BIOS update to try and address the microcode bug?
Since (outside of the people who visit this site) most people never update their BIOS, most operating systems will also load updated microcode.
The updated CPU microcode is volatile and therefore must be patched by either the BIOS or OS every boot.
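Since the loaded microcode can differ from boot to boot, it is worth knowing which revision is actually running. On x86 Linux, `/proc/cpuinfo` reports a `microcode` field per logical processor. The sketch below parses it; the function names are my own, and it assumes the usual `key : value` layout of that file (other platforms may not expose the field at all, in which case you get an empty result).

```python
def parse_microcode_revisions(cpuinfo_text):
    """Map logical processor number -> microcode revision string.

    Assumes the `key : value` layout used by Linux /proc/cpuinfo.
    """
    revisions = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key = key.strip()
        value = value.strip()
        if key == "processor" and value.isdigit():
            current = int(value)
        elif key == "microcode" and current is not None:
            revisions[current] = value
    return revisions


def read_microcode_revisions(path="/proc/cpuinfo"):
    """Read the file at `path`; return {} if it cannot be opened."""
    try:
        with open(path) as f:
            return parse_microcode_revisions(f.read())
    except OSError:
        return {}


if __name__ == "__main__":
    for cpu, rev in sorted(read_microcode_revisions().items()):
        print(f"cpu{cpu}: microcode {rev}")
```

Comparing the reported revision before and after a BIOS or OS update is a quick way to see whether a new microcode patch was actually applied.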
That depends: is it a silicon bug (actual hardware) or a microcode bug (base instruction execution code)?
I am no major-processor expert, but I am an EE and normally if a bug is present in hardware (which costs bazillions of dollars to fix) a patch is created in software (firmware, microcode or whatever) to work around it.
According to the post, AMD confirmed it is a hardware bug. These I believe have been seen in the past, and the last one I remember (don’t quote me on this) was actually fixed by disabling certain instructions so that the bug would not happen.
Maybe they can take advantage of the hardware bug. Just document it and it’s no longer a problem – all software written to the AMD documentation should work as advertised.
AMD working on fixing the bug instead of covering it up makes me like the company even more.
That would explain the random seg faults I’ve been getting on my home server about once every 3 to 6 months.
>.<
It’s not a Joe Shmoe like Bill, Shooter of Bul, who found this CPU bug; it’s Matthew, Father of DragonFly BSD and HammerFS, Dillon.
It would take someone like him, who’s stubborn enough and secure enough in his own ability to fork off a major operating system, to find a CPU bug.
Me, I’m excited when I find a bug in my operating system (if it’s not Windows), database, or programming language and track it down in code. So I imagine he’s feeling like that, only × 1000.
Hope it hasn’t blown his brain away with ecstasy :/
Kochise
He’s also a compiler developer, having written the DICE C compiler in the past, so he knows/understands the inner workings of a CPU, assembler, etc. It’s not like John Q. Public, some random OS developer, found the bug.
phoenix,
“…so he knows/understands the inner workings of a CPU, assembler, etc. It’s not like John Q. Public, some random OS developer, found the bug.”
As a random OS developer myself, I feel totally dissed.
At what point did “OS developer” fall to the same level as a “script writer”? No disrespect. What’s taken the top spot for prestigious technical occupation?
Big hardware companies
All software developers, OS and compiler included, must kneel before whatever crap they come up with. And god, how bad it can get…
Edited 2012-03-06 22:43 UTC
Baller.
This news story along with your comment made my day.
If you get a thrill out of finding bugs in operating systems, Haiku nightly builds should be a constant orgasm for you.
https://en.wikipedia.org/wiki/Pentium_FDIV_bug
Hopefully AMD is looking closely at what Intel did in this situation.
akkad,
These are more common than you might realize. Having hundreds of “errata” isn’t unusual. For its part, AMD already has experience with CPU faults: for example, AMD’s Phenom processor line once exhibited caching errors, and the solution for processors in the field was to disable certain caches.
http://techreport.com/articles.x/13741
Something I didn’t know until searching today is that Linux kernel patches were released to work around the processor bug… so there’s another answer to the first poster’s question, “How do you patch a CPU?”