Mass Effect is a popular franchise of sci-fi roleplaying games. The first game was initially released by BioWare in late 2007, exclusively on Xbox 360 as part of a publishing deal with Microsoft. A few months later, in mid-2008, the game received a PC port developed by Demiurge Studios. It was a decent port with no obvious flaws, at least until 2011, when AMD released its new Bulldozer-based CPUs. When playing the game on PCs with these modern AMD processors, two areas in the game (Noveria and Ilos) show severe graphical artifacts.
[…] What makes this issue particularly interesting? Vendor-specific bugs are nothing new, and games have had them for decades. However, to the best of my knowledge, this is the only case where a graphical issue is caused by a processor and not by a graphics card. In the majority of cases, issues are tied to a specific GPU vendor and don’t depend on the CPU, while in this case it’s the exact opposite. This makes the issue unique and worth looking into.
An extremely detailed look into the analysis and fix for this very specific bug – and a download with the fix, of course.
Well, it’s an interesting article, and props to them for getting it working. I wish they had dug a little further to identify the exact difference between D3DXMatrixInverse and XMMatrixInverse on AMD CPUs. They speculate that it could be a precision issue, but even rounding differences don’t clearly explain why the rendering goes black… Their job is done and they wrote a patch to avoid the faulty branch, but it seems like they got too tired of debugging to solve the mystery of the faulty branch, haha.
Given this is lighting we’re talking about, I’m willing to bet there’s a reciprocal square root estimation being done somewhere that is being fed values close to black (0) that go slightly negative on AMD SSE2. That would produce a NaN when the root is taken. Reciprocal square root estimation has historically been done with an eye toward speed rather than accuracy. Maybe the new code validates that the inputs are all positive before doing the estimate. No one validated inputs originally, since that could be a major performance hit on some systems. If you look at the old Doom code as an example, the column and segment renderers had conditionally compiled checks on their input parameters; those checks are normally disabled in the release binary and enabled only in the debug binary.
JLF65,
Maybe, though it’s still speculative. By far the most obvious place for square roots in game engines is distance computation, where the inputs are squared anyway and thus cannot be negative. Even allowing for numerical approximation, negative numbers would be highly suggestive of a bug somewhere. It’s really weird that Mass Effect triggers this bug with D3DXMatrixInverse on AMD and nothing else (that we know of) does. I’m definitely curious what’s happening, but ultimately it would take more specific input & output debugging to know for sure.
Inverse square roots are extremely common in lighting calculations. It’s also a pretty common classification of floating-point mistake that you do something like this:
float foo(float in)
{
    if (in == 4.0f)
        return 0.0f;
    else
        return 1.0f / ((in / 2.0f) - 2.0f);
}
In which case certain values will still cause a divide-by-zero. You can imagine other cases where it could end up resulting in a slightly negative number, etc.
FlyingJester,
Is that just a random function or is it supposed to represent something specific?
As you probably already know, in practice, using equality comparisons on floating-point variables as that code snippet does is asking for trouble…
https://stackoverflow.com/questions/17404513/floating-point-equality-and-tolerances
The algorithm, compiler, optimization level, SSE usage, etc. can all change the result of a floating-point equality comparison. IMHO, one should never use equality on floating point without taking extreme care and factoring in the underlying architectural implementation.
https://en.wikipedia.org/wiki/Extended_precision
Anyway, obviously I get why divide-by-zero can be a problem, but as it relates to this article, if there were divisions by zero, D3DXMatrixInverse returning NaN would make sense for both AMD and Intel. Hypothetically, maybe a floating-point comparison was used and could be the source of this bug…? Or maybe the AMD code path returns NaN whereas the Intel one returns +/-INF? Alas, we’d be in a better place to answer questions like this if the author had provided more output.
Alfman: It can work with inequality too. Imagine that it’s a sqrt() instead of a reciprocal.
FlyingJester,
Ah, so you meant greater-than and less-than rather than equality. So in effect you’re asking whether these two lines of code are identical…
Well, we can consider the edge cases.
Can you think of any other in-range numbers for x and y such that the lines are not equal? All the in-range numbers I tried worked (i.e. matched) as expected.
I suspect all the values in a real game engine would be finite, and even if it were possible to produce infinities, it isn’t really clear to me why Intel wouldn’t also experience it. I think it’s too convenient to blame precision. The glitch, whatever it is, happens 100% of the time, not just for colors below a certain value or something like that. I really think something else is amiss, but again, without data it’s just speculation.
Another reminder that even though Windows 8.1 with secdrv.sys enabled is theoretically bug-to-bug compatible with Windows 2000 and later, hardware-related incompatibilities do come up, though they often result in frame-pacing issues rather than graphical glitches.