Windows 7 RTM Build 7600.16385 includes a potentially fatal bug that, once triggered, could bring down the entire OS in a matter of seconds: “The bug in question – a massive memory leak involving the chkdsk.exe utility – appears when you attempt to run the program against a secondary (i.e. not the boot partition) hard disk using the “/r” (read and verify all file data) parameter. The problem affects both 32-bit and 64-bit versions of Windows 7 and is classified as a ‘showstopper’ in that it can cause the OS to crash (Blue Screen of Death) as it runs out of physical memory,” reports InfoWorld’s Randall Kennedy. Microsoft is claiming the bug is a chipset driver issue, but Kennedy’s testing of the latest Intel INF Update Utility driver set and VMware virtualized chipset drivers suggests otherwise. “This is clearly a Microsoft bug – and the fact that it manifests itself via the chkdsk.exe utility makes me wonder if it isn’t something intrinsic to the Windows 7 version of the New Technology File System (NTFS) driver stack.” Worse still, user comments suggest that Windows Server 2008 R2 suffers from the same flaw.
…but it should literally take the engineers one day to find and fix the bug. It’s probably something relatively simple, possibly an optimization that they did not think through.
I say this having had experience maintaining operating system and file system utilities for a major OS/company.
LOL! Now I feel like I can’t post sentences like the previous one because of the “10 most annoying” thread.
Maybe, but quick, untested fixes tend to require more fixes later.
There is that… sometimes.
Only “sometimes”? In my experience, a quick fix made under time pressure will almost inevitably introduce at least one new bug…
In my experience sometimes the problem was so simple that it is obvious, and requires only a quick and small code change.
Allow me to elaborate… two “showstopper” bugs I have fixed that caused OS crashes were simple initialization issues. Shared memory variables were initialized, then cleared (accidentally) by another section of code, then referenced.
In these two cases it literally came down to moving two lines of code and documenting that certain things need to happen in a certain order.
…admittedly it was because of a rarely taken path in the code… but you get the point.
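To illustrate the kind of ordering bug described above, here is a minimal sketch. It is not code from any real OS; every name in it (`SharedState`, `init_table`, `reset_subsystem`, `use_table`) is hypothetical, and a plain Python object stands in for the shared memory.

```python
class SharedState:
    """Hypothetical stand-in for a block of shared memory."""

    def __init__(self):
        self.table = None


def init_table(state):
    # Step A: initialize the shared variable.
    state.table = {"ready": True}


def reset_subsystem(state):
    # Step B: an unrelated reset routine that clears the same variable.
    state.table = None


def use_table(state):
    # Step C: reference the shared variable; fails if it was cleared.
    return state.table["ready"]


def boot_buggy(state):
    # Wrong order: initialize, accidentally clear, then reference.
    init_table(state)
    reset_subsystem(state)
    return use_table(state)  # raises TypeError because table is None


def boot_fixed(state):
    # The fix is only a reordering: the reset must run BEFORE initialization.
    reset_subsystem(state)
    init_table(state)
    return use_table(state)
```

The fixed version differs from the buggy one only in the order of two calls, plus a comment recording that the order matters, which is roughly the shape of the fix described above.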
Microsoft (VP Sinofsky) already said it’s a bug in the chipset drivers that Microsoft provides in Windows 7 and in Windows 2008 R2, because they both use the same driver. Most likely it’s a generic driver for a large amount of hardware.
I doubt it’s a driver bug if, as the summary says, people have been able to reproduce it when running in a virtual machine, which I doubt uses the same driver.
Could be a bug in the NTFS stack rather than a chipset driver – the NTFS drivers don’t change depending on hardware.
How does this rule out a driver bug? You are aware that VMs lie to the guest OS about what driver to use; most likely it is some generic all-purpose driver. Again, my point is based on two facts: 1) A few people confirmed that using chipset drivers from manufacturers fixes this (Hotfix forum user), 2) The bug doesn’t happen on every machine.
They will have enough time to post an update. No worries here.
I ran into the problem on 64bit build 7127. Dunno if URLs are allowed on first post; there’s a DonationCoder thread on “Windows 7 evaluation” where I briefly mention the issue May 31st.
BSOD confirmed after running out of physical memory (no pagefile). Although the memory growth is visible in chkdsk.exe, I’d guess there’s some other factors involved as well – I’ve never seen a usermode memory exhaustion cause BSOD before.
Should mention that this is a vLited Win7 install, and that I do run intel chipset drivers.
Edited 2009-08-06 00:34 UTC
Disclaimer: I work for Microsoft, and I work on NTFS.
Chkdsk intentionally uses lots of RAM to cache information in order to make the process complete as quickly as possible. The fact that this is using RAM is not, in itself, a problem.
The original report was of a bugcheck during this process. That really would be a problem. The original reporter upgraded drivers and no longer saw the bugcheck.
We have, in the last 30 or so hours, run chkdsk on hundreds of machines to see if we can find the cause of the crash. As of this writing, we have been unable to reproduce the crash condition; and since the original reporter can no longer generate the crash, we are left with one crash on one machine that no longer occurs. We have not been able to examine what caused any machine to crash in this process.
At this point, there’s nothing to indicate that it’s a bug users will frequently encounter, that it’s a showstopper, or anything else.
There is a good chance that the original issue occurred precisely because the disk device was making mistakes (why does one run ‘chkdsk /r’?).
A better blog post describing this bug can be found at http://blogs.zdnet.com/Bott/?p=1235 . See also Steven Sinofsky’s response to the original reporter at http://www.chris123nt.com/2009/08/03/critical-bug-in-windows-7-rtm#… .
If anyone out there is encountering crashes when running chkdsk, please let me know. I’ll make sure it gets investigated promptly.
Since I’m a regular OSNews reader, when there is a nasty bug in the “NTFS driver stack”, OSNews readers will be amongst the first to know. For better or worse (depending on your intentions) today is not that day.
I’ve just run chkdsk on my system, but terminated it when I hit 7GB commit, with chkdsk consuming 5.3GB – didn’t feel like going higher and risking a BSOD right now.
EDIT: happens on both my 140gig windows-stripe and my 36gig regular partition that’s on the same disk as the system partition.
Edited 2009-08-06 00:45 UTC
Again, high memory usage is expected. The BSOD is what’s interesting. On my machine (which has 8GB RAM) I saw chkdsk get to ~6.5GB working set, but it still completed without problems.
Such high memory usage sounds very wrong to me, though – I can see the point in caching data structures during the FS integrity checks to speed up things, but why (try to) cache all the data sectors? Isn’t the logic of the surface scan to attempt-read, mark-bad + relocate on error?
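To make the “attempt-read, mark-bad” logic concrete, here is a rough sketch. It is not how chkdsk is implemented; `surface_scan` and `make_fake_disk` are hypothetical names, and the disk is simulated in memory. The point is only that a scan of this shape needs no more memory than one sector plus the list of bad sectors.

```python
def surface_scan(read_sector, sector_count):
    """Read every sector once and return the numbers of unreadable sectors."""
    bad = []
    for n in range(sector_count):
        try:
            read_sector(n)  # the data is discarded right away, nothing is cached
        except OSError:
            bad.append(n)  # a real tool would also try to relocate the data
    return bad


def make_fake_disk(bad_sectors, sector_size=512):
    """Build a simulated read function that fails on the given sector numbers."""

    def read_sector(n):
        if n in bad_sectors:
            raise OSError(f"read error at sector {n}")
        return bytes(sector_size)

    return read_sector
```

For example, `surface_scan(make_fake_disk({3, 7}), 10)` reports sectors 3 and 7 as bad while holding at most one sector's worth of data at a time.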
BSOD or not, there’s definitely something to fix.
OK, haven’t been able to reproduce BSOD – so that has probably been fixed by driver/whatever update. Which is good, I guess.
I still find it problematic that memory usage goes through the roof; I have no chance to complete a surface scan since physical memory gets exhausted.
I’ve run chkdsk on a few hard drives mounted through a USB converter during my time with Win 7 RC and haven’t come across anything like that.
Of course, that was with RC, not RTM. Still, I’m not sure this is bad enough to put the brakes on the release timeline.
If this is triggered by a chipset driver, as Microsoft claims, then your USB drives are not likely to exhibit this behavior at all, as they do not pass through the hard drive controller chipset (IDE or SATA) on your motherboard. However, it depends on which “chipset” they are referring to, and also on which particular driver is affected.
Quite a mouthful, isn’t it?
Even if it is in fact a bug in chkdsk, don’t you think a simple update would fix it?
That’s exactly what I was thinking. In fact, the update process prompts you to do updates as the last step!
A bug in an obscure feature that most users don’t even know about, and even fewer use, is a showstopper?
I agree, all the while they’re leaving open a massive security hole? I’d think chkdsk would be low priority by comparison.
Typical Randall C. Kennedy.