“Major Linux distributors have been shipping ACPI in Linux for several years, yet mis-perceptions about ACPI persist in the Linux community. This paper addresses the most common myths about ACPI in Linux [.pdf].” In addition, there are proposals to utilise kexec-based hibernation in Linux.
I have had bad experiences in the past, but nowadays I can suspend/resume in Linux with no problems at all (HP Pavilion dv9000/Feisty). I have done it several hundred times at this point, and it only messed up once.
The only thing is that even nowadays the failures and successes are pretty much isolated cases, and whether ACPI works properly on a given machine is still a game of roulette.
But, it’s not to be blamed on the OS exclusively and the situation is much better now.
About the article, nice read, there were a few bits in there that I didn’t know about.
I’ve noticed that my laptop, which has an nVidia 6600 Go chip in it, doesn’t like suspend very much. But I’m sure that’s due to the closed source nVidia driver. Same thing with killing the console. It’ll go into suspend, but as soon as I resume all I get is a blank screen with a mouse cursor.
The ACPI PDF is just one of many interesting reads in the 2 full OLS2007 papers, and this one is quite untechnical compared to the rest.
https://ols2006.108.redhat.com/2007/Reprints/
BTW, the V1 & V2 PDF papers contain all the stuff listed below them.
If anyone is interested in the kexec-based hibernation proposal, you should also read the OLS paper on kdump:
https://ols2006.108.redhat.com/2007/Reprints/goyal-Reprint.pdf
This is shaping up to be the most reliable kernel crash dump facility out there, including the commercial UNIX implementations. Those implementations simply hope that the crashed kernel is sane enough for them to rely on its memory management and I/O subsystems, so certain kinds of crashes will always result in dump failures.
The Linux kdump implementation uses a fresh kernel with a working userspace to dump the crashed kernel. It can use the crashed kernel’s page tables to filter unwanted pages from the dump (userspace, empty, free, and cache pages), but it can produce a full-memory dump even if the crashed kernel got completely trashed by, let’s say, an errant DMA. A working userspace means that dumping to NFS volumes, USB keys, DVD-RW, or a remote file over SSH is simple to implement.
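To make the filtering idea concrete, here is a toy sketch in Python. It is not kdump code: the `PageKind` categories, the `Page` record, and the `metadata_trusted` flag are all hypothetical names invented for illustration. It only shows the decision described above: drop the unwanted page classes when the crashed kernel's bookkeeping can be believed, and fall back to a full-memory dump when it cannot.

```python
from dataclasses import dataclass
from enum import Enum, auto


class PageKind(Enum):
    """Hypothetical page categories, mirroring the ones named in the comment."""
    KERNEL = auto()
    USERSPACE = auto()
    EMPTY = auto()
    FREE = auto()
    CACHE = auto()


# Page classes that are normally not worth writing to the dump.
UNWANTED = frozenset(
    {PageKind.USERSPACE, PageKind.EMPTY, PageKind.FREE, PageKind.CACHE}
)


@dataclass(frozen=True)
class Page:
    pfn: int        # page frame number
    kind: PageKind  # classification derived from the crashed kernel's metadata


def select_pages(pages, metadata_trusted):
    """Return the pages to dump.

    If the crashed kernel's metadata looks sane, filter out unwanted pages.
    If it was trashed (for example by an errant DMA), the classification
    cannot be trusted, so every page is kept.
    """
    if not metadata_trusted:
        return list(pages)
    return [p for p in pages if p.kind not in UNWANTED]
```

Calling `select_pages(pages, metadata_trusted=False)` always returns every page, which is the "full-memory dump" fallback; with `metadata_trusted=True` only the pages classified as kernel pages survive.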
The key to this is the relocatable kernel, which allows the kernel to be loaded (almost) anywhere in memory. The only question is: when and where do we load the capture kernel? As mentioned in the LKML posting about hibernation, it is possible to load the capture kernel as soon as we need it, making room presumably by writing the memory region’s contents to persistent media.
This might work for hibernation, where we are reasonably sure that the kernel is healthy, but it won’t be reliable enough for crash dumping. In fact, to support crash dumping during the boot process, we would prefer to pre-load the capture kernel in reserved memory before we load and boot the production kernel. This approach involves convincing administrators to reserve a small but significant amount of memory (currently 2-10MB) that cannot be used for production in exchange for reliable crash dumps.
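For anyone who wants to see what the reservation looks like in practice: on Linux the memory is set aside with the `crashkernel=` boot parameter, which the comment above does not mention by name. Below is a small Python sketch that pulls the reserved size and optional offset out of a kernel command line string. The function name `parse_crashkernel` is made up for this example, and it only handles the simple `size[KMG][@offset[KMG]]` form, not the range-based syntax. The values in the usage note are illustrative, not a recommendation.

```python
import re

# Multipliers for the optional size suffixes.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# Matches only the simple form: crashkernel=size[KMG][@offset[KMG]]
_PATTERN = re.compile(
    r"(?:^|\s)crashkernel=(\d+)([KMGkmg]?)(?:@(\d+)([KMGkmg]?))?(?=\s|$)"
)


def parse_crashkernel(cmdline):
    """Return (size_bytes, offset_bytes_or_None), or None if not present."""
    match = _PATTERN.search(cmdline)
    if match is None:
        return None
    size = int(match.group(1)) * _UNITS[match.group(2).upper()]
    offset = None
    if match.group(3) is not None:
        offset = int(match.group(3)) * _UNITS[match.group(4).upper()]
    return size, offset
```

On a running system the string would typically come from reading `/proc/cmdline`; for instance, `parse_crashkernel("root=/dev/sda1 ro crashkernel=64M@16M")` gives the size and offset in bytes, and a command line without the parameter gives `None`.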
Very few users will want both hibernation and crash dump support on the same system. Production servers don’t typically hibernate, and mobile devices don’t typically require first-failure data capture. So users can either choose to have reliable crash dumps or not. They can still hibernate without any reserved memory, and they can still get successful dumps most of the time if they really want.
There is another option that will come into play as virtualization becomes more prevalent. The hypervisor can be used as the capture kernel if any of its guest kernels crash. Virtualization can also be used for hibernation. The hypervisor can dump its guests along with their states and then simply shut itself down. On reboot, the hypervisor can resume its guests from where they left off.
The inverse situation, where a guest dumps the hypervisor if it crashes, is actually pretty similar in concept to the current kexec-based kdump design. It raises the question of whether kexec could eventually become a degenerate case of the kvm code. That would be a big win for maintainability and quality. Linus would approve.
Out of curiosity, if the capture kernel were preloaded, wouldn’t it be just as vulnerable to getting trashed as anything else in mem? Conversely, if it’s not preloaded, how do you ensure that whatever block of code is responsible for loading the capture kernel isn’t itself trashed? Does it require hypervisor support to be reliable?
You’re correct.
Any code running on bare metal in kernel mode has the potential to barf all over memory until it barfs all over itself. Only hardware virtualization can contain the damage. But this is an unlikely scenario. The most common cause of data corruption is that the code that’s supposed to be playing with the data does something wrong. The odds of the capture kernel getting trashed are slim because nothing in the production kernel is supposed to be playing with its data.
So, there is no design that guarantees that we will always be able to dump a crashed host kernel. But we can dramatically increase our chances by using a separate capture kernel, and nothing in a virtual machine can negatively impact the hypervisor’s ability to dump it. The hypervisor could crash on its own, of course, but not because the virtual machine crashed.
I’ve never gotten suspend/resume working properly on my Gentoo. If I have no X running, then it works just fine, but well, the whole point is to be able to suspend when X is running. From what I know, the problem is with the agpgart module, but in my AMD64 installation it can’t even be built as a module, and on the other computer, once agpgart is loaded, it can’t be removed. Sucks.
If you are having problems with suspending and resuming your laptop, you might take a look at the work being done with the Hal quirks site:
http://people.freedesktop.org/~hughsient/quirk/
Problems with suspend/hibernate/resume will be a thing of the past soon enough, but they need your help in submitting patches.
I’ve never actually used suspend/hibernate on any of the laptops/desktops I’ve owned.
As for ACPI; you’re right thjayo, it is a bit ‘hit and miss’ as to the reliability of ACPI implementations out there.
For me, I found that as long as you stick to name brands and install the latest BIOS updates when they’re made available, it works. I’m running an HP Pavilion dv6209tx with Ver. 26 of their BIOS, and it is rock solid with Linux and Solaris Express (which I’m running now, latest build – opensol-20070709).
The problem is repeatable. I remember Linux a few years ago when it was getting blamed for instability by Eugenia with her VIA chipset and GeForce graphics card – if you choose, quite frankly, crappy hardware (which is generally cheap), you really have to ask yourself, why is it so cheap, what have they cut in terms of quality to get the price so low?
but definitely worth every second I spent on it.
I found especially interesting the part which covered how much of the functionality usually thought to be ACPI related actually is not, e.g. “function-keys”.
On a side note: I wish they would use a single-column layout rather than the two-column one. Two columns are awkward to read on-screen, since you have to scroll back to the top of the page when you finish reading the first column.