Linux: Reboot Like a Racecar with kexec

Submitted by diegocg 2008-10-16 Linux 29 Comments

Kexec is a feature that allows booting a new kernel directly from a running kernel. It was originally intended for use by kernel and system developers who had to reboot several times a day. Soon, system administrators for high-availability servers found a use for it as well. As systems get more and more advanced, and boot times get longer, end users can now benefit from it.

About The Author

Thom Holwerda


29 Comments

2008-10-16 11:09 pm
bornagainenguin
Too bad there are issues with some drivers and apparently most of these will be with proprietary drivers like the one mentioned in the article….
Still, I don’t know how useful this will be for white box users, but it sounds like this has some serious potential for OEMs and laptop/netbook users if they can build some common hardware profiles for this thing and enable it for those profiles it is known to work well in.
Some serious considerations given to this and the 5 second boot demo are making me think we’ll be seeing nigh instant Linux booting in the near future!
No wonder Microsoft is asking if people are interested in such a thing for their next version of Windows!
Events like this only conspire to reinforce my already strong belief that the computer and IT fields desperately need Open Source and Free Software to provide the indispensable competition Microsoft desperately needs but has made use of their illegal monopoly to shield themselves from.
I sincerely hope there will always be Linux, Haiku, ReactOS, Syllable…something Microsoft can’t simply bankrupt out of existence to provide that competition and inspire their coders to new heights they would never climb if left to their own devices.
–bornagainpenguin

2008-10-17 12:01 am
Morph
issues with some drivers and apparently most of these will be with proprietary drivers like the one mentioned in the article
One cited example of a driver problem from the example is sufficient evidence that “most” problems will be with proprietary drivers? Come on.
…desperately need Open Source and Free Software to provide the indispensable competition Microsoft desperately needs but has made use of their illegal monopoly to shield themselves from.
I’m glad that I didn’t have to wait longer than the first comment for someone to fire up the Microsoft bashing. I hate having to read through 8 or 10 insightful, clever comments to get to the Microsoft h4te.

2008-10-17 1:49 am
bornagainenguin
morph whined…
One cited example of a driver problem from the example is sufficient evidence that “most” problems will be with proprietary drivers? Come on.
I’m not trying to be funny; I’m going by experience and what I know of FLOSS culture.
If there are any proprietary drivers running when people report bugs and the app functions okay with the FLOSS driver then most of the time the reaction seems to be ‘too bad’ or some variation of ‘it’s not my fault, talk to your manufacturer’ and that will put off most users.
Even users who understand the politics of the situation.
…desperately need Open Source and Free Software to provide the indispensable competition Microsoft desperately needs but has made use of their illegal monopoly to shield themselves from.
I’m glad that I didn’t have to wait longer than the first comment for someone to fire up the Microsoft bashing. I hate having to read through 8 or 10 insightful, clever comments to get to the Microsoft h4te.
I live to serve.
But seriously we only have to look as far as IE6 to see what happens with Microsoft when there isn’t someone around to push and prod them into actually innovating. Once Netscape was no longer a threat Microsoft broke up the IE team and sent them off to do other things.
I don’t think *cough*Iloveyou*cough* I need to remind you *cough*melissa*cough* what the end result of that *cough*codered*cough* was, do I?
Besides, why do you have to focus on the negative? I was saying this was a good thing if you remember?
Or are we to infer from your reception you consider Microsoft a one-trick-pony who won’t be able to keep up with the rest of world if it has to compete…?
–bornagainpenguin

2008-10-17 10:28 am
Morph
what I know of FLOSS culture…If there are any proprietary drivers running when people report bugs and the app functions okay with the FLOSS driver then most of the time the reaction seems to be ‘too bad’ or some variation of ‘it’s not my fault
Sounds like the problem is (partly) with FOSS culture, then!?
look as far as IE6 to see what happens with Microsoft when there isn’t someone around to push and prod them into actually innovating
Yeah, that’s a good (bad!) example. Fortunately Microsoft’s standards have improved a lot since the bad-old-days of Windows 9x and IE4-6. IE8 is quite developer-friendly and has eg tabs-in-separate-processes, and Vista has some great UI and under-the-hood improvements (don’t believe every criticism you read!). Competition surely did play a role here. I’d guess that Apple, Google and Mozilla are mostly to thank for that!

2008-10-17 2:24 pm
bornagainenguin
morph pointed out…
Sounds like the problem is (partly) with FOSS culture, then!?
Yes, but not really. Corporate developers are all too happy to pass the buck whenever they can as well. And as someone else has already said, this makes sense for them because they can’t fix what they can’t see.
morph posted…
look as far as IE6 to see what happens with Microsoft when there isn’t someone around to push and prod them into actually innovating
Yeah, that’s a good (bad!) example. Fortunately Microsoft’s standards have improved a lot since the bad-old-days of Windows 9x and IE4-6. IE8 is quite developer-friendly and has eg tabs-in-separate-processes, and Vista has some great UI and under-the-hood improvements (don’t believe every criticism you read!). Competition surely did play a role here. I’d guess that Apple, Google and Mozilla are mostly to thank for that!
As far as I can see IE 7 was more or less a new coat of paint, but one which broke quite a few intranet apps due to their reliance on doing things the “IE way” instead of following standards. IE8 looks like the team got back together and have been trying to do something good, something new. I’m actually looking forward to trying it out if they release it on XP, but I doubt I’ll switch from Firefox without some serious benefits and even then it would have to be able to do everything I can currently do with Firefox as well.
As for Vista? Don’t make me laugh–I’m sorry but as far as I’m concerned Vista is in the same place IE was in with IE6…
It has inhaled air every time I’ve had to deal with it, and the fact it needs three or four times the RAM just to be able to do smoothly what XP can do on half that just shows how broken it is IMHO.
Still…I’m hoping now that Apple is gobbling up mindshare and their usershare is increasing and people are being exposed to Linux on netbooks as well as all the incremental improvements to Linux we’re seeing…
I’m hoping all this will be enough to trigger a reaction in Redmond and inspire some competition in Windows beyond the PR campaign.
Still, it works both ways too: kexec has been around for years now and this looks like the first time it’s been mentioned with home users in mind. Makes me wonder what else there is we could be using at home?
–bornagainpenguin
Edited 2008-10-17 14:27 UTC

2008-10-17 12:42 pm
segedunum
One cited example of a driver problem from the example is sufficient evidence that “most” problems will be with proprietary drivers? Come on.
Well yes, logically they will. The good thing about open source drivers shipped with a kernel is that those drivers are tested with that kernel as one whole and their overall quality and stability is kept up to a reasonable level as a result. If a driver presents a problem then an awful lot of people are going to know about it. With binary add-on drivers you have no guarantee whatsoever what will happen, and that approach kind of negates the point of this – being able to boot up reliably fast.
I’m glad that I didn’t have to wait longer than the first comment for someone to fire up the Microsoft bashing. I hate having to read through 8 or 10 insightful, clever comments to get to the Microsoft h4te.
Pity Microsoft. They don’t have a kernel and driver infrastructure that allows them to be able to do useful stuff like this, and like it or lump it, when you get to the common denominator, being able to do this reliably relies on the integrity of the kernel and drivers and ultimately having the source available.
It’s just a shame that we will have to sit through endless reboots of new drivers and patches.
Edited 2008-10-17 12:53 UTC

2008-10-16 11:58 pm
Doc Pain
There’s one thing I don’t understand from the demonstration script shown in the article. It reads as follows:
if [ -x `locate kexec | grep sbin` ]; then
Why does something so “near” to the basic operations of the Linux OS rely on a locate database being up to date (NB: it is usually updated by a periodic script that runs at night, or possibly only at the weekend) when it could simply check -x on the default location? Okay, well, I’m implying that there is a default location for kexec. 🙂 My idea would be to check the usual places, maybe like this:
if [ -x /sbin/kexec -o -x /usr/sbin/kexec ]; then
Another variant would be:
if [ ! `which kexec` ]; then
Just a picky sidenote, forgive me. =^_^=
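
A minimal sketch of the check discussed above, assuming a POSIX shell. It first asks the shell to resolve the tool on PATH with `command -v`, and only then falls back to a fixed list of directories, so it does not depend on the locate database at all. The function name `find_tool` and the fallback directory list are assumptions made for illustration; they are not taken from the article's script.

```shell
# Hypothetical helper: print the path of an executable named $1.
# Returns 0 and prints the path if found, returns 1 otherwise.
find_tool() {
    tool=$1
    # First choice: whatever the shell itself would run from PATH.
    if command -v "$tool" >/dev/null 2>&1; then
        command -v "$tool"
        return 0
    fi
    # Fallback: conventional system directories (an assumed list),
    # useful when the sbin directories are not on PATH.
    for dir in /sbin /usr/sbin /usr/local/sbin; do
        if [ -x "$dir/$tool" ]; then
            printf '%s\n' "$dir/$tool"
            return 0
        fi
    done
    return 1
}

if kexec_path=$(find_tool kexec); then
    echo "kexec found at $kexec_path"
else
    echo "kexec not found; a normal reboot is the fallback" >&2
fi
```

Note that a test of the form `[ ! ... ]` on the output of which succeeds when nothing was found, so it suits a "bail out if missing" branch rather than the "run if present" branch of the original script.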

2008-10-17 12:44 pm
segedunum
I think you’re certainly correct there. I don’t know why you would want to rely on ‘locate’ to find kexec. Surely you would check the standard locations?

2008-10-17 12:03 am
Morph
That’s the big speedup; the actual kernel initialisation time isn’t any faster when rebooting with kexec, right?

2008-10-17 1:53 am
TemporalBeing
That’s the big speedup; the actual kernel initialisation time isn’t any faster when rebooting with kexec, right?
Not sure how Linux does it – haven’t looked at the source at all – but it could theoretically be faster than a normal boot operation. How?
Well, basically the various parts of the system would have to load the new kernel, temporarily halt (perhaps using the processor’s System Management Mode?), serialize the states of everything in a special manner, start the new kernel with the serialized data, then deserialize everything and continue. You don’t even necessarily have to stop all the hardware; just pause the I/O between the serialization and deserialization steps.
So while boot may take 5 seconds – and you have to go through POST, do all kinds of initialization, etc. – kexec() style rebooting could very well be under 1 second since you skip the whole POST step, and you can even skip most of the initialization steps if you serialize/deserialize correctly.
Of course, you have to have two kernels that understand the same serialization structures. So it would have to be versioned and you would have to have some way to fall back to the old kernel in case the new one had a problem in the deserialization.

2008-10-17 5:14 am
Morin
> Of course, you have to have two kernels that understand the same
> serialization structures. So it would have to be versioned and you would
> have to have some way to fall back to the old kernel in case the new
> one had a problem in the deserialization.
I think that kinda defeats the purpose. Exchanging kernels is a risky task anyway, and introducing yet another potential source of problems (like the de-/serialization you mentioned) doesn’t seem very wise to me. Especially if your goal is either kernel development or a high-availability system.
2008-10-17 10:41 am
Morph
serialize the states of everything in a special manner
Yes, possible in theory, but sounds very difficult. Every driver would need to be modified to support it, and for some hardware (eg video cards, which undergo lots of voodoo-magic during video BIOS initialisation) there might be subtle caveats. Sounds like a lot of effort!
Anyway, hopefully in a couple of years EFI will replace BIOS, and slow POSTs will be a relic of the past.

2008-10-17 4:28 am
c0t0d0s0
This looks like what was implemented in PSARC 2008/382 in OpenSolaris build 100 ( http://www.c0t0d0s0.org/archives/4856-Flag-Day-of-PSARC-2008382-Fas… ). It was introduced to save the time consumed by POST. On a 512 GB machine this can take some time.

2008-10-17 8:13 am
zdzichu
Yes, it is essentially the same stuff. Linux has had it since 2002 ( http://lwn.net/Articles/15468/ ).
2008-10-17 12:54 pm
segedunum
Yes. Essentially it’s a very, very loose form of virtualisation, where you boot your kernel from an underlying kernel so you have a greater degree of control over the whole process. For testing it’s useful, because you can boot a kernel into a debugger or a crash dumper. For everyone else, it has the side-effect of cutting out lots of parts of the traditional boot process, making it faster. It’s been around in Linux for quite a while, but in this time of ever shorter boot times these things get a bit more attention.
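
For readers who have not seen it in practice, here is a rough sketch of the two-step sequence that kexec-tools uses: load the new kernel into memory, then jump into it. The function below only prints the commands rather than running them. The function name `kexec_plan` and the kernel and initrd paths are assumptions for illustration and vary between distributions.

```shell
# Hypothetical helper: print (but do not run) the kexec-tools commands
# for a fast reboot into the given kernel.
kexec_plan() {
    kernel=$1
    initrd=$2
    cmdline=$3
    # Step 1: stage the new kernel, its initrd and its command line in memory.
    printf 'kexec -l %s --initrd=%s --append="%s"\n' "$kernel" "$initrd" "$cmdline"
    # Step 2: jump into the staged kernel, skipping firmware POST and the
    # boot loader.
    printf 'kexec -e\n'
}

# Example: plan a reboot into the running kernel version, reusing the
# current command line where /proc/cmdline is available.
kexec_plan "/boot/vmlinuz-$(uname -r)" \
           "/boot/initrd.img-$(uname -r)" \
           "$(cat /proc/cmdline 2>/dev/null)"
```

Since `kexec -e` jumps straight into the new kernel without stopping services or unmounting filesystems, it is normally run at the very end of the regular shutdown sequence rather than typed at a prompt.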

2008-10-17 10:18 am
Weeman
Wow, so OpenSolaris announces Fast Reboot (which was in the works and available internally for a while) and whoop Linux suddenly releases their internal stuff, too. What a coincidence.
Edited 2008-10-17 10:19 UTC

2008-10-17 10:35 am
ichi
Wow, so OpenSolaris announces Fast Reboot (which was in the works and available internally for a while) and whoop Linux suddenly releases their internal stuff, too. What a coincidence.
Suddenly? Kexec is old.
2008-10-17 1:12 pm
segedunum
Wow, so OpenSolaris announces Fast Reboot (which was in the works and available internally for a while) and whoop Linux suddenly releases their internal stuff, too. What a coincidence.
I knew we’d get one or two people wading in with ‘OpenSolaris does this, Linux is just copying!’ I’m also not too interested in what has been available internally either, as it kind of negates the word open. But I digress.
Unfortunately, Kexec was committed around about 2002/2003. It’s mainly been used by kernel testers over the years because it makes things far easier, but it’s been used by many for a while to get new kernels and updates running on their servers and it’s gained more attention in these times of ‘instant-on’ access.

2008-10-17 2:04 pm
Weeman
I knew we’d get one or two people wading in with ‘OpenSolaris does this, Linux is just copying!’ I’m also not too interested in what has been available internally either, as it kind of negates the word open. But I digress.
Yeah, it’s really a shame that Sun’s not just throwing shit against a wall and looks what sticks, like the Linux folks do, and prefers for things to be designed and work stable enough out of the box… >_>
And since we’re on topic, regarding your highlighting bullshit, I’m certain various BigCo’s related to Linux kernel development are holding back a lot of code until stabilization, too. So don’t give us that “Hurrrrrr, OpenSolaris!” bullshit. Then again, why am I arguing with YOU? It’s like talking to a wall.
Unfortunately, Kexec was committed around about 2002/2003. It’s mainly been used by kernel testers over the years because it makes things far easier, but it’s been used by many for a while to get new kernels and updates running on their servers and it’s gained more attention in these times of ‘instant-on’ access.
So what? Sure wasn’t being used much or promoted outside kernel testing.

2008-10-17 9:26 pm
segedunum
Yeah, it’s really a shame that Sun’s not just throwing shit against a wall and looks what sticks, like the Linux folks do, and prefers for things to be designed and work stable enough out of the box…
Cry me a river. It’s the same sad line of ‘stable’ and ‘out-of-the-box’ reasoning I have had from every Sun consultant for the past ten years as I have yawned at him even after they’ve had their lunch eaten year after year.
Throwing shit at a wall, as you so eloquently put it, and seeing what sticks and then refining it in a Darwinian sense is the mark of having an open source community. Let me know when people are doing that with Solaris without the benevolent leader telling us what is stable and mission critical(tm) ;-).
Who knows? Maybe Kexec is the wrong universal approach and people should just concentrate on making hardware and Linux boot faster?
And since we’re on topic
Well, no, we never were. We simply got some smart Alec coming in telling us that Solaris did this first – when it hasn’t ;-).
regarding your highlighting bullshit, I’m certain various BigCo’s related to Linux kernel development are holding back a lot of code until stabilization, too.
Errrrr, nope. Unless you commit early and release often you quite often get left behind in the Linux world, or your code simply gets rejected with all your effort wasted. You might have something in a personal Git repository for a few weeks, but you can’t sit on it for months or years until it is passed by IBM or Red Hat marketing ;-).
So don’t give us that “Hurrrrrr, OpenSolaris!” bullshit. Then again, why am I arguing with YOU? It’s like talking to a wall.
I’m sorry. I made an assumption about what the word open was actually supposed to mean there. Quite clearly it means something different in OpenSolaris.
So what? Sure wasn’t being used much or promoted outside kernel testing.
I’m afraid that your original point, like a bull in a China shop, was that Solaris had somehow done all this first and Linux was copying. I hate to burst your bubble, but there it is.
Edited 2008-10-17 21:32 UTC

2008-10-17 2:37 pm
c0t0d0s0
Hmm, I don’t think it’s about “We were first!” from the OpenSolaris side. It’s just the impression, that the introduction of PSARC 2008/382 into OpenSolaris led to the thought “Heck, we had the infrastructure for doing something similar in Linux for quite a time. Let’s make it available to end users” and perhaps “Shame on us, that we left the first availability in a standard distribution to the OpenSolaris community”.
Edited 2008-10-17 14:38 UTC

2008-10-17 9:40 pm
segedunum
It’s just the impression, that the introduction of PSARC 2008/382 into OpenSolaris led to the thought “Heck, we had the infrastructure for doing something similar in Linux for quite a time. Let’s make it available to end users”
Well, this article is about something that has been available for five or six years. It’s not about making it generally available to end users in a distribution in a transparent manner yet. It’s entirely possible that this is a pointless general purpose approach and isn’t worth the driver issues it would cause, and the root solution is that people should just concentrate on making hardware and Linux generally boot faster.
and perhaps “Shame on us, that we left the first availability in a standard distribution to the OpenSolaris community”.
Yer. I’m sure that every Linux distributor is currently crying themselves to sleep at night that Solaris is currently wheeling out FastBoot when they had technology there all the time to do it, and those running 512GB Linux systems and their distributors that have probably been using Kexec for a while ;-). Here’s an article from 2004:
http://www.ibm.com/developerworks/linux/library/l-kexec.html

2008-10-18 2:01 pm
c0t0d0s0
Nevertheless, boot time is an issue all server operating environments have to solve in the next few months in their standard commercial distributions (read as: with support from the vendor).
Twelve months from now, I’m sure we will be talking about systems with 1 TB in the x86 range and 16-32 TB in high-end UNIX in a single system.

2008-10-17 1:15 pm
sbenitezb
If I remember correctly, Solaris used CDE and would still be using CDE (nice but not very modern) if it weren’t for GNOME/KDE, which were developed once Linux became alive. What could possibly be wrong about using some technology the “competition” already has if it, somehow, improves the software? Quit bitching, what you say doesn’t really matter in this world (applies to me too).

2008-10-17 3:59 pm
zenulator
When did the OpenSolaris/Linux wars begin?

2008-10-17 4:02 pm
c0t0d0s0
A few comments ago
2008-10-17 10:04 pm
segedunum
About eight years ago was when it had its roots, and when many people saw that their real operating system running on real hardware was going to get gradually throttled.
When all that didn’t work out we recently got OpenSolaris, and there seems to have been a new war now using that for whatever reason. Meanwhile, few others seem to care apart from those with a sense of humour.
Edited 2008-10-17 22:07 UTC

2008-10-17 4:19 pm
Joshua Clayton
I was interested to read about the fact that the environment and hardware states are still “dirty” with this technique.
That makes me excited. Why?
This looks like a project that will necessarily address the same things that make suspend/resume such a nightmare on Linux.
here’s hopin’
2008-10-17 8:00 pm
Milo_Hoffman
Main problem I have with kexec/kdump, is that it currently requires a CUSTOM ADDITION (crashkernel=) to the kernel boot parms to tell the kernel to reserve some space in memory.
This is a major pain in the behind in a large environment. Every time a kernel updates you have to modify the grub.conf file to have this added onto the boot parms.
That’s easy when you’re managing every system by hand, but not so easy to do in a large environment.
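
One way this chore could be scripted, as a rough sketch only: a small sed filter that appends a crashkernel= parameter to every kernel line of a legacy GRUB config that does not already have one. The function name `add_crashkernel` and the example value 128M@16M are assumptions for illustration; the right reservation size depends on the machine and distribution. The function writes the result to standard output and leaves the input file untouched, so the output can be reviewed before it replaces anything.

```shell
# Hypothetical helper: print the GRUB config $1 with crashkernel=$2 appended
# to each "kernel" line that lacks a crashkernel= parameter.
# The value must not contain "|" or "&", which are special to this sed script.
add_crashkernel() {
    conf=$1
    value=$2
    # Only lines starting with optional whitespace and the word "kernel" are
    # considered; lines already carrying crashkernel= are left unchanged.
    sed -e "/^[[:space:]]*kernel[[:space:]]/{
        /crashkernel=/!s|[[:space:]]*\$| crashkernel=${value}|
    }" "$conf"
}

# Example (not executed here):
#   add_crashkernel /boot/grub/grub.conf 128M@16M > /tmp/grub.conf.new
```

Run from a package post-install hook or a configuration management tool, a filter like this would avoid hand-editing after each kernel update, though that integration is left as an assumption here.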