With Linux 4.0, you may never need to reboot your operating system again.
One reason to love Linux on your servers or in your data center is that you so seldom need to reboot it. True, critical patches require a reboot, but you could go months without rebooting. Now, with the latest changes to the Linux kernel, you may be able to go years between reboots.
That’s pretty cool. Are there limits on the level of patching that a running kernel can have?
Say, can the scheduler be patched, or even replaced with a new one?
Let’s not make it out to be something more than it really is.
It’s a way to do quick security fixes while in use and without a reboot/downtime.
It was never meant for large changes.
Well, yeah, but I am curious about the limits.
Don’t know if you are a programmer… but let’s try to keep this simple.
What is going on is that you have a memory address where a function starts. Other locations point to this memory address when code wants to call the function.
They load a module that creates the new function at a different location (maybe keeping a backup of the original), and at the memory address of the original function they put a jump to the new location.
Now as you can imagine, if you want to make larger changes than just changing how a single function works…
well, things get complicated fast. 🙂
And it’s already more complicated than that, because you can’t change a function while one of the CPUs is executing it, I believe. So there is timing or locking involved.
Especially fun if you need to update more than a single function.
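To make that concrete, here is a minimal userspace sketch (C, x86-64 only) of the redirection idea: overwrite the first bytes of the original function with a jump to the replacement. The real kpatch/kGraft go through the kernel’s ftrace infrastructure with proper synchronization; the function names here are made up, and the sketch assumes the original function occupies at least 12 bytes and that its text page can be made writable.

```c
/*
 * Toy userspace version of the redirection described above (x86-64).
 * NOT how the kernel does it; kpatch/kGraft use ftrace and proper
 * synchronization. Assumes the original function occupies at least
 * 12 bytes and that we may make its page writable.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static int old_handler(int x) { return x + 1; }   /* the "buggy" function  */
static int new_handler(int x) { return x + 2; }   /* the "patched" version */

static void redirect(void *from, void *to)
{
    /* Make the page(s) containing 'from' writable as well as executable. */
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    void *start = (void *)((uintptr_t)from & ~(page - 1));
    if (mprotect(start, page * 2, PROT_READ | PROT_WRITE | PROT_EXEC) != 0) {
        perror("mprotect");
        return;
    }

    /* movabs rax, <to> ; jmp rax  -- 12 bytes overwriting the old entry. */
    unsigned char stub[12] = { 0x48, 0xb8, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xe0 };
    uint64_t target = (uint64_t)(uintptr_t)to;
    memcpy(stub + 2, &target, sizeof(target));
    memcpy(from, stub, sizeof(stub));
}

int main(void)
{
    /* Call through a volatile pointer so the compiler can't inline the call. */
    int (*volatile fn)(int) = old_handler;

    printf("before: %d\n", fn(1));                     /* 2 */
    redirect((void *)old_handler, (void *)new_handler);
    printf("after:  %d\n", fn(1));                     /* 3: jumps to new_handler */
    return 0;
}
```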
I haven’t looked into how Linux 4.0 will be doing it, but my method would be to suspend scheduling momentarily, emplace the in-memory call redirection, then resume scheduling.
Using preemption to interrupt all running threads should take only a few milliseconds.
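In kernel terms that is roughly what kpatch does through the stop_machine() facility. A rough sketch, with the patch-application details waved away (install_redirect() below is a hypothetical helper, not a real kernel API):

```c
/*
 * Sketch of the "pause everything, patch, resume" idea, roughly what
 * kpatch does via the kernel's stop_machine() facility. Not real kpatch
 * code; install_redirect() below is a hypothetical helper.
 */
#include <linux/kernel.h>
#include <linux/stop_machine.h>

struct patch_desc {
    void *old_func;   /* entry point being replaced */
    void *new_func;   /* replacement function       */
};

/* Hypothetical: rewrite old_func's entry to jump to new_func. */
static void install_redirect(void *old_func, void *new_func);

/* Runs while every CPU is parked in a known-safe state. */
static int apply_patch(void *data)
{
    struct patch_desc *p = data;

    /* No other CPU is executing arbitrary kernel code right now, so the
     * entry point can be rewritten safely. (The real implementation also
     * checks kernel stacks so no sleeping task is parked inside old_func.) */
    install_redirect(p->old_func, p->new_func);
    return 0;
}

static int livepatch_apply(struct patch_desc *p)
{
    /* Interrupt all CPUs, run apply_patch() once, then resume. */
    return stop_machine(apply_patch, p, NULL);
}
```

The obvious cost is that all CPUs have to rendezvous, which gets slower the bigger the machine.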
Well, actually, it’s sort of both methods, because both were proposed: one by SUSE and the other by Red Hat.
And my description sucked; it’s more complicated. The Wikipedia descriptions are better:
http://en.wikipedia.org/wiki/KGraft
http://en.wikipedia.org/wiki/Kpatch
Later on they made a new patch, which supports both patch formats. The method it uses isn’t in the article:
http://www.infoworld.com/article/2862739/linux/the-winning-linux-ke…
Neither KGraft nor Kpatch handles data structures. The combined method does not handle data structures either.
https://www.suse.com/documentation/sles-12/art_kgraft/data/art_kgraf…
Section 7 here kinda explains why. If you have Nvidia or other unknown kernel modules loaded, changing a data structure is like playing chicken with a train. You know which one is going to lose.
Is there a solution? Kinda, but it’s still horrible: hibernate combined with kexec, and again you hope you don’t become closed-source-driver roadkill because the driver doesn’t support kexec.
Some people wonder why Linux people hate closed-source drivers so much. When hibernation and kexec both end up bust at times because of them, it’s kinda understandable.
oiaohm,
I was thinking this too: serialize the system state in the running kernel (i.e. sockets, processes, pids, file handles, etc.), kexec into the new kernel, and load the previously saved state in the new kernel. This would allow kernel upgrades to be almost arbitrary in nature. As long as both kernels supported the same serialization format (say XML), then the change in data structures would be totally irrelevant.
However this would be the slowest approach as a kexec into a new kernel while loading state might take 10-60s, in which case it’s a dubious benefit over a normal reboot (other than the fact that applications haven’t lost their state).
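A toy illustration of that idea, assuming a simple text format rather than XML and with all names made up (no such snapshot interface exists in Linux today; the hard part of teaching every subsystem to serialize itself is entirely hand-waved):

```c
/*
 * Toy illustration of the serialize -> kexec -> restore idea above.
 * "State" here is just a pid and a name, and the format is plain text
 * rather than XML; all names are made up, and no such snapshot
 * interface exists in Linux today.
 */
#include <stdio.h>

struct saved_process {
    int pid;
    char comm[32];
};

/* Before kexec: write records in a format both kernels understand. */
static void snapshot_save(FILE *out, const struct saved_process *p, int n)
{
    for (int i = 0; i < n; i++)
        fprintf(out, "process pid=%d comm=%s\n", p[i].pid, p[i].comm);
}

/* After kexec: the new kernel rebuilds its own (possibly different)
 * internal structures from the same records. */
static int snapshot_restore(FILE *in, struct saved_process *p, int max)
{
    int n = 0;
    while (n < max &&
           fscanf(in, "process pid=%d comm=%31s\n", &p[n].pid, p[n].comm) == 2)
        n++;
    return n;
}

int main(void)
{
    struct saved_process before[2] = { { 1, "init" }, { 712, "sshd" } };
    struct saved_process after[2];

    FILE *f = tmpfile();   /* stands in for memory that survives the kexec */
    if (!f)
        return 1;

    snapshot_save(f, before, 2);
    rewind(f);

    int n = snapshot_restore(f, after, 2);
    for (int i = 0; i < n; i++)
        printf("restored pid=%d comm=%s\n", after[i].pid, after[i].comm);
    return 0;
}
```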
Actually 4.0 doesn’t include all the pieces and we have to wait for a later version before it can be used:
https://lwn.net/Articles/634649/
I haven’t looked into how Linux 4.0 will be doing it, but my method would be to suspend scheduling momentarily, emplace the in-memory call redirection, then resume scheduling.
Using preemption to interrupt all running threads should take only a few milliseconds.
The two different methods exist for a key reason.
Kpatch implements your method.
KGraft does not. KGraft takes an RCU-style approach, so the scheduler does not stop.
The problem here is that preemption on a 4000+ core system might take several minutes to complete.
Kpatch is better for smaller systems; KGraft is better for large systems. KGraft introduces some extra limitations on how kernel data structures can be altered.
So neither is a 100 percent correct answer.
KGraft allows both the old and the new function to be in use at the same time.
Linux kernel 4.0 supports both solutions.
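For contrast with the stop-everything sketch earlier, here is a toy userspace simulation of the kGraft-style approach: old and new versions of a function coexist, and each task is switched over only when it reaches a safe point, so nothing has to stop. All names are made up; this is not kGraft code.

```c
/*
 * Toy userspace simulation of the kGraft-style approach described above:
 * old and new versions coexist, and each "task" flips to the new one only
 * when it passes a safe point (the kernel/user boundary in the real thing),
 * so nothing ever has to stop. All names are made up; this is not kGraft.
 */
#include <stdbool.h>
#include <stdio.h>

static int old_impl(int x) { return x + 1; }   /* pre-patch behaviour  */
static int new_impl(int x) { return x + 2; }   /* post-patch behaviour */

struct task {
    const char *name;
    bool migrated;   /* has this task passed a safe point since patching? */
};

/* All calls go through a stub that picks old or new per task. */
static int patched_call(struct task *t, int x)
{
    return t->migrated ? new_impl(x) : old_impl(x);
}

/* Invoked when a task crosses a safe point. */
static void safe_point(struct task *t)
{
    t->migrated = true;
}

int main(void)
{
    struct task a = { "A", false }, b = { "B", false };

    printf("A=%d B=%d\n", patched_call(&a, 1), patched_call(&b, 1)); /* both old */
    safe_point(&a);                         /* A migrated, B still on old code  */
    printf("A=%d B=%d\n", patched_call(&a, 1), patched_call(&b, 1));
    safe_point(&b);                         /* all migrated; old code removable */
    printf("A=%d B=%d\n", patched_call(&a, 1), patched_call(&b, 1));
    return 0;
}
```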
Wouldn’t it be much quicker to upgrade my OS? I am hoping someone more knowledgeable can tell me:
If you don’t reboot you can keep most of the data you need in memory. That could mean that Ubuntu does a dist-upgrade in memory, does a check, and stores the result to the hard drive. I would think that this would be much faster than downloading it to disk, installing it to disk, and rebooting.
Over time, some bits might randomly flip in RAM – stuff that remains resident but is read frequently could slowly get corrupted anyway. (If it’s a server, you will want to be using ECC RAM, of course.) Either way, the kernel won’t catch the change and undefined scenarios could play out after long uptimes.
Well, with ECC, single-bit errors are repairable, and double-bit errors are at least detectable. Linux kernels after 2.6.30 will also scrub memory looking for bytes that generate errors, and if the same location generates errors too often, the kernel will mark the page as not-to-be-used.
Server gear frequently has BIOS options for scrubbing, as well, if your OS doesn’t support it directly.
This should pretty much eliminate silent corruption in memory.
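If you want to watch this from userspace, the kernel’s EDAC subsystem exposes corrected/uncorrected error counts per memory controller in sysfs. A small sketch, assuming an EDAC driver is loaded and that mc0 exists (paths can differ between systems):

```c
/*
 * Small sketch: read the EDAC corrected/uncorrected error counters from
 * sysfs. Assumes an EDAC driver is loaded and memory controller mc0
 * exists; the paths may differ on other systems.
 */
#include <stdio.h>

static long read_count(const char *path)
{
    long v = -1;
    FILE *f = fopen(path, "r");

    if (f) {
        if (fscanf(f, "%ld", &v) != 1)
            v = -1;
        fclose(f);
    }
    return v;   /* -1 means "not available on this machine" */
}

int main(void)
{
    long ce = read_count("/sys/devices/system/edac/mc/mc0/ce_count");
    long ue = read_count("/sys/devices/system/edac/mc/mc0/ue_count");

    printf("corrected ECC errors:   %ld\n", ce);
    printf("uncorrected ECC errors: %ld\n", ue);
    return 0;
}
```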
There are already systems with decade+ uptimes.
For example, here’s a thread on Ars where a user had to shut down a NetWare server because the bearings on one of the 5 1/2″ full-height hard drives were making too much noise. That system had 16 1/2 years of uptime.
http://arstechnica.com/civis/viewtopic.php?f=23&t=1199529
Of course, that’s 16 1/2 years without security updates. Ew.
True. I wonder what the mean time to failure is for crashes caused by random memory corruption.
No different to any non-patched system, and given I have personally worked on machines with uptimes measured in multiples of years, there’s no reason a patched kernel couldn’t maintain that level of uptime as well.
The hardware (especially power supplies or spinny rust) is going to fail before a random, uncorrected, bit-flip will cause a crash.
This will inevitably be compared to ksplice, now owned by Oracle. The benefits of ksplice couldn’t be practically realized without additional service contracts to provide well-tested kernel patches. One might assume this was simply due to Oracle’s business model, but actually the ksplice technology needed some help from developers to support specific kernels.
http://en.wikipedia.org/wiki/Ksplice
So I wonder to what extent this new kernel patching technology is going to be automatic versus requiring tweaking by distros for each kernel?
How are the no-reboot patches in Linux 4.0 different from ksplice?
http://en.wikipedia.org/wiki/Ksplice
Didn’t ksplice already allow for this functionality?
Well, I suppose the primary difference is that this doesn’t rely on proprietary tools nor require a subscription with Oracle.
That’s cool, thanks. I was under the impression that ksplice was part of the Linux kernel itself.
Quite frankly, I doubt it’s entirely “risk free”. For example, if key data structures are changed (e.g. they’ve changed something from a linked list into a hash table) I’d just expect it to screw everything up.
In a similar way; a blocked process has data on its kernel stack; and if any code is changed that relies on the data on a process’ kernel stack (including just updating the compiler without changing any kernel code) then I’d expect that to cause a massive disaster too (like, all processes crashing).
For something important (e.g. a critical server), I wouldn’t trust this at all – I’d disable it and reboot when updating the kernel (note: I assume that if downtime is extremely important you’re using failover or something in case of hardware failure or hardware upgrade, and rebooting is even less of a problem than it would be for something “non-critical”). For typical desktop systems where it doesn’t matter if you reboot or not, I’m too lazy to figure out the mystic incantations needed and would just reboot in that case too.
– Brendan
It’s similar to ksplice.
You use it for security updates. Not for new features.
If a memory structure needs to change because of a security update, it’s a more complicated patch and needs some manual coding by the people making the reboot-less-patches.
And self-contained bug fixes. But yeah, no kernel data structure updates allowed.
You can do kernel data structure updates, but it usually means someone has to make the kernel patch by hand.
At least that was how it was with ksplice
Right; but the article (and other articles) are saying silly/misleading things, like “never reboot again” (and aren’t saying realistic/practical things, like “temporarily postpone rebooting for minor security patches“).
– Brendan
It will work for security updates – by their nature they are forced to only solve the security problem, not change anything else, and not break compatibility (i.e. no data structure changes). Mostly it means changing some conditions (“if” statements) or using a pointer instead of the value of a variable (this is C). Your kernel hacker will tell you if the problem can be solved this way (we have them already as kernel package maintainers). And most real enterprise distributions do patches/fixes/security updates this way (new features are introduced in a [half of a] year timeframe or so) – for example see https://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux#Version_histo…
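A made-up example of the kind of change being described: a fix that only tightens a condition and touches no data structures or interfaces, which is exactly what makes it a good live-patch candidate (all names are hypothetical):

```c
/*
 * Hypothetical example of the kind of fix described above: the security
 * patch adds one bounds check and changes nothing else, so the live patch
 * only has to swap in the new function.
 */
#include <stddef.h>
#include <string.h>

#define BUF_LEN 64

/* Original, vulnerable version: trusts the caller-supplied length. */
static int copy_request_old(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);          /* len > BUF_LEN overflows dst */
    return 0;
}

/* Patched version: one extra "if" is the entire security fix. */
static int copy_request_new(char *dst, const char *src, size_t len)
{
    if (len > BUF_LEN)
        return -1;
    memcpy(dst, src, len);
    return 0;
}

int main(void)
{
    char buf[BUF_LEN];
    const char request[] = "hello";

    copy_request_old(buf, request, sizeof(request));   /* fine for small input */
    return copy_request_new(buf, request, sizeof(request));
}
```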
First off: isn’t this a journalist known for being a bit of an MS shill?
Secondly: replacing bits of a running kernel is really microkernel territory, surely. Binary patches into a monolithic kernel are only going to work part of the time – like when the incoming patch is smaller than the slot you push it into.
It could certainly be popular in data centres, though foul-ups could be monumental in scale. I am guessing there will be mechanisms for signing, encryption & verification so that the whole thing stays securely on its rails.
Well, surely if the replacement function is larger than the original function it can simply be placed elsewhere in memory and a JSR or similar placed in the original function location. That should work fine until the entire kernel is replaced at some date in the future. I don’t see why it really matters whether the function is part of a binary blob in the kernel or a separate module loaded afterwards once its location is known.
Haha, I dare say there’s a good bunch of Linux admins crying in their camomile tea that any dweeb can match their oh-so-precious uptime numbers now.
Hopefully this encourages them to apply security patches instead of putting their e-peen uptime ahead of everything else.
Uptime for individual servers is really a meaningless metric and has been so for quite some time now. Either your service is so important that you already have a failover, or it isn’t, in which case rebooting for an update doesn’t matter.
It is even more meaningless when you get to the weird voodoo magic that is the IBM mainframe*, where you can literally replace every hardware component a piece at time without a second of downtime.
If you do that, is it the same server? It’s a modern day Ship of Theseus!
*Higher-end Sun gear used to be able to do this, don’t know about Oracle’s gear. HP Integrity can, too, I think, and others.
Drumhellar, it does not exactly have to be that high-end. It’s possible in some whitebox solutions.
When you get up into Intel Xeon and AMD Opteron territory you have motherboard interconnects. So a dual-motherboard system (yes, this does have dual to quad power supplies, with power supply bars, so a single failed power supply does not disrupt things) can in fact hot-swap everything, as long as you are very careful and very sure of which motherboard is in fact the active node.
The higher-end gear has nice ways of displaying the active state, and sometimes locks to prevent removal of the active node.
“It’s a modern day Ship of Theseus” – kind of. Normally the core chassis in these systems doesn’t get replaced or can’t be replaced. Some of the most insane IBM systems had a fragmented chassis, so it was possible to change the chassis piece by piece.
Back in the day the overall uptime of a system was a benchmark I loved to track. I enjoy reading stories of how many years a system had been on without being shut down. However, in the modern era “the system” is often not one server but a collection of servers. The state of the collection, shall we say cloud, of servers performing that task is what is important. Therefore rebooting to apply patches isn’t such a bad thing. It’s still cool to say that a system never *has* to go down. However, I imagine it doesn’t mean that should be the MO.
Hank,
I agree, having services provided redundantly is the ideal solution, albeit not as practical for desktops/single servers.
With a redundant cluster, one could simply take each node offline one at a time. The cluster’s built-in redundancy mechanisms should provide service seamlessly to customers. Now you can perform whatever maintenance you need on individual nodes. When they come back up they can join the cluster again, and customers are none the wiser that a node was ever down.
That doesn’t always apply if you are, for example, the cloud provider. AWS (& other compute clouds) have had to do a fleet-wide reboot to patch Xen issues twice in the past 12 months. While it’s easy to say “Well, customers shouldn’t rely on any single instance”, the reality is that customers do get grumpy when instances go down. Also, when you’re the size of AWS you can’t spend weeks per AZ slowly rebooting your thousands of virtualisation hosts.
You’re right, though, that the normal procedure should still remain to install a new kernel & reboot.
Uh, are you sure? We have a load of servers on AWS and none of them has had a reboot (that wasn’t done by us) in the last 12 months.
Soulbender,
I had read this too, but I was under the impression it was just some of the older nodes. Still, I have to wonder why live migration wasn’t available? Was there a breaking change that prevented live migration from being used?
Note I’m not an Amazon EC2 customer, I’ve only trialed it. For my needs, I’d require virtualization be available to me, which would require nested virtualization to work on EC2 nodes, and I don’t think it does. I’m reading that there are third-party hypervisors that work around this limitation and run on EC2:
http://www.virtuallyghetto.com/2013/08/ravello-interesting-solution…
http://www.ibm.com/developerworks/cloud/library/cl-nestedvirtualiza…
In my own lab, Linux nested virtualization used to work (via KVM-Intel), but was broken last year and it hasn’t worked for me since (using stock kernels).
https://www.mail-archive.com/[email protected]/msg111503.html
http://www.spinics.net/lists/kvm/msg112133.html
EDIT: Found a comment on Ars that goes into detail about other potential issues with Xen migration (Which is what AWS uses):
Drumhellar,
That makes sense to me. Probably would have been feasible to throttle the migrations but it would take a long time and they figured a reboot was better for them than migrating all the bits around the network/drives.
Oops. I accidentally edited that quoted part out of my post after I found a larger, more informed post on the subject.
Drumhellar,
Very informative! Makes me wonder why Xen didn’t always have upgrade paths in place. Each version should always be able to migrate to the next. I guess they didn’t enforce this, but I hope they have this sorted for the future!
How do you actually use this?
Is there something I need to do from userspace to use kpatch/kgraft?
Or distros need to add the support?
How does it work?
This headline needs a hyphen. “No reboot patching comes to Linux 4.0” means that Linux 4.0 will not have a reboot patch. “No-reboot patching comes to Linux 4.0” means that Linux 4.0 will allow people to install patches without rebooting.
In the small one-man-driven-server world I’m able to reboot any machine at any time I want (well, mostly – on a Samba server for 600 people I simply cannot reboot until 11pm). But in real business there are more admins, and rules about what can be done and when it can be done. Reboots are planned well ahead of time (say, once per week). But there could be a security fix in the kernel I want to apply even though the scheduled reboot is far in the future. And there is a welcome place for no-reboot patches to save my ass, because if my server gets hacked, it is my responsibility. But I still take responsibility for the “unnecessary” reboot (and my boss does not understand that our server was not hacked because I did the reboot out of schedule, since there is no proof of this I can present in front of his boss).