Feature-wise, 4.0 doesn’t have all that much special. Much has been
made of the new kernel patching infrastructure, but realistically, not only wasn’t that the reason for the version number change, we’ve had much bigger changes in other versions. So this is very much a “solid code progress” release.
Despite the version number, not a big deal.
Once all distros include live patching it will be a big deal. No more rebooting!
That will take a while though.
Doesn’t work like that. You can only “live-patch” bugfixes, but actual new features or such will still require rebooting.
WereCatf,
You are right. In addition, I seem to remember that about 10% of patches needed custom patch code to ensure they worked properly. It’s not something we get for free.
In theory it should be possible for a kernel to never need to reboot again regardless of the features that are added to it. Microkernel architectures avoid the global kernel pointers that make patching structures so hairy, so a microkernel could actually make live patching fairly straightforward. It might even be possible to run different versions of kernel modules simultaneously using completely different internal structures, allowing us to test out new kernel subsystems before redirecting requests away from the old ones.
Of course none of this will be happening soon given that Linus is strongly in the monolithic kernel camp, where altering shared kernel structures is inherently difficult. It’s an interesting problem though, one that I’d enjoy working on if I could get paid to.
Why are people so hellbent on not having to reboot? With SSDs and powerful processors it doesn’t take that much time anymore anyway.
Is it just for the sake of it?
Yasu,
It depends on your use case. I don’t think anybody’s talking about normal desktop users here; for them rebooting once a day isn’t a big deal. For services with service level guarantees, downtime equates to lost revenue. If clients are paying for zero downtime, you’d better be able to deliver it. With respect to “mission critical” services such as traffic control, hospitals, 911 emergency dispatch, etc., downtime is downright harmful. Hopefully there are contingencies in place, but even so it can be bad.
Meh, who cares? I hardly reboot as it is. I don’t really see who this feature is a big deal for.
I doubt it’s of much utility in the consumer space. This is more relevant for the mission critical people.
It seems to me that for the desktop user, new kernel releases became irrelevant quite a few years ago. The kernel has long been good enough for desktop use. In the server space it is obviously different.
Even less so there. Mission critical servers already have redundancy and failover so rebooting is no issue at all.
Soulbender,
I wouldn’t say “no issue at all”; there are some technical challenges/considerations. Even with redundancy and failover, there’s often some short period of downtime, and you can lose sessions when applications go down and new ones come up. In a complex multi-tier system (database, application server, local application) with multiple points of failure, it may be difficult to implement redundancy that guarantees full session continuity in the event of an outage, such that the operators won’t notice an interruption happened (i.e. blocked operations, hung sessions, lost data, incomplete transactions, etc.). This can be trickier if there are many subsystems working together (say the application server is tied to the phone system).
It may be a bit easier to handle a scheduled event like a reboot because the server doesn’t need to be terminated abruptly and can be phased out in a controlled manner. However, even this requires infrastructure in front that’s smart enough to route new application sessions to the new server while allowing old sessions to finish gracefully on the old one (a toy sketch of that routing idea is below).
I think such factors should be taken into account in determining whether rebooting is an issue or not. In any case I think it’s nice to have options like no-reboot updates.
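To illustrate the kind of routing I mean, here’s a toy sketch only, not any particular load balancer’s API: the backend names and the in-memory session table are made up, and a real front end would do this with a “drain” or graceful-shutdown state.

```python
# Toy sketch of "drain the old server" routing: existing sessions stay
# pinned to the backend that created them, new sessions go to the new
# backend. Backend names and the session table are made up for
# illustration only.

OLD_BACKEND = "app-old:8080"   # hypothetical server being phased out
NEW_BACKEND = "app-new:8080"   # hypothetical freshly updated server

session_affinity = {}          # session_id -> backend

def route(session_id):
    """Return the backend a request should go to."""
    if session_id in session_affinity:
        # Existing session: keep it on whichever backend it started on,
        # so in-flight work on the old server can finish gracefully.
        return session_affinity[session_id]
    # New session: always send it to the new server.
    session_affinity[session_id] = NEW_BACKEND
    return NEW_BACKEND

def drain_complete():
    """The old server can be rebooted once no sessions reference it."""
    return OLD_BACKEND not in session_affinity.values()
```

The point being that the hard part is session state and coordination, not the reboot itself.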
Personally, I’m happy that Btrfs raid5/6 can now be considered stable. It worked in 3.19, but a regression caused it to be slower than it should have been; in 4.0 that has been fixed.
Stable? AFAIK it is still being developed.
– Gilboa
So is the Linux kernel, still in development.
All software is still in development as long as something is maintained and new versions are released.
It sounds like you’ve never used Linux in production.
Staging file system != changes in a network driver and/or optimization to the character device layer.
An OOPS within the network layer will require a reboot. An OOPS in a staging file system may require a restore from a very recent backup.
– Gilboa
WereCatf,
I was having severe write-performance problems with ext4 and could not figure out why. I just tried again using 4.0 and it is working much better. So if anyone else is seeing FS performance problems, 4.0 might be worth a shot!
I also retested raid5/6, performance is unchanged.
Incidentally, I’m quite unhappy with Linux raid 6 write performance. Note that these are for 100% linear raw disk writes, which is the ideal case for raid owing to zero theoretical disk I/O overhead.
Single Disk:
Read 1 thread: 122MB/s
Read 4 threads: 486MB/s
Write 1 thread: 108MB/s
Write 4 threads: 138MB/s
Raid 6 with four disks:
Read 1 thread: 234MB/s
Read 4 threads: 447MB/s
Write 1 thread: 84MB/s
Write 4 threads: 98MB/s
Raid 5 with four disks:
Read 1 thread: 348MB/s
Read 4 threads: 776MB/s
Write 1 thread: 112MB/s
Write 4 threads: 140MB/s
Even raid 10 write speed is disappointing: the theoretical write speed should be about twice the single-disk speed (2 x 138MB/s = 276MB/s), but it’s only 183MB/s, or 45MB/s faster than a single disk.
Raid 10 with four disks:
Read 1 thread: 240MB/s
Read 4 threads: 673MB/s
Write 1 thread: 173MB/s
Write 4 threads: 183MB/s
We could chalk it up to a system performance bottleneck; however, raid 0 proves that the software can sustain writes to four disks at 424MB/s. Raid 10 writes should therefore support around 212MB/s (half the raid 0 figure, since every block is written twice), so there’s a big software bottleneck somewhere.
Raid 0 with four disks:
Read 1 thread: 352MB/s
Read 4 threads: 812MB/s
Write 1 thread: 305MB/s
Write 4 threads: 424MB/s
I realize that the file system will cache writes on top of this, but why is the underlying raid performance poor? Has anyone else noticed this? Any ideas how to get better write performance on a linux raid array?
Benchmark tool?
gilboa,
I tried Bonnie to test file system level performance, but when I started suspecting the raid performance I scripted a simple dd command to get raw linear block device performance without conflating file system overhead/caching. If you want to recommend another benchmark tool to check out, I can try it out.
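For reference, by “a simple dd command” I mean something of roughly this shape rather than the exact script; the device path, block size and count are placeholders, and it’s destructive, so only aim it at a scratch array:

```python
# Rough single-threaded linear-write check against a raw block device.
# WARNING: writing to the device is destructive; /dev/md0, the block
# size and the count below are placeholders for a scratch test array.
import subprocess
import time

DEVICE = "/dev/md0"   # hypothetical array under test
BS = "1M"             # dd block size
COUNT = 4096          # 4 GiB total at 1 MiB blocks

def linear_write_mibps(device=DEVICE, bs=BS, count=COUNT):
    start = time.time()
    subprocess.run(
        ["dd", "if=/dev/zero", f"of={device}", f"bs={bs}",
         f"count={count}", "oflag=direct"],   # O_DIRECT skips the page cache
        check=True, capture_output=True)
    return count / (time.time() - start)      # 1 MiB blocks -> MiB/s

if __name__ == "__main__":
    print(f"linear write: {linear_write_mibps():.0f} MiB/s")
```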
The best case scenario for raid is very long writes, such that no data has to be read from disk to calculate xors for a stripe; instead the entire stripe is written out. In my case that is 2 data blocks plus 2 parity blocks. In this best case scenario, even raid6 should multiply bandwidth by 2.
The worst case scenario for raid-6 obviously happens when writing single sectors randomly, such that every single disk in the stripe has to be read or written for each sector written to the raid. The maximum theoretical write performance collapses down to the write performance of a single drive.
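To make that arithmetic explicit, assuming four disks, the ~138MB/s single-disk write figure from my numbers above and zero overhead, the ideal full-stripe write numbers work out as follows:

```python
# Back-of-the-envelope ideal write throughput for a 4-disk array,
# best case (full-stripe sequential writes), ignoring all overhead.
DISKS = 4
SINGLE_DISK_WRITE = 138.0   # MB/s, the 4-thread single-disk figure above

def ideal_write(level, disks=DISKS, single=SINGLE_DISK_WRITE):
    if level == "raid0":
        return disks * single          # every disk holds data
    if level == "raid10":
        return disks / 2 * single      # every block written twice
    if level == "raid5":
        return (disks - 1) * single    # one parity block per stripe
    if level == "raid6":
        return (disks - 2) * single    # two parity blocks per stripe
    raise ValueError(level)

for level in ("raid0", "raid10", "raid5", "raid6"):
    print(level, ideal_write(level), "MB/s")
# raid0 552, raid10 276, raid5 414, raid6 276 MB/s -- versus the
# measured 424, 183, 140 and 98 MB/s respectively.
```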
My benchmarks for Dell PERCs bear this out; they do an excellent job of approaching the theoretical maximums at either extreme. With BBU caching they can even drastically exceed those maximums for a short period. I’m willing to accept that software raid will incur some more overhead, but what’s been bothering me is that Linux raid6 seems to exhibit the worst-case scenario performance all the time! I haven’t examined the source yet, but the performance might be an indication that Linux is writing (and rewriting) the stripes one sector at a time rather than more optimally writing multiple sectors one stripe at a time.
I would love to solve this problem because I have a project where linux raid6 would be great. By playing around with file system caching parameters, I get decent performance for regular file I/O. But for this project the raid was to host LVM with raw disk images rather than a normal file system, which is why I’m so interested in optimizing the raid strictly for large linear IO. As far as I can tell it should be able to pull in 200MB/s, which easily saturates the network, but 80MB/s isn’t good enough. I’m just wondering if anyone knows tricks to getting better performance out of it.
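For what it’s worth, the one md-specific knob I know of for raid5/6 is the stripe cache, which gives the raid layer more room to assemble full-stripe writes; something like this bumps it (the device name and value are just examples, it needs root, and I can’t promise it helps):

```python
# Enlarge the md stripe cache so the raid5/6 layer has more room to
# assemble full-stripe writes. Device name and size are examples only;
# the value is in pages per device, so raising it costs RAM.
MD_DEVICE = "md0"          # hypothetical array
STRIPE_CACHE_PAGES = 4096  # default is usually 256

path = f"/sys/block/{MD_DEVICE}/md/stripe_cache_size"
with open(path, "w") as f:   # needs root
    f.write(str(STRIPE_CACHE_PAGES))
print(f"{path} -> {STRIPE_CACHE_PAGES}")
```

Whether it helps depends on the workload, so treat it as something to benchmark rather than a guaranteed fix.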
If you can send me the script I’m willing to test it on a number of machines with software RAID (5,6,10).
To be honest, my experience is quite different.
I develop software that writes a very large number of streams simultaneously.
In my experience, high-end RAID6 controllers with 8-16 drives perform marginally better than software RAID6 using the same configuration, and even the CPU load is nearly identical.
… and I’m talking about 300-400MB/sec writes across ~16K+ streams.
I should add that software RAID6 does exhibit slower performance when doing random r/w.
– Gilboa
gilboa,
Definitely, I wish OSnews had private messaging so I could send you my email. Anyways, I’ll post a link to the information here when I get a moment to put it together.
Something can be stable and still be in development.
Installing updates doesn’t mean your system is secure; it just means that the code on disk is secure. But if you are already running the application you just updated, the insecure code is still living in memory. Rebooting guarantees that you are running the updated versions of all your applications and not vulnerable code.
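If you want to see which processes are still running pre-update code, one rough way on Linux (given enough privileges) is to look for mapped files that have since been deleted or replaced; /proc/&lt;pid&gt;/maps marks those “(deleted)”. A quick sketch:

```python
# List processes that still have deleted (i.e. since-replaced) shared
# libraries mapped into memory -- a hint that they're running
# pre-update code and want a restart. Needs privileges to read other
# processes' maps.
import os

def stale_processes():
    stale = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/maps") as maps:
                libs = {line.split()[-2] for line in maps
                        if line.rstrip().endswith("(deleted)")
                        and ".so" in line}
        except OSError:
            continue  # process exited or permission denied
        if libs:
            stale[int(pid)] = libs
    return stale

if __name__ == "__main__":
    for pid, libs in stale_processes().items():
        print(pid, sorted(libs))
```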
Having said that, the ability to live patch the kernel is really invaluable in some scenarios, like virtualisation or container hosts.
But in a world of zero day exploits and security issues… isn’t rebooting a requirement “just to make sure everything picks up the new change”?
Correct. You still need to reboot. I don’t know how this specifically works but I guess it intercepts the next call to a function and routes it to the updated function in the updated kernel (I could be very much mistaken here).
You really only want this feature when you absolutely need a fix for a bug in a running kernel but still cannot afford to reboot.
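From what I’ve read, the kernel side (livepatch/kpatch) uses ftrace to redirect calls from the old function to its replacement, which is basically that “intercept the next call” idea. As a user-space toy analogy only, nothing here is real kernel API:

```python
# Toy user-space analogy of call redirection: callers go through an
# indirection table, so "patching" swaps the target without restarting
# anything. Function names are made up for illustration; the real
# kernel mechanism redirects calls with ftrace.

_patch_table = {}

def patchable(func):
    """Register func and route every call through the patch table."""
    _patch_table[func.__name__] = func
    def trampoline(*args, **kwargs):
        return _patch_table[func.__name__](*args, **kwargs)
    return trampoline

@patchable
def checksum(data):
    return sum(data) % 255      # "buggy" original

def checksum_fixed(data):
    return sum(data) % 256      # the bugfix, same signature

_patch_table["checksum"] = checksum_fixed  # the "live patch"
print(checksum(b"hello"))                  # next call already runs the fix
```

Which is also why only drop-in-compatible bugfixes can be applied this way, as noted above.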