It seems like Google is working hard to update and upstream the Linux kernel that sits at the heart of every Android phone. The company was a big participant in this year’s Linux Plumbers Conference, a yearly meeting of the top Linux developers, and Google spent a lot of time talking about getting Android to work with a generic Linux kernel instead of the highly customized version it uses now. It even showed an Android phone running a mainline Linux kernel.
Android is the most popular Linux distribution by far, so a move to a more generic Linux kernel benefits the ecosystem as a whole.
This would be great for me. Running mainline linux without modification would be totally awesome. However I’m hesitant here because the proprietary driver problem has always been a huge impediment with ARM/android.
I’m glad that google’s explicitly talking about the elephant in the room: the lack of a stable linux ABI. On the one hand, the lack of driver ABIs in linux is what’s responsible for keeping users dependent on manufacturers for kernel updates. On the other hand, the linux development community has been adamant that linux cannot or should not support a stable ABI like windows does, believing that explicit support for proprietary drivers goes against the linux philosophy and would only encourage more proprietary drivers.
I really don’t know that google has what it takes to pressure mainline linux to support a stable API, but if that happened we might finally see android devices disconnected from specific kernels and the possibility to upgrade the kernel independently from the manufacturers. This has long been a gripe of mine with android.
“On the other hand, the linux development community has been adamant that linux cannot or should not support a stable ABI like windows does, believing that explicit support for proprietary drivers goes against the linux philosophy and would only encourage more proprietary drivers.”
It’s not that straightforward. There are two elephants in the room, not one. The stable ABI has an elephant of its own. It’s like why desktop 32-bit windows was limited to 4 GB of memory when it had CPU PAE support that could address up to 64 GB in 32-bit mode. Then this same elephant reappears with Spectre and Meltdown: the only way to fix these things is to rebuild the drivers/applications. So a binary-only driver is itself a disaster.
The Linux kernel does not just lack a stable ABI for drivers; it lacks a stable API for drivers. A stable API for drivers would be a good thing, because driver source could always be rebuilt. A stable ABI, where you cannot rebuild the driver to alter the machine code to work around silicon issues, is like putting a “kick me” sign on your back and then wondering why you are getting your ass kicked.
So it will be important to stay as close to open source mainline as possible, so that when major silicon-level issues appear, as much of the kernel as possible can be rebuilt.
It’s also been fun that Android has been breaking Linus’s rule of “don’t break the userspace API/ABI”. They are talking about truly finalizing the Android kernel-to-userspace interface.
So, like most things, what you really want is somewhere in the middle. A stable kernel ABI brings security problems, like it or not. The lack of a stable kernel API/ABI for driver developers makes out-of-tree driver development hard.
A stable API for Linux kernel driver development without a stable kernel ABI would be the ideal. Vendors could then provide driver source that would always be rebuildable, no matter what security update is required at the machine-code level.
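To make the API-versus-ABI distinction concrete, here is a toy C sketch (structure and field names are invented for illustration, not taken from any real kernel header):

/* Hypothetical core structure. A security fix adds the 'flags' field. */
struct io_request {
	void          *buffer;   /* offset 0 */
	unsigned long  length;   /* offset 8 on 64-bit */
	unsigned long  flags;    /* NEW field added by a security fix */
	int            status;   /* was at offset 16, is now at 24 */
};

/*
 * A driver shipped as source only names the fields, so a rebuild against
 * the new header picks up the new offsets (this is what a stable API
 * guarantees). A driver shipped only as a pre-built binary has the old
 * offset of 'status' baked into its machine code, so after the fix it
 * silently reads 'flags' instead (this is the change a frozen ABI forbids
 * the kernel from ever making).
 */
static inline int request_ok(const struct io_request *rq)
{
	return rq->status == 0;
}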
oiaohm,
Hi oiaohm old chap, I was thinking to myself that you would respond 🙂
There’s no implication that an ABI must be supported indefinitely. For example, windows X drivers may continue to work in windows Y but not windows Z. You pick a policy to balance between the need for change and the need for stability. With linux, most users could benefit from drivers being compatible at least within the same major version, as google is suggesting in the article.
PAE was a hack to give 32bit CPUs access to more ram. Sure it was a problem, however it was a problem for all 32bit operating systems regardless of ABI (including linux). Moving to a new 64bit architecture solved the problem for both linux and windows.
I want to be clear: being stuck with proprietary code is a problem. However doing away with an ABI doesn’t really solve that problem. Either the manufacturers provide support and actively maintain drivers, in which case there is no problem, or they do not, in which case not having an ABI means you can’t update the kernel much less the driver. At least with an ABI you could update the kernel.
I’m not disagreeing with that; ideally we have the source code, however in practice this is not happening for mainstream devices. We’ve been waiting years upon years for totally open source products and yet there’s no indication that manufacturers want to change their proprietary ways.
Yes, I agree there is a compromise solution in the middle. A stable ABI does not generally create security problems, but nevertheless if there is a good reason to break compatibility, then so be it (I agree with google’s position on this). It still would be beneficial to users to have “best effort” compatibility over none.
Again, I’m not going to argue against the benefit of open source drivers…duh. I understand this point of view and I believe I approached it fairly in my original post. But I will argue that by refusing to allow an ABI, the result is a continuation of the philosophical deadlock with end users stuck running old unsupported kernels 🙁
Honestly I don’t like being right about this, but I’m afraid that it is true.
>”PAE was a hack to give 32bit CPUs access to more ram. Sure it was a problem, however it was a problem for all 32bit operating systems regardless of ABI (including linux). Moving to a new 64bit architecture solved the problem for both linux and windows.”
No, this is a failure to understand.
>”At least with an ABI you could update the kernel.”
This is the problem with a kernel driver ABI: you cannot in fact update the kernel against many faults. The driver ABI expects the structures the kernel provides not to change. If you need to change them to fix a security issue or to implement something like PAE, you have a breaking change. Linux has had fully functional PAE supporting 64 GB of memory since 2001. 32-bit desktop Windows basically never got it, never got the extra security protection that PAE provided either, and so ran all that time with lower security.
Sections of the Spectre and Meltdown faults remain unfixed in windows due to the fact that fixing them would break driver ABI support. So, another case of doomed.
>”A stable ABI does not generally create security problems, but nevertheless if there is a good reason to break compatibility, then so be it (I agree with google’s position on this).”
No, every case of a stable kernel-space ABI creates security problems by restricting what security fixes can be applied in kernel space. A stable userspace ABI, or userspace-to-kernel ABI, doesn’t have the same problem. The Google developers behind libcamera are going down the route of a stable ABI in userspace, wrapped up under protections, so that the security defects of closed source blobs can be contained and isolated. So a stable ABI in userspace does work for security.
> “But I will argue that by refusing to allow an ABI, the result is a continuation of the philosophical deadlock with end users stuck running old unsupported kernels ”
A stable ABI in kernel space will always mean that not all security patches can be back-ported; that is the reality of the fixed structures a stable kernel ABI brings into kernel space. There will have to be ABI breaks to fix security issues, and those will still leave people deadlocked on old unsupported kernels.
With a stable API in kernel space, there are machine-code-level changes you can make without breaking the source code. So a stable API for driver development would not leave people locked on older kernels, because you would be able to put wrappers over the code to move it forwards indefinitely. And it works for security.
A stable ABI in userspace should be targeted where possible (something android has had major issues with). This is another solution that, with isolation protections, could be used indefinitely. And it works for security.
A stable ABI in kernel space is basically a compromise that accepts failure and makes excuses for why what you are doing with third parties does not have indefinite support. And it does not work for security long term. This is the hard reality.
How are we going to work with vendors that want their closed source proprietary ways without shooting users in the foot long term? libcamera I see as one of the valid, correct moves to deal with this problem.
A stable ABI in kernel space should be looked at as nothing more than a stopgap, as sooner or later it always brings the security problem of either not being able to patch the kernel or users not being able to switch to a newer kernel.
Now if you were pushing for more items like libcamera, which give those who want to stay proprietary a userspace ABI that can be nicely security isolated, I would be 100 percent accepting of it. That will not screw users over long term.
oiaohm,
…or a failure to explain…why does 32bit PAE matter to the argument over whether to support an ABI today?
Yes, but 32bit to 64bit was a breaking change anyways…ABI compatibility across architectures was never a goal, so the point is moot.
Please provide specific examples…
If the API used by drivers lacks a robust encapsulation model and drivers are designed to mess around directly in kernel data structures then yes it would be problematic. However this to me suggests the driver abstraction was poorly designed in the first place. While we may need to break compatibility with poor abstractions, the goal is to design and encourage good abstractions such that drivers are not messing around with global kernel structures. As a side benefit, good abstractions will help make the kernel more robust, keep subsystems independent and minimize spaghetti code.
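A minimal sketch of the kind of encapsulation being described, with all names hypothetical: drivers get an opaque handle plus accessor functions, so the core subsystem can reorganize its internals without drivers poking at kernel data structures directly.

/* Driver-facing interface: the queue type is opaque, so a driver cannot
 * depend on (or corrupt) its layout. Names are invented for this sketch. */
struct pktq;                                    /* opaque to drivers */
struct pktq *pktq_create(unsigned int depth);
int          pktq_push(struct pktq *q, void *pkt);
void        *pktq_pop(struct pktq *q);
void         pktq_destroy(struct pktq *q);

/* Private definition, visible only inside the core subsystem. Fields can
 * be reordered, renamed or replaced without touching any driver that
 * sticks to the functions above. */
struct pktq {
	void        **ring;
	unsigned int  head, tail, depth;
};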
Again, I have to ask you to flesh out your claims of security problems with specifics.
BTW, I don’t make any excuses for third parties not having “indefinite support”, I merely observe the fact that they don’t. Like I said earlier “You pick a policy to balance between the need for change and the need for stability.” Both extremes are problematic for users, that’s why a compromise is needed.
You may not admit it, but users do have reasons to want a stable ABI even if you are dead-set against it.
>>…or a failure to explain…why does 32bit PAE matter to the argument over whether to support an ABI today?
It’s a failure to learn from history. The PAE support failure blocked the 32-bit Windows kernel from using features it had. If we are not aware of what causes this, we will do it again. So 32-bit windows ran for over a decade with downgraded security.
>> If the API used by drivers lacks a robust encapsulation model and drivers are designed to mess around directly in kernel data structures then yes it would be problematic.
Except once you are running a third party binary in ring 0 you are screwed. Microsoft did everything you describe with windows drivers. 32-bit Windows had to give up on PAE support not because their encapsulation was incorrect; it’s that the people coding the drivers chose to disobey the documentation for performance, meaning that whenever anyone enabled PAE fully, the windows desktop could randomly fail. In ring 0/kernel space you don’t have any proper means of preventing a third party driver from ruining you like this.
The best way to enforce good abstractions is to not allow third party code directly into kernel space in the first place.
https://www.extremetech.com/computing/301876-one-of-intels-recent-bug-fixes-carries-a-performance-penalty
Please note the PAE issues with 32-bit windows drivers are in fact the same kind of problem as fixes like this one for conditional jumps across 32-byte boundaries, and the other Spectre and Meltdown fixes. Where you in fact need to change the alignment of things to fix the security problem properly, you have a major elephant working against closed source kernel-space code. If what you are loading into ring 0/kernel space is a binary, you cannot fix those alignments and you cannot put absolutely solid privilege restrictions around it. If you are loading into ring 3/userspace you may not be able to fix the binary either, but you can limit its privileges and system access and so control the risk of an out-of-date driver.
Like it or not, to close a lot of these CPU design security faults off properly, so they cannot be used as a privilege exploit, you need to update every bit of machine code running in ring 0. With third party closed source drivers in kernel space, you are not going to be able to do this.
There are other options, though, and they are options that will work.
Let’s take extfuse: the Linux kernel takes BPF bytecode and turns it into native executable code itself, to be run in ring 0. What advantage does this have? Since it is bytecode that is turned into ring 0 machine code by the kernel, any required alignment corrections can be performed by the Linux kernel when it builds the BPF. This means the extfuse code running in kernel space is as secure as all the other kernel code. This kind of path can be secure and have long-term support. Yes, that is a kernel-to-userspace API with the means to run stuff, accelerated, in kernel space after it has been validated and modified for current security requirements.
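For context, the userspace half of that “upload bytecode and let the kernel build it” model usually looks something like the libbpf flow below. This is only a generic sketch, not extfuse’s actual interface; the object file and program name are made up.

#include <bpf/libbpf.h>

/* Open a compiled BPF object, let the in-kernel verifier and JIT build it
 * for the running kernel, and return the program fd ready to be attached. */
int load_ext_filter(void)
{
	struct bpf_object *obj;
	struct bpf_program *prog;

	obj = bpf_object__open_file("ext_filter.bpf.o", NULL);   /* hypothetical file */
	if (libbpf_get_error(obj))
		return -1;

	if (bpf_object__load(obj)) {          /* verification + JIT happen here */
		bpf_object__close(obj);
		return -1;
	}

	prog = bpf_object__find_program_by_name(obj, "ext_filter");
	if (!prog) {
		bpf_object__close(obj);
		return -1;
	}

	return bpf_program__fd(prog);         /* attach this fd to the relevant hook */
}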
Doing libcamera with a userspace ABI allows you to sandbox the heck out of untrustworthy third party driver code. The untrustworthy driver code does not get a chance to violate the abstractions.
The history of kernel drivers on windows, OS X and Solaris tells you that if you give third party driver makers a chance to screw you over, they will.
Yes, Microsoft attempted to build Singularity, where all drivers running in kernel space are bytecode, because they saw this same problem: you cannot trust third party driver makers to put machine code they wrote into ring 0 of an operating system; if you do, they will screw you over.
https://www.battleye.com/support/faq/
Alfman, remember: if you provide a means for people to load their drivers into kernel space, you will end up with items like BattlEye that disobey whatever rules you set down. The question is not if but when.
This is why we have libcamera with userspace drivers, extfuse with BPF, and why google is replacing lots of its custom networking stuff with BPF.
There are basically a lot of options that work. And work properly from a security point of view.
Then you have the idea of “let’s make a kernel ABI to allow third party drivers that are just built using normal compiler and linker options”. Basically, long term this does not work.
Let’s say instead you went for “we will make a stable API”. Then we work out how to encode a driver using that stable API into something like BPF bytecode, to be built to native code by the kernel when it is needed. This could be reasonably well performing and secure.
Basically, what is needed is not exactly a kernel ABI in the classic meaning of the term.
oiaohm,
Ok, but all your complaints about code running in ring zero have nothing to do with a stable ABI, you are complaining about proprietary code that you cannot control/fix. And on that point we agree: it’s better to have open source drivers if that’s possible.
Of course the linux & android communities may or may not backport kernel fixes/developments, but having a stable ABI is an improvement that opens up the potential for LTS releases like ubuntu has. Sure, eventually canonical will stop LTS support after a while, but it would be disingenuous to ignore the benefits of LTS altogether just because it will eventually be unsupported. It’s about balancing change and stability, just like I’ve been saying.
Not for nothing, but if you’re looking to build a microkernel, then linux may not be the right choice. Maybe you want linux to go in this direction, which is fine, there can be merit to microkernels, but linux developers traditionally lean towards modular/monolithic kernel designs.
I wonder what your thoughts are on windows, given that it has been moving more drivers into userspace like you suggest.
> > Of course the linux & android communities may or may not backport kernel fixes/developments, but having a stable ABI is an improvement that opens up the potential for LTS releases like ubuntu has. Sure, eventually canonical will stop LTS support after a while, but it would be disingenuous to ignore the benefits of LTS altogether just because it will eventually be unsupported. It’s about balancing change and stability, just like I’ve been saying.
You should not ignore the disadvantages of LTS either. One of the Google Android kernel developers’ goals is to get off the LTS kernel and onto the mainline kernel, because the security situation of the LTS kernel is quite bad due to security patches that cannot be back-ported.
Another effect of being on LTS is that when a fault has been introduced upstream in mainline, by the time the patch is back-ported to the LTS branch the kernel developer who made the fault is working on something else, so may not be able to fix it. Getting off LTS of course requires getting rid of all of Google’s own custom Android additions to the kernel, by either up-streaming them or using something different that is already upstream, and then dealing with the vendor stuff. If you watch all of the conference like I have, they are seriously working on this.
> > Not for nothing, but if you’re looking to build a microkernel, then linux may not be the right choice. Maybe you want linux to go in this direction, which is fine, there can be merit to microkernels, but linux developers traditionally lean towards modular/monolithic kernel designs.
1. A hybrid of microkernel/monolithic/bytecode methods is what the Linux kernel is. Stuff whose source code sits in the modular/monolithic part, updated when the mainline kernel is, happens to be the high performance option, and if you don’t open it up to third party drivers it might be the right move.
2. Then you have Userspace I/O and FUSE; that is your microkernel-style stuff, up to a point, in the Linux kernel.
3. There is the libcamera work. This is more microkernel-like: a subsystem itself runs in userspace with the driver inside it, and that driver has to put all kernel requests through the userspace subsystem first. This happens to be the historic X11 model and a good section of the OpenGL model.
4. Then we have BPF bytecode. This is something like the https://en.wikipedia.org/wiki/Singularity_(operating_system) idea of bytecode being built by the kernel into native code running in ring 0. This is your extfuse and your network filtering stuff with BPF.
So we have four possible paths on Linux. Only one is a kernel-space ABI allowing machine-code binary drivers. BPF bytecode is not machine code but can still be a lightly documented binary driver keeping some level of secrets.
> > I wonder what your thoughts are on windows, given that it has been moving more drivers into userspace like you suggest.
Microsoft wrote the white paper detailing these problems when they first attempted the Singularity OS. Microsoft is attempting to dig their way out of the kernel-space ABI problem by moving more drivers to userspace. Microsoft’s second plan, of bytecode-only drivers, failed for them, but that does not mean it was a bad idea overall. We don’t need to dig our way in if we can avoid it.
When a third party wants a closed source binary blob driver, we need to consider, for every one of those cases, whether it can be suitably serviced by options 2-4 on my list. If it can be, it does not need a kernel ABI for machine-code binary drivers, and giving it that interface is a mistake.
oiaohm,
I want android to use mainline too, that’s the entire point of this article. I’m not sure if you read the article though because one of the obstacles to using mainline is the lack of stable ABI. Read the article if you haven’t already done so, I am in agreement with it and google.
The decision to have something in the kernel or not should be based on technical considerations rather than whether it’s a third party driver. And even if you believe 3rd party drivers should be in userspace, what are you going to do to convince manufacturers to listen to you? Why should an android manufacturer re-engineer drivers in userspace and incur a performance and battery life penalty to satisfy you?
No doubt those are fun ideas to think about, but it’s tangential to the issue of stable ABIs. A microkernel can have stable or unstable ABIs, a VM kernel like singularity can have stable or unstable ABIs, a monolithic/modular kernel like linux can have stable or unstable ABIs, etc. If you want to argue that linux should change its driver model, then more power to you. Seriously you’re welcome to promote microkernels and VM based alternatives as much as you want…however until you convince the linux community to embrace a new microkernel/VM driver model for low level chipset hardware, manufacturers and linux devs are going to continue to rely on drivers that run normally in ring 0 and don’t incur the overhead of userspace calls.
>> I want android to use mainline too, that’s the entire point of this article. I’m not sure if you read the article though because one of the obstacles to using mainline is the lack of stable ABI. Read the article if you haven’t already done so, I am in agreement with it and google.
No, what you said is not 100 percent in alignment with what google personnel said in the mini conference. libcamera is google personnel. Also, they mention in the mini conference that they have 3 devices running only the mainline kernel. No third party drivers at all.
So a stable kernel-space ABI, in the google developers’ eyes, is a stopgap, not a long-term solution.
>> And even if you believe 3rd party drivers should be in userspace, what are you going to do to convince manufacturers to listen to you? Why should an android manufacturer re-engineer drivers in userspace and incur a performance and battery life penalty to satisfy you?
This is because you did not watch the complete mini conference, in particular the libcamera bit. It is very much “your ring 0 driver will be in the mainline kernel and your closed source bit will run inside libcamera containment, or your hardware will not be certified as Android compatible”. libcamera is intentionally LGPLv2.
Basically google is sick of this kind of ring 0 bull crap ( https://lwn.net/Articles/529392/ ) being pulled by different vendors as well. Yes, this samsung case shows it’s not only small parties that go and make a ring 0 driver that breaks the complete OS security.
>> however until you convince the linux community to embrace a new microkernel/VM driver model for low level chipset hardware, manufacturers and linux devs are going to continue to rely on drivers that run normally in ring 0 and don’t incur the overhead of userspace calls.
No, the reality is that ring 0 drivers will be mainline-Linux-kernel-only at some point in Android’s future. Google is putting their foot down segment by segment under Android, so it is small numbers of upset driver makers at a time.
By the way, BPF and ACPI bytecode do not have to incur the overhead of userspace calls either. The virtual machine model does not have to have the userspace overhead.
LOL, you think I need to convince the community? I have only been stating what the Google people are pushing for.
oiaohm,
As it stands I agree with the article and google’s proposal for stabilizing the linux ABI; feel free to link to & quote the exact part you think I’d disagree with.
That userspace component refers to image processing; there’s no reason that has to be done in the kernel. However the low level device drivers (around 1h:07m) are still interfacing with the hardware through the kernel and piped to userspace.
Technically it could be possible to create userspace device drivers for 100% of the peripherals one might find on an android phone and I don’t deny your point about microkernels being potentially more secure than what linux can offer today…but remember: 1) linux devs traditionally aren’t very receptive to microkernels, 2) it would take time and effort to reshape linux devices into a microkernel architecture, 3) you’ll still need to develop and commit to an ABI with all the shortcomings you were alluding to, 4) the performance and battery life will likely end up being worse.
>> That userspace component refers to image processing; there’s no reason that has to be done in the kernel. However the low level device drivers (around 1h:07m) are still interfacing with the hardware through the kernel and piped to userspace.
Go back and check again: they are not talking about allowing the low level device drivers to be non-mainline.
>> 1) linux devs traditionally aren’t very receptive to microkernels
Except I am not talking about a pure microkernel.
>>, 2) it would take time and effort to reshape linux devices into a microkernel architecture
It will take time to reshape to use more VM bytecode solutions as well. BPF is not a microkernel; it is a different beast.
>> 3) you’ll still need to develop and commit to an ABI with all the shortcomings you were alluding to,
Exactly; not all issues can be fixed instantly.
>> 4) the performance and battery life will likely end up being worse.
That is the interesting mistake. It seems like ring 0 would always be the best performing; the problem is, reality is not that simple. Linux kernel space, ring 0, forbids using particular CPU features. These are CPU features that can in fact save way more CPU time than a context switch costs. So a ring 0 driver can in fact have worse performance than its userspace equal. The ideal driver in a lot of cases exists in both userspace and ring 0, so it can choose the mode with the right CPU features usable. This is why extfuse is interesting, with BPF in ring 0 and the means to go back out to userspace when the processing makes sense to be in userspace. The extfuse benchmarks get interesting: particular operations are faster because doing the operation in kernel BPF was cheaper than the context switch overhead of having the processing in userspace.
So there are times when a ring 0 driver is the slowest and worst power-using option. Yes, this happens more often than one would think.
This is where things get really tricky: a pure ring 0 driver loses to a hybrid ring 0/userspace solution.
The microkernel issue is double-sided. Yes, too many context switches kill microkernel solution performance, the same way they kill traditional FUSE performance. A hybrid picks up the advantages of both.
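One concrete instance of the “ring 0 forbids particular CPU features” point above, on x86: kernel code cannot casually use the FPU/SIMD registers the way userspace can, and has to bracket any such use, which is one reason heavy data processing often ends up in userspace. A minimal sketch (the helper itself is hypothetical):

#include <linux/types.h>
#include <asm/fpu/api.h>   /* x86: kernel_fpu_begin()/kernel_fpu_end() */

/* Any SSE/AVX work in kernel space must sit inside this bracket, which
 * saves the FPU state and disables preemption, so it must be short and
 * cannot sleep. Userspace code has no such restriction. */
static void sum_block_simd(const u8 *buf, size_t len)
{
	kernel_fpu_begin();
	/* ... vectorized processing of buf[0..len) would go here ... */
	kernel_fpu_end();
}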
https://www.kernel.org/doc/htmldocs/kernel-api/API-call-usermodehelper-setup.html
Yes, it is really easy to overlook how many drivers in the Linux kernel these days use a usermode helper, because the context switch problem of microkernels is not your only problem. The CPU feature limitation of ring 0 is also quite a major problem.
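For reference, the usermode helper mechanism linked above boils down to a small in-kernel call; a minimal sketch of the common convenience wrapper, with a made-up helper path and argument:

#include <linux/umh.h>   /* call_usermodehelper(); older kernels use <linux/kmod.h> */

/* Hypothetical example: a driver hands some policy work off to a
 * userspace program instead of doing it all in ring 0. */
static int run_example_helper(const char *arg)
{
	char *argv[] = { "/sbin/example-helper", (char *)arg, NULL };
	char *envp[] = { "HOME=/", "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL };

	/* Spawns the helper as a userspace process and waits for it to exit. */
	return call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
}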
Among Linux kernel drivers these days there are quite a few that are hybrid, as in part monolithic design and part microkernel design. So your idea that Linux kernel developers are not receptive to microkernel ideas is wrong. They are not receptive to a pure microkernel most of the time. Hybrid stuff that is part microkernel and part monolithic they are very receptive to, because it performs insanely well with security advantages.
Going after the best power usage and performance leads to some very interesting design choices.
oiaohm,
So far, I still agree with what google is proposing. You keep making claims about this project in your own words, but I’ve asked you a couple times now to quote exactly what you are referring to. If you want to qualify your views as your own opinion, that’s fine, but otherwise will you please cite the references for your claims rather than paraphrasing them yourself? Thanks.
This is what I mean, you tend to make tons of assertions, but often times you won’t back them up with references or proofs. Can you understand why this is a problem? It’s an interesting topic and I don’t mind debating these things. You do say a lot of things that are intriguing and insightful, but I wish you’d put a little more effort in defending your viewpoints using logic, data, and references, and less focus on “What I say is true and that’s all you need to know”, because between peers, that really doesn’t go well. Are you a professor by any chance? That could explain things, haha.
If you want to explain what you mean with examples, references and data, then I encourage you to do so and there’s a good chance we may agree on what you are saying, but I don’t really feel like being lectured to and responding to claims that start out as appeal to authority arguments, which is most of them…
https://www.logicallyfallacious.com/tools/lp/Bo/LogicalFallacies/21/Appeal-to-Authority
If you want to have a meaningful two way discussion, provide supporting evidence for what you are saying!
This was partly a market segmentation thing too. 32-bit Windows Server had PAE with larger memory space, and many drivers worked fine (either directly or after updates), at least the ones likely to end up on a server. They could have enabled it the same way on the desktop variants and let the vendors play catch-up (perhaps with a “We had to disable some features to let X driver work, see your vendor” warning/failsafe) but they didn’t want 32 bit users using more than 4GB ram (they wanted you to get a server edition, or later, 64 bit, for that).
There’s less of a market segmentation case for Linux.
I kind of agree that it was partly segmentation. I also had the fun of an HP server where, when you enabled Windows Server PAE mode, inserting a USB key caused an instant reboot. Why? The USB driver for removable storage was not PAE-capable in windows 2003 and 2008.
So PAE with 32-bit windows server only works if you use the right hardware; get it wrong and you get an instant reboot.
Linux PAE on 32-bit worked; this was because all drivers were in fact built for PAE.
After doing a bit of kernel work, I completely agree with Linus’s policy. Keep things stable for the user, but don’t spare effort on drivers that choose to remain outside the kernel development community. The kernel is fairly stable internally with a sane and slow deprecation schedule, but it is also a massive, vibrant project with thousands of drivers and hundreds of subsystems evolving at their own pace often without larger coordination.
It already takes months to get code released once it is accepted. Why on earth would we hamstring the process further to help out people who are going to cause more work and pain for us if we accommodate them?
The policy makes it very clear that the mainline kernel owes no duty to maintain, troubleshoot or help with drivers that are not up-streamed into the kernel. Out of tree developers have done their users a disservice to start with, and they know that if your driver breaks you get to keep the pieces.
Joshua Clayton,
That is the crux of the question: do you accommodate out of tree development or not?
As a kernel developer from time to time, all of my patches were out of the tree…and it’s not always because the patches are proprietary, many developers don’t carry enough status or importance to get their code merged into mainline. For example the developer of AUFS (popularized by knoppix/live cds) tried diligently year after year to merge his unionfs into linux when it didn’t have one, however it was never accepted. My own linux kernel needed to support a union file system, yet I had to deal with linux kernel breakages on a regular basis. Several years later mainline linux devs implemented their own overlay FS, which was pretty similar (probably a bit of “not invented here” syndrome).
You’re overlooking that mainline linux development can be somewhat of a privilege and some projects don’t have a choice about existing outside the mainline linux tree.
For developers of small kernel side projects, out of tree development is sometimes a fact of life. Mainline linux cannot just accept all code projects submitted to it. That would be unrealistic even in the ideal scenario where it’s all open source! Merging everything into mainline indefinitely reminds me of the garbage heaps in the movie idiocracy:
https://www.youtube.com/watch?v=ZIBj2GIbGo0
We need a better plan and better organization. The linux approach of throwing everything into the kernel code base was ok when it was a smaller/younger project, but now the monolithic tree is a liability and an impediment to manageability. The linux kernel has become extremely bloated, a fact that linus himself concedes:
https://www.theregister.co.uk/2009/09/22/linus_torvalds_linux_bloated_huge/
Naturally, the solution is to divide the project and take stuff out of the mainline tree, but ideally we would do it in a way that the pieces taken out can still be supported by those who still want/need it….and though I realize the linux masses have been programmed to recoil at the thought, there is an answer sitting in front of us: stable ABIs.
> > Naturally, the solution is to divide the project and take stuff out of the mainline tree, but ideally we would do it in a way that the pieces taken out can still be supported by those who still want/need it….and though I realize the linux masses have been programmed to recoil at the thought, there is an answer sitting in front of us: stable ABIs.
Problem: there are 3 possible answers for a stable ABI.
1) A stable userspace ABI: this is your common microkernel stuff. That has quite an overhead.
2) A stable bytecode ABI: yes, the BPF stuff.
https://lwn.net/Articles/759188/
Yes, here with infrared (IR) decoding, thousands of drivers were in fact nuked from the Linux kernel sources and third party trees; these days those drivers are BPF uploads to the kernel (see the sketch after this list). Then you have extfuse being used to nuke Android’s wrapfs, which again is BPF in kernel space. There is a growing list of “let’s use bytecode” cases, and they are ending up with performance almost exactly the same as if you had gone to the effort of making a normal kernel module.
3) A stable kernel-space ABI for kernel modules: this has some serious security problems stemming from the nature of a monolithic kernel.
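As promised above, a rough sketch of the kernel-side shape of one of those IR decoders as a lirc_mode2 BPF program, loosely modelled on the kernel’s BPF selftests; the pulse-length check and scancode are invented for the example rather than implementing a real IR protocol.

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <linux/lirc.h>
#include <bpf/bpf_helpers.h>

SEC("lirc_mode2")
int ir_decoder(unsigned int *sample)
{
	if (LIRC_IS_PULSE(*sample)) {
		unsigned int duration = LIRC_VALUE(*sample);

		/* Made-up rule: report a ~2 ms pulse as scancode 0x42.
		 * Arguments are protocol id, scancode, toggle bit. */
		if (duration > 1900 && duration < 2100)
			bpf_rc_keydown(sample, 0x40, 0x42, 0);
	}
	return 0;
}

char _license[] SEC("license") = "GPL";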
> > You’re overlooking that mainline linux development can be somewhat of a privilege and some projects don’t have a choice about existing outside the mainline linux tree.
Yes, there are projects that have no choice but to exist outside the mainline Linux kernel tree, but that does not mean those projects need a stable kernel-space ABI for kernel modules. Due to the security problems of out-of-tree kernel modules, they should be looking at the userspace driver or the BPF driver solution first.
Yes, the BPF solution is quite high performing; in fact it can work out faster than shipping a pre-built binary kernel module. Why? Because as the kernel gets updated, so does the JIT that converts BPF to native code, allowing your driver to pick up new performance optimizations.
There are downsides for a third party using a kernel-space ABI for kernel modules. There are a lot of cases where parties are choosing kernel modules instead of BPF because it means they do not have to make BPF interfaces for upstream before they can code their hardware control blob.
oiaohm,
Sure, it is neat to think of alternatives to the monolithic/modular drivers used for low level hardware in linux. But it still does not offer an immediate solution to the problem.
In the long term, in theory, one might re-write all of linux’s ring-0 driver dependencies to use a managed language/VM driver model instead. But to be honest linux does not seem like the best choice if you want to buck the monolithic/modular kernel design. If you want something like singularity, it’d probably be better to just use something like singularity in the first place…just my opinion.
>> In the long term, in theory, one might re-write all of linux’s ring-0 driver dependencies to use a managed language/VM driver model instead.
That is not google’s plan.
>> But it still does not offer an immediate solution to the problem.
Maybe there is not a proper immediate solution to the problem.
Google’s plan is simple.
1. They stated they want devices to work with the mainline Linux kernel only, no third party drivers at all. This is a stated goal. Yes, the stated goal is even more extreme than that: it is one kernel binary with drivers per CPU arch. So this is where the google developers are heading.
2. They are going to provide a stable kernel ABI as a stopgap, not as a final solution.
3. With libcamera and other areas they will put their foot down, nuking all non-mainline ring 0 drivers from individual classes of hardware.
You say you support what Google is doing. You don’t; you are thinking that the stable kernel ABI is more than a stopgap, while the Google developers behind Android see it as only a stopgap.
Yes, there will be ring 0 drivers in the google model. No, those will not come from third parties. Managed bytecode, like the IR BPF, extfuse and networking stack stuff, will be all that third party driver makers for android devices can run in ring 0 at some point, because a non-mainline ring 0 driver will mean your hardware fails certification at some point. Yes, those making cameras for android devices currently have google placing their head in the guillotine: if they don’t make their stuff mainline plus userspace, they will no longer be able to provide their stuff into the android market for certified devices. Google will do this again for other classes of hardware until the only ring 0 drivers are from mainline.
Google is being very clear on what their final goals are. These final goals are being driven by things like the samsung mess ( https://lwn.net/Articles/529392/ ) where you trust a vendor to provide a third party ring 0 driver and they go and break the complete platform security.
Yes, google’s plan is brutal.
oiaohm,
Well, I ask that you provide exact links & quotes to support your claims for what google’s plans are after it achieves a stable ABI.
>> Well, I ask that you provide exact links & quotes to support your claims for what google’s plans are after it achieves a stable ABI.
They list a lot. The big thing they mention is the objective of getting to the mainline Linux kernel. Then they mention that their future plans include taking the DRM subsystem out of mainline and back-porting it into all their LTS kernels.
This is not maintaining a stable ABI long term. Once you have all the drivers you need mainline in a subsystem, you just cut that subsystem out and backport it. Stuff the stable ABI at that point.
There is what they talk about doing with the single kernel for all of android, and the fact that they have 3 devices that start up on a standard mainline kernel without any third party drivers and they want more. There are the plans to basically cut complete subsystems out of mainline, take those straight back to their LTS kernel, and update like that with each revision of the LTS.
The stable kernel ABI is nothing more than a stopgap. Seriously, they list all of that after the 1 hour mark, by which time the media people would have stopped listening.
Note this is all listed as work that is under way while the possible ABI stabilization is being done.
Items that could worry people are not put first on the list.
Yes, they go on to talk about how they will be getting rid of all google-made modifications to the kernel that are not mainline. Yes, they need to do this so that a person cannot say “hey, you have custom modifications, why can’t we?”
oiaohm,
Once again, you’ve failed to quote anything whatsoever. Pathetic.
“As a kernel developer from time to time, all of my patches were out of the tree…and it’s not always because the patches are proprietary, many developers don’t carry enough status or importance to get their code merged into mainline. For example the developer of AUFS (popularized by knoppix/live cds) tried diligently year after year to merge his unionfs into linux when it didn’t have one, however it was never accepted. My own linux kernel needed to support a union file system, yet I had to deal with linux kernel breakages on a regular basis. Several years later mainline linux devs implemented their own overlay FS, which was pretty similar (probably a bit of “not invented here” syndrome).”
I feel your pain. It can be hard to get stuff noticed, reviewed and accepted, and there are those who view the attrition rate of patches from people unwilling to go through the process as part of quality control.
“You’re overlooking that mainline linux development can be somewhat of a privilege and some projects don’t have a choice about existing outside the mainline linux tree.”
Here I disagree. It is not so much a privilege of the few, as it is that the effort and attention required can be painful for the little guy. I got several things merged and it was always because I kept things on the back burner and periodically prodded the maintainer, long after my employer no longer cared because the issue was fixed in our private kernel. And I made it as easy as possible for myself by developing against the latest kernel even though we shipped an older one.
“Naturally, the solution is to divide the project and take stuff out of the mainline tree, but ideally we would do it in a way that the pieces taken out can still be supported by those who still want/need it….and though I realize the linux masses have been programmed to recoil at the thought, there is an answer sitting in front of us: stable ABIs”
I like the idea. I also hope that keeping out of tree drivers in a good state will be aided by the kernel finally blessing an official unit testing framework. The only thing that did it for me was continually rebasing my out of tree work against linux-next. I suppose that could also be done in an automated way.
Joshua Clayton,
I don’t see why you’d disagree though. Clearly not everyone’s project gets accepted in the mainline. However I want to be clear that I don’t think mainline should be more inclusive because as I suggested earlier I think it’s already too bloated and if anything I think there are lots of things that could be taken out of mainline.
That’s pretty much what I do too, although for me it is more periodic than continuous, like when I get around to it. I only get away with this because the primary users are clients whose servers I manage. However if my users were the general public and expected my code to work against arbitrary linux kernels, then I would be up to my neck making & testing code against every kernel…something I don’t get paid nearly enough to do, haha.
I like the idea of having more compatibility within major versions of linux. If anything comes out of google’s push, I hope they can convince the linux community to do this. This way developers could say “this code works with linux 4.x kernels” without regards to what “x” is. I’d be ok with breaking compatibility when they go to 5.x kernels. IMHO this is a good compromise to balance stability and change.
>>> However I want to be clear that I don’t think mainline should be more inclusive because as I suggested earlier I think it’s already too bloated and if anything I think there are lots of things that could be taken out of mainline.
This is the appeal to authority again, Alfman ( https://www.logicallyfallacious.com/tools/lp/Bo/LogicalFallacies/21/Appeal-to-Authority ). There is no evidence to say that taking stuff out of mainline will make things better.
In fact it will make things worse. This is not in question; you only need to look at the kunit and kselftest work. For a long time, unit testing and end-to-end testing lived outside the Linux kernel. Those out-of-tree tests, like the Linux kernel testing project, really did not work out that great. Evidence like this says taking stuff out of mainline will make it worse in lots of cases.
Also, it’s not a case of taking stuff out; more is needed in mainline. Even for the most basic driver, over 80 percent of its source can be unit tested. The advantage of proper unit testing is that the person running the unit tests doesn’t need the real hardware.
The stable ABI fails once you understand the presumption problem.
A kernel providing a stable ABI is presuming it understands how the drivers are using it. The driver developer is presuming they understand what the kernel developers making the ABI expected. Presuming is the path to the mother of all screw-ups. A kernel developer will presume one thing, so they fix an ABI bug in something no one should be depending on, yet some driver somewhere that is not mainline will depend on it, and oops, breakage. Yes, the lack of unit tests in the mainline linux kernel has been causing this for mainline drivers as well.
Mainline drivers can have a full set of unit tests now that kunit is going into mainline. This will pick up the cases where the core kernel developers’ API ideas don’t match the driver makers’ ideas, as long as everything is mainline and the change trips over a unit test.
Unit tests let you document expected behavior in code form. The problem is you need the expected behavior from the driver developers’ point of view and the core kernel developers’ point of view, with reality meeting somewhere in the middle.
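For anyone who has not seen KUnit, this is roughly what “documenting expected behavior in code form” looks like; the function under test here is invented for the example.

#include <kunit/test.h>

/* Hypothetical helper whose expected behavior we want pinned down. */
static int clamp_to_byte(int v)
{
	if (v < 0)
		return 0;
	if (v > 255)
		return 255;
	return v;
}

static void clamp_to_byte_test(struct kunit *test)
{
	KUNIT_EXPECT_EQ(test, clamp_to_byte(-5), 0);
	KUNIT_EXPECT_EQ(test, clamp_to_byte(300), 255);
	KUNIT_EXPECT_EQ(test, clamp_to_byte(42), 42);
}

static struct kunit_case clamp_test_cases[] = {
	KUNIT_CASE(clamp_to_byte_test),
	{}
};

static struct kunit_suite clamp_test_suite = {
	.name = "clamp-example",
	.test_cases = clamp_test_cases,
};
kunit_test_suite(clamp_test_suite);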
Out-of-tree drivers have no way to show their expected behavior like this.
Heck, even without these unit tests, having all the source code made it possible for developers making major changes to run searches across the Linux kernel drivers to attempt to see what the driver developers’ expected behavior is. Searching source code attempting to find expected behavior is slow, painful and error-prone. Unit tests mean a developer can modify something, run all the kernel unit tests including the driver ones, and see if they have missed some expected behavior. Remember, this only works if all the driver source code, with matching unit tests, is mainline.
As the Linux kernel gets more complete quality assurance, more stuff needs to be mainline, not less.
There is a longer example in OpenGL with the Khronos Group test suite, where you find example after example of the driver developer presuming one thing and the program developer presuming something different, yet the test suite passes with no problem. So graphics driver developers end up testing with real programs as well as the suite, and game developers need to be testing with the vendors’ drivers all the time. So now you have a never-ending quadruple treadmill (kernel developers checking integration with drivers, driver developers checking integration with the kernel, driver developers checking integration with programs, and programs checking integration with drivers). This quadruple treadmill happens with all closed or open source drivers not integrated into mainline, be it on windows, OS X or Linux; providing a so-called stable kernel ABI does not fix this problem.
Some of the reason why open source DRM drivers are able to develop so fast at times is that they avoid a lot of these treadmill problems. Of course this has also been hindered by the lack of mainline Linux kernel unit tests; so the answer is that not enough is mainline, the complete opposite of your idea. Being mainline removes a lot of black-box presuming as well. Basically, the only way to reduce and remove these treadmills of duplicated work is to mainline the drivers and the test stuff, so that all parties can run as many of the tests as possible and avoid double-handling stuff to find out if they missed something.
oiaohm,
First of all, you’re the one guilty of appeal to authority fallacy, which I’ve already called you out on. I readily admit when something is my opinion, as opposed to fact. To be sure, many of my posts are nothing but opinions. My opinion that linux is bloated isn’t that unusual, slimming it down would clearly make it more manageable, the big question is what’s the best way to do this, and there are differing opinions about that too. If you disagree, it’s your prerogative, but once again we’re just talking about opinions here, there’s no right or wrong answers. I honestly could respect your opinion IF you didn’t try to elevate your opinionated claims to the level of fact. You keep failing to provide references/quotes/data/proofs even when asked to do so repeatedly. You simply respond that I’m wrong using more of your opinions. Sorry to be blunt, but such arguments seem lazy to me and IMHO it is self-detrimental to block your mind from taking in new ideas. It doesn’t necessarily make your opinions wrong, but I think you are impeding your ability to expand your understanding of the subject. We’re all here to learn, but to do so effectively requires considering the merit in different points of view.
Personally I accept both the pros and cons of stable ABIs, which is why I believe in a compromise between the two, such as the one google is promoting. I don’t dismiss the arguments against long term stable ABIs, however it doesn’t make much sense to ignore the harms that come from ABI breakages. For some people, and I believe you fit in here, a pure philosophy is more important than pragmatism, but you need to own that. An absolute uncompromising philosophy is not the best approach for everyone, many users would be better served by a more balanced approach. I don’t expect you to admit it, frankly, but you know it’s true.
>>> Personally I accept both the pros and cons of stable ABIs, which is why I believe in a compromise between the two, such as the one google is promoting. I don’t dismiss the arguments against long term stable ABIs, however it doesn’t make much sense to ignore the harms that come from ABI breakages. For some people, and I believe you fit in here, a pure philosophy is more important than pragmatism, but you need to own that. An absolute uncompromising philosophy is not the best approach for everyone, many users would be better served by a more balanced approach. I don’t expect you to admit it, frankly, but you know it’s true.
Google is not promoting a compromise between the two. Why else did google specially demonstrate 3 devices running on a mainline kernel with no third party drivers at all, and state they want more systems like this? A Google developer at LPC 2019 ended up having an argument with a Linux kernel power management guy about the kernel turning off particular regulators, because it caused some phones’ screens to go out on the mainline kernel without the vendor driver. Yes, they want some way to fix this so they do not need the vendor drivers.
Actions speak a lot louder than words. The Google lead kernel developers’ actions are clear: they see a stable kernel ABI as a stopgap, something that provides things like better power efficiency over the stock kernel, but not something that should be required for devices to work.
Alfman, drop your logical mistake. Watch a lot more of the LPC 2019 videos where the Google android kernel developers are presenting. A very clear story emerges. Yes, they will support a kernel ABI, not to take stuff out of the kernel but to work around issues.
Google android kernel developers want more code mainline, not less. They are not developing the stable ABI to make it simpler to make third party drivers. The stable ABI has the secondary effect of making areas of the kernel API more stable, which makes it simpler to merge more drivers into mainline as well.
Everything the google kernel developers are working on has one linked objective: how do we get more code into the mainline kernel and need less outside it.
>>> An absolute uncompromising philosophy is not the best approach for everyone, many users would be better served by a more balanced approach. I don’t expect you to admit it, frankly, but you know it’s true.
No, the lessons from the 2015 power grid mess tell us clearly there is no option for a balanced approach. How did they stop the spread of that? By in fact removing a binary kernel driver and replacing it with something they had the source code of.
The problem is I am not talking from philosophy; I am talking from looking at actual incident reports of what the hell you cannot do if you wish to be able to beat a cyberattack. One of the things you cannot do is trust closed source binary blobs/drivers. If possible you need to be able to run without them. This is not some optional requirement; if you don’t have this you are technically screwed, and if you are unlucky like the Ukraine power grid personnel, “technically screwed” turns into a real world situation with your back against the wall, forced to do horribly risky things to get out of it.
oiaohm,
For gods sake man, take your own advice and take action: QUOTE THE DAMN THINGS YOU ARE REFERRING TO! You keep bringing up your opinions, but when it comes to making claims about others you don’t get the benefit of the doubt here, you need to quote them! Since you steadfastly refuse to back up your claims with specific details, I’m not wasting any more time arguing with your opinion. I agree with google’s ABI plans, which is a move in the right direction, if you don’t like that then tough shit for you.
>>> For gods sake man, take your own advice and take action: QUOTE THE DAMN THINGS YOU ARE REFERRING TO! You keep bringing up your opinions, but when it comes to making claims about others you don’t get the benefit of the doubt here, you need to quote them! Since you steadfastly refuse to back up your claims with specific details, I’m not wasting any more time arguing with your opinion. I agree with google’s ABI plans, which is a move in the right direction, if you don’t like that then tough shit for you.
You have not watched the 3 hour LPC 2019 mini conference. I have.
Google’s ABI plan is not to allow drivers to be removed from mainline. You would know what I was saying if you had watched that mini conference.
To make it simple.
Google wants the mainline linux kernel with no third party drivers to be what Windows Safe Mode is: enough to at least boot an android device up.
Really, Alfman, you have asked me to quote stuff. Find a single quote saying google’s ABI plan for the Linux kernel is to reduce the stuff in the Linux kernel. It does not exist. The mini conference makes it clear that yes, google will have an ABI for third party drivers. They want devices to be able to boot up and run off the mainline kernel only, even if not battery efficient and maybe without GPU acceleration and other parts, as the safe mode that should always run. To get the mainline Linux kernel to be that safe mode, more drivers and quality control stuff has to go mainline.
Of course google will prefer devices that are pure mainline if possible without any third party drivers.
The kernel ABI stuff is a stopgap once you watch the full mini conference. With google’s objective that the device should run without third party drivers, you can now do more third party ABI breakage, because if a third party driver does not work the user’s device should not be bricked.
So I would not say google’s ABI plan is anywhere near as nice as what you think it is. A stopgap is really what it is.
oiaohm,
Yes, I am thrilled you understand what I’m asking of you, and yet you still haven’t done so!
That’s a straw man. I said a stable ABI could help to reduce kernel bloat by making out of tree development more viable, but I never claimed that’s what google was planning. They need a stable ABI to address driver issues that plague android. I know you don’t get it, and you probably never will; they’re being pragmatic and I agree with them.
So your argument is to let the Linux team support and maintain every hardware driver written throughout all of time? Their policy is based on the same idealistic non-sense of package managers. They’ll never be able to maintain all of that code, and in the end are just being a pain for end users. With no stable target to build for, most companies just don’t bother making the Linux driver to begin with. Being a solution to your own self inflicted problem is like saying “only I’m allowed to shoot myself in the foot.” You’re not sparing effort on one driver, you’re sparing effort so commercial entities that actually make the hardware see a reason to bother with Linux, and so end users don’t end up doing low level troubleshooting to figure out why something doesn’t work. In practice this method you speak of only ends up demanding everything follow some specific standard like usb-3, and anything not popular isn’t going to get worked on by the kernel developers. Take the keys away from these gatekeepers that don’t have business savvy.
>>> So your argument is to let the Linux team support and maintain every hardware driver written throughout all of time?
This is the problem.
https://www.linuxfoundation.org/press-release/2016/04/the-linux-foundation-launches-first-linux-based-civil-infrastructure-project/
Read here: not for all time, but there are cases where you need support life cycles of 10 to 60 years.
>>> Their policy is based on the same idealistic non-sense of package managers.
I wish it was just idealistic non-sense not a real world requirement as it is.
Most people are not thinking 60 years of support is required. In that time a company that makes a closed source kernel ABI using driver could be no more.
By 20 years in finding hardware to run tests on is next to impossible. Its not like you can shutdown a power plant or water treatment plant to run some code test just to test if you code change does not break anything.
The hard reality is that we have to get to the point you can rebuild kernel and userspace with modern security fixes and be 99.999% sure it will work when it never been tested on the real hardware and we can do this for 60 years. This is not a pipe dream I wish it was. This is the real problem we have to solve.
dark2 the Quality Assurance processes around the Linux kernel and it matching userspace need to lift by a insane amount to meet the requirement.
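To give an idea of the kind of hardware-independent quality control that implies, here is a minimal sketch using the kernel's KUnit framework (the cip_checksum helper and its values are invented purely for illustration). Tests like this can be run under UML or QEMU, so the logic can be re-validated after a rebuild even when the real hardware is unavailable:

/* cip_checksum_test.c - hypothetical KUnit test, purely for illustration. */
#include <kunit/test.h>
#include <linux/module.h>
#include <linux/types.h>

/* Invented pure-logic helper we want to verify without touching real hardware. */
static u32 cip_checksum(const u8 *buf, size_t len)
{
        u32 sum = 0;
        size_t i;

        for (i = 0; i < len; i++)
                sum += buf[i];
        return sum;
}

static void cip_checksum_basic_test(struct kunit *test)
{
        static const u8 data[] = { 1, 2, 3, 4 };

        /* 1 + 2 + 3 + 4 = 10 */
        KUNIT_EXPECT_EQ(test, (u32)10, cip_checksum(data, sizeof(data)));
}

static struct kunit_case cip_checksum_test_cases[] = {
        KUNIT_CASE(cip_checksum_basic_test),
        {}
};

static struct kunit_suite cip_checksum_test_suite = {
        .name = "cip-checksum",
        .test_cases = cip_checksum_test_cases,
};
kunit_test_suite(cip_checksum_test_suite);

MODULE_LICENSE("GPL");

The point is not this toy test; it is that every driver and subsystem would need coverage of roughly this kind before you could trust a rebuild, decades in, that nobody can run on the original plant hardware.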
oiaohm,
I strongly disagree with not shutting down a power plant before testing new code. I’ve experienced enough Linux update glitches to learn that you don’t do an update on a production system if you are not prepared for failure. If you are going to perform an update on a mission-critical system with lives and millions of dollars at stake, you either do so on a redundant system, or you do so during planned downtime. No excuses. You are better off not doing an update until you can establish redundancy or can shut down during routine maintenance.
https://www.forbes.com/sites/jamesconca/2015/05/11/when-should-a-nuclear-power-plant-be-refueled/#6844ddac3d95
So you’d have plenty of opportunities to do planned updates over a power station’s natural lifetime. If there happens to be an emergency requiring an unplanned update, then you need to be prepared to shut everything down as part of the emergency procedures, but I still think it would be pretty dumb to run untested code, and it’s kind of unfathomable to me that their engineers would approve untested code modifications for production. None of my clients even approach the $$$ or critical nature of a power plant, yet some have a multitude of code staging environments to test for faults before going into production, and our deployment procedures include steps for rolling back changes in case something goes wrong. I’d be surprised to learn that power plants had less meticulous procedures than my clients.
>>> If you are going to perform an update on a mission-critical system with lives and millions of dollars at stake, you either do so on a redundant system, or you do so during planned downtime. No excuses. You are better off not doing an update until you can establish redundancy or can shut down during routine maintenance.
That assumes you are not at war.
https://en.wikipedia.org/wiki/December_2015_Ukraine_power_grid_cyberattack
Security flaws that must be fixed right now normally belong to this cyberattack class of problem. The trouble is that an attack does not wait around for planned downtime, and the damage it does by misconfiguring things can compromise your redundancy systems, as it did in that past example.
Do note that the attack was in 2015; the CIP at the Linux Foundation in 2016 was a response to the post-event investigation.
>>>I strongly disagree with not shutting down a power plant before testing new code. I’ve experienced enough Linux update glitches to learn that you don’t do an update on a production system if you are not prepared for failure.
Once you are under cyberattack, your system is going to fail if you do nothing. You are no longer talking about being prepared for failure; failure is coming, and the question is whether you can stop it. How long it takes to validate a fix matters too: if you can use a newer system and validate the update in minutes or hours instead of days or weeks, your chance of getting ahead of the attack goes up.
So yes, this counter-argument of yours is a pure appeal to authority, without any facts:
https://www.logicallyfallacious.com/tools/lp/Bo/LogicalFallacies/21/Appeal-to-Authority
>>> So you’d have plenty of opportunities to do planned updates over a power station’s natural lifetime.
When you need to do a security update because your systems are under attack, that is not planned maintenance any more. This “plenty of opportunities” idea was proven false in the Ukraine disaster.
>>>>>The hard reality is that we have to get to the point where you can rebuild the kernel and userspace with modern security fixes and be 99.999% sure it will work even though it has never been tested on the real hardware, and we have to be able to do this for 60 years. This is not a pipe dream, much as I wish it were. This is the real problem we have to solve.
That point of mine is not optional. The reality is that when things go wrong, this is what you need to be able to do.
Alfman, yes, the way you talk about deploying updates in critical infrastructure has failed the real-world test. The Linux systems you have dealt with in reality have not had high-grade quality-control processes.
>>> None of my clients even approach the $$$ or critical nature of a power plant, yet some have a multitude of code staging environments to test for faults before going into production, and our deployment procedures include steps for rolling back changes in case something goes wrong.
None of your clients have been on the receiving end of a targeted cyberattack that effectively takes those systems apart. When you are updating things to stop a cyberattack, you don’t have the option of rolling back changes either.
Running out of spare parts due to the damage done means you don’t have the parts to keep your code staging environments on the same hardware, because the parts from your staging environments' test hardware are now in your primary systems. This is a very hard lesson from the 2015 Ukraine power grid attack: you need your code staging environments to run as emulation on more modern hardware in case everything goes wrong. Modern hardware of the day is simple to get and can process the validation faster, whereas the old legacy hardware in active deployment can be very hard to get when things go radically wrong, as in a cyberattack. A cyberattack is also exactly when you are forced to deploy updates as fast as you can to try to reduce the damage.
The Ukraine power grid after-action report, looking at what the cyberattack did to all those plans, was very much a cold, hard lesson. What was learnt from the 2015 Ukraine power grid mess gives us this horrible requirement that we have to work out how to meet before the next time. We are racing the clock: either we work out how to do it before the next time, or the next time we might not get lucky.
Yes, deploying “untested code modifications for production” is one of the things that brought the 2015 Ukraine power grid cyberattack under control. But it is not something you ever want to end up with as your only option because you no longer have your test platforms, the parts out of your test platforms now being in your primary systems. Yes, they were running out of rolls of the dice in spare hardware because of what they had lost in the process.
oiaohm,
These systems should never be connected to public networks. Your scenario is not a code problem, it’s a trust problem. If mission-critical systems do get compromised by a privileged insider, then the reality is that no amount of untested code is going to solve the problem. So even in this scenario the priority needs to be ousting the insider and restoring everything to a known state.
>>> These systems should never be connected to public networks. Your scenario is not a code problem, it’s a trust problem.
Go read about the December 2015 Ukraine power grid cyberattack.
The power grid control system was not directly connected to public networks. It was taken out by sneakernet: someone’s laptop/phone/whatever got infected and then got connected to the critical network for some diagnostic reason, resulting in a ticking time bomb getting in.
>>> If mission-critical systems do get compromised by a privileged insider, then the reality is that no amount of untested code is going to solve the problem. So even in this scenario the priority needs to be ousting the insider and restoring everything to a known state.
That is the wrong logic for the 2015 Ukraine power grid cyberattack. Your hostile insider is a program, a worm. Code that blocks the worm’s exploits from spreading, even if it is not fully tested, means that as you restore systems, any system you miss will not bring the infection back as badly as it was.
The Ukraine power grid cyberattack also demonstrated many different ways of getting around physical network isolation from the internet, which is the scary part. One part of the attack targeted the UPS units, making them rapidly switch power on and off, causing dips and spikes, and those dips and spikes turned a large number of power supply units into scrap metal.
Yes, remember a worm like this is destroying your hardware. So you don’t have your electrically equivalent test platforms any more, because you have had to salvage those systems for parts so the primary systems can keep working.
I would call the Ukraine power grid cyberattack a true cyberwar attack, where the malware worm is written intentionally to do as much physical damage as possible and to hide in as many items as possible. Yes, it was hiding in routers, switches, printers… anything with writable firmware it knew how to write itself into.
Alfman, most people’s setups are not designed to live through a 2015-Ukraine-power-grid-style attack. It is savage on a whole different level. It pays to go read what happened in 2015 with the Ukraine power grid, as it shows a lot of the usual assumptions fail when put to a real-world test.
The idea that you will simply be able to restore everything to a known-good state while the security flaws remain is not in fact true. Human error means you will put a stack of known-good systems that were tested on hardware back next to one piece of malware-infected hardware, and since it is a worm, all that work will be undone.
You need to be able to test updates in emulation because you are going to be short of real physical hardware, and you cannot afford to go around in circles of re-infection, because you risk running out of hardware entirely.
The 2015 Ukraine power grid attack changes the model you need to operate from. Anything with a known security flaw has, in a case like 2015, to be treated as known-bad, not known-good. So in reality, once the attack hits, you have no existing known-good. You have to build your known-good after the attack, and after it has damaged your hardware and taken out your electrically matched test platforms. All you have left for test platforms is emulation and unit tests.
oiaohm,
You don’t get it: it doesn’t matter how it got compromised, you still have to find the insider and restore to a clean state. It doesn’t make any difference whether it happens in Ukraine, the US, or anywhere else in the world; in the event of a compromise, the safest route may be to go offline, as inconvenient as that may be. It’s irresponsible to test new code of any significant complexity on active mission-critical systems if new bugs have the potential to produce catastrophic outcomes. You want to disagree, then so be it; I’m just glad people like you don’t run the power grid.