Ten years ago, systemd was announced, and it swiftly rose to become one of the most persistently controversial and polarizing pieces of software in recent history, especially in the GNU/Linux world. The quality and nature of the debate have not improved in the least since the major flame wars of 2012-2014, and systemd remains poorly understood and understudied on both a technical and a social level, despite paradoxically having disproportionate levels of attention focused on it.
I am writing this essay partly for my own solace, so I can finally lay it to rest, but also in the hope that my analysis can provide some context to what has been a decade-long farce, and not, as in Benno Rice's now-famous characterization, a tragedy.
The end of this massive article posits a very interesting question. What init system does Chrome OS use? And Android? Do you know, without looking it up? Probably not.
What does that tell you?
I still mostly use Gentoo and Void Linux, which use OpenRC and runit respectively. I've never looked back, either. The boot times are as fast as, or (more often) noticeably faster than, systemd-based distros like Fedora. Adding a new service, or making a new service from scratch, just means writing (or even just symlinking) a script which has no special syntax or properties. You could also just symlink the daemon executable if it needs no extra arguments (as many don't). The logs are readable as plaintext in /var/log, where they have always been and where any amount of googling will tell you to look.
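As a sketch of how little ceremony that involves: runit only requires an executable file named "run" in the service directory, and it doesn't even have to be a shell script. A hypothetical run script in Python (the daemon path is illustrative) would work just as well:

```python
#!/usr/bin/env python3
# Hypothetical runit "run" script: runit only requires an executable named
# "run" in the service directory; the language is irrelevant.
import os

# Replace this process with the daemon, keeping it in the foreground so
# runsv can supervise it directly. The path and flags are illustrative.
os.execv("/usr/sbin/sshd", ["sshd", "-D"])
```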
It might be true that systemd vanquished sysvinit. But that’s a strawman at best, as sysvinit was also defeated by other init systems.
I use Slackware, Void, and OpenBSD here, though on my new Ryzen/Radeon build I’m forced to use a systemd based OS (Pop!_OS in this case) to enjoy the hardware to its fullest extent.
Out of all the init systems I use, runit is by far the easiest to use and understand, followed by Slackware's implementation of SysV. I still haven't dug into systemd enough to fully appreciate its strengths and understand its shortcomings. Pop "just works", much better than Ubuntu at least, and I've enjoyed it as a main workstation and gaming OS; I haven't run into any systemd show-stoppers yet, though I've heard they exist. I am definitely leery of binary logs, and I'm aware of the intention to move the /home folder under systemd's control. Like you, I definitely prefer runit's simple service management to overly complicated unit files.
I still have my trusty old Core 2 Quad workstation with Slackware, my laptop with Void, and even an old Mac mini I still use for music creation. I haven't booted Windows at home in a very long time, and I have a feeling it won't be on my next build at all. As bad a reputation as systemd has, I'd rather trust Linux, with or without it, than Windows.
Morgan,
I'm with you on simplicity. IMHO an init system should be simple and clean. More complex things should be done by other, more specialized daemons, but not everybody agrees with or cares about the KISS principle. Oh well, everybody's entitled to an opinion.
I’ve been around OSNews for a decade, if not more. Even had original posts published here.
It always gives me a kick to see OSNews referred to in historical research.
Anyway, as with this article, I've always liked systemd but also never had enough of a need to sit down and fully grok it. Poettering seems like an empathy-challenged individual, not unlike Linus 1.0, but that doesn't make him wrong on the technical merits.
Probably my favorite part of the article, other than its erudition and references to Trotskyism, is the extended quote from Emmanuele Bassi. It helped me understand a part of my own desires and preferences as a developer that I didn’t quite understand. What could be more enlightening when it comes to tech writing than that?
crazed-glue,
As usual, it depends. systemd was usually judged favorably against sysv init. However, against the backdrop of sysv init, almost any init system is an improvement IMHO, be it upstart, runit, OpenRC, systemd, etc. When it came to my own distro, I dumped sysv init without hesitation in favor of my own rund, which to this day I prefer to systemd because it follows the KISS rule: "Keep It Simple, Stupid". For me, the hallmark of a good init system is one that gets out of the way and delegates specialized tasks to specialized daemons. For better or worse, systemd turns the init system into something more convoluted, and by doing so it discards the Unix philosophy of breaking problems up into simpler tasks that can be solved with simple tools rather than monolithic ones. Shifting to binary protocols and binary logs didn't (and still doesn't) have merit to everyone.
I don’t know/don’t care about chrome os and android because I don’t write daemons for them.
systemd is often misjudged as an init system. Sure, it's an init system, but systemd is really a process manager: it manages them all, and it does that job better than anything else. No process can escape its ever-watchful eye. It has a lot of nice, easy-to-use built-in features. The technical merits are there. There are real and good reasons not to use it, and I understand them, but for most use cases it's the best choice. It didn't just kill sysvinit; it also killed upstart, which was also pretty good. Really, upstart's failings inspired the design of systemd. The author of upstart at one point basically laid out how an upstart rewrite based on cgroups would be the better way, and lo and behold, it was done.
To be fair, they’re all “process managers” when you look beyond sysv. Cgroups are perfect for resource control. In fact I think they’re great when they’re used with containers like lxc, which is even better than systemd for managing them.
https://ubuntu.com/server/docs/containers-lxc
However, if a daemon needs cgroups just to be monitored, then it's a poorly written daemon. I certainly hope that developers and admins aren't encouraged to write daemons this way on account of systemd's cgroups supporting it. I don't disagree that systemd can do it, but it's almost always the wrong way to write a well-behaved daemon. Think of a car that allows a driver to drive while lying down across the back seat: sure, it's a feature, but just because it can be done doesn't necessarily mean it's a good idea to do it that way. /nonsense_car_analogy 🙂
Fortunately few if any daemons get written that way in practice. What’s more unfortunate though is stuff like this…
https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/
I feel this sort of tight coupling with systemd is a con and it inhibits agnostic code.
Alfman, systemd in fact has looser coupling to cgroups than upstream originally planned.
There was an upstream kernel plan, which the systemd developers did not like, whereby the first service to start using cgroups got absolute control. The means to delegate sections of the cgroup tree is a systemd thing, and being able to delegate safely required the complete cgroup v2 redesign.
systemd's use of cgroups was also an attempt to get around the PID recycling problem, which of course ultimately required pidfd to fix properly.
systemd was judged fairly against upstart. Instead of cgroups, upstart's solution for handling misbehaving services was ptrace. Using ptrace directly from the service manager broke applications' own internal debugging, so upstart was an absolute lemon.
runit's design predates cgroups. It places runsv around everything and depends on the pstree remaining truthful. Unfortunately, this is the Linux kernel we are talking about: it is possible, using Linux syscalls, for a process to change its parent and so become connected to another section of the pstree.
OpenRC's cgroup support is not developed far enough yet.
Alfman, there are some oddball design choices in the Linux kernel's process management itself. You can do many things with processes under Linux that BSD and other Unix kernels would flatly tell you are not allowed.
Basically, systemd creates its own process tree in cgroups because this is the only way, with the Linux kernel, to be sure you are looking at a sane and truthful pstree.
PID recycling broke sysvinit's design of service management. The changeable pstree in fact breaks most of the init/service management solutions that predate systemd. upstart's ptrace idea was that it could detect when a process was about to do something that would change its position in the process tree, and either reverse it or keep track of it. systemd's usage of cgroups achieves the same thing.
Really, we need to see more Linux init/service management options that accept that cgroups are not an optional feature, given the way the Linux kernel is designed.
oiaohm,
I hope you understand that doesn’t change my opinion against tight coupling.
Except that it does in fact work: many production systems have been running reliably without systemd or cgroups for a very long time. You and I already went over this in the past, and I disagreed with you. You asserted that the standard methods of monitoring processes were broken because Linux would corrupt the process tree via the OOM killer. I remain skeptical of those claims, which you put forward without evidence. That's why, for any claims you make now, I am going to press you for a specific testable case proving that monitoring of traditional daemons isn't reliable, so that I can go test it and confirm whether you are right or wrong.
http://www.osnews.com/story/28099/debian-fork-promises-no-systemd-asks-for-donations/#comments
I still stand behind my responses:
I apologize for grave-digging old posts, but I'm hoping it will save us time in understanding each other's views. I assume you still stand behind your original posts?
Please remember to respond with some kind of testable proof/evidence that process monitoring wasn't working reliably before systemd.
One of these cases happens every time you boot: the pstree being changeable is how the PID 1 in the initramfs hands over to the PID 1 from your file system.
The pstree being changed under Linux does not just happen with the OOM killer. Some services designed for Linux also exploit this ability of a process to change its PID and its location in the pstree, so that a restarted core process stays on the same PID value.
Not requiring a walk through the pstree to make these changes gives a performance boost.
Some of the Linux kernel's corruption of the process table is normal for the Unix design of process management once you enable PID recycling, whatever the implementation; the pidfd developer did quite a good write-up on this. The Solaris/illumos pstree is unstable as well, but there you are expected to put zones around services, so Linux is not unique in this regard. FreeBSD keeps zombies around so the pstree does not get mangled, which can cause FreeBSD to run out of PIDs. OpenBSD and NetBSD have a stable handle system for processes, which means a dying process's tree walk can stall you from starting a new process, so there is a performance impact.
Both cgroups and zones have the advantage that they are metadata attached to a process that generally does not change for the life of the process. A process dying does not require you to walk a tree and reconnect things correctly.
PID as a number was a fine idea when PID numbers were not recycled. Recording the parent's PID number on a child process was also a fine idea, as long as PIDs were not recycled. But in a system with PID recycling, we need handles and/or stable metadata.
Basically, there are two ways to somewhat correctly solve the problem. One is to follow the Solaris lead and implement everything Solaris implemented or prototyped to fix it; this is what Linux has done with cgroups (instead of zones) and pidfd (yes, file-based process control was a userspace prototype on Solaris). The other is to follow the historic BSDs, OpenBSD and NetBSD: never mind how much you stall processes, as long as you keep the pstree true. FreeBSD is a mangled implementation somewhere in the middle, between what the Solaris prototype did and what OpenBSD and NetBSD do.
Alfman, basically those making init/service management solutions just have to accept that the Linux kernel has some design choices you have to live with. runit's defects under Linux would come out just the same if you put it on Solaris/illumos. On Solaris-design kernels, if you want a service management solution to work right, you have to use zones, and that's the way it is. Yes, you can basically swap Solaris for Linux and zones for cgroups when talking about service management and still be making a true statement.
Alfman, you're saying the Linux kernel should fix what it does. Sorry, it takes two to tango. The Linux kernel has made a performance-oriented design choice here, and choosing performance means you sometimes have to do things to pay for it. In this case, service management needs to use cgroups to get correct information, instead of being able to use the pstree for the truth.
Yes, the metadata solution of cgroups/zones for getting the truth has less overhead than keeping the pstree truthful.
oiaohm,
You'll have to forgive me for questioning you, but I'm looking for proof of what you're talking about, in a form that can be shown to actually disrupt process monitoring. That is what you are alleging, after all. It's *not* enough to make blanket assertions; point me to something that would prove what you are saying is true. Please provide at least one test case that I can independently reproduce!
> You’ll have to forgive me for questioning you, but I’m looking for proof of what you’re talking about in such a way that can be shown to actually disrupt process monitoring.
https://securitylab.github.com/research/ubuntu-apport-CVE-2019-15790
Everything I have been talking about has been demonstrated by various CVEs that a person managing a system should know about. So your claim that it does not happen in production systems is bogus: it has been happening repeatedly in production systems, creating CVE problems that have to be worked around. pstree issues and PID recycling are a repeated cause of security issues.
Before you get smart and say you can just use the new pidfd instead to fix things like the CVE above: remember that each process, including the master service manager, is allowed only so many file handles. What's one way to avoid the file handle limit? That's right: simply use cgroup metadata to trace the processes, instead of having your service manager attempt to track each process, thereby avoiding this problem completely.
Notice something here: with pidfd you are staying well clear of raw PIDs, and by using cgroups instead of depending on the pstree, you are also staying well clear of the fact that the pstree under Linux is based on PID numbers instead of proper handles.
The PID-number idea is historic, and a path to repeated security issues.
oiaohm,
What you are referring to is something I've brought up before in different contexts, but it doesn't apply to the parent process, for the simple reason that the PIDs of zombie processes are not recycled until the parent reaps them. This is by design.
I know what you are talking about and I agree with you that using PIDs from external applications (such as the “kill” command) could result in erroneous race conditions where the signal gets sent to a new process using a recycled PID. This is not the case however for parent process monitors that have to read the exit status (aka “waitpid” syscall) before the PID can be recycled.
https://stackoverflow.com/questions/16944886/how-to-kill-zombie-process
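The zombie semantics described here are easy to check empirically. A small Linux-specific sketch (it reads /proc) showing that a dead-but-unreaped child stays visible in state 'Z', its PID unavailable for reuse, until the parent calls waitpid:

```python
import os, time

pid = os.fork()
if pid == 0:
    os._exit(7)                       # child dies immediately

time.sleep(0.2)                       # child is now a zombie: dead, not yet reaped
with open(f"/proc/{pid}/stat") as f:
    # the field after the "(comm)" part of /proc/<pid>/stat is the state
    state = f.read().rsplit(")", 1)[1].split()[0]
print(state)                          # Z: the PID cannot be recycled in this state

_, status = os.waitpid(pid, 0)        # reaping frees the PID slot for reuse
print(os.WEXITSTATUS(status))         # 7
```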
That’s your opinion, but I’m still waiting for any proof/evidence that reliable process monitoring of traditional daemons by init systems was not possible before the use of systemd/cgroups.
>A zombie is already dead, so you cannot kill it. To clean up a zombie, it must be waited on by its parent, so killing the parent should work to eliminate the zombie. (After the parent dies, the zombie will be inherited by pid 1, which will wait on it and clear its entry in the process table.) If your daemon is spawning children that become zombies, you have a bug. Your daemon should notice when its children die and wait on them to determine their exit status.
Notice the lot of "should"s. First, due to PID recycling you may not see that the children died; the CVE I pointed to was in fact that you don't always see when child processes die and get replaced.
Alfman, good question here: at the point the zombie is connected to PID 1, can you please tell me which service caused it? It has moved in the pstree, so you cannot look at the pstree any more. If the service is wrapped in a cgroup, of course, the cgroup value is still on the zombie process.
What happens to child processes when the parent dies on Linux?
https://unix.stackexchange.com/questions/158727/is-there-any-unix-variant-on-which-a-child-process-dies-with-its-parent
Yep, test code there. The child is also moved to PID 1, not as a zombie but as a fully active process.
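The linked experiment boils down to a few lines. A sketch (the reparenting target is classically PID 1, but may be a subreaper under systemd or in containers):

```python
import os, time

# A child forks a grandchild and dies; the orphaned grandchild is
# reparented (classically to PID 1, or to the nearest subreaper under
# systemd/containers) while it carries on running.
r, w = os.pipe()

child = os.fork()
if child == 0:
    child_pid = os.getpid()           # the parent the grandchild is about to lose
    if os.fork() == 0:                # grandchild
        os.close(r)
        time.sleep(0.5)               # by now the child has exited and been reaped
        new_parent = os.getppid()     # no longer child_pid
        os.write(w, b"moved" if new_parent != child_pid else b"stayed")
        os._exit(0)
    os._exit(0)                       # child exits at once, orphaning the grandchild

os.close(w)
os.waitpid(child, 0)                  # reap the child; the grandchild lives on
result = os.read(r, 16).decode()
print(result)                         # moved
```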
Now, how do you tell, using just your old-school tools, where the heck those processes came from?
daemon1 -> execs shell -> execs X. The shell crashes for some reason that daemon1 correctly handles; now X reconnects to PID 1.
daemon2 -> execs shell -> execs X as well, and everything exits fine.
Now daemon1 is named either A or B, same with daemon2. How would you, inside your init, without using cgroups, identify the source of X? Think of this as a regularly happening problem, on a system where you cannot be running debugging because that would be too much extra load. And before you say "look at the logs from the services": not all services log exec failures.
You need the service manager to solve this. There is a long list of issues like this that do come up.
The "waitpid" syscall does not help if the parent crashes. You have not considered which events cause the pstree layout to change, losing you the information about what started what. Not having this information makes issues much harder to solve, and is itself a cause of problems.
Please note: a service crashes, a child of it stays connected to PID 1, still running and locking some key file; you attempt to restart the service and it will not restart properly, instead crashing. This happens a lot under traditional init/service management systems. It is a repeating cause of database corruption and has made it into CVEs many times.
Under systemd, that process connected to PID 1 is still in the service's cgroup, since the service's cgroup still exists. Not all of the service has stopped, so you don't attempt to start the service while part of it is still running.
Please note, you don't have to be systemd to have this property. Just use cgroups instead of depending on the pstree, and your life gets better.
Basically, how many CVEs do we have to have before people wake up to the fact that the old system does not work? We have crossed 20 thousand CVEs related to issues caused by bad service management.
oiaohm,
Sorry, but you’re taking the CVE out of context and wrongly applying it to the process monitor case. It specifically says:
This is exactly what I was talking about: using PIDs at the command line causes race conditions because, by the time the command runs, the specified PID could have been recycled. This is not the case for zombie processes, though, which cannot be recycled until the parent process reaps them using waitpid. If this is the reason behind your justification for why process monitoring breaks, then the reasoning is invalid. It's a non-issue when the process monitor is also the parent process, which is generally the case for process monitors, including the one I wrote.
I don’t disagree with the information you linked to, however it doesn’t really say anything about the (in)feasibility of monitoring daemons with traditional methods.
When you "exec", there is no child process and there is no "reconnecting to PID1": the original process doesn't terminate, the exec call does not create a new PID, and furthermore there is zero risk of the PID being recycled until the process actually terminates and is reaped. If you don't believe me you can try it for yourself…
In other words, it's perfectly acceptable to launch a daemon via script in this manner. There is no issue with PIDs. Now, if you "fork", that changes things, but then it's not a well-behaved daemon as I described earlier.
This is nonsense, because the scenario we're talking about is the init system running as PID 1 monitoring its children. If PID 1 does in fact crash, then the Linux kernel panics. The same is true of systemd.
https://www.quora.com/Is-it-possible-to-kill-the-init-process-in-Linux-by-the-kill-9-command
To the extent that it might crash, then obviously you want to fix the bug that caused the crash. However there’s zero implication that traditional posix process monitoring causes crashes.
I think most of your criticisms are more rational if I understand them as being directed at the sysv process management implementation. I hope you weren't thinking that I was defending sysv init, because that couldn't be further from the truth: sysvinit is bad. Processes would self-daemonize and rely on PID files to keep track of things. This was full of race conditions and corner cases; it was not robust and things could go wrong. But my whole point is that process monitoring can be (and has been) done reliably using standard POSIX mechanisms that don't rely on cgroups. In fact many daemons (all the ones that are well behaved in my book) support easy/safe/reliable monitoring without requiring cgroups.
For example:
https://linux.die.net/man/8/sshd
My linux distro launches the sshd daemon using this flag and it monitors the process (from the parent process pid 1) for termination and will restart it when it happens.
Can you prove to me that this doesn’t work reliably? Nothing you’ve mentioned so far breaks this.
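The kind of parent-process monitoring described above amounts to a small respawn loop. A hypothetical sketch (not any distro's actual init code) of supervising a non-forking daemon such as sshd -D:

```python
import subprocess, time

def supervise(argv, max_restarts=None):
    """Respawn loop: run a non-forking daemon and restart it when it exits.

    Because we are the parent, proc.wait() (waitpid underneath) reliably
    reports when *this* process dies; its PID cannot be recycled before
    we reap it. max_restarts=None means respawn forever, like an init would.
    """
    codes = []
    while max_restarts is None or len(codes) < max_restarts:
        proc = subprocess.Popen(argv)
        codes.append(proc.wait())     # block until the daemon terminates
        time.sleep(0.1)               # brief back-off before respawning
    return codes

# in an init context this would be e.g. supervise(["/usr/sbin/sshd", "-D"])
```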
https://www.ssh.com/ssh/sshd/
>My linux distro launches the sshd daemon using this flag and it monitors the process (from the parent process pid 1) for termination and will restart it when it happens.
Ya right. Nothing like being wrong.
daemon2 -> execs shell -> execs X
Remember this.
Now let's apply this to sshd.
sshd -D (first process)
sshd: cessu (second process)
sshd: cessu@pts/2 (third process)
zsh (fourth process)
So sshd: cessu crashes for some reason, i.e. bad network traffic hits a bug (yes, this happened back in history with different exploits). sshd: cessu@pts/2 jumps to PID 1 and now may not exit. Now sshd -D crashes; you restart it from your PID 1, and user cessu may not be able to log in because part of sshd is still running. This can happen repeatedly until no one can log in, because you have no ptys left; you have leaked them.
Yes, sshd is one of the problem children where bad tracking equals disaster.
logind putting cgroups around each user allows that zsh to be properly linked to a user login.
Yes, this affects runit and runsv too; in fact runsv adds extra code, with its own possible failure modes, to handle stuff leaking back to PID 1 and being tracked incorrectly. It's not hard to be better than sysvinit, but being better than sysvinit does not mean you have your service tracking right.
>This is nonsense, because the scenario we're talking about is the init system running as PID 1 monitoring its children. If PID 1 does in fact crash, then the Linux kernel panics. The same is true of systemd.
Except this is kind of wrong for systemd. Remember, systemd is loosely based on Solaris SMF in its design, and SMF does not run as PID 1. systemd was put at PID 1 because early cgroups mandated that you be PID 1 to have full control. That is not the case today; systemd's service management sitting at PID 1 is basically a legacy issue, one that I would recommend anyone making a replacement avoid.
systemd does not have to run as PID 1 to run service management, effectively monitor children, or run anything else. systemd --user does exist.
https://wiki.archlinux.org/index.php/Systemd/User
Yes, this can do all the cgrouping around processes via cgroup delegation. If you have the cgroup system do the tracking, then your service manager technically does not need to be PID 1. In fact, there is a large stack of stuff systemd puts in PID 1 that does not need to be there, thanks to its usage of cgroups. This is one of systemd's design faults.
In theory you could have a really simple init as PID 1 that watchdogs systemd, with systemd as PID 2. Please note, you can restart the user version of systemd's service management without restarting any of its services, because all the critical information about currently running services is in the cgroups.
oiaohm,
The user processes spawned by the sshd daemon aren't the ones being monitored by the init system, including systemd. It doesn't matter if the sshd daemon crashes and its children are re-parented to PID 1. The sshd daemon doesn't care, and in fact you generally wouldn't want an sshd restart to destroy its children, as that would interfere with live sessions.
I tested it just now, and once the systemd sshd service unit was set to "Restart=always" (it wasn't set this way by default), both systemd and my init behaved the same way with regard to auto-restarting the sshd daemon.
Well, I tested your hypothesis and spawned ssh sessions until I reached the limit. You are correct that this blocked further logins, which is bad; however, you are wrong that systemd did anything differently to improve the situation. It did not. Even after restarting the sshd daemon, the old sessions were still there and I was still blocked! If this were happening to me on a production system, I'd look at upping the limits or possibly creating a new process to kill off the less important ssh sessions, but here's the thing: systemd is just as vulnerable to system limits in your hypothetical scenario. Like it or not, this is hard to fix even with cgroups, because the init system cannot assume that killing the daemon's children is safe to do; it might well result in data loss.
And this still happens under systemd.
Nothing is being "leaked" as a result of ssh connections being reparented back to PID 1. Whether you use systemd or another init system, the resources will be returned when the ssh process crashes or terminates. And as for your scenario where it hangs indefinitely: that can happen whether you are using systemd or another init system.
I would agree there are some nice features with cgroups, but if you’re still going to maintain that monitoring cannot be done reliably by an init system without them, then I expect you to provide proof / evidence for your assertion.
>>The user processes spawned by the sshd daemon aren't the ones being monitored by the init system, including systemd. It doesn't matter if the sshd daemon crashes and its children are re-parented to PID 1.
That is in fact false, and it shows you have not looked at systemd without logind: systemd's cgroups do in fact track user-spawned processes from sshd under the sshd cgroup. logind adds a cgroup around each session so it is clearer what a session is; the logind part of systemd causes different behavior here.
So under systemd, whether it's tracked by the sshd cgroup or a session cgroup, nothing re-parented to PID 1 is untracked.
This is the big difference: everything is tracked. That is the systemd way, and it is not a new idea; in fact the original sysv init, as in the init found on the SysV operating system, had this idea as well.
>>I tested it just now, and once the systemd sshd service unit was set to "Restart=always" (it wasn't set this way by default), both systemd and my init behaved the same way with regard to auto-restarting the sshd daemon.
Really? You never checked ps -o cgroup, right, to understand what systemd was up to.
>>Well, I tested your hypothesis and spawned ssh sessions until I reached the limit. You are correct that this blocked further logins, which is bad; however, you are wrong that systemd did anything differently to improve the situation. It did not. Even after restarting the sshd daemon, the old sessions were still there and I was still blocked!
Of course: you had logind enabled, so each user session was in its own cgroup, and naturally restarting sshd did not kill them all. On embedded devices you don't always have logind. If you have systemd service management and no logind (something you find on embedded systems) and you restart sshd, out go all the sessions under sshd. So depending on how systemd is configured, it can in fact improve that condition.
Of course, you have just proven to me your total lack of knowledge about what systemd does with cgroups. It's about time you stuck your nose under the hood with ps -o cgroup and looked.
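For reference, the same information ps -o cgroup shows can be read straight from /proc. A small Linux-only sketch:

```python
def read_cgroups(pid="self"):
    """Read a process's cgroup memberships from /proc/<pid>/cgroup.

    Each line has the form "hierarchy-id:controllers:/path"; on a pure
    cgroup v2 system there is a single "0::/path" entry.
    """
    with open(f"/proc/{pid}/cgroup") as f:
        return [tuple(line.rstrip("\n").split(":", 2)) for line in f]

for hierarchy, controllers, path in read_cgroups():
    print(hierarchy, controllers or "(v2)", path)
```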
>>And this still happens under systemd.
Yes, the leak can happen with systemd. But with systemd, thanks to full tracking, you can put a watchdog in place to clean the leak up and restore the ability to log in without restarting the complete box. The way you do it differs slightly between systemd without logind and systemd with logind: without logind you just restart the sshd service, which gets the lot; with logind you have to use loginctl to kill all existing remote sessions and then restart sshd.
>>Yes, this affects runit and runsv too; in fact runsv adds extra code, with its own possible failure modes, to handle stuff leaking back to PID 1 and being tracked incorrectly. It's not hard to be better than sysvinit, but being better than sysvinit does not mean you have your service tracking right.
That is wrong back where the SysV design comes from: the original SysV had tracking, because PID numbers were not recycled, so the pstree remained truthful and processes did not get reconnected to PID 1. Reconnecting to PID 1 comes from PID recycling. So the sysvinit re-implementation under Linux was basically broken out of the box in this regard: you were meant to have full tracking, and an incompatible kernel implementation meant that with sysvinit you did not.
As you just admitted, runsv has the possibility of incorrect tracking. I do not want incorrect tracking. I expect everything tracked, so that if something goes wrong you have what you need to respond to it.
>> Nothing is being "leaked" as a result of ssh connections being reparented back to PID 1. Whether you use systemd or another init system, the resources will be returned when the ssh process crashes or terminates. And as for your scenario where it hangs indefinitely: that can happen whether you are using systemd or another init system.
systemd could hang indefinitely too, but thanks to cgroups and the watchdog systems included in systemd, there is enough information that code on the system can detect the problem and auto-resolve it. OK, the auto-resolve can be a bit of a sledgehammer.
> Like it or not this is hard to fix even with cgroups because the init system cannot assume that killing the daemon’s children is safe to do, it might well result in data loss.
This is inverse of the real problem and those argueing that systemd way is not need make this common mistake. Runit is assuming that it sees the service by “waitpid” syscall as gone it safe to start the service when it may not be. If a fragment of that service is still running its not in fact safe. Systemd will send signals to daemons children to attempt to get them to shutdown safely before starting a new instance of that service. Yes systemd will wait for those of the past instance fragments to be gone. How will you have the information what was a children of a service that need to be stopped before new instance of a service starts without using cgroups under Linux(There is not any other effective way). Yes starting a complex service like a database while a fragment of prior instance is still running can and does result in data loss as the new instance conflicts with the old instance fragments you need service management to prevent this.
Fun part the old original sysv the PID of the service would not have gone away until all the children had as well so on the right kernel sysv init and service management worked well. On Linux we got use to sysinit being broken so have the wrong expectations on what a init/service management should pull off. Those making stuff so called better than sysvinit were only comparing to the Linux version of sysvinit not what it a clone of it see what else they were missing as well. Full tracking is a big thing the alternative to sysvinit are missing bar the ones like systemd using cgroups and upstart tried full tracking by ptrace that did not work well.
Alfman, it is this simple: unless your init/service management solution has full tracking of all processes, I am going to hate it.
oiaohm,
I didn’t say it doesn’t use a cgroup, I said (and confirmed) that it was inconsequential to the results.
We’re both stubborn people, oiaohm, but you know what the difference is between you and me? You’re stubborn about arguing that you’re always right without evidence, and I’m stubborn at pointing out the importance of evidence to make an argument. Please provide testable proof; otherwise everything you say is just hot air. Seriously, man, this is all I’ve been asking for many posts now. I really wish you were as stubborn at digging up evidence as you are at rejecting information that doesn’t fit your world view. Say whatever you will about me, but I usually try to go the extra mile and actually test things before jumping to conclusions. It’s so frustrating to talk to people who don’t have this same conviction.
Neither of us is dumb, and yet here we are in an intellectually stupid dialog with no chance of convergence. It doesn’t make sense to keep going this way. You know you aren’t answering my question. I *wish* you cared more about evidence and proofs like I do, but obviously I have to accept that you don’t. Anyways, here’s how this is going to end: you’re going to respond once again arguing that I’m wrong, that services like sshd can’t be reliably monitored using traditional means, and once again I’m willing to bet you won’t provide any testable evidence or proof for your assertions. And when that happens, here’s how I’m going to respond: “Thanks for your opinion, till next time my friend”.
You: I’m right, you’re wrong.
Me: Ok, prove it.
You: Yes, sshd is one of the problem children where bad tracking equals disaster. I’m right, you’re wrong.
Me: Ok, but give me a test case that proves what you’re asserting.
You: Here’s another reason cgroups are better. I’m right, you’re wrong.
Me: Yes, cgroups have some benefits, but you still lack proof for some of your assertions.
You: I’m right, you’re wrong.
No, jackass, go read that post carefully.
Example 1
Set up systemd for embedded without logind. Ok, this might be above your skill level, Alfman, right?
On this setup, log a user in by ssh and restart ssh.service. Notice the user gets nuked. This is why I said you tested it wrong.
Example 2.
Set up a postgresql server with an intentional process leak.
https://www.postgresql.org/docs/10/plpython.html
Don’t say this cannot happen. Python in postgresql is an untrusted language, so it can reach out and run something that leaks a process that locks a file, and then your database will fail to process some query for no apparent reason.
You will notice the systemd cgroup solution will still shut the database down properly.
Example 3
https://www.php.net/manual/en/function.exec.php
Yep, a PHP program under Apache can, thanks to some developer, leak processes back to PID 1 as well.
Example 4
Someone with mysql is using http://mysql-ninjas.blogspot.com/2019/08/shellbash-commands-execution-from-mysql.html (this horror), and now mysql can leak fragments out to PID 1 as well.
Should I go on? Please note that without cgroups, the leaks from Examples 2-4 can look absolutely identical once they are connected to PID 1. So now, Alfman, a service has something wrong and has leaked a process to PID 1: without cgroups, how are you going to (1) spot it early and (2) track it back to the service that had a bad arbitrary program/script added? That’s right, you are going to wait until that service misbehaves. Right, Alfman? I am not really impressed by that idea. And please, spare me the stupid argument that you are going to audit everything; you may not have set up the server.
https://unix.stackexchange.com/questions/158727/is-there-any-unix-variant-on-which-a-child-process-dies-with-its-parent
Yes, the example here shows what you have to do to make a process break away from its daemon and attach itself to PID 1 instead.
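For reference, the escape trick being described is the classic double fork, which can be sketched in a few lines. This is an illustrative demonstration, not any particular daemon's code: the intermediate child exits immediately, so the grandchild is reparented to PID 1 (or the nearest subreaper) and a supervisor that only does waitpid() on the process it started can no longer find it.

```python
import os, time

def double_fork(runtime=2.0):
    """Spawn a process via the classic double fork so it escapes the
    caller's process tree. A supervisor that only tracks the daemon it
    started via waitpid() has no way to find this process afterwards;
    a cgroup would still list it.

    Returns the grandchild's PID, reported back through a pipe since
    the caller has no parent/child link to it."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                    # intermediate child
        os.close(r)
        grandchild = os.fork()
        if grandchild == 0:         # grandchild: the escaping process
            os.close(w)
            time.sleep(runtime)     # outlives both ancestors
            os._exit(0)
        os.write(w, str(grandchild).encode())
        os._exit(0)                 # exiting here orphans the grandchild
    os.close(w)
    os.waitpid(pid, 0)              # reap the intermediate child
    grandchild = int(os.read(r, 32))
    os.close(r)
    return grandchild
```

After the call, `/proc/<pid>/status` shows a PPid that is no longer ours: from the original parent's point of view the process has simply vanished from the tree.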
>However if one needs to rely on cgroups to monitor a daemon then it’s a poorly written daemon
You made the argument that only badly written services will behave badly. Sorry, that’s not the case. Complex forking daemons like webservers and databases normally end up with code that allows them to run arbitrary programs (this has been true all the way back to the start of unix). This was not a problem on original sysv systems (not Linux) due to no PID recycling and a stable pstree. It is not a problem on a Solaris system using zones around services.
Even if a multi-process forking daemon does not have the means to run arbitrary code, it can still leak fragments of itself back to PID 1 and end up being restarted in a way that mixes new with old instances of itself, causing screw-ups, if you don’t have proper tracking around services.
Basically, stop making excuses for why you don’t have to deal with an expected problem. Unix daemons were never 100 percent well behaved. Linux daemons are not either.
Really, Alfman, are we meant to use daemons that are only a single process so that your service management solution works? That is what you are saying when you claim cgroups, or equivalent tracking around services, are not required.
By the way, two other common things that run arbitrary programs of varying levels of questionableness are cron and cups. Yes, even the old unix printing solution before cups was guilty of the same thing.
Misbehaving unix services are basically status normal.
oiaohm,
Thanks for your opinion, till next time my friend.
I admit to having skimmed over half of the post; as much as I like the political analysis parts, it is just overbearing.
What I got out of its technical middle chapter, though, does resonate with my own experience when trying to write a service unit for what I thought would be a fairly simple case (a set of N processes managed as a ‘service’, which already had a shell script available to check for their presence and start/stop them as requested, scheduled up to that point as a cronjob), and it is: if it smells like Windows, talks like Windows and has bugs like Windows, it must be Windows! 😀
In other words: despite the detailed documentation, project maturity and widespread adoption, systemd feels hard to use, rife with unexpected behaviour, foot-guns and bugs popping up at every corner.
Now I finally understand the underlying reason for this: a fundamentally flawed/incomplete model of the internal state, hacked and extended over time to accommodate a myriad of cases that were not properly accounted for in the original design.
Accrued functionality in a monolithic design (even though the project itself might be split across many processes) is not an MS exclusive, but it surely is pervasive in their early APIs, with the ever-expanding list of bolted-on function-call parameters.
Still, I think it is a step forward from using shell scripts to manage services.
Not an init system, not a process manager – it is a whole operating system. It embraces and extends almost all the APIs and combines them into one monolithic block of code, so you can’t just pick the one piece you need without adopting the whole stack. To be fair, it was a gradual process, so I can see how people could have overlooked this trend, but after 2-3 years it was pretty obvious what the goal of the project was.
I didn’t ask for my OS to be replaced, and I used Linux because I more or less liked it for what it was. There were issues, but the init system was one of the least important, and it was largely solved before SystemD had even appeared.
I don’t benefit from any of the SystemD features, except perhaps udevd, which should never have been merged into SystemD. It just creates unnecessary churn, complications and instability (yes, I had a few cases of my systems failing, and the solutions were not trivial). To me the _net_ value of the whole project is negative.
There are also many questions to ask about governance and the process. It’s not about the usual straw man argument (Poettering). But rather why on earth a single, not very important project has been given the power to replace everything in my system? What has happened to competition and user choice? As it is, users of MS Windows or OS X have more say on future of their platforms because they can at least vote with their wallets.
In terms of the tech I’m not qualified to comment and in fact I barely understood anything beyond the conceptual level. What I took away from the reading was that the whole flamewar surrounding systemd seems to have had a lot more to do with the personalities involved than with any technical failings, which is all the more a shame because it means that valuable collaboration and improvements to code and documentation were surely lost as a result. I don’t know whether systemd will go the way of HAL but whatever happens, the end user likely won’t know and won’t care. Doesn’t mean it’s not important, but it does go to show how oblivious most of us are to the vast majority of the code on which we rely.
“The end of this massive article posits a very interesting question. What init system does Chrome OS use? And Android? Do you know, without looking it up? Probably not.
What does that tell you?”
Nothing really. That knowledge only matters if you need to get into Chrome OS or Android at the system level.
Here is my perspective: I actually like systemd and I have yet to run into an IT professional who runs a large installation who doesn’t. And for everyone else, no one is making you use it.
But what *nix desperately needs is a standard API for interfacing with whatever system the OS is using so I can know what services are running, and have a way of starting/stopping them if I need to.
For example, I am writing a bit of software for an embedded linux system and I have to simply know that it is using BusyBox’s implementation of init.d, and that if the service in question is running there will be a /var/run/xxx/pid file, and that I have to run the shell script /etc/init.d/S99xxx [start|stop] and that all those conventions may be different when I test on my desktop system.
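The pidfile convention described above can be sketched like this. The pidfile path style and the signal-0 probe are the conventional pattern, not a formal API, which is precisely the fragility being complained about:

```python
import os

def service_running(pidfile):
    """Check a service the conventional *nix way: read its pidfile
    (e.g. /var/run/xxx/pid) and probe the PID with signal 0.
    Fragile by convention: the file may be stale, and the PID may
    have been recycled by an unrelated process."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False              # no pidfile, or garbage inside it
    try:
        os.kill(pid, 0)           # signal 0: existence check, nothing is sent
        return True
    except ProcessLookupError:
        return False              # stale pidfile: process is gone
    except PermissionError:
        return True               # process exists under another user
```

Every distro that relies on this pattern re-implements some variant of it in shell, which is the kind of fragile linkage a standard API would remove.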
Similarly, on a different project I recently wrote a micropython module to interface with the ALSA library, and somehow code that ran perfectly standalone ran into some kind of incompatibility when run inside micropython. Somehow libasound2 couldn’t read its config files properly. Why? Dunno. But if Linux had its own (better) standard approach to something like the Windows Registry or macOS’s plists, then that error almost certainly wouldn’t have happened.
So, as a guy who writes software that often has to interact with services and system files, I would prefer an imperfect but majority-adopted standard (or at least standard interfaces) across *nix systems to the situation and griping we have now.
jockm,
Yeah, except for those of us who don’t 🙂
You may dismiss my opinion because I don’t pass your “large” qualifier, but still, I run a dozen or so servers using my distro, which uses its own init system in a similar vein to runit. It’s simpler than both systemd and init scripts, and very reliable.
Many people in the systemd camp are comparing systemd against the bad days of sysv init scripts, which I understand, but in fact most of us who are critical of systemd aren’t calling for a return to init scripts at all, merely a better solution that imposes less coupling. Tight coupling tends to cement specific technologies into the stack. It’s frustrating to see kernel functionality (like cgroups) now having to be accessed through systemd’s cgroup APIs (I posted about this in another thread). The binary protocols and logs are problematic. These tight-coupling problems created by systemd were not necessary to solve the very real problems posed by init scripts. It may be easy and convenient for you to say “no one is making you use it”, yet it’s actually harder to change due to tight coupling and the fact that it encroaches on everything. Even things that shouldn’t be part of the init system…
https://askubuntu.com/questions/907246/how-to-disable-systemd-resolved-in-ubuntu
I get that many people’s needs are satisfied by systemd, but not everyone feels that the monolithic approach is the best way to engineer an init system. Does it work? Sure, obviously it does. Is this the best approach for *nix? That’s obviously open to debate and I feel that it isn’t the best approach.
Agreed. Although systemd is probably a defacto standard by now in much the same way sysv was.
I plead ignorance to all that 🙂
Yeah, I think ideally we would have a standard API with an open-ended implementation. If we had that, I would be able to implement the standard in my own distro’s init system and we would both be happy 🙂
You will admit you underscored my point when you said:
😉
And the overall point I was making was that an imperfect standard is better overall. *nix systems are still filled with too many choices for how startup is handled and how config files are managed; that makes it difficult to write software that has to interact with services, startup, config files, etc., and it creates too many fragile linkages where we rely on conventions and not APIs.
While shell scripts may be desirable for system administrators, they have security implications and performance overhead when running in embedded environments.
So if you want developers on board with choice, then all of you sysadmins need to get together and standardize your APIs for config files, services, etc. Then I won’t care what you use; but until you do, I am not so sympathetic, because of the pain it causes.
jockm,
You don’t need to state it this way, I’ve already agreed with you on the value of having standards for administration.
It sounds like another criticism of sysv. I don’t want to sound like a broken record, but criticizing systemd does not mean I endorse sysv init… I hate sysv! I hate its runlevels, the lack of parallelism, the lack of process monitoring, I hate pid files, etc. These are all reasons I went on to create my own init system. I don’t know why we have to keep bringing up sysv’s limitations as though all systemd alternatives have those faults or baggage.
You’re saying this as though I disagreed with you about the benefits of having admin standards. But I thought I was pretty clear that I didn’t disagree with that.
I think you missed something. Because I never know what environment I run in, I have to plan for everything. I also stated in my first example that I just did a project that used BusyBox’s init system, which is shell script based.
I am talking about generalized problems that developers have to deal with. I don’t know about your init system, and if I ever write software that runs there, then I am going to have to learn about that too.
My point is that if you want choice, then go out there and get common interfaces going; but until that day, people like me are going to advocate for systemd because it is at least a standard. I am not arguing with you, but I do need admins like you to do something about it and not just complain about systemd.
jockm,
Ok, but then it’s not really a rebuttal to anything I’m saying. You’re using it to make your point about the helpfulness of standards. That’s fine and once again we agree on that.
I have no control over what anyone does but myself. Not for nothing, but it feels like you’re looking to have it both ways: you criticize me over actually having done something about my gripes while simultaneously criticizing me for not doing more than complain. That raises the question: what would you have me do, short of just falling in line behind systemd and not expressing my gripes about it? Is that really what you are advocating for here?
Wake up: the person here is another embedded developer. With runit we have the same old problem: when a process moves in the pstree, it’s lost.
Huge deployments like Facebook have in fact given presentations on the improvements of going the systemd route over sysvinit, runit and the other options.
Does your own init system use cgroups to track services so leaks are not possible?
Anyone like me who is getting advantages from the systemd solution is not going to think much of a solution that does not measure up. You say don’t compare to sysvinit; ok then, Alfman, compare your solution truly head to head with systemd on its ability to deal with misbehaving services correctly.
As I pointed out before, misbehaving unix services are status normal in large deployments somewhere. Something will be running arbitrary code that does the wrong things; that is just the way it is.
oiaohm,
Haha, you’re starting off on another thread because you want to keep going at it?
Like I said, it is reliable and there are no leaks for my use cases. Understand that results matter much more to me than your approval. Obviously, if you had anything more than an “I’m right, you’re wrong” argument, then you would have said it by now; thanks for your opinion though. Have a good day!
Out of curiosity, can you share more information on the platform you use?
gdjacobs,
I think this was for me, it’s hard to tell because of wordpress.
Sure, I built a customized distro gmlinux to run on my servers. It is strictly designed to run on servers & routers, having a local desktop was a non-goal. Originally with V1 I tried to clean up the file system to create some modularity in the vein of GoboLinux…
https://www.gobolinux.org/
I still believe in this direction, however the amount of work it took to fix hardcoded dependencies for the software I wanted to support was a bigger task than I wanted to deal with, so I scaled back some of those goals. V2 has a more traditional layout. The init system supports provisioning dynamic adhoc services & jobs, even on read only file systems. One of the aspects that I like a lot is the ability to boot or recover to a known state without requiring a root file system (this is actually how it gets installed with nothing more than the vmlinux & initrd boot images).
The distro doesn’t have a huge software library, but all of the management/admin tools are there and I’ll provision customer environments inside of VMs using KVM.
I’ve run into way, way too many problems relating to systemd in the last 4-5 years that shouldn’t have happened in prod. Sorry, it’s junk with too many moving parts. It’ll never get my vote. Tried to love it for years. Gave up.
systemd was likely the only init system project whose developers had a future vision of how service management should be done. just like pulseaudio and avahi it might have been highly disruptive, but it was something that was sorely needed, and it resolved a ton of issues – big and small – with linux distributions and the kernel in the process.
people may hate it, but the fact it’s so widely adopted is not result of some evil scheming, but because it’s actually useful.
i hated systemd with a passion, and i think some of its things ought to be separated from the main code repository. it has great features like user services, timers, service dependencies, hardware-triggered services, resource control, distro-agnostic (and simple) service definitions, as many runlevels as you wish, etc. and through its development the linux kernel gained quite a few useful features.
not to mention some distro-agnostic standardization, like the /run directory – which was met with some very vehement opposition at the time.
systemd is not perfect, and its devs were notorious on LKML (and also for attempts at forcing kdbus at short notice). but i think the net gain is positive. things got better for desktop linux as a result and life got easier for people managing multiple distributions or software for them.
I can’t believe I read the whole thing. SOOOOOO LOOOONG. The article is full of cynical snark, but the conclusion is insightful and heartening. Init systems are just not very important. Systemd causes too much pain for a thing that is supposed to be invisible for it not to be superseded by something “better”.
I really never understood all the systemd hate. Shell scripts are a terrible way to handle services. Systemd solved a ton of problems that shell scripts never could. As the article points out people have been trying to find ways to update the init process in Linux for decades. Someone finally succeeds and people go ballistic because it fundamentally changes the way Linux does init. For me that’s the benefit while others see it as the problem.
Using shell scripts for init often leads to broken suspend/resume, because init isn’t designed in a way to elegantly trigger events after resuming from suspend. That was always an issue for me that was instantly cured by systemd. Another horror show for traditional init is having every single service implement its own start, stop, status, reload, etc. logic. It’s a nightmare, and no package maintainer wants to be writing shell scripts for their service.
One of the biggest gripes I always hear about systemd is the requirement of cgroups, because it makes systemd Linux-only, or even just because it is an additional requirement in itself. Cgroups are one of the most powerful aspects of systemd and you can do some really cool stuff with them together with systemd. One of the least flashy but most useful aspects of requiring cgroups is that it elegantly solves the problem of stray processes left over from a service that has since been stopped. That’s just the tip of the iceberg, but it was a major problem for a long time with Linux init. The fact is the status quo wasn’t feasible and something had to be done.
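For reference, this cgroup-based cleanup is configured per unit. A minimal sketch of a unit file follows; the daemon path and names are hypothetical, while `KillMode=control-group` (signal every process left in the service's cgroup on stop, not just the main PID) is systemd's documented default:

```ini
# /etc/systemd/system/mydaemon.service -- hypothetical example unit
[Unit]
Description=Example daemon

[Service]
ExecStart=/usr/local/bin/mydaemon
# Default: on stop, signal everything remaining in the unit's cgroup.
KillMode=control-group
KillSignal=SIGTERM
TimeoutStopSec=30

[Install]
WantedBy=multi-user.target
```

Daemons that must manage their own children can switch to `KillMode=mixed` or `KillMode=process`, which is exactly the trade-off debated below.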
abraxas,
I agree with you about the shell scripts, most of us do; the thing is, your criticism of scripts is largely outdated, because most init systems have already moved beyond sysv init scripts anyway. The elimination of sysv-style scripts isn’t original to systemd by any stretch. More than likely you wouldn’t have a problem with another, less intrusive init system.
Except this isn’t a reason to go with systemd over other alternatives, as you’ve concluded. I concede systemd is the de facto standard, but IMHO we could have come up with something better, more portable and more in line with the unix design principles.
For the record, I’m not against using cgroups where they’re justified, they have uses, but systemd is too generic to be able to handle orphans intelligently. Many daemons don’t need to spawn child processes, and those that do aren’t handled intelligently by systemd anyway (especially when KillMode=control-group leads to data loss).
Did most move beyond SysV init scripts? Yes. Did most move beyond shell scripts? No. Even the ones that did left a bunch of problems unresolved. I see a lot of people say “we could have done something else”. Yes we could have, many even did, but in the end systemd won out. That’s how these things work. Just because some daemons aren’t SUPPOSED to spawn child processes doesn’t always mean they won’t, and some are supposed to, so you cannot just discount that. You don’t need to set KillMode=control-group; that’s just one option. Of course if you kill processes without properly shutting them down you could lose data, but being able to account for leftover processes is a win in itself. If you were to design a system that had to handle child processes more gracefully, you would have to design something more complex and invasive than even systemd, so I’m unsure what you’re arguing for: something more complex, or less?
abraxas,
Yeah but that’s mostly a case of how influential your employer is. Most of us who worked on init systems didn’t work at redhat, haha. It’s not always fair, but popularity contests usually go to those who are already popular. But admittedly it’s useless to complain about this.
I wouldn’t say they “aren’t supposed to spawn child processes”. It depends: spawning children may be a bug or it may not be. It may be appropriate to kill them or it may not. The daemon may monitor its children or it may not. They may be critical or they may be non-critical. I don’t think we can make generalizations without talking about a specific example (ie ssh).
Also, if you’ve got a poorly designed and/or misbehaving daemon, it can be problematic even with systemd too. The point being the best design practices for daemons applies to systemd as well as other init systems.
http://www.freedesktop.org/software/systemd/man/systemd.service.html
Even if we can support the old double-forking/pidfile design, these are some kludgy hacks that are best forgotten whether or not we’re using systemd.
Of course you don’t, but if you don’t then there’s not much difference in the behavior of systemd and other init systems.
That’s just it though, I’d rather have the daemon implement graceful behavior than have systemd perform incorrect, dangerous behavior. Even with cgroups, systemd can’t magically know how to perform a graceful shutdown on behalf of a daemon. Turning off systemd’s child-killing behavior leaves management up to the daemon. So you wouldn’t necessarily be increasing the complexity over what it would take to get safe behavior under systemd anyway.