Well, this sure is something to wake up to: a massive worldwide outage of computer systems due to a problem with CrowdStrike software. Payment systems, airlines, hospitals, governments, TV stations – pretty much anything or anyone using computers could be dealing with bluescreens, bootloops, and similar issues today. Open-heart surgeries had to be stopped mid-surgery, planes can’t take off, people can’t board trains, shoppers can’t pay for their groceries, and much, much more, all over the world.
The problem is caused by CrowdStrike, a sort-of enterprise AV/monitoring software that uses a Windows NT kernel driver to monitor everything people do on corporate machines and logs it for… security purposes, I guess? I’ve never worked in a corporate setting so I have no experience with software like this. From what I hear, software like this is deeply loathed by workers the world over, as it gets in the way and slows systems down. And, as can happen with a kernel driver, a bug can cause massive worldwide outages, which are costing people billions in damages and may even have killed people.
There is a workaround, posted by CrowdStrike:
- Boot Windows into Safe Mode or the Windows Recovery Environment
- Navigate to the C:\Windows\System32\drivers\CrowdStrike directory
- Locate the file matching “C-00000291*.sys”, and delete it.
- Boot the host normally.
This is a solution for individually fixing affected machines, but I’ve seen responses like “great, how do I apply this to 70k endpoints?”, indicating that this may not be a practical solution for many affected customers. Then there’s the issue that this may require a BitLocker password, which not everyone has on hand either. To add insult to injury, CrowdStrike’s advisory about the issue is locked behind a login wall. A shitshow all around.
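For illustration only, the file-deletion step of the workaround above could be sketched as follows. This assumes a Python interpreter is available on the machine, which will usually not be the case in Safe Mode or the Recovery Environment, and the function name and `dry_run` flag are made up for this example. It does nothing to solve the scale or BitLocker problems.

```python
from pathlib import Path

# File pattern quoted in the workaround.
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def remove_channel_files(driver_dir, dry_run=True):
    """Find files matching the pattern and delete them unless dry_run is set.

    Returns the list of matching paths either way, so the caller can
    review what was (or would be) removed.
    """
    # On an affected machine, driver_dir would be the CrowdStrike
    # directory named in the workaround.
    matches = sorted(Path(driver_dir).glob(CHANNEL_FILE_PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Running it first with `dry_run=True` lists the matching files without touching them; only a second call with `dry_run=False` deletes anything.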
Do note that while the focus is on Windows, Linux machines can run CrowdStrike software too, and I’ve heard from Linux kernel engineers who happen to also administer large numbers of Linux servers that they’re seeing a huge spike in Linux kernel panics… Caused by CrowdStrike, which is installed on a lot more Linux servers than you might think. So while Windows is currently the focus of the story, the problems are far more widespread than just Windows.
I’m sure we’re going to see some major consequences here, and my – misplaced, I’m sure – hope is that this will make people think twice about one, using these invasive anti-worker monitoring tools, and two, employing kernel drivers for this nonsense.
How would you get compensation?
You chose to run their software and agreed to their terms and conditions.
“To add insult to injury, CrowdStrike’s advisory about the issue is locked behind a login wall”
They’ve been very cagey about this. No mention of the issue on their web home page and they only posted on X an hour ago.
I think the CrowdStrike stuff might be a separate incident – Microsoft’s own cloud status page at https://status.cloud.microsoft/ currently says:
“Preliminary root cause: A configuration change in a portion of our Azure backend workloads, caused interruption between storage and compute resources which resulted in connectivity failures that affected downstream Microsoft 365 services dependent on these connections.”
I suspect this then hit a ton of Azure-hosted sites/services, which in turn caused a lot of the IT chaos you’re seeing today. Microsoft do list the CrowdStrike issue on the status page I mentioned, but this is an optional component I think – it’s just that a fair number of Azure VMs use it, which is why Microsoft mention it (the fix is either to try a load of VM reboots or restore the VM from a previous backup).
Welcome to the era of Habsburg IT. Biology is very quick to push groups that lack any genetic variance into oblivion.
Why would you have Windows run your airport signage systems? Why not the minimum setup you need to just reload a static webpage every few seconds?
Why can’t a 911 dispatcher’s workstation be just a dumb terminal? How often do you need to add features into it?
Why would you have a Windows-powered fire alarm system?
Why would a hospital have to cancel SURGERIES because the clerk’s workstation is down?
What is this madness?
Why do we need the whole DirectX stack everywhere, drivers for my 1999 joystick, Bluetooth support, and all the daemons to keep all this stuff safe? Why does everything have to be cloud-connected, app-connected?
Why is this madness? When will we learn that monopolies have no incentives to improve?
The IT industry is basically the 3-4 larger vendors and everyone around trying to milk the cow. Why would you ever recommend another vendor after spending so much money getting certified for this vendor?
When will we stop adding complexity to the solutions we deploy, and stop praying for all the security settings to be correctly configured and, oh, let’s hope all updates of all the million libraries and dependencies have been security-tested and will not crash and burn…
If you are stuck at an airport, enjoy the 5 EUR voucher that will get you half espresso.
If you are responsible for this mess, enjoy your golden parachute.
Also… wasn’t it Nadella who said that now the buck stops with the execs, even if it’s not their direct responsibility? I know well this is not MSFT’s fault, but I want to see someone, please, show me an exec that is not Teflon-clad.
Very curious to see if anyone at Crowdstrike will take responsibility and suffer any kind of consequence with this.
Oh, well, we have Boeing, who am I kidding? People die, people lose money, people are stuck, nothing ever changes.
Shiunbird,
I agree, I think this “cloud computing” has been sold to management to the overall detriment of our infrastructure. Many of these systems really shouldn’t be internet connected at all.
The answer here is easy, money. Subscriptions and cloud services have been sold as the panacea to all IT troubles and for cheaper. “Replace your IT staff and hardware budget with our cloud service” is quite an effective selling point for executives who are rewarded for cost cutting. Who cares about side effects when a decent chunk of whatever money they save will increase their compensation by millions.
Except that I’ve never seen a true headcount reduction with cloud – you still need an army to tame all the complexity, and screw ups are way more menacing due to the exposed-to-the-internet default nature of it.
Shiunbird,
This is how it’s being sold to executives. Whether it works as advertised is a good question though.
Maybe there’s comprehensive research on this? The thing is I suspect one could cherry-pick cases to match a predetermined narrative, and there’s definitely an incentive for marketing departments to do so.
Personally my jobs have been heavily affected by software offshoring, which is both different and similar to what we’re talking about. There’s absolutely no doubt offshore teams win on costs, but at the same time their quality can be atrocious. Still, I’ve seen many owners choose lower prices over higher quality.
This seems to be a common refrain across industries. When Thom Holwerda was complaining about the quality of AI translations, it felt oddly relatable.
It makes IT employees more easily interchangeable. And management also has the feeling of more control.
AnAmigian,
I can’t speak for how management feels, but it would be ironic for them to feel that way given that one of the cons of outsourcing is less control. Putting critical business functions in other people’s hands relinquishes control to them. If it were me, I’d be very reluctant to enter arrangements that leave my business so utterly dependent on others, but obviously the business world loves it. It doesn’t seem to ring any alarm bells that everyone’s eggs are being kept in collective giant baskets.
AnAmigian,
There are two ways to look at your technical staff:
1. Valuable assets that help grow your company
2. Cost centers that “you have to tolerate”
My example of (1) is “The Old Google”. At one point they were hiring everyone over a bar, as they would expect the engineers to build awesome stuff on their own (the famous 20% time, which led to AdWords, Gmail, and others).
My example of (2) is “The New Google”, but also “Boeing”, and many other companies. Any department that does not show revenue growth or is not useful for “shareholder value” (like a flashy AI team) is on the chopping block.
Unfortunately many start with (1), where they value their engineers as their most important superstars, but over time they evolve into (2) and start layoffs and massive offshoring and outsourcing of talent.
sukru,
I do follow your point. Whereas you describe this as the lifecycle of a company starting with #1 and then transitioning to #2, I think we might actually be experiencing a generational shift with new norms being set across the board. It’s normal for new companies to start with outsourcing today. For better or worse, they skip #1 and go straight to #2. My take is that fewer people today get to experience #1 compared to the past. On the one hand, people including myself may be susceptible to rose-tinted glasses, but on the other hand things may actually be getting worse, especially for recent grads.
Alfman,
You might be right.
The biggest negotiation power engineers have (had?) was being able to “go solo”. It was very easy to build your own web page, program your indie game, write your own app, or develop your online service.
However with increasing cost of entry and massive regulations on new areas like AI, it is becoming difficult to do so.
“Very curious to see if anyone at Crowdstrike will take responsibility and suffer any kind of consequence with this.”
Look at SolarWinds and we see articles these last few days like: SEC Legal Claim on Solarwinds Dismissed
So good luck with that!
Lennie,
I’m pretty sure there would be at least one clause in the Crowdstrike license agreement that limits the company’s liability for any damages resulting from its software.
The more pressing concern is whether the enterprise customers will retaliate by taking their business elsewhere. That would really hurt their wallets.
It’s rather interesting indeed, this concept of outsourcing system safety and integrity to some other software provider, for them to make that happen. Some fundamental shift needs to happen in the future so that such reliance is no longer needed. If that is even possible; likely not.
Systems today are so complicated there is no way to “guarantee” code is bug free. Maybe, if software was still in kilobytes and not gigabytes, it would be possible. The days of DEC delaying the shipment of VMS until it contained no known bugs are well past us.
Hopefully this will also push companies to move to Linux as AFAIK the Linux version of CrowdStrike was unaffected.
Please fix login – “Remember me” does nothing and it logs me off after a few days. Cookies enabled, etc. and it happens only on OSNews; other sites where I have been logged in for years (some to the point of forgetting the password) still keep me logged in, it’s just OSNews that keeps logging me off.
Linux is not gonna fix it because it has been decades since it was a plucky little enthusiast OS challenging the establishment. We are no longer in the 1990s! For that you need an OS that does not run CrowdStrike.
The name of the business is apposite. Why is everything stopped? Is it a doctors’ strike? Aircrew on strike? No, it’s a Crowd Strike!
It was unaffected because Crowdstrike didn’t bork the Linux version, not because of Linux itself. That said I’d be happy for more people to use Linux, even if the reasoning is questionable.
I suppose the correct phrase would be, didn’t b0rk Linux this time.
It appears Linux was the first OS it happened to a while back, and just didn’t get all the publicity because it was only Debian or Rocky Distros that were affected.
Although the mechanism was largely the same, the root cause was different.
Same company was causing kernel panics a few months ago on their Linux clients. The OS isn’t the problem.
The fact that we didn’t hear about it means that no one uses CrowdStrike on Linux or close to no one. Which checks out since Linux has superior opensource network shaping and hack protection and only clueless admins or those who were told to set it up by clueless managers would use it on Linux.
Actually, what that really says is something that I’ve always argued for, diversity of solutions is good. It’s not an OS that causes the problem, it is the proliferation of one specific OS or software solution that is the real problem.
In terms of avoiding this problem, it’s more about luck; it could be any tightly linked device driver next time. If a driver leverages some form of direct access to the low levels of the kernel, it’s a risk. Think about this in regards to the recent Nvidia announcement: they give you a lightweight certified kernel plugin, then pipe whatever they want through that channel, probably untested.
Is it true regulators blocked MS from deploying pretty much the same solution already found in macOS, because they ruled it anti-competitive? Some hard questions need to be asked.
Someone can probably correct me on this with more experience with CrowdStrike Falcon, but I don’t believe it’s a user monitoring tool. I believe the monitoring it does is for malware/exploit/virus/whatever mitigation.
Not to let them off the hook, but it’s honestly worse than if this was just simply user monitoring software, IMO.
Also not to say that I’m sure somewhere shitty useless managers have tried to use security monitoring as user monitoring…
It’s not. Thom makes sweeping statements about tools he knows nothing about on a semi-regular basis.
Not saying it’s good, but that’s not its purpose.
AFAIK, they do both. But this time it’s the AV/DDoS protection that borked up, not employee monitoring.
He doesn’t claim to know what it is. Thom is quite knowledgeable, but he’s by no means an expert. He’s a translator by trade and a journalist by employment. He’s not a tech expert in any way.
People here think that OSNews is some bastion of informed knowledge, which is impossible with it being the one-man Thom show. No one person can be an expert in everything covered here.
Same CEO of the big McAfee screw up of 2010, if not mistaken.
Is it me or does “CrowdStrike” sound like something a villain would yell just before an attack move?
You know, like “Freeze Ray!” or “Face Slap!”
Sounds like something like Extinction Rebellion or some 4chan group or maybe even some real terrorists.
Or literally some new kind of striking system outside of organized unions.
Sony Music CDs all over again.
If you remember, at one point in time Sony’s music CDs had a hidden data partition that would auto install a rootkit on your system. Yes, I am not making this up: https://en.wikipedia.org/wiki/Sony_BMG_copy_protection_rootkit_scandal
Companies love to install kernel level drivers to monitor user behavior. It could be an overzealous copyright holder that thinks they have full access to your system for the privilege of listening to a few songs, or a “cloud security provider” that thinks they could monitor much better locally instead of using a proper firewall.
Let’s file this under “it should have never happened, but we will most likely learn the wrong lessons”.
Got hit by this this morning at work. Logged in (first of course) and started responding to issues from angry customers. Well, things are behaving at this time, but that was a fun 5 out of 8 hours of my workday…
I’m not the decision maker but I would guess my employer will toss the angry customers a bone.
Turn it off and on again (lots) to fix it
https://azure.status.microsoft/en-us/status
Adurbe,
Every time I hear that I think of The IT Crowd.
https://www.youtube.com/watch?v=nn2FB1P_Mn8
Such a great show.
Alfman,
I remember it ending abruptly. Not sure, but it was probably the service I was using back then (Netflix?) not having rights to all.
For those who are interested, all seasons seem to be available for free:
https://pluto.tv/us/on-demand/series/582bfe2e857920bd1d030c50/season/1?utm_medium=textsearch&utm_source=google
Sadly with Windows I’ve seen this solve many problems which nobody knew how to solve after extensive Googling.
CrowdStrike and Microsoft are very lucky the fix was trivial.
It was not, though, if you have an international organization with hundreds of computers worldwide and you got hit by this. Your IT staff could not write a script to solve it and had to visit every single computer. If they use disk encryption (which many, probably most, do now, specifically BitLocker), you needed someone with IT knowledge to start the machine in safe mode or the recovery environment, and to get that far you needed the BitLocker key. And then delete a file. I’ve seen people report from organizations where the computers of the IT staff also can’t boot, so they can’t even get the BitLocker keys they need to do this. They can’t even get their own systems running, so forget about the systems of the employees they need to support.
I can’t say I’ve ever heard of CrowdStrike before this, but I do wonder if we’ll learn anything from this in the long term. It seems like a perfect storm of a very fast rollout, inadequate testing before the rollout and no quick fix/failover for end users caused this chaos. With 8.5 million PCs involved, any change in config/software has to be very carefully tested and then rolled out in stages (e.g. 0.1%, 1%, 10%, 100%).
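As a rough sketch of what a staged rollout like that could look like (not a description of how CrowdStrike or anyone else actually does it), each machine can be assigned a stable bucket by hashing its ID, and an update is only offered to machines whose bucket falls below the current stage. The host IDs, function names, and the per-mille representation are all assumptions made for this example.

```python
import hashlib

# The stages from the comment above (0.1%, 1%, 10%, 100%),
# expressed in per-mille so everything stays in integers.
STAGES_PER_MILLE = [1, 10, 100, 1000]


def bucket(host_id: str) -> int:
    """Map a host ID to a stable bucket in the range 0..999."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 1000


def in_stage(host_id: str, stage_per_mille: int) -> bool:
    """True if this host should receive the update at the given stage."""
    return bucket(host_id) < stage_per_mille


def hosts_for_stage(host_ids, stage_per_mille):
    """Select the hosts covered by a stage; earlier stages are subsets of later ones."""
    return [h for h in host_ids if in_stage(h, stage_per_mille)]
```

Because the bucket depends only on the host ID, a machine that got the update at 1% is still included at 10% and 100%, so widening the rollout never skips or reshuffles anyone. The point of the stages is to pause between them and check crash reports before moving on.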
With CrowdStrike seemingly hooking into the kernel level of Windows, the code needs to be close to bulletproof (and ideally detecting BSODs caused by its code and disabling itself for the next reboot), but it clearly wasn’t. What I suspect will happen is that people will either suck it up and stay with CrowdStrike or move to another cyber security provider who will probably have equally shoddy coding/testing/rollout practices.
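The “detect its own crashes and disable itself” idea can be illustrated with a small boot counter. This is a user-space sketch under stated assumptions, not how a kernel driver would really implement it: the state file, the threshold, and the class and method names are all hypothetical. The component records every start, clears the record once the system has come up healthy, and refuses to load after too many starts in a row that were never confirmed.

```python
import json
from pathlib import Path

MAX_UNCONFIRMED_BOOTS = 2  # hypothetical threshold


class CrashGuard:
    """Track boots that never reached a healthy state."""

    def __init__(self, state_file):
        self.state_file = Path(state_file)

    def _read(self):
        # A missing or corrupt state file counts as "no failed boots".
        try:
            return int(json.loads(self.state_file.read_text())["unconfirmed_boots"])
        except (FileNotFoundError, ValueError, KeyError, TypeError):
            return 0

    def _write(self, count):
        self.state_file.write_text(json.dumps({"unconfirmed_boots": count}))

    def should_load(self):
        """Call early in startup, before loading the risky component."""
        count = self._read()
        if count >= MAX_UNCONFIRMED_BOOTS:
            return False  # too many boots in a row never confirmed: stay disabled
        self._write(count + 1)
        return True

    def confirm_healthy(self):
        """Call once the system has been up and stable for a while."""
        self._write(0)
```

With this threshold, two consecutive boots that crash before `confirm_healthy()` is called cause the third boot to skip the component, which would at least turn a boot loop into a machine that starts without protection and can be fixed remotely.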
I did laugh when I saw that CrowdStrike sponsor the Mercedes F1 team (it’s emblazoned on the car and the team’s shirts) and apparently they had to end free practice early because of their own sponsor’s ineptitude! If I were Toto Wolff, I’d have gone and put a blank sticker over the CrowdStrike name out of embarrassment 🙂