All our servers and company laptops went down at pretty much the same time. Laptops have been bootlooping to blue screen of death. It’s all very exciting, personally, as someone not responsible for fixing it.
Apparently caused by a bad CrowdStrike update.
Edit: now being told we (who almost all generally work from home) need to come into the office Monday as they can only apply the fix in-person. We’ll see if that changes over the weekend…
Wow, I didn’t realize CrowdStrike was widespread enough to be a single point of failure for so much infrastructure. Lot of airports and hospitals offline.
The Federal Aviation Administration (FAA) imposed ground stops for airlines including United, Delta, American, and Frontier.
Flights grounded in the US.
deleted by creator
Why do people run Windows servers when Linux exists? It’s literally a no-brainer.
Good ol microsloth
Apparently at work “some servers are experiencing problems”. Sadly, none of the ones I need to use :(
Interesting day
Everyone is assuming it’s some intern pushing a release out accidentally or a lack of QA, but Microsoft also pushed out July security updates on the 9th(?) that have been causing BSODs. These aren’t optional either.
What’s the likelihood that the CS file was tested on devices that hadn’t got the latest Windows security update, and it was an unholy union of both those things that caused this meltdown? The timelines do potentially line up when you consider your average agile delivery cadence.
I picked the right week to be on PTO hahaha
If these affected systems are boot looping, how will they be fixed? Reinstall?
There is a fix people have found, which requires manually booting into safe mode and deleting the file causing the BSODs. No clue if/how they are going to implement a fix remotely when the affected machines can’t even boot.
Probably have to go old-skool and actually be at the machine.
You just need console access. Which if any of the affected servers are VMs, you’ll have.
Yes, VMs will be more manageable.
Exactly, and super fun when all your systems are remote!!!
It’s not super awful as long as everything is virtual. It’s annoying, but not painful like it would be for physical systems.
Really don’t envy physical/desk side support folks today…
And hope you are not using BitLocker, ’cause then you are screwed, since BitLocker is tied to CS.
Do you have any source on this?
I can confirm it works after applying it to >100 servers :/
Nice work, friend. 🤝 [back pat]
If you have an account you can view the support thread here: https://supportportal.crowdstrike.com/s/article/Tech-Alert-Windows-crashes-related-to-Falcon-Sensor-2024-07-19
Workaround Steps:
- Boot Windows into Safe Mode or the Windows Recovery Environment
- Navigate to the C:\Windows\System32\drivers\CrowdStrike directory
- Locate the file matching “C-00000291*.sys”, and delete it.
- Boot the host normally.
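If you’re scripting this across a fleet, the delete step boils down to something like this rough Python sketch (assuming you can already run code on the box, e.g. from a recovery image, and that the system drive is C:; everything beyond the path and filename pattern above is an assumption):

```python
from pathlib import Path

# Rough sketch of the delete step from the workaround above.
# Assumes the system drive is mounted as C: and already unlocked
# (BitLocker-protected hosts need their recovery key first).
drivers = Path(r"C:\Windows\System32\drivers\CrowdStrike")

for channel_file in drivers.glob("C-00000291*.sys"):
    print(f"deleting {channel_file}")
    channel_file.unlink()  # after this, the host should boot normally
```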
It seems like it’s in like half of the news stories.
It is possible to just rename the CrowdStrike folder in the Windows drivers directory. But for IT departments that could be more work than a reimage.
It’s just one file to delete.
Having had to fix >100 machines today, I’m not sure how a reimage would be less work. Restoring from backups maybe, but reimage and reconfig is so painful
Yes, but there are less competent people. The main answer for any slightly complex issue at work is ‘reimage’ - the panacea to solve all problems. And reconfiguring personal settings is the user’s problem.
I was quite surprised when I heard the news. I had been working for hours on my PC without any issues. It pays off not to use Windows.
This is why you create restore points if using windows.
Reading into the updates some more… I’m starting to think this might just destroy CrowdStrike as a company altogether. Between the mountain of lawsuits almost certainly incoming and the total destruction of any public trust in the company, I don’t see how they survive this. Just absolutely catastrophic on all fronts.
Don’t we blame MS at least as much? How does MS let an update like this push through their Windows Update system? How does an application update make the whole OS unable to boot? Blue screens on Windows have been around for decades, why don’t we have a better recovery system?
Crowdstrike runs at ring 0, effectively as part of the kernel. Like a device driver. There are no safeguards at that level. Extreme testing and diligence is required, because these are the consequences for getting it wrong. This is entirely on crowdstrike.
This didn’t go through Windows Update. It went through the CrowdStrike software directly.
Agreed, this will probably kill them over the next few years unless they can really magic up something.
They probably don’t get sued - their contracts will have indemnity clauses against exactly this kind of thing, so unless they seriously misrepresented what their product does, this probably isn’t a contract breach.
If you are running CrowdStrike, it’s probably because you have some regulatory obligations and an auditor to appease - you aren’t going to be able to just turn it off overnight. But I’m sure there are going to be some pretty awkward meetings when it comes to contract renewals in the next year, and I can’t imagine them seeing much growth.
Nah. This has happened with every major corporate antivirus product. Multiple times. And the top IT people advising on purchasing decisions know this.
Yep. This is just uninformed people thinking this doesn’t happen. It’s been happening since AV was born. It’s not new, and this will not kill CS; they’re still king.
At my old shop we still had people giving money to checkpoint and splunk, despite numerous problems and a huge cost, because they had favourites.
Don’t most indemnity clauses have exceptions for gross negligence? Pushing out an update this destructive without it getting caught by any quality control checks sure seems grossly negligent.
What lawsuits do you think are going to happen?
Forget lawsuits, they’re going to be in front of congress for this one
For what? At best it would be a hearing on the challenges of national security with industry.
They can have all the clauses they like but pulling something like this off requires a certain amount of gross negligence that they can almost certainly be held liable for.
Whatever you say my man. It’s not like they go through very specific SLA conversations and negotiations to cover this or anything like that.
I forgot that only people you have agreements with can sue you. This is why Boeing hasn’t been sued once recently for their own criminal negligence.
👌👍
😔💦🦅🥰🥳
I think you’re on the nose, here. I laughed at the headline, but the more I read the more I see how fucked they are. Airlines. Industrial plants. Fucking governments. This one is big in a way that will likely get used as a case study.
The London Stock Exchange went down. They’re fukd.
Yeah, saw that several steel mills have been bricked by this; that’s months and millions to restart.
Got a link? I find it hard to believe that a process like that would stop because of a few windows machines not booting.
a few Windows machines with the controller application installed
That’s the real kicker.
Those machines should be airgapped and no need to run Crowdstrike on them. If the process controller machines of a steel mill are connected to the internet and installing auto updates then there really is no hope for this world.
But daddy microshoft says i gotta connect the system to the internet uwu
No, regulatory auditors have boxes that need checking, regardless of the reality of the technical infrastructure.
> then there really is no hope for this world.
I don’t know how to tell you this, but…
There is no less safe place than an isolated network. AV and XDR are not optional in industry/healthcare, etc.
I work in an environment where the workstations aren’t on the Internet; there’s a separate network. There’s still a need for antivirus, and we were hit with BSODs yesterday.
There are a lot of heavy manufacturing tools that are controlled and have their interface handled by Windows under the hood.
They’re not all networked, and some are super old, but a more modernized facility could easily be using a more modern version of Windows and be networked so that the flow of materials, etc. is more tightly integrated into their systems.
The higher-precision your operation, the more useful advanced logs, networked to a central system, become for tracking quality control.
Imagine if, after the fact, you could track the 0.1% of batches that are failing more often, look at the per-second logs of the temperature they were at during the process, and see that there’s a 1° temperature variance between the 30th and 40th minute that wasn’t experienced by the rest of your batches. (Obviously that’s nonsense, because I don’t know anything about the actual process of steel manufacturing. But I do know that there’s a lot of industrial manufacturing tooling that’s an application on top of Windows, and the higher precision your output needs to be, the more useful it is to have high-quality data every step of the way.)
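In code, the sort of check I’m imagining might look like this toy sketch (every name, shape, and number here is invented for illustration):

```python
import statistics

# Toy sketch: given per-second temperature logs, compare a failing
# batch against a good-batch baseline, minute by minute, and flag
# any minute that deviates by a degree or more.
def minute_means(samples: list[float]) -> list[float]:
    """Collapse per-second temperature samples into per-minute means."""
    return [statistics.mean(samples[m * 60:(m + 1) * 60])
            for m in range(len(samples) // 60)]

def suspicious_minutes(batch: list[float], baseline: list[float],
                       threshold: float = 1.0) -> list[int]:
    return [m for m, (got, expected)
            in enumerate(zip(minute_means(batch), minute_means(baseline)))
            if abs(got - expected) >= threshold]  # e.g. minutes 30-40 pop out
```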
deleted by creator
Why is it bad to do on a Friday? Based on your last paragraph, I would have thought Friday is probably the best weekday to do it.
deleted by creator
And hence the term read-only Friday.
Most companies, mine included, try to roll out updates during the middle or start of a week. That way if there are issues the full team is available to address them.
Was it not possible for MS to design their safe mode to still “work” when BitLocker was enabled? Seems strange.
I’m not sure what you’d expect to be able to do in a safe mode with no disk access.
> rolling out an update to production that there was clearly no testing
Or someone selected “env2” instead of “env1” (#cattleNotPets names) and tested in prod by mistake.
Look, it’s a gaffe and someone’s fired. But it doesn’t mean fuck ups are endemic.
explain to the project manager with crayons why you shouldn’t do this
Can’t; the project manager ate all the crayons
If all the computers stuck in boot loop can’t be recovered… yeah, that’s a lot of cost for a lot of businesses. Add to that all the immediate impact of missed flights and who knows what happening at the hospitals. Nightmare scenario if you’re responsible for it.
This sort of thing is exactly why you push updates to groups in stages, not to everything all at once.
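Even a crude stage gate like the sketch below would likely have caught it before it hit everything (ring names, bake time, and the health check are all hypothetical):

```python
import time

# Hypothetical staged rollout: push to a small ring, let it bake,
# verify health, and only then widen the blast radius.
RINGS = [["canary-hosts"], ["internal-fleet"], ["early-adopters"], ["everyone-else"]]

def deploy(update, push, healthy, bake_seconds=3600):
    for ring in RINGS:
        for group in ring:
            push(update, group)
        time.sleep(bake_seconds)  # give the ring time to crash, if it's going to
        if not all(healthy(group) for group in ring):
            raise RuntimeError(f"halting rollout: ring {ring} unhealthy")
```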
Looks like the laptops are able to be recovered with a bit of finagling, so fortunately they haven’t bricked everything.
And yeah staged updates or even just… some testing? Not sure how this one slipped through.
> Not sure how this one slipped through.
I’d bet my ass this was caused by terrible practices brought on by suits demanding more “efficient” releases.
“Why do we do so much testing before releases? Have we ever had any problems before? We’re wasting so much time that I might not even be able to buy another yacht this year”
At least nothing like this happens in the airline industry
Certainly not! Or other industries for that matter. It’s a good thing executives everywhere aren’t just concentrating on squeezing the maximum amount of money out of their companies and funneling it to themselves and their buddies on the board.
Sure, let’s “rightsize” the company by firing 20% of our workforce (but not management!) and raise prices 30%, and demand that the remaining employees maintain productivity at the level it used to be before we fucked things up. Oh and no raises for the plebs, we can’t afford it. Maybe a pizza party? One slice per employee though.
One of my coworkers, while waiting on hold for 3+ hours with our company’s outsourced helpdesk, noticed after booting into safe mode that the Crowdstrike update had triggered a snapshot that she was able to roll back to and get back on her laptop. So at least that’s a potential solution.
Testing in production will do that
Not everyone is fortunate enough to have a separate testing environment, you know? Manglement has to cut costs somewhere.
Manglement is a good term lmao
Annoyingly, my laptop seems to be working perfectly.
That’s the burden when you run Arch, right?
lol he said it’s working
He said it’s working annoyingly.
My favourite thing has been watching Sky News (UK) operate without graphics, trailers, adverts, or autocue. Back to basics.
CrowdStrike sent a corrupt file with a software update for Windows servers. This caused a blue screen of death on all the Windows servers globally for CrowdStrike clients. Even people in my company were hit. Luckily I shut off my computer at the end of the day and missed the update. It’s not an OTA fix; they have to go into every data center and manually fix all the servers. Some of these servers have encryption. I see a very big lawsuit coming…
Huh. I guess this explains why the monitor outside of my flight gate tonight started BSoD looping. And may also explain why my flight was delayed by an additional hour and a half…