CrowdStrike’s unholy cluster is terrible news for CISOs
AWS EC2 Windows instances also borked, with CrowdStrike's manual mitigation not working. Guidance available but...
Global IT outages attributed to a borked CrowdStrike driver update are horrendous news not just for the endpoint detection and response (EDR) company and its customers, but also for CISOs and security functions.
Not just for the obvious reason (a security software agent on user machines appears to have done incredible damage) but because organisations may have a reduced appetite for security tooling in future.
CrowdStrike driver blamed: paging QA?
The immediate question in the wake of the incident (which has had sweeping downstream impact) is about CrowdStrike’s QA – the issue comes just three weeks after another update from the company maxed out end-users’ CPUs, also triggering widespread customer outages.
Security researcher Kevin Beaumont obtained a copy of the CrowdStrike driver pushed to customers via automatic update, saying: “I don't know how it happened, but the file isn't a validly formatted driver.”
This suggests (surely) that what CrowdStrike tested is not what eventually got pushed live; in other words, real issues with its deployment infrastructure.
Cloud users, not just physical Windows endpoints, were also widely affected: AWS EC2 Windows instances, Windows WorkSpaces and AppStream applications running CrowdStrike’s agent failed, and CrowdStrike’s manual mitigation did not work for such users. (AWS has its guidance here.) AWS’s own services and network connectivity were not affected by the event and continued to operate normally, we have learned.
Azure virtual machines running the CrowdStrike Falcon agent also got stuck in a reboot loop. Microsoft's guidance is here.
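For reference, CrowdStrike’s published workaround for individual hosts was to boot into Safe Mode or the Windows Recovery Environment, delete the defective channel file, and reboot. Below is a minimal, illustrative sketch of that file operation in Python; it assumes Windows is mounted at C:\ and that you have an elevated prompt with Python available (in practice most admins did this by hand or with a simple batch script).

```python
# Sketch of the widely reported manual workaround: from Safe Mode or the
# Windows Recovery Environment, remove the defective channel file, then reboot.
# Path and filename pattern are from CrowdStrike's public guidance at the time;
# adjust the drive letter if the Windows volume is not mounted as C:\.
from pathlib import Path

CROWDSTRIKE_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
BAD_CHANNEL_FILE_PATTERN = "C-00000291*.sys"  # the defective channel file

def remove_bad_channel_files(directory: Path = CROWDSTRIKE_DIR) -> list[Path]:
    """Delete channel files matching the defective pattern; return what was removed."""
    removed = []
    for channel_file in directory.glob(BAD_CHANNEL_FILE_PATTERN):
        channel_file.unlink()  # requires administrator rights
        removed.append(channel_file)
    return removed

if __name__ == "__main__":
    deleted = remove_bad_channel_files()
    print(f"Removed {len(deleted)} file(s): {[str(p) for p in deleted]}")
    # Reboot normally afterwards so the Falcon sensor can fetch a corrected file.
```

Note that this only helps once a human (or remote hands) can reach a command prompt on the affected machine; hosts encrypted with BitLocker also needed recovery keys before the file could be touched.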
As one customer posted: “70% of our laptops are down and stuck in boot, HQ from Japan ordered a company wide shutdown, someone's getting fireblasted for this shit” – unfortunately in too many places, security functions are going to bear the brunt of that “fireblasting.”
The founder of Thinkst and creator of Canarytokens, Haroon Meer, noted gently: “This is going to remind a bunch of cybersecurity startups that adding an agent to users’ machines is a crazy big responsibility…”
Longer-term damage: not just for CrowdStrike
The recovery, as one commenter noted, will “leave a lot of scars.”
CISOs have fought a tough, uphill battle over the past decade to persuade organisations that cybersecurity is not just a cost centre and that its controls are not just frustrating blockers for developers or executives.
A close read of this week’s legal decision in the SolarWinds case, for example, shows a CISO persistently flagging egregious security issues but seemingly unable to drive meaningful change, or to persuade the firm that the risk it was taking by not fixing them would have severe impact.
MFA, one of the most rudimentary security controls, is often fiercely resisted at many levels of organisations because of the friction it causes.
CrowdStrike giving ammunition to those resistant to EDR or other security tooling will help nobody in the industry, other than salespeople no doubt already preparing their “robust alternative to CrowdStrike” pitches.
The incident also raises questions about the risk of automatic software updates at such scale. Disabling them may cause failures in some security audits and challenges with cyber insurance policies, but the risk of pushing updates without first testing them on a small ring of machines is significant. (You can join that debate here.)
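To make the “small test ring” argument concrete, here is an illustrative sketch of a staged-rollout gate. The ring names, fractions and health check are entirely hypothetical; this is not a description of CrowdStrike’s actual update pipeline or anyone else’s real deployment tooling.

```python
# Illustrative only: a generic staged-rollout ("test ring") gate. Each ring's
# health gates the next push, so a defective update stops at the first ring
# instead of reaching the whole fleet at once.
from typing import Callable, Sequence

RINGS = [("canary", 0.01), ("early adopters", 0.10), ("everyone", 1.00)]

def staged_rollout(
    hosts: Sequence[str],
    push_update: Callable[[Sequence[str]], None],
    ring_is_healthy: Callable[[Sequence[str]], bool],
) -> None:
    """Push an update ring by ring, halting the moment a ring looks unhealthy."""
    pushed: set[str] = set()
    for name, fraction in RINGS:
        cutoff = max(1, int(len(hosts) * fraction))
        targets = [h for h in hosts[:cutoff] if h not in pushed]
        push_update(targets)
        pushed.update(targets)
        if not ring_is_healthy(targets):
            raise RuntimeError(f"Rollout halted at ring '{name}'; {len(pushed)} hosts affected")
```

The design point is simply that blast radius is bounded by the size of the first ring that fails its health check, rather than by the size of the customer base.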
CrowdStrike counts 62 of the Fortune 100 among its customers and has been growing rapidly – it reported $3.65 billion in ARR last month, up 33% year over year (68% of its customers are in the US). Some of its customers have its agent installed on tens, even hundreds of thousands, of endpoints.
CEO George Kurtz said, with no hint of apology: “CrowdStrike is actively working with customers impacted by a defect found in a single content update for Windows hosts. Mac and Linux hosts are not impacted.
“This is not a security incident or cyberattack. The issue has been identified, isolated and a fix has been deployed,” he said.
“We refer customers to the support portal for the latest updates and will continue to provide complete and continuous updates on our website. We further recommend organizations ensure they’re communicating with CrowdStrike representatives through official channels. Our team is fully mobilized to ensure the security and stability of CrowdStrike customers.”
Spare a moment, meanwhile, for those at the coalface of recovery.
"Fixing this CrowdStrike issue will require basically a human visit to every machine. Some of the machines will not be able to get into the recovery environment, and require a USB stick boot. Centrally fixing this is not possible it happens before anything loads" suggested the security expert who posts as @swiftonsecurity. "This is a real Black Swan Event, though it has happened on smaller scales before and is a known risk of doing things with kernel drivers."