From a single update to global chaos: lessons from the CrowdStrike outage

Tenable APAC

By Nathan Wenzler*
Tuesday, 13 August, 2024

The recent global outage attributed to CrowdStrike has profoundly impacted the IT world, being labelled by some media outlets as the largest IT service disruption in history. This incident led to mass flight cancellations, forced numerous businesses to close and caused 8.5 million devices to experience the ‘Blue Screen of Death’, resulting in catastrophic consequences and costing Fortune 500 companies an estimated AU$8.1 billion due to a single faulty update.

In the wake of this colossal failure, IT leaders must examine the lessons learned and develop strategies to prevent similar incidents.

The event underscores the monumental impact a single faulty update can have globally. Unlike a targeted cyber attack, this incident was a mistake, and it highlights the weaknesses that can compromise even the most trusted IT environments. The necessity for rigorous testing and validation processes is evident. Many organisations are still dealing with the fallout, and without specific details on the root cause, most leaders are waiting for more information before making significant policy changes. Even so, the incident points to a need for more rigorous acceptance testing of updates before deployment, though the exact parameters for this testing remain undefined. Currently, recovery takes priority over strategy revision.

Senior executives and board members have largely remained silent on cybersecurity risks in the immediate aftermath, focusing instead on complete recovery before addressing broader IT operational risks and failures. This highlights the need for a proactive stance in the face of uncertainty. The incident has sparked debate over the efficacy of automated security updates, with future actions dependent on CrowdStrike’s response and process changes. If these measures are deemed inadequate, organisations might adopt their own testing protocols, which could be time-consuming and costly. A shift away from automatic updates is likely as organisations seek to avoid repeating this situation.

The outage also emphasised the need for robust insurance covering downtime and recovery costs. Cyber insurance is essential for businesses, providing financial protection against incidents like data breaches and ransomware attacks. Proactive risk management is just as important. Companies must assess their insurance coverage, understand exclusions and limits, and work with experienced brokers to select the right policies.

Addressing potential liabilities, such as contractual and regulatory obligations, can mitigate legal and financial risks. A recent poll from Tenable found that 44% of cybersecurity leaders saw insurance premiums drop by 5–15% after implementing proactive risk management strategies, highlighting the financial and broader benefits of preventive cybersecurity practices.

This incident also raises questions about new regulations that may be issued in response to this global outage. Ensuring rigorous testing and quality assurance before deployment is a longstanding best practice in cybersecurity. However, weighing the historical reliability of automatic updates against the cost of thorough testing is a risk decision each organisation must make for itself, which makes a one-size-fits-all regulatory approach unlikely.

The CrowdStrike outage is a stark reminder of the weaknesses that can lurk even within the most reputable security firms. For IT leaders, the path forward involves rigorous testing, enhanced monitoring, redundancy, accountability, collaboration and adherence to regulatory standards. By learning from this incident, organisations can build resilient IT infrastructures that are better equipped to handle future challenges, ensuring a more secure technological environment.

*As the Chief Security Strategist for Tenable, Nathan brings his expertise in vulnerability management and cyber exposure to executives and security professionals around the globe in order to help them mature their security strategy, understand their cyber risk and measurably improve their overall security posture. Nathan has over two decades of experience designing, implementing and managing technical and non-technical security solutions for IT and information security organisations within both the public and private sectors.