Preparing for 'black swan' IT events


By Leon Adato, Head Geek, SolarWinds
Monday, 14 September, 2015

About a year ago, well-known US statistician Nate Silver famously got it wrong. Really, really wrong. While known for his ability to adeptly predict everything from elections to baseball finals, Silver was completely thrown by Germany’s win over Brazil in the World Cup. As he described it, the result was completely unforeseen and unforeseeable — a ‘black swan’ event.

The tendency in the face of these things is to do as Nate did, and focus on what went wrong with the prediction rather than what caused the event.

In business, when the unforeseen occurs, management often develops a dark obsession with post-analysis. Meetings are called under the guise of ‘lessons learned’ exercises, with the express intent of ensuring ‘this’ never happens again. Time is spent not on figuring out what went wrong, but on why the supposedly informed prediction failed.

To be clear, I’m not saying that after a failure, a business should just blithely ignore any lessons that can be learned. Far from it. But what Nate’s observation and other black swan events teach us is that one of the first things an organisation should do is determine whether the failure was predictable in the first place. If it wasn’t, your efforts and post-analysis are much better spent elsewhere.

There’s little doubt that in the face of black swan events there is a natural urge to protect ourselves, to ensure this kind of impact on our business can never again occur.

But I’m here to tell you that that urge is a waste of time and valuable resources. Don’t believe me? Let’s take a not-so-imaginary case of a company that had a single, spectacular failure that cost it $100,000. Management immediately sets up a task force to identify the root cause of the failure and recommend steps to avoid it in the future. It takes five people more than 100 hours each to investigate the trigger. Let’s be conservative and say that the cost is $50 per hour times five people times 100 hours: a total of $25,000. And let’s be completely optimistic and say that at the end of the effort, the problem is not only identified but code is in place to predict the next one. The company has expended $25,000 to devise a solution which may (or may not) predict the occurrence of a black swan exactly like the one that hit before.
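For readers who like to see the sums, the arithmetic above can be sketched in a few lines of Python. The figures come straight from the example; the variable and function names are purely illustrative.

```python
# Figures from the example above; all names here are illustrative.
HOURLY_RATE = 50         # dollars per person-hour
TEAM_SIZE = 5            # people on the task force
HOURS_EACH = 100         # hours each person spends investigating
FAILURE_COST = 100_000   # cost of the original one-off failure


def investigation_cost(rate, people, hours):
    """Total cost of the investigation in dollars."""
    return rate * people * hours


cost = investigation_cost(HOURLY_RATE, TEAM_SIZE, HOURS_EACH)
share = cost / FAILURE_COST

print(f"Investigation cost: ${cost:,}")
print(f"Share of the original loss: {share:.0%}")
```

In other words, the task force adds roughly a quarter of the original loss on top of it, with no guarantee that the same event ever recurs.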

Compare that to a fairly common problem: disk failures. Drives fill up, or throw errors until they are unreadable, or just completely stop. But at this not-quite-fictitious company, there was no alerting for this. Disk space was monitored, but not alerted on. Alerting on disks which stopped responding or disappeared was simply not done.

A fairly simple set of alerts could save a moderately sized company as much as $140,000 per year. And disk failures are no black swan. Even Nate Silver would agree they are a sure thing.
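To give a sense of how simple such alerts can be, here is a minimal sketch using only Python's standard library. It is not how any particular monitoring product works: the thresholds, the function names and the list of paths are all assumptions chosen for illustration, and a real deployment would send the result to a pager or ticketing system rather than print it.

```python
import os
import shutil

# Assumed thresholds, chosen for illustration only.
WARN_PCT = 80.0
CRIT_PCT = 90.0


def classify(used_pct, warn_pct=WARN_PCT, crit_pct=CRIT_PCT):
    """Map a percentage of disk space used to an alert level."""
    if used_pct >= crit_pct:
        return "CRITICAL"
    if used_pct >= warn_pct:
        return "WARNING"
    return "OK"


def check_disk(path):
    """Return (level, message) for one path or mount point."""
    # A disk that has disappeared is itself worth an alert.
    if not os.path.exists(path):
        return ("CRITICAL", f"{path}: missing or not responding")
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return ("CRITICAL", f"{path}: unreadable ({exc})")
    if usage.total == 0:
        return ("WARNING", f"{path}: reports zero capacity")
    used_pct = 100.0 * usage.used / usage.total
    return (classify(used_pct), f"{path}: {used_pct:.1f}% used")


if __name__ == "__main__":
    # Hypothetical list of paths to watch; adjust for the host.
    for mount in [os.path.abspath(os.sep)]:
        level, message = check_disk(mount)
        if level != "OK":
            print(f"[{level}] {message}")
```

The point of the sketch is that it covers both cases described above, a disk filling up and a disk that has stopped responding or vanished, in a few dozen lines.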

Leon Adato is a Head Geek and technical evangelist at SolarWinds, and is a Cisco Certified Network Associate (CCNA), MCSE and SolarWinds Certified Professional. His career includes key roles at Rockwell Automation, Nestlé, PNC and Cardinal Health, providing server standardisation, support and network management and monitoring.
