Saturday, February 12, 2011

The Perfect Storm: How a Combination of Random Events can lead to Disaster

No, this post isn't about that Wolfgang Petersen and George Clooney movie, but it does borrow the concept in a business sense.

This week we skirted close to disaster but the rest of the business was completely oblivious to the danger.

I wasn't worried; it was all under control. But my boss said to me later, "We nearly had the perfect storm today." I thought about his words and realised that he was right.

It made me wonder just how many times we, and other businesses, nearly have a perfect storm and don't even realise it.


What is a Perfect Storm?
In the book and movie of "The Perfect Storm", the storm wasn't a cyclone or any kind of normal severe storm event. It was just a normal storm in which several other conditions were perfect. A perfect storm is simply a combination of several unlikely events which results in disaster.


Our Experience
Factor 1: Backup
This week we've struggled a bit with backups. We had a faulty tape, which meant that we missed our backup job for one night. I replaced the tape for the next night and tried to run the backup again. It failed again, this time because the previous night's tape had left a bit of gunk in the drive. I cleaned the drive, but although the system was now ready for backup, we had two days' worth of "unsaved work".

Factor 2: Backup ISP and Server
Like many critical businesses, we have offsite redundancy. In our case, we have an offsite Domino server which is part of our cluster. Our offsite provider had told us a little while ago that they needed to switch ISPs and re-run our infrastructure. We were told a month ago that there was no pressing urgency. Of course, last week, on the night of our last successful backup, we were told that it was starting to become urgent.

You can imagine our surprise when we came into work on Thursday to find ourselves disconnected. Our main systems worked, the internet worked, everything was OK. It's just that our redundant server was no longer accessible.

Factor 3: Board Day
I work at one of those lucky companies that don't have board members in attendance every day. In fact, they're usually only around on a monthly basis. Guess what... the day of the storm was a board day.


A Lucky Escape
We were careful and we took precautions. Earlier in the week we'd had some work done on our main production databases, but on that day they were out of bounds.

As I mentioned earlier, nothing happened. The production servers all stayed up, we eventually got our offsite server back, and we got a backup that night. All was well, but you have to think...

If anything had gone wrong on the day, we had no backup and no offsite server. We were hosting one of the most critical meetings of the year and the most important people in our company were all onsite watching.

The question is: would you recognise the perfect storm if it started forming near your company?
