Monday, October 23, 2006

Why all the testing in the world can't protect you from everything

In light of my recent blog posts, I thought this was worth relating.

We have one very critical system running on the Domino server. Everything else could fail for a while without problems, but not this one system.

The system has had the most extensive testing possible in our environment, with several test periods of several weeks each. This may not sound like much, but it is a long time considering that the system isn't overly complex.

The way this system works is that requests are submitted over the internet for number allocations. These are processed internally and passed through a manual approval phase. Upon approval, there's a certain amount of time that must elapse before it is legal to use these numbers in production environments.
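
To make the dependency on the approval date concrete, here is a minimal sketch of the waiting-period rule in Python. It is not our actual Domino code; the 30-day period and the function names are assumptions for illustration only, since the real figures aren't relevant here.

```python
from datetime import datetime, timedelta

# Hypothetical waiting period; the real value is not stated in this post.
WAITING_PERIOD = timedelta(days=30)


def earliest_legal_use(approved_at: datetime) -> datetime:
    """Return the first moment the allocated numbers may go into production."""
    return approved_at + WAITING_PERIOD


def may_use(approved_at: datetime, now: datetime) -> bool:
    """True once the waiting period counted from the approval has elapsed."""
    return now >= earliest_legal_use(approved_at)
```

Everything downstream hangs off `approved_at`, so a wrong approval date quietly shifts the date on which the numbers become legal to use.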

Recently one of our people had a clock problem on their PC. They fixed it themselves (our policies don't restrict users from touching their PC clocks - though the clocks all re-synch at startup).

In fixing the problem, this user managed to change the date forward by one month and one day. All approvals done by that person on that day therefore had the wrong date.

Now, I know that this is fairly easy to fix, but:

  • What if they'd changed the date BACK instead of forward? (The problem files would be harder to identify.)

  • What if nobody had noticed?

I know that using stronger policies or stronger validation would avoid the problem. If we had these, I'd no doubt be writing about a different set of problems.
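
For what it's worth, the "stronger validation" I have in mind could look something like the sketch below. It assumes the server can see both the time reported by the approver's PC and its own clock; the function name and the five-minute tolerance are hypothetical, not something we run today.

```python
from datetime import datetime, timedelta

# Hypothetical tolerance for ordinary drift between a PC clock and the server.
MAX_CLOCK_SKEW = timedelta(minutes=5)


def check_approval_time(client_time: datetime,
                        server_time: datetime,
                        max_skew: timedelta = MAX_CLOCK_SKEW) -> datetime:
    """Reject an approval whose client clock disagrees with the server clock.

    Returns the server time, which is what gets stored as the approval date.
    """
    skew = abs(client_time - server_time)  # catches forward AND backward errors
    if skew > max_skew:
        raise ValueError(
            f"client clock differs from server clock by {skew}; approval refused"
        )
    return server_time
```

Because the check uses the absolute difference, a clock set back is caught just as readily as one set forward. Of course, this only moves the trust to the server clock, which is rather the point of this post.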

The point is that testing and user restrictions can only go so far... a good disaster recovery plan (DRP) is a must.
