Everyone talks about "backing up your data," and of course that's a good thing. In fact, not backing up can be considered corporate negligence, because it takes very little time for your data to exceed the value of whatever holds it. Replacing a hard drive is cheap and easy. Replacing a thousand iTunes songs is neither.
There's another side to the backup process, called business resumption or data restoration. It's what you do after something bad happens.
Yesterday was a good example of something simple turning into something ugly.
About a week ago we had a technical hiccup on the server. We ran some quick checks and everything seemed fine; we were busy, and nobody was complaining. At about 2 a.m. yesterday, one of our servers auto-updated some of its software. The update overwrote or reset something that caused certain services to stop. For example, the web server is a service: when you ask for a web page, it listens for the request and responds over HTTP. We tried to restart the services, but they would not behave.
Without getting into the details, we asked the data center for help, and within a few hours the server was back, but our data wasn't. What we didn't know was that the hiccup a week earlier was related to our backup system. When we went to restore our data, the backups turned out to be corrupt: we had been backing up garbage.
Restoring when your recent backups are bad is like going back in time. Ever written a long post and then had your browser crash? Now imagine a post you'd been writing for five days.
So, the morals of the story:
1. Don't ignore hardware/software warning messages. This is as true with computers as it is in life.
2. Make sure your backups actually contain data that will spare you the time travel (i.e., run a test restore to make sure they work).
3. Have a written business resumption process so that when something does happen, you have a reliable checklist that anyone can follow.
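For anyone who wants something concrete for point 2, here is a minimal sketch in Python, assuming your backup tool can restore its contents as plain files into a separate scratch folder. It compares every file in the live folder against the restored copy by SHA-256 checksum and reports anything missing or different. The names `verify_restore` and `sha256_of` are hypothetical, not part of any backup product; adapt the idea to whatever software you actually use.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(source_dir: str, restored_dir: str) -> list[str]:
    """Return problems found when comparing a restored backup to the live data.

    Hypothetical helper: assumes the backup has already been restored as
    plain files into restored_dir. An empty list means every file under
    source_dir came back byte-for-byte identical.
    """
    src = Path(source_dir)
    dst = Path(restored_dir)
    problems = []
    # Walk every regular file in the live data, in a stable order.
    for file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = file.relative_to(src)
        restored = dst / rel
        if not restored.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(file) != sha256_of(restored):
            problems.append(f"differs: {rel.as_posix()}")
    return problems


if __name__ == "__main__":
    import sys

    issues = verify_restore(sys.argv[1], sys.argv[2])
    for line in issues:
        print(line)
    print("backup OK" if not issues else f"{len(issues)} problem(s) found")
```

A checksum match only proves the restored bytes are the same as the live ones. Files that changed after the backup ran will also show up as `differs`, so run the check right after a backup or against a snapshot, and remember it can't tell you whether, say, a database dump will actually load. An occasional full restore onto a spare machine is still the real test.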
PS. I realize most people don't run their own servers, so replace "server" with "laptop" or whatever computer holds your important data. Everything above applies to any device whose data is worth more than the device itself.