Redundancy, Redundancy, Redundancy

Michael P. Mar 21

If your business, or any part of it, relies on being online, you need a backup and disaster recovery plan that is both in place and tested. It doesn't have to be expensive or complicated, at least not at first. Of course, the more critical your site, application, or server is to your business, the more elaborate your redundancy needs to be. (This related post discusses how to estimate how much you should be willing to spend on redundancy.)
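
To give a sense of what "not expensive or complicated" can mean, here is a minimal sketch of a nightly backup script: it archives a site directory with a timestamp and keeps the last couple of weeks of copies. The paths and retention count are assumptions for illustration only, not what we actually used, and in practice you'd also want to copy the archives off the server and periodically restore from them to prove the plan really works.

```python
#!/usr/bin/env python3
"""Minimal nightly backup sketch: archive a site directory, keep the last N copies.
Paths and retention below are illustrative assumptions, not from the original post."""
import tarfile
from datetime import datetime
from pathlib import Path

SITE_DIR = Path("/var/www/site")        # hypothetical web root to back up
BACKUP_DIR = Path("/var/backups/site")  # hypothetical destination (ideally a separate disk or host)
KEEP = 14                               # retain roughly two weeks of nightly archives

def make_backup() -> Path:
    """Write a timestamped .tar.gz of the site directory and return its path."""
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = BACKUP_DIR / f"site-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(SITE_DIR, arcname=SITE_DIR.name)
    return archive

def prune_old_backups() -> None:
    """Delete all but the newest KEEP archives."""
    archives = sorted(BACKUP_DIR.glob("site-*.tar.gz"))
    for old in archives[:-KEEP]:
        old.unlink()

if __name__ == "__main__":
    path = make_backup()
    prune_old_backups()
    print(f"Backup written to {path}")
```

Run from cron (or any scheduler), that's a few minutes of setup, far less than the two weeks of downtime described below.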

I learned this the hard way back in 2004, at the cost of a business. Four years earlier I had helped start a local online real estate advertising site, and it was doing very well: over 1,000 visitors a day and more than $100,000 a year in revenue, more than $2,000 a week. We had just replaced the old "static" site with a new high-tech, form-based one that made adding new properties much simpler. But due to other bad decisions, unrelated to this article, we'd chosen a technology that forced us to move our hosting from a shared environment, where someone else managed the server, to a full-blown self-hosting arrangement.

There was a team of 3 involved in developing and running the site, and we knew enough to know we needed backups, so we got a tape drive for the server and worked out which programs we needed to back everything up. But in the midst of the site and server transition, things were changing so fast that we hadn't yet set the backups up to actually run.

In February, the dust was starting to settle, and we held our annual planning meeting to set priorities for the coming year. I felt strongly that one of the first and highest priorities coming out of that meeting needed to be finishing and testing the backup system, and I assigned that task to our lead programmer / system admin. We all discussed it, agreed, and moved on.

To my great discredit, I never followed up on that assignment. Less than a month later, our server was hacked and taken over by a Russian spammer. At first I wasn’t concerned, because “we have our backups, right? We can recover quickly.” Our designated system admin informed me, sheepishly, that no, the backups weren’t running because he’d run into an obstacle and hadn’t gotten around to finishing it yet. So began a 2-week nightmare.

The site was totally down and inaccessible for those 2 weeks, as we worked literally around the clock to rebuild the server configuration and recover files as best we could. During that time there was *no* new business (the signup process lived on our website), and customers who had paid to be advertised on the site were furious; some demanded at least partial refunds. With good management we might have recovered fully, but other events set in motion by those 2 weeks offline gradually killed the entire business. All for lack of a backup.

And I wasn't the only one; in 2009, another high-profile backup failure nearly killed the social content site ma.gnolia.com.
