My home Linux server with all my files had a hard drive failure, but I happily came through mostly unscathed. Your hard drive is going to fail someday too. Backups and a rescue plan will save you; maybe these notes will help.

My computer started acting weird: empty web page loads, ssh failing to connect, random permission denied errors on file access, etc. I couldn't figure it out until I saw the dreaded error in my syslog: "hda: drive not ready for command". The moment you suspect drive hardware failure, turn off the drive. You may still have some hope of recovering it.
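
If you'd rather go looking for that message than stumble on it, here's a minimal sketch of a log check. The first pattern is the error I actually saw; the "I/O error" pattern is an assumption about what other kernel disk complaints might look like, and scan_disk_errors is just a name made up for illustration.

```shell
# Print log lines that look like drive trouble.
# $1 = log file to search, e.g. /var/log/syslog
scan_disk_errors() {
    # "drive not ready for command" is the message from this post;
    # "I/O error" is an assumed extra pattern, adjust to taste.
    grep -E -i 'drive not ready for command|I/O error' "$1"
}
```

Running scan_disk_errors /var/log/syslog prints any matching lines; grep exits non-zero when nothing matches, so it also works in an if test.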

After a two-day wait for a new drive, I unplugged all my old disks and reinstalled Debian. Then I plugged in the old drives one by one to copy the files off via rsync -a. I left the known bad drive for last. Once one drive fails I don't trust any of the others of the same age.
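
The copy step looks roughly like this. It's a sketch, not a transcript of what I typed: copy_off_drive is a made-up helper name and both paths are placeholders for wherever the old drive is mounted and wherever you want the files to land.

```shell
# Copy everything from an already-mounted old drive into a directory
# on the new disk.
# $1 = mount point of the old drive (hypothetical, e.g. /mnt/old)
# $2 = destination directory on the new disk (hypothetical)
copy_off_drive() {
    mkdir -p "$2" &&
    # -a (archive) recurses and keeps permissions, timestamps and symlinks;
    # ownership is preserved too when run as root.
    # The trailing slashes copy the contents of $1 rather than $1 itself.
    rsync -a "${1%/}/" "${2%/}/"
}
```

Mounting the old drive read-only first (mount -o ro, with whatever device name your system gives it) keeps a flaky disk from being written to while you copy.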

I'd been faithfully running rsnapshot and so had a backup of all my most important files on a second disk. Unfortunately I wasn't backing up everything, leaving me to wonder what I might have lost. rsnapshot on Debian backs up /home, /usr/local, and /etc by default. That seems like it'd be enough, but it leaves out /var, which it turns out holds important things like MySQL databases and Apache logfiles. I'd also manually excluded my MP3 collection from the backup and I sure as heck wasn't going to re-rip 60 gigs of CDs.

So after getting most everything back from the backup and the other disks, I set out to recover the last bits of files from the dead drive. I guessed it was a mechanical failure so I tried freezing the drive first, but the motherboard wouldn't even recognize the disk. Then I swapped hard drive logic boards; fortunately I had two identical disks. The Frankensteined disk worked and I copied all my old files off. If the disk had had a more pernicious failure I would have tried GNU ddrescue to make a copy of the disk image while skipping around the bad parts of the drive.
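
I never had to run it, so take this as a sketch of the usual two-pass ddrescue pattern rather than something battle-tested here. rescue_image is a made-up helper name and all three arguments are placeholders.

```shell
# Image a failing disk with GNU ddrescue in two passes.
# $1 = failing disk (hypothetical, e.g. a device node)
# $2 = image file to write, on a healthy disk with enough free space
# $3 = map file where ddrescue records which areas it has recovered
rescue_image() {
    # Pass 1: copy the readable areas quickly and skip the slow
    # scraping of damaged areas (-n).
    ddrescue -n "$1" "$2" "$3" &&
    # Pass 2: return to the areas marked bad in the map file and
    # retry them up to three times (-r3).
    ddrescue -r3 "$1" "$2" "$3"
}
```

The map file is what makes this safe to interrupt: rerunning with the same three arguments picks up where the last run stopped instead of starting over.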

So now I've got all my files back but I still have a mess. Since I wasn't backing up the Linux install itself, I've lost track of what packages I had installed before. And while I have my /etc backup I still have to manually reconfigure all the software. So going forward I plan to follow jwz's advice and get a second disk that's a full mirror via rsync.
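
Two small helpers would have saved me here. This is a sketch under assumptions: the function names and paths are made up, and the rsync flags are my guess at a sensible mirror, not a quote of anyone's recommended command.

```shell
# Save the list of installed Debian packages somewhere that gets backed up.
# $1 = output file (hypothetical path)
save_package_list() {
    dpkg --get-selections > "$1"
}

# Mirror one filesystem onto a second disk.
# $1 = source root (e.g. /), $2 = mount point of the mirror disk (hypothetical)
mirror_disk() {
    # -a keeps permissions and timestamps, -x stays on one filesystem so
    # /proc and the mirror itself are not copied, --delete removes files
    # from the mirror that no longer exist in the source.
    rsync -ax --delete "${1%/}/" "${2%/}/"
}
```

On a fresh install the saved list can be fed back with dpkg --set-selections followed by apt-get dselect-upgrade. Note that --delete makes the mirror an exact copy, mistakes included, so it complements a versioned backup rather than replacing it.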

What about RAID? No. RAID 1 mirroring is the only sensible thing for a home setup and even that is bad. What if your RAID controller fails? What if a disk fails and you don't get notified? What if you do a rogue rm -rf / and your RAID controller faithfully deletes files on both disks? A versioned, time-delayed backup like the one rsnapshot provides is way more valuable.
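
To make "versioned" concrete, here's a minimal sketch of the hard-link snapshot idea using plain rsync. It is not rsnapshot's actual implementation, and snapshot, the directory layout, and the "latest" symlink are all made up for illustration.

```shell
# Take a named snapshot of a directory, hard-linking unchanged files
# to the previous snapshot so each snapshot only costs the changed data.
# $1 = source directory, $2 = snapshot root, $3 = snapshot name (all hypothetical)
snapshot() {
    mkdir -p "$2" || return 1
    # Use an absolute path: rsync resolves a relative --link-dest
    # against the destination directory, not the current one.
    root=$(cd "$2" && pwd) || return 1
    if [ -d "$root/latest" ]; then
        rsync -a --link-dest="$root/latest" "${1%/}/" "$root/$3/"
    else
        rsync -a "${1%/}/" "$root/$3/"
    fi &&
    # Point "latest" at the snapshot just taken.
    ln -sfn "$3" "$root/latest"
}
```

A file deleted from the source after Monday's snapshot is simply absent from Tuesday's; Monday's copy is untouched, which is exactly what a RAID mirror can't give you.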

Your hard drive will fail; it's just a matter of time. Do you have a backup and recovery plan?

tech
  2007-10-21 18:26 Z