Loadshedding trashed my server’s primary and backup data drives: This is how I recovered them using TestDisk
What a week! My car is misfiring on one cylinder, and this morning I discovered that both data drives on my home server had failed. When I plugged them into my desktop computer, they did not even show up as drives. Only when I viewed them in GParted (a partition manager) could I see that each one showed as unallocated, with no partition or filesystem. This was not good!
The first rule of data recovery, though, is “Don’t Panic”; the second is don’t write anything to the drive or try to format it. Often the actual data is still intact, because it is usually the partition table or boot sectors that have been damaged in some way.
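To make “the partition table is damaged but the data is still there” a little more concrete: a 4 TB drive most likely uses a GUID Partition Table (GPT), which keeps a protective MBR in the first sector (ending in the boot signature bytes 0x55 0xAA) and a GPT header beginning with the text “EFI PART” in the second sector. The Python sketch below only reads those two sectors, in keeping with the “don’t write anything” rule. It is not how TestDisk works internally, and the function name inspect_partition_signatures is hypothetical. It assumes 512-byte logical sectors by default; some USB-SATA bridges report 4096-byte sectors instead, so the sector size is a parameter.

```python
from typing import BinaryIO, Dict

MBR_BOOT_SIGNATURE = b"\x55\xaa"    # last two bytes of the 512-byte MBR
GPT_HEADER_SIGNATURE = b"EFI PART"  # first 8 bytes of the primary GPT header
GPT_PROTECTIVE_TYPE = 0xEE          # partition type byte of a GPT protective MBR entry


def inspect_partition_signatures(disk: BinaryIO, sector_size: int = 512) -> Dict[str, bool]:
    """Report which partition-table signatures are present.

    Hypothetical helper: it reads the first two logical sectors and never
    writes. `disk` should be opened in binary read-only mode ("rb").
    """
    disk.seek(0)
    lba0 = disk.read(sector_size)  # sector 0: (protective) MBR
    lba1 = disk.read(sector_size)  # sector 1: primary GPT header, if any

    has_mbr = len(lba0) >= 512 and lba0[510:512] == MBR_BOOT_SIGNATURE
    # The first of the four MBR partition entries starts at byte 446;
    # its partition type byte sits at offset 4 within that entry.
    protective_mbr = has_mbr and lba0[446 + 4] == GPT_PROTECTIVE_TYPE
    has_gpt_header = lba1[:8] == GPT_HEADER_SIGNATURE

    return {
        "mbr_signature": has_mbr,
        "gpt_protective_mbr": protective_mbr,
        "gpt_header": has_gpt_header,
    }


# Example (read-only; /dev/sdX is a placeholder, and reading a raw
# device usually needs root):
#   with open("/dev/sdX", "rb") as disk:
#       print(inspect_partition_signatures(disk))
```

If all three checks come back False on a drive that used to work, the partition table is gone but the files may well still be there, which is exactly the situation a tool like TestDisk is built for. A valid MBR signature with a missing GPT header can also just mean the sector-size assumption is wrong, so it is worth retrying with 4096.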
My home server boots off an SSD and has two 4 TB USB drives connected. The first USB drive holds the primary data, including the container volume data (the config and working data for the Docker containers that run off the boot drive), and the second holds a daily rsync copy of everything on the primary drive.
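For anyone wondering what a daily rsync of one drive to another looks like, here is a minimal Python sketch that builds and runs the command. It is not necessarily the exact job this server runs: the mount points /srv/primary and /srv/backup and the helper names build_mirror_command and run_mirror are hypothetical. The options used (--archive, --delete and --dry-run) are standard rsync options.

```python
import subprocess
from typing import List


def build_mirror_command(source: str, destination: str,
                         delete: bool = True, dry_run: bool = False) -> List[str]:
    """Build an rsync command that mirrors `source` into `destination`.

    Hypothetical helper; the paths passed in are placeholders.
    """
    # A trailing slash on the source tells rsync to copy the directory's
    # contents, rather than the directory itself, into the destination.
    src = source.rstrip("/") + "/"
    cmd = ["rsync", "--archive"]  # recursive; preserves permissions, times, symlinks, etc.
    if delete:
        cmd.append("--delete")  # remove files from the backup that were deleted on the primary
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying anything
    cmd += [src, destination]
    return cmd


def run_mirror(source: str, destination: str, dry_run: bool = False) -> None:
    """Run the mirror; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_mirror_command(source, destination, dry_run=dry_run), check=True)


# Example with placeholder mount points (try dry_run=True first):
#   run_mirror("/srv/primary", "/srv/backup", dry_run=True)
```

The --delete option keeps the backup an exact mirror, but it also means an accidental deletion on the primary is copied across on the next run, so a mirror like this is not a versioned backup.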
So I made a cup of coffee while my heart rate returned to normal and thought about the problem. At first I suspected a hack: why else would both drives fail together? But the boot drive was fine and the logs showed nothing suspicious. The server also sits behind Cloudflare, a home router firewall and Nginx Proxy Manager, and runs Fail2Ban, so I began to doubt that theory.
The logs showed failures from around 00:30, and I recalled that grid power had come back on at around 00:10 (this is South Africa), possibly with some sort of power spike. At first I thought this was unlikely, as everything runs off a solar inverter, which should have filtered out anything on the supply side. The other computers were all off at the time, and the system has survived hundreds of these power cycles over the last few years without any issues at all.
Since neither drive showed any sign of life when connected to my desktop PC, I dug out another USB-SATA adaptor and connected it to one of the drives. Now I could hear it spinning, although GParted still showed the unallocated partition and no filesystem. That was good news, as at least the drive still spun.
Next I tried TestDisk. Its first job is to search the drive for lost partitions, which it found, and it then let me browse the files on them. That was a huge relief, as the files showed up. Among its many other functions, TestDisk can then rebuild the partition table, and once it had done so, the drive appeared in my file manager and I could view and copy the files. TestDisk may look intimidating, but it is wizard-driven, with suggestions and advice at each step, so you really just need to read each screen carefully.
However, refitting the original USB-SATA adaptor seemed to trash the partition table once again. So, after another recovery, I used the spare USB-SATA adaptor I still had and put the primary data drive back in the server, where it is now working perfectly.
Then it was off to the shop to buy two new drive enclosures (each with a built-in USB-SATA adaptor). I fitted the backup drive into one of them, and it is now visible too. OpenMediaVault complained about the changed drive ID (presumably because of the different SATA controller), but it settled down and I could run a fresh backup to the drive.
In conclusion, it seems both USB-SATA adaptors were trashed, probably by some form of power surge that came through the USB ports (although the SATA boot SSD was fine). So I’ve dusted off a disused APC UPS from the cupboard and will run the server off it from now on, for some extra cushioning.
I may also consider running an additional backup, possibly over the Ethernet network at night, so that the backup storage is better separated from the server.
TestDisk is open-source software, licensed under the terms of the GNU General Public License (GPL v2+). It is powerful, free data recovery software, primarily designed to help recover lost partitions and/or make non-booting disks bootable again when these symptoms are caused by faulty software, certain types of viruses or human error (such as accidentally deleting a partition table).
See https://www.cgsecurity.org/wiki/TestDisk
#Blog, #backups, #dataloss, #opensource, #technology, #TestDisk