Which archive file formats provide recovery protection against file corruption?


I use my external HDD to back up my files, by putting them into large archive files.

I have thousands of tiny files, and I put them into archives of 500MB to 4.2GB in size before sending them to the external HDD. But does a single disk error destroy the whole archive, or only one file in the archive? I fear that one flipped bit could render large parts of the archive useless.

Things like CRC checks can alert you to the existence of corruption, but I am more interested in the ability to recover the undamaged files from a corrupted archive. What archive file formats would provide the best ability to recover from such failures, either through the native design of the archive structure or the existence of supplementary recovery tools? Is there any difference in this capability between zip and iso files?

Best Answer

Given that damage to the directory part of any archive could potentially render the entire archive useless, your best bet would be to add a separate step to your backup process to generate so-called parity files. If a data block in the original file gets damaged, it can be reconstructed by combining data from the parity file with the valid blocks from the original file.
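To make the idea concrete, here is a minimal sketch of the simplest possible parity scheme: one XOR parity block computed over all data blocks. It is an illustration only, not how any particular tool works; real parity tools such as PAR2 use Reed-Solomon codes, which can rebuild several lost blocks. The names `BLOCK_SIZE`, `split_blocks`, `make_parity` and `recover_block` are made up for this example, and the sketch assumes you already know which block is bad (for instance from a per-block checksum).

```python
from functools import reduce

BLOCK_SIZE = 4  # hypothetical, kept tiny so the demo is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(blocks: list[bytes]) -> bytes:
    """The parity block is simply the XOR of every data block."""
    return xor_blocks(blocks)


def recover_block(blocks: list[bytes], lost_index: int, parity: bytes) -> bytes:
    """Rebuild one damaged block from the parity block and the intact blocks."""
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    return xor_blocks(survivors + [parity])


# Demo: damage one block, then rebuild it from the rest plus parity.
archive = b"thousands of tiny files"
blocks = split_blocks(archive)
parity = make_parity(blocks)

damaged = list(blocks)
damaged[2] = b"\xff" * BLOCK_SIZE  # simulate a corrupted block at a known position

restored = recover_block(damaged, 2, parity)
assert restored == blocks[2]
```

With this scheme, one extra block of parity lets you repair exactly one damaged block of the same size. Two damaged blocks would be unrecoverable, which is why practical tools use stronger codes.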

The variable here is how much damage you'd like to be able to repair. If you only want to protect against a single bit flip, a very small amount of parity data is enough. If you want to recover something on the order of a disk sector or more, it'll obviously cost you more space.

There's a lot of theory behind this (see Forward Error Correction), and it is widely used in practice. For example, this is how CDs can withstand a certain degree of scratching and how cell phones can maintain reasonable call quality over lossy connections.

Long story short, take a look at .par files (Parchive; the current version of the format is PAR2).
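As a rough sketch of how this could fit into a backup script, the snippet below builds command lines for the `par2` tool from par2cmdline. It assumes `par2` is installed and on your PATH; the helper names and the 10% redundancy default are arbitrary choices for illustration, so check `par2 --help` on your system before relying on them.

```python
import shutil
import subprocess


def par2_create_cmd(archive: str, redundancy_percent: int = 10) -> list[str]:
    """Command to create recovery files next to the archive.

    -r<n> sets the redundancy level as a percentage of the source size.
    """
    return ["par2", "create", f"-r{redundancy_percent}", f"{archive}.par2", archive]


def par2_verify_cmd(archive: str) -> list[str]:
    """Command to check the archive against its recovery files."""
    return ["par2", "verify", f"{archive}.par2"]


def par2_repair_cmd(archive: str) -> list[str]:
    """Command to attempt a repair using the recovery files."""
    return ["par2", "repair", f"{archive}.par2"]


def run(cmd: list[str]) -> int:
    """Run a par2 command and return its exit code (0 means success)."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("par2 was not found on PATH")
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    # Hypothetical file name; replace with one of your own archives.
    archive = "backup-001.zip"
    run(par2_create_cmd(archive))
    # Later, after copying to the external HDD:
    if run(par2_verify_cmd(archive)) != 0:
        run(par2_repair_cmd(archive))
```

Keep the generated recovery files alongside the archive on the backup disk. A higher redundancy percentage tolerates more damage at the cost of more space.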
