Which archival formats efficiently extract a single file from an archive?

archiving compression tar zip

Extracting a single file from a zip file is a fast operation, so I assumed the same would be true for TAR. But I learned that even though a TAR file is uncompressed, extracting one file can take a long time. I had used tar to back up my home folder on OS X, and I then needed a single file. Since tar has no index telling it where the file is, it had to scan the entire 300GB archive before it could extract it. This makes TAR a poor fit for most backup scenarios, so I'd like to know my options.
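To make the difference concrete, here is a minimal Python sketch using only the standard-library `tarfile` and `zipfile` modules. The helper names (`build_archives`, `tar_headers_read_before`, `zip_read_member`) and the 100 small test files are made up for illustration. It assumes an uncompressed tar held in memory: tar has to walk the member headers in order until it reaches the wanted name, while zip looks the name up in its central directory and jumps straight to the stored offset.

```python
import io
import tarfile
import zipfile


def build_archives(files):
    """Return (tar_bytes, zip_bytes) holding the same {name: bytes} members."""
    tar_buf, zip_buf = io.BytesIO(), io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    with zipfile.ZipFile(zip_buf, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return tar_buf.getvalue(), zip_buf.getvalue()


def tar_headers_read_before(tar_bytes, wanted):
    """Walk the tar in order; return (members passed before the hit, data)."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tf:
        for index, member in enumerate(tf):
            if member.name == wanted:
                return index, tf.extractfile(member).read()
    raise KeyError(wanted)


def zip_read_member(zip_bytes, wanted):
    """Look the member up in the central directory; return (offset, data)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        info = zf.getinfo(wanted)  # dictionary lookup, no scan of the members
        return info.header_offset, zf.read(info)


# Hypothetical test data: 100 tiny files, and we want the last one.
files = {f"file{i:03d}.txt": f"payload {i}".encode() for i in range(100)}
tar_bytes, zip_bytes = build_archives(files)

skipped, tar_data = tar_headers_read_before(tar_bytes, "file099.txt")
offset, zip_data = zip_read_member(zip_bytes, "file099.txt")
```

Here `skipped` counts the members tar had to pass before finding the file, while the zip lookup never touches the other members. On a seekable, uncompressed tar the reader can skip over each member's data and read only the headers, but it still has to visit every header up to the target; with a compressed tar, everything before the target has to be decompressed as well.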

So, which archival file formats are suitable for quickly extracting a single file?

Even though this question isn't really about compression, I don't mind answers listing formats that combine archiving and compression (like zip), in which case "solid compression" will matter.

Best Answer

It sounds like speed & efficiency of extraction are your main concerns, and I'm assuming you're using Linux or macOS and so want to preserve special file attributes (the ones zip & 7z ignore). In that case, an excellent archive format would be:

  • An ext[2/3/4] filesystem - Just copy the files somewhere, then extracting a single file is as quick & easy as mounting the filesystem & reading the file. You could put the whole archive filesystem inside a single archive file if you wish: just create a file big enough, format it & mount it (you don't even need the -o loop option anymore).

    Pros:

    • A nice bonus is you can easily add encryption (LUKS) to the whole archive file too, or any other encryption the filesystem supports (eCryptFS, EncFS, etc).

    • You can also use rsync-based backup solutions easily.

    • It's easy to add/delete files (up to the overall archive file's size).

    Cons:

    • If using a single archive file, you have to pick its size before adding files, and it doesn't change size dynamically.
    • It's still possible to expand or shrink the entire archive even if it's in a single file, but you need tools like resize2fs to shrink the filesystem, then truncate to shrink the file (or, to expand, truncate to grow the file first, then resize2fs to grow the filesystem).
  • The same filesystem you're already using, in case you're using macOS and it prefers something other than ext. I'm pretty sure macOS's mount command works with a single large archive file too.

If you also want some compression, that's usually where solid archives & slow reading come in. Some filesystems support compression directly (btrfs, reiserfs/reiser4, planned for ext?) but I'd just go with:

  • SquashFS - It might be the compression king: it saves file attributes and allows quick extraction of a single file (in fact, you can mount it & browse every file). It's great for archives too, and has adjustable compression levels, so use it.

    Or perhaps combine it with incremental backups & overlay mounts for a nice "partial backups but full files" solution.

    A con is that the image is read-only: you can't delete or change files in it, or shrink it. At most, mksquashfs can append new files to an existing image; anything else means rebuilding it.

    Or just use an existing backup product (Time Machine?).

If you really wanted to use an archive like 7z/zip anyway, but still keep the file attributes, you could tar each file individually (saving the attributes), then store the separate tar files in a 7z/zip archive. It takes an extra step and more hassle, but would let you easily extract a single (tar'd) file, and expand or shrink the archive without re-compressing everything (as long as it's not a solid archive).
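A rough sketch of that tar-inside-zip idea in Python, using only the standard library. The function names (`wrap_in_tar`, `build_zip_of_tars`, `read_one`), the `.tar` suffix on entry names, and the demo paths are all made up for illustration, and the demo assumes a POSIX system. One caveat: as far as I know, Python's `tarfile` records mode, owner and mtime but not extended attributes or ACLs, so if you need those you'd want to create the inner tars with your system's tar instead.

```python
import io
import os
import tarfile
import tempfile
import zipfile


def wrap_in_tar(path, arcname):
    """Return the bytes of a one-member tar holding `path` plus its metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tf:
        tf.add(path, arcname=arcname, recursive=False)
    return buf.getvalue()


def build_zip_of_tars(root, zip_path):
    """Store every regular file under `root` as its own small tar in a zip."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirs, filenames in os.walk(root):
            for filename in filenames:
                full = os.path.join(dirpath, filename)
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                zf.writestr(rel + ".tar", wrap_in_tar(full, rel))


def read_one(zip_path, rel):
    """Fetch one file: zip finds the entry, the inner tar yields metadata + data."""
    with zipfile.ZipFile(zip_path) as zf:
        inner = io.BytesIO(zf.read(rel + ".tar"))
    with tarfile.open(fileobj=inner, mode="r:") as tf:
        member = tf.getmembers()[0]
        return member, tf.extractfile(member).read()


# Demo with a throwaway directory tree (hypothetical paths).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "home")
    os.makedirs(os.path.join(src, "docs"))
    note = os.path.join(src, "docs", "note.txt")
    with open(note, "wb") as fh:
        fh.write(b"hello")
    os.chmod(note, 0o640)

    zip_path = os.path.join(tmp, "backup.zip")
    build_zip_of_tars(src, zip_path)

    member, data = read_one(zip_path, "docs/note.txt")
    restored_mode = member.mode & 0o777
```

Because each zip entry is compressed on its own, reading one file only decompresses that entry, and the permission bits come back from the inner tar header. The price is weaker compression than a solid archive, since nothing is shared between files.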