Windows – Decompressing these .gz files gives strange results; can you get it working?


I have a collection of mailing list archive files, all gzipped. They sit in a nested directory structure that starts with what appears to be a blank or jargon-like folder name.

The files are here:

The header of each file looks like this:


I've tried 7-Zip, WinRAR, and gzip from the command line on Windows 7.

I also tried gzip on OS X, with the same results. Am I missing something obvious? I haven't been able to rebuild the directory structure; the result appears to be a merge of the directory structure and the file.

If you get it working, please let me know:

  • The operating system you used
  • The compression/decompression tool
  • The command-line arguments or automation method

I want to do this in one go, or automated, without having to open each file in a GUI application.

Best Answer

The file is gzipped twice. Try these commands on Mac OS X or Linux:

gzip -d 2011-May.txt.gz

You should end up with the file 2011-May.txt, which is plain text. On my system, wget properly saves a singly-gzipped file that decompresses to plain text.

If you have the double-gzipped file already, you can run this command:

gzip -cd 2011-May.txt.gz | gzip -cd > 2011-May.txt

This decompresses the data twice and writes the result to 2011-May.txt. Alternatively, on Windows 7, you should be able to use 7-Zip to extract the gzipped file, then open the extracted file in 7-Zip and extract it again. You should be left with the uncompressed file.
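If you are not sure whether a given file has one gzip layer or two, something like the following might help. It is only a sketch: the function names is_gzip and peel_gzip are made up for illustration, and it assumes a POSIX shell with od, tr, mktemp, and gzip available (Mac OS X, Linux, or a Unix-like environment on Windows). It keeps decompressing for as long as the data starts with the gzip magic bytes 1f 8b, so it would also unwrap a file whose real content happens to be gzip data.

```shell
#!/bin/sh

# is_gzip FILE: succeed if FILE starts with the gzip magic bytes 1f 8b.
is_gzip() {
    [ "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" = "1f8b" ]
}

# peel_gzip SRC DEST: copy SRC to DEST, then strip gzip layers from DEST
# until its content no longer looks like gzip data. SRC is left untouched.
peel_gzip() {
    src=$1
    dest=$2
    tmp=$(mktemp) || return 1
    cp "$src" "$dest" || { rm -f "$tmp"; return 1; }
    while is_gzip "$dest"; do
        # Decompress one layer into the temp file, then copy it back.
        gzip -cd "$dest" > "$tmp" || { rm -f "$tmp"; return 1; }
        cat "$tmp" > "$dest"
    done
    rm -f "$tmp"
}
```

After defining the functions (for example by sourcing the script), you would call it as: peel_gzip 2011-May.txt.gz 2011-May.txt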

If you have a large number of files like this in one directory, you could do something like this:

for file in *.gz; do mv -- "$file" "$file.gz"; done
gunzip *.gz
gunzip *.gz

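This renames every file ending in .gz to .gz.gz, then runs gunzip on them twice. The rename matters because each gunzip pass strips one .gz suffix; without it, the first pass would leave still-compressed files with no .gz suffix, and the second gunzip *.gz would find nothing to work on.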
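Since the question mentions a nested directory structure, here is a rough sketch of the same idea applied recursively. The function name gunzip_tree is hypothetical, and the sketch assumes every .gz file under the directory is gzipped exactly twice, as described above. It also assumes a POSIX shell with find and gzip (Mac OS X, Linux, or a Unix-like environment on Windows). It writes the decompressed copy next to each original and leaves the original in place.

```shell
#!/bin/sh

# gunzip_tree DIR: for every *.gz file under DIR, decompress it twice and
# write the result alongside it with the trailing .gz removed.
gunzip_tree() {
    find "$1" -type f -name '*.gz' -exec sh -c '
        for gz in "$@"; do
            out=${gz%.gz}
            if gzip -cd "$gz" | gzip -cd > "$out"; then
                echo "wrote $out"
            else
                # Probably not double-gzipped; drop the partial output.
                echo "failed: $gz" >&2
                rm -f "$out"
            fi
        done
    ' sh {} +
}
```

After defining the function, running gunzip_tree . from the top of the archive directory would process every subdirectory in one go. Files that turn out to be gzipped only once are reported as failed and left alone, so they can be handled with a plain gzip -d afterwards.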