Why is piping ‘dd’ through gzip so much faster than a direct copy?

Tags: backup, dd, gzip, performance, pipe

I wanted to back up a path from one computer on my network to another computer on the same network over a 100 Mbit/s link. To do this, I ran

dd if=/local/path of=/remote/path/in/local/network/backup.img

which gave me a very low network transfer speed of about 50 to 100 kB/s; at that rate the backup would have taken forever. So I stopped it and decided to try gzipping the data on the fly, so that there would be much less to transfer. This time I ran

dd if=/local/path | gzip > /remote/path/in/local/network/backup.img.gz

Now I get a network transfer speed of about 1 MB/s, which is 10 to 20 times faster. After noticing this, I tested it on several paths and files, and the result was always the same.

Why does piping dd through gzip also increase the transfer rate by a large factor, instead of only making the stream much shorter? I would actually have expected a small decrease in transfer rate because of the extra CPU load from compression, but instead I get a double win. Not that I'm complaining; I'm just curious. 😉

Best Answer

By default, dd uses a very small block size: 512 bytes (!!). That means a lot of small reads and writes. It seems that dd, used naively as in your first example, was generating a great number of network packets with very small payloads, which reduced throughput.
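To see how much the 512-byte default multiplies the number of write calls, here is a minimal sketch, assuming a POSIX shell and a dd that prints the usual "N+M records out" summary on stderr (GNU coreutils and BSD dd both do). It copies a 1 MiB file of zeros (hypothetical test data standing in for the real source) to /dev/null at two block sizes and reads the write count from dd's own report. Note that it counts write() calls, not network packets; how writes turn into packets depends on the network filesystem in between.

```shell
#!/bin/sh
# Sketch: compare how many writes dd performs for the same data at two
# block sizes. Assumes dd prints "N+M records out" (full+partial writes).
set -eu

# Print the number of full output records (writes) dd reports for a copy
# of file $1 to /dev/null with block size $2.
count_writes() {
    file=$1
    bs=$2
    LC_ALL=C dd if="$file" of=/dev/null bs="$bs" 2>&1 |
        awk '/records out/ { split($1, n, "[+]"); print n[1] }'
}

# Hypothetical test data: 1 MiB of zeros in a temporary file.
tmp=$(mktemp)
dd if=/dev/zero of="$tmp" bs=1024 count=1024 2>/dev/null

small=$(count_writes "$tmp" 512)      # dd's default block size
large=$(count_writes "$tmp" 1048576)  # 1 MiB blocks

echo "bs=512:     $small writes"
echo "bs=1048576: $large writes"
rm -f "$tmp"
```

Since 1048576 / 512 = 2048, this should report 2048 writes for bs=512 and a single write for bs=1048576.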

On the other hand, gzip is smart enough to do its I/O with larger buffers, which means a smaller number of big writes over the network.
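gzip's actual internal buffer sizes are an implementation detail, so the sketch below does not model gzip itself. Instead it uses dd's own reblocking (separate ibs= and obs= values) as a stand-in for the same batching idea: read in small 512-byte pieces, but write only once a 1 MiB buffer has filled. The 1 MiB test file is again hypothetical data, and the same "records in/out" summary format is assumed.

```shell
#!/bin/sh
# Sketch of buffering small reads into large writes, using dd reblocking
# as an analogy for what a buffering stage like gzip does in the pipe.
set -eu

# Hypothetical test data: 1 MiB of zeros in a temporary file.
tmp=$(mktemp)
dd if=/dev/zero of="$tmp" bs=1024 count=1024 2>/dev/null

# Read 512 bytes at a time, but collect them into 1 MiB output blocks.
summary=$(LC_ALL=C dd if="$tmp" of=/dev/null ibs=512 obs=1048576 2>&1)

# Full input records = reads, full output records = writes.
reads=$(printf '%s\n' "$summary" | awk '/records in/ { split($1, n, "[+]"); print n[1] }')
writes=$(printf '%s\n' "$summary" | awk '/records out/ { split($1, n, "[+]"); print n[1] }')

echo "reads:  $reads"
echo "writes: $writes"
rm -f "$tmp"
```

This should report 2048 small reads but only one large write, which is the pattern that keeps the network side busy with full-sized transfers.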

Can you try dd again with a larger bs= parameter and see if it works better this time?
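A minimal sketch of that retry, assuming GNU or BSD dd. The value bs=1048576 (1 MiB; GNU dd also accepts bs=1M) is an illustrative choice, not a tuned one, and the helper names copy_large_bs and copy_gzip_large_bs are hypothetical. The commented lines use the paths from the question; the rest runs a local round-trip check on a temporary file so you can confirm both variants produce identical data before timing them against the network path.

```shell
#!/bin/sh
set -eu

# Plain copy, but with 1 MiB blocks instead of dd's 512-byte default.
copy_large_bs() {
    dd if="$1" of="$2" bs=1048576 2>/dev/null
}

# Same larger read block size, compressed on the fly through gzip.
copy_gzip_large_bs() {
    dd if="$1" bs=1048576 2>/dev/null | gzip > "$2"
}

# Real usage with the question's paths:
#   copy_large_bs /local/path /remote/path/in/local/network/backup.img
#   copy_gzip_large_bs /local/path /remote/path/in/local/network/backup.img.gz

# Local self-check on a hypothetical 3 MiB file of random data.
src=$(mktemp); out=$(mktemp); outgz=$(mktemp)
trap 'rm -f "$src" "$out" "$outgz"' EXIT
dd if=/dev/urandom of="$src" bs=1024 count=3072 2>/dev/null

copy_large_bs "$src" "$out"
copy_gzip_large_bs "$src" "$outgz"

cmp "$src" "$out" && echo "plain copy matches"
gzip -dc "$outgz" | cmp - "$src" && echo "gzip copy matches"
```

If the plain copy with a large bs= gets close to the gzip pipeline's speed, the block size rather than the compression was the main bottleneck.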