Linux – Container with in-built compression that automatically adjusts its size

I am trying to find an efficient way to rsync the contents of an ext4 file system as part of a regular backup, while also getting decent compression and using as little space as possible.

I could just use plain rsync and then tar/gzip the resulting directory, but the compression itself would be orders of magnitude slower than the preceding rsync.

I cannot use squashfs and the like because they are read-only.

I could make a dedicated partition with in-built compression for this backup, such as btrfs or reiser4, but I would have to create it with a fixed size, so it would not scale.

Is there any container technology with in-built compression that transparently and automatically adjusts its size to the amount of data rsynced into it?

By the way, I use Debian GNU/Linux.

Best Answer

  • @Tetsujin pointed me in the right direction: OS X's sparse bundles/images do have a Linux analog, namely sparse files.

    A sparse file only takes up disk space for the regions that have actually been written, so it grows as the data in it grows. It can contain any Linux filesystem, including modern ones with in-built compression such as btrfs.
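    A quick way to see this behaviour, without root, is to compare a file's apparent size with the space actually allocated to it. This is a minimal sketch assuming GNU coreutils (`truncate`, `stat`, `dd`) and a host filesystem that supports holes (ext4, btrfs, XFS and tmpfs do); the scratch file name `demo.sparse` is hypothetical.

```shell
set -eu

# Scratch directory and file name (demo.sparse is an arbitrary, hypothetical name).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
img="$dir/demo.sparse"

# Create a 1 GiB file that consists entirely of a hole.
truncate -s 1G "$img"

# Apparent size (what ls reports) vs. bytes actually allocated on the host disk.
# stat %b is the number of allocated blocks, %B the size in bytes of each block.
apparent=$(stat -c %s "$img")
allocated_empty=$(( $(stat -c %b "$img") * $(stat -c %B "$img") ))
echo "empty:   apparent=$apparent allocated=$allocated_empty"

# Write 10 MiB of real data at a 100 MiB offset without truncating the file.
# Only that region receives blocks; the rest of the file stays a hole.
dd if=/dev/urandom of="$img" bs=1M count=10 seek=100 conv=notrunc 2>/dev/null
sync  # flush, for filesystems that update allocation counts lazily

allocated_written=$(( $(stat -c %b "$img") * $(stat -c %B "$img") ))
echo "written: apparent=$(stat -c %s "$img") allocated=$allocated_written"
```

    On a filesystem with hole support, the first line should show (close to) zero allocated bytes and the second roughly 10 MiB, while the apparent size stays at 1 GiB throughout.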

    The following shows how to create a sparse, compressed btrfs image. In Debian and its derivatives (such as Ubuntu), the btrfs userspace tools (mkfs.btrfs etc.) are provided by the btrfs-tools package (sudo apt-get install btrfs-tools; newer releases call it btrfs-progs). I have also created a sparse ext4 image to compare speed and size. All operations were performed on Debian 7.8 Wheezy (oldstable as of 30 April 2015).

    1. Create empty sparse files of any size; here, 5 terabytes:

       me@wheezy:~$ truncate -s 5T ext4.sparse btrfs.sparse
      
    2. Format them

    to ext4:

        me@wheezy:~$ mkfs.ext4 ext4.sparse
        mke2fs 1.42.5 (29-Jul-2012)
        <...>
        Allocating group tables: done
        Writing inode tables: done
        Creating journal (32768 blocks): done
        Writing superblocks and filesystem accounting information: done
    

    to btrfs:

        me@wheezy:~$ mkfs.btrfs btrfs.sparse
    
        WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
        WARNING! - see http://btrfs.wiki.kernel.org before using
    
        fs created label (null) on btrfs.sparse
                nodesize 4096 leafsize 4096 sectorsize 4096 size 5.00TB
        Btrfs Btrfs v0.19
    
    3. Create mount points:

       me@wheezy:~$ mkdir ext4_mount btrfs_mount
      
    4. Mount them. Don't forget the loop option:

    ext4:

        me@wheezy:~$ sudo mount -o loop -t ext4 ext4.sparse ext4_mount
    

    btrfs (don't forget the compress option, which can be zlib or lzo):

        me@wheezy:~$ sudo mount -o loop,compress=lzo -t btrfs btrfs.sparse btrfs_mount
    
    5. That's it! The file systems are created and mounted and appear to the OS as 5 TB, but they actually occupy very little space:

    df:

        me@wheezy:~$ df -h | grep _mount
        /dev/loop0                         5.0T  189M  4.8T   1% /home/a/ext4_mount
        /dev/loop1                         5.0T  120K  5.0T   1% /home/a/btrfs_mount
    

    du:

        me@wheezy:~$ du -h *.sparse
        4.3M    btrfs.sparse
        169M    ext4.sparse
    
    6. For testing, I created a large (1.3 GB) text file with a repetitive pattern and copied it to both newly created file systems:

    ext4:

        me@wheezy:~$ time sudo cp /store/share/bigtextfile ext4_mount/
    
        real    0m12.344s
        user    0m0.008s
        sys     0m1.708s
    

    btrfs:

        me@wheezy:~$ time sudo cp /store/share/bigtextfile btrfs_mount/
    
        real    0m3.714s
        user    0m0.016s
        sys     0m1.204s
    
    7. As the previous step shows, btrfs was much faster than good ol' ext4 at writing highly compressible data (3.7 s vs. 12.3 s real time). Let's check the filesystem sizes afterwards:

       me@wheezy:~$ df -h | grep _mount
       /dev/loop0                         5.0T  1.5G  4.8T   1% /home/a/ext4_mount
       /dev/loop1                         5.0T   46M  5.0T   1% /home/a/btrfs_mount
      
    8. btrfs also proved far more space-efficient (46M vs. 1.5G used). Finally, let's check the sizes of the sparse files themselves:

       me@wheezy:~$ du -h *.sparse
       50M     btrfs.sparse
       1.4G    ext4.sparse
      

    That's it. If needed, the sparse files can be enlarged further, and btrfs can be resized online to use the extra space.
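    Enlarging works the same way as creating: `truncate` with a `+` size prefix extends the apparent size without allocating anything. This is a minimal sketch assuming GNU coreutils; `IMG` is a hypothetical variable pointing at the image (a 5 GiB scratch image is used if it is unset). The root-only commands that make the loop device and the mounted btrfs pick up the new size are left as comments; `/dev/loop1` and `btrfs_mount` are taken from the df output above and may differ on your system.

```shell
set -eu

# Image to grow: $IMG (hypothetical) if set, otherwise a throwaway 5 GiB sparse file.
if [ -z "${IMG:-}" ]; then
    scratch=$(mktemp -d)
    trap 'rm -rf "$scratch"' EXIT
    img="$scratch/btrfs.sparse"
    truncate -s 5G "$img"
else
    img=$IMG
fi

before=$(stat -c %s "$img")
blocks_before=$(stat -c %b "$img")

# Grow the apparent size by 1 GiB; the new tail is a hole, so no disk space is used.
truncate -s +1G "$img"

after=$(stat -c %s "$img")
blocks_after=$(stat -c %b "$img")
echo "apparent size: $before -> $after bytes; allocated blocks: $blocks_before -> $blocks_after"

# With the image attached and mounted as in the steps above, tell the loop device
# and btrfs about the new size (run as root; device and mount point are assumptions):
#   losetup -c /dev/loop1                      # re-read the backing file's size
#   btrfs filesystem resize max btrfs_mount    # grow btrfs online to fill it
```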

    This is a neat solution for regular rsync backups. But don't forget to also back up these image files in a more traditional way, since btrfs (v0.19 here) is still marked as experimental.
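    When copying the image files themselves, a sparse-aware tool keeps the holes intact; otherwise they are written out as zeros and a 5 TB image really does consume 5 TB. This is a minimal sketch assuming GNU `cp --sparse=always` (GNU `tar -S` and `rsync -S` offer similar options); the scratch directories stand in for real, hypothetical source and backup paths, and the copy is best made while the image is unmounted so it is consistent.

```shell
set -eu

# Scratch stand-ins for the real image and backup locations (hypothetical paths).
src_dir=$(mktemp -d)
dst_dir=$(mktemp -d)
trap 'rm -rf "$src_dir" "$dst_dir"' EXIT

# A 1 GiB sparse "image" containing 1 MiB of real data.
truncate -s 1G "$src_dir/btrfs.sparse"
dd if=/dev/urandom of="$src_dir/btrfs.sparse" bs=1M count=1 conv=notrunc 2>/dev/null

# --sparse=always recreates holes in the copy instead of writing out zeros.
cp --sparse=always "$src_dir/btrfs.sparse" "$dst_dir/btrfs.sparse"
sync  # flush, for filesystems that update allocation counts lazily

copy_apparent=$(stat -c %s "$dst_dir/btrfs.sparse")
copy_allocated=$(( $(stat -c %b "$dst_dir/btrfs.sparse") * $(stat -c %B "$dst_dir/btrfs.sparse") ))
echo "copy: apparent=$copy_apparent allocated=$copy_allocated"
```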

    Further info on the Arch Wiki: https://wiki.archlinux.org/index.php/Sparse_file and https://wiki.archlinux.org/index.php/Btrfs