Linux – How to tell if ZFS (zfs-fuse) dedup/compression is applied to a particular file

Tags: filesystems, linux, ubuntu, zfs

I have a ZFS-formatted partition using zfs-fuse on Linux (Ubuntu).

I had used it for a while before enabling dedup and compression on it (zfs set compression=on and zfs set dedup=on). Now I think some of my files are dedup'ed and compressed, and some are not yet.

That was fine, but the behavior sometimes confused me. For example, the following command consumes almost 4 GB of my ZFS storage:

cp oldfile.4GB newfile.4GB

...whereas this one consumes almost nothing:

cp newfile.4GB newfile.4GB.2

I think this is because the old file has not been compressed yet, so dedup does not happen for the first copy.

My idea is this: if I can find the old files that are not yet dedup'ed or compressed, I can copy, rename, and remove them in a batch to eliminate duplication and redundancy. But how can I check that?

I know that re-copying the whole contents of my storage should work (even better if I check the timestamp of each file), but I'd be happier with a zfsstat-like tool that shows such per-file properties.
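The timestamp idea above could be sketched roughly as follows. This is only an illustration: `files_not_newer_than` is a hypothetical helper name, and it assumes that a file's modification time reflects when its blocks were written, which does not hold for files copied with preserved timestamps (for example with cp -p).

```shell
# Hypothetical helper: list regular files under directory $1 whose
# modification time is not newer than that of the reference file $2.
files_not_newer_than() {
  find "$1" -xdev -type f ! -newer "$2" -print
}
```

One way to use it would be to create a reference file with touch -t, set to the time compression and dedup were enabled, and then run files_not_newer_than /zfs /path/to/reference to get the candidates for re-copying.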


EDIT: I verified jlliagre's tip in my environment.

First, I made a dataset and some directories:
$ sudo zfs create zfs/test
$ sudo install -d -m 1777 /zfs/test/orig /zfs/test/copy

Created some files:
$ yes > /zfs/test/orig/yes.1s & sleep 1; kill %1
$ dd if=/dev/zero of=/zfs/test/orig/zero.1M bs=1K count=1024
$ dd if=/dev/urandom of=/zfs/test/orig/rand.1M bs=1K count=1024

Turned compression on and copied the files above:
$ sudo zfs set compress=on  zfs/test
$ cp /zfs/test/orig/* /zfs/test/copy

Now the directories look like:
$ ls -hil /zfs/test/*
/zfs/test/copy:
total 1.5K
10 -rw-r--r-- 1 kimura kimura 1.0M Mar  2 01:30 rand.1M
11 -rw-r--r-- 1 kimura kimura  40M Mar  2 01:30 yes.1s
12 -rw-r--r-- 1 kimura kimura 1.0M Mar  2 01:30 zero.1M

/zfs/test/orig:
total 42M
9 -rw-r--r-- 1 kimura kimura 1.0M Mar  2 01:29 rand.1M
7 -rw-r--r-- 1 kimura kimura  40M Mar  2 01:29 yes.1s
8 -rw-r--r-- 1 kimura kimura 1.0M Mar  2 01:29 zero.1M

And the zdb tool shows some information:
$ sudo zdb zfs/test
Dataset zfs/test [ZPL], ID 196, cr_txg 108306, 44.2M, 12 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K    16K    16K   37.50  DMU dnode
        -1    1    16K    512     1K    512  100.00  ZFS user/group used
        -2    1    16K    512     1K    512  100.00  ZFS user/group used
         1    1    16K    512     1K    512  100.00  ZFS master node
         2    1    16K    512     1K    512  100.00  ZFS delete queue
         3    1    16K    512     1K    512  100.00  ZFS directory
         4    1    16K    512     1K    512  100.00  ZFS directory
         5    1    16K    512     1K    512  100.00  ZFS directory
         6    1    16K    512     1K    512  100.00  ZFS directory
         7    3    16K   128K  39.8M  39.8M  100.00  ZFS plain file
         8    2    16K   128K  1.00M     1M  100.00  ZFS plain file
         9    2    16K   128K  1.00M     1M  100.00  ZFS plain file
        10    2    16K   128K  1.00M     1M  100.00  ZFS plain file
        11    3    16K   128K  1.41M  39.8M  100.00  ZFS plain file
        12    2    16K   128K      0   128K    0.00  ZFS plain file

I can see that the copies of "yes" and "zero" (objects 11 and 12) are well compressed.
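In the listings above, the object numbers printed by zdb line up with the inode numbers printed by ls -i (object 11 is copy/yes.1s, for instance). Assuming that correspondence holds, an object number can be turned back into a path with find. This is a small sketch, and `path_for_object` is a hypothetical helper name:

```shell
# Hypothetical helper: print the path(s) under mount point $1 whose
# inode number equals the zdb object number $2.
path_for_object() {
  # -xdev keeps find on one filesystem, since inode numbers are only
  # unique within a single filesystem.
  find "$1" -xdev -inum "$2" -print
}
```

With the test dataset above, path_for_object /zfs/test 11 would be expected to print /zfs/test/copy/yes.1s.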

Best Answer

You can get overall deduplication statistics with the zdb -D poolname command.

For per-file compression status, there is no very straightforward way, but you might use this:

zdb dataset | grep plain

This will output lines like these:

     8    2    16K   128K  3.03M  5.00M  100.00  ZFS plain file
     9    2    16K   128K  3.03M  5.00M  100.00  ZFS plain file
    10    2    16K   128K  5.00M  5.00M  100.00  ZFS plain file
    11    2    16K   128K  3.03M  6.00M   83.33  ZFS plain file

The first column is the object number, which is also the file's inode number; columns 5 and 6 are the size on disk (dsize) and the file size (lsize); and column 7 (%full) is the percentage of the file that really exists. Any file with different values in columns 5 and 6 and 100% in column 7 is compressed.
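The column rule above can be turned into a rough filter. This is only a sketch: `list_uncompressed` is a hypothetical helper name, and it assumes the column layout shown above, with sizes printed as plain bytes or with a K, M, G, or T suffix. It reads zdb output on standard input and prints the object numbers of plain files that are 100% full and whose on-disk size is not smaller than their file size.

```shell
# Hypothetical helper: read `zdb <dataset>` output on stdin and print the
# object numbers of plain files that do not look compressed.
list_uncompressed() {
  awk '
    # Convert a zdb size string such as "512", "16K" or "3.03M" to bytes.
    function to_bytes(s,    n, u) {
      u = substr(s, length(s), 1)   # last character: unit suffix, if any
      n = s + 0                     # leading numeric part
      if (u == "K") return n * 1024
      if (u == "M") return n * 1024 * 1024
      if (u == "G") return n * 1024 * 1024 * 1024
      if (u == "T") return n * 1024 * 1024 * 1024 * 1024
      return n
    }
    # Field 5 is dsize, field 6 is lsize, field 7 is %full.
    /ZFS plain file/ {
      if ($7 + 0 == 100 && to_bytes($5) >= to_bytes($6)) print $1
    }
  '
}
```

It could be run as sudo zdb dataset | list_uncompressed. On the four sample lines above it prints only 10. Note that incompressible data also shows equal sizes even when compression is on, so the filter cannot tell an old, never-compressed file from one that simply does not compress.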