Ubuntu – ZFS grub-probe error failed to get canonical path of /dev/DISK_NAME

grub2 zfs

Background:

  • Ubuntu Xenial
  • ZFS installed for system disk (so, you know: rpool/ROOT)
  • System runs fine, but whenever the kernel is updated, grub-probe reports the error in the title
  • I would rather not reboot right now

There's a discussion here about grub-probe and how it should "just be better", but this helps until that comes along. I got the idea from that discussion.

More detail: a complete instance of the error (for my system) looks like:

/usr/sbin/grub-probe: error: failed to get canonical path of `/dev/ata-ADATA_SP550_2G1520009135-part1'.

This is buried in a slew of detail spouted forth from an apt command to install graphics drivers (but that's not important).

This disk corresponds to one of my ZIL partitions. I added ZIL and cache after the install completed, so I suppose that's why I didn't see the problem before. I haven't yet rebooted, and that's why I'm seeing the problem at all. Yes, you can reboot to fix all this, but assuming you don't want to do that, read on:

If I look in /dev, I see links to all my ZFS disks that look like:

lrwxrwxrwx  1 root     root           4 Sep 16 23:31 ata-WDC_WD10EARS-00Y5B1_WD-WMAV51436394-part1 -> sdc1
lrwxrwxrwx  1 root     root           4 Sep 16 23:31 ata-WDC_WD20EZRX-00D8PB0_WD-WCC4MK86SWX7-part1 -> sdd1
lrwxrwxrwx  1 root     root           4 Sep 16 23:31 ata-WDC_WD20EZRX-00D8PB0_WD-WCC4N1085683-part1 -> sde1
lrwxrwxrwx  1 root     root           4 Sep 16 23:31 ata-WDC_WD2500JS-22MHB0_WD-WCANK4053187-part1 -> sda1

… but notably none for the ZIL partitions.
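To confirm which names are missing, something like the following sketch might help. The function name `missing_dev_entries` is made up for illustration; it only checks whether each name it is given exists in a directory (default /dev) and does not talk to ZFS at all.

```shell
# Hypothetical helper (not part of any ZFS or Grub package): read device
# names, one per line, on stdin and print those with no usable entry in the
# given directory (default: /dev). A dangling symlink also counts as missing,
# because `-e` follows links.
missing_dev_entries() {
    dir="${1:-/dev}"
    while IFS= read -r name; do
        [ -n "$name" ] || continue                  # skip blank lines
        [ -e "$dir/$name" ] || printf '%s\n' "$name"
    done
}

# Example usage, with one name from the listing above and the one from the
# error message:
#   printf '%s\n' ata-WDC_WD10EARS-00Y5B1_WD-WMAV51436394-part1 \
#                 ata-ADATA_SP550_2G1520009135-part1 | missing_dev_entries
```

Any name it prints is one that grub-probe would presumably fail to canonicalize under /dev.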

I can test the situation by running:

$ sudo grub-probe /
grub-probe: error: failed to get canonical path of `/dev/ata-ADATA_SP550_2G1520009135-part1'.

So the question is: how to fix this problem so grub-probe behaves?

Best Answer

There is an environment variable that fixes this. From my reading, the underlying issue seems to be that Grub likes the idea of 'supporting' zfs but not the idea of fixing zfs-related issues in Grub, specifically its poor error handling when it cannot find things.

For instance, the grub tools that ship with Ubuntu 16.x will fail to find /boot on a ZFS volume without some user intervention, and will then happily write some (but not all) of the files produced by whatever utility you're using into the very /boot folder they just told you they couldn't find.

In any case...

http://list.zfsonlinux.org/pipermail/zfs-discuss/2016-June/025765.html

To check whether your ZFS build includes the relevant commit (if it does, you should see full device paths in the output):

ZPOOL_VDEV_NAME_PATH=1 zpool status

If so, you can run:

ZPOOL_VDEV_NAME_PATH=1 grub-whatevs ....
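As a rough sketch of what that can look like in practice: the wrapper name `with_vdev_paths` below is made up for illustration, and the commands in the comments are just examples (grub-probe / is the test from the question, update-grub is Ubuntu's usual config regeneration command). When going through sudo, the assignment has to come after sudo, since sudo normally does not pass the caller's environment through.

```shell
# Hypothetical wrapper: run any command with ZPOOL_VDEV_NAME_PATH set for
# that one invocation only, leaving the calling shell's environment alone.
# Intended for use from a root shell.
with_vdev_paths() {
    env ZPOOL_VDEV_NAME_PATH=1 "$@"
}

# Example usage from a root shell (not executed here):
#   with_vdev_paths grub-probe /
#   with_vdev_paths update-grub
#
# Equivalent one-offs from a normal account:
#   sudo env ZPOOL_VDEV_NAME_PATH=1 grub-probe /
#   sudo env ZPOOL_VDEV_NAME_PATH=1 update-grub
```

Setting it per command keeps the change visible at the point of use, at the cost of having to remember it the next time a kernel update runs the grub scripts.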

You can set the variable on the command line of each grub utility you need to run, or you can export it from root's .bashrc or .profile (or some such) with:

export ZPOOL_VDEV_NAME_PATH=YES

The variable makes zpool report full device paths instead of short names that are taken as relative to /dev, which may or may not resolve to a real device node. The Grub utilities parse the output of zpool status to find the disks that make up a pool, so changing the output of zpool status fixes Grub.
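To make the mechanism concrete, here is a small illustration of the assumed behaviour. It is not Grub's actual code; `vdev_to_path` is a made-up function, and the /dev/disk/by-id location in the second call is an assumption about where the real link lives.

```shell
# Illustrative only: mimics the assumed handling of a vdev name taken from
# `zpool status` when it is turned into a device path.
vdev_to_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;        # already a full path: use as-is
        *)  printf '/dev/%s\n' "$1" ;;   # bare name: assume it lives in /dev
    esac
}

# Bare name, as reported without the variable set:
vdev_to_path ata-ADATA_SP550_2G1520009135-part1
# Full path, as reported with ZPOOL_VDEV_NAME_PATH set (location assumed):
vdev_to_path /dev/disk/by-id/ata-ADATA_SP550_2G1520009135-part1
```

The first call yields the same /dev/ata-ADATA_SP550_2G1520009135-part1 path that appears in the error message, while the second passes through unchanged.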

In reference to femulator's comment: I agree that users shouldn't have to deal with this. The real solution? Same as for every other open source project that languishes with bugs that never get fixed. Fork it, fix it yourself, and stop using the source project/library/whatever. The FOSS way of "firing" someone, in other words ;). Apparently Debian was aware of this particular bug seven years ago.

This was the only thing stopping me from successfully migrating a FreeBSD RaidZ boot pool to Ubuntu. If anyone else attempts something similar, the process is relatively simple, as long as you understand ZFS well enough to ignore the parts of the Grub and zfsonlinux documentation that are wrong (such as setting your root dataset to not automount, eh...? How exactly is it going to boot then?). It's somewhat ironic that Ubuntu points out in their docs that the boot loader is Linux's most insecure 'feature', which is true I suppose, but in this case it's also Ubuntu's glaring flaw. It would have taken me an hour or two to migrate a BSD ZFS pool to another OS if I could have done it using the Sun/Solaris utilities that actually work. The problem is that at some point I had to use Linux utilities (like Grub) that don't work, or barely work, and that is where the other two days I spent fixing this went. Ubuntu would be a whole lot better if it didn't need grub to boot...