Installing Gentoo Into a LUKS-Encrypted ZFS Root

2013-12-31 14:31 - Linux

Note, this is a 2023 rewrite of an article originally from 2013. See earlier versions at archive.org.

For the past few years I've relied on ZFS for my backup system, with atomic snapshots taken regularly, plus zfs send and zfs recv making geographic redundancy an easy extra layer. I also LUKS-encrypt the disks for peace of mind, especially in the case of failed disks. This document is a detailed explanation of how I set up a brand new machine from scratch with an encrypted ZFS root file system.

Since I first wrote this document, ZFS has gained native encryption. There are pros and cons to each approach. But for me, I've got significant data sets already on LUKS, and no desire to buy sufficient new drives just to have space to migrate to another solution. So I continue doing things this way.

Getting Started

Work through the Gentoo Handbook until the Preparing the Disks section. You may want to use my LiveCD with ZFS support as the boot medium, as I'm doing.

I've selected raidz2 across four disks, so that the failure of any two drives results in neither downtime nor data loss. I value the enhanced safety and reliability of this setup over higher storage efficiency or even performance. You need not make the same decision, but the examples below detail such a setup.

I used to prefer a completely separate (unencrypted) boot device. I originally thought that it was "best practice" to provision ZFS on raw disks, with no partitions. I'm no longer confident that this is true, and I've come up with reasons to prefer partitions. It can be difficult to hook up enough disks to dedicate a separate one to the boot volume. It's common for motherboards to provide four SATA ports. If that's three ZFS disks plus one boot disk, they're all used up. Where do you put your extra disk when swapping in a replacement or an upgrade? Even if you've got the space, there's the expense of another device, plus a separate plan for its failure. If you've got the extra connectors and devices, maybe put them toward an L2ARC or a ZIL device instead. I'm skipping dedicated boot devices from here on out.

So now I'm using a partitioned scheme. I reserve a bit of space for the boot volume, plus a small BIOS boot partition for the GRUB bootloader (required when GRUB boots in BIOS mode from a GPT-partitioned disk). (I don't, however, use EFI or an EFI system partition.) Then I use the rest of the disk for the main data volume. You may choose the size of your boot partition(s) freely.

A quick note on the examples, before the first one: each command I type follows a shell prompt (such as livecd ~ #), and everything else is output. Your output will likely differ in small details; I trust you to figure that out if you're following this as a guide, but I find that archiving the output still makes it easier to follow along. Your inputs may differ as well, so be careful to make sure you are referencing (e.g.) the proper disk at every point! Also note that all examples below are based on a test installation in a virtual machine.

I prefer to use GPT partitions with the data partitions labeled like crypt_something so that "something" is meaningful. (I use the model and/or serial number of the drive, so that if/when ZFS reports a problem with a volume, I'll know which physical disk it's on!) Any swap will happen inside the encrypted ZFS pool.

livecd ~ # fdisk /dev/sda

Welcome to fdisk (util-linux 2.38.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xcefdae72.

Command (m for help): g
Created a new GPT disklabel (GUID: E5CD736D-D284-8F41-A8E7-D73BD158CFB3).

Command (m for help): n
Partition number (1-128, default 1):
First sector (2048-97677278, default 2048):
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-97677278, default 97677278): +2M

Created a new partition 1 of type 'Linux filesystem' and of size 2 MiB.

Command (m for help): n
Partition number (2-128, default 2):
First sector (6144-16777182, default 6144):
Last sector, +/-sectors or +/-size{K,M,G,T,P} (6144-16777182, default 16775167): +256M

Created a new partition 2 of type 'Linux filesystem' and of size 256 MiB.

Command (m for help): t
Partition number (1,2, default 2): 1
Partition type or alias (type L to list all): 4

Changed type of partition 'Linux filesystem' to 'BIOS boot'.

Command (m for help): n
Partition number (3-128, default 3):
First sector (530432-16777182, default 530432):
Last sector, +/-sectors or +/-size{K,M,G,T,P} (530432-16777182, default 16775167):

Created a new partition 3 of type 'Linux filesystem' and of size 7.7 GiB.
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
livecd ~ # gdisk /dev/sda
GPT fdisk (gdisk) version 1.0.8

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.

Command (? for help): c
Partition number (1-3): 1
Enter name: grub_K82746KA

Command (? for help): c
Partition number (1-3): 2
Enter name: boot_K82746KA

Command (? for help): c
Partition number (1-3): 3
Enter name: crypt_K82746KA

Command (? for help): p
Disk /dev/sda: 16777216 sectors, 8.0 GiB
Model: VMware Virtual S
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 4A317F72-1F12-824A-BFCF-B7B6362A6877
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 2048, last usable sector is 16777182
Partitions will be aligned on 2048-sector boundaries
Total free space is 2015 sectors (1007.5 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048            6143   2.0 MiB     EF02  grub_K82746KA
   2            6144          530431   256.0 MiB   8300  boot_K82746KA
   3          530432        16775167   7.7 GiB     8300  crypt_K82746KA

Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/sda.
The operation has completed successfully.

Since we're putting ZFS in a partition, it's important to make sure that the partition start is aligned with a physical disk sector. This virtual disk has 512-byte sectors, but real drives today typically have 4,096-byte physical sectors. Partition boundaries are expressed in logical sectors, so it is possible to misalign them relative to the physical sectors. I believe the best thing to do is make sure that your main partition starts on a sector number that is a multiple of eight, which is 4K aligned even with 512-byte logical sectors (8 * 512 = 4,096). In this example the data partition starts at sector 530432, which is 66304 * 8. (Both fdisk and gdisk align partitions to 2048-sector boundaries by default, as the gdisk output above notes, so this normally works out on its own.)
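To check alignment on the real machine rather than by eye, something like the following shell sketch could help. The function names are hypothetical, not part of any tool, and the sketch assumes that the kernel reports each partition's first sector in sysfs (/sys/class/block/<partition>/start) in 512-byte units regardless of the drive's logical sector size, and that 4,096 bytes is the physical sector size you care about.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a partition whose first sector is $1
# (in 512-byte units, as sysfs reports it) starts on a 4 KiB boundary.
is_4k_aligned() {
    [ $(( $1 * 512 % 4096 )) -eq 0 ]
}

# Report alignment for every crypt_* partition (assumes the GPT labels
# used in this article and sysfs mounted at /sys).
check_crypt_partitions() {
    for P in /dev/disk/by-partlabel/crypt_*; do
        name=$(basename "$(readlink -f "$P")")
        start=$(cat "/sys/class/block/$name/start")
        if is_4k_aligned "$start"; then
            echo "$P: start $start is 4 KiB aligned"
        else
            echo "$P: start $start is NOT 4 KiB aligned"
        fi
    done
}
```

Run check_crypt_partitions after partitioning every drive; any line reporting NOT aligned suggests recreating that partition before going further.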

Above were the steps for /dev/sda, with the example serial number "K82746KA" encoded into the partition labels. Do the same for your remaining drives. For identically sized drives you can use sgdisk to copy the partition table, then tweak the names afterwards. Note that --replicate takes the destination drive as its argument; the source drive comes last.

livecd ~ # sgdisk --replicate=/dev/sdb /dev/sda
The operation has completed successfully.
livecd ~ # gdisk /dev/sdb
(... change partition names here ...)
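If you'd rather not retype the names interactively, sgdisk can set them too. The sketch below is a hypothetical helper that only prints the commands, so you can review them before piping the output to sh. It assumes identically sized drives with the three-partition layout above, and that you supply the serial number yourself (lsblk -dno SERIAL /dev/sdb is one way to read it). It also randomizes the copy's GUIDs, because --replicate copies them verbatim from the source disk.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) sgdisk commands that copy the
# partition table from SRC to DST, give DST fresh GUIDs, and label DST's
# three partitions with its serial number.
plan_copy_layout() {
    src=$1 dst=$2 serial=$3
    # --replicate takes the destination; the source is the final argument.
    echo "sgdisk --replicate=$dst $src"
    # The replica keeps the source's GUIDs verbatim; randomize them on DST.
    echo "sgdisk --randomize-guids $dst"
    # Partition numbers match the layout above: 1 grub, 2 boot, 3 crypt.
    echo "sgdisk --change-name=1:grub_$serial --change-name=2:boot_$serial --change-name=3:crypt_$serial $dst"
}

# Example: review the output first, then pipe it to sh.
# plan_copy_layout /dev/sda /dev/sdb "$(lsblk -dno SERIAL /dev/sdb)"
```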

We use only one of the boot partitions. It's the one part of the drive outside both the LUKS encryption and ZFS (which, in my case, has automatically managed backups), so you should add a scheme for keeping the boot volume backed up as well. For now, simply put an ext4 filesystem in place.

livecd ~ # mkfs.ext4 -T small -U random /dev/disk/by-partlabel/boot_K82746KA
mke2fs 1.47.0 (5-Feb-2023)
Creating filesystem with 262144 1k blocks and 65536 inodes
Filesystem UUID: a8182f88-848a-448b-aac6-26a2fd3ee9ac
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729, 204801, 221185

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

Next, use LUKS to encrypt all data partitions:

livecd ~ # for D in /dev/disk/by-partlabel/crypt_*; do cryptsetup luksFormat "$D"; done

WARNING!
========
This will overwrite data on /dev/disk/by-partlabel/crypt_JETMMKKJ irrevocably.

Are you sure? (Type uppercase yes): YES
Enter passphrase for /dev/disk/by-partlabel/crypt_JETMMKKJ:
Verify passphrase:
System is out of entropy while generating volume key.
Please move mouse or type some text in another window to gather some random events.
Generating key (68% done).
Generating key (87% done).
Generating key (100% done).

(... repeated for each drive ...)

I'm using the same passphrase for all disks, and suggest you do the same. Either way, LUKS will let you change/add/remove these at any point in the future. If you get stuck waiting for randomness, a ping -f from another machine to this one will generate kernel entropy.

With the encrypted volumes initialized, open them up for use:

livecd ~ # for D in /dev/disk/by-partlabel/crypt_*; do cryptsetup luksOpen "$D" "$(basename $D | sed -e 's/crypt_/vault_/')"; done
Enter passphrase for /dev/disk/by-partlabel/crypt_JETMMKKJ:
Enter passphrase for /dev/disk/by-partlabel/crypt_K82746KA:
Enter passphrase for /dev/disk/by-partlabel/crypt_WHU2I04G:
Enter passphrase for /dev/disk/by-partlabel/crypt_Y9KWTIL7:
livecd ~ # ls /dev/mapper
control  vault_JETMMKKJ  vault_K82746KA  vault_WHU2I04G  vault_Y9KWTIL7
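The basename | sed pipeline above just turns each partition label into a device-mapper name. The same mapping can be done with plain POSIX parameter expansion; vault_name is a hypothetical helper name, and it assumes every label starts with crypt_.

```shell
#!/bin/sh
# Hypothetical helper: map a LUKS partition path to its device-mapper name,
# e.g. /dev/disk/by-partlabel/crypt_K82746KA -> vault_K82746KA.
# Assumes every label starts with "crypt_".
vault_name() {
    name=${1##*/}                # drop the directory part
    echo "vault_${name#crypt_}"  # swap the crypt_ prefix for vault_
}

# Same loop as above, using the helper:
# for D in /dev/disk/by-partlabel/crypt_*; do cryptsetup luksOpen "$D" "$(vault_name "$D")"; done
```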

Initializing ZFS

So now we have four encrypted disks, each called crypt_something, and their equivalent plaintext devices, each called vault_something. We can set up a zpool with them, and datasets within that pool.

livecd ~ # modprobe zfs
livecd ~ # zpool create -m none -R /mnt/gentoo -o ashift=12 -O atime=off -O xattr=sa rpool raidz2 vault_JETMMKKJ vault_K82746KA vault_WHU2I04G vault_Y9KWTIL7

This complicated command will:

-m none
Not mount this pool.
-R /mnt/gentoo
Set the "altroot", i.e. the temporary alternative mount point. This value is appropriate for the Handbook-driven install process.
-o ashift=12
Set the sector size ZFS assumes for the pool's disks to 4 KiB, i.e. 2^12 = 4,096 bytes. (If your sectors are 512 bytes (unlikely), you should omit this.)
-O atime=off
Not record access times.
-O xattr=sa
Store extended attributes in inodes rather than hidden files, which is supposed to make Samba perform better.
rpool
The name of the pool. (I wish I had a better scheme, but I just call it the "r"oot pool.)
raidz2
The type of the pool: RAID-like, with two disks' worth of redundancy.
vault_JETMMKKJ vault_K82746KA vault_WHU2I04G vault_Y9KWTIL7
The (decrypted) devices making up this pool; ZFS can find them just by these short names (which are unlikely to otherwise exist) and their brevity will make other output easier to read.

And in good Unix tradition, produces no output upon success.
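The ashift value is the base-2 logarithm of the sector size ZFS assumes, so ashift=12 means 2^12 = 4,096 bytes and ashift=9 means 512 bytes. This hypothetical helper computes it from a sector size; on a live system, the kernel's reported physical sector size is in /sys/block/<disk>/queue/physical_block_size. Some drives report 512 bytes even though they use 4K sectors internally, so treat that value as a hint rather than the final word.

```shell
#!/bin/sh
# Hypothetical helper: ashift is log2 of the sector size in bytes,
# so 4096 -> 12 and 512 -> 9. Assumes a power-of-two input.
ashift_for() {
    size=$1 bits=0
    while [ "$size" -gt 1 ]; do
        size=$(( size / 2 ))
        bits=$(( bits + 1 ))
    done
    echo "$bits"
}

# On a live system, e.g.:
# ashift_for "$(cat /sys/block/sda/queue/physical_block_size)"
```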

Now to create the datasets within the pool. ZFS datasets are hierarchical. I've created two top-level datasets: root and tmp. The first gets regular snapshots, which get replicated off site. The second does not, because it contains only files that are easy to replace (Linux kernel source, the Portage repository) or are not worth backing up (large scratch files, media archives).

First create the root dataset, then inside it initialize mount points with no write permissions, to reduce the chances of accidentally putting files there. (ZFS refuses to mount into a non-empty directory, by default.)

livecd ~ # zfs create -o mountpoint=/ rpool/root
livecd ~ # for D in /home /tmp /var; do mkdir "/mnt/gentoo/$D"; chmod 0111 "/mnt/gentoo/$D"; done
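The same placeholders can be made with install -d, which creates a directory and sets its mode in one step. This is a sketch with a hypothetical helper name; it assumes the root directory (here /mnt/gentoo, with the root dataset mounted on it) already exists.

```shell
#!/bin/sh
# Hypothetical helper: create empty placeholder mount points with mode 0111
# (no read or write permission) under a root directory that already exists.
make_mountpoints() {
    root=$1
    shift
    for D in "$@"; do
        # install -d creates the directory; -m sets its mode in the same step.
        install -d -m 0111 "$root/$D"
    done
}

# Equivalent to the loop above:
# make_mountpoints /mnt/gentoo home tmp var
```

The later placeholders (var/tmp, home/USER and home/USER/tmp) could be made the same way, once their parent datasets are mounted.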

Now create the rest of the ZFS data sets:

livecd ~ # cd /mnt/gentoo
livecd gentoo # zfs create -o mountpoint=/var rpool/root/var
livecd gentoo # mkdir var/tmp; chmod 0111 var/tmp
livecd gentoo # zfs create -o mountpoint=/home rpool/root/home
livecd gentoo # mkdir home/USER; chmod 0111 home/USER
livecd gentoo # zfs create -o mountpoint=/home/USER rpool/root/home/USER
livecd gentoo # mkdir home/USER/tmp; chmod 0111 home/USER/tmp
livecd gentoo # zfs create -o mountpoint=none rpool/tmp
livecd gentoo # zfs create -o mountpoint=/tmp -o devices=off -o exec=off -o setuid=off rpool/tmp/root
livecd gentoo # zfs create -o mountpoint=/home/USER/tmp -o devices=off -o exec=off -o setuid=off rpool/tmp/USER
livecd gentoo # zfs create -o mountpoint=/usr/src rpool/tmp/linux-src
livecd gentoo # zfs create -o mountpoint=/var/db/repos rpool/tmp/portage
livecd gentoo # zfs create -o mountpoint=/var/tmp rpool/tmp/var

Note that ZFS will auto-mount these datasets as they're created, at the given mount points (relative to the altroot specified at zpool creation). Fill in the actual name for "USER". Also set up swap (optional, but recommended).

livecd ~ # zfs create -o sync=always -o primarycache=metadata -o secondarycache=none -o volblocksize=4K -V 1G rpool/swap
livecd ~ # mkswap -f /dev/zvol/rpool/swap
Setting up swapspace version 1, size = 1024 MiB (1073737728 bytes)
no label, UUID=c1e2b76d-717a-4cad-b69f-92e0a9289027
livecd ~ # swapon /dev/zvol/rpool/swap

Continue with the last few settings for our mounts:

livecd ~ # mkdir /mnt/gentoo/boot
livecd ~ # chmod 0111 /mnt/gentoo/boot
livecd ~ # mount /dev/disk/by-partlabel/boot_K82746KA /mnt/gentoo/boot
livecd ~ # chmod 1777 /mnt/gentoo/tmp

The Kernel

Now we continue with a standard Gentoo install, from the installing Stage3 section. When you reach the configuring the kernel section, make sure ext4 file system support is built into the kernel (for our boot partition). With the LiveCD having loaded modules for most or all of our hardware, we can run make localmodconfig to rapidly generate a tidy kernel config. There are a few key settings we'll need to enable LUKS and ZFS support. (The Gentoo wiki calls out some of these.)

Device Drivers --->
  [*] Multiple devices driver support (RAID and LVM)  --->
    <*> Device mapper support
      <*> Crypt target support
Cryptographic API --->
  Length-preserving ciphers and modes  --->
    <*> XTS support
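Before building, it may be worth confirming that these options really ended up built in, since make localmodconfig tends to leave things as modules. This sketch greps a kernel .config for the Kconfig symbols assumed to correspond to the menu entries above (CONFIG_MD, CONFIG_BLK_DEV_DM, CONFIG_DM_CRYPT and CONFIG_CRYPTO_XTS, plus CONFIG_EXT4_FS for the boot partition); check the names against menuconfig's help screens for your kernel version. The helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list which of the options discussed above are not
# built in (=y) in a kernel .config; returns non-zero if any are missing.
# The CONFIG_ names are assumed to match the menu entries; verify them.
check_kconfig() {
    config=$1 missing=0
    for opt in CONFIG_MD CONFIG_BLK_DEV_DM CONFIG_DM_CRYPT CONFIG_CRYPTO_XTS CONFIG_EXT4_FS; do
        if ! grep -q "^$opt=y\$" "$config"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_kconfig /usr/src/linux/.config
```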

Be sure to also include any hardware drivers that your machine will depend on for boot (especially SATA, and maybe Ethernet, USB and/or mass storage if you'd like to use those in the initramfs for recovery). These should all be built into the kernel. (They could be built into the initramfs, but I think it's easier to just put them directly in the kernel.) Running lspci -k will tell you the (loaded) modules that power your hardware; search for those entries in menuconfig and enable them. Install the kernel, but then we diverge at the Building an initramfs step. We'll customize ours to handle both LUKS encryption and a ZFS root filesystem.
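lspci -k prints a "Kernel driver in use:" line under each device that has a driver bound. This small filter (a hypothetical helper) collects those names so you have a list to work through. Module names don't always match the Kconfig symbol (the ahci driver is CONFIG_SATA_AHCI, for example), so menuconfig's / search may need a little guesswork.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -k` output on stdin and print each
# distinct "Kernel driver in use" name once, sorted.
drivers_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p' | sort -u
}

# Usage: lspci -k | drivers_in_use
```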

First install the ZFS tools (and kernel modules, via dependency).

(chroot) livecd ~ # emerge -va zfs

These are the packages that would be merged, in order:
...
(chroot) livecd ~ # rc-update add zfs-mount sysinit

When changing kernels in the future, be prepared to run emerge @module-rebuild to make sure these out-of-tree modules are built for the kernel being used.

Boot Configuration

We've completed the Kernel section; continue with the Handbook at the Configuring the system section. The fstab should look something like:

# <fs>                  <mountpoint>    <type>  <opts>          <dump/pass>
/dev/disk/by-partlabel/boot_K82746KA /boot           ext4    noauto,noatime  1 2
/dev/zvol/rpool/swap    none            swap    sw              0 0

But once we reach the Configuring the bootloader section, we diverge again. First, and simply, we're going to install the GRUB 2 bootloader to all four disks, so that if one disk fails, we'll still be able to boot from the others:

(chroot) livecd ~ # cat > /etc/portage/package.use/grub
# No eye candy, thanks.
sys-boot/grub -fonts -themes
^D
(chroot) livecd ~ # emerge -vat grub

These are the packages that would be merged, in reverse order:

Calculating dependencies... done!
[ebuild  N     ] sys-boot/grub-2.06-r6:2/2.06-r6::gentoo ...
...
(chroot) livecd ~ # for D in a b c d; do grub-install /dev/sd$D; done
Installing for i386-pc platform.
Installation finished. No error reported.
Installing for i386-pc platform.
Installation finished. No error reported.
Installing for i386-pc platform.
Installation finished. No error reported.
Installing for i386-pc platform.
Installation finished. No error reported.

Next we need an initramfs to hold enough user space tools, unencrypted, to mount our encrypted ZFS root file system. We're going to use tranquil-initramfs for this. We also need to install a few of its dependencies before it will run.

(chroot) livecd ~ # emerge -vat dev-vcs/git cryptsetup busybox sudo

These are the packages that would be merged, in reverse order:

Calculating dependencies... done!
...
[ebuild  N     ] sys-apps/busybox-1.34.1-r1::gentoo ...
...
[ebuild  N     ] sys-fs/cryptsetup-2.6.1:0/12::gentoo ...
...
[ebuild  N     ] dev-vcs/git-2.39.2::gentoo ...
...
(chroot) livecd ~ # git clone https://github.com/arantius/tranquil-initramfs.git
...
(chroot) livecd ~ # cd tranquil-initramfs
(chroot) livecd tranquil-initramfs # ./mkinitrd.py
[*] Checking preliminary binaries ...
[*] Creating temporary directory at /root/tranquil-initramfs/bi-362364694 ...
[*] Checking required files ...
[+] Using LUKS
[+] Using ZFS
[*] Copying binaries ...
[*] Copying modules ...
[*] Generating modprobe information ...
depmod: WARNING: could not open modules.builtin.modinfo at /root/tranquil-initramfs/bi-362364694/lib/modules/6.1.19-gentoo-x86_64: No such file or directory
[*] Copying library dependencies ...
[*] Creating symlinks ...
[*] Performing finishing steps ...
[*] Creating the initramfs ...
103779 blocks
[*] Please copy "initrd-6.1.19-gentoo-x86_64" to your /boot directory

This initramfs contains the ZFS modules, which are necessary to import the pool. They're tied to the kernel version, so if you ever upgrade (or downgrade), you'll need to rebuild the initramfs to include that kernel's matching modules (after an emerge @module-rebuild, to make them available). You might need to (create and) edit config.ini in the tranquil-initramfs directory to include e.g. extra files and/or modules; check the documentation in the repository cloned above.
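Because the kernel, its ZFS modules and the initramfs all have to agree on the kernel version, a quick check of /boot after each upgrade can catch a forgotten rebuild. This sketch assumes the vmlinuz-<version> and initrd-<version> naming used in this article; the helper name is hypothetical, and it only checks that the files exist, not that the initramfs contents are current.

```shell
#!/bin/sh
# Hypothetical helper: for every vmlinuz-<version> in a boot directory,
# report whether a matching initrd-<version> exists; non-zero if any lack one.
check_initrds() {
    boot=$1 status=0
    for K in "$boot"/vmlinuz-*; do
        [ -e "$K" ] || continue        # no kernels: the glob stayed literal
        version=${K##*/vmlinuz-}
        if [ -e "$boot/initrd-$version" ]; then
            echo "ok: $version"
        else
            echo "no initrd for $version"
            status=1
        fi
    done
    return "$status"
}

# Usage: check_initrds /boot
```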

Now we configure grub to boot the kernel and initramfs we just made. By hand! I don't want the complex / graphical things that the default tools do.

(chroot) livecd tranquil-initramfs # cp -a initrd-6.1.19-gentoo-x86_64 /boot
(chroot) livecd ~ # cd /boot
(chroot) livecd /boot # ln -s vmlinuz-$(ls /lib/modules|sort -V|tail -1) kernel
(chroot) livecd /boot # cp -aL kernel kernel.old
(chroot) livecd /boot # ln -s initrd-$(ls /lib/modules|sort -V|tail -1) initrd
(chroot) livecd /boot # cp -aL initrd initrd.old
(chroot) livecd /boot # cat > /boot/grub/grub.cfg
debug all

if [ -s $prefix/grubenv ]; then
  load_env
fi

menuentry 'Gentoo GNU/Linux' {
  insmod part_gpt
  set root='hd0,gpt2'

  echo    'Loading Linux ...'
  linux   /kernel consoleblank=0 crashkernel=64M enc_type=pass triggers=luks,zfs redetect by=/dev/mapper enc_drives=/dev/disk/by-partlabel/crypt_* root=rpool/root
  echo    'Loading initial ramdisk ...'
  initrd  /initrd
}

menuentry 'Gentoo GNU/Linux (old)' {
  insmod part_gpt
  set root='hd0,gpt2'

  echo    'Loading Linux ...'
  linux   /kernel.old consoleblank=0 crashkernel=64M enc_type=pass triggers=luks,zfs redetect by=/dev/mapper enc_drives=/dev/disk/by-partlabel/crypt_* root=rpool/root
  echo    'Loading initial ramdisk ...'
  initrd  /initrd.old
}
^D

Some of the details will depend on your specific set-up.

We're just about done! Continue with the remainder of the Handbook, skipping "Configuring the bootloader", which we've just done by hand.

Appendix: Recovery

Should you ever need to reboot during installation, or later boot from the livecd for recovery, it would go something like this:

livecd ~ # for D in /dev/disk/by-partlabel/crypt_*; do cryptsetup luksOpen "$D" "$(basename $D | sed -e 's/crypt_/vault_/')"; done
Enter passphrase for /dev/disk/by-partlabel/crypt_JETMMKKJ:
Enter passphrase for /dev/disk/by-partlabel/crypt_K82746KA:
Enter passphrase for /dev/disk/by-partlabel/crypt_WHU2I04G:
Enter passphrase for /dev/disk/by-partlabel/crypt_Y9KWTIL7:
livecd ~ # zpool import -fR /mnt/gentoo -d /dev/mapper rpool
livecd ~ # mount /dev/disk/by-partlabel/boot_K82746KA /mnt/gentoo/boot
livecd ~ # mount --rbind --make-rslave /dev /mnt/gentoo/dev
livecd ~ # mount --rbind --make-rslave /sys /mnt/gentoo/sys
livecd ~ # mount -t proc none /mnt/gentoo/proc
livecd ~ # mount -t devpts devpts /mnt/gentoo/dev/pts
livecd ~ # chroot /mnt/gentoo /bin/bash

Appendix: Links

I relied on a number of existing sources to figure out how to do this. Most of this information came from Gentoo Hardened ZFS rootfs with dm-crypt/luks 0.6.2 which is quite similar to this article. I added some more details from the Funtoo wiki's ZFS Install Guide and the Gentoo wiki's ZFS page.
