Installing Gentoo Into a LUKS-Encrypted ZFS Root
2013-12-31 14:31 - Linux
Note: this is a 2023 rewrite of an article originally from 2013. See earlier versions at archive.org.
For the past few years I've relied on ZFS for my backup system, with atomic snapshots taken regularly, plus zfs send and zfs recv making geographic redundancy an easy extra layer. I also LUKS encrypt the disks for peace of mind, especially in the case of failed disks. This document is a detailed explanation of how I set up a brand new machine from scratch with an encrypted ZFS root file system.
Since I first wrote this document, ZFS has gained native encryption. There are pros and cons to each approach. But for me, I've got significant data sets already on LUKS, and no desire to buy sufficient new drives just to have space to migrate to another solution. So I continue doing things this way.
Getting Started
Work through the Gentoo Handbook until the Preparing the Disks section. You may want to use my LiveCD with ZFS support as the boot medium, as I'm doing.
I've selected raidz2 across four disks, so that any two drive failures result in no downtime and no data loss. I value the enhanced safety and reliability of this setup more than higher storage efficiency or even performance. You need not make the same decision, but the examples below detail such a setup.
I used to prefer a completely separate (unencrypted) boot device. I originally thought that it was "best practice" to provision ZFS on raw disks, with no partitions. I'm no longer confident that this is true, and I've come up with reasons to prefer partitions. It can be difficult to hook up enough disks to have a separate one dedicated to the boot volume. It's common for motherboards to provide four SATA ports. If that's three ZFS volumes plus one boot disk, they're all used up. Where do you put your extra disk when swapping in a replacement or an upgrade? Even if you've got the space, there's the expense of another device, plus the need for a separate plan for its failure. If you've got the extra connectors and devices, maybe go for an L2ARC or ZIL instead. I'm skipping dedicated boot devices from here on out.
So now I'm using a partitioned scheme. I reserve a bit of space for the boot volume plus some for the grub bootloader (as required for GPT partitioning). (I don't, however, use EFI or an EFI system partition.) Then I use the rest of the disk for the main data volume. You may select the size of your boot partition(s) freely.
(A quick note on the examples, before this first one: The shell prompts are colored red, the inputs I type are colored green, and the rest is output. Your output will likely differ in small details; I'm trusting you to be intelligent enough to figure that out if you're following this as a guide. But I find archiving the output still makes it easier to follow along. Your inputs may differ as well; be careful to make sure you are referencing (e.g.) the proper disk at every point!) Also note that all examples below are based on a test installation in a virtual machine.
I prefer to use GPT partitions with the data partitions labeled like crypt_something so that "something" is meaningful. (I use the model and/or serial number of the drive, so that if/when ZFS reports a problem with a volume, I'll know which physical disk it's on!) Any swap will happen inside the encrypted ZFS pool.
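If you don't have the model/serial numbers handy, lsblk can report them directly (a quick aside, not a required step; smartctl -i per drive works too):

livecd ~ # lsblk -d -o NAME,MODEL,SERIAL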
livecd ~ # fdisk /dev/sda

Welcome to fdisk (util-linux 2.38.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xcefdae72.

Command (m for help): g
Created a new GPT disklabel (GUID: E5CD736D-D284-8F41-A8E7-D73BD158CFB3).

Command (m for help): n
Partition number (1-128, default 1):
First sector (2048-97677278, default 2048):
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-97677278, default 97677278): +2M

Created a new partition 1 of type 'Linux filesystem' and of size 2 MiB.

Command (m for help): n
Partition number (2-128, default 2):
First sector (6144-16777182, default 6144):
Last sector, +/-sectors or +/-size{K,M,G,T,P} (6144-16777182, default 16775167): +256M

Created a new partition 2 of type 'Linux filesystem' and of size 256 MiB.

Command (m for help): t
Partition number (1,2, default 2): 1
Partition type or alias (type L to list all): 4

Changed type of partition 'Linux filesystem' to 'BIOS boot'.

Command (m for help): n
Partition number (3-128, default 3):
First sector (530432-16777182, default 530432):
Last sector, +/-sectors or +/-size{K,M,G,T,P} (530432-16777182, default 16775167):

Created a new partition 3 of type 'Linux filesystem' and of size 7.7 GiB.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

livecd ~ # gdisk /dev/sda
GPT fdisk (gdisk) version 1.0.8

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.

Command (? for help): c
Partition number (1-3): 1
Enter name: grub_K82746KA

Command (? for help): c
Partition number (1-3): 2
Enter name: boot_K82746KA

Command (? for help): c
Partition number (1-3): 3
Enter name: crypt_K82746KA

Command (? for help): p
Disk /dev/sda: 16777216 sectors, 8.0 GiB
Model: VMware Virtual S
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 4A317F72-1F12-824A-BFCF-B7B6362A6877
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 2048, last usable sector is 16777182
Partitions will be aligned on 2048-sector boundaries
Total free space is 2015 sectors (1007.5 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048            6143   2.0 MiB    EF02  grub_K82746KA
   2            6144          530431   256.0 MiB  8300  boot_K82746KA
   3          530432        16775167   7.7 GiB    8300  crypt_K82746KA

Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/sda.
The operation has completed successfully.
Since we're putting ZFS in a partition, it's important to make sure that the partition start is aligned with a physical disk sector. This virtual disk has 512 byte sectors, but real drives today typically have 4,096 byte sectors. The partitions are using logical sectors, so it is possible to misalign the physical sectors. I believe the best thing to do is make sure that your main partition starts on a multiple-of-eight sector, which should be 4k aligned even for 512 byte logical sectors. In this example the data partition starts at sector 530432 which is 66304 * 8.
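If you'd rather verify than trust the defaults, sysfs exposes each partition's starting sector; a quick check like this (a sketch, assuming the same /dev/sda3 data partition as above) should print 0 for a properly 4k-aligned partition:

livecd ~ # echo $(( $(cat /sys/block/sda/sda3/start) % 8 ))
0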
Above were the steps for /dev/sda with the example serial number "K82746KA" encoded into the partition labels. Do the same steps for your remaining drives. You can use sgdisk to copy the partitions to identically sized drives, then tweak the names afterwards. Note that --replicate takes the destination drive.
livecd ~ # sgdisk --replicate=/dev/sdb /dev/sda
The operation has completed successfully.
livecd ~ # gdisk /dev/sdb
(... change partition names here ...)
We only use one of the boot partitions: it's the one part of the drive outside of both the LUKS encryption and the ZFS system that (in my case) gets automatically managed backups. You should add a scheme for keeping the boot volume backed up as well (a simple sketch follows the mkfs output below). For now, simply put an ext4 filesystem in place.
livecd ~ # mkfs.ext4 -T small -U random /dev/disk/by-partlabel/boot_K82746KA
mke2fs 1.47.0 (5-Feb-2023)
Creating filesystem with 262144 1k blocks and 65536 inodes
Filesystem UUID: a8182f88-848a-448b-aac6-26a2fd3ee9ac
Superblock backups stored on blocks:
	8193, 24577, 40961, 57345, 73729, 204801, 221185

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
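Since the boot partition sits outside the pool, ZFS snapshots won't cover it. One simple scheme (just a sketch, not part of the original steps; the destination path is arbitrary) is to archive it into a directory that does live on the pool, once the installed system is up and running:

# run on the installed system, e.g. from cron
tar -czf /root/boot-backup-$(date +%F).tar.gz -C /boot .

The latest copy of /boot then rides along with the regular ZFS snapshots and off-site replication.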
Next, use LUKS to encrypt all data partitions:
livecd ~ # for D in /dev/disk/by-partlabel/crypt_*; do cryptsetup luksFormat "$D"; done

WARNING!
========
This will overwrite data on /dev/disk/by-partlabel/crypt_JETMMKKJ irrevocably.

Are you sure? (Type uppercase yes): YES
Enter passphrase for /dev/disk/by-partlabel/crypt_JETMMKKJ:
Verify passphrase:
System is out of entropy while generating volume key.
Please move mouse or type some text in another window to gather some random events.
Generating key (68% done).
Generating key (87% done).
Generating key (100% done).
(... repeated for each drive ...)
I'm using the same passphrase for all disks, and suggest you do the same. Either way, LUKS will let you change, add, or remove passphrases at any point in the future. If you get stuck waiting for randomness, a ping -f from another machine to this one will generate kernel entropy.
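LUKS key management is all done through cryptsetup. For example (a quick sketch of the relevant subcommands, each of which prompts interactively), adding, changing, or removing a passphrase later looks like:

cryptsetup luksAddKey /dev/disk/by-partlabel/crypt_K82746KA
cryptsetup luksChangeKey /dev/disk/by-partlabel/crypt_K82746KA
cryptsetup luksRemoveKey /dev/disk/by-partlabel/crypt_K82746KA

Each disk has its own independent LUKS header, so repeat (or loop, as in the examples here) for every crypt_* partition.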
With the encrypted volumes initialized, open them up for use:
livecd ~ # for D in /dev/disk/by-partlabel/crypt_*; do cryptsetup luksOpen "$D" "$(basename $D | sed -e 's/crypt_/vault_/')"; done
Enter passphrase for /dev/disk/by-partlabel/crypt_JETMMKKJ:
Enter passphrase for /dev/disk/by-partlabel/crypt_K82746KA:
Enter passphrase for /dev/disk/by-partlabel/crypt_WHU2I04G:
Enter passphrase for /dev/disk/by-partlabel/crypt_Y9KWTIL7:
livecd ~ # ls /dev/mapper
control  vault_JETMMKKJ  vault_K82746KA  vault_WHU2I04G  vault_Y9KWTIL7
Initializing ZFS
So now we have four encrypted partitions, each called crypt_something, and their equivalent plaintext devices, each called vault_something. We can set up a zpool with them, and data sets within that pool.
livecd ~ # modprobe zfs
livecd ~ # zpool create -m none -R /mnt/gentoo -o ashift=12 -O atime=off -O xattr=sa rpool raidz2 vault_JETMMKKJ vault_K82746KA vault_WHU2I04G vault_Y9KWTIL7
This complicated command will:
- -m none
- Not mount this pool.
- -R /mnt/gentoo
- Sets the "altroot", i.e. the temporary alternative mount point. This value is appropriate for the Handbook driven install process.
- -o ashift=12
- Set the block size to 4k, or 2^12 bytes. (If your sectors are 512 bytes (unlikely), you should omit this.)
- -O atime=off
- Not record access times.
- -O xattr=sa
- Store extended attributes in inodes rather than hidden files, which is supposed to make Samba more performant.
- rpool
- The name of the pool. (I wish I had a better scheme, but I just call it the "r"oot pool.)
- raidz2
- The type of the pool: RAID-like, with two disks of redundancy.
- vault_JETMMKKJ vault_K82746KA vault_WHU2I04G vault_Y9KWTIL7
- The devices making up this pool; ZFS can find them by these bare names (/dev/mapper is one of its default search paths), and the short names keep later zpool status output easy to read.
And, in good Unix tradition, it produces no output upon success.
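If you'd like confirmation that the pool came up the way you intended, a couple of read-only checks (nothing specific to this setup, just standard ZFS tooling) will show the raidz2 vdev, its four vault_* members, and the usable capacity:

livecd ~ # zpool status rpool
livecd ~ # zpool list rpool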
Now to create the datasets within the pool. ZFS datasets are hierarchical. I've created two top level data sets: root and tmp. The first gets regular snapshots, which get replicated off site. The second does not, because it contains only files that are easy to replace (linux kernel source, portage) or are not worth backing up (large scratch files, media archives).
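To make that split concrete, the ongoing backup side looks roughly like this once the system is running (a sketch only; the snapshot name, backuphost, and backuppool are placeholders, and incremental sends via zfs send -i are the usual refinement):

zfs snapshot -r rpool/root@2023-06-01
zfs send -R rpool/root@2023-06-01 | ssh backuphost zfs recv -d backuppool

Because rpool/tmp is a sibling of rpool/root rather than a child, the recursive snapshot and send never touch it.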
First, create the root dataset, and within it initialize mount points with no write permissions, to reduce the chances of accidentally putting files there. (ZFS refuses to mount into a non-empty directory, by default.)
livecd ~ # zfs create -o mountpoint=/ rpool/root
livecd ~ # for D in /home /tmp /var; do mkdir "/mnt/gentoo/$D"; chmod 0111 "/mnt/gentoo/$D"; done
Now create the rest of the ZFS data sets:
livecd ~ # cd /mnt/gentoo
livecd ~ # zfs create -o mountpoint=/var rpool/root/var
livecd ~ # mkdir var/tmp; chmod 0111 var/tmp
livecd ~ # zfs create -o mountpoint=/home rpool/root/home
livecd ~ # mkdir home/USER; chmod 0111 home/USER
livecd ~ # zfs create -o mountpoint=/home/USER rpool/root/home/USER
livecd ~ # mkdir home/USER/tmp; chmod 0111 home/USER/tmp
livecd ~ # zfs create -o mountpoint=none rpool/tmp
livecd ~ # zfs create -o mountpoint=/tmp -o devices=off -o exec=off -o setuid=off rpool/tmp/root
livecd ~ # zfs create -o mountpoint=/home/USER/tmp -o devices=off -o exec=off -o setuid=off rpool/tmp/USER
livecd ~ # zfs create -o mountpoint=/usr/src rpool/tmp/linux-src
livecd ~ # zfs create -o mountpoint=/var/db/repos rpool/tmp/portage
livecd ~ # zfs create -o mountpoint=/var/tmp rpool/tmp/var
Note that ZFS will auto-mount these data sets as they're created, at the given mount points (relative to the altroot specified at zpool creation). Fill in the actual name in place of "USER". Also, optionally (but recommended), set up swap.
livecd ~ # zfs create -o sync=always -o primarycache=metadata -o secondarycache=none -o volblocksize=4K -V 1G rpool/swap
livecd ~ # mkswap -f /dev/zvol/rpool/swap
Setting up swapspace version 1, size = 1024 MiB (1073737728 bytes)
no label, UUID=c1e2b76d-717a-4cad-b69f-92e0a9289027
livecd ~ # swapon /dev/zvol/rpool/swap
Continue with the last few settings for our mounts:
livecd ~ # mkdir /mnt/gentoo/boot
livecd ~ # chmod 0111 /mnt/gentoo/boot
livecd ~ # mount /dev/disk/by-partlabel/boot_K82746KA /mnt/gentoo/boot
livecd ~ # chmod 1777 /mnt/gentoo/tmp
The Kernel
We're doing a standard Gentoo install now, from the Installing stage3 section. When you reach the Configuring the kernel section, make sure ext4 file system support is hard-coded in (for our boot partition). With the LiveCD having loaded modules for most or all of our hardware, we can run make localmodconfig to rapidly generate a tidy kernel config. There are a few key settings we'll need to enable LUKS and ZFS support. (The Gentoo wiki calls out some of these.)
Device Drivers  --->
    [*] Multiple devices driver support (RAID and LVM)  --->
        <*> Device mapper support
        <*> Crypt target support
Cryptographic API  --->
    Length-preserving ciphers and modes  --->
        <*> XTS support
Be sure to also include any hardware drivers that your machine will depend on for boot (especially e.g. SATA, maybe Ethernet, maybe USB and/or mass storage if you'd like to use those in the initramfs recovery). These should all be built into the kernel. (They could instead be built into the initramfs, but I think it's easier to just put them directly in the kernel.) Running lspci -k will tell you the (loaded) modules that power your hardware; search for those entries in menuconfig and enable them. Install the kernel, but then we diverge at the Building an initramfs step. We'll customize ours to handle both LUKS encryption and a ZFS root filesystem.
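To recap the kernel steps in one place, the usual sequence looks roughly like this (a sketch, assuming /usr/src/linux points at the kernel sources per the Handbook):

(chroot) livecd ~ # lspci -k                    # note the modules driving boot-critical hardware
(chroot) livecd ~ # cd /usr/src/linux
(chroot) livecd linux # make localmodconfig     # seed the config from the LiveCD's loaded modules
(chroot) livecd linux # make menuconfig         # set the options above, plus your drivers, to built-in
(chroot) livecd linux # make -j$(nproc) && make modules_install && make install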
First install the ZFS tools (and kernel modules, via dependency).
(chroot) livecd ~ # emerge -va zfs

These are the packages that would be merged, in order:
...
(chroot) livecd ~ # rc-update add zfs-mount sysinit
When changing kernels in the future, be prepared to emerge @module-rebuild to make sure these out-of-tree modules are built for the kernel being used.
Boot Configuration
We've completed the Kernel section; continue with the Handbook at the Configuring the system section. The fstab should look something like:
# <fs>                                  <mountpoint>  <type>  <opts>           <dump/pass>
/dev/disk/by-partlabel/boot_K82746KA    /boot         ext4    noauto,noatime   1 2
/dev/zvol/rpool/swap                    none          swap    sw               0 0
But once we reach the Configuring the bootloader section, again we diverge. First, simply, we're going to install the grub2 bootloader to all four disks. In case of failure of any one disk, we'll be able to boot from the others:
(chroot) livecd ~ # cat > /etc/portage/package.use/grub
# No eye candy, thanks.
sys-boot/grub -fonts -themes
^D
(chroot) livecd ~ # emerge -vat grub

These are the packages that would be merged, in reverse order:

Calculating dependencies... done!
[ebuild  N    ] sys-boot/grub-2.06-r6:2/2.06-r6::gentoo ...
...
(chroot) livecd ~ # for D in a b c d; do grub-install /dev/sd$D; done
Installing for i386-pc platform.
Installation finished. No error reported.
Installing for i386-pc platform.
Installation finished. No error reported.
Installing for i386-pc platform.
Installation finished. No error reported.
Installing for i386-pc platform.
Installation finished. No error reported.
Next we need an initramfs to hold enough user space tools, unencrypted, to mount our encrypted ZFS root file system. We're going to use tranquil-initramfs for this. We also need to install a few of its dependencies before it will run.
(chroot) livecd ~ # emerge -vat dev-vcs/git cryptsetup busybox sudo

These are the packages that would be merged, in reverse order:

Calculating dependencies... done!
...
[ebuild  N    ] sys-apps/busybox-1.34.1-r1::gentoo ...
...
[ebuild  N    ] sys-fs/cryptsetup-2.6.1:0/12::gentoo ...
...
[ebuild  N    ] dev-vcs/git-2.39.2::gentoo ...
...
(chroot) livecd ~ # git clone https://github.com/arantius/tranquil-initramfs.git
...
(chroot) livecd ~ # cd tranquil-initramfs
(chroot) livecd tranquil-initramfs # ./mkinitrd.py
[*] Checking preliminary binaries ...
[*] Creating temporary directory at /root/tranquil-initramfs/bi-362364694 ...
[*] Checking required files ...
[+] Using LUKS
[+] Using ZFS
[*] Copying binaries ...
[*] Copying modules ...
[*] Generating modprobe information ...
depmod: WARNING: could not open modules.builtin.modinfo at /root/tranquil-initramfs/bi-362364694/lib/modules/6.1.19-gentoo-x86_64: No such file or directory
[*] Copying library dependencies ...
[*] Creating symlinks ...
[*] Performing finishing steps ...
[*] Creating the initramfs ...
103779 blocks
[*] Please copy "initrd-6.1.19-gentoo-x86_64" to your /boot directory
This initramfs contains the ZFS modules, necessary of course to import the pool. They're tied to the kernel version, so if you ever upgrade (or downgrade), you'll need to rebuild the initramfs to include that kernel's matching modules (after an emerge @module-rebuild, to make them available). You might need to (create and) edit config.ini in the tranquil-initramfs directory, to include e.g. extra files and/or modules. Check the documentation at the page linked above.
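So after a kernel change, the refresh looks roughly like this (a sketch; the initrd file name is a placeholder for whatever mkinitrd.py reports, and the symlink shuffle refers to the /boot layout set up in the next step):

(chroot) livecd ~ # emerge @module-rebuild
(chroot) livecd ~ # cd tranquil-initramfs && ./mkinitrd.py
(chroot) livecd tranquil-initramfs # cp initrd-<new version> /boot
(then repoint the /boot/kernel and /boot/initrd symlinks at the new files)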
Now we configure grub to boot the kernel and initramfs we just made. By hand! I don't want the complex / graphical things that the default tools do.
(chroot) livecd ~ # cp -a initrd-6.1.19-gentoo-x86_64 /boot
(chroot) livecd ~ # cd /boot
(chroot) livecd /boot # ln -s vmlinuz-$(ls /lib/modules|sort|tail -1) kernel
(chroot) livecd /boot # cp -aL kernel kernel.old
(chroot) livecd /boot # ln -s initrd-$(ls /lib/modules|sort|tail -1) initrd
(chroot) livecd /boot # cp -aL initrd initrd.old
(chroot) livecd /boot # cat > /boot/grub/grub.cfg
debug all

if [ -s $prefix/grubenv ]; then
  load_env
fi

menuentry 'Gentoo GNU/Linux' {
  insmod part_gpt
  set root='hd0,gpt2'
  echo 'Loading Linux ...'
  linux /kernel consoleblank=0 crashkernel=64M enc_type=pass triggers=luks,zfs redetect by=/dev/mapper enc_drives=/dev/disk/by-partlabel/crypt_* root=rpool/root
  echo 'Loading initial ramdisk ...'
  initrd /initrd
}
menuentry 'Gentoo GNU/Linux (old)' {
  insmod part_gpt
  set root='hd0,gpt2'
  echo 'Loading Linux ...'
  linux /kernel.old consoleblank=0 crashkernel=64M enc_type=pass triggers=luks,zfs redetect by=/dev/mapper enc_drives=/dev/disk/by-partlabel/crypt_* root=rpool/root
  echo 'Loading initial ramdisk ...'
  initrd /initrd.old
}
^D
Some of the details will depend on your specific set-up.
We're just about done! Continue with the rest of the Handbook from the Configuring the system section (skipping "Configuring the bootloader", which we've just covered).
Appendix: Recovery
Should you ever need to reboot during installation, or later boot from the livecd for recovery, it would go something like this:
livecd ~ # for D in /dev/disk/by-partlabel/crypt_*; do cryptsetup luksOpen "$D" "$(basename $D | sed -e 's/crypt_/vault_/')"; done
Enter passphrase for /dev/disk/by-partlabel/crypt_JETMMKKJ:
Enter passphrase for /dev/disk/by-partlabel/crypt_K82746KA:
Enter passphrase for /dev/disk/by-partlabel/crypt_WHU2I04G:
Enter passphrase for /dev/disk/by-partlabel/crypt_Y9KWTIL7:
livecd ~ # zpool import -fR /mnt/gentoo -d /dev/mapper rpool
livecd ~ # mount /dev/disk/by-partlabel/boot_K82746KA /mnt/gentoo/boot
livecd ~ # mount --rbind --make-rslave /dev /mnt/gentoo/dev
livecd ~ # mount --rbind --make-rslave /sys /mnt/gentoo/sys
livecd ~ # mount -t proc none /mnt/gentoo/proc
livecd ~ # mount -t devpts devpts /mnt/gentoo/dev/pts
livecd ~ # chroot /mnt/gentoo /bin/bash
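When you're finished in the chroot, unwinding cleanly looks roughly like this (a sketch; exporting the pool is what lets the next boot, or the next recovery session, import it without -f):

(chroot) livecd / # exit
livecd ~ # umount -R /mnt/gentoo/boot /mnt/gentoo/dev /mnt/gentoo/sys /mnt/gentoo/proc
livecd ~ # zpool export rpool
livecd ~ # reboot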
Appendix: Links
I relied on a number of existing sources to figure out how to do this. Most of this information came from Gentoo Hardened ZFS rootfs with dm-crypt/luks 0.6.2 which is quite similar to this article. I added some more details from the Funtoo wiki's ZFS Install Guide and the Gentoo wiki's ZFS page.