Installing Gentoo Into a LUKS-Encrypted ZFS Root
2013-12-31 14:31 - Linux
Background
This post covers the steps I personally take to set up a new Gentoo server. I encrypt the entire drive (except the boot partition) with LUKS. I use LUKS rather than ZFS native encryption for two reasons: 1) I started this way before ZFS native encryption existed, and 2) neither way is better than the other for my needs (and native might be worse; the Gentoo Wiki recommends against ZFS native encryption, for generally the same reasons that reasonable people consider it too immature). In a homelab setting it would be impractical for me to migrate my large existing data sets, and I prefer to keep my setup homogeneous across machines.
I've re-written this post a few times over the years. See earlier versions at archive.org. This post serves as documentation for future me, so I keep it around and up to date.
Getting Started
Work through the Gentoo Handbook until (but not including) the Preparing the Disks section. You may want to use my LiveCD with ZFS support as the boot medium, as I'm doing.
I've selected raidz2 across four disks, so that even two concurrent drive failures cause neither downtime nor data loss. I value the safety and reliability of this setup over higher storage efficiency or performance. You need not make the same decision, but the examples below detail such a setup.
I use partitions to reserve a bit of space for the (unencrypted) boot volume, then use the rest of the disk for the main data volume. (Swap goes inside the encrypted data volume: I don't want to write potentially sensitive memory contents to disk unencrypted!) I use GPT partitions, which can carry arbitrary labels. Further, I put identifying information (i.e. the model and serial number) for the corresponding physical drive in the label. Later, if/when ZFS complains about a problem, it uses this label and I know which physical disk needs attention. Any labels will work, but each must be unique on the system.
Since we're putting ZFS in a partition, it's important to make sure that the partition start is aligned with a physical disk sector. Drives today (at least spinning ones like I'm using) typically have 4,096 byte sectors, unlike the historical 512 byte default. (Ignore the sector sizes in the example output below: the virtual machine I'm using exposes the historical 512 byte sectors.) Partitions are addressed in logical (512 byte) sectors, so it is possible to misalign a partition with the physical sectors. I believe the best thing to do is make sure that your main partition starts on a multiple-of-eight sector, which is 4k aligned with 512 byte logical sectors (8 * 512 = 4,096). In the example below, the data partition starts at sector 1050624, which is 131328 * 8.
A quick note on the examples before this first one: The (shell) prompts are colored red, the inputs I type are colored green, and the rest is the output (sometimes with interactive prompts and inputs mixed in). Your output will likely differ in small or large details; I'm trusting you to be intelligent enough to figure that out if you're following this as a guide. But I find archiving the output still makes it easier to follow along. Your inputs may differ as well; be careful to make sure you are referencing (e.g.) the proper disk at every point!
Also note that all examples below are based on a test installation in a virtual machine. To confidently write this document correctly, I tend to practice to completion at least once, then repeat it beginning to end (and again if changes were needed), and only then use it for real (i.e. on real hardware). A virtual machine makes restarting and retrying much easier.
livecd ~ # gdisk /dev/sda
GPT fdisk (gdisk) version 1.0.10
Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present
Creating new GPT entries in memory.
Command (? for help): n
Partition number (1-128, default 1):
First sector (34-33554398, default = 2048) or {+-}size{KMGTP}:
Last sector (2048-33554398, default = 33552383) or {+-}size{KMGTP}: +512M
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): ef00
Changed type of partition to 'EFI system partition'
Command (? for help): n
Partition number (2-128, default 2):
First sector (34-33554398, default = 1050624) or {+-}size{KMGTP}:
Last sector (1050624-33554398, default = 33552383) or {+-}size{KMGTP}:
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): 8309
Changed type of partition to 'Linux LUKS'
Command (? for help): p
Disk /dev/sda: 33554432 sectors, 16.0 GiB
Model: VMware Virtual S
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 519D51DB-6D30-4EC5-98E4-EF98D7E42E99
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 33554398
Partitions will be aligned on 2048-sector boundaries
Total free space is 4029 sectors (2.0 MiB)
Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048         1050623   512.0 MiB   EF00  EFI system partition
   2         1050624        33552383   15.5 GiB    8309  Linux LUKS
Command (? for help): w
Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING PARTITIONS!!
Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/sda.
The operation has completed successfully.
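The sector numbers in that output can be checked with a little shell arithmetic. This is only a sketch, assuming 512 byte logical sectors as reported above; is_4k_aligned and last_sector are hypothetical helper names, not part of gdisk.

```shell
# Succeeds if a partition's start sector falls on a 4,096 byte boundary.
# Args: start sector, logical sector size in bytes (defaults to 512).
# (hypothetical helper, for illustration only)
is_4k_aligned() {
    start=$1
    logical=${2:-512}
    [ $(( start * logical % 4096 )) -eq 0 ]
}

# Last sector of a partition, given its first sector and its size in MiB,
# assuming 512 byte logical sectors.
# (hypothetical helper, for illustration only)
last_sector() {
    first=$1
    size_mib=$2
    echo $(( first + size_mib * 1024 * 1024 / 512 - 1 ))
}

is_4k_aligned 1050624 && echo "1050624: aligned"
is_4k_aligned 1050625 || echo "1050625: not aligned"

# The 512M EFI partition starting at sector 2048 ends at 1050623,
# so the data partition starts at the next sector, 1050624.
last_sector 2048 512
```

Any start sector that gdisk picks on its default 2048-sector boundaries passes this check, since 2048 is itself a multiple of eight.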
There's the first disk partitioned. We have 512M for the EFI partition, then the rest for LUKS+ZFS. Next we copy that partition layout verbatim to the other drives. Finally, we apply names to each partition.
livecd ~ # for D in b c d; do sgdisk --replicate=/dev/sd$D --randomize-guids /dev/sda; done
The operation has completed successfully.
The operation has completed successfully.
The operation has completed successfully.
The operation has completed successfully.
The operation has completed successfully.
The operation has completed successfully.
livecd ~ # N=Y9KWTIL7; sgdisk --change-name=1:efi_$N --change-name=2:rpool_$N /dev/sda
The operation has completed successfully.
livecd ~ # N=K82746KA; sgdisk --change-name=1:efi_$N --change-name=2:rpool_$N /dev/sdb
The operation has completed successfully.
livecd ~ # N=WHU2I04G; sgdisk --change-name=1:efi_$N --change-name=2:rpool_$N /dev/sdc
The operation has completed successfully.
livecd ~ # N=JETMMKKJ; sgdisk --change-name=1:efi_$N --change-name=2:rpool_$N /dev/sdd
The operation has completed successfully.
This leaves us with labeled partitions so we know both which drive each one is on, plus its purpose.
livecd ~ # ls /dev/disk/by-partlabel/
efi_JETMMKKJ  efi_WHU2I04G  rpool_JETMMKKJ  rpool_WHU2I04G
efi_K82746KA  efi_Y9KWTIL7  rpool_K82746KA  rpool_Y9KWTIL7
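Since the labels must be unique on the system, it can be worth checking a list of them for duplicates before going further. A minimal sketch, assuming the labels are passed as arguments; duplicate_labels is a hypothetical helper name, not part of any tool used here.

```shell
# Print every label that appears more than once among the arguments.
# (hypothetical helper, for illustration only)
duplicate_labels() {
    printf '%s\n' "$@" | sort | uniq -d
}

# Example with a deliberate collision: prints rpool_K82746KA.
duplicate_labels rpool_Y9KWTIL7 rpool_K82746KA rpool_K82746KA
```

With four distinct labels, as in the listing above, the function prints nothing.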
Format the EFI and LUKS partitions. During this setup phase you'll get to type your passphrase quite a few times. Use the same passphrase for all disks.
livecd ~ # for D in /dev/disk/by-partlabel/efi_*; do mkfs.vfat -F 32 $D; done
mkfs.fat 4.2 (2021-01-31)
mkfs.fat 4.2 (2021-01-31)
mkfs.fat 4.2 (2021-01-31)
mkfs.fat 4.2 (2021-01-31)
livecd ~ # for D in /dev/disk/by-partlabel/rpool_*; do cryptsetup luksFormat "$D"; done
WARNING!
========
This will overwrite data on /dev/disk/by-partlabel/rpool_JETMMKKJ irrevocably.
Are you sure? (Type 'yes' in capital letters): YES
Enter passphrase for /dev/disk/by-partlabel/rpool_JETMMKKJ:
Verify passphrase:
(... three more times ...)
If you get stuck waiting for randomness, a ping -f from another machine to this one will generate kernel entropy.
With the encrypted volumes initialized, open them up for use:
livecd ~ # mass-luks-open
Found 4 LUKS volumes:
1 /dev/sdd2 => rpool_JETMMKKJ
2 /dev/sdb2 => rpool_K82746KA
3 /dev/sdc2 => rpool_WHU2I04G
4 /dev/sda2 => rpool_Y9KWTIL7
Enter LUKS decryption passphrase:
Opening LUKS volume [/dev/sdd2] with label [rpool_JETMMKKJ]... success.
Opening LUKS volume [/dev/sdb2] with label [rpool_K82746KA]... success.
Opening LUKS volume [/dev/sdc2] with label [rpool_WHU2I04G]... success.
Opening LUKS volume [/dev/sda2] with label [rpool_Y9KWTIL7]... success.
livecd ~ # ls /dev/mapper
control  rpool_JETMMKKJ  rpool_K82746KA  rpool_WHU2I04G  rpool_Y9KWTIL7
If you're not using my installer ISO with mass-luks-open, do: for D in /dev/disk/by-partlabel/rpool_*; do cryptsetup luksOpen "$D" "$(basename "$D")"; done
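Either way, every volume needs a mapping under /dev/mapper before the pool can be created. A small sketch of a check for that, assuming the mapping names match the partition labels as they do above; missing_mappings is a hypothetical helper name.

```shell
# Print each expected mapping name that is missing from a directory.
# Args: directory (normally /dev/mapper), then the expected names.
# (hypothetical helper, for illustration only)
missing_mappings() {
    dir=$1
    shift
    for name in "$@"; do
        [ -e "$dir/$name" ] || echo "$name"
    done
}

# Prints nothing once all four volumes have been opened.
missing_mappings /dev/mapper rpool_JETMMKKJ rpool_K82746KA rpool_WHU2I04G rpool_Y9KWTIL7
```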
Initializing ZFS
So now we have four encrypted disks (partitions) each called rpool_something and their equivalent plaintext mapped volumes. We can set up a zpool with them, and data sets within that pool.
livecd ~ # zpool create -m none -R /mnt/gentoo -o ashift=12 -o cachefile=none -o compatibility=openzfs-2.2-linux -O atime=on -O relatime=on -O compression=zstd -O xattr=sa rpool raidz2 rpool_JETMMKKJ rpool_K82746KA rpool_WHU2I04G rpool_Y9KWTIL7
Note the lowercase -o which is a pool property versus the uppercase -O which is a (default) property for data sets (in this pool). This complicated command will:
- -m none
- Do not mount this pool.
- -R /mnt/gentoo
- Sets the "altroot", i.e. the temporary alternative mount point. This value is appropriate for the Handbook driven install process.
- -o ashift=12
- Set the block size to 4k, or 2^12 bytes. (If your sectors are 512 bytes (unlikely), you should omit this.)
- -o cachefile=none
- Do not use a cache file. All storage is encrypted, so there's nowhere to read a cache file from at import time.
- -o compatibility=openzfs-2.2-linux
- Limit the maximum allowed ZFS features. This way you can't accidentally upgrade past what your rescue media can handle, and tools won't "remind" you to upgrade past that point either. (You can zpool set compatibility=... pool in the future, even to "off".)
- -O atime=on -O relatime=on
- Enable only relatime (which takes effect only while atime is on).
- -O compression=zstd
- Enable compression. Your CPU can probably compress/decompress faster than a spinning disk can read/write. The zstd method seems best.
- -O xattr=sa
- Store extended attributes in inodes rather than hidden files, which is supposed to make Samba more performant.
- rpool
- The name of the pool. (I wish I had a better scheme, but I just call it the "r"oot pool.)
- raidz2
- The type of the pool, RAID-like with two disks for redundancy.
- rpool_JETMMKKJ rpool_K82746KA rpool_WHU2I04G rpool_Y9KWTIL7
- The devices making up this pool; ZFS can find them just by these short names (which are unlikely to otherwise exist if you've chosen them well) and their brevity will enhance readability.
And, in good Unix tradition, it produces no output upon success.
livecd ~ # zpool status
  pool: rpool
 state: ONLINE
config:
        NAME                STATE     READ WRITE CKSUM
        rpool               ONLINE       0     0     0
          raidz2-0          ONLINE       0     0     0
            rpool_JETMMKKJ  ONLINE       0     0     0
            rpool_K82746KA  ONLINE       0     0     0
            rpool_WHU2I04G  ONLINE       0     0     0
            rpool_Y9KWTIL7  ONLINE       0     0     0
errors: No known data errors
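Two of the numbers behind that zpool create command can be sanity-checked with shell arithmetic. This is a rough sketch only: ashift_for and raidz_usable are hypothetical helper names, the capacity figure assumes equally sized disks, and it ignores ZFS metadata and padding overhead, so real usable space will be somewhat lower.

```shell
# ashift is the base-2 logarithm of the sector size in bytes.
# Expects a power-of-two size. (hypothetical helper, for illustration only)
ashift_for() {
    size=$1
    shift_bits=0
    while [ "$size" -gt 1 ]; do
        size=$(( size / 2 ))
        shift_bits=$(( shift_bits + 1 ))
    done
    echo "$shift_bits"
}

# Approximate usable capacity of a raidz vdev: (disks - parity) * disk size.
# Ignores metadata and padding overhead. (hypothetical helper)
raidz_usable() {
    disks=$1
    parity=$2
    size=$3
    echo $(( (disks - parity) * size ))
}

ashift_for 4096      # 4k sectors
ashift_for 512       # historical 512 byte sectors
raidz_usable 4 2 16  # four 16 GiB disks in raidz2, result in GiB
```

In other words, raidz2 on four disks spends half of the raw space on redundancy, which is the storage efficiency traded away for surviving two failures.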
Now to create the datasets within the pool. ZFS datasets are hierarchical. I've created two top level data sets: root and tmp. Both get regular snapshots, but only root gets replicated off site. Within each of them, I initialize a data set for each distinct combination of mount properties and/or snapshot behavior, starting with the root data set itself. The $NEWUSER variable holds the name of the user account to be created later.
livecd ~ # export NEWUSER=user_name_here
livecd ~ # zfs create -o mountpoint=/ rpool/root
livecd ~ # zfs create -o mountpoint=/home/$NEWUSER rpool/root/home_$NEWUSER
livecd ~ # zfs create -o mountpoint=none rpool/tmp
livecd ~ # zfs create -o mountpoint=/tmp -o devices=off -o exec=off -o setuid=off rpool/tmp/root
livecd ~ # zfs create -o mountpoint=/home/$NEWUSER/tmp -o devices=off -o exec=off -o setuid=off rpool/tmp/$NEWUSER
livecd ~ # zfs create -o mountpoint=/usr/src rpool/tmp/linux-src
livecd ~ # zfs create -o mountpoint=/var/db/repos rpool/tmp/portage
livecd ~ # zfs create -o mountpoint=/var/tmp rpool/tmp/var
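The mountpoint values in these commands are interpreted relative to the pool's altroot (/mnt/gentoo, from the -R flag at pool creation). A sketch of that mapping, assuming an altroot other than "/"; effective_mount is a hypothetical helper name, not a ZFS command.

```shell
# Where a dataset appears while the pool is imported with an altroot.
# Args: altroot, the dataset's mountpoint property.
# (hypothetical helper, for illustration only)
effective_mount() {
    altroot=${1%/}
    mountpoint=$2
    if [ "$mountpoint" = "/" ]; then
        echo "$altroot"
    else
        echo "$altroot$mountpoint"
    fi
}

effective_mount /mnt/gentoo /         # rpool/root
effective_mount /mnt/gentoo /var/tmp  # rpool/tmp/var
```

Once the installed system boots and imports the pool without an altroot, the same datasets mount at the plain mountpoint values.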
Note that ZFS will auto-mount these data sets as they're created, at the given mount points (relative to the altroot specified at zpool creation). Also optionally (but recommended) set up swap.
livecd ~ # zfs create -o sync=always -o primarycache=metadata -o secondarycache=none -o volblocksize=4K -o logbias=throughput -o compression=off -V 1G rpool/swap
Warning: volblocksize (4096) is less than the default minimum block size (16384).
To reduce wasted space a volblocksize of 16384 is recommended.
livecd ~ # mkswap -f /dev/zvol/rpool/swap
Setting up swapspace version 1, size = 1024 MiB (1073737728 bytes)
no label, UUID=c1e2b76d-717a-4cad-b69f-92e0a9289027
livecd ~ # swapon /dev/zvol/rpool/swap
I think this small volblocksize is still a good idea, despite the warning, because the kernel will be using 4k pages. Continue with the last few settings for our mounts:
livecd ~ # mkdir -p /mnt/gentoo/efi; chmod 0111 /mnt/gentoo/efi
livecd ~ # mount /dev/disk/by-partlabel/efi_Y9KWTIL7 /mnt/gentoo/efi
livecd ~ # chmod 1777 /mnt/gentoo/tmp
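Returning briefly to the swap volume: the 4K volblocksize was chosen to match the kernel's page size, and that assumption can be checked on the machine at hand. A minimal sketch, assuming getconf is available; matches_page_size is a hypothetical helper name.

```shell
# The kernel's memory page size in bytes, as reported by getconf.
page_size=$(getconf PAGESIZE)

# Succeeds if the given volblocksize (in bytes) equals the page size.
# (hypothetical helper, for illustration only)
matches_page_size() {
    [ "$1" -eq "$page_size" ]
}

if matches_page_size 4096; then
    echo "volblocksize=4K matches the page size"
else
    echo "page size is $page_size bytes, not 4096"
fi
```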
Kernel and Boot
We're doing a standard Gentoo install now, from the installing Stage3 section. When you reach the configuring the kernel section, skip it and return here.
We'll use the prebuilt distribution kernel, Dracut to build the UKI (no need for any other bootloader like grub), and our own dracut module to unlock our several LUKS volumes.
(chroot) livecd ~ # cat > /etc/portage/package.use/kernel <<EOF
> sys-apps/systemd-utils boot kernel-install
> sys-fs/zfs-kmod initramfs -rootfs
> sys-fs/zfs -dist-kernel -rootfs
> sys-kernel/gentoo-kernel-bin -initramfs
> sys-kernel/installkernel dracut uki
> EOF
(chroot) livecd ~ # emerge -vat dev-vcs/git sys-kernel/dracut sys-fs/cryptsetup
(chroot) livecd ~ # git clone https://github.com/arantius/mass-luks-open.git
(chroot) livecd ~ # cd mass-luks-open
(chroot) livecd ~/mass-luks-open # make install
make -C src
make[1]: Entering directory '/root/mass-luks-open/src'
gcc -Wall -Werror -Wextra -Wformat -Wshadow -Wstrict-prototypes -Wno-unused-parameter -Wno-unused-variable -o mass-luks-open mass-luks-open.c -lblkid -lcryptsetup
make[1]: Leaving directory '/root/mass-luks-open/src'
install --group=root --owner=root --mode=755 --directory /usr/lib/dracut/modules.d/20mass-luks-open
install --group=root --owner=root --mode=644 module-setup.sh /usr/lib/dracut/modules.d/20mass-luks-open
install --group=root --owner=root --mode=755 src/mass-luks-open.sh src/mass-luks-open /usr/lib/dracut/modules.d/20mass-luks-open
(chroot) livecd ~/mass-luks-open # cd
(chroot) livecd ~ # mkdir -p /etc/dracut.conf.d/
(chroot) livecd ~ # cat > /etc/dracut.conf.d/omit.conf <<EOF
> omit_dracutmodules+=" bluetooth crypt nfs pcmcia systemd-cryptsetup "
> EOF
(chroot) livecd ~ # cat > /etc/dracut.conf.d/hostonly.conf <<EOF
> hostonly="yes"
> EOF
(chroot) livecd ~ # mkdir -p /efi/EFI/Linux
(chroot) livecd ~ # emerge -vat sys-kernel/gentoo-kernel-bin sys-boot/efibootmgr
...
(chroot) livecd ~ # ls -l /efi/EFI/Linux/
total 27516
-rwxr-xr-x 1 root root 28173312 Jan 7 22:14 gentoo-6.6.67-gentoo-dist.efi
Remove this UKI! It doesn't contain ZFS so won't work for us. But, all the prerequisites are ready so we're going to install ZFS now and rebuild the UKI with it. (Note: I had some trouble with this in my small VM. I might suggest building sys-fs/zfs (and dependencies) as its own step with more conservative parallelism settings, if you have issues.)
(chroot) livecd ~ # emerge -vat sys-fs/zfs
...
(chroot) livecd ~ # rc-update add zfs-mount boot
(chroot) livecd ~ # emerge --config gentoo-kernel-bin
...
(chroot) livecd ~ # ls -l /efi/EFI/Linux/
total 46368
-rwxr-xr-x 1 root root 47480832 Jan 7 22:22 gentoo-6.6.67-gentoo-dist.efi
This has installed the kernel and all dependencies, plus built our customized UKI (notice it's bigger). We need to add the UEFI boot entry to point to the good one:
(chroot) livecd ~ # efibootmgr --create --disk=/dev/sda --part=1 --loader='\EFI\Linux\gentoo-6.6.67-gentoo-dist.efi' --label='Gentoo 6.6.67'
This completes the configuring the kernel and configuring the bootloader sections of the Handbook. Resume from the configuring the system section. The fstab should look something like:
# <fs> <mountpoint> <type> <opts> <dump/pass>
PARTLABEL=efi_Y9KWTIL7 /efi vfat defaults 0 2
/dev/zvol/rpool/swap none swap sw 0 0
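Only the EFI partition and swap appear here. The ZFS data sets carry their own mountpoint properties and are mounted by the zfs-mount service added earlier, so they need no fstab lines. As a sketch, the two entries could be generated from the EFI partition label like so; fstab_entries is a hypothetical helper name.

```shell
# Print the two fstab entries for a given EFI partition label.
# (hypothetical helper, for illustration only)
fstab_entries() {
    printf 'PARTLABEL=%s /efi vfat defaults 0 2\n' "$1"
    printf '/dev/zvol/rpool/swap none swap sw 0 0\n'
}

fstab_entries efi_Y9KWTIL7
```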
That's it! Simply continue with the Handbook, skipping the bootloader steps we've already done above. Once you've successfully booted, it might be a good idea to manually copy the UEFI UKI file, and add another efibootmgr entry to point to it. If future (i.e. kernel) updates go awry, this known-working UKI can be used temporarily.
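A sketch of what that backup could look like. The function only prints the commands rather than running them, so they can be reviewed first. The name backup_uki_cmds, the "-known-good" file suffix, and the label text are all hypothetical choices, and the disk and partition arguments are carried over from the efibootmgr command used earlier.

```shell
# Print (not run) the commands to keep a known-good copy of a UKI.
# Arg: the kernel version string as it appears in the UKI file name.
# (hypothetical helper; suffix and label text are illustrative choices)
backup_uki_cmds() {
    ver=$1
    src="/efi/EFI/Linux/gentoo-${ver}.efi"
    dst="/efi/EFI/Linux/gentoo-${ver}-known-good.efi"
    loader="\\EFI\\Linux\\gentoo-${ver}-known-good.efi"
    printf 'cp %s %s\n' "$src" "$dst"
    printf "efibootmgr --create --disk=/dev/sda --part=1 --loader='%s' --label='%s'\n" \
        "$loader" "Gentoo ${ver} (known good)"
}

backup_uki_cmds 6.6.67-gentoo-dist
```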
Appendix: Recovery
Should you ever need to reboot during installation, or later boot from the live ISO for recovery, it would go something like this:
livecd ~ # mass-luks-open
Found 4 LUKS volumes:
1 /dev/sdd2 => rpool_JETMMKKJ
2 /dev/sdb2 => rpool_K82746KA
3 /dev/sdc2 => rpool_WHU2I04G
4 /dev/sda2 => rpool_Y9KWTIL7
Enter LUKS decryption passphrase:
Opening LUKS volume [/dev/sdd2] with label [rpool_JETMMKKJ]... success.
Opening LUKS volume [/dev/sdb2] with label [rpool_K82746KA]... success.
Opening LUKS volume [/dev/sdc2] with label [rpool_WHU2I04G]... success.
Opening LUKS volume [/dev/sda2] with label [rpool_Y9KWTIL7]... success.
livecd ~ # zpool import -fR /mnt/gentoo -d /dev/mapper rpool
livecd ~ # mount /dev/sda1 /mnt/gentoo/efi
livecd ~ # arch-chroot /mnt/gentoo /bin/bash
livecd ~ # export PS1="(chroot) $PS1"
Appendix: Links
I relied on a number of existing sources to figure out how to do this. Most of this information came from Gentoo Hardened ZFS rootfs with dm-crypt/luks 0.6.2, which is quite similar to this article. I added some more details from the Funtoo wiki's ZFS Install Guide and the Gentoo wiki's ZFS, ZFS/rootfs, and Unified kernel image pages.