Installing Gentoo Into a LUKS-Encrypted ZFS Root

2013-12-31 14:31 - Linux

Background

This post covers the steps I personally take to set up a new Gentoo server. I use LUKS encryption for the entire drive (except the boot partition). I use LUKS rather than ZFS native encryption for two reasons: 1) I started this way before ZFS native encryption existed, and 2) neither approach is better than the other for my needs (and native might be worse; the Gentoo Wiki recommends against ZFS native encryption, for much the same reason many reasonable people avoid it: it is still too immature). In a homelab setting it would be impractical for me to migrate my large existing data sets, and I prefer to keep my setup homogeneous across machines.

I've re-written this post a few times over the years. See earlier versions at archive.org. This post serves as documentation for future me, so I keep it around and up to date.

Getting Started

Work through the Gentoo Handbook until (but not including) the Preparing the Disks section. You may want to use my LiveCD with ZFS support as the boot medium, as I'm doing.

I've selected raidz2 across four disks, so that even two concurrent drive failures cause neither downtime nor data loss. I value the added safety and reliability of this setup over higher storage efficiency or performance. You need not make the same decision, but the examples below detail such a setup.
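To put a number on the storage efficiency being traded away, here is a minimal sketch of the nominal arithmetic. The function name raidz_usable is made up for illustration, and the result ignores ZFS metadata and padding overhead, so treat it as an upper bound rather than what zpool list will report.

```shell
# Nominal usable capacity of a raidz vdev (hypothetical helper).
# Arguments: number of disks, parity level (2 for raidz2), size of one disk.
# Sizes are whole numbers in any one unit; overhead is ignored.
raidz_usable() {
    disks=$1; parity=$2; size=$3
    echo $(( (disks - parity) * size ))
}

# Four 15 GiB partitions in raidz2 leave about two partitions' worth.
raidz_usable 4 2 15    # prints 30
```

With four disks, half of the raw space goes to parity; adding disks to the same raidz2 layout improves that ratio while still tolerating two failures.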

I use partitions to reserve a bit of space for the (unencrypted) boot volume, then use the rest of the disk for the main data volume. (Swap goes inside the encrypted data volume; I don't want to write potentially sensitive memory contents to disk unencrypted!) I use GPT partitions, which can carry arbitrary labels, and I put identifying information (i.e. the model and serial number) for the corresponding physical drive in each label. Later, if/when ZFS complains about a problem, it reports this label and I know which physical disk needs attention. Any label will work, but each must be unique on the system.

Since we're putting ZFS in a partition, it's important to make sure that the partition start is aligned with a physical disk sector. Drives today (at least spinning ones like I'm using) typically have 4,096-byte physical sectors, unlike the historical 512-byte default. (Ignore the example output below: the virtual machine I'm using exposes the historical 512-byte sectors.) Partition boundaries are expressed in logical (512-byte) sectors, so it is possible for a partition to start in the middle of a physical sector. I believe the best thing to do is make sure that your main partition starts on a sector number that is a multiple of eight, which is 4k aligned even with 512-byte logical sectors. In the example below, the data partition starts at sector 1050624, which is 131328 * 8.

A quick note on the examples before this first one: the (shell) prompts are colored red, the inputs I type are colored green, and the rest is the output (sometimes with interactive prompts and inputs mixed in). Your output will likely differ in small or large details; I'm trusting you to be intelligent enough to figure that out if you're following this as a guide. But I find archiving the output still makes it easier to follow along. Your inputs may differ as well; be careful to make sure you are referencing (e.g.) the proper disk at every point!

Also note that all examples below are based on a test installation in a virtual machine. To confidently write this document correctly, I tend to practice to completion at least once, then repeat it beginning to end (and again if changes were needed), and only then use it for real (i.e. on real hardware). A virtual machine makes restarting and retrying much easier.
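The multiple-of-eight alignment rule described above can be checked with a little arithmetic. This is a minimal sketch, not part of the install itself; the function name is_aligned is made up, and the defaults (512-byte logical sectors, 4,096-byte physical sectors) are assumptions you should adjust for your own drives.

```shell
# Succeeds when a partition's first byte falls on a physical sector boundary.
# Arguments: start sector, logical sector size in bytes (default 512),
# physical sector size in bytes (default 4096). Hypothetical helper.
is_aligned() {
    start=$1; logical=${2:-512}; physical=${3:-4096}
    [ $(( start * logical % physical )) -eq 0 ]
}

is_aligned 1050624 && echo "sector 1050624 is 4k aligned"
is_aligned 34 || echo "sector 34 is not 4k aligned"
```

On a running system the start sector of a partition can usually be read from sysfs (for example /sys/block/sda/sda2/start) and the physical sector size from /sys/block/sda/queue/physical_block_size.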

livecd ~ # gdisk /dev/sda
GPT fdisk (gdisk) version 1.0.10

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present

Creating new GPT entries in memory.

Command (? for help): n
Partition number (1-128, default 1):
First sector (34-33554398, default = 2048) or {+-}size{KMGTP}:
Last sector (2048-33554398, default = 33552383) or {+-}size{KMGTP}: +512M
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): ef00
Changed type of partition to 'EFI system partition'

Command (? for help): n
Partition number (2-128, default 2):
First sector (34-33554398, default = 1050624) or {+-}size{KMGTP}:
Last sector (1050624-33554398, default = 33552383) or {+-}size{KMGTP}:
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): 8309
Changed type of partition to 'Linux LUKS'

Command (? for help): p
Disk /dev/sda: 33554432 sectors, 16.0 GiB
Model: VMware Virtual S
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 519D51DB-6D30-4EC5-98E4-EF98D7E42E99
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 33554398
Partitions will be aligned on 2048-sector boundaries
Total free space is 4029 sectors (2.0 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048         1050623   512.0 MiB   EF00  EFI system partition
   2         1050624        33552383   15.5 GiB    8309  Linux LUKS

Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/sda.
The operation has completed successfully.

There's the first disk partitioned. We have 512M for the EFI partition, then the rest for LUKS+ZFS. Next we copy that partition layout verbatim to the other drives, and finally apply names to each partition.

livecd ~ # for D in b c d; do sgdisk --replicate=/dev/sd$D --randomize-guids /dev/sda; done
The operation has completed successfully.
The operation has completed successfully.
The operation has completed successfully.
The operation has completed successfully.
The operation has completed successfully.
The operation has completed successfully.
livecd ~ # N=Y9KWTIL7; sgdisk --change-name=1:efi_$N --change-name=2:rpool_$N /dev/sda
The operation has completed successfully.
livecd ~ # N=K82746KA; sgdisk --change-name=1:efi_$N --change-name=2:rpool_$N /dev/sdb
The operation has completed successfully.
livecd ~ # N=WHU2I04G; sgdisk --change-name=1:efi_$N --change-name=2:rpool_$N /dev/sdc
The operation has completed successfully.
livecd ~ # N=JETMMKKJ; sgdisk --change-name=1:efi_$N --change-name=2:rpool_$N /dev/sdd
The operation has completed successfully.

This leaves us with labeled partitions so we know both which drive each one is on, plus its purpose.

livecd ~ # ls /dev/disk/by-partlabel/
efi_JETMMKKJ  efi_WHU2I04G  rpool_JETMMKKJ  rpool_WHU2I04G
efi_K82746KA  efi_Y9KWTIL7  rpool_K82746KA  rpool_Y9KWTIL7
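With more than a handful of drives, typing one naming command per disk gets error-prone. As a sketch under stated assumptions, the loop below prints (but does not run) the same sgdisk --change-name commands from a list of device and serial pairs. The function name print_name_cmds is made up; the device names and serials are the ones from this test VM and must be replaced with your own.

```shell
# Print (not run) one sgdisk naming command per "device serial" pair read
# from standard input. Hypothetical helper; review the output before using it.
print_name_cmds() {
    while read -r dev serial; do
        [ -n "$dev" ] || continue      # skip blank lines
        printf 'sgdisk --change-name=1:efi_%s --change-name=2:rpool_%s %s\n' \
            "$serial" "$serial" "$dev"
    done
}

print_name_cmds <<'EOF'
/dev/sda Y9KWTIL7
/dev/sdb K82746KA
/dev/sdc WHU2I04G
/dev/sdd JETMMKKJ
EOF
```

Printing first and executing second leaves a chance to confirm that each serial really belongs to the device next to it.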

Format the EFI and LUKS partitions. During this setup phase you'll get to type your passphrase quite a few times. Use the same passphrase for all disks.

livecd ~ # for D in /dev/disk/by-partlabel/efi_*; do mkfs.vfat -F 32 $D; done
mkfs.fat 4.2 (2021-01-31)
mkfs.fat 4.2 (2021-01-31)
mkfs.fat 4.2 (2021-01-31)
mkfs.fat 4.2 (2021-01-31)
livecd ~ # for D in /dev/disk/by-partlabel/rpool_*; do cryptsetup luksFormat "$D"; done

WARNING!
========
This will overwrite data on /dev/disk/by-partlabel/rpool_JETMMKKJ irrevocably.

Are you sure? (Type 'yes' in capital letters): YES
Enter passphrase for /dev/disk/by-partlabel/rpool_JETMMKKJ:
Verify passphrase:

(... three more times ...)

If you get stuck waiting for randomness, a ping -f from another machine to this one will generate kernel entropy.

With the encrypted volumes initialized, open them up for use:

livecd ~ # mass-luks-open
Found 4 LUKS volumes:
  1            /dev/sdd2 => rpool_JETMMKKJ
  2            /dev/sdb2 => rpool_K82746KA
  3            /dev/sdc2 => rpool_WHU2I04G
  4            /dev/sda2 => rpool_Y9KWTIL7
Enter LUKS decryption passphrase:
Opening LUKS volume [/dev/sdd2] with label [rpool_JETMMKKJ]... success.
Opening LUKS volume [/dev/sdb2] with label [rpool_K82746KA]... success.
Opening LUKS volume [/dev/sdc2] with label [rpool_WHU2I04G]... success.
Opening LUKS volume [/dev/sda2] with label [rpool_Y9KWTIL7]... success.
livecd ~ # ls /dev/mapper
control  rpool_JETMMKKJ  rpool_K82746KA  rpool_WHU2I04G  rpool_Y9KWTIL7

If you're not using my installer ISO with mass-luks-open, run this instead: for D in /dev/disk/by-partlabel/rpool_*; do cryptsetup luksOpen "$D" "$(basename "$D")"; done
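Before creating the pool it may be worth confirming that every labeled partition actually has a mapped volume. The following is a rough sketch: the function name unopened_volumes is made up, and the two directory arguments default to the paths used in this post.

```shell
# List rpool_* partition labels that have no matching entry among the mapped
# volumes. Arguments: label directory, mapper directory (both optional).
# Hypothetical helper; prints nothing when everything is open.
unopened_volumes() {
    labels=${1:-/dev/disk/by-partlabel}; mapped=${2:-/dev/mapper}
    for part in "$labels"/rpool_*; do
        [ -e "$part" ] || continue          # glob matched nothing
        name=$(basename "$part")
        [ -e "$mapped/$name" ] || echo "$name"
    done
}
```

Running unopened_volumes with no arguments after the luksOpen loop should print nothing; any name it prints is a volume that still needs opening.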

Initializing ZFS

So now we have four encrypted disks (partitions) each called rpool_something and their equivalent plaintext mapped volumes. We can set up a zpool with them, and data sets within that pool.

livecd ~ # zpool create -m none -R /mnt/gentoo -o ashift=12 -o cachefile=none -o compatibility=openzfs-2.2-linux -O atime=on -O relatime=on -O compression=zstd -O xattr=sa rpool raidz2 rpool_JETMMKKJ rpool_K82746KA rpool_WHU2I04G rpool_Y9KWTIL7

Note the lowercase -o which is a pool property versus the uppercase -O which is a (default) property for data sets (in this pool). This complicated command will:

-m none
Do not mount the pool's top-level dataset.
-R /mnt/gentoo
Sets the "altroot", i.e. the temporary alternative mount point. This value is appropriate for the Handbook driven install process.
-o ashift=12
Set the block size to 4k, or 2^12 bytes. (If your sectors are 512 bytes, which is unlikely, you should omit this.)
-o cachefile=none
Do not use a cache file. All storage is encrypted, so there's nowhere to read a cache file from at import time.
-o compatibility=openzfs-2.2-linux
Limit the maximum allowed ZFS features. This way you can't accidentally upgrade past what your rescue media can handle, and tools won't "remind" you to upgrade past that point either. (You can zpool set compatibility=... pool in the future, even to "off".)
-O atime=on -O relatime=on
Enable only relatime. (ZFS honors relatime only while atime is on, hence both settings.)
-O compression=zstd
Enable compression. Your CPU can probably compress/decompress faster than a spinning disk can read/write. The zstd method seems best.
-O xattr=sa
Store extended attributes in inodes rather than hidden files, which is supposed to make Samba more performant.
rpool
The name of the pool. (I wish I had a better scheme, but I just call it the "r"oot pool.)
raidz2
The type of the pool: RAID-like, with two disks' worth of parity for redundancy.
rpool_JETMMKKJ rpool_K82746KA rpool_WHU2I04G rpool_Y9KWTIL7
The devices making up this pool; ZFS can find them just by these short names (which are unlikely to otherwise exist if you've chosen them well) and their brevity will enhance readability.

And in good Unix tradition, it produces no output upon success.

livecd ~ # zpool status
  pool: rpool
 state: ONLINE
config:

        NAME                STATE     READ WRITE CKSUM
        rpool               ONLINE       0     0     0
          raidz2-0          ONLINE       0     0     0
            rpool_JETMMKKJ  ONLINE       0     0     0
            rpool_K82746KA  ONLINE       0     0     0
            rpool_WHU2I04G  ONLINE       0     0     0
            rpool_Y9KWTIL7  ONLINE       0     0     0

errors: No known data errors
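The ashift value passed to zpool create is the base-2 logarithm of the sector size, which is why 4,096-byte sectors mean ashift=12. A small sketch of that calculation follows; the function name ashift_for is made up for illustration.

```shell
# Print the ashift value (base-2 logarithm) for a sector size in bytes.
# Fails without output if the size is not a power of two. Hypothetical helper.
ashift_for() {
    size=$1; shift_bits=0
    [ "$size" -gt 0 ] || return 1
    while [ "$size" -gt 1 ]; do
        [ $(( size % 2 )) -eq 0 ] || return 1
        size=$(( size / 2 ))
        shift_bits=$(( shift_bits + 1 ))
    done
    echo "$shift_bits"
}

ashift_for 4096    # prints 12
ashift_for 512     # prints 9
```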

Now to create the datasets within the pool. ZFS datasets are hierarchical. I've created two top-level datasets: root and tmp. Both get regular snapshots, but only root gets replicated off site. First, create the root dataset, then initialize a dataset for each distinct set of mount properties and/or snapshot behavior. The $NEWUSER variable holds the name of the user account to be created later.

livecd ~ # export NEWUSER=user_name_here
livecd ~ # zfs create -o mountpoint=/ rpool/root
livecd ~ # zfs create -o mountpoint=/home/$NEWUSER rpool/root/home_$NEWUSER
livecd ~ # zfs create -o mountpoint=none rpool/tmp
livecd ~ # zfs create -o mountpoint=/tmp -o devices=off -o exec=off -o setuid=off rpool/tmp/root
livecd ~ # zfs create -o mountpoint=/home/$NEWUSER/tmp -o devices=off -o exec=off -o setuid=off rpool/tmp/$NEWUSER
livecd ~ # zfs create -o mountpoint=/usr/src rpool/tmp/linux-src
livecd ~ # zfs create -o mountpoint=/var/db/repos rpool/tmp/portage
livecd ~ # zfs create -o mountpoint=/var/tmp rpool/tmp/var
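Because the pool was created with -R /mnt/gentoo, each mountpoint given above is applied underneath that altroot during the install. The sketch below only illustrates the path arithmetic; the function name effective_mountpoint is made up and it does not query ZFS.

```shell
# Print where a dataset lands while an altroot is in effect.
# Arguments: altroot, the dataset's mountpoint property. Hypothetical helper.
effective_mountpoint() {
    altroot=${1%/}; mountpoint=$2
    case $mountpoint in
        none|legacy) echo "(not mounted by ZFS)" ;;
        /)           echo "${altroot:-/}" ;;
        *)           echo "$altroot$mountpoint" ;;
    esac
}

effective_mountpoint /mnt/gentoo /            # prints /mnt/gentoo
effective_mountpoint /mnt/gentoo /var/tmp     # prints /mnt/gentoo/var/tmp
effective_mountpoint /mnt/gentoo none         # prints (not mounted by ZFS)
```

To see what ZFS itself reports, zfs list -o name,mountpoint -r rpool lists every dataset in the pool along with its mount point.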

Note that ZFS will auto-mount these data sets as they're created, at the given mount points (relative to the altroot specified at zpool creation). Also optionally (but recommended) set up swap.

livecd ~ # zfs create -o sync=always -o primarycache=metadata -o secondarycache=none -o volblocksize=4K -o logbias=throughput -o compression=off -V 1G rpool/swap
Warning: volblocksize (4096) is less than the default minimum block size (16384).
To reduce wasted space a volblocksize of 16384 is recommended.
livecd ~ # mkswap -f /dev/zvol/rpool/swap
Setting up swapspace version 1, size = 1024 MiB (1073737728 bytes)
no label, UUID=c1e2b76d-717a-4cad-b69f-92e0a9289027
livecd ~ # swapon /dev/zvol/rpool/swap

I think this small volblocksize is still a good idea, because the kernel will be using 4k pages. Continue with the last few settings for our mounts:

livecd ~ # mkdir -p /mnt/gentoo/efi; chmod 0111 /mnt/gentoo/efi
livecd ~ # mount /dev/disk/by-partlabel/efi_Y9KWTIL7 /mnt/gentoo/efi
livecd ~ # chmod 1777 /mnt/gentoo/tmp
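On the swap volume's volblocksize=4K: the value was picked to match the kernel's page size. If you want to derive it rather than hard-code it, a minimal sketch follows. The function name to_kib_suffix is made up, and it assumes the page size is a whole number of KiB.

```shell
# Print a byte count in the K-suffixed form that zfs accepts for volblocksize.
# Fails without output if the count is not a whole number of KiB.
# Hypothetical helper.
to_kib_suffix() {
    bytes=$1
    [ $(( bytes % 1024 )) -eq 0 ] || return 1
    echo "$(( bytes / 1024 ))K"
}

to_kib_suffix 4096    # prints 4K
```

On a live system, to_kib_suffix "$(getconf PAGESIZE)" feeds in the actual page size; on x86-64 that is 4096 bytes.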

Kernel and Boot

We're doing a standard Gentoo install now, starting from the Handbook's Installing Stage3 section. When you reach the Configuring the Kernel section, skip it and return here.

We'll use the prebuilt distribution kernel, dracut to build the UKI (no need for any other bootloader like grub), and our own dracut module to unlock our several LUKS volumes.

(chroot) livecd ~ # cat > /etc/portage/package.use/kernel <<EOF
> sys-apps/systemd-utils boot kernel-install
> sys-fs/zfs-kmod initramfs -rootfs
> sys-fs/zfs -dist-kernel -rootfs
> sys-kernel/gentoo-kernel-bin -initramfs
> sys-kernel/installkernel dracut uki
> EOF
(chroot) livecd ~ # emerge -vat dev-vcs/git sys-kernel/dracut sys-fs/cryptsetup
(chroot) livecd ~ # git clone https://github.com/arantius/mass-luks-open.git
(chroot) livecd ~ # cd mass-luks-open
(chroot) livecd ~/mass-luks-open # make install
make -C src
make[1]: Entering directory '/root/mass-luks-open/src'
gcc -Wall -Werror -Wextra -Wformat -Wshadow -Wstrict-prototypes -Wno-unused-parameter -Wno-unused-variable -o mass-luks-open mass-luks-open.c -lblkid -lcryptsetup
make[1]: Leaving directory '/root/mass-luks-open/src'
install --group=root --owner=root --mode=755 --directory /usr/lib/dracut/modules.d/20mass-luks-open
install --group=root --owner=root --mode=644 module-setup.sh /usr/lib/dracut/modules.d/20mass-luks-open
install --group=root --owner=root --mode=755 src/mass-luks-open.sh src/mass-luks-open /usr/lib/dracut/modules.d/20mass-luks-open
(chroot) livecd ~/mass-luks-open # cd
(chroot) livecd ~ # mkdir -p /etc/dracut.conf.d/
(chroot) livecd ~ # cat > /etc/dracut.conf.d/omit.conf <<EOF
> omit_dracutmodules+=" bluetooth crypt nfs pcmcia systemd-cryptsetup "
> EOF
(chroot) livecd ~ # cat > /etc/dracut.conf.d/hostonly.conf <<EOF
> hostonly="yes"
> EOF
(chroot) livecd ~ # mkdir -p /efi/EFI/Linux
(chroot) livecd ~ # emerge -vat sys-kernel/gentoo-kernel-bin sys-boot/efibootmgr
...
(chroot) livecd ~ # ls -l /efi/EFI/Linux/
total 27516
-rwxr-xr-x 1 root root 28173312 Jan  7 22:14 gentoo-6.6.67-gentoo-dist.efi

Remove this UKI! It doesn't contain ZFS so won't work for us. But, all the prerequisites are ready so we're going to install ZFS now and rebuild the UKI with it. (Note: I had some trouble with this in my small VM. I might suggest building sys-fs/zfs (and dependencies) as its own step with more conservative parallelism settings, if you have issues.)

(chroot) livecd ~ # emerge -vat sys-fs/zfs
...
(chroot) livecd ~ # rc-update add zfs-mount boot
(chroot) livecd ~ # emerge --config gentoo-kernel-bin
...
(chroot) livecd ~ # ls -l /efi/EFI/Linux/
total 46368
-rwxr-xr-x 1 root root 47480832 Jan  7 22:22 gentoo-6.6.67-gentoo-dist.efi

This has installed the kernel and all dependencies, plus built our customized UKI (notice it's bigger). We need to add the UEFI boot entry to point to the good one:

(chroot) livecd ~ # efibootmgr --create --disk=/dev/sda --part=1 --loader='\EFI\Linux\gentoo-6.6.67-gentoo-dist.efi' --label='Gentoo 6.6.67'
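The --loader argument is the UKI's path relative to the EFI partition, written with backslashes. Since the file name changes with every kernel version, a small helper to derive it may be handy. This is a sketch; the function name efi_loader_path is made up, and it assumes the file really lives under the given mount point.

```shell
# Convert a file path under the mounted ESP into the backslash-separated
# form used for efibootmgr --loader.
# Arguments: ESP mount point, full path to the file. Hypothetical helper.
efi_loader_path() {
    esp=${1%/}; file=$2
    rel=${file#"$esp"}                      # strip the mount point prefix
    printf '%s\n' "$rel" | tr '/' '\\'
}

efi_loader_path /efi /efi/EFI/Linux/gentoo-6.6.67-gentoo-dist.efi
# prints \EFI\Linux\gentoo-6.6.67-gentoo-dist.efi
```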

This completes the Configuring the Kernel and Configuring the Bootloader sections of the Handbook. Resume from the Configuring the System section. The fstab should look something like:

# <fs>                  <mountpoint>    <type>  <opts>          <dump/pass>
PARTLABEL=efi_Y9KWTIL7  /efi            vfat    defaults        0 2
/dev/zvol/rpool/swap    none            swap    sw              0 0

That's it! Simply continue with the Handbook, skipping the bootloader steps we've already done above. Once you've successfully booted, it might be a good idea to manually copy the UEFI UKI file, and add another efibootmgr entry to point to it. If future (i.e. kernel) updates go awry, this known-working UKI can be used temporarily.
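Keeping a known-working copy of the UKI could look roughly like the sketch below. The function name backup_uki and the "-known-good" suffix are made up for illustration; any name that kernel updates will not overwrite should do.

```shell
# Copy a UKI next to itself under a "-known-good" suffix and print the new
# path. Hypothetical helper; the suffix is an arbitrary choice.
backup_uki() {
    src=$1
    dest=${src%.efi}-known-good.efi
    cp -- "$src" "$dest" && echo "$dest"
}
```

After something like backup_uki /efi/EFI/Linux/gentoo-6.6.67-gentoo-dist.efi, a second efibootmgr --create call with the new file name as --loader and a distinct --label adds the fallback boot entry.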

Appendix: Recovery

Should you ever need to reboot during installation, or later boot from the live ISO for recovery, it would go something like this:

livecd ~ # mass-luks-open
Found 4 LUKS volumes:
  1            /dev/sdd2 => rpool_JETMMKKJ
  2            /dev/sdb2 => rpool_K82746KA
  3            /dev/sdc2 => rpool_WHU2I04G
  4            /dev/sda2 => rpool_Y9KWTIL7
Enter LUKS decryption passphrase:
Opening LUKS volume [/dev/sdd2] with label [rpool_JETMMKKJ]... success.
Opening LUKS volume [/dev/sdb2] with label [rpool_K82746KA]... success.
Opening LUKS volume [/dev/sdc2] with label [rpool_WHU2I04G]... success.
Opening LUKS volume [/dev/sda2] with label [rpool_Y9KWTIL7]... success.
livecd ~ # zpool import -fR /mnt/gentoo -d /dev/mapper rpool
livecd ~ # mount /dev/sda1 /mnt/gentoo/efi
livecd ~ # arch-chroot /mnt/gentoo /bin/bash
livecd ~ # export PS1="(chroot) $PS1"

Appendix: Links

I relied on a number of existing sources to figure out how to do this. Most of this information came from Gentoo Hardened ZFS rootfs with dm-crypt/luks 0.6.2, which is quite similar to this article. I added some more details from the Funtoo wiki's ZFS Install Guide and the Gentoo wiki's ZFS, ZFS/rootfs, and Unified kernel image pages.
