# Backup and Restore

::: {.callout-note}
ZFS snapshots for instant local rollback. Per-service database dumps. [Backblaze B2](https://www.backblaze.com/cloud-storage) for offsite.
:::

## Snapshot strategy

- [zrepl](https://zrepl.github.io/) manages automated ZFS snapshots on a schedule (hourly, daily, weekly retention)
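- Snapshots are near-instant and take almost no space when created (copy-on-write); they only grow as the live data diverges, so frequent snapshots are cheap as long as retention prunes old ones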
- Pre-deploy snapshots taken by pyinfra before service restarts provide a known-good rollback point
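
The two ideas above, a named pre-deploy rollback point and hourly/daily/weekly retention, can be sketched in a few lines. This is an illustration only: the dataset name `tank/immich`, the `predeploy-` prefix, and the bucket-based retention rule are assumptions, and the retention function is not zrepl's own pruning algorithm.

```python
from datetime import datetime, timedelta, timezone


def predeploy_snapshot(dataset: str, now: datetime) -> list[str]:
    """Build the `zfs snapshot` command for a pre-deploy rollback point."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Hypothetical naming scheme: <dataset>@predeploy-<UTC timestamp>
    return ["zfs", "snapshot", f"{dataset}@predeploy-{stamp}"]


def snapshots_to_keep(
    taken: list[datetime], hourly: int, daily: int, weekly: int
) -> set[datetime]:
    """Keep the newest snapshot in each of the most recent N hours, days and ISO weeks."""
    rules = (
        (lambda t: (t.year, t.month, t.day, t.hour), hourly),
        (lambda t: (t.year, t.month, t.day), daily),
        (lambda t: tuple(t.isocalendar())[:2], weekly),  # (ISO year, ISO week)
    )
    keep = set()
    for bucket_of, limit in rules:
        seen = set()
        for ts in sorted(taken, reverse=True):  # newest first
            bucket = bucket_of(ts)
            if bucket in seen:
                continue  # an even newer snapshot already represents this bucket
            if len(seen) >= limit:
                break  # enough buckets kept for this rule
            seen.add(bucket)
            keep.add(ts)
    return keep


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    taken = [start + timedelta(hours=h) for h in range(72)]  # three days of hourly snapshots
    for ts in sorted(snapshots_to_keep(taken, hourly=2, daily=2, weekly=1)):
        print(ts.isoformat())
    print(predeploy_snapshot("tank/immich", start))
```

Anything not in the returned set is a candidate for `zfs destroy`; pre-deploy snapshots would normally be kept outside this rule until the deploy is confirmed good.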

## Database backup patterns

- Services with PostgreSQL (Authentik, Immich, GitLab): `pg_dump` run via systemd timer, output written to the service's ZFS dataset
- The dump file is then included in ZFS snapshots and offsite replication
- In-pod databases mean each service's backup is self-contained; no shared database server to coordinate
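
A minimal sketch of the dump step that a systemd timer could invoke. The database name, user, `backups/` subdirectory, and the `podman exec` prefix are assumptions for illustration; the source only says the database runs in the service's pod and that the dump lands on the service's dataset.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def pg_dump_command(
    database: str, user: str, exec_prefix: tuple[str, ...] = ()
) -> list[str]:
    """Build a pg_dump call that streams a custom-format archive to stdout."""
    # exec_prefix lets the caller wrap the call, e.g. to run it inside a container.
    return [*exec_prefix, "pg_dump", "--format=custom", f"--username={user}", database]


def dump_path(dataset_mount: Path, database: str, now: datetime) -> Path:
    """Timestamped dump location inside the service's own dataset."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return dataset_mount / "backups" / f"{database}-{stamp}.dump"


def run_dump(cmd: list[str], target: Path) -> None:
    """Write to a temporary name, then rename, so a snapshot taken mid-dump
    never contains a partial file under the final name."""
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_name(target.name + ".tmp")
    try:
        with tmp.open("wb") as fh:
            subprocess.run(cmd, stdout=fh, check=True)
        tmp.replace(target)
    finally:
        tmp.unlink(missing_ok=True)  # no-op after a successful rename


if __name__ == "__main__":
    # Hypothetical container and dataset names.
    cmd = pg_dump_command("immich", "postgres", ("podman", "exec", "immich-db"))
    out = dump_path(Path("/tank/immich"), "immich", datetime.now(timezone.utc))
    print(cmd, "->", out)
```

Because the dump streams to stdout, the file is written on the host side of the container boundary and ends up directly in the dataset that gets snapshotted.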

## Offsite with Backblaze B2

- [Backblaze B2](https://www.backblaze.com/cloud-storage) as the offsite target: S3-compatible API, low cost per TB
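- [restic](https://restic.net/) sends encrypted backups to B2 directly; zrepl replicates to another ZFS host rather than to object storage, so it would need a ZFS-capable remote instead of a B2 bucket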
- Offsite replication is the disaster recovery path: house fire, drive failure beyond pool redundancy, ransomware
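
One way to wire this up, sketched under assumptions: restic reads a snapshot's contents through the dataset's hidden `.zfs/snapshot/<name>` directory, so the upload is a consistent point-in-time copy. The bucket name, repository prefix, tag, and file paths are hypothetical; `B2_ACCOUNT_ID`, `B2_ACCOUNT_KEY` and `RESTIC_PASSWORD_FILE` are the environment variables restic reads for B2 credentials and the repository password.

```python
import os
from pathlib import PurePosixPath


def snapshot_dir(mountpoint: str, snapshot_name: str) -> str:
    """Read-only path where ZFS exposes the contents of a snapshot."""
    return str(PurePosixPath(mountpoint) / ".zfs" / "snapshot" / snapshot_name)


def restic_repo(bucket: str, prefix: str) -> str:
    """Repository string for restic's B2 backend."""
    return f"b2:{bucket}:{prefix}"


def restic_backup_command(repo: str, source: str, tag: str) -> list[str]:
    """Build the `restic backup` call for one source directory."""
    return ["restic", "--repo", repo, "backup", "--tag", tag, source]


def restic_env(key_id: str, app_key: str, password_file: str) -> dict[str, str]:
    """Environment for restic: B2 credentials plus the repository password file."""
    env = dict(os.environ)
    env.update(
        {
            "B2_ACCOUNT_ID": key_id,
            "B2_ACCOUNT_KEY": app_key,
            "RESTIC_PASSWORD_FILE": password_file,
        }
    )
    return env


if __name__ == "__main__":
    repo = restic_repo("example-homelab-backups", "immich")  # hypothetical bucket
    source = snapshot_dir("/tank/immich", "daily-2024-01-02")  # hypothetical snapshot
    print(restic_backup_command(repo, source, tag="immich"))
```

Passing the returned command and environment to `subprocess.run(..., env=..., check=True)` would perform the upload; the repository has to be created once with `restic init` first.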

## Restore testing

- Restore testing is the part most people skip; untested backups are not backups
- ZFS `clone` creates a writable dataset from a snapshot without affecting the original, which makes it possible to test restores without downtime
- Periodic restore drills: spin up a service from backup, verify data integrity, tear it down
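
A restore drill along these lines can be scripted. The sketch below is an assumption-laden outline, not a complete verification: the dataset, scratch name, mountpoint and dump path are hypothetical, and `pg_restore --list` only confirms that the archive's table of contents is readable, which is weaker than restoring into a throwaway database.

```python
import subprocess


def restore_drill(
    snapshot: str, scratch: str, mountpoint: str, dump_rel: str
) -> list[list[str]]:
    """Plan a drill: clone the snapshot, check the dump inside it, destroy the clone."""
    dataset, sep, _ = snapshot.partition("@")
    if not sep:
        raise ValueError("snapshot must look like pool/dataset@name")
    if "@" in scratch or scratch == dataset:
        raise ValueError("scratch must be a new dataset name, not the original")
    return [
        ["zfs", "clone", "-o", f"mountpoint={mountpoint}", snapshot, scratch],
        ["pg_restore", "--list", f"{mountpoint}/{dump_rel}"],
        ["zfs", "destroy", scratch],
    ]


def run_drill(steps: list[list[str]]) -> None:
    """Run the plan; the clone is destroyed even if a check fails."""
    clone, *checks, destroy = steps
    subprocess.run(clone, check=True)
    try:
        for check in checks:
            subprocess.run(check, check=True, stdout=subprocess.DEVNULL)
    finally:
        subprocess.run(destroy, check=True)


if __name__ == "__main__":
    plan = restore_drill(
        "tank/immich@daily-2024-01-02",  # hypothetical snapshot
        "tank/restore-test-immich",
        "/mnt/restore-test",
        "backups/immich.dump",
    )
    for step in plan:
        print(" ".join(step))
```

The guard against reusing the original dataset name matters because the final step is a `zfs destroy`.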

## Break-glass access

- If the host is unreachable, offsite backups in B2 are accessible from anywhere with the encryption key
- Encryption keys are stored in both Bitwarden (cloud) and Vaultwarden (self-hosted); both copies are lost only if the maintainer's Bitwarden account and the physical host are lost at the same time
- Recovery path: new host, `bootc install to-disk` with the instance image, restore ZFS datasets from B2, re-deploy services
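
The B2 portion of that recovery path might look like the following, assuming the offsite copy is a restic repository. The repository string and target path are hypothetical, and the `bootc install to-disk` and service re-deploy steps are deliberately left out because their exact invocations depend on the instance image.

```python
from typing import NamedTuple


class Step(NamedTuple):
    description: str
    command: list[str]


def b2_recovery_steps(repo: str, target: str) -> list[Step]:
    """Ordered restic commands to run on the freshly installed host."""
    return [
        Step(
            "confirm the repository opens with the recovered key",
            ["restic", "--repo", repo, "snapshots"],
        ),
        Step(
            "restore the most recent backup into the new dataset's mountpoint",
            ["restic", "--repo", repo, "restore", "latest", "--target", target],
        ),
    ]


if __name__ == "__main__":
    # Hypothetical bucket and target path.
    for number, step in enumerate(
        b2_recovery_steps("b2:example-homelab-backups:immich", "/tank/immich"), start=1
    ):
        print(f"{number}. {step.description}: {' '.join(step.command)}")
```

Listing snapshots first is a cheap check that the key retrieved from the password manager is the right one before any data is written.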