15 Backup and Restore
Note
ZFS snapshots for instant local rollback. Per-service database dumps. Backblaze B2 for offsite.
15.1 Snapshot strategy
- zrepl manages automated ZFS snapshots on a schedule (hourly, daily, weekly retention)
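- Snapshots are near-instant and take no extra space at creation (copy-on-write); space is consumed only as live data diverges from the snapshot, so there’s little reason not to snapshot frequently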
- Pre-deploy snapshots taken by pyinfra before service restarts provide a known-good rollback point
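A pre-deploy snapshot step might look roughly like the sketch below. It only builds the `zfs` command lines; the dataset name, the `predeploy-` prefix, and the timestamp format are assumptions for illustration, not the names this setup actually uses.

```python
from datetime import datetime, timezone
from typing import List, Optional


def predeploy_snapshot_name(dataset: str, now: Optional[datetime] = None) -> str:
    """Build a snapshot name such as tank/services/immich@predeploy-20240102T030405Z.

    The 'predeploy-' prefix and timestamp layout are illustrative assumptions.
    """
    now = now or datetime.now(timezone.utc)
    return f"{dataset}@predeploy-{now.strftime('%Y%m%dT%H%M%SZ')}"


def snapshot_command(snapshot: str) -> List[str]:
    # `zfs snapshot <dataset>@<name>` creates the rollback point.
    return ["zfs", "snapshot", snapshot]


def rollback_command(snapshot: str) -> List[str]:
    # `-r` also destroys any snapshots newer than the target; without it,
    # `zfs rollback` only works on the most recent snapshot.
    return ["zfs", "rollback", "-r", snapshot]


if __name__ == "__main__":
    name = predeploy_snapshot_name("tank/services/immich")  # hypothetical dataset
    print(" ".join(snapshot_command(name)))
    print(" ".join(rollback_command(name)))
```

In a pyinfra deploy, the joined command string could be passed to a shell operation that runs before the service restart. Because `-r` discards later snapshots, a rollback is best treated as a deliberate manual step.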
15.2 Database backup patterns
- Services with PostgreSQL (Authentik, Immich, GitLab):
  - `pg_dump` run via systemd timer, output written to the service’s ZFS dataset
  - The dump file is then included in ZFS snapshots and offsite replication
- In-pod databases mean each service’s backup is self-contained; no shared database server to coordinate
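The dump step that the systemd timer triggers could be sketched as follows. This is a minimal example under assumptions: the database name, the `postgres` user, and the output file name are hypothetical, and the write-then-rename step is an added safeguard rather than something the setup above is known to do. It ensures a snapshot taken mid-dump never replaces a complete dump with a partial one.

```python
import os
import subprocess
from pathlib import Path
from typing import Callable, List


def pg_dump_command(dbname: str, out_file: Path, user: str = "postgres") -> List[str]:
    # Custom format (--format=custom) is compressed and restorable with pg_restore.
    return [
        "pg_dump",
        "--format=custom",
        f"--username={user}",
        f"--dbname={dbname}",
        f"--file={out_file}",
    ]


def dump_atomically(dbname: str, dataset_mount: Path,
                    runner: Callable = subprocess.run) -> Path:
    """Dump to a temporary file, then rename it over the previous dump."""
    final = dataset_mount / f"{dbname}.dump"
    tmp = final.with_name(final.name + ".tmp")
    runner(pg_dump_command(dbname, tmp), check=True)
    os.replace(tmp, final)  # atomic within a single filesystem
    return final
```

The `runner` parameter exists only so the function can be exercised without a live database; in normal use the default `subprocess.run` executes `pg_dump` directly.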
15.3 Offsite with Backblaze B2
- Backblaze B2 as the offsite target: S3-compatible API, low cost per TB
- zrepl or restic sends encrypted snapshots to B2
- Offsite replication is the disaster recovery path: house fire, drive failure beyond pool redundancy, ransomware
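If restic is the tool used, an offsite run could be assembled along these lines. The bucket name, prefix, mountpoint, snapshot name, and password file path are all hypothetical. The sketch reads from the hidden `.zfs/snapshot` directory so that restic sees a frozen view of the dataset, which is one possible approach, not necessarily the one used here. restic reads B2 credentials from the `B2_ACCOUNT_ID` and `B2_ACCOUNT_KEY` environment variables and encrypts the repository client-side.

```python
from pathlib import Path
from typing import List


def snapshot_source_path(mountpoint: Path, snapshot_name: str) -> Path:
    # ZFS exposes each snapshot read-only under <mountpoint>/.zfs/snapshot/<name>.
    return mountpoint / ".zfs" / "snapshot" / snapshot_name


def restic_repository(bucket: str, prefix: str) -> str:
    # restic's native B2 backend uses the form b2:<bucket>:<path>.
    return f"b2:{bucket}:{prefix}"


def restic_backup_command(repository: str, source: Path, password_file: Path) -> List[str]:
    return [
        "restic",
        "--repo", repository,
        "--password-file", str(password_file),
        "backup", str(source),
    ]


if __name__ == "__main__":
    repo = restic_repository("example-bucket", "immich")          # hypothetical
    src = snapshot_source_path(Path("/tank/immich"), "daily-1")   # hypothetical
    print(" ".join(restic_backup_command(repo, src, Path("/etc/restic/password"))))
```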
15.4 Restore testing
- Restore testing is the part most people skip; untested backups are not backups
- ZFS `clone` lets you mount a snapshot as a writable dataset without affecting the original, which is useful for testing restores without downtime
- Periodic restore drills: spin up a service from backup, verify data integrity, tear it down
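A restore drill built on `zfs clone` can be wrapped so that the clone is always destroyed afterwards, even when verification fails. The sketch below assumes hypothetical dataset and snapshot names and leaves the actual integrity check to the caller.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def cloned_snapshot(snapshot: str, clone_dataset: str,
                    runner: Callable = subprocess.run) -> Iterator[str]:
    """Clone a snapshot into a writable dataset, then destroy the clone on exit."""
    runner(["zfs", "clone", snapshot, clone_dataset], check=True)
    try:
        yield clone_dataset
    finally:
        # Destroying the clone leaves the source snapshot and live dataset untouched.
        runner(["zfs", "destroy", clone_dataset], check=True)


if __name__ == "__main__":
    # Hypothetical names; start a throwaway service instance against the
    # clone inside the with-block and verify its data there.
    with cloned_snapshot("tank/immich@daily-1", "tank/drill-immich") as dataset:
        print(f"verify data in {dataset}")
```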
15.5 Break-glass access
- If the host is unreachable, offsite backups in B2 are accessible from anywhere with the encryption key
- Encryption keys are stored in Bitwarden (cloud) and Vaultwarden (self-hosted); losing both at once would require losing the maintainer’s Bitwarden account and the physical host
- Recovery path: new host, `bootc install to-disk` with the instance image, restore ZFS datasets from B2, re-deploy services
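For the data-restore step of the recovery path, and assuming the offsite copies were written with restic, pulling the latest backup onto a fresh host could be sketched as below. The repository string and target directory are hypothetical, and the B2 credentials and repository password retrieved from Bitwarden would need to be present in the environment first.

```python
from typing import List


def restic_restore_command(repository: str, target: str,
                           snapshot_id: str = "latest") -> List[str]:
    # `restic restore <id> --target <dir>` writes the backup's files under <dir>;
    # "latest" selects the most recent backup in the repository.
    return ["restic", "--repo", repository, "restore", snapshot_id, "--target", target]


if __name__ == "__main__":
    # Hypothetical repository and target path.
    print(" ".join(restic_restore_command("b2:example-bucket:immich", "/tank/immich")))
```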