15  Backup and Restore

Note

ZFS snapshots for instant local rollback. Per-service database dumps. Backblaze B2 for offsite.

15.1 Snapshot strategy

  • zrepl manages automated ZFS snapshots on a schedule (hourly, daily, weekly retention)
  • Snapshots are near-instant and consume no extra space when taken (copy-on-write); they grow only as the live data diverges from them, so there’s little reason not to snapshot frequently
  • Pre-deploy snapshots taken by pyinfra before service restarts provide a known-good rollback point
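
A minimal sketch of the pre-deploy rollback point described above, in Python since pyinfra deploys are Python. It only builds the `zfs` command lines; the dataset name `tank/immich`, the `predeploy-` naming scheme, and the helper names are assumptions for illustration, not the project's actual deploy code.

```python
from datetime import datetime, timezone


def predeploy_snapshot(dataset: str, service: str, now: datetime) -> str:
    """Build a snapshot name like tank/immich@predeploy-immich-20240501T120000Z."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{dataset}@predeploy-{service}-{stamp}"


def snapshot_cmd(snapshot: str) -> list[str]:
    # `zfs snapshot <dataset>@<name>` creates the snapshot.
    return ["zfs", "snapshot", snapshot]


def rollback_cmd(snapshot: str) -> list[str]:
    # `-r` destroys snapshots newer than the target, which rollback
    # requires whenever later snapshots exist. Use with care.
    return ["zfs", "rollback", "-r", snapshot]


if __name__ == "__main__":
    name = predeploy_snapshot("tank/immich", "immich", datetime.now(timezone.utc))
    print(" ".join(snapshot_cmd(name)))
    print(" ".join(rollback_cmd(name)))
```

Putting the service name and a UTC timestamp in the snapshot name keeps pre-deploy snapshots easy to tell apart from the scheduled ones zrepl creates.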

15.2 Database backup patterns

  • Services with PostgreSQL (Authentik, Immich, GitLab): pg_dump run via systemd timer, output written to the service’s ZFS dataset
  • The dump file is then included in ZFS snapshots and offsite replication
  • In-pod databases mean each service’s backup is self-contained; no shared database server to coordinate
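
The dump pattern above could be sketched roughly as follows. This assumes the database runs in a Podman container and that the dump is written to a temporary file first, so a failed run never replaces the previous good dump. The container, user, and database names in the usage example are hypothetical, and the atomic rename is an added design choice rather than something the text above specifies.

```python
import os
import subprocess
from pathlib import Path
from typing import Callable

Runner = Callable[[list[str], Path], None]


def pg_dump_cmd(container: str, user: str, database: str) -> list[str]:
    # -Fc selects pg_dump's custom archive format, which pg_restore reads.
    return ["podman", "exec", container, "pg_dump", "-U", user, "-Fc", database]


def run_to_file(cmd: list[str], out: Path) -> None:
    # Stream the command's stdout into `out`; raise if it exits non-zero.
    with out.open("wb") as fh:
        subprocess.run(cmd, stdout=fh, check=True)


def dump_atomically(cmd: list[str], target: Path, runner: Runner = run_to_file) -> Path:
    """Write the dump to a temporary file, then rename it over the target."""
    tmp = target.with_name(target.name + ".tmp")
    try:
        runner(cmd, tmp)
        os.replace(tmp, target)  # atomic when both paths share a filesystem
    finally:
        tmp.unlink(missing_ok=True)  # clean up a partial dump after a failure
    return target
```

A systemd timer would then invoke something like `dump_atomically(pg_dump_cmd("authentik-db", "authentik", "authentik"), Path("/tank/authentik/backup/authentik.dump"))`, with every name in that call being a placeholder.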

15.3 Offsite with Backblaze B2

  • Backblaze B2 as the offsite target: S3-compatible API, low cost per TB
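  • restic sends encrypted, deduplicated backups to B2; zrepl replicates only to another ZFS host, so it fits the offsite role only where a remote ZFS receiver is available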
  • Offsite replication is the disaster recovery path: house fire, drive failure beyond pool redundancy, ransomware
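
As a hedged illustration of the offsite step, the sketch below assumes restic is the tool in use and that it reads from a ZFS snapshot directory, so the files cannot change mid-backup. The bucket name, repository prefix, mountpoint, and tag are hypothetical; the `b2:<bucket>:<path>` repository form and the environment variable names are restic's own.

```python
# Credentials and repository password restic expects in the environment.
REQUIRED_ENV = ("B2_ACCOUNT_ID", "B2_ACCOUNT_KEY", "RESTIC_PASSWORD_FILE")


def restic_repo(bucket: str, prefix: str) -> str:
    # restic addresses B2 repositories as b2:<bucket>:<path>.
    return f"b2:{bucket}:{prefix}"


def snapshot_dir(mountpoint: str, snapshot_name: str) -> str:
    # ZFS exposes each snapshot read-only under <mountpoint>/.zfs/snapshot/<name>.
    return f"{mountpoint.rstrip('/')}/.zfs/snapshot/{snapshot_name}"


def restic_backup_cmd(repo: str, source: str, tag: str) -> list[str]:
    # The tag records which service the backup belongs to.
    return ["restic", "-r", repo, "backup", source, "--tag", tag]


def missing_env(env: dict[str, str]) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV if not env.get(name)]


if __name__ == "__main__":
    import os

    repo = restic_repo("example-homelab-backups", "host1")
    src = snapshot_dir("/tank/immich", "daily-2024-05-01")
    print(missing_env(dict(os.environ)))
    print(" ".join(restic_backup_cmd(repo, src, "immich")))
```

restic encrypts every repository with the repository password, so the data is already encrypted before it leaves the host.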

15.4 Restore testing

  • Restore testing is the part most people skip; untested backups are not backups
  • zfs clone creates a writable dataset from a snapshot without affecting the original, which makes it useful for testing restores without downtime
  • Periodic restore drills: spin up a service from backup, verify data integrity, tear it down
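
A restore drill along these lines might look like the sketch below: clone a snapshot, run checks against the clone, and always destroy it afterwards. The dataset names and the `scratch_clone` helper are assumptions for illustration. The command runner is passed in so the logic can be exercised without touching a real pool.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator

Run = Callable[[list[str]], None]


def run_checked(cmd: list[str]) -> None:
    # Run a command and raise if it exits non-zero.
    subprocess.run(cmd, check=True)


@contextmanager
def scratch_clone(snapshot: str, clone: str, run: Run = run_checked) -> Iterator[str]:
    """Clone a snapshot for the duration of a drill, then destroy the clone."""
    # The clone's parent dataset must already exist.
    run(["zfs", "clone", snapshot, clone])
    try:
        yield clone
    finally:
        # Runs even if a check inside the `with` block fails.
        run(["zfs", "destroy", clone])


def verify_dump_cmd(dump_path: str) -> list[str]:
    # `pg_restore --list` reads the archive's table of contents without
    # restoring anything, so an unreadable dump is caught early.
    return ["pg_restore", "--list", dump_path]
```

Usage would be along the lines of `with scratch_clone("tank/immich@daily-1", "tank/drill/immich") as ds:` followed by whatever integrity checks the service needs. Listing the archive only confirms the dump is readable; a full drill would still restore it into a throwaway database.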

15.5 Break-glass access

  • If the host is unreachable, offsite backups in B2 are accessible from anywhere with the encryption key
  • Encryption keys are stored in both Bitwarden (cloud) and Vaultwarden (self-hosted); the keys are lost only if the maintainer’s Bitwarden account and the physical host are lost at the same time
  • Recovery path: new host, bootc install to-disk with the instance image, restore ZFS datasets from B2, re-deploy services
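
The data-restore step of that recovery path could be sketched as below, assuming the offsite copies were made with restic and tagged per service. The repository string, service names, and restore root are placeholders, and the sketch only builds the commands rather than running them.

```python
def restic_restore_cmd(repo: str, tag: str, target: str) -> list[str]:
    # "latest" picks the newest snapshot; --tag narrows that choice to
    # snapshots carrying the given tag.
    return ["restic", "-r", repo, "restore", "latest", "--tag", tag, "--target", target]


def restore_plan(repo: str, services: list[str], restore_root: str) -> list[list[str]]:
    """One restore command per service, each into its own directory."""
    root = restore_root.rstrip("/")
    return [restic_restore_cmd(repo, svc, f"{root}/{svc}") for svc in services]


if __name__ == "__main__":
    plan = restore_plan("b2:example-homelab-backups:host1", ["authentik", "immich"], "/tank/restore")
    for cmd in plan:
        print(" ".join(cmd))
```

Restoring each service into its own directory keeps the restored files separate until they have been checked and moved into the live datasets.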