Raxx · internal docs


FreeScout backup and restore runbook

System: FreeScout helpdesk — tickets.raxx.app — backup and restore procedures
Owner: operator
Related runbook: docs/ops/runbooks/freescout.md
Related issues: #714 (backup implementation), #668 (S3 bucket provisioning)
Last incident: none (initial creation)
Last reviewed: 2026-05-06


Backup architecture

FreeScout backups use a two-tier strategy orchestrated by GH Actions. Both tiers run in the same daily workflow at 06:00 UTC and are independent — a Tier 2 failure does not affect Tier 1.

| Tier | Type | Schedule | Retention | Where |
|------|------|----------|-----------|-------|
| 1 | Lightsail instance snapshot (full disk) | Daily 06:00 UTC | 7 most-recent snapshots | Lightsail |
| 2 | Logical DB dump (mysqldump → gzip → S3) | Daily 06:00 UTC | 30 days | s3://raxx-support-attachments/db-backups/freescout/ |

06:00 UTC = 23:00 Pacific (PDT), low-traffic window. Both tiers run in the same GH Actions job; Tier 2 dump completes before the Tier 1 snapshot is initiated so the snapshot captures a consistent post-dump state.


GH Actions workflow

Workflow file: .github/workflows/freescout-backup.yml Schedule: 0 6 * * * (06:00 UTC daily) Manual trigger: workflow_dispatch with optional dry_run input

What the workflow does (in order)

  1. Configure AWS credentials from GH Actions repo secrets
  2. Read the DB password from SSM (/raxx/freescout/db_password) — never echoed
  3. Read the SSH key from SSM (/raxx/freescout/ssh_key) — written to temp file, deleted at end
  4. Ensure the S3 bucket raxx-support-attachments exists (creates it if missing)
  5. SSH into admin@54.146.13.200 — stream mysqldump | gzip to aws s3 cp on runner
  6. Verify the S3 object size (must be > 1 KB)
  7. Delete S3 objects older than 30 days
  8. Create a Lightsail snapshot named raxx-tickets-backup-YYYY-MM-DD
  9. Prune Lightsail snapshots to keep the last 7
  10. Clean up SSH key from runner disk
  11. Post failure Slack DM to D0AJ7K184TV if any step fails

Required GH Actions repo secrets

| Secret | Description |
|--------|-------------|
| AWS_BACKUP_ACCESS_KEY_ID | IAM access key for the raxx-freescout-backup user |
| AWS_BACKUP_SECRET_ACCESS_KEY | IAM secret key for the raxx-freescout-backup user |
| SLACK_BOT_TOKEN | Slack bot token with chat:write scope (operator DM channel) |
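The failure DM (step 11 of the workflow) can be posted with Slack's chat.postMessage API. A curl sketch, assuming SLACK_BOT_TOKEN is exported; the message body is illustrative:

```shell
# Post a failure notice to the operator DM channel. chat.postMessage accepts
# a DM channel ID directly in the "channel" field.
MSG="freescout-backup failed (run ${GITHUB_RUN_ID:-manual}); check the workflow logs"
curl -s -X POST https://slack.com/api/chat.postMessage \
  -H "Authorization: Bearer ${SLACK_BOT_TOKEN}" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d "{\"channel\":\"D0AJ7K184TV\",\"text\":\"${MSG}\"}"
```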

Dry-run mode

# Trigger dry run via gh CLI
gh workflow run freescout-backup.yml --field dry_run=true

Dry run prints all planned actions but skips mysqldump, S3 upload, S3 cleanup, Lightsail snapshot creation, and snapshot pruning.


SSM parameter paths (us-east-1)

All parameters are SecureString type.

| SSM path | Contains | Who reads it |
|----------|----------|--------------|
| /raxx/freescout/db_password | MariaDB freescout user password | GH Actions workflow, freescout-backup.sh on instance |
| /raxx/freescout/ssh_key | PEM private key for admin@raxx-tickets | GH Actions workflow |
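The consumer side can be sketched like this (--with-decryption is required for SecureString values; the temp-file handling mirrors what the workflow does, but the variable names are illustrative):

```shell
# Read the DB password into a variable, never echoed.
DB_PASS=$(aws ssm get-parameter \
  --name /raxx/freescout/db_password \
  --with-decryption \
  --region us-east-1 \
  --query Parameter.Value --output text)

# Write the SSH key to a temp file with strict permissions before ssh uses it.
KEY_FILE=$(mktemp)
chmod 600 "$KEY_FILE"
aws ssm get-parameter --name /raxx/freescout/ssh_key --with-decryption \
  --region us-east-1 --query Parameter.Value --output text > "$KEY_FILE"
```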

Provision SSM parameters (one-time setup)

# DB password — retrieve from Terraform state or instance .env
DB_PASS=$(ssh -i /tmp/lightsail_us_east_1.pem \
  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
  admin@54.146.13.200 \
  'grep "^DB_PASSWORD=" /var/www/html/freescout/.env | cut -d= -f2-')

aws ssm put-parameter \
  --name "/raxx/freescout/db_password" \
  --type SecureString \
  --value "$DB_PASS" \
  --region us-east-1 \
  --overwrite >/dev/null
echo "db_password written to SSM"

# SSH key — full PEM content
SSH_KEY=$(cat /tmp/lightsail_us_east_1.pem)
aws ssm put-parameter \
  --name "/raxx/freescout/ssh_key" \
  --type SecureString \
  --value "$SSH_KEY" \
  --region us-east-1 \
  --overwrite >/dev/null
echo "ssh_key written to SSM"
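To confirm the provisioning took without printing either secret, query only parameter metadata (SecureString values stay encrypted unless --with-decryption is passed):

```shell
# Print name, type, and version for both parameters; values are not shown.
for P in /raxx/freescout/db_password /raxx/freescout/ssh_key; do
  aws ssm get-parameter --name "$P" --region us-east-1 \
    --query 'Parameter.[Name,Type,Version]' --output text
done
```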

IAM policy for the GH Actions user (raxx-freescout-backup)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SSMReadFreescout",
      "Effect": "Allow",
      "Action": ["ssm:GetParameter"],
      "Resource": "arn:aws:ssm:us-east-1:*:parameter/raxx/freescout/*"
    },
    {
      "Sid": "S3BackupWrite",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject",
                 "s3:ListBucket", "s3:CreateBucket"],
      "Resource": [
        "arn:aws:s3:::raxx-support-attachments",
        "arn:aws:s3:::raxx-support-attachments/db-backups/freescout/*"
      ]
    },
    {
      "Sid": "S3ListBackup",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::raxx-support-attachments",
      "Condition": {"StringLike": {"s3:prefix": ["db-backups/freescout/*"]}}
    },
    {
      "Sid": "KMSBackup",
      "Effect": "Allow",
      "Action": ["kms:GenerateDataKey", "kms:Decrypt", "kms:DescribeKey"],
      "Resource": "*",
      "Condition": {"StringEquals": {"kms:ViaService": "s3.us-east-1.amazonaws.com"}}
    },
    {
      "Sid": "LightsailSnapshotManage",
      "Effect": "Allow",
      "Action": [
        "lightsail:CreateInstanceSnapshot",
        "lightsail:GetInstanceSnapshots",
        "lightsail:DeleteInstanceSnapshot"
      ],
      "Resource": "*"
    }
  ]
}

Tier 1: Lightsail snapshots

What it backs up

The full instance disk (/dev/xvda, 40 GB SSD). This includes the OS, MariaDB data files, FreeScout application files, .env, and all configuration. A Tier 1 restore rebuilds the entire instance as of the snapshot time.

Schedule

Daily at 06:00 UTC via GH Actions, after the Tier 2 dump completes. Snapshots are named raxx-tickets-backup-YYYY-MM-DD and pruned to the 7 most recent.

Verify snapshots exist

aws lightsail get-instance-snapshots --region us-east-1 \
  --query "instanceSnapshots[?fromInstanceName=='raxx-tickets'].[name,createdAt,state]" \
  --output table

Expected: at least one snapshot with state=available dated within the last 25 hours.
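The 25-hour expectation can be turned into a scripted check. A sketch, assuming GNU date and the JMESPath max_by expression; exits non-zero when stale:

```shell
# Newest snapshot timestamp for the instance.
LATEST=$(aws lightsail get-instance-snapshots --region us-east-1 \
  --query "max_by(instanceSnapshots[?fromInstanceName=='raxx-tickets'], &createdAt).createdAt" \
  --output text)

# Fail if it is older than 25 hours (one missed daily run plus slack).
CUTOFF=$(date -u -d '25 hours ago' +%s)
if [ "$(date -u -d "$LATEST" +%s)" -ge "$CUTOFF" ]; then
  echo "snapshot freshness OK (newest: $LATEST)"
else
  echo "ALERT: newest snapshot older than 25h (newest: $LATEST)" >&2
  exit 1
fi
```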

Re-enable Lightsail auto-snapshot (fallback)

The GH Actions workflow creates named snapshots, not auto-snapshots. If the workflow is not running, you can enable Lightsail's built-in auto-snapshot as a safety net:

aws lightsail enable-add-on \
  --resource-name raxx-tickets \
  --region us-east-1 \
  --add-on-request 'addOnType=AutoSnapshot,autoSnapshotAddOnRequest={snapshotTimeOfDay=06:00}'

Tier 2: Logical DB dump (S3)

What it backs up

MariaDB logical dump of the freescout database only. Schema + all rows. Does not include OS state, application files, or credentials.

Schedule

Daily at 06:00 UTC via GH Actions. The SSH session runs mysqldump on the instance and streams the gzip-compressed output directly to S3 — no SQL file is written to the runner disk or the instance disk.

S3 location

| Property | Value |
|----------|-------|
| Bucket | raxx-support-attachments |
| Prefix | db-backups/freescout/ |
| Key format | db-backups/freescout/YYYY-MM-DD.sql.gz |
| Encryption | SSE-KMS |
| Storage class | STANDARD |
| Retention | 30 days (older objects deleted by workflow) |

Verify a backup ran

# Most recent backup object
TODAY=$(date -u '+%Y-%m-%d')
aws s3api head-object \
  --bucket raxx-support-attachments \
  --key "db-backups/freescout/${TODAY}.sql.gz" \
  --region us-east-1

List recent backups:

aws s3 ls s3://raxx-support-attachments/db-backups/freescout/ \
  --region us-east-1 --human-readable | sort -r | head -10

A missing object for today's date at 06:00 UTC is an alert condition. Check the GH Actions workflow run for that day.

Validate a dump

TARGET_DATE="YYYY-MM-DD"
aws s3 cp \
  "s3://raxx-support-attachments/db-backups/freescout/${TARGET_DATE}.sql.gz" \
  /tmp/validate-${TARGET_DATE}.sql.gz \
  --region us-east-1

# Check gzip integrity
gunzip -t /tmp/validate-${TARGET_DATE}.sql.gz && echo "gzip OK"

# Check SQL content (no output = corrupt; should print SQL headers)
gunzip -c /tmp/validate-${TARGET_DATE}.sql.gz | head -20

rm -f /tmp/validate-${TARGET_DATE}.sql.gz

Restore procedures

When to use each tier

| Scenario | Recommended tier |
|----------|------------------|
| Full instance corruption / disk failure | Tier 1 (Lightsail snapshot) |
| Data deleted or corrupted by application | Tier 2 (S3 logical dump) — faster, DB-only |
| Need data from > 7 days ago | Tier 2 (S3, up to 30 days) |
| Accidental table drop | Tier 2 |
| OS-level compromise | Tier 1 (fresh instance from snapshot) |

Tier 1 restore: Lightsail snapshot to new instance

  1. List available snapshots to identify the target:
aws lightsail get-instance-snapshots --region us-east-1 \
  --query "instanceSnapshots[?fromInstanceName=='raxx-tickets'].[name,createdAt,state]" \
  --output table
  2. Create a new instance from the snapshot:
# Replace <snapshot-name> with the target snapshot from step 1
aws lightsail create-instances-from-snapshot \
  --instance-names raxx-tickets-restored \
  --availability-zone us-east-1a \
  --bundle-id micro_3_0 \
  --instance-snapshot-name <snapshot-name> \
  --region us-east-1
  3. Wait for the new instance to enter running state:
aws lightsail get-instance --instance-name raxx-tickets-restored \
  --region us-east-1 \
  --query 'instance.state'
# Wait until: {"code": 16, "name": "running"}
  4. Attach the static IP to the restored instance (swaps traffic):
aws lightsail detach-static-ip \
  --static-ip-name raxx-tickets-ip \
  --region us-east-1

aws lightsail attach-static-ip \
  --static-ip-name raxx-tickets-ip \
  --instance-name raxx-tickets-restored \
  --region us-east-1
  5. Verify https://tickets.raxx.app/ returns the FreeScout login page:
curl -sI https://tickets.raxx.app/ | grep -E 'HTTP|Content-Type'
  6. Delete the original raxx-tickets instance once the restored one is confirmed serving traffic (Lightsail has no instance rename API, so update workflow and doc references to the new instance name afterwards):
aws lightsail delete-instance \
  --instance-name raxx-tickets \
  --region us-east-1
  7. Re-enable backup on the restored instance — update the SSM SSH key param if the new instance has a different key pair, and run a manual workflow dispatch to verify the backup works against the new instance.
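The running-state wait above can be scripted as a polling loop rather than re-running the query by hand (a sketch; the 10-minute timeout is an assumption):

```shell
# Poll every 10 s, up to 60 attempts (~10 minutes).
for i in $(seq 1 60); do
  STATE=$(aws lightsail get-instance --instance-name raxx-tickets-restored \
    --region us-east-1 --query 'instance.state.name' --output text)
  if [ "$STATE" = "running" ]; then
    echo "instance running after ${i} checks"
    break
  fi
  echo "state=${STATE}; waiting (${i}/60)"
  sleep 10
done
```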

Tier 2 restore: S3 dump to running instance

Use this when the instance is healthy but data needs to be recovered from a logical dump.

IMPORTANT: This procedure overwrites the live FreeScout database. Take the site offline first.

  1. Identify the target backup:
aws s3 ls s3://raxx-support-attachments/db-backups/freescout/ \
  --region us-east-1 | sort -r | head -10
  2. SSH into the instance:
ssh -i /tmp/lightsail_us_east_1.pem \
  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
  admin@54.146.13.200
  3. On the instance — download, validate, restore:
TARGET_DATE="YYYY-MM-DD"   # Set to target date
RESTORE_DIR="/tmp/freescout-restore"
mkdir -p "$RESTORE_DIR"

# Download from S3
aws s3 cp \
  "s3://raxx-support-attachments/db-backups/freescout/${TARGET_DATE}.sql.gz" \
  "${RESTORE_DIR}/restore.sql.gz" \
  --region us-east-1

# Validate
gunzip -t "${RESTORE_DIR}/restore.sql.gz" && echo "gzip integrity OK"

# Decompress
gunzip "${RESTORE_DIR}/restore.sql.gz"

# Stop Apache to prevent new writes
sudo systemctl stop apache2

# Safety dump of current DB before overwrite
DB_PASSWORD=$(grep '^DB_PASSWORD=' /var/www/html/freescout/.env | cut -d= -f2-)
MYSQL_PWD="$DB_PASSWORD" mysqldump \
  --single-transaction \
  --host=127.0.0.1 \
  --user=freescout \
  freescout | gzip > "${RESTORE_DIR}/pre-restore-safety-$(date -u +%Y%m%dT%H%M%SZ).sql.gz"
echo "Safety dump: $(ls -lh ${RESTORE_DIR}/pre-restore-safety*.sql.gz)"

# Restore
MYSQL_PWD="$DB_PASSWORD" mysql \
  --host=127.0.0.1 \
  --user=freescout \
  freescout < "${RESTORE_DIR}/restore.sql"
echo "Restore import complete"

# Clear caches
cd /var/www/html/freescout
sudo -u www-data /usr/bin/php artisan cache:clear
sudo -u www-data /usr/bin/php artisan config:cache

# Bring the site back online
sudo systemctl start apache2
  4. Verify from outside:
curl -sI https://tickets.raxx.app/ | grep 'HTTP'
# Expected: HTTP/2 200
  5. Log into FreeScout at https://tickets.raxx.app/ and confirm the expected conversation data is present.

  6. Clean up:

rm -rf "$RESTORE_DIR"
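For the accidental-table-drop scenario, a full overwrite is often unnecessary: mysqldump groups each table between "-- Table structure for table" comment markers, so one table can be sliced out of the decompressed dump before import. A sketch; the table name conversations and the dump path are illustrative:

```shell
# Slice one table's section (DROP/CREATE/INSERTs) out of the dump.
TABLE=conversations
DUMP=/tmp/freescout-restore/restore.sql   # decompressed dump from step 3
awk -v t="$TABLE" '
  $0 ~ "^-- Table structure for table `" t "`" {p=1}
  /^-- Table structure for table `/ && p && $0 !~ "`" t "`" {exit}
  p' "$DUMP" > /tmp/freescout-restore/one-table.sql

# Import just that table (the section drops and recreates it):
# MYSQL_PWD="$DB_PASSWORD" mysql --host=127.0.0.1 --user=freescout freescout \
#   < /tmp/freescout-restore/one-table.sql
```

Foreign-key constraints may require SET FOREIGN_KEY_CHECKS=0 around the import; verify row counts afterwards.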

Verified restore record

| Date | Type | From backup date | Performed by | Outcome |
|------|------|------------------|--------------|---------|
| YYYY-MM-DD | (pending) | | operator | |

A verified restore against a scratch Lightsail instance must be performed before marking this card fully closed. Add a row to the table above when done. The verified-restore invariant confirms that the backup is not just written but also readable.


Known failure modes

Failure mode: GH Actions workflow fails

Symptom: Workflow run shows red; Slack DM received at D0AJ7K184TV.

Diagnose:

  1. Open the failed run in GH Actions and check which step failed.
  2. SSM step: verify the AWS_BACKUP_ACCESS_KEY_ID / AWS_BACKUP_SECRET_ACCESS_KEY secrets are set and the IAM user has ssm:GetParameter on /raxx/freescout/*.
  3. SSH step: verify the raxx-tickets instance is running (aws lightsail get-instance --instance-name raxx-tickets --region us-east-1 --query 'instance.state').
  4. S3 step: verify the bucket raxx-support-attachments exists and the IAM user has s3:PutObject on the db-backups/freescout/* prefix.
  5. Snapshot step: verify the IAM user has lightsail:CreateInstanceSnapshot.

Fix: Resolve the underlying permission or connectivity issue, then re-run via gh workflow run freescout-backup.yml.

Failure mode: S3 object too small (dump empty)

Symptom: Workflow step "Verify S3 upload" fails with "S3 object too small".

Cause: mysqldump connected but found an empty database, or the SSH pipe was interrupted.

Fix:

# Verify DB has data
ssh -i /tmp/lightsail_us_east_1.pem \
  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
  admin@54.146.13.200 \
  'mysql -u root -e "SELECT COUNT(*) FROM freescout.users;"'

If the DB is empty, the FreeScout instance may need to be rebuilt from a Lightsail snapshot.

Failure mode: Lightsail snapshot not taken

Symptom: aws lightsail get-instance-snapshots shows no snapshot newer than 7 days.

Cause: Workflow not running (cron skipped), or lightsail:CreateInstanceSnapshot permission denied.

Fix:

# Manual snapshot
aws lightsail create-instance-snapshot \
  --instance-name raxx-tickets \
  --instance-snapshot-name "raxx-tickets-manual-$(date -u +%Y-%m-%d)" \
  --region us-east-1

Then investigate why the scheduled workflow is not running.

Failure mode: SSM parameters not found

Symptom: Workflow fails with ParameterNotFound on the SSM step.

Cause: SSM parameters /raxx/freescout/db_password or /raxx/freescout/ssh_key not provisioned.

Fix: Follow "Provision SSM parameters" section above.


Cost estimate

| Resource | Frequency | Unit cost | Est. monthly |
|----------|-----------|-----------|--------------|
| GH Actions minutes | ~5 min/day × 30 days | ~$0.008/min (public runner) | ~$1.20/mo |
| Lightsail snapshots | 7 × ~10 GB effective | ~$0.05/GB/mo | ~$3.50/mo |
| S3 Standard (30 days) | ~50 MB/day × 30 days = ~1.5 GB | $0.023/GB | ~$0.03/mo |
| S3 PUT requests | 1/day | negligible | <$0.01/mo |
| KMS encrypt/decrypt | 1 PUT + 1 GET/day | $0.03/10K | negligible |
| Total estimated | | | ~$4.73/mo |

Escalation

Wake the operator when:

- Both Tier 1 and Tier 2 backups have failed for more than 2 consecutive days
- A restore attempt from Tier 2 produces data integrity errors
- The Lightsail instance is unresponsive and no snapshots exist
- The S3 bucket or KMS key is inaccessible (suspected security incident)