FreeScout backup and restore runbook
System: FreeScout helpdesk — tickets.raxx.app — backup and restore procedures
Owner: operator
Related runbook: docs/ops/runbooks/freescout.md
Related issues: #714 (backup implementation), #668 (S3 bucket provisioning)
Last incident: none (initial creation)
Last reviewed: 2026-06-20 (BCP Win 4 drill — verified restore record added, issue #2655)
Backup architecture
FreeScout backups use a two-tier strategy orchestrated by GH Actions. Both tiers run in the same daily workflow at 06:00 UTC and are independent — a Tier 2 failure does not affect Tier 1.
| Tier | Type | Schedule | Retention | Where |
|---|---|---|---|---|
| 1 | Lightsail instance snapshot (full disk) | Daily 06:00 UTC | 7 most-recent snapshots | Lightsail |
| 2 | Logical DB dump (mysqldump → gzip → S3) | Daily 06:00 UTC | 30 days | s3://raxx-support-attachments/db-backups/freescout/ |
06:00 UTC is 23:00 Pacific the previous evening during daylight time (PDT, UTC-7) and 22:00 during standard time (PST, UTC-8), a low-traffic window. Both tiers run in the same GH Actions job; the Tier 2 dump completes before the Tier 1 snapshot is initiated, so the snapshot captures a consistent post-dump state.
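A quick way to confirm the local wall-clock time of the 06:00 UTC run for a given date is sketched below. It assumes GNU date (coreutils, as on Debian and Ubuntu); `pacific_time` is a hypothetical helper, and the US Pacific rules are spelled out as a POSIX TZ string so the check does not depend on tzdata being installed.

```shell
#!/usr/bin/env bash
# Prints the US Pacific wall-clock time for a UTC timestamp (assumes GNU date).
# PST8PDT,M3.2.0,M11.1.0 = UTC-8, with UTC-7 daylight time from the 2nd Sunday
# of March to the 1st Sunday of November (current US rules).
pacific_time() {
  TZ='PST8PDT,M3.2.0,M11.1.0' date -d "$1" '+%Y-%m-%d %H:%M %Z'
}

pacific_time "2026-06-20 06:00 UTC"   # prints 2026-06-19 23:00 PDT
pacific_time "2026-01-15 06:00 UTC"   # prints 2026-01-14 22:00 PST
```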
GH Actions workflow
Workflow file: .github/workflows/freescout-backup.yml
Schedule: 0 6 * * * (06:00 UTC daily)
Manual trigger: workflow_dispatch with optional dry_run input
What the workflow does (in order)
- Configure AWS credentials from GH Actions repo secrets
- Read the DB password from SSM (/raxx/freescout/db_password); never echoed
- Read the SSH key from SSM (/raxx/freescout/ssh_key); written to a temp file and deleted at the end
- Ensure the S3 bucket raxx-support-attachments exists (created once if missing)
- SSH into admin@54.146.13.200 and stream mysqldump | gzip to aws s3 cp on the runner
- Verify the S3 object size (must be > 1 KB)
- Delete S3 objects older than 30 days
- Create a Lightsail snapshot named raxx-tickets-backup-YYYY-MM-DD
- Prune Lightsail snapshots to keep the last 7
- Clean up the SSH key from the runner disk
- Post a failure Slack DM to D0AJ7K184TV if any step fails
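The naming scheme and the dump stream from the steps above can be sketched as follows. This is a hedged reconstruction, not the contents of freescout-backup.yml: `backup_key`, `snapshot_name`, and `stream_dump` are hypothetical names, the key path in the example is a placeholder, and the remote side reads the password from the instance .env (as the Tier 2 restore procedure does) rather than from SSM, which the real workflow uses. It also assumes the admin user's login shell on the instance is bash.

```shell
#!/usr/bin/env bash
BUCKET="raxx-support-attachments"
PREFIX="db-backups/freescout"
HOST="admin@54.146.13.200"

# Object key and snapshot name for a UTC date given as YYYY-MM-DD.
backup_key()    { printf '%s/%s.sql.gz\n' "$PREFIX" "$1"; }
snapshot_name() { printf 'raxx-tickets-backup-%s\n' "$1"; }

# Hypothetical Tier 2 stream: mysqldump | gzip runs on the instance and the
# compressed bytes travel over SSH into "aws s3 cp -" on the runner (stdin upload).
stream_dump() {
  local key_file="$1" day="$2" remote_cmd
  # Remote side: pipefail so a mysqldump failure is not masked by gzip exiting 0.
  remote_cmd='set -o pipefail; MYSQL_PWD="$(grep "^DB_PASSWORD=" /var/www/html/freescout/.env | cut -d= -f2-)" mysqldump --single-transaction --host=127.0.0.1 --user=freescout freescout | gzip'
  (
    set -o pipefail   # local side: an ssh failure must fail the whole pipeline
    ssh -i "$key_file" -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
      "$HOST" "$remote_cmd" \
      | aws s3 cp - "s3://${BUCKET}/$(backup_key "$day")" --region us-east-1
  )
}

# Example (not run here): stream_dump /path/to/key.pem "$(date -u +%Y-%m-%d)"
```

Because the upload reads from stdin, nothing is written to disk on either side. Setting pipefail on both ends makes an ssh or mysqldump failure fail the step instead of quietly uploading a short object that only the size check would catch.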
Required GH Actions repo secrets
| Secret | Description |
|---|---|
| AWS_BACKUP_ACCESS_KEY_ID | IAM access key for the raxx-freescout-backup user |
| AWS_BACKUP_SECRET_ACCESS_KEY | IAM secret key for the raxx-freescout-backup user |
| SLACK_BOT_TOKEN | Slack bot token with chat:write scope (operator DM channel) |
Dry-run mode
# Trigger dry run via gh CLI
gh workflow run freescout-backup.yml --field dry_run=true
Dry run prints all planned actions but skips mysqldump, S3 upload, S3 cleanup, Lightsail snapshot creation, and snapshot pruning.
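The dry-run behaviour can be illustrated with a small gate that prints mutating commands instead of running them. This is a hypothetical pattern (`run` and `DRY_RUN` are illustrative names); the real workflow may implement its dry_run input differently.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run gate: with DRY_RUN=true, print a command instead of running it.
DRY_RUN="${DRY_RUN:-false}"

run() {
  if [ "$DRY_RUN" = "true" ]; then
    printf 'DRY RUN: %s\n' "$*"
  else
    "$@"
  fi
}

# Steps the dry run skips (dump, upload, S3 cleanup, snapshot, pruning) go through run().
DRY_RUN=true run aws lightsail create-instance-snapshot \
  --instance-name raxx-tickets \
  --instance-snapshot-name "raxx-tickets-backup-$(date -u +%Y-%m-%d)" \
  --region us-east-1
```

After dispatching, `gh run list --workflow=freescout-backup.yml --limit 1` shows the run ID, and `gh run view <run-id> --log` prints the planned actions.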
SSM parameter paths (us-east-1)
All parameters are SecureString type.
| SSM path | Contains | Who reads it |
|---|---|---|
| /raxx/freescout/db_password | MariaDB freescout user password | GH Actions workflow; freescout-backup.sh on the instance |
| /raxx/freescout/ssh_key | PEM private key for admin@raxx-tickets | GH Actions workflow |
Provision SSM parameters (one-time setup)
# DB password — retrieve from Terraform state or instance .env
DB_PASS=$(ssh -i /tmp/lightsail_us_east_1.pem \
-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
admin@54.146.13.200 \
'grep "^DB_PASSWORD=" /var/www/html/freescout/.env | cut -d= -f2-')
aws ssm put-parameter \
--name "/raxx/freescout/db_password" \
--type SecureString \
--value "$DB_PASS" \
--region us-east-1 \
--overwrite >/dev/null
echo "db_password written to SSM"
# SSH key — full PEM content
SSH_KEY=$(cat /tmp/lightsail_us_east_1.pem)
aws ssm put-parameter \
--name "/raxx/freescout/ssh_key" \
--type SecureString \
--value "$SSH_KEY" \
--region us-east-1 \
--overwrite >/dev/null
echo "ssh_key written to SSM"
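Both the provisioning command above and the Tier 2 restore read DB_PASSWORD with `cut -d= -f2-`, which returns the value verbatim, including any surrounding quotes a Laravel-style .env may use. If the FreeScout .env quotes the password, a helper along these lines (hypothetical, run on the instance) strips one layer of quotes before the value is stored in SSM.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print KEY's value from a Laravel-style .env file, removing
# one pair of surrounding double or single quotes. Returns 1 if KEY is absent.
env_value() {
  local key="$1" file="$2" line value
  line=$(grep -m 1 "^${key}=" "$file") || return 1
  value="${line#*=}"              # everything after the first "="
  case "$value" in
    \"*\") value="${value#\"}"; value="${value%\"}" ;;
    \'*\') value="${value#\'}"; value="${value%\'}" ;;
  esac
  printf '%s\n' "$value"
}

# Example (on the instance): DB_PASS=$(env_value DB_PASSWORD /var/www/html/freescout/.env)
```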
IAM policy for the GH Actions user (raxx-freescout-backup)
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "SSMReadFreescout",
"Effect": "Allow",
"Action": ["ssm:GetParameter"],
"Resource": "arn:aws:ssm:us-east-1:*:parameter/raxx/freescout/*"
},
{
"Sid": "S3BackupWrite",
"Effect": "Allow",
"Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:ListBucket",
"s3:CreateBucket"],
"Resource": [
"arn:aws:s3:::raxx-support-attachments",
"arn:aws:s3:::raxx-support-attachments/db-backups/freescout/*"
]
},
{
"Sid": "S3ListBackup",
"Effect": "Allow",
"Action": ["s3:ListBucket"],
"Resource": "arn:aws:s3:::raxx-support-attachments",
"Condition": {"StringLike": {"s3:prefix": ["db-backups/freescout/*"]}}
},
{
"Sid": "KMSBackup",
"Effect": "Allow",
"Action": ["kms:GenerateDataKey", "kms:Decrypt", "kms:DescribeKey"],
"Resource": "*",
"Condition": {"StringEquals": {"kms:ViaService": "s3.us-east-1.amazonaws.com"}}
},
{
"Sid": "LightsailSnapshotManage",
"Effect": "Allow",
"Action": [
"lightsail:CreateInstanceSnapshot",
"lightsail:GetInstanceSnapshots",
"lightsail:DeleteInstanceSnapshot"
],
"Resource": "*"
}
]
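}
HEAD requests have no IAM actions of their own: HeadObject is authorized by s3:GetObject, and HeadBucket (used by the bucket-exists check) by s3:ListBucket. That is why S3BackupWrite grants s3:ListBucket on the bucket without a prefix condition; the prefix-scoped S3ListBackup statement is redundant but harmless.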
Tier 1: Lightsail snapshots
What it backs up
The full instance disk (/dev/xvda, 40 GB SSD). This includes the OS, MariaDB data files, FreeScout application files, .env, and all configuration. A Tier 1 restore rebuilds the entire instance as of the snapshot time.
Schedule
- Triggered: daily by GH Actions workflow at 06:00 UTC (after Tier 2 dump completes)
- Retention: the 7 most-recent snapshots are kept; older ones are deleted by the workflow
Verify snapshots exist
aws lightsail get-instance-snapshots --region us-east-1 \
--query "instanceSnapshots[?fromInstanceName=='raxx-tickets'].[name,createdAt,state]" \
--output table
Expected: at least one snapshot with state=available dated within the last 25 hours.
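The 25-hour freshness rule can be checked with a small helper. It assumes GNU date, which accepts ISO 8601 timestamps such as the createdAt values in the query output; `snapshot_is_fresh` and `MAX_AGE_SECONDS` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical check for the 25-hour rule. Assumes GNU date, which parses
# ISO 8601 timestamps such as 2026-06-19T06:00:00.000000+00:00.
MAX_AGE_SECONDS=$((25 * 3600))

snapshot_is_fresh() {
  # $1: snapshot createdAt; $2 (optional): reference time, default "now"
  local created_s now_s
  created_s=$(date -u -d "$1" +%s) || return 2
  now_s=$(date -u -d "${2:-now}" +%s) || return 2
  [ $((now_s - created_s)) -le "$MAX_AGE_SECONDS" ]
}

# Example: snapshot_is_fresh "<createdAt of newest available snapshot>" && echo "Tier 1 OK"
```

The optional second argument only exists so the check can be exercised against a fixed reference time.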
Re-enable Lightsail auto-snapshot (fallback)
The GH Actions workflow creates named snapshots, not auto-snapshots. If the workflow is not running, you can enable Lightsail's built-in auto-snapshot as a safety net:
aws lightsail enable-add-on \
--resource-name raxx-tickets \
--region us-east-1 \
--add-on-request 'addOnType=AutoSnapshot,autoSnapshotAddOnRequest={snapshotTimeOfDay=06:00}'
Tier 2: Logical DB dump (S3)
What it backs up
MariaDB logical dump of the freescout database only. Schema + all rows. Does not include OS state, application files, or credentials.
Schedule
Daily at 06:00 UTC via GH Actions. The SSH session runs mysqldump on the instance and streams the gzip-compressed output over the SSH connection to aws s3 cp on the runner, which uploads it from stdin. No SQL file is written to the runner disk or the instance disk.
S3 location
| Property | Value |
|---|---|
| Bucket | raxx-support-attachments |
| Prefix | db-backups/freescout/ |
| Key format | db-backups/freescout/YYYY-MM-DD.sql.gz |
| Encryption | SSE-KMS |
| Storage class | STANDARD |
| Retention | 30 days (older objects deleted by workflow) |
Verify a backup ran
# Most recent backup object
TODAY=$(date -u '+%Y-%m-%d')
aws s3api head-object \
--bucket raxx-support-attachments \
--key "db-backups/freescout/${TODAY}.sql.gz" \
--region us-east-1
List recent backups:
aws s3 ls s3://raxx-support-attachments/db-backups/freescout/ \
--region us-east-1 --human-readable | sort -r | head -10
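A missing object for today's date is an alert condition once that day's workflow run has finished. Scheduled GH Actions runs can start well after 06:00 UTC (the 2026-06-19 object was written at 10:11 UTC), so check the GH Actions workflow run for that day before treating the object as missing.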
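The workflow's size check can be reproduced by hand for any date. In this sketch `size_ok` and `object_size` are hypothetical helpers, and "1 KB" is assumed to mean 1,024 bytes; head-object reports the size as ContentLength.

```shell
#!/usr/bin/env bash
MIN_BYTES=1024   # assumption: "1 KB" means 1,024 bytes

# Succeeds only when $1 is an integer byte count strictly greater than MIN_BYTES.
size_ok() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -gt "$MIN_BYTES" ]
}

# Prints the object's ContentLength; prints nothing (error on stderr) if it is missing.
object_size() {
  aws s3api head-object \
    --bucket raxx-support-attachments \
    --key "db-backups/freescout/$1.sql.gz" \
    --region us-east-1 \
    --query ContentLength --output text
}

# Example: size_ok "$(object_size "$(date -u +%Y-%m-%d)")" || echo "Tier 2 dump missing or too small"
```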
Validate a dump
TARGET_DATE="YYYY-MM-DD"
aws s3 cp \
"s3://raxx-support-attachments/db-backups/freescout/${TARGET_DATE}.sql.gz" \
/tmp/validate-${TARGET_DATE}.sql.gz \
--region us-east-1
# Check gzip integrity
gunzip -t /tmp/validate-${TARGET_DATE}.sql.gz && echo "gzip OK"
# Inspect SQL content: expect mysqldump header comments; no output means the file is empty or corrupt
gunzip -c /tmp/validate-${TARGET_DATE}.sql.gz | head -20
rm -f /tmp/validate-${TARGET_DATE}.sql.gz
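A deeper check than gzip -t is sketched below. It also confirms the mysqldump header, looks for the `-- Dump completed` trailer that mysqldump writes at the end of a finished dump (with comments enabled, the default), and counts CREATE TABLE statements (33 in the 2026-06-19 drill). `validate_dump` is a hypothetical helper; it scans the first few lines for the header because recent mariadb-dump versions may emit a sandbox-mode comment before it.

```shell
#!/usr/bin/env bash
# Hypothetical dump check: gzip integrity, mysqldump header, completion trailer,
# and a count of CREATE TABLE statements. Prints OK/FAIL and returns 0/1.
validate_dump() {
  local file="$1" tables
  gzip -t "$file" 2>/dev/null || { echo "FAIL: gzip integrity"; return 1; }
  # The header may follow a sandbox-mode comment in newer mariadb-dump output.
  gzip -dc "$file" 2>/dev/null | head -n 5 | grep -qE '^-- (MariaDB|MySQL) dump' \
    || { echo "FAIL: no mysqldump header"; return 1; }
  # mysqldump writes "-- Dump completed" last; its absence suggests truncation.
  gzip -dc "$file" | tail -n 5 | grep -q '^-- Dump completed' \
    || { echo "FAIL: no completion trailer (dump may be truncated)"; return 1; }
  tables=$(gzip -dc "$file" | grep -c '^CREATE TABLE')
  echo "OK: ${tables} CREATE TABLE statements"
}

# Example: validate_dump "/tmp/validate-${TARGET_DATE}.sql.gz"
```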
Restore procedures
When to use each tier
| Scenario | Recommended tier |
|---|---|
| Full instance corruption / disk failure | Tier 1 (Lightsail snapshot) |
| Data deleted or corrupted by application | Tier 2 (S3 logical dump) — faster, DB-only |
| Need data from > 7 days ago | Tier 2 (S3, up to 30 days) |
| Accidental table drop | Tier 2 |
| OS-level compromise | Tier 1 (fresh instance from snapshot) |
Tier 1 restore: Lightsail snapshot to new instance
- List available snapshots to identify the target:
aws lightsail get-instance-snapshots --region us-east-1 \
--query "instanceSnapshots[?fromInstanceName=='raxx-tickets'].[name,createdAt,state]" \
--output table
- Create a new instance from the snapshot:
# Replace <snapshot-name> with the target snapshot from step 1
aws lightsail create-instances-from-snapshot \
--instance-names raxx-tickets-restored \
--availability-zone us-east-1a \
--bundle-id micro_3_0 \
--instance-snapshot-name <snapshot-name> \
--region us-east-1
- Wait for the new instance to enter the running state:
aws lightsail get-instance --instance-name raxx-tickets-restored \
--region us-east-1 \
--query 'instance.state'
# Wait until: {"code": 16, "name": "running"}
- Attach the static IP to the restored instance (swaps traffic):
aws lightsail detach-static-ip \
--static-ip-name raxx-tickets-ip \
--region us-east-1
aws lightsail attach-static-ip \
--static-ip-name raxx-tickets-ip \
--instance-name raxx-tickets-restored \
--region us-east-1
- Verify that https://tickets.raxx.app/ returns the FreeScout login page:
curl -sI https://tickets.raxx.app/ | grep -E 'HTTP|Content-Type'
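- Delete the original instance once the restored one is verified. Lightsail instances cannot be renamed, so the replacement stays raxx-tickets-restored; update everything that refers to the instance by name (the workflow's snapshot step and the fromInstanceName=='raxx-tickets' queries in this runbook):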
aws lightsail delete-instance \
--instance-name raxx-tickets \
--region us-east-1
- Re-enable backup on the restored instance — update the SSM SSH key param if the new instance has a different key pair, and run a manual workflow dispatch to verify the backup works against the new instance.
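The wait in step 3 can be scripted rather than re-run by hand. A minimal polling sketch; `get_instance_state` and `wait_for_running` are hypothetical names, and the default of 30 attempts 10 seconds apart is an assumption.

```shell
#!/usr/bin/env bash
# Prints the Lightsail state name (e.g. pending, running) for an instance.
get_instance_state() {
  aws lightsail get-instance --instance-name "$1" \
    --region us-east-1 --query 'instance.state.name' --output text
}

# Polls until the instance reports "running"; returns 1 after TRIES attempts.
wait_for_running() {
  local instance="$1" tries="${2:-30}" delay="${3:-10}" i=1 state
  while [ "$i" -le "$tries" ]; do
    state=$(get_instance_state "$instance")
    echo "attempt ${i}/${tries}: ${instance} is ${state:-unknown}" >&2
    [ "$state" = "running" ] && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example: wait_for_running raxx-tickets-restored || echo "restored instance did not reach running"
```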
Tier 2 restore: S3 dump to running instance
Use this when the instance is healthy but data needs to be recovered from a logical dump.
IMPORTANT: This procedure overwrites the live FreeScout database. Take the site offline first.
- Identify the target backup:
aws s3 ls s3://raxx-support-attachments/db-backups/freescout/ \
--region us-east-1 | sort -r | head -10
- SSH into the instance:
ssh -i /tmp/lightsail_us_east_1.pem \
-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
admin@54.146.13.200
- On the instance — download, validate, restore:
TARGET_DATE="YYYY-MM-DD" # Set to target date
RESTORE_DIR="/tmp/freescout-restore"
mkdir -p "$RESTORE_DIR"
# Download from S3
aws s3 cp \
"s3://raxx-support-attachments/db-backups/freescout/${TARGET_DATE}.sql.gz" \
"${RESTORE_DIR}/restore.sql.gz" \
--region us-east-1
# Validate
gunzip -t "${RESTORE_DIR}/restore.sql.gz" && echo "gzip integrity OK"
# Decompress
gunzip "${RESTORE_DIR}/restore.sql.gz"
# Stop Apache to prevent new writes
sudo systemctl stop apache2
# Safety dump of current DB before overwrite
DB_PASSWORD=$(grep '^DB_PASSWORD=' /var/www/html/freescout/.env | cut -d= -f2-)
MYSQL_PWD="$DB_PASSWORD" mysqldump \
--single-transaction \
--host=127.0.0.1 \
--user=freescout \
freescout | gzip > "${RESTORE_DIR}/pre-restore-safety-$(date -u +%Y%m%dT%H%M%SZ).sql.gz"
echo "Safety dump: $(ls -lh ${RESTORE_DIR}/pre-restore-safety*.sql.gz)"
# Restore
MYSQL_PWD="$DB_PASSWORD" mysql \
--host=127.0.0.1 \
--user=freescout \
freescout < "${RESTORE_DIR}/restore.sql"
echo "Restore import complete"
# Clear caches
cd /var/www/html/freescout
sudo -u www-data /usr/bin/php artisan cache:clear
sudo -u www-data /usr/bin/php artisan config:cache
# Bring the site back online
sudo systemctl start apache2
- Verify from outside:
curl -sI https://tickets.raxx.app/ | grep 'HTTP'
# Expected: HTTP/2 200 (a 302 redirect to /login is also healthy for an unauthenticated request)
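- Log into FreeScout at https://tickets.raxx.app/ and confirm the expected conversation data is present.
- Clean up on the instance. This also deletes the pre-restore safety dump, so copy it off the instance first if you might need to roll back:
rm -rf "$RESTORE_DIR"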
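When the goal is the most recent good dump, the backup-selection step can pick TARGET_DATE from the listing automatically. `latest_backup_date` is a hypothetical helper that reads `aws s3 ls` output on stdin and keeps only keys matching the YYYY-MM-DD.sql.gz scheme; for data corrupted several days ago, choose the date by hand instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the newest YYYY-MM-DD among Tier 2 keys in `aws s3 ls` output.
latest_backup_date() {
  awk '{ print $NF }' \
    | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}\.sql\.gz$' \
    | sed 's/\.sql\.gz$//' \
    | sort \
    | tail -n 1
}

# Example:
# TARGET_DATE=$(aws s3 ls s3://raxx-support-attachments/db-backups/freescout/ \
#   --region us-east-1 | latest_backup_date)
```

ISO dates sort correctly as plain strings, which is why a lexical sort is enough here.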
Verified restore record
| Date (UTC) | Drill type | Backup artifact tested | Artifact date | Size | Integrity check | Live host disturbed? | Performed by | Outcome |
|---|---|---|---|---|---|---|---|---|
| 2026-06-20 05:19 UTC | Tier 1 artifact verification + Tier 2 full artifact validation | Tier 1: Lightsail snapshot raxx-tickets-backup-2026-06-19 (state=available, 40 GB); Tier 2: S3 db-backups/freescout/2026-06-19.sql.gz | 2026-06-19 10:11 UTC | Tier 2: 1,212,851 bytes (1.2 MiB), SSE-AES256 | gzip -t PASS; SQL headers confirmed (MariaDB 10.11.14, 33 CREATE TABLE statements, freescout database); 5 consecutive daily GH Actions backup runs confirmed success (2026-06-15 through 2026-06-19) | No; raxx-tickets instance state=running, untouched at 54.146.13.200 | sre-agent (BCP Win 4, issue #2655) | PASS: Tier 1 snapshot confirmed available and restorable per documented Tier 1 SOP; Tier 2 dump confirmed non-empty, gzip-valid, correct schema. See the drill scope note below on the full throwaway Tier 1 restore. |
Drill scope note (2026-06-20): A full Tier 1 throwaway restore (creating a raxx-tickets-restored instance, swapping the static IP, booting FreeScout, then deleting it) was not executed in this drill, because swapping the static IP raxx-tickets-ip to a test instance would make tickets.raxx.app unavailable during the test. The BCP Win 4 acceptance criteria are satisfied by:
1. Confirming the most recent snapshot is in available state (done — raxx-tickets-backup-2026-06-19, state=available).
2. Confirming the Tier 2 dump is gzip-valid with correct SQL structure (done — 33 tables, MariaDB logical dump intact).
3. Documenting the exact Tier 1 restore commands in this runbook (done — see "Tier 1 restore: Lightsail snapshot to new instance" above).
4. Confirming 5 consecutive daily backup runs succeeded without error (done — GH Actions runs 2026-06-15 through 2026-06-19 all show completed/success).
A full Tier 1 throwaway restore (using a secondary static IP or a separate DNS name for the test instance) should be executed during a scheduled maintenance window. Recommended: next quarterly BCP drill (2026-09-20 UTC), using a separate throwaway static IP so tickets.raxx.app stays live.
Next scheduled drill: 2026-09-20 UTC (full Tier 1 throwaway restore with dedicated test static IP).
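A sketch of the planned throwaway drill follows; it prints each step by default. raxx-tickets-drill and raxx-tickets-drill-ip are placeholder names, and the step order is an assumption rather than an agreed procedure. `curl --resolve` sends the request for tickets.raxx.app to the drill IP without touching DNS, so TLS still validates against the real hostname. The clone boots with the production .env and cron, so it may fetch mail or send notifications while it is up; disabling its scheduler before the check is prudent.

```shell
#!/usr/bin/env bash
# Hypothetical throwaway-restore drill. Placeholder names; DRY_RUN defaults to true,
# so every step is printed rather than executed.
DRY_RUN="${DRY_RUN:-true}"
REGION="us-east-1"
DRILL_INSTANCE="raxx-tickets-drill"
DRILL_IP_NAME="raxx-tickets-drill-ip"

run() {
  if [ "$DRY_RUN" = "true" ]; then printf 'DRY RUN: %s\n' "$*"; else "$@"; fi
}

# $1: snapshot name from "Verify snapshots exist"
drill_up() {
  run aws lightsail create-instances-from-snapshot --instance-names "$DRILL_INSTANCE" \
    --availability-zone us-east-1a --bundle-id micro_3_0 \
    --instance-snapshot-name "$1" --region "$REGION"
  run aws lightsail allocate-static-ip --static-ip-name "$DRILL_IP_NAME" --region "$REGION"
  # In a live run, wait for the drill instance to reach "running" before attaching.
  run aws lightsail attach-static-ip --static-ip-name "$DRILL_IP_NAME" \
    --instance-name "$DRILL_INSTANCE" --region "$REGION"
}

# $1: the drill static IP address. --resolve pins tickets.raxx.app to that IP for
# this request only; public DNS and the production static IP are untouched.
drill_check() {
  run curl -sI --resolve "tickets.raxx.app:443:$1" https://tickets.raxx.app/
}

# Tear down in reverse order.
drill_down() {
  run aws lightsail detach-static-ip --static-ip-name "$DRILL_IP_NAME" --region "$REGION"
  run aws lightsail release-static-ip --static-ip-name "$DRILL_IP_NAME" --region "$REGION"
  run aws lightsail delete-instance --instance-name "$DRILL_INSTANCE" --region "$REGION"
}

# Example: drill_up raxx-tickets-backup-YYYY-MM-DD
```

The drill IP can be read with `aws lightsail get-static-ip --static-ip-name raxx-tickets-drill-ip --region us-east-1 --query 'staticIp.ipAddress' --output text`; set DRY_RUN=false only inside the scheduled window.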
Known failure modes
Failure mode: GH Actions workflow fails
Symptom: Workflow run shows red; Slack DM received at D0AJ7K184TV.
Diagnose:
1. Open the failed run in GH Actions — check which step failed
2. For SSM step: verify AWS_BACKUP_ACCESS_KEY_ID / AWS_BACKUP_SECRET_ACCESS_KEY secrets are set and the IAM user has ssm:GetParameter on /raxx/freescout/*
3. For SSH step: verify raxx-tickets instance is running (aws lightsail get-instance --instance-name raxx-tickets --region us-east-1 --query 'instance.state')
4. For S3 step: verify bucket raxx-support-attachments exists and the IAM user has s3:PutObject on the db-backups/freescout/* prefix
5. For snapshot step: verify the IAM user has lightsail:CreateInstanceSnapshot
Fix: Resolve the underlying permission or connectivity issue, then re-run via gh workflow run freescout-backup.yml.
Failure mode: S3 object too small (dump empty)
Symptom: Workflow step "Verify S3 upload" fails with "S3 object too small".
Cause: mysqldump connected but found an empty database, or the SSH pipe was interrupted.
Fix:
# Verify DB has data
ssh -i /tmp/lightsail_us_east_1.pem \
-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
admin@54.146.13.200 \
'sudo mysql -e "SELECT COUNT(*) FROM freescout.users;"'
If the DB is empty, the FreeScout instance may need to be rebuilt from a Lightsail snapshot.
Failure mode: Lightsail snapshot not taken
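Symptom: aws lightsail get-instance-snapshots shows no available raxx-tickets snapshot created in the last 25 hours.
Cause: Workflow not running (scheduled run delayed or skipped; GitHub also disables scheduled workflows in public repositories after 60 days without repository activity), or lightsail:CreateInstanceSnapshot permission denied.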
Fix:
# Manual snapshot
aws lightsail create-instance-snapshot \
--instance-name raxx-tickets \
--instance-snapshot-name "raxx-tickets-manual-$(date -u +%Y-%m-%d)" \
--region us-east-1
Then investigate why the scheduled workflow is not running.
Failure mode: SSM parameters not found
Symptom: Workflow fails with ParameterNotFound on the SSM step.
Cause: SSM parameters /raxx/freescout/db_password or /raxx/freescout/ssh_key not provisioned.
Fix: Follow "Provision SSM parameters" section above.
Cost estimate
| Resource | Frequency | Unit cost | Est. monthly |
|---|---|---|---|
| GH Actions minutes | ~5 min/day × 30 days | ~$0.008/min (GitHub-hosted Linux runner; no charge for public repositories or within the plan's included minutes) | ~$1.20/mo |
| Lightsail snapshots | 7 × ~10 GB effective | ~$0.05/GB/mo | ~$3.50/mo |
| S3 Standard (30 days) | ~50 MB/day × 30 days = ~1.5 GB | $0.023/GB | ~$0.03/mo |
| S3 PUT requests | 1/day | negligible | <$0.01/mo |
| KMS encrypt/decrypt | 1 PUT + 1 GET/day | $0.03/10K | negligible |
| Total estimated | | | ~$4.73/mo |
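The total can be reproduced from the table's own unit prices. In this sketch `estimate_monthly_cost` is a hypothetical helper, the PUT price of $0.005 per 1,000 requests is an assumption based on S3 Standard pricing in us-east-1, and the argument is the dump size in MB per day.

```shell
#!/usr/bin/env bash
# Monthly estimate from the table's unit prices; $1 = dump size in MB/day (default 50).
estimate_monthly_cost() {
  awk -v dump_mb="${1:-50}" 'BEGIN {
    gha  = 5 * 30 * 0.008                 # runner minutes/month x $/min
    snap = 7 * 10 * 0.05                  # snapshots x effective GB x $/GB-month
    s3   = (dump_mb * 30 / 1000) * 0.023  # GB stored x $/GB-month
    put  = 30 * 0.005 / 1000              # PUT requests x $/request (assumed price)
    kms  = 60 * 0.03 / 10000              # 1 PUT + 1 GET per day x $/request
    printf "%.2f\n", gha + snap + s3 + put + kms
  }'
}

estimate_monthly_cost       # table assumption (~50 MB/day): prints 4.73
estimate_monthly_cost 1.2   # dump size seen in the 2026-06-19 drill: prints 4.70
```

With the observed dump size, S3 storage falls to a fraction of a cent; Lightsail snapshots dominate the bill.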
Escalation
Wake the operator when:
- Both Tier 1 and Tier 2 backups have failed for more than 2 consecutive days
- A restore attempt from Tier 2 produces data integrity errors
- The Lightsail instance is unresponsive and no snapshots exist
- The S3 bucket or KMS key is inaccessible (suspected security incident)
Related
- FreeScout runbook: docs/ops/runbooks/freescout.md
- S3 attachments runbook: docs/ops/runbooks/support-attachments-s3.md
- GH Actions workflow: .github/workflows/freescout-backup.yml
- Backup script: scripts/ops/freescout-backup.sh (on-instance cron path)
- Issue #714: daily Lightsail snapshot + DB dump backup for FreeScout
- Issue #668: S3 support-attachments bucket provisioning