Owner: Operator (Kristerpher) + agent
Last updated: 2026-05-04
First execution: 2026-05-04 (rebuild from prevent_destroy blocked plan after SSM secrets migration)
Related: #707, #978, #979, #980
Run this when you need to destroy + recreate the raxx-tickets Lightsail instance — i.e., any change to aws_lightsail_instance.freescout.user_data that Terraform reports as "must be replaced".
Common triggers:
- user_data template input drifted from what was baked at original apply (Postmark token rename, random_password regeneration, etc.)
- Lightsail blueprint upgrade
- Major FreeScout version migration
user_data on Lightsail is immutable post-launch. AWS only honors changes via destroy + recreate. prevent_destroy = true correctly blocks this; the rebuild requires intentionally flipping it.
FreeScout runs on the lamp_ls_1_0 blueprint with the database on the same instance, so destroying the instance destroys the MariaDB database. If real customer tickets exist, this SOP is the wrong tool — use freescout-backup-restore.md first to dump the DB, then restore after the rebuild.
# Quick check: is there any real data?
ssh -i /tmp/lightsail_us_east_1.pem admin@<freescout-ip> \
'mysql -u freescout -p"$(cat /root/.freescout_db_pass)" freescout \
-e "SELECT COUNT(*) AS conv FROM conversations; SELECT COUNT(*) AS users FROM users;"'
If conv > 0 or users > 1, STOP and back up first.
cd terraform/freescout
terraform plan -out=tfplan -var "cloudflare_zone_id=$ZONE_ID"
# Look for "must be replaced" — if only the instance + public_ports + attachment
# show as replaced, you're in the destroy+recreate path.
If you see destroys on aws_lightsail_static_ip or cloudflare_record.freescout_a, STOP — the static IP must survive (DNS depends on it).
SNAP_NAME="raxx-tickets-pre-rebuild-$(date -u +%Y%m%d%H%M%S)"
aws lightsail create-instance-snapshot \
--instance-name raxx-tickets \
--instance-snapshot-name "$SNAP_NAME" \
--region us-east-1
# Wait until: aws lightsail get-instance-snapshots --region us-east-1 ... .state == 'available'
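A minimal wait-loop sketch for that check (command and query shape per the AWS CLI Lightsail docs; the 15s interval is arbitrary):
while [ "$(aws lightsail get-instance-snapshot \
    --instance-snapshot-name "$SNAP_NAME" --region us-east-1 \
    --query 'instanceSnapshot.state' --output text)" != "available" ]; do
  echo "snapshot not ready yet..."
  sleep 15
done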
The snapshot persists independently of the instance; rolling back to it later restores the full instance state, database included.
The rebuild-window PR flips prevent_destroy = true → false on aws_lightsail_instance.freescout in terraform/freescout/main.tf. Reference: #979.
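A quick sanity check that the rebuild-window PR is actually what's checked out (grep target assumed from the lifecycle block in main.tf):
grep -n 'prevent_destroy' terraform/freescout/main.tf
# Expected during the rebuild window: prevent_destroy = false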
cd /path/to/repo/terraform/freescout
# Auth to vault (must include User-Agent header to clear CF WAF — known gotcha)
TOKEN=$(curl -fsSL -X POST "$INFISICAL_HOST/api/v1/auth/universal-auth/login" \
-H "Content-Type: application/json" \
-H "User-Agent: raxx-ops/1.0" \
-H "CF-Access-Client-Id: $CF_ACCESS_CLIENT_ID" \
-H "CF-Access-Client-Secret: $CF_ACCESS_CLIENT_SECRET" \
-d "{\"clientId\":\"$INFISICAL_CLIENT_ID\",\"clientSecret\":\"$INFISICAL_CLIENT_SECRET\"}" \
| jq -r '.accessToken')
# fetch helper: fetch <SECRET_NAME> <url-encoded secretPath>
fetch() { curl -fsSL \
-H "Authorization: Bearer $TOKEN" -H "User-Agent: raxx-ops/1.0" \
-H "CF-Access-Client-Id: $CF_ACCESS_CLIENT_ID" -H "CF-Access-Client-Secret: $CF_ACCESS_CLIENT_SECRET" \
"$INFISICAL_HOST/api/v3/secrets/raw/$1?workspaceId=${INFISICAL_PROJECT_ID}&environment=prod&secretPath=$2" \
| jq -r '.secret.secretValue'; }
# Required env
export CLOUDFLARE_API_TOKEN=$(fetch CLOUDFLARE_EDIT_DNS %2FMooseQuest%2Fcloudflare)
export TF_VAR_cloudflare_zone_id=$(fetch CLOUDFLARE_ZONE_ID_RAXX_APP %2FMooseQuest%2Fcloudflare)
# 10 license keys
for slug_var in "API_WEBHOOKS_LICENSE_KEY:license_api_webhooks" "REPORTS_LICENSE_KEY:license_reports" \
"OAUTH_LICENSE_KEY:license_oauth" "CUSTOM_FIELDS_LICENSE_KEY:license_custom_fields" \
"TAGS_LICENSE_KEY:license_tags" "WORKFLOWS_LICENSE_KEY:license_workflows" \
"CUSTOMIZATION_LICENSE_KEY:license_customization" "SAVED_REPLIES_LICENSE_KEY:license_saved_replies" \
"SLACK_LICENSE_KEY:license_slack" "CUSTOM_FOLDERS_LICENSE_KEY:license_custom_folders"; do
vault_name="${slug_var%:*}"
tf_var="${slug_var#*:}"
export "TF_VAR_${tf_var}=$(fetch $vault_name %2FMooseQuest%2Ffreescout)"
done
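Before planning, it's worth a sanity check that every fetch actually returned a value, without echoing the secrets themselves (a sketch; variable names per the loop above):
# Print set/EMPTY per license var; EMPTY means a vault fetch failed silently
env | awk -F= '/^TF_VAR_license_/ {print $1, (length($2) ? "set" : "EMPTY")}' | sort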
export TF_VAR_ssh_user=admin
export TF_VAR_ssh_private_key_path=/tmp/lightsail_us_east_1.pem
# CRITICAL: -var override is required because terraform.tfvars contains a
# placeholder for cloudflare_zone_id ("REPLACE_WITH_TF_VAR_..."). tfvars
# OVERRIDES TF_VAR_* env vars (TF precedence rule), so we must use CLI -var.
terraform plan -out=tfplan -input=false \
-var "cloudflare_zone_id=$TF_VAR_cloudflare_zone_id"
Expected plan output:
- aws_lightsail_instance.freescout — must be replaced
- aws_lightsail_instance_public_ports.freescout — must be replaced (depends on instance)
- aws_lightsail_static_ip_attachment.freescout — will be created (re-attach to new instance)
- 3× aws_ssm_parameter.*_password — will be created
- 13× null_resource.freescout_* — will be created (1 pre-snap + 10 modules + 1 cache + 1 post-snap)
If you see destroys beyond instance + ports, STOP.
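To machine-check the plan instead of eyeballing it, the plan JSON lists delete actions explicitly (a sketch using terraform show -json):
terraform show -json tfplan | jq -r '
  .resource_changes[]
  | select(.change.actions | index("delete"))
  | .address'
# Expected output: only aws_lightsail_instance.freescout and
# aws_lightsail_instance_public_ports.freescout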
terraform apply -input=false tfplan
Expected timeline:
- 0-90s: instance destroyed, recreated
- 90s-3min: cloud-init runs LAMP base init
- 3-8min: cloud-init runs FreeScout user_data (apt install, composer, FreeScout clone, migrate, vhost config) — log at /var/log/freescout-bootstrap.log
- 8-12min: 10 module installs run sequentially via SSH null_resource provisioners (workflows depends on tags + custom_fields)
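To watch the bootstrap in real time during the 3-8 minute window (log path from the timeline above):
ssh -i /tmp/lightsail_us_east_1.pem admin@<ip> \
  'sudo tail -f /var/log/freescout-bootstrap.log'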
# 1. Cloud-init completed cleanly
ssh -i /tmp/lightsail_us_east_1.pem admin@<ip> \
'sudo cloud-init status --long'
# Expected: "status: done"
# 2. FreeScout responds
curl -I https://tickets.raxx.app
# Expected: 200 or 302 (login redirect)
# 3. All 10 modules active
ssh -i /tmp/lightsail_us_east_1.pem admin@<ip> \
'cd /var/www/html/freescout && sudo -u www-data php artisan module:list | grep -E "Active|Inactive"'
# Expected: 10 rows showing Active
File a small follow-up PR that flips prevent_destroy = false → true. After merge:
terraform plan
# Expected: "No changes. Your infrastructure matches the configuration."
The guard is back in place.
vault.raxx.app is fronted by Cloudflare. CF Access service-token auth gets you past the IdP gate, but the WAF still fingerprints requests: Python-urllib/* and an empty UA both draw Cloudflare error 1010. Always include -H "User-Agent: raxx-ops/1.0" (or any non-empty UA) in vault REST calls.
terraform.tfvars placeholder OVERRIDES TF_VAR_* env vars
Terraform variable precedence: env vars < tfvars < CLI -var. The README + Makefile pattern of export TF_VAR_cloudflare_zone_id=$(...) doesn't actually take effect because terraform.tfvars has cloudflare_zone_id = "REPLACE_WITH_TF_VAR_...", which wins. Use CLI -var "cloudflare_zone_id=..." to override.
Long-term fix: remove the placeholder line from terraform.tfvars (followup card).
lamp_ls_1_0 runs user_data under dash, not bash
This holds regardless of the shebang, so the bootstrap must be POSIX sh only (example after this list):
- set -eu (no pipefail)
- [ ... ] not [[ ... ]]
- No process substitution >(cmd)
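For instance, glob matching has to go through case rather than [[ ... ]] (a self-contained sketch):
# dash has no [[ ]]; case covers the same glob-matching need
status="active"
case "$status" in
  act*) echo "matched" ;;
  *)    echo "no match" ;;
esac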
scripts-user has no HOME — composer refuses to run
Fixed in PR #980. The bootstrap now exports HOME=/root and COMPOSER_HOME=/root/.composer before any composer call. If the bootstrap log ends with "The HOME or COMPOSER_HOME environment variable must be set...", that fix hasn't made it into main.
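The fix amounts to two exports early in the bootstrap (exact placement per PR #980):
export HOME=/root
export COMPOSER_HOME=/root/.composer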
Terraform recreates the aws_lightsail_static_ip_attachment automatically. There's a brief window (~30s) where the new instance has a transient public IP (e.g., 100.48.x.x) before the static IP re-attaches.
The CF DNS record points at aws_lightsail_static_ip.freescout.ip_address (the static IP, not the instance), so DNS stays valid throughout. If the DNS record shows up in the destroy plan, something else is wrong (likely a TF_VAR not being set — see "tfvars overrides" above).
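To confirm DNS stayed pinned to the static IP during the window (a sketch; note that if the CF record is proxied, dig returns Cloudflare edge IPs rather than the origin):
dig +short tickets.raxx.app
aws lightsail get-static-ip --static-ip-name raxx-tickets-ip \
  --region us-east-1 --query 'staticIp.ipAddress' --output text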
The new instance gets a new SSH host key. Your local known_hosts may have the old key for the static IP, causing host-key-verification failures.
ssh-keygen -R <static-ip>
# Accept the new host key explicitly on first connect
ssh -o StrictHostKeyChecking=accept-new -i /tmp/lightsail_us_east_1.pem admin@<static-ip> true
If you see ssh: handshake failed: ... connection reset by peer on the module installs, the SSH daemon was probably mid-restart during cloud-init. Wait 60s and re-run terraform apply — the failed null_resource blocks become tainted and retry.
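To confirm before re-applying that only the provisioner resources are tainted (plan output marks them explicitly):
terraform plan -input=false \
  -var "cloudflare_zone_id=$TF_VAR_cloudflare_zone_id" | grep tainted
# Expected: only null_resource.freescout_* entries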
Tagras paid modules cannot be reinstalled via CLI (the install is a server-to-server signed one-shot download — see freescout-paid-module-install.md). The fastest path back to a fully-modular FreeScout is to restore from a Lightsail snapshot taken AFTER the 10 modules were installed + activated.
Canonical post-install snapshot (2026-05-04): raxx-tickets-modules-installed-20260503231628 — captures FreeScout core + all 10 paid modules in Enabled status. Take a fresh post-install snapshot after each major core/module update; keep the latest one as the canonical restore target.
# Verify the snapshot is available
aws lightsail get-instance-snapshots --region us-east-1 \
--query 'instanceSnapshots[?starts_with(name, `raxx-tickets-modules-installed`)].{name:name, state:state, sizeGB:sizeInGb, createdAt:createdAt}' \
--output table
# Create new instance from snapshot
SNAP="raxx-tickets-modules-installed-20260503231628"
aws lightsail create-instances-from-snapshot \
--instance-names raxx-tickets \
--availability-zone us-east-1a \
--bundle-id micro_3_0 \
--instance-snapshot-name "$SNAP" \
--region us-east-1
# Re-attach static IP
aws lightsail attach-static-ip \
--static-ip-name raxx-tickets-ip \
--instance-name raxx-tickets \
--region us-east-1
License domain caveat: Tagras licenses are bound to the original APP_URL (https://tickets.raxx.app). Restoring to the same domain preserves activation slots. Restoring to a different domain (e.g., tickets-staging.raxx.app) burns one of the license's domain activations. Don't restore to a non-canonical domain casually.
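After a restore, a quick check that the instance still answers for the canonical domain (.env path assumed from the artisan path used in verification above):
ssh -i /tmp/lightsail_us_east_1.pem admin@<ip> \
  'grep ^APP_URL /var/www/html/freescout/.env'
# Expected: APP_URL=https://tickets.raxx.app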
If apply completes but FreeScout doesn't respond:
# 1. Restore from the pre-rebuild snapshot
aws lightsail create-instances-from-snapshot \
--instance-names raxx-tickets-rollback \
--availability-zone us-east-1a \
--bundle-id micro_3_0 \
--instance-snapshot-name "$SNAP_NAME" \
--region us-east-1
# 2. Re-attach the static IP
aws lightsail attach-static-ip \
--static-ip-name raxx-tickets-ip \
--instance-name raxx-tickets-rollback \
--region us-east-1
# 3. Update Terraform state to point at the new resource (terraform import)
# 4. Investigate root cause before next attempt
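Step 3 in sketch form (addresses assumed from main.tf; the aws provider imports Lightsail instances by name, so verify against your state first):
terraform state rm aws_lightsail_instance.freescout
terraform import -var "cloudflare_zone_id=$TF_VAR_cloudflare_zone_id" \
  aws_lightsail_instance.freescout raxx-tickets-rollback
# The name mismatch (config says raxx-tickets) will show as a pending
# replace on the next plan; reconcile it before applying again.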
The destroyed instance's state is unrecoverable except via the snapshot. Don't skip the pre-rebuild snapshot (Pre-flight Step 3).
Use after every rebuild incident:
### FreeScout Lightsail rebuild — <UTC timestamp>
- Trigger: <user_data drift / blueprint upgrade / ...>
- Plan summary: <N add, N change, N destroy>
- Apply result: <succeeded | partial | rolled back>
- Time to apply (start → modules active): <minutes>
- Snapshot ID used: <raxx-tickets-pre-rebuild-...>
- Root cause (if rebuild was unplanned): <vault key rename / template bug / ...>
- Followups: <issue numbers>
Save postmortems at docs/ops/postmortems/freescout-rebuild-<YYYY-MM-DD>.md.
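A tiny scaffold helper, if useful (path from the line above):
mkdir -p docs/ops/postmortems
${EDITOR:-vi} "docs/ops/postmortems/freescout-rebuild-$(date -u +%Y-%m-%d).md"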