Raxx · internal docs


SOP — FreeScout Lightsail Instance Rebuild

Owner: Operator (Kristerpher) + agent
Last updated: 2026-05-04
First execution: 2026-05-04 (rebuild from prevent_destroy blocked plan after SSM secrets migration)
Related: #707, #978, #979, #980


When to use this SOP

Run this when you need to destroy + recreate the raxx-tickets Lightsail instance — i.e., any change to aws_lightsail_instance.freescout.user_data that Terraform reports as "must be replaced".

Common triggers:

- user_data template input drifted from what was baked at the original apply (Postmark token rename, random_password regeneration, etc.)
- Lightsail blueprint upgrade
- Major FreeScout version migration

user_data on Lightsail is immutable post-launch. AWS only honors changes via destroy + recreate. prevent_destroy = true correctly blocks this; the rebuild requires intentionally flipping it.


Pre-flight (before any merge)

1. Confirm there's nothing to lose

FreeScout is lamp_ls_1_0 blueprint with the database on the same instance. Destroying the instance destroys the MariaDB database. If real customer tickets exist, this SOP is the wrong tool — use freescout-backup-restore.md first to dump the DB, then restore after rebuild.

# Quick check: is there any real data?
ssh -i /tmp/lightsail_us_east_1.pem admin@<freescout-ip> \
  'mysql -u freescout -p"$(cat /root/.freescout_db_pass)" freescout \
   -e "SELECT COUNT(*) AS conv FROM conversations; SELECT COUNT(*) AS users FROM users;"'

If conv > 0 or users > 1, STOP and back up first.

2. Verify the change requires destroy

cd terraform/freescout
terraform plan -out=tfplan -var "cloudflare_zone_id=$ZONE_ID"
# Look for "must be replaced" — if only the instance + public_ports + attachment
# show as replaced, you're in the destroy+recreate path.

If you see destroys on aws_lightsail_static_ip or cloudflare_record.freescout_a, STOP — the static IP must survive (DNS depends on it).

3. Take the pre-rebuild snapshot

SNAP_NAME="raxx-tickets-pre-rebuild-$(date -u +%Y%m%d%H%M%S)"
aws lightsail create-instance-snapshot \
  --instance-name raxx-tickets \
  --instance-snapshot-name "$SNAP_NAME" \
  --region us-east-1
# Wait until: aws lightsail get-instance-snapshots --region us-east-1 ... .state == 'available'

The snapshot persists independently of the instance; rolling back to it later restores the full instance state.
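The wait in the comment above can be scripted as a polling loop. A minimal sketch, assuming the AWS CLI is configured and a 10-minute ceiling is acceptable (`wait_for_snapshot` is a local helper, not part of the repo):

```shell
# Poll the snapshot state until it reports "available" (up to ~10 min).
wait_for_snapshot() {
  snap="$1"
  i=0
  while [ "$i" -lt 60 ]; do
    state=$(aws lightsail get-instance-snapshot \
      --instance-snapshot-name "$snap" --region us-east-1 \
      --query 'instanceSnapshot.state' --output text)
    if [ "$state" = "available" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  echo "snapshot $snap not available after 10 minutes" >&2
  return 1
}
```

Usage: `wait_for_snapshot "$SNAP_NAME"` after the create call returns.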


The rebuild cycle

Step 1 — Land the rebuild-window PR

The rebuild-window PR flips prevent_destroy = true → false on aws_lightsail_instance.freescout in terraform/freescout/main.tf. Reference: #979.
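For reference, the relevant block in main.tf looks roughly like this (sketch; the instance's other arguments are elided):

```hcl
resource "aws_lightsail_instance" "freescout" {
  # ...name, blueprint_id, bundle_id, user_data, etc. elided...

  lifecycle {
    # Rebuild-window PR flips this to false; the follow-up PR
    # (Step 5) flips it back to true.
    prevent_destroy = false
  }
}
```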

Step 2 — Source env vars + plan

cd /path/to/repo/terraform/freescout

# Auth to vault (must include User-Agent header to clear CF WAF — known gotcha)
TOKEN=$(curl -fsSL -X POST "$INFISICAL_HOST/api/v1/auth/universal-auth/login" \
  -H "Content-Type: application/json" \
  -H "User-Agent: raxx-ops/1.0" \
  -H "CF-Access-Client-Id: $CF_ACCESS_CLIENT_ID" \
  -H "CF-Access-Client-Secret: $CF_ACCESS_CLIENT_SECRET" \
  -d "{\"clientId\":\"$INFISICAL_CLIENT_ID\",\"clientSecret\":\"$INFISICAL_CLIENT_SECRET\"}" \
  | jq -r '.accessToken')

# fetch helper
fetch() { curl -fsSL \
  -H "Authorization: Bearer $TOKEN" -H "User-Agent: raxx-ops/1.0" \
  -H "CF-Access-Client-Id: $CF_ACCESS_CLIENT_ID" -H "CF-Access-Client-Secret: $CF_ACCESS_CLIENT_SECRET" \
  "$INFISICAL_HOST/api/v3/secrets/raw/$1?workspaceId=${INFISICAL_PROJECT_ID}&environment=prod&secretPath=$2" \
  | jq -r '.secret.secretValue'; }

# Required env
export CLOUDFLARE_API_TOKEN=$(fetch CLOUDFLARE_EDIT_DNS %2FMooseQuest%2Fcloudflare)
export TF_VAR_cloudflare_zone_id=$(fetch CLOUDFLARE_ZONE_ID_RAXX_APP %2FMooseQuest%2Fcloudflare)
# 10 license keys
for slug_var in "API_WEBHOOKS_LICENSE_KEY:license_api_webhooks" "REPORTS_LICENSE_KEY:license_reports" \
                "OAUTH_LICENSE_KEY:license_oauth" "CUSTOM_FIELDS_LICENSE_KEY:license_custom_fields" \
                "TAGS_LICENSE_KEY:license_tags" "WORKFLOWS_LICENSE_KEY:license_workflows" \
                "CUSTOMIZATION_LICENSE_KEY:license_customization" "SAVED_REPLIES_LICENSE_KEY:license_saved_replies" \
                "SLACK_LICENSE_KEY:license_slack" "CUSTOM_FOLDERS_LICENSE_KEY:license_custom_folders"; do
  vault_name="${slug_var%:*}"
  tf_var="${slug_var#*:}"
  export "TF_VAR_${tf_var}=$(fetch $vault_name %2FMooseQuest%2Ffreescout)"
done
export TF_VAR_ssh_user=admin
export TF_VAR_ssh_private_key_path=/tmp/lightsail_us_east_1.pem

# CRITICAL: -var override is required because terraform.tfvars contains a
# placeholder for cloudflare_zone_id ("REPLACE_WITH_TF_VAR_..."). tfvars
# OVERRIDES TF_VAR_* env vars (TF precedence rule), so we must use CLI -var.
terraform plan -out=tfplan -input=false \
  -var "cloudflare_zone_id=$TF_VAR_cloudflare_zone_id"

Expected plan output:

- aws_lightsail_instance.freescout — must be replaced
- aws_lightsail_instance_public_ports.freescout — must be replaced (depends on instance)
- aws_lightsail_static_ip_attachment.freescout — will be created (re-attach to new instance)
- 3× aws_ssm_parameter.*_password — will be created
- 13× null_resource.freescout_* — will be created (1 pre-snap + 10 modules + 1 cache + 1 post-snap)

If you see destroys beyond instance + ports, STOP.
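That eyeball check can be automated against the plan JSON. A sketch, assuming jq is available (`check_destroys` is a local helper that reads `terraform show -json tfplan` on stdin; the allow-list matches the expected plan output above):

```shell
# Fail unless every destroyed address is on the allow-list.
check_destroys() {
  jq -e '
    [.resource_changes[]?
     | select(.change.actions | index("delete"))
     | .address]
    - ["aws_lightsail_instance.freescout",
       "aws_lightsail_instance_public_ports.freescout"]
    | length == 0' >/dev/null
}

# Usage:
#   terraform show -json tfplan | check_destroys \
#     || { echo "unexpected destroys, STOP" >&2; exit 1; }
```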

Step 3 — Apply

terraform apply -input=false tfplan

Expected timeline:

- 0-90s: instance destroyed, recreated
- 90s-3min: cloud-init runs LAMP base init
- 3-8min: cloud-init runs the FreeScout user_data (apt install, composer, FreeScout clone, migrate, vhost config) — log at /var/log/freescout-bootstrap.log
- 8-12min: 10 module installs run sequentially via SSH null_resource provisioners (workflows depends on tags + custom_fields)

Step 4 — Verify

# 1. Cloud-init completed cleanly
ssh -i /tmp/lightsail_us_east_1.pem admin@<ip> \
  'sudo cloud-init status --long'
# Expected: "status: done"

# 2. FreeScout responds
curl -I https://tickets.raxx.app
# Expected: 200 or 302 (login redirect)

# 3. All 10 modules active
ssh -i /tmp/lightsail_us_east_1.pem admin@<ip> \
  'cd /var/www/html/freescout && sudo -u www-data php artisan module:list | grep -E "Active|Inactive"'
# Expected: 10 rows showing Active

Step 5 — Restore prevent_destroy

File a small follow-up PR that flips prevent_destroy = false → true. After merge:

terraform plan
# Expected: "No changes. Your infrastructure matches the configuration."

The guard is back in place.


Known gotchas

CF WAF returns HTTP 403 (error code 1010) without a User-Agent header

vault.raxx.app is fronted by Cloudflare. CF Access service-token auth gets you past the IDP gate, but the WAF still fingerprints requests — Python-urllib/* and empty UA both get 1010. Always include -H "User-Agent: raxx-ops/1.0" (or any non-empty UA) in vault REST calls.

terraform.tfvars placeholder OVERRIDES TF_VAR_* env vars

Terraform variable precedence: env vars < tfvars < CLI -var. The README + Makefile pattern of export TF_VAR_cloudflare_zone_id=$(...) doesn't actually take effect because terraform.tfvars has cloudflare_zone_id = "REPLACE_WITH_TF_VAR_..." which wins. Use CLI -var "cloudflare_zone_id=..." to override.

Long-term fix: remove the placeholder line from terraform.tfvars (followup card).

Lightsail lamp_ls_1_0 runs user_data under dash, not bash

Regardless of the shebang, the script runs under dash. Stick to POSIX sh:

- set -eu (no pipefail)
- [ ... ] not [[ ... ]]
- No process substitution >(cmd)
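A POSIX-safe sketch of those substitutions (illustrative values, not taken from the actual bootstrap script):

```shell
#!/bin/sh
set -eu                            # dash has no pipefail

status="done"
if [ "$status" = "done" ]; then    # [ ... ], not [[ ... ]]
  echo "cloud-init finished"
fi

# pipefail substitute: capture the producer's output and check its exit status
listing=$(ls /etc) || { echo "ls failed" >&2; exit 1; }

# process-substitution substitute: write to a temp file instead of >(cmd)
tmp=$(mktemp)
printf '%s\n' "$listing" > "$tmp"
echo "captured $(wc -l < "$tmp") lines"
rm -f "$tmp"
```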

cloud-init's scripts-user has no HOME — composer rejects

Fixed in PR #980. The bootstrap now exports HOME=/root and COMPOSER_HOME=/root/.composer before any composer call. If you see the bootstrap log end with "The HOME or COMPOSER_HOME environment variable must be set...", that fix needs to be in main.

Lightsail static IP detaches when its target instance is destroyed

Terraform recreates the aws_lightsail_static_ip_attachment automatically. There's a brief window (~30s) where the new instance has a transient public IP (e.g., 100.48.x.x) before the static IP re-attaches.

The CF DNS record points at aws_lightsail_static_ip.freescout.ip_address (the static IP, not the instance), so DNS stays valid throughout. If the DNS record shows up in the destroy plan, something else is wrong (likely a TF_VAR not being set — see "tfvars overrides" above).
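The wiring looks roughly like this (sketch; argument names follow the common Cloudflare provider schema and may differ from the repo's actual record definition):

```hcl
resource "cloudflare_record" "freescout_a" {
  zone_id = var.cloudflare_zone_id
  name    = "tickets"
  type    = "A"
  # References the static IP resource, not the instance, so the record
  # survives an instance destroy + recreate.
  value   = aws_lightsail_static_ip.freescout.ip_address
}
```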

Host key changes — clean known_hosts

The new instance gets a new SSH host key. Your local known_hosts may have the old key for the static IP, causing host-key-verification failures.

ssh-keygen -R <static-ip>
# Then ssh with -o StrictHostKeyChecking=accept-new on first connect

Apply may need to be re-run if SSH provisioners fire before sshd is fully up

If you see ssh: handshake failed: ... connection reset by peer on the module installs, the SSH daemon was probably mid-restart during cloud-init. Wait 60s and re-run terraform apply — the failed null_resource blocks become tainted and retry.


Fast-restore from a post-modules-installed snapshot

Tagras paid modules cannot be reinstalled via CLI (server-to-server signed one-shot download — see freescout-paid-module-install.md). The fastest path back to a fully-modular FreeScout is to restore from a Lightsail snapshot taken AFTER the 10 modules were installed + activated.

Canonical post-install snapshot (2026-05-04): raxx-tickets-modules-installed-20260503231628 — captures FreeScout core + all 10 paid modules in Enabled status. Take a fresh post-install snapshot after each major core/module update; keep the latest one as the canonical restore target.

# Verify the snapshot is available
aws lightsail get-instance-snapshots --region us-east-1 \
  --query 'instanceSnapshots[?starts_with(name, `raxx-tickets-modules-installed`)].{name:name, state:state, sizeGB:sizeInGb, createdAt:createdAt}' \
  --output table

# Create new instance from snapshot
SNAP="raxx-tickets-modules-installed-20260503231628"
aws lightsail create-instances-from-snapshot \
  --instance-names raxx-tickets \
  --availability-zone us-east-1a \
  --bundle-id micro_3_0 \
  --instance-snapshot-name "$SNAP" \
  --region us-east-1

# Re-attach static IP
aws lightsail attach-static-ip \
  --static-ip-name raxx-tickets-ip \
  --instance-name raxx-tickets \
  --region us-east-1

License domain caveat: Tagras licenses are bound to the original APP_URL (https://tickets.raxx.app). Restoring to the same domain preserves activation slots. Restoring to a different domain (e.g., tickets-staging.raxx.app) burns one of the license's domain activations. Don't restore to a non-canonical domain casually.

Rollback

If apply completes but FreeScout doesn't respond:

# 1. Restore from the pre-rebuild snapshot
aws lightsail create-instances-from-snapshot \
  --instance-names raxx-tickets-rollback \
  --availability-zone us-east-1a \
  --bundle-id micro_3_0 \
  --instance-snapshot-name "$SNAP_NAME" \
  --region us-east-1

# 2. Re-attach the static IP
aws lightsail attach-static-ip \
  --static-ip-name raxx-tickets-ip \
  --instance-name raxx-tickets-rollback \
  --region us-east-1

# 3. Update Terraform state to point at the new resource (terraform import)
# 4. Investigate root cause before next attempt

The destroyed-instance state is unrecoverable except via the snapshot. Don't skip Pre-flight Step 3.


Postmortem template

Use after every rebuild incident:

### FreeScout Lightsail rebuild — <UTC timestamp>

- Trigger: <user_data drift / blueprint upgrade / ...>
- Plan summary: <N add, N change, N destroy>
- Apply result: <succeeded | partial | rolled back>
- Time to apply (start → modules active): <minutes>
- Snapshot ID used: <raxx-tickets-pre-rebuild-...>
- Root cause (if rebuild was unplanned): <vault key rename / template bug / ...>
- Followups: <issue numbers>

Save postmortems at docs/ops/postmortems/freescout-rebuild-<YYYY-MM-DD>.md.