Raxx · internal docs


Production rollback runbook

System: raxx-api-prod · raxx-api-staging · raxx-console-prod · raxx-console-staging · raxx-velvet-*
Owner: operator
Last incident: n/a (initial authoring, #98)
Last reviewed: 2026-05-14 UTC
Related: docs/ops/runbooks/heroku.md · docs/ops/runbooks/deploy-freeze.md · docs/ops/runbooks/migration-gate.md


When to use this runbook

Use this runbook when a production release must be reverted immediately. Roll back when:

Roll forward instead of rolling back when:

Expected wall-clock from decision to verified recovery: under 5 minutes.


Pre-rollback checks

  1. Confirm the release is the cause. Check Sentry or heroku logs --tail -a <app> for a traceback that references code or config from the current release, not a dependency.

  2. Check dyno health before rolling back:

heroku ps -a raxx-api-prod

If dynos are in a crash loop, rollback is likely correct. If dynos are up but serving errors, confirm which request path is failing before acting.

  3. Identify the known-good release:

heroku releases -a raxx-api-prod

Look for the last Deploy <sha> entry before the bad one. Config-var-only releases (e.g., Set STRIPE_API_KEY config vars) do not change the slug, but rolling back past them reverts the config vars too. Note this before proceeding.

  4. Check for a DB migration in the bad release. If heroku releases -a <app> shows a Deploy entry and that deploy included a migration, see the "DB migration caveat" section before executing the rollback.

  5. Verify you are targeting the right app. App names follow raxx-<service>-<env>:

| App | URL |
| --- | --- |
| raxx-api-prod | https://raxx-api-prod-a60a19e5efbf.herokuapp.com |
| raxx-api-staging | https://raxx-api-staging-1a19fb3873b9.herokuapp.com |
| raxx-console-prod | https://console.raxx.app |
| raxx-console-staging | https://console-staging.raxx.app |
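The "identify the known-good release" check can be partly scripted. The following is a minimal sketch, not a vetted tool. It assumes the release list has been reduced to one row per release, newest first, with the version (v<N>) in the first column and the description starting in the second. The function name last_good_deploy and the sample rows are hypothetical, and the actual layout of heroku releases output may differ between CLI versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read release rows (newest first) on stdin, where each
# row starts with "v<N>" followed by the description, and print the newest
# "Deploy" release that is older than the current (first) row.
# The row layout is an assumption; adjust the parsing to your CLI's output.
last_good_deploy() {
  awk '
    $1 ~ /^v[0-9]+$/ {
      rows++
      if (rows == 1) next                  # skip the current (bad) release
      if ($2 == "Deploy") { print $1; found = 1; exit }
      if ($2 == "Set" || $2 == "Remove") { # config-var-only release in between
        print "note: " $1 " changed config vars and would be reverted too" > "/dev/stderr"
      }
    }
    END { if (!found) exit 1 }
  '
}
```

Under those assumptions, piping the rows "v85 Deploy abc1234", "v84 Set STRIPE_API_KEY config vars", "v83 Deploy def5678" into last_good_deploy prints v83 and writes a note about v84 to stderr. Treat the result as a candidate to confirm by eye, not as the final answer.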

Rollback procedure — Heroku release rollback

This is the canonical path for slug-based (git-push) apps: raxx-api-* and raxx-console-*.

Step 1 — Identify the target release

heroku releases -a raxx-api-prod

Note the version number of the known-good release (e.g., v84). The broken release is the current one at the top of the list (e.g., v85).

Step 2 — Execute rollback

heroku rollback v84 -a raxx-api-prod

Heroku creates a new release (e.g., v86: Rollback to v84) and immediately routes traffic to that slug. The command completes in seconds; dyno restart takes 10–30 seconds.
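Because a typo in the app name or version is the easiest way to make an incident worse, the rollback command can be wrapped in a small guard. This is a sketch under stated assumptions: the function name guarded_rollback and the DRY_RUN variable are hypothetical, and the name check assumes only prod and staging environments exist.

```shell
#!/usr/bin/env bash
# Hypothetical guard: validate the app name and target version, then print
# the rollback command. With DRY_RUN=0 it runs the command instead.
guarded_rollback() {
  local app="$1" version="$2"
  # App names follow raxx-<service>-<env>; prod and staging are assumed here.
  if [[ ! "$app" =~ ^raxx-[a-z]+-(prod|staging)$ ]]; then
    echo "refusing: unexpected app name '$app'" >&2
    return 2
  fi
  if [[ ! "$version" =~ ^v[0-9]+$ ]]; then
    echo "refusing: version must look like v84, got '$version'" >&2
    return 2
  fi
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "heroku rollback $version -a $app"   # default: print only
  else
    heroku rollback "$version" -a "$app"
  fi
}
```

Run guarded_rollback raxx-api-prod v84 first to see the exact command, then repeat with DRY_RUN=0 once it reads correctly. Defaulting to dry-run costs a few seconds and keeps the destructive step explicit.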

Step 3 — Verify recovery

# Confirm the new release appears at the top of the release list
heroku releases -a raxx-api-prod --num 3
# Expected: top row reads "Rollback to v84"

# Confirm dynos are up
heroku ps -a raxx-api-prod
# Expected: web.1: up

# Smoke-check the health endpoint
# Note: direct Heroku URLs return 403 when FLAG_ENFORCE_CF_ORIGIN is on.
# Use the CF-fronted URL instead:
curl -sf -o /dev/null -w "%{http_code}" https://api.raxx.app/api/system/status
# Expected: 200

For console:

heroku releases -a raxx-console-prod --num 3
heroku ps -a raxx-console-prod
curl -sf -o /dev/null -w "%{http_code}" https://console.raxx.app/health
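Since dyno restart takes 10 to 30 seconds, a single curl right after the rollback can fail even when recovery is on track. The sketch below polls until the endpoint returns 200. The function names http_code and wait_for_200, and the default of 10 attempts 3 seconds apart, are assumptions chosen to cover that restart window.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it returns HTTP 200 or attempts run out.
# http_code is split out so it can be swapped for a stub when testing.
http_code() {
  curl -s -o /dev/null -w "%{http_code}" "$1"
}

wait_for_200() {
  local url="$1" attempts="${2:-10}" delay="${3:-3}" code i
  for ((i = 1; i <= attempts; i++)); do
    code="$(http_code "$url")"
    if [[ "$code" == "200" ]]; then
      echo "ok after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "still failing after $attempts attempts (last code: $code)" >&2
  return 1
}
```

For example, wait_for_200 https://api.raxx.app/api/system/status for the API, or wait_for_200 https://console.raxx.app/health for the console. A non-zero exit after the full window is a signal to re-check heroku ps and the logs rather than to keep waiting.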

Rollback procedure — tagged-image redeploy (container apps)

raxx-velvet-* and any future service deployed via heroku container:* commands use the container stack, not the git slug stack. The heroku rollback command still works for these apps (it flips the release pointer). Use this path only if the prior release's image has been garbage-collected or you need to re-pin to a specific image tag.

Step 1 — Identify the known-good image

heroku releases -a raxx-velvet-prod --num 10

Find the last Deploy entry with a known-good commit SHA. Cross-reference it against the GitHub Container Registry (GHCR) or your CI artifact log to find the corresponding image reference: either a digest (sha256:<digest>) or a tag such as main-<sha>. A digest is pulled as velvet@sha256:<digest> rather than velvet:<tag>.

Step 2 — Pull and re-release the prior image

# Pull the known-good image to your local Docker daemon
docker pull ghcr.io/raxx-app/trademasterapi/velvet:<prior-tag>

# Re-tag as latest for the push
docker tag ghcr.io/raxx-app/trademasterapi/velvet:<prior-tag> \
  registry.heroku.com/raxx-velvet-prod/web

# Push to Heroku registry
docker push registry.heroku.com/raxx-velvet-prod/web

# Release the image
heroku container:release web -a raxx-velvet-prod
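To avoid retyping the image path under pressure, the four commands above can be generated from the app name and tag and reviewed before running. This is only a sketch: the function name rerelease_plan and the SOURCE_REPO variable are hypothetical, and the script prints commands rather than executing them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the four re-release commands for review.
# SOURCE_REPO mirrors the image path used above; pass the tag you verified.
SOURCE_REPO="ghcr.io/raxx-app/trademasterapi/velvet"

rerelease_plan() {
  local app="$1" tag="$2" proc="${3:-web}"
  if [[ -z "$app" || -z "$tag" ]]; then
    echo "usage: rerelease_plan <app> <tag> [process-type]" >&2
    return 2
  fi
  # An untagged target resolves to :latest in the Heroku registry.
  local target="registry.heroku.com/${app}/${proc}"
  printf '%s\n' \
    "docker pull ${SOURCE_REPO}:${tag}" \
    "docker tag ${SOURCE_REPO}:${tag} ${target}" \
    "docker push ${target}" \
    "heroku container:release ${proc} -a ${app}"
}
```

Calling rerelease_plan raxx-velvet-prod <prior-tag> prints the pull, tag, push, and release lines in order, so they can be checked against the release list and then run one at a time.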

Step 3 — Verify

heroku releases -a raxx-velvet-prod --num 3
heroku ps -a raxx-velvet-prod

Note: If heroku rollback v<N> succeeds for a container app (the image is still available in the Heroku slug cache), prefer that path — it is faster and does not require local Docker access.


DB migration caveat

Forward-only migrations make rollback partial. If the bad release ran a migration that added a column, table, or index:

Migration reviews must reject DROP COLUMN, DROP TABLE, and destructive ALTER statements on the rollback path — these are non-reversible and break rollback entirely. See docs/ops/runbooks/migration-gate.md for the gate checklist.

For v1.0, DB migrations are forward-only by policy. If a migration must be reversed, file it as a separate forward migration (re-add the removed column as nullable, etc.) rather than attempting a true rollback.
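Part of the review rule above can be approximated with a text scan. The sketch below assumes migrations are plain SQL files; the function name find_destructive_sql is hypothetical. It matches only DROP COLUMN and DROP TABLE, so other destructive ALTER statements and ORM-generated migrations still need a human review.

```shell
#!/usr/bin/env bash
# Hypothetical check: flag DROP COLUMN / DROP TABLE in SQL migration files.
# Prints matches as file:line:text and returns 1 if any are found.
find_destructive_sql() {
  if [[ $# -eq 0 ]]; then
    echo "usage: find_destructive_sql <file.sql>..." >&2
    return 2
  fi
  local hits
  # -H: always show the file name, -n: line numbers, -E: extended regex, -i: ignore case
  hits="$(grep -HnEi 'DROP[[:space:]]+(COLUMN|TABLE)' "$@")"
  if [[ -n "$hits" ]]; then
    echo "$hits"
    return 1
  fi
  return 0
}
```

A clean exit does not prove a migration is safe to roll back past; it only means neither pattern appeared in the listed files.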


Comms template

User-facing incident note (brief, plain language)

We are investigating an issue affecting [surface, e.g., the trading platform]. Our team is on it and we will post an update within 15 minutes. No account data has been affected.

We have rolled back to the previous release. The platform is recovering. We will confirm full recovery shortly.

The platform has recovered. Thank you for your patience. We are conducting a post-incident review.

Internal Slack (operator DM — D0AJ7K184TV)

Incident open: [app] [brief symptom]. Investigating. Started: [HH:MM UTC]

Rolling back [app] from v[N] to v[N-1]. Initiated at [HH:MM UTC].

Rollback confirmed: v[N+1] (Rollback to v[N-1]) is live, dynos up, smoke check passing. [HH:MM UTC]. Wall-clock: [X] min.

Incident closed. Post-incident review: [link or TBD].

Post to the daily digest (not a separate per-event ping) for pre-launch incidents unless the incident runs into the next day or affects a live customer flow.


Drill record

| Date (UTC) | App | Operator | Bad release | Rolled to | Wall-clock (min) | Outcome |
| --- | --- | --- | --- | --- | --- | --- |
| 2026-05-14 21:24:25 UTC | raxx-api-staging | raxx-dev-bot (agent, #98) | v431 | v430 | <1 | Success: v432 appeared, dyno up, rolled forward to v433 |

Add a row each time this runbook is executed in staging or production.
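The wall-clock value for the drill record and the Slack confirmation is easier to get right if the timestamps are captured rather than estimated. This is a minimal sketch: the function name wall_clock_min is hypothetical, and it assumes epoch seconds taken with date -u +%s at the rollback decision and again after the smoke check passes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: whole minutes between two epoch timestamps.
# Prints "<1" for sub-minute runs, matching the drill record convention.
wall_clock_min() {
  local start="$1" end="$2" secs
  secs=$(( end - start ))
  if (( secs < 0 )); then
    echo "end is before start" >&2
    return 2
  fi
  if (( secs < 60 )); then
    echo "<1"
  else
    echo $(( secs / 60 ))   # integer division, rounds down
  fi
}
```

Typical use: run start=$(date -u +%s) when the decision is made, end=$(date -u +%s) after verification, then wall_clock_min "$start" "$end". A result of 5 or more means the run missed the under-5-minute target stated at the top of this runbook.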