System: raxx-api-prod · raxx-api-staging · raxx-console-prod · raxx-console-staging · raxx-velvet-
Owner: operator
Last incident: n/a (initial authoring — #98)
Last reviewed: 2026-05-14 UTC
Related: docs/ops/runbooks/heroku.md · docs/ops/runbooks/deploy-freeze.md · docs/ops/runbooks/migration-gate.md
Use this runbook when a production release must be reverted immediately. Roll back when:
Roll forward instead of rolling back when:
Expected wall-clock from decision to verified recovery: under 5 minutes.
Confirm the release is the cause. Check Sentry or `heroku logs --tail -a <app>` for a traceback that references code or config from the current release, not a dependency.
Check dyno health before rolling back:
```bash
heroku ps -a raxx-api-prod
```
If dynos are in a crash loop, rollback is likely correct. If dynos are up but serving errors, confirm the request path before acting.
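The crash-loop check above can be sketched as a small filter over `heroku ps` output. This is an illustration only: it assumes dyno lines have the form `web.1: crashed 2026/05/14 ...`, and the function name `dyno_summary` is hypothetical. Treat the result as a hint, not a decision.

```bash
# Sketch: classify dyno health from `heroku ps` text on stdin.
# Assumption: each dyno line looks like "<type>.<n>: <state> <timestamp> ...".
dyno_summary() {
  awk '
    /^[a-z][a-z0-9_-]*\.[0-9]+: / {
      total++
      if ($2 == "crashed") crashed++
    }
    END {
      if (total == 0)            print "unknown"
      else if (crashed == total) print "all-crashed"
      else if (crashed > 0)      print "some-crashed"
      else                       print "none-crashed"
    }
  '
}
# Example: heroku ps -a raxx-api-prod | dyno_summary
```

An `all-crashed` result matches the "rollback is likely correct" case; `none-crashed` with errors still being served calls for confirming the request path first.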
```bash
heroku releases -a raxx-api-prod
```
Look for the last `Deploy <sha>` entry before the bad one. Config-var-only releases (e.g., `Set STRIPE_API_KEY config vars`) do not change the slug, but rolling back past them reverts the config vars too. Note this before proceeding.
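As a rough aid for picking the candidate release, the lookup can be sketched as a filter over the release list. It assumes the bad release is the topmost row and that rows look like `v85  Deploy 1a2b3c4d  user@example.com  ...`; the function name `last_good_deploy` is hypothetical. The version it prints is only a candidate: confirm it is actually known-good before rolling back.

```bash
# Sketch: read `heroku releases` text on stdin and print the version of the
# most recent "Deploy" release that precedes the current (topmost) release.
# Assumption: each row starts with "v<N>" followed by the description.
last_good_deploy() {
  awk '
    $1 ~ /^v[0-9]+$/ {
      rows++
      if (rows == 1) next                     # topmost row is the current release
      if ($2 == "Deploy") { print $1; found = 1; exit }
    }
    END { if (!found) exit 1 }                # non-zero status when nothing matches
  '
}
# Example: heroku releases -a raxx-api-prod | last_good_deploy
```

Any config-var-only rows between the printed version and the current release are the ones whose config changes the rollback would also revert.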
Check for a DB migration in the bad release. If `heroku releases -a <app>` shows a `Deploy` entry and that deploy included a migration, see DB migration caveat before executing rollback.
Verify you are targeting the right app. The app names follow raxx-<service>-<env>:
| App | URL |
|---|---|
| raxx-api-prod | https://raxx-api-prod-a60a19e5efbf.herokuapp.com |
| raxx-api-staging | https://raxx-api-staging-1a19fb3873b9.herokuapp.com |
| raxx-console-prod | https://console.raxx.app |
| raxx-console-staging | https://console-staging.raxx.app |
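One way to guard against a mistyped app name is to check it against the `raxx-<service>-<env>` pattern before running anything. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the service and environment lists are inferred from this runbook (`raxx-velvet-staging` in particular is assumed, not confirmed).

```bash
# Sketch: accept only names of the form raxx-<service>-<env>.
# Assumption: services are api|console|velvet and envs are prod|staging.
valid_app() {
  [[ "$1" =~ ^raxx-(api|console|velvet)-(prod|staging)$ ]]
}

# True when the target is a production app, so a caller can ask for confirmation.
is_prod() {
  [[ "$1" == *-prod ]]
}
# Example: valid_app "$APP" || { echo "unknown app: $APP" >&2; exit 1; }
```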
This is the canonical path for slug-based (git-push) apps: raxx-api-* and raxx-console-*.
```bash
heroku releases -a raxx-api-prod
```
Note the version number of the known-good release (e.g., v84). The broken release is the current, topmost version (e.g., v85).
```bash
heroku rollback v84 -a raxx-api-prod
```
Heroku creates a new release (e.g., v86: Rollback to v84) and immediately routes traffic to that slug. The command completes in seconds; dyno restart takes 10–30 seconds.
```bash
# Confirm the new release appears at the top of the release list
heroku releases -a raxx-api-prod --num 3
# Expected: top row reads "Rollback to v84"

# Confirm dynos are up
heroku ps -a raxx-api-prod
# Expected: web.1: up

# Smoke-check the health endpoint
# Note: direct Heroku URLs return 403 when FLAG_ENFORCE_CF_ORIGIN is on.
# Use the CF-fronted URL instead:
curl -sf -o /dev/null -w "%{http_code}" https://api.raxx.app/api/system/status
# Expected: 200
```
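Because dynos take 10–30 seconds to restart, a single smoke check run immediately after the rollback may fail spuriously. A minimal retry sketch follows; the helper names `wait_for_status` and `probe_api` are hypothetical, and the attempt count and delay in the example are assumptions chosen to cover roughly 30 seconds.

```bash
# Sketch: poll a probe command until it prints the expected HTTP status.
# Usage: wait_for_status <expected> <attempts> <delay_seconds> <probe command...>
wait_for_status() {
  local expected=$1 attempts=$2 delay=$3
  shift 3
  local i code
  for ((i = 1; i <= attempts; i++)); do
    code=$("$@" 2>/dev/null) || true   # curl -f exits non-zero on HTTP errors
    if [[ "$code" == "$expected" ]]; then
      echo "ok after $i attempt(s)"
      return 0
    fi
    if ((i < attempts)); then sleep "$delay"; fi
  done
  echo "still returning '${code:-no response}' after $attempts attempts" >&2
  return 1
}

# Probe taken from the smoke check above (CF-fronted URL).
probe_api() {
  curl -sf -o /dev/null -w "%{http_code}" https://api.raxx.app/api/system/status
}
# Example: wait_for_status 200 6 5 probe_api
```

Passing the probe as a command keeps the retry loop reusable for the console health URL as well.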
For console:
```bash
heroku releases -a raxx-console-prod --num 3
heroku ps -a raxx-console-prod
curl -sf -o /dev/null -w "%{http_code}" https://console.raxx.app/health
```
raxx-velvet-* and any future service deployed via the `heroku container:` commands use the container stack, not the git slug stack. The `heroku rollback` command still works for these apps (it flips the release pointer), but if the prior release's image has been garbage-collected or you need to re-pin to a specific image tag, use this path.
```bash
heroku releases -a raxx-velvet-prod --num 10
```
Find the last Deploy entry with a known-good commit SHA. Cross-reference against the GitHub Container Registry (GHCR) or your CI artifact log to find the corresponding image tag (e.g., sha256:<digest> or a semantic tag like main-<sha>).
```bash
# Pull the known-good image to your local Docker daemon
docker pull ghcr.io/raxx-app/trademasterapi/velvet:<prior-tag>

# Re-tag for the Heroku registry (no explicit tag, so Docker applies :latest)
docker tag ghcr.io/raxx-app/trademasterapi/velvet:<prior-tag> \
  registry.heroku.com/raxx-velvet-prod/web

# Push to Heroku registry
docker push registry.heroku.com/raxx-velvet-prod/web

# Release the image
heroku container:release web -a raxx-velvet-prod
```
```bash
heroku releases -a raxx-velvet-prod --num 3
heroku ps -a raxx-velvet-prod
```
Note: If heroku rollback v<N> succeeds for a container app (the image is still available in the Heroku slug cache), prefer that path — it is faster and does not require local Docker access.
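The re-pin steps above could be wrapped in one function with a dry-run mode, so the exact commands can be reviewed before anything is pushed. This is a sketch, not an existing script: `run`, `repin_image`, and the `DRY_RUN` variable are hypothetical, and the example tag `main-abc1234` is made up to match the `main-<sha>` form.

```bash
# Sketch: re-pin a container app to a prior image tag.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

repin_image() {
  local app=$1 tag=$2
  local src="ghcr.io/raxx-app/trademasterapi/velvet:${tag}"
  local dst="registry.heroku.com/${app}/web"
  # Each step runs only if the previous one succeeded.
  run docker pull "$src" &&
    run docker tag "$src" "$dst" &&
    run docker push "$dst" &&
    run heroku container:release web -a "$app"
}
# Example: DRY_RUN=1 repin_image raxx-velvet-prod main-abc1234
```

Chaining with `&&` means a failed pull or push stops the sequence before `heroku container:release` is reached.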
Forward-only migrations make rollback partial. If the bad release ran a migration that added a column, table, or index:
Migration reviews must reject DROP COLUMN, DROP TABLE, and destructive ALTER statements on the rollback path — these are non-reversible and break rollback entirely. See docs/ops/runbooks/migration-gate.md for the gate checklist.
For v1.0, DB migrations are forward-only by policy. If a migration must be reversed, file it as a separate forward migration (re-add the removed column as nullable, etc.) rather than attempting a true rollback.
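As a rough pre-review aid, destructive statements can be flagged with a text search. The sketch below assumes migrations can be reviewed as plain SQL text, which may not match how this project writes them; the function name is hypothetical, and the pattern covers only `DROP COLUMN` and `DROP TABLE`. It does not replace the checklist in docs/ops/runbooks/migration-gate.md.

```bash
# Sketch: flag SQL statements that would break rollback.
# Reads SQL on stdin, prints offending lines with line numbers,
# and returns 1 if any are found.
flag_destructive_sql() {
  if grep -n -i -E 'DROP[[:space:]]+(COLUMN|TABLE)'; then
    return 1
  fi
  return 0
}
# Example (hypothetical path): flag_destructive_sql < path/to/migration.sql
```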
We are investigating an issue affecting [surface, e.g., the trading platform]. Our team is on it and we will post an update within 15 minutes. No account data has been affected.
We have rolled back to the previous release. The platform is recovering. We will confirm full recovery shortly.
The platform has recovered. Thank you for your patience. We are conducting a post-incident review.
(D0AJ7K184TV)
Incident open: [app] [brief symptom]. Investigating. Started: [HH:MM UTC]
Rolling back [app] from v[N] to v[N-1]. Initiated at [HH:MM UTC].
Rollback confirmed: v[N+1] (Rollback to v[N-1]) is live, dynos up, smoke check passing. [HH:MM UTC]. Wall-clock: [X] min.
Incident closed. Post-incident review: [link or TBD].
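The `Wall-clock: [X] min` field in the messages above is the difference between two `HH:MM UTC` timestamps. A small sketch of that arithmetic, with a hypothetical function name and the assumption that the incident lasts under 24 hours:

```bash
# Sketch: minutes between two HH:MM UTC timestamps.
# The 10# prefix forces base 10 so "08" and "09" are not parsed as octal.
# Adding 1440 before the modulo handles an incident that crosses midnight UTC.
elapsed_min() {
  local start=$1 end=$2
  local s=$(( 10#${start%%:*} * 60 + 10#${start##*:} ))
  local e=$(( 10#${end%%:*} * 60 + 10#${end##*:} ))
  echo $(( (e - s + 1440) % 1440 ))
}
# Example: elapsed_min 21:20 21:24
```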
Post to the daily digest (not a separate per-event ping) for pre-launch incidents unless the incident runs into the next day or affects a live customer flow.
| Date (UTC) | App | Operator | Bad release | Rolled to | Wall-clock (min) | Outcome |
|---|---|---|---|---|---|---|
| 2026-05-14 21:24:25 UTC | raxx-api-staging | raxx-dev-bot (agent, #98) | v431 | v430 | <1 | Success — v432 appeared, dyno up, rolled forward to v433 |
Add a row each time this runbook is executed in staging or production.