Raxx · internal docs

RCA — raxx-api-prod: 0 dynos, no slug, never deployed

Incident ID: 2026-04-30-raxx-api-prod-down
Date: 2026-04-30
Severity: SEV-1
Duration: 8 days latent (app created 2026-04-22, first detected 2026-04-30); active investigation 2026-04-30 ~16:30–17:30 UTC
Blast radius: raxx-api-prod (Raptor production backend) serving HTTP 502 on all requests. Pre-launch, so 0 external users affected. Internal systems that depend on the production API URL (console.raxx.app prod surface, Sentry DSN, status worker) could not reach the health endpoint.
Author: sre-agent

Summary

raxx-api-prod was provisioned on 2026-04-22 with config vars, but no code was ever deployed to it. All 7 Heroku releases were config-var-only (slug null on every release).

Two CI/CD workflows are meant to cover the production deploy path, but structural gaps in both prevented any production deploy in the 8 days since the app was created. The deploy.yml staging path has been failing on every push to main because the dontautocreate: true flag is missing; its production path has never been triggered because no v* tag has been pushed. The deploy-heroku.yml production path runs only on workflow_dispatch, which has never been triggered, and the production GitHub Environment it requires does not exist.

Remediation requires: (1) the operator creates the production GitHub Environment, (2) the operator triggers workflow_dispatch on deploy-heroku.yml with environment=production, and (3) a PR adds the missing dontautocreate: true to deploy.yml.
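The "slug null on every release" finding can be checked mechanically. The following is a minimal sketch, assuming the release list has already been fetched from the Heroku Platform API (`GET /apps/{app}/releases`), where each release carries a `version` and a `slug` field that is null for config-only releases. The function names and the sample records are hypothetical and for illustration only; they are not the incident's actual release data.

```python
from typing import Any


def code_releases(releases: list[dict[str, Any]]) -> list[int]:
    """Return the version numbers of releases that carry a slug (deployed code)."""
    return [r["version"] for r in releases if r.get("slug")]


def never_deployed(releases: list[dict[str, Any]]) -> bool:
    """True when no release in the list has a slug, i.e. only config changes exist."""
    return len(code_releases(releases)) == 0


# Illustrative records only (hypothetical, not copied from the real app).
config_only = [
    {"version": 1, "description": "Initial release", "slug": None},
    {"version": 2, "description": "Set config vars", "slug": None},
]
with_deploy = config_only + [
    {"version": 3, "description": "Deploy", "slug": {"id": "example-slug-id"}},
]

print(never_deployed(config_only))   # True
print(never_deployed(with_deploy))   # False
```

Keeping the check as a pure function over already-fetched data means the same logic could back both an ad hoc diagnosis and a scheduled monitor.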

Timeline (all times UTC)

Impact

What went well

What didn't go well

Root cause analysis

Detection

Resolution

Not yet complete — pending operator actions below.

The code and configuration to deploy are ready. The blockers are two GitHub administrative actions that require operator (Kristerpher) execution:

Operator action 1 — Create production GitHub Environment

In the GitHub repo settings under Environments, create a new environment named exactly production. Configure:

- Required reviewers: Kristerpher (minimum 1)
- Wait timer: 0 min (can increase after first successful deploy)
- Deployment branches: main only (restrict to protected branch)

URL: https://github.com/MooseQuest/TradeMasterAPI/settings/environments/new

Operator action 2 — Trigger first production deploy via deploy-heroku.yml

After the production environment exists:

1. Go to: https://github.com/MooseQuest/TradeMasterAPI/actions/workflows/deploy-heroku.yml
2. Click "Run workflow"
3. Set environment = production
4. Set ref = main
5. Click "Run workflow"
6. Approve the deployment when the required-reviewer gate appears
7. Monitor the run: the smoke gate runs first, then the deploy job, then the health check
8. Verify: curl -s https://raxx-api-prod-a60a19e5efbf.herokuapp.com/health returns HTTP 200
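The final verification step can also be scripted so it tolerates dyno boot time. This is a rough sketch using only the Python standard library; the retry count, the delay, and the function names are assumptions for illustration and do not describe the health check that deploy-heroku.yml itself runs.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "https://raxx-api-prod-a60a19e5efbf.herokuapp.com/health"


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET request, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # Non-2xx responses (such as the 502 seen in this incident) land here.
        return exc.code
    except OSError:
        # DNS failures, refused connections, and timeouts.
        return 0


def wait_for_healthy(
    url: str,
    attempts: int = 10,          # assumed value, tune to taste
    delay_s: float = 6.0,        # assumed value, tune to taste
    fetch: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns HTTP 200 or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if fetch(url) == 200:
            return True
        if attempt < attempts:
            sleep(delay_s)
    return False
```

Calling `wait_for_healthy(HEALTH_URL)` after the deploy job finishes would return True once the endpoint answers 200. The `fetch` and `sleep` parameters are injectable so the retry logic can be exercised without network access.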

Code fix — deploy.yml missing dontautocreate

PR filed (see action items) to add dontautocreate: true to the deploy-staging and deploy-prod jobs in deploy.yml. This is a low-risk change, one added line per job, that matches the pattern already in deploy-heroku.yml.

Validation

Action items

| # | Action | Owner | Due | Issue |
|---|--------|-------|-----|-------|
| 1 | Create production GitHub Environment with required-reviewer protection | Kristerpher | 2026-04-30 | #690 |
| 2 | Trigger workflow_dispatch on deploy-heroku.yml with environment=production, ref=main | Kristerpher | 2026-04-30 | #690 |
| 3 | Fix missing dontautocreate: true in deploy.yml staging + prod jobs | sre-agent (PR) | 2026-05-01 | #691 |
| 4 | Add nightly monitor: Heroku API check that raxx-api-prod latest release has non-null slug | sre-agent | 2026-05-07 | #692 |
| 5 | Add Heroku dyno-count monitor: alert SEV-2 if raxx-api-prod has 0 running dynos for >5 min | sre-agent | 2026-05-07 | #692 |
| 6 | Deprecate or align deploy.yml vs deploy-heroku.yml; two overlapping workflows for the same target is a maintenance hazard | operator | 2026-05-07 | #693 |
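The alert condition in action item 5 (0 running dynos for more than 5 minutes) could be evaluated along these lines. This is a sketch under stated assumptions: the dyno list comes from the Heroku Platform API (`GET /apps/{app}/dynos`), a dyno counts as running when its `state` is "up", and samples are collected periodically by some scheduler that is not shown. The `Sample` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Sample:
    at_s: float    # sample time in epoch seconds
    running: int   # number of dynos whose state was "up" at that time


def count_running(dynos: list[dict[str, Any]]) -> int:
    """Count dynos in the "up" state in a dyno-list API response."""
    return sum(1 for d in dynos if d.get("state") == "up")


def should_alert(samples: list[Sample], threshold_s: float = 300.0) -> bool:
    """True if the latest unbroken run of zero-dyno samples spans more than threshold_s.

    Samples are assumed to be sorted by time, oldest first.
    """
    if not samples or samples[-1].running > 0:
        return False
    start = samples[-1].at_s
    # Walk backwards to find where the current zero-dyno streak began.
    for s in reversed(samples):
        if s.running > 0:
            break
        start = s.at_s
    return samples[-1].at_s - start > threshold_s
```

Measuring the streak from the first observed zero sample, rather than alerting on a single reading, is one way to avoid paging on a routine dyno restart.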

References