Queue prod deploy — pre-deploy checklist
Date: 2026-05-18
Operator authorization: "Let's do both. But first finish up internal docs and then push!"
Internal docs gate: Cleared (PRs #2437, #2439, #2442 merged)
Author: sre-agent
Status: DEPLOY BLOCKED ON OPERATOR (two items require operator action before proceeding)
Staging state (source of truth for what goes to prod)
| Field | Value |
|---|---|
| App | raxx-queue-staging |
| Current release | v14 |
| Release description | Set SENTRY_DSN_QUEUE config vars |
| Dyno count | 1 web |
| Stack | container |
| Postgres addon | heroku-postgresql:standard-0 (postgresql-polished-29366) |
| Health endpoint | {"service":"raxx-queue","status":"ok","timestamp":"2026-05-18T22:55:23Z","version":"0.1.0"} |
| Main HEAD commit | 0c4b7128 — fix(queue): guard add_subdirectory(tests) for Docker production builds |
Staging is healthy. /health returns HTTP 200 with status=ok and service=raxx-queue on the first probe.
Prod app state
| Field | Value |
|---|---|
| App | raxx-queue-prod |
| URL | https://raxx-queue-prod-327fa047b4b6.herokuapp.com/ |
| Current release | v5 (Set SENTRY_DSN_QUEUE — no container ever deployed) |
| Dynos | none — no dyno formation set |
| Stack | container |
| Postgres addon | heroku-postgresql:standard-0 (postgresql-flat-88052) |
| Health endpoint | Not reachable (no dynos, no container released) |
Prod app exists, same plan as staging (both Standard-0). No container has ever been released to prod. v5 is a config-set release, not a dyno release.
Postgres plan delta
Staging and prod both run heroku-postgresql:standard-0, so there is no plan delta.
Config var diff: staging vs prod
Variables present in STAGING but missing from PROD — these must be seeded before the container can function correctly:
| Variable | Staging value type | Prod |
|---|---|---|
| FLAG_QUEUE_BILLING | boolean string (true) | MISSING |
| QUEUE_SERVICE_TOKEN_CONSOLE | 64-char hex token | MISSING |
| QUEUE_SERVICE_TOKEN_CRON | 64-char hex token | MISSING |
| QUEUE_SERVICE_TOKEN_RAPTOR | 64-char hex token | MISSING |
| RAPTOR_BASE_URL | Heroku app URL | MISSING |
| STRIPE_API_KEY | Stripe test key (prod should be sk_live_*) | MISSING |
| STRIPE_PUBLISHABLE_KEY | Stripe test key (prod should be pk_live_*) | MISSING |
| STRIPE_RESTRICTED_KEY | Stripe test key (prod should be rk_live_*) | MISSING |
| STRIPE_WEBHOOK_SECRET | Stripe webhook secret | MISSING |
Variables present in BOTH (present in prod already):
| Variable | Note |
|---|---|
| DATABASE_URL | Injected by Heroku Postgres addon — correct |
| SENTRY_DSN_QUEUE | Same DSN (single Sentry project for all envs — verify this is intended) |
CRITICAL: Staging is using Stripe TEST keys (sk_test_*, pk_test_*, rk_test_*). Prod MUST receive Stripe LIVE keys. These are NOT the same values. Operator must source from Stripe dashboard → API keys → Live mode.
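The name-level diff above can be reproduced without printing any secret values. The following is a minimal sketch, assuming each app's variable names have been written to a file, one per line. The helper name `missing_in_second` and the file names are hypothetical and used only for illustration.

```shell
#!/bin/sh
# missing_in_second FIRST SECOND
# Print the names listed in FIRST that do not appear in SECOND.
# Both arguments are files containing one variable name per line.
missing_in_second() {
  # -x: whole-line match, -F: fixed strings, -v: invert, -f: patterns from file
  grep -vxF -f "$2" "$1" || true
}

# Hypothetical usage (names only; values are cut off before they reach disk):
#   heroku config --shell --app raxx-queue-staging | cut -d= -f1 | sort > staging.names
#   heroku config --shell --app raxx-queue-prod    | cut -d= -f1 | sort > prod.names
#   missing_in_second staging.names prod.names
```

Cutting at the first `=` keeps only the variable name, so the comparison can be pasted into a ticket or run log. Multi-line config values would need extra handling that this sketch does not attempt.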
ADR-0020 / GitHub Environment gate status
Expected: production environment has a human-approval gate before any prod deploy proceeds.
Actual (as of 2026-05-18): the production environment exists, but its configuration shows "protection_rules": [] (no rules set). The "Required reviewers" section was not accessible to the operator in the GitHub settings UI.
Resolution (2026-05-18): Workflow-level confirm gate implemented in .github/workflows/deploy-queue.yml (PR #2444). The workflow_dispatch trigger now requires two inputs that must both match before deploy-prod executes:
- target must be set to prod (choice input)
- confirm must equal the literal string deploy-prod-now (case-sensitive, no regex)
If the confirm string does not match, the confirm-gate-rejected job runs and exits 1 — making the rejection visible in the run log rather than silently skipping. The deploy-prod job is skipped (not failed) on mismatch, which keeps false-alarm CI noise down while still producing an auditable rejection record.
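The gate's decision logic can be sketched as a small shell function. This is not the content of deploy-queue.yml, which is not reproduced in this checklist. It only illustrates the exact, case-sensitive comparison described above. The function name `gate_decision` and the `not-a-prod-dispatch` label are hypothetical.

```shell
#!/bin/sh
# gate_decision TARGET CONFIRM
# Print the job expected to run for a dispatch with these two inputs.
gate_decision() {
  target=$1
  confirm=$2
  if [ "$target" != "prod" ]; then
    # Behaviour for non-prod targets is outside the scope of this sketch.
    echo "not-a-prod-dispatch"
  elif [ "$confirm" = "deploy-prod-now" ]; then
    # Plain string equality: case-sensitive, no pattern matching.
    echo "deploy-prod"
  else
    echo "confirm-gate-rejected"
  fi
}
```

For example, `gate_decision prod Deploy-Prod-Now` prints `confirm-gate-rejected`, because the comparison does not fold case.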
The GH Environment "Required reviewers" UI path remains open as a complementary hardening step if the operator later gains access to the settings UI — both safeguards are additive.
BLOCKER 1 — Prod config vars missing (9 of 11 env vars not set)
Without QUEUE_SERVICE_TOKEN_RAPTOR, QUEUE_SERVICE_TOKEN_CONSOLE, QUEUE_SERVICE_TOKEN_CRON, RAPTOR_BASE_URL, FLAG_QUEUE_BILLING, and all Stripe keys, the Queue service will boot but all authenticated endpoints and billing paths will fail immediately.
Resolution required: Operator must set prod env vars. See OPERATOR ACTION section below.
BLOCKER 2 — ADR-0020 reviewer gate (RESOLVED via workflow confirm gate)
~~Operator must add MooseQuest as a required reviewer on the production GitHub Environment.~~
RESOLVED: Workflow-level two-input confirm gate implemented (PR #2444). deploy-queue.yml now requires target=prod AND confirm=deploy-prod-now before the deploy-prod job runs. This satisfies the ADR-0020 intent (intentional human friction before prod deploy) without depending on the GH Environment reviewer UI.
OPERATOR ACTION REQUIRED
Step 1 — Seed prod config vars
The following values are needed. Stripe keys for prod must be LIVE keys (not test keys). Service tokens can be copied from staging (they are Queue-internal auth tokens, not environment-specific). RAPTOR_BASE_URL should point to the prod Raptor app.
heroku config:set \
FLAG_QUEUE_BILLING=true \
RAPTOR_BASE_URL=https://raxx-api-prod-XXXXX.herokuapp.com \
--app raxx-queue-prod >/dev/null 2>&1
heroku config:set \
QUEUE_SERVICE_TOKEN_RAPTOR=<from staging or mint new> \
QUEUE_SERVICE_TOKEN_CONSOLE=<from staging or mint new> \
QUEUE_SERVICE_TOKEN_CRON=<from staging or mint new> \
--app raxx-queue-prod >/dev/null 2>&1
heroku config:set \
STRIPE_API_KEY=sk_live_XXXX \
STRIPE_PUBLISHABLE_KEY=pk_live_XXXX \
STRIPE_RESTRICTED_KEY=rk_live_XXXX \
STRIPE_WEBHOOK_SECRET=whsec_XXXX \
--app raxx-queue-prod >/dev/null 2>&1
Note: Confirm whether SENTRY_DSN_QUEUE should use the same DSN as staging (currently it does). If Sentry projects are env-separated, update accordingly.
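Because the `heroku config:set` commands above discard their output, a quick prefix check after seeding may help catch a test key that was pasted into prod by mistake. The following is a minimal sketch. The helper name `is_live_key` is hypothetical, and it checks only the three prefixes named in this checklist (the webhook secret carries no live/test prefix and is not covered).

```shell
#!/bin/sh
# is_live_key NAME VALUE
# Succeed only if VALUE carries the live-mode prefix expected for NAME.
is_live_key() {
  case "$1:$2" in
    STRIPE_API_KEY:sk_live_?*)         return 0 ;;
    STRIPE_PUBLISHABLE_KEY:pk_live_?*) return 0 ;;
    STRIPE_RESTRICTED_KEY:rk_live_?*)  return 0 ;;
    *)                                 return 1 ;;
  esac
}

# Hypothetical usage; the value is never echoed:
#   is_live_key STRIPE_API_KEY \
#     "$(heroku config:get STRIPE_API_KEY --app raxx-queue-prod)" \
#     || echo "STRIPE_API_KEY is not a live key"
```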
Step 2 — Set dyno formation
After first container release, prod will have no dyno until formation is set:
heroku ps:scale web=1 --app raxx-queue-prod >/dev/null 2>&1
(The workflow itself issues heroku container:release which should activate the web dyno, but verifying formation after is good hygiene.)
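Since the scale command above also discards its output, the formation can be read back separately. The sketch below assumes that `heroku ps:scale` with no arguments prints the current formation as space-separated entries such as `web=1:Standard-1X`. That output format is an assumption to confirm against the installed CLI, and the helper name `web_quantity` is hypothetical.

```shell
#!/bin/sh
# web_quantity FORMATION
# Print the web dyno count from a formation string such as
# "web=1:Standard-1X worker=0:Standard-1X"; print 0 if no web entry exists.
web_quantity() {
  for entry in $1; do
    case "$entry" in
      web=*)
        qty=${entry#web=}      # strip the leading "web="
        echo "${qty%%:*}"      # strip the ":<dyno size>" suffix, if any
        return 0
        ;;
    esac
  done
  echo 0
}

# Hypothetical usage:
#   web_quantity "$(heroku ps:scale --app raxx-queue-prod)"
```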
Step 3 — Dispatch the workflow
Once Steps 1-2 are complete, trigger:
Workflow dispatch URL:
https://github.com/raxx-app/TradeMasterAPI/actions/workflows/deploy-queue.yml
- Click "Run workflow".
- Branch: main.
- confirm input: type exactly deploy-prod-now (case-sensitive, no quotes).
- target input: select prod from the dropdown.
- ref input: leave as main (default).
- Click "Run workflow".
The deploy-prod job runs only if both target=prod AND confirm=deploy-prod-now match exactly. If the confirm string is wrong, confirm-gate-rejected fails loudly in the run log and the deploy is blocked.
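For operators who prefer the command line, the same dispatch could be issued with the GitHub CLI. This is a sketch under assumptions: it presumes `gh` is installed and authenticated, and that the workflow's input names are exactly `target`, `confirm`, and `ref` as listed above. The wrapper name `dispatch_prod` and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# dispatch_prod CONFIRM
# Dispatch deploy-queue.yml for prod with the given confirm string.
# With DRY_RUN=1 the command is printed instead of executed.
dispatch_prod() {
  set -- gh workflow run deploy-queue.yml \
    --repo raxx-app/TradeMasterAPI --ref main \
    -f target=prod -f "confirm=$1" -f ref=main
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical usage:
#   DRY_RUN=1 dispatch_prod deploy-prod-now   # inspect the command first
#   dispatch_prod deploy-prod-now             # then dispatch for real
```

The confirm string is passed through unchanged, so a typo still reaches the workflow and is rejected by the gate there rather than being silently corrected on the client side.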
Post-deploy verification criteria:
- heroku releases --app raxx-queue-prod shows a new release with "Deployed web (xxxxxxx)"
- curl https://raxx-queue-prod-327fa047b4b6.herokuapp.com/health returns {"status":"ok","service":"raxx-queue",...}
- heroku ps --app raxx-queue-prod shows web.1: up
- heroku logs --tail --app raxx-queue-prod -n 50 shows no ERROR-level lines at startup
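The health criterion above can be checked mechanically. The following is a minimal sketch that inspects only the status field of the response body. The helper name `health_ok` is hypothetical, and a JSON-aware tool such as jq would be more robust than the pattern match used here.

```shell
#!/bin/sh
# health_ok BODY
# Succeed if the JSON body reports "status":"ok" (whitespace around the colon tolerated).
health_ok() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Hypothetical usage; -f makes curl fail on HTTP errors, -sS keeps it quiet
# but still reports transport errors:
#   body=$(curl -fsS https://raxx-queue-prod-327fa047b4b6.herokuapp.com/health) \
#     && health_ok "$body" && echo "prod healthy"
```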
Deploy pipeline reference
Workflow: .github/workflows/deploy-queue.yml
Trigger: workflow_dispatch (manual only for prod)
Job chain on workflow_dispatch (target=prod, confirm=deploy-prod-now):
build-test (ASan+UBSan) → build-container → deploy-prod
If target=prod but confirm does not match:
build-test → build-container → confirm-gate-rejected (exits 1)
The deploy-prod job pulls the SHA-tagged image built for staging and retags it for prod — no rebuild. The exact binary that passed tests and was deployed to staging reaches prod.
Estimated duration: ~25-40 min on cache miss (vcpkg compile), ~5-8 min on cache hit.
vcpkg cache key: vcpkg-bookworm-<hash(queue/vcpkg.json)> — if the last staging run was recent, the cache should be warm.
References
- Runbook: docs/ops/runbooks/prod-deploy-approval.md
- ADR-0020: docs/architecture/adr/0028-prod-deploy-intentional-friction.md
- Workflow: .github/workflows/deploy-queue.yml
- Staging app: raxx-queue-staging
- Prod app: raxx-queue-prod