Raxx · internal docs

internal · gated

Queue prod deploy — pre-deploy checklist

Date: 2026-05-18
Operator authorization: "Let's do both. But first finish up internal docs and then push!"
Internal docs gate: Cleared (PRs #2437, #2439, #2442 merged)
Author: sre-agent
Status: DEPLOY BLOCKED ON OPERATOR (two items require operator action before proceeding)


Staging state (source of truth for what goes to prod)

| Field | Value |
| --- | --- |
| App | raxx-queue-staging |
| Current release | v14 |
| Release description | Set SENTRY_DSN_QUEUE config vars |
| Dyno count | 1 web |
| Stack | container |
| Postgres addon | heroku-postgresql:standard-0 (postgresql-polished-29366) |
| Health endpoint | {"service":"raxx-queue","status":"ok","timestamp":"2026-05-18T22:55:23Z","version":"0.1.0"} |
| Main HEAD commit | 0c4b7128, fix(queue): guard add_subdirectory(tests) for Docker production builds |

Staging is healthy: the first probe of /health returned HTTP 200 with status=ok and service=raxx-queue.
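The probe above can be repeated with a small helper. This is a minimal sketch, not the check the team actually runs: the function names health_body_ok and probe_health are hypothetical, and the staging URL in the usage comment is a placeholder because this document does not record it. Plain grep is used so the check does not assume jq is installed.

```shell
# Succeeds if a /health response body reports status "ok" for the expected service.
# Assumes the compact JSON shape shown in the staging table (no spaces after colons).
health_body_ok() {
  local body="$1" service="$2"
  printf '%s' "$body" | grep -q '"status":"ok"' &&
    printf '%s' "$body" | grep -q "\"service\":\"${service}\""
}

# Fetch <base_url>/health and validate the body; curl -f turns non-2xx into a failure.
probe_health() {
  local base_url="$1" service="$2" body
  body="$(curl -fsS --max-time 10 "${base_url%/}/health")" || return 1
  health_body_ok "$body" "$service"
}

# Usage (placeholder URL):
#   probe_health "https://<staging-app-url>" "raxx-queue" && echo "healthy"
```

The same helper works against prod once a container has been released there.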


Prod app state

| Field | Value |
| --- | --- |
| App | raxx-queue-prod |
| URL | https://raxx-queue-prod-327fa047b4b6.herokuapp.com/ |
| Current release | v5 (Set SENTRY_DSN_QUEUE; no container ever deployed) |
| Dynos | None (no dyno formation set) |
| Stack | container |
| Postgres addon | heroku-postgresql:standard-0 (postgresql-flat-88052) |
| Health endpoint | Not reachable (no dynos, no container released) |

The prod app exists and is on the same Postgres plan as staging (both standard-0). No container has ever been released to prod: v5 is a config-set release, not a dyno release.


Postgres plan delta

Staging and prod both run heroku-postgresql:standard-0, so there is no plan delta.


Config var diff: staging vs prod

Variables present in STAGING but missing from PROD — these must be seeded before the container can function correctly:

| Variable | Staging value type | Prod |
| --- | --- | --- |
| FLAG_QUEUE_BILLING | boolean string (true) | MISSING |
| QUEUE_SERVICE_TOKEN_CONSOLE | 64-char hex token | MISSING |
| QUEUE_SERVICE_TOKEN_CRON | 64-char hex token | MISSING |
| QUEUE_SERVICE_TOKEN_RAPTOR | 64-char hex token | MISSING |
| RAPTOR_BASE_URL | Heroku app URL | MISSING |
| STRIPE_API_KEY | Stripe test key, sk_test_* (prod must be sk_live_*) | MISSING |
| STRIPE_PUBLISHABLE_KEY | Stripe test key, pk_test_* (prod must be pk_live_*) | MISSING |
| STRIPE_RESTRICTED_KEY | Stripe test key, rk_test_* (prod must be rk_live_*) | MISSING |
| STRIPE_WEBHOOK_SECRET | Stripe webhook secret | MISSING |

Variables already present in both staging and prod:

| Variable | Note |
| --- | --- |
| DATABASE_URL | Injected by the Heroku Postgres addon; correct |
| SENTRY_DSN_QUEUE | Same DSN as staging (single Sentry project for all envs; verify this is intended) |
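A diff like the one above can be regenerated without printing any secret values. The following is a sketch under assumptions: the helper names config_names and missing_in_second are hypothetical, and the cut-based parsing assumes single-line config values. It compares variable names only.

```shell
# Print the config var names (never the values) of one Heroku app, sorted for comm(1).
# `heroku config --shell` emits KEY=value lines; cut keeps only the KEY part.
config_names() {
  heroku config --shell --app "$1" | cut -d= -f1 | LC_ALL=C sort
}

# Print names that appear in the first sorted file but not in the second.
missing_in_second() {
  LC_ALL=C comm -23 "$1" "$2"
}

# Usage:
#   config_names raxx-queue-staging > /tmp/staging.names
#   config_names raxx-queue-prod    > /tmp/prod.names
#   missing_in_second /tmp/staging.names /tmp/prod.names
```

An empty result from the last command would indicate that every staging variable name also exists in prod. It says nothing about whether the values are right.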

CRITICAL: Staging is using Stripe TEST keys (sk_test_*, pk_test_*, rk_test_*). Prod MUST receive Stripe LIVE keys. These are NOT the same values. Operator must source from Stripe dashboard → API keys → Live mode.
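A prefix check before seeding can catch a test key pasted into prod by mistake. This is an illustrative guard only: is_live_key and require_live_stripe_keys are hypothetical names, and the check looks at the prefix alone, so it cannot confirm that a key is valid or belongs to the right Stripe account.

```shell
# Succeeds only if the value starts with the given prefix and has at least one more character.
is_live_key() {
  local value="$1" prefix="$2"
  case "$value" in
    "${prefix}"?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check the secret, publishable, and restricted keys in that order.
# Messages name the variable but never echo the key itself.
require_live_stripe_keys() {
  is_live_key "$1" "sk_live_" || { echo "STRIPE_API_KEY is not a live key" >&2; return 1; }
  is_live_key "$2" "pk_live_" || { echo "STRIPE_PUBLISHABLE_KEY is not a live key" >&2; return 1; }
  is_live_key "$3" "rk_live_" || { echo "STRIPE_RESTRICTED_KEY is not a live key" >&2; return 1; }
}

# Usage:
#   require_live_stripe_keys "$SK" "$PK" "$RK" || exit 1
```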


ADR-0020 / GitHub Environment gate status

Expected: production environment has a human-approval gate before any prod deploy proceeds.

Actual (as of 2026-05-18): the production environment exists, but its protection rules are empty ("protection_rules": []). The operator could not reach the "Required reviewers" section in the GitHub settings UI.

Resolution (2026-05-18): Workflow-level confirm gate implemented in .github/workflows/deploy-queue.yml (PR #2444). The workflow_dispatch trigger now takes two inputs, target and confirm, and deploy-prod executes only when target=prod and confirm=deploy-prod-now both match.

If the confirm string does not match, the confirm-gate-rejected job runs and exits 1 — making the rejection visible in the run log rather than silently skipping. The deploy-prod job is skipped (not failed) on mismatch, which keeps false-alarm CI noise down while still producing an auditable rejection record.

The GH Environment "Required reviewers" UI path remains open as a complementary hardening step if the operator later gains access to the settings UI — both safeguards are additive.
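The confirm-gate behaviour described in this section can be modelled as a small decision function. This mirrors the prose only and is not the content of deploy-queue.yml: gate_decision is a hypothetical name, and the not-a-prod-dispatch label is a placeholder, since this document does not describe what runs for non-prod targets.

```shell
# Echo the job expected to run after build-container for a given pair of dispatch inputs.
gate_decision() {
  local target="$1" confirm="$2"
  if [ "$target" != "prod" ]; then
    echo "not-a-prod-dispatch"      # placeholder: non-prod paths are not covered here
  elif [ "$confirm" = "deploy-prod-now" ]; then
    echo "deploy-prod"              # both inputs match exactly
  else
    echo "confirm-gate-rejected"    # this job exits 1; deploy-prod is skipped
  fi
}

# Usage:
#   gate_decision prod deploy-prod-now    # deploy-prod
#   gate_decision prod Deploy-Prod-Now    # confirm-gate-rejected (comparison is case-sensitive)
```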


BLOCKER 1 — Prod config vars missing (9 of 11 env vars not set)

Without QUEUE_SERVICE_TOKEN_RAPTOR, QUEUE_SERVICE_TOKEN_CONSOLE, QUEUE_SERVICE_TOKEN_CRON, RAPTOR_BASE_URL, FLAG_QUEUE_BILLING, and all Stripe keys, the Queue service will boot but all authenticated endpoints and billing paths will fail immediately.

Resolution required: Operator must set prod env vars. See OPERATOR ACTION section below.

BLOCKER 2 — ADR-0020 reviewer gate (RESOLVED via workflow confirm gate)

~~Operator must add MooseQuest as a required reviewer on the production GitHub Environment.~~

RESOLVED: Workflow-level two-input confirm gate implemented (PR #2444). deploy-queue.yml now requires target=prod AND confirm=deploy-prod-now before the deploy-prod job runs. This satisfies the ADR-0020 intent (intentional human friction before prod deploy) without depending on the GH Environment reviewer UI.


OPERATOR ACTION REQUIRED

Step 1 — Seed prod config vars

The following values are needed. Stripe keys for prod must be LIVE keys (not test keys). Service tokens can be copied from staging (they are Queue-internal auth tokens, not environment-specific). RAPTOR_BASE_URL should point to the prod Raptor app.

heroku config:set \
  FLAG_QUEUE_BILLING=true \
  RAPTOR_BASE_URL=https://raxx-api-prod-XXXXX.herokuapp.com \
  --app raxx-queue-prod >/dev/null 2>&1

heroku config:set \
  QUEUE_SERVICE_TOKEN_RAPTOR=<from staging or mint new> \
  QUEUE_SERVICE_TOKEN_CONSOLE=<from staging or mint new> \
  QUEUE_SERVICE_TOKEN_CRON=<from staging or mint new> \
  --app raxx-queue-prod >/dev/null 2>&1

heroku config:set \
  STRIPE_API_KEY=sk_live_XXXX \
  STRIPE_PUBLISHABLE_KEY=pk_live_XXXX \
  STRIPE_RESTRICTED_KEY=rk_live_XXXX \
  STRIPE_WEBHOOK_SECRET=whsec_XXXX \
  --app raxx-queue-prod >/dev/null 2>&1

Note: Confirm whether SENTRY_DSN_QUEUE should use the same DSN as staging (currently it does). If Sentry projects are env-separated, update accordingly.
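If new service tokens are minted rather than copied from staging, something like the following would produce values in the 64-char hex shape listed in the config var table. This is a sketch: mint_token and is_token_shape are hypothetical helpers, and it is an assumption that the Queue service accepts any random 64-char hex string as a token.

```shell
# Generate 32 random bytes and print them as 64 lowercase hex characters.
mint_token() {
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# Succeeds if the argument is exactly 64 lowercase hex characters.
is_token_shape() {
  printf '%s' "$1" | grep -Eq '^[0-9a-f]{64}$'
}

# Usage (output suppressed so the token is not echoed to the terminal):
#   token="$(mint_token)"
#   is_token_shape "$token" || exit 1
#   heroku config:set QUEUE_SERVICE_TOKEN_CRON="$token" --app raxx-queue-prod >/dev/null 2>&1
```

A token minted this way must also be given to the matching caller (Raptor, Console, or Cron), otherwise that caller's requests to prod Queue will be rejected.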

Step 2 — Set dyno formation

After first container release, prod will have no dyno until formation is set:

heroku ps:scale web=1 --app raxx-queue-prod >/dev/null 2>&1

(The workflow itself issues heroku container:release which should activate the web dyno, but verifying formation after is good hygiene.)

Step 3 — Dispatch the workflow

Once Steps 1-2 are complete, trigger:

Workflow dispatch URL: https://github.com/raxx-app/TradeMasterAPI/actions/workflows/deploy-queue.yml

  1. Click the "Run workflow" dropdown.
  2. Branch: main.
  3. confirm input: type exactly deploy-prod-now (case-sensitive, no quotes).
  4. target input: select prod from the dropdown.
  5. ref input: leave as main (default).
  6. Click the green "Run workflow" button to submit.

The deploy-prod job runs only if both target=prod AND confirm=deploy-prod-now match exactly. If the confirm string is wrong, confirm-gate-rejected fails loudly in the run log and the deploy is blocked.

Post-deploy verification criteria:

  - heroku releases --app raxx-queue-prod shows a new release with "Deployed web (xxxxxxx)"
  - curl https://raxx-queue-prod-327fa047b4b6.herokuapp.com/health returns {"status":"ok","service":"raxx-queue",...}
  - heroku ps --app raxx-queue-prod shows web.1: up
  - heroku logs --app raxx-queue-prod -n 50 shows no ERROR-level lines at startup
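The dyno and log criteria above can be checked mechanically by piping CLI output into two small filters. This is a sketch with hypothetical helper names (web_dyno_up, no_error_lines). It assumes that ERROR-level lines contain the literal string ERROR, which this document does not confirm for the Queue service's log format.

```shell
# Succeeds if `heroku ps` output on stdin contains a web.1 line in the "up" state.
web_dyno_up() {
  grep -Eq '^web\.1: up( |$)'
}

# Succeeds if the log text on stdin contains no line with the string ERROR.
no_error_lines() {
  ! grep -q 'ERROR'
}

# Usage:
#   heroku ps --app raxx-queue-prod | web_dyno_up || echo "web.1 is not up"
#   heroku logs -n 50 --app raxx-queue-prod | no_error_lines || echo "ERROR lines found"
```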


Deploy pipeline reference

Workflow: .github/workflows/deploy-queue.yml

Trigger: workflow_dispatch (manual only for prod)

Job chain on workflow_dispatch (target=prod, confirm=deploy-prod-now): build-test (ASan+UBSan) → build-container → deploy-prod

If target=prod but confirm does not match: build-test → build-container → confirm-gate-rejected (exits 1)

The deploy-prod job pulls the SHA-tagged image built for staging and retags it for prod — no rebuild. The exact binary that passed tests and was deployed to staging reaches prod.
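The pull-and-retag promotion can be pictured as the dry-run below, which only prints commands. This is not taken from deploy-queue.yml. It assumes the staging image is stored in the Heroku container registry under a tag equal to the commit SHA, which this document does not state, and promote_cmds is a hypothetical name.

```shell
# Print (do not run) the commands that would promote one SHA-tagged image to prod.
promote_cmds() {
  local sha="$1"
  local src="registry.heroku.com/raxx-queue-staging/web:${sha}"  # assumed tag scheme
  local dst="registry.heroku.com/raxx-queue-prod/web"
  printf 'docker pull %s\n' "$src"
  printf 'docker tag %s %s\n' "$src" "$dst"
  printf 'docker push %s\n' "$dst"
  printf 'heroku container:release web --app raxx-queue-prod\n'
}

# Usage:
#   promote_cmds 0c4b7128
```

Because no build step appears in the sequence, the image layers that reach prod are the ones already exercised on staging.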

Estimated duration: ~25-40 min on cache miss (vcpkg compile), ~5-8 min on cache hit.

vcpkg cache key: vcpkg-bookworm-<hash(queue/vcpkg.json)> — if the last staging run was recent, the cache should be warm.
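The key depends only on the contents of queue/vcpkg.json, so any edit to that manifest forces a cold build. The sketch below illustrates the idea with sha256sum. It is an approximation: the workflow's actual hash function is not shown in this document, so the digest printed here is not expected to equal the real cache key, and cache_key is a hypothetical name.

```shell
# Build an illustrative cache key from the SHA-256 digest of a manifest file.
cache_key() {
  local manifest="$1" digest
  digest="$(sha256sum "$manifest" | cut -d' ' -f1)"
  printf 'vcpkg-bookworm-%s\n' "$digest"
}

# Usage:
#   cache_key queue/vcpkg.json
```

Identical manifests give identical keys (cache hit). A changed dependency list gives a new key (cache miss, so expect the longer build time).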


References