Raxx · internal docs


Production deploy approval gate — runbook

System: GitHub Environments / production environment
Owner: operator / sre-agent
Last incident: 2026-04-30 (see docs/incidents/2026-04-30-raxx-api-prod-down.md)
Last reviewed: 2026-05-05
ADR: docs/architecture/adr/0028-prod-deploy-intentional-friction.md


How to tell it's broken

How to diagnose (in order)

  1. Check environment protection rules:

gh api repos/raxx-app/TradeMasterAPI/environments \
  | python3 -c "import sys,json; [print(e['name'], e['protection_rules']) for e in json.load(sys.stdin)['environments']]"

Expected: production [{'type': 'required_reviewers', ...}]
If production [] is returned, the reviewer gate is absent.

  2. Check which workflows reference the production environment:

grep -rn "environment:" .github/workflows/deploy-heroku.yml \
  .github/workflows/deploy-console.yml .github/workflows/deploy-antlers.yml

All three should resolve to environment: production for prod-targeting runs.

  3. Verify the plan supports required reviewers:
     - Required reviewers on GitHub Environments require a GitHub Team or Enterprise plan.
     - Personal account repositories on GitHub Free/Pro do not support the required-reviewer protection rule via API. The setting IS available via the UI at Settings > Environments > production when the repo is under a qualifying plan.
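The workflow check in the diagnosis steps above can also be scripted. The following is a minimal sketch, assuming a plain regex scan is good enough (it is not a YAML parser). The function name environment_refs and the sample workflow text are hypothetical and are not taken from the repository.

```python
import re

# Matches lines such as "    environment: production" (inline form only).
# A bare "environment:" key (the nested "name: ..." form, or a
# workflow_dispatch input declaration) is reported with an empty value
# so it can be inspected by hand.
ENV_LINE = re.compile(r"^\s*environment:\s*(.*?)\s*$")


def environment_refs(workflow_text):
    """Return (line_number, value) for each `environment:` key in the text."""
    refs = []
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        match = ENV_LINE.match(line)
        if match:
            refs.append((number, match.group(1).strip("'\"")))
    return refs


# Hypothetical workflow fragment, for illustration only.
SAMPLE = """\
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}
    steps:
      - run: echo deploying
"""

for number, value in environment_refs(SAMPLE):
    print(number, value)
```

A value that is an expression rather than a literal still has to be traced back to the dispatch input or tag trigger to confirm it resolves to production.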

Known failure modes

Failure mode A: production environment exists but has no protection rules

Symptom: gh api repos/raxx-app/TradeMasterAPI/environments shows "production" with "protection_rules": [].

Cause: The environment object was created (or reset) without the required-reviewer rule. This can happen if:
  - The environment was provisioned via API before the plan supported reviewer gates.
  - The environment was deleted and recreated without re-adding the rule.

Fix (operator — requires repo admin access):

  1. Go to: https://github.com/raxx-app/TradeMasterAPI/settings/environments
  2. Click production.
  3. Under "Deployment protection rules", click "Required reviewers".
  4. Add reviewer: MooseQuest (Kristerpher).
  5. Under "Deployment branches", select "Protected branches" (restricts to main).
  6. Click "Save protection rules".

Verification:

gh api repos/raxx-app/TradeMasterAPI/environments \
  | python3 -c "
import sys, json
envs = json.load(sys.stdin)['environments']
prod = next((e for e in envs if e['name'] == 'production'), None)
# A missing environment is Failure mode C, not A; report it instead of crashing.
if prod is None:
    sys.exit('production environment not found')
print('protection_rules:', prod['protection_rules'])
print('deployment_branch_policy:', prod['deployment_branch_policy'])
"

Expected output includes 'type': 'required_reviewers' in protection_rules and 'protected_branches': True in deployment_branch_policy.
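For repeated checks, the two expectations above can be turned into a pass/fail test. This is a sketch only: the function name production_gate_problems and both sample payloads are hypothetical, and the payload shape is assumed to match what the verification command prints.

```python
def production_gate_problems(payload):
    """Return a list of problems; an empty list means the gate looks intact."""
    envs = payload.get("environments", [])
    prod = next((e for e in envs if e.get("name") == "production"), None)
    if prod is None:
        return ["production environment is missing"]

    problems = []
    rule_types = {rule.get("type") for rule in prod.get("protection_rules") or []}
    if "required_reviewers" not in rule_types:
        problems.append("no required_reviewers protection rule")

    # deployment_branch_policy may be null when no branch restriction is set.
    policy = prod.get("deployment_branch_policy") or {}
    if not policy.get("protected_branches"):
        problems.append("deployment branches are not limited to protected branches")
    return problems


# Hypothetical payloads, trimmed to the fields the check reads.
GATED = {"environments": [{
    "name": "production",
    "protection_rules": [{"type": "required_reviewers"}],
    "deployment_branch_policy": {"protected_branches": True},
}]}
UNGATED = {"environments": [{
    "name": "production",
    "protection_rules": [],
    "deployment_branch_policy": None,
}]}

print(production_gate_problems(GATED))
print(production_gate_problems(UNGATED))
```

In practice the payload would come from json.load(sys.stdin) with the gh api output piped in, as in the verification command.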


Failure mode B: Plan does not support required reviewers (API 422)

Symptom: Attempting to set protection rules via gh api -X PUT .../environments/production returns:

{"message":"Failed to create the environment protection rule. Please ensure the billing plan supports the required reviewers protection rule.","status":"422"}

Cause: Required reviewers on GitHub Environments is a GitHub Team/Enterprise feature. GitHub Free and Pro personal accounts do not support this via API or UI.

Fix:

Option 1 (recommended): Upgrade to GitHub Team plan:
  - Monthly cost: $4/user/month as of 2026-05.
  - Go to: https://github.com/settings/billing/plans
  - After upgrade, return to Failure mode A fix above.

Option 2: Transfer repo to a GitHub organization on Team plan:
  - Higher operational overhead. Not recommended unless an org is already being set up.

Compensating control while plan upgrade is pending:

The deploy-freeze mechanism (docs/ops/runbooks/deploy-freeze.md) provides a manual brake on all production deploys via the console. Until the reviewer gate is active, set DEPLOY_FREEZE_OVERRIDE=0 (default) to ensure the console freeze check is in the critical path for every prod deploy.

Note: the freeze check is still bypassable by setting DEPLOY_FREEZE_OVERRIDE=1 (the break-glass override) — it is a softer control than the environment reviewer gate.


Failure mode C: production environment deleted or missing entirely

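Symptom: gh api repos/raxx-app/TradeMasterAPI/environments returns only staging. Prod-targeting workflow runs reference environment: production but no gating occurs. GitHub does not block a job that names a missing environment; it creates the environment on first use with no protection rules, so the job runs ungated and the state then looks like Failure mode A.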

Fix:

  1. Create the environment:

gh api -X PUT repos/raxx-app/TradeMasterAPI/environments/production

This creates the environment object without protection rules.

  2. Then apply protection rules via the UI (see Failure mode A fix above).


Failure mode D: Reviewer gate appears but approver is wrong user

Symptom: The "Waiting for approval" banner appears in the Actions UI but the named reviewer is a bot or service account, not Kristerpher.

Cause: Reviewers were configured with the wrong GitHub user ID, or a bot account was added instead of the operator account.

Fix:

  1. Go to https://github.com/raxx-app/TradeMasterAPI/settings/environments.
  2. Click production.
  3. Under "Required reviewers", remove any non-operator accounts.
  4. Add MooseQuest (Kristerpher, GitHub user ID 20930225).
  5. Save.
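After saving, the reviewer list can be checked against the expected user ID rather than by eye. The sketch below assumes each required_reviewers rule carries a reviewers list whose entries look like {"type": "User", "reviewer": {"id": ..., "login": ...}}; confirm that shape against the real gh api output before relying on it. The function name and the deploy-bot account are hypothetical.

```python
EXPECTED_REVIEWER_ID = 20930225  # MooseQuest, per this runbook


def check_reviewers(environment, expected_id=EXPECTED_REVIEWER_ID):
    """Return (expected_reviewer_present, other_reviewer_names)."""
    found_expected = False
    others = []
    for rule in environment.get("protection_rules") or []:
        if rule.get("type") != "required_reviewers":
            continue
        for entry in rule.get("reviewers") or []:
            reviewer = entry.get("reviewer") or {}
            if entry.get("type") == "User" and reviewer.get("id") == expected_id:
                found_expected = True
            else:
                # Teams expose a slug instead of a login.
                name = reviewer.get("login") or reviewer.get("slug") or "<unknown>"
                others.append(name)
    return found_expected, others


# Hypothetical misconfigured environment: only a bot account is a reviewer.
WRONG = {
    "name": "production",
    "protection_rules": [{
        "type": "required_reviewers",
        "reviewers": [{"type": "User", "reviewer": {"id": 1, "login": "deploy-bot"}}],
    }],
}

print(check_reviewers(WRONG))
```

A healthy result is (True, []): the operator account is present and nothing else is.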

How to approve a production deploy

When a production deploy is correctly gated, the workflow pauses at the deploy job with a yellow "Waiting for required environment approval" banner.

  1. Navigate to the Actions run URL (posted in the PR comment or Slack DM).
  2. Click "Review deployments".
  3. Select the production environment checkbox.
  4. Optionally add a comment (e.g. "approving raxx-api v1.2.3 prod").
  5. Click "Approve and deploy".

The deploy job then proceeds. The approver identity, timestamp, and run ID are recorded in GitHub's environment approval log at:

https://github.com/raxx-app/TradeMasterAPI/deployments/activity_log?environments_filter=production

Which workflows enforce this gate

The following workflows reference environment: production for production-targeted runs, meaning they will pause at the reviewer gate once protection rules are configured:

Workflow                  | File                                 | Prod trigger
Deploy to Heroku (Raptor) | .github/workflows/deploy-heroku.yml  | workflow_dispatch with environment=production
Deploy console            | .github/workflows/deploy-console.yml | workflow_dispatch with environment=production
Deploy Antlers            | .github/workflows/deploy-antlers.yml | v* or trademaster-api-v* tag push

Push-to-main events (staging only) do not trigger the production gate.
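The trigger rules above can be written down as a small predicate, which is useful when reasoning about whether a given run should have paused for approval. This is an illustrative sketch, not code from the workflows: the function name is hypothetical, and fnmatchcase only approximates GitHub's tag filter patterns (for example, its * also matches "/").

```python
from fnmatch import fnmatchcase

# Workflows that reach production only via a manual dispatch input.
DISPATCH_WORKFLOWS = {"deploy-heroku.yml", "deploy-console.yml"}
# Workflows that reach production via a tag push, with their tag patterns.
TAG_WORKFLOWS = {"deploy-antlers.yml": ("v*", "trademaster-api-v*")}


def hits_production_gate(workflow_file, event, ref="", inputs=None):
    """Return True if this run is expected to pause at the production gate."""
    inputs = inputs or {}
    if workflow_file in DISPATCH_WORKFLOWS:
        return event == "workflow_dispatch" and inputs.get("environment") == "production"

    patterns = TAG_WORKFLOWS.get(workflow_file)
    if patterns and event == "push" and ref.startswith("refs/tags/"):
        tag = ref[len("refs/tags/"):]
        return any(fnmatchcase(tag, pattern) for pattern in patterns)
    return False


print(hits_production_gate("deploy-heroku.yml", "workflow_dispatch",
                           inputs={"environment": "production"}))
print(hits_production_gate("deploy-heroku.yml", "push", ref="refs/heads/main"))
print(hits_production_gate("deploy-antlers.yml", "push", ref="refs/tags/v1.2.3"))
```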


Emergency stop

To hard-freeze all production deploys immediately without requiring a GitHub plan change:

heroku config:set DEPLOY_FREEZE=true --app raxx-console-prod >/dev/null 2>&1

This activates the freeze check in deploy-heroku.yml and deploy-console.yml, blocking any new prod dispatch. See docs/ops/runbooks/deploy-freeze.md for full freeze/unfreeze procedure.
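The interaction between DEPLOY_FREEZE and the DEPLOY_FREEZE_OVERRIDE break-glass variable can be summarized as a single decision. The sketch below is an assumption about that logic based only on this runbook; the actual check lives in the workflows and in docs/ops/runbooks/deploy-freeze.md and may differ. The function name is hypothetical.

```python
def prod_deploy_blocked(config):
    """Return True if a prod dispatch should be stopped by the freeze check.

    `config` is a mapping of config var names to string values.
    Assumed semantics: the freeze is on when DEPLOY_FREEZE is "true", and
    DEPLOY_FREEZE_OVERRIDE=1 bypasses it (the default is 0).
    """
    frozen = config.get("DEPLOY_FREEZE", "").strip().lower() == "true"
    overridden = config.get("DEPLOY_FREEZE_OVERRIDE", "0").strip() == "1"
    return frozen and not overridden


print(prod_deploy_blocked({"DEPLOY_FREEZE": "true"}))
print(prod_deploy_blocked({"DEPLOY_FREEZE": "true", "DEPLOY_FREEZE_OVERRIDE": "1"}))
print(prod_deploy_blocked({}))
```

The second case shows why the freeze is the softer control: anyone able to set the override can lift it without an approval record.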


Escalation

Escalate to operator when:
  - The reviewer gate is absent and a production deploy is being attempted urgently.
  - A plan upgrade is being evaluated and cost approval is needed.
  - The production environment was deleted and needs recreation with correct settings.

Contact: Kristerpher (GitHub: MooseQuest)