Production deploy approval gate — runbook
System: GitHub Environments / production environment
Owner: operator / sre-agent
Last incident: 2026-04-30 (see docs/ops/incidents/2026-04-30-raxx-api-prod-down.md)
Last reviewed: 2026-05-05
ADR: docs/architecture/adr/0028-prod-deploy-intentional-friction.md
How to tell it's broken
- Symptom 1: A workflow_dispatch on deploy-heroku.yml or deploy-console.yml with environment=production completes without pausing for reviewer approval; the deploy job starts immediately.
- Symptom 2: gh api repos/raxx-app/TradeMasterAPI/environments returns "production" with "protection_rules": [] (empty array).
- Symptom 3: GitHub Actions UI shows the deploy job under a production run without a yellow "Waiting for approval" banner.
How to diagnose (in order)
- Check environment protection rules:
gh api repos/raxx-app/TradeMasterAPI/environments \
| python3 -c "import sys,json; [print(e['name'], e['protection_rules']) for e in json.load(sys.stdin)['environments']]"
Expected: production [{'type': 'required_reviewers', ...}]
If production [] is returned, the reviewer gate is absent.
- Check which workflows reference the production environment:
grep -rn "environment:" .github/workflows/deploy-heroku.yml \
.github/workflows/deploy-console.yml .github/workflows/deploy-antlers.yml
All three should resolve to environment: production for prod-targeting runs.
- Verify the plan supports required reviewers:
- Required reviewers on GitHub Environments requires GitHub Team or Enterprise plan.
- Personal account repositories on GitHub Free/Pro do not support the required-reviewer protection rule via API. The setting IS available via the UI at Settings > Environments > production when the repo is under a qualifying plan.
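The first diagnostic step can be scripted so that it names the failure mode directly. The following is a minimal sketch, not part of the existing tooling: classify_gate is a hypothetical helper, and it assumes the payload has the shape returned by gh api repos/raxx-app/TradeMasterAPI/environments (a top-level "environments" list whose entries carry "name" and "protection_rules").

```python
def classify_gate(payload, env_name="production"):
    """Classify the reviewer gate for one environment.

    Returns 'missing' (Failure mode C), 'ungated' (Failure mode A),
    or 'gated'. Hypothetical helper; payload shape is an assumption
    based on the `gh api .../environments` output used in this runbook.
    """
    envs = payload.get("environments", [])
    env = next((e for e in envs if e.get("name") == env_name), None)
    if env is None:
        return "missing"
    rules = env.get("protection_rules") or []
    if any(r.get("type") == "required_reviewers" for r in rules):
        return "gated"
    return "ungated"


# Illustrative payload only, not real API output.
sample = {
    "environments": [
        {"name": "staging", "protection_rules": []},
        {"name": "production", "protection_rules": []},
    ]
}
print(classify_gate(sample))  # ungated
```

To run it against the live repo, the output of the gh api command above could be loaded with json.load(sys.stdin) and passed in as payload.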
Known failure modes
Failure mode A: production environment exists but has no protection rules
Symptom: gh api repos/raxx-app/TradeMasterAPI/environments shows "production" with "protection_rules": [].
Cause: The environment object was created (or reset) without the required-reviewer rule. This can happen if:
- The environment was provisioned via API before the plan supported reviewer gates.
- The environment was deleted and recreated without re-adding the rule.
Fix (operator — requires repo admin access):
- Go to: https://github.com/raxx-app/TradeMasterAPI/settings/environments
- Click production.
- Under "Deployment protection rules", click "Required reviewers".
- Add reviewer: MooseQuest (Kristerpher).
- Under "Deployment branches", select "Protected branches" (restricts to main).
- Click "Save protection rules".
Verification:
gh api repos/raxx-app/TradeMasterAPI/environments \
| python3 -c "
import sys, json
envs = json.load(sys.stdin)['environments']
prod = next((e for e in envs if e['name'] == 'production'), None)
if prod is None:
    sys.exit('production environment not found (see Failure mode C)')
print('protection_rules:', prod['protection_rules'])
print('deployment_branch_policy:', prod['deployment_branch_policy'])
"
Expected output includes 'type': 'required_reviewers' in protection_rules and 'protected_branches': True in deployment_branch_policy.
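The two expectations above can also be asserted rather than read by eye. This is a sketch under the assumption that the environment object looks like the one printed by the verification command; gate_problems is a hypothetical name, not an existing script in this repo.

```python
def gate_problems(env):
    """Return a list of reasons the production gate is not fully configured.

    An empty list means both checks from the Verification step pass.
    Hypothetical helper; field names follow the verification output above.
    """
    problems = []
    rules = env.get("protection_rules") or []
    if not any(r.get("type") == "required_reviewers" for r in rules):
        problems.append("no required_reviewers rule")
    # deployment_branch_policy may be null when any branch can deploy.
    policy = env.get("deployment_branch_policy") or {}
    if policy.get("protected_branches") is not True:
        problems.append("deployments not restricted to protected branches")
    return problems


# Illustrative environment object only, not real API output.
half_fixed = {
    "name": "production",
    "protection_rules": [{"type": "required_reviewers"}],
    "deployment_branch_policy": None,
}
print(gate_problems(half_fixed))
```

Returning a list instead of a boolean keeps the output actionable: each entry maps to one step of the Failure mode A fix.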
Failure mode B: Plan does not support required reviewers (API 422)
Symptom: Attempting to set protection rules via gh api -X PUT .../environments/production returns:
{"message":"Failed to create the environment protection rule. Please ensure the billing plan supports the required reviewers protection rule.","status":"422"}
Cause: Required reviewers on GitHub Environments is a GitHub Team/Enterprise feature. GitHub Free and Pro personal accounts do not support this via API or UI.
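When automating the PUT call, it helps to tell this plan-limit rejection apart from other 422 validation errors. The sketch below assumes the error body matches the message quoted in the symptom; is_plan_limit_error is a hypothetical helper, and the substring match on "billing plan" is a heuristic, not a documented contract.

```python
import json


def is_plan_limit_error(body):
    """Return True if an API error body looks like the billing-plan 422.

    Heuristic sketch: matches on the status and on the phrase
    'billing plan' from the message quoted in this runbook.
    """
    try:
        err = json.loads(body)
    except (TypeError, ValueError):
        return False
    if not isinstance(err, dict):
        return False
    # The quoted response carries status as a string ("422").
    return str(err.get("status")) == "422" and "billing plan" in str(err.get("message", ""))


# Error body as quoted in the Symptom above.
quoted = (
    '{"message":"Failed to create the environment protection rule. '
    'Please ensure the billing plan supports the required reviewers '
    'protection rule.","status":"422"}'
)
print(is_plan_limit_error(quoted))  # True
```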
Fix:
Option 1 (recommended): upgrade to the GitHub Team plan.
- Monthly cost: $4/user/month as of 2026-05.
- Go to: https://github.com/settings/billing/plans
- After upgrade, return to Failure mode A fix above.
Option 2: transfer the repo to a GitHub organization on the Team plan.
- Higher operational overhead. Not recommended unless an org is already being set up.
Compensating control while plan upgrade is pending:
The deploy-freeze mechanism (docs/ops/runbooks/deploy-freeze.md) provides a manual brake on all production deploys via the console. Until the reviewer gate is active, set DEPLOY_FREEZE_OVERRIDE=0 (default) to ensure the console freeze check is in the critical path for every prod deploy.
Note: the freeze check is still bypassable by setting DEPLOY_FREEZE_OVERRIDE=1 (the break-glass override) — it is a softer control than the environment reviewer gate.
Failure mode C: production environment deleted or missing entirely
Symptom: gh api repos/raxx-app/TradeMasterAPI/environments returns only staging. Prod-targeting workflow runs reference environment: production but no gating occurs (GitHub treats a missing environment as ungated).
Fix:
- Create the environment:
gh api -X PUT repos/raxx-app/TradeMasterAPI/environments/production
This creates the environment object without protection rules.
- Then apply protection rules via the UI (see Failure mode A fix above).
Failure mode D: Reviewer gate appears but approver is wrong user
Symptom: The "Waiting for approval" banner appears in the Actions UI but the named reviewer is a bot or service account, not Kristerpher.
Cause: Reviewers were configured with the wrong GitHub user ID, or a bot account was added instead of the operator account.
Fix:
- Go to https://github.com/raxx-app/TradeMasterAPI/settings/environments.
- Click production.
- Under "Required reviewers", remove any non-operator accounts.
- Add MooseQuest (Kristerpher, GitHub user ID 20930225).
- Save.
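A wrong reviewer can be caught before anyone waits on an approval banner. The following sketch lists reviewers that are not the operator account. It assumes each required_reviewers rule carries a "reviewers" list whose entries have a "type" and a nested "reviewer" object with "id" and "login", which is how the environments API response is understood to be shaped; unexpected_reviewers is a hypothetical helper.

```python
OPERATOR_ID = 20930225  # MooseQuest (Kristerpher), per this runbook


def unexpected_reviewers(env, allowed_ids=(OPERATOR_ID,)):
    """Return names of configured reviewers that are not allowed user accounts.

    Teams and any user whose id is not in allowed_ids are reported.
    Payload shape is an assumption; verify against real API output.
    """
    found = []
    for rule in env.get("protection_rules") or []:
        if rule.get("type") != "required_reviewers":
            continue
        for entry in rule.get("reviewers") or []:
            reviewer = entry.get("reviewer") or {}
            is_allowed_user = (
                entry.get("type") == "User" and reviewer.get("id") in allowed_ids
            )
            if not is_allowed_user:
                name = reviewer.get("login") or reviewer.get("slug") or str(reviewer.get("id"))
                found.append(name)
    return found


# Illustrative environment object; "deploy-bot" and its id are made up.
env = {
    "name": "production",
    "protection_rules": [
        {
            "type": "required_reviewers",
            "reviewers": [
                {"type": "User", "reviewer": {"id": 20930225, "login": "MooseQuest"}},
                {"type": "User", "reviewer": {"id": 1, "login": "deploy-bot"}},
            ],
        }
    ],
}
print(unexpected_reviewers(env))  # ['deploy-bot']
```

Matching on the numeric user ID rather than the login avoids false negatives if an account is renamed.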
How to approve a production deploy
When a production deploy is correctly gated, the workflow pauses at the deploy job with a yellow "Waiting for required environment approval" banner.
- Navigate to the Actions run URL (posted in the PR comment or Slack DM).
- Click "Review deployments".
- Select the production environment checkbox.
- Optionally add a comment (e.g. "approving raxx-api v1.2.3 prod").
- Click "Approve and deploy".
The deploy job then proceeds. The approver identity, timestamp, and run ID are recorded in GitHub's environment approval log at:
https://github.com/raxx-app/TradeMasterAPI/deployments/activity_log?environments_filter=production
Which workflows enforce this gate
The following workflows reference environment: production for production-targeted runs, meaning they will pause at the reviewer gate once protection rules are configured:
| Workflow | File | Prod trigger |
|---|---|---|
| Deploy to Heroku (Raptor) | .github/workflows/deploy-heroku.yml | workflow_dispatch with environment=production |
| Deploy console | .github/workflows/deploy-console.yml | workflow_dispatch with environment=production |
| Deploy Antlers | .github/workflows/deploy-antlers.yml | v* or trademaster-api-v* tag push |
Push-to-main events (staging only) do not trigger the production gate.
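The trigger rules in the table can be expressed as a small predicate, which is handy when auditing past runs. This is a sketch only: targets_production and its arguments are hypothetical, and fnmatchcase is used as an approximation of GitHub's tag filter patterns (the two are not identical, for example in how "*" treats "/").

```python
from fnmatch import fnmatchcase

# Workflow file names and triggers taken from the table above.
DISPATCH_WORKFLOWS = {"deploy-heroku.yml", "deploy-console.yml"}
TAG_WORKFLOWS = {"deploy-antlers.yml": ("v*", "trademaster-api-v*")}


def targets_production(workflow, event, ref="", inputs=None):
    """Return True if this workflow/event combination should hit the prod gate."""
    if workflow in DISPATCH_WORKFLOWS:
        chosen = (inputs or {}).get("environment")
        return event == "workflow_dispatch" and chosen == "production"
    patterns = TAG_WORKFLOWS.get(workflow)
    if patterns and event == "push" and ref.startswith("refs/tags/"):
        tag = ref[len("refs/tags/"):]
        return any(fnmatchcase(tag, p) for p in patterns)
    return False


print(targets_production("deploy-antlers.yml", "push", ref="refs/tags/v1.2.3"))  # True
print(targets_production("deploy-heroku.yml", "push", ref="refs/heads/main"))    # False
```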
Emergency stop
To hard-freeze all production deploys immediately without requiring a GitHub plan change:
heroku config:set DEPLOY_FREEZE=true --app raxx-console-prod >/dev/null 2>&1
This activates the freeze check in deploy-heroku.yml and deploy-console.yml, blocking any new prod dispatch. See docs/ops/runbooks/deploy-freeze.md for full freeze/unfreeze procedure.
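For orientation, the interaction between the freeze flag and the break-glass override can be sketched as below. This is an assumption about the decision logic inferred from this runbook (DEPLOY_FREEZE=true blocks, DEPLOY_FREEZE_OVERRIDE=1 bypasses, override defaults to 0); the authoritative behavior is whatever the workflows and docs/ops/runbooks/deploy-freeze.md implement, and deploy_blocked is a hypothetical name.

```python
def deploy_blocked(config):
    """Return True if a prod deploy would be stopped by the freeze check.

    Assumed semantics, not the actual workflow code:
    - DEPLOY_FREEZE == "true" turns the freeze on.
    - DEPLOY_FREEZE_OVERRIDE == "1" is the break-glass bypass (default "0").
    """
    frozen = str(config.get("DEPLOY_FREEZE", "")).strip().lower() == "true"
    override = str(config.get("DEPLOY_FREEZE_OVERRIDE", "0")).strip() == "1"
    return frozen and not override


print(deploy_blocked({"DEPLOY_FREEZE": "true"}))                                  # True
print(deploy_blocked({"DEPLOY_FREEZE": "true", "DEPLOY_FREEZE_OVERRIDE": "1"}))  # False
```

The second call shows why the freeze is a softer control than the reviewer gate: anyone who can set the override can lift it.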
Escalation
Escalate to operator when:
- The reviewer gate is absent and a production deploy is being attempted urgently.
- A plan upgrade is being evaluated and cost approval is needed.
- The production environment was deleted and needs recreation with correct settings.
Contact: Kristerpher (GitHub: MooseQuest)