Raxx · internal docs


ADR 0028 — Intentional friction on prod deploys: manual gate over full automation

Status: Proposed
Date: 2026-04-29 UTC
Refs: docs/architecture/prod-deploy-gating.md, #81


Context

Every surface that ships Raxx code to production carries some risk of breakage. The question is where the human checkpoint sits: before the deploy fires, after, or not at all.

Three postures are possible:

A — Full auto: push to main (or tag) triggers prod deploy automatically. No human action required between merge and prod.

B — Intentional friction (manual gate): staging deploys automatically; prod requires an explicit human action (GitHub Environment approval click, or a typed confirmation phrase for local deploys). A human sees what will ship before approving.

C — Two-human rule: prod deploy requires approval from two distinct operators. Appropriate for regulated environments. Not applicable at current team size (one operator).
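The three postures differ only in how many distinct humans must approve before prod. The following is a minimal Python sketch of that rule. `Posture`, `REQUIRED_APPROVERS`, and `prod_deploy_allowed` are hypothetical names for illustration, not part of any existing Raxx tooling.

```python
from enum import Enum


class Posture(Enum):
    FULL_AUTO = "A"    # no human action between merge and prod
    MANUAL_GATE = "B"  # one explicit human approval (adopted by this ADR)
    TWO_HUMAN = "C"    # two distinct operators must approve


# Distinct human approvals each posture needs before a prod deploy may run.
REQUIRED_APPROVERS = {
    Posture.FULL_AUTO: 0,
    Posture.MANUAL_GATE: 1,
    Posture.TWO_HUMAN: 2,
}


def prod_deploy_allowed(posture: Posture, approvers: list[str]) -> bool:
    """Return True if the distinct approvers satisfy the posture's rule."""
    # set() collapses repeat approvals by the same person, which is what
    # makes posture C a two-human rule rather than a two-click rule.
    return len(set(approvers)) >= REQUIRED_APPROVERS[posture]
```

With a single operator, `prod_deploy_allowed(Posture.TWO_HUMAN, ["operator", "operator"])` is always False, which is the concrete reason posture C does not fit the current team size.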

The existing API deploy workflow (deploy-heroku.yml) already implements posture B for Raptor. ADR-0020 hardened the tag-gated model for Raptor and Antlers. The console had no CI deploy at all — only a raw git push with no gate.


Decision

Adopt posture B (intentional friction) for all Heroku prod deploys, including console.

For CI-driven deploys: GitHub Environment required-reviewer gate on the production environment. The reviewer must click "Approve" in the Actions UI before the deploy job runs. This gate records the approver's identity, timestamp, and run ID in GitHub's environment approval log.

For break-glass (local script): a typed confirmation phrase (deploy console to prod) that must be entered verbatim. A mistyped phrase aborts the script with exit code 1. There is no --force or --yes flag to bypass the prompt.
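A minimal sketch of the break-glass prompt, assuming the local script is written in Python (this ADR does not say). `confirm_or_abort` and its injectable `read_line` parameter are hypothetical; the parameter exists only so the gate can be exercised without a terminal. It shows the verbatim comparison, the exit-code-1 abort, and the absence of any bypass argument.

```python
import sys
from typing import Callable

# Phrase from this ADR. Comparison is exact: case, spacing, and any
# trailing whitespace all matter (input() strips only the newline).
CONFIRM_PHRASE = "deploy console to prod"


def confirm_or_abort(read_line: Callable[[str], str] = input) -> None:
    """Return normally only if the operator types the phrase verbatim.

    Deliberately takes no force/yes argument: there is no bypass.
    """
    try:
        typed = read_line(f'Type "{CONFIRM_PHRASE}" to deploy: ')
    except EOFError:
        typed = ""  # closed stdin (e.g. a non-interactive run) is a mismatch
    if typed != CONFIRM_PHRASE:
        print("Confirmation phrase did not match; aborting.", file=sys.stderr)
        sys.exit(1)
```

Treating a closed stdin as a mismatch means the gate cannot be skipped by piping the script into a non-interactive shell.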

The diff preview (from-SHA, to-SHA, commit log) is shown to the operator before the confirmation prompt. The operator cannot approve a prod deploy without seeing what is about to ship.
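A sketch of the preview-before-prompt ordering in Python. It assumes from-SHA is the commit currently in prod and to-SHA the commit about to ship; how the script learns the current prod SHA is not specified here, so both are plain arguments. `commit_log`, `render_preview`, and `preview_then_confirm` are hypothetical names. `git log --oneline A..B` lists the commits reachable from B but not from A.

```python
import subprocess
from typing import Callable


def commit_log(from_sha: str, to_sha: str) -> list[str]:
    """One-line summaries of commits in to_sha that are not in from_sha."""
    result = subprocess.run(
        ["git", "log", "--oneline", f"{from_sha}..{to_sha}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


def render_preview(from_sha: str, to_sha: str, commits: list[str]) -> str:
    """Format the preview the operator must read before confirming."""
    lines = [f"from: {from_sha}", f"to:   {to_sha}", f"{len(commits)} commit(s):"]
    lines += [f"  {c}" for c in commits]
    return "\n".join(lines)


def preview_then_confirm(
    from_sha: str,
    to_sha: str,
    confirm: Callable[[], None],
    log: Callable[[str, str], list[str]] = commit_log,
    show: Callable[[str], None] = print,
) -> None:
    """Show the preview first; the confirmation prompt only runs afterwards."""
    show(render_preview(from_sha, to_sha, log(from_sha, to_sha)))
    confirm()
```

Because `confirm` is only reached after `show` returns, a failure to build the preview (for example, an unknown SHA making `git log` exit non-zero) raises before the operator is ever asked to approve.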


Consequences

Positive:
- Every prod deploy has a documented human checkpoint. No prod deploy can occur as an accidental side effect of a git push.
- The approver's identity is logged in GitHub's environment approval record (CI path) or in console_audit_log (local script path). This satisfies the audit trail invariant.
- Rollback is unrestricted. The gate is on the forward path only; heroku releases:rollback does not require approval.
- The hotfix path is fast: the operator triggers the workflow via workflow_dispatch, reaches the environment gate, and approves immediately. There is no minimum wait.
- The break-glass local script exists for CI outage scenarios. It carries its own confirmation gate (the typed phrase), so the human checkpoint is preserved even when GitHub Actions is unavailable.
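Two of the positive consequences are structural: the local path writes an attributable audit entry, and only the forward path is gated. A Python sketch of both follows. `DeployAuditEntry`, `requires_gate`, and `record_confirmed_deploy` are hypothetical, and the field names are illustrative because this ADR does not define the console_audit_log schema. Resolving the operator's identity (SSH session, macOS keychain) is left to the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DeployAuditEntry:
    """Hypothetical console_audit_log row for a confirmed local prod deploy.

    Field names are illustrative; this ADR does not define the table schema.
    """
    operator: str           # identity of whoever typed the phrase
    from_sha: str           # commit being replaced in prod
    to_sha: str             # commit being shipped
    confirmed_at: datetime  # UTC time the phrase was accepted


def requires_gate(action: str) -> bool:
    """Only the forward path is gated; rollback stays approval-free."""
    return action == "deploy"


def record_confirmed_deploy(
    operator: str, from_sha: str, to_sha: str, now: Optional[datetime] = None
) -> DeployAuditEntry:
    """Build the audit entry once the phrase matched, before the deploy runs."""
    return DeployAuditEntry(
        operator, from_sha, to_sha, now or datetime.now(timezone.utc)
    )
```

Building the entry before the deploy runs means an attempted deploy that later fails is still attributable to the person who confirmed it.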

Negative / tradeoffs:
- One extra click (CI path) or one typed phrase (local script) per prod deploy. This is the intended cost. For a solo operator with one prod console, it adds under 30 seconds of overhead per deploy.
- The local script's confirmation phrase can be typed from memory or copy-pasted from a doc. It is friction, not cryptographic proof of identity. For local deploys, the audit trail and the SSH session/macOS keychain are the identity backstop.
- The GitHub Environment gate requires approval from a GitHub user listed as a reviewer, not from the GitHub Actions service account that runs the workflow. In practice, prod deploys are triggered by the operator's manual workflow_dispatch, so the triggerer and the approver are the same human. The gate records the approver's GitHub user identity rather than the workflow trigger, so a self-approved deploy is still attributable. (GitHub's optional "Prevent self-review" setting must therefore stay disabled while there is a single operator.) This is acceptable and matches the existing Raptor posture (ADR-0020 §Consequences).


Alternatives considered

Option A — Full auto (no gate)

This was the console's state before this ADR. Full auto is appropriate when (1) the test suite catches all regressions, (2) the blast radius of a bad deploy is low, and (3) deploys are frequent enough that the friction of a gate exceeds the risk of a bad deploy. None of these conditions holds for the console. It mutates live infrastructure (secret rotation, RBAC changes, maintenance mode), so a bad deploy is not a minor UI glitch.

Minimum soak timer (Option B variant)

A variation of B that adds a required wait step (e.g. 30 minutes) between staging deploy and prod approval becoming available. This was considered for Raptor in ADR-0020 and deferred. The rationale applies here: the soak window is "however long the operator takes to verify staging." Imposing a timer adds ceremony without proportionate safety gain for the current team size. This can be added later if velocity increases.
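The deferred soak variant reduces to a single time comparison. The following is a hypothetical Python sketch of that check. `prod_approval_available` and `SOAK` are illustrative names, and 30 minutes is only the example window from the text above. Passing a zero window recovers the adopted behaviour, where approval is available as soon as the operator is satisfied with staging.

```python
from datetime import datetime, timedelta, timezone

# Example window from this ADR; the soak variant itself was deferred.
SOAK = timedelta(minutes=30)


def prod_approval_available(
    staging_deployed_at: datetime, now: datetime, soak: timedelta = SOAK
) -> bool:
    """True once the required wait since the staging deploy has elapsed."""
    return now - staging_deployed_at >= soak


# Usage: a staging deploy at 12:00 UTC unlocks prod approval at 12:30 UTC.
t0 = datetime(2026, 4, 29, 12, 0, tzinfo=timezone.utc)
print(prod_approval_available(t0, t0 + timedelta(minutes=29)))  # False
print(prod_approval_available(t0, t0 + timedelta(minutes=30)))  # True
```

Both timestamps should be timezone-aware; subtracting a naive datetime from an aware one raises TypeError in Python.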

Typed phrase over approval click

The GitHub Environment approval click is only available when deploying via CI. Local deploys need a different gate. The typed phrase (deploy console to prod) is used for local deploys only. It is not a replacement for the CI approval gate — both exist in their respective paths.