Status: Proposed
Date: 2026-04-25
Author: raxx-pm-bot (software-architect agent)
Refs: Kristerpher's directive 2026-04-25
Raxx runs three externally-visible surfaces with distinct deploy mechanics today:
| Surface | Code path | Current staging trigger | Current prod trigger |
|---|---|---|---|
| Raptor (backend) | backend_v2/ | push to main → Heroku raxx-api-staging | tag v*.*.* → Heroku raxx-api-prod |
| Antlers (frontend) | frontend/trademaster_ui/ | push to main → CF Pages staging alias | tag v*.*.* → CF Pages raxx.app |
| Console | console/ | manual git subtree push | same (staging retired, #355) |
| Docs | docs/ | not deployed | not deployed (future target) |

The ask is an explicit soak gate: a window between "merged to main" and "promoted to prod" long enough to catch regressions, and a promotion step that requires a deliberate operator action rather than being a side effect of git push.
Confirmed finding — Antlers is already tag-gated, not main-push-to-prod.
deploy.yml lines 89–121 make this explicit: the deploy-frontend job deploys to staging on refs/heads/main and deploys to raxx.app (production) only on refs/tags/v. The initial assumption in the brief ("Antlers may already be safe") is confirmed true. Antlers and Raptor are on symmetric tag-gated models already.
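
The routing rule described above can be modelled in a few lines. This is an illustrative Python sketch of the behaviour, not the contents of deploy.yml; the function name `target_environment` and the returned labels are assumptions made for this example.

```python
from typing import Optional


def target_environment(ref: str) -> Optional[str]:
    """Map a Git ref to the environment it would deploy to.

    Hypothetical helper mirroring the rule described in the text:
    a push to main goes to staging, a ref under refs/tags/v goes to
    production, and any other ref deploys nowhere.
    """
    if ref == "refs/heads/main":
        return "staging"
    if ref.startswith("refs/tags/v"):
        return "production"
    return None


if __name__ == "__main__":
    for ref in ("refs/heads/main", "refs/tags/v1.2.3", "refs/heads/feature-x"):
        print(ref, "->", target_environment(ref))
```

Under this model, nothing other than a version tag can reach production, which is why the remaining gap is about timing and approval rather than about which refs deploy where.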
The gap: both surfaces promote to prod the instant release-please pushes the tag. The soak window between "merged to main" and "tag cut" is whatever elapsed time separates those two events — today, that's however fast Kristerpher reviews and merges the release PR. There is no minimum soak duration, no explicit go/no-go gate, and no human step required between the release PR merge and the tag push.
These apply to the deployment pipeline itself.
- The deploy-heroku.yml workflow_dispatch path (manual deploy of any ref to any environment) must be preserved as an emergency bypass: it is the current hotfix path and doubles as a kill-switch.

Option A: Branch promotion (main → production PR)

main auto-deploys to staging. A human opens a PR from main → production. Merging that PR into production deploys to prod. The soak window is the time between the feature merge and the promotion PR merge.
References: GitLab Flow long-lived environment branches (https://docs.gitlab.com/ee/topics/gitlab_flow.html#production-branch-with-gitlab-flow).
Practical shape for Raxx:
- Add a production branch, protected: require PR, require CI green, no direct pushes.
- deploy.yml gains a trigger on push: branches: [production] that fires the prod deploy job.
- release-please is retargeted to scan production rather than main for release PRs, or release-please is dropped from the prod-tag loop and replaced with manual tagging on production.
- CF Pages --branch flag: currently passes github.ref_name. On production push it would pass production, which CF Pages would treat as the production alias if configured as such.
Key friction point: release-please runs on push to main. If the release PR now targets production, release-please needs its base branch changed. That is a one-line config change but it means release-please now opens a release PR from production → production (auto-commit + tag) which is unusual. More likely: release-please stays on main but tags are no longer the prod trigger — the production branch merge becomes the prod trigger instead, and tagging is decoupled from deployment.

Option B: Tag + approval gate

Both surfaces already gate prod on a semver tag. The gap is that tag creation is fast and automatic once the release PR merges. Hardening means slowing the release PR merge: add a required status check on the release PR that enforces a minimum soak time (N hours since last staging deploy), or require a manual approval step on the release PR before it can merge.
References: Trunk-based development with release trains (https://trunkbaseddevelopment.com/branch-for-release/).
Practical shape for Raxx:
- Add a GitHub Environment approval gate on the production environment (zero-cost on GitHub Pro) with one required reviewer: Kristerpher.
- When the tag deploy fires, Actions pauses at the deploy-prod / deploy-frontend jobs until Kristerpher clicks "Approve" in the GitHub UI.
- Soak window = time between staging deploy and approval click. No minimum unless enforced by a timed gate job.
- release-please flow is unchanged. No new branches. No workflow rewrites.
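
If a minimum soak is wanted, the timed gate mentioned above reduces to a single comparison. The following is a minimal sketch under stated assumptions: the function name `soak_satisfied` and its parameters are hypothetical, and how the timestamp of the last staging deploy is obtained is left out.

```python
from datetime import datetime, timedelta, timezone


def soak_satisfied(last_staging_deploy: datetime, now: datetime, min_hours: float) -> bool:
    """Return True once at least min_hours have elapsed since the last staging deploy."""
    if min_hours < 0:
        raise ValueError("min_hours must be non-negative")
    return now - last_staging_deploy >= timedelta(hours=min_hours)


if __name__ == "__main__":
    deployed = datetime(2026, 4, 25, 9, 0, tzinfo=timezone.utc)
    # One hour after the deploy a 4-hour soak is not yet met.
    print(soak_satisfied(deployed, deployed + timedelta(hours=1), min_hours=4))  # False
    # Five hours after the deploy it is.
    print(soak_satisfied(deployed, deployed + timedelta(hours=5), min_hours=4))  # True
```

A gate job built on a check like this would fail (or wait) until the condition holds, leaving the approval click as the final human step.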

Option C: Trunk + auto-tag

A push to main deploys to staging. An automated check (smoke suite + N-hour timer) creates the tag automatically if all signals are green. Less operator agency, more automation.
References: Trunk-based development (https://trunkbaseddevelopment.com/), Continuous Deployment patterns.
Practical shape for Raxx: A GitHub Actions scheduled workflow checks "has staging been green for N hours since last deploy?" and calls gh release create if yes. The operator can block a release by disabling the scheduled workflow or by applying a "hold" label to the release PR.
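
The decision the scheduled workflow would make can be sketched as a pure function. Everything named here is an assumption for illustration: the `SmokeResult` record, the `should_auto_tag` function, and the idea that smoke results and PR labels are passed in as plain lists. Fetching those inputs and the actual gh release create call are not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Sequence


@dataclass(frozen=True)
class SmokeResult:
    """One staging smoke-suite run (hypothetical record shape)."""
    finished_at: datetime
    passed: bool


def should_auto_tag(
    last_staging_deploy: datetime,
    now: datetime,
    smoke_results: Sequence[SmokeResult],
    release_pr_labels: Sequence[str],
    min_soak_hours: float,
) -> bool:
    """Decide whether the scheduled job may cut the release tag."""
    # An operator hold always wins.
    if "hold" in release_pr_labels:
        return False
    # The soak timer must have run out.
    if now - last_staging_deploy < timedelta(hours=min_soak_hours):
        return False
    # Only runs that finished after the last deploy describe the current build.
    relevant = [r for r in smoke_results if r.finished_at >= last_staging_deploy]
    # Require at least one such run and no failures among them.
    return bool(relevant) and all(r.passed for r in relevant)
```

Requiring at least one smoke run after the deploy is a deliberate choice in this sketch: an empty result set is treated as "unknown", not as "green", so a broken smoke suite cannot silently release.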
| Axis | A: Branch promotion | B: Tag + approval gate | C: Trunk + auto-tag |
|---|---|---|---|
| Operator effort per release | High: open promo PR, wait for CI, merge | Low: one approval click in GH UI | Minimal: approve or do nothing |
| Time-to-prod once "ready" | Variable: depends on PR queue | Variable: depends on approval speed | Automatic after N hours |
| Risk of accidental prod deploy | Low: branch protection blocks direct push | Low: approval gate is mandatory | Medium: automation can misfire |
| Hotfix path | Create feature branch from production, merge to production directly (bypasses main soak), then backport to main | Tag a hotfix SHA manually, bypass approval gate (emergency) | Disable auto-tag, manual gh release create |
| Audit trail of what's in prod | Git log of production branch + PR history | GitHub release + tag annotation + environment approval log | GitHub release + CI run log |
| Compatibility with release-please | Requires a config change to release-please's target branch (target-branch setting) | Fully compatible: zero changes to release-please | release-please is bypassed for tag creation |
| Rollback story | git revert on production + push (or force-push with protection bypass) | Tag a known-good SHA (v1.2.3-hotfix) | Same as B |
| Solo-operator overhead | Moderate: one extra PR to open and merge per release | Low: one button click per release | Lowest: only needed if blocking a release |
| Workflow change scope | Large: new branch, branch protection, deploy trigger changes, CF Pages config | Small: add environment: production required reviewer in GH settings | Medium: new scheduled workflow, auto-tag logic |

Recommendation: Option B (tag + approval gate).

Rationale:
The existing infrastructure is already in the right shape. Both surfaces already gate prod on a semver tag. release-please already owns the tag lifecycle. Adding a required-reviewer approval gate on the production GitHub Environment is a single settings change — no workflow file rewrites, no new branches, no migration risk.
The soak window is determined by operator behavior: Kristerpher merges the release PR (which tags), sees the staging deploy succeed, verifies staging, then clicks "Approve" in the Actions environment gate. The window can be 5 minutes or 5 hours — it is an explicit operator decision each time, not an automatic timer.
Option A adds meaningful complexity for a solo operator: an extra PR-open-and-merge ceremony per release, branch protection on a long-lived production branch, and a non-trivial release-please reconfiguration. GitLab Flow's production branch exists primarily to handle parallel release streams — Raxx has one stream.
Option C trades operator agency for automation. Pre-launch, the risk of an automated deployment to prod without a human checkpoint is not worth the overhead savings.
What changes under Option B:
- Settings → Environments → production: add required reviewer (Kristerpher).
- deploy-staging and deploy-prod jobs in deploy.yml already reference environment: staging / environment: production. The prod environment already exists and must be configured with the reviewer gate.
- In deploy-heroku.yml, the deploy job uses environment: ${{ needs.resolve.outputs.environment }}, so the approval gate fires automatically for any dispatch to production.
- The pages deploy --branch flag already differentiates staging from prod via tag vs main push.

```mermaid
sequenceDiagram
    participant Dev as Feature Developer
    participant GH as GitHub Actions
    participant Staging as raxx-api-staging / raxx-app (staging)
    participant Kristerpher as Operator (Kristerpher)
    participant Prod as raxx-api-prod / raxx.app
    Dev->>GH: merge feature PR to main
    GH->>Staging: deploy.yml: staging job fires
    GH->>Staging: smoke test passes
    Note over Staging: soak window begins<br/>(implicit, no timer)
    Kristerpher->>Staging: verify staging manually or wait for smoke
    Kristerpher->>GH: merge release-please PR
    GH->>GH: release-please pushes tag v*.*.*
    GH->>Prod: deploy.yml: prod job fires, pauses at environment gate
    GH-->>Kristerpher: email / Slack: "Waiting for approval to deploy to production"
    Kristerpher->>GH: click Approve in Actions UI
    GH->>Prod: deploy proceeds
    GH->>Prod: smoke test (5 retries)
    Prod-->>Kristerpher: deploy summary comment on release commit
```

Hotfix path:

```mermaid
sequenceDiagram
    participant Kristerpher as Operator
    participant GH as GitHub Actions
    participant Prod as raxx-api-prod / raxx.app
    Note over Kristerpher: regression detected in prod
    Kristerpher->>GH: push hotfix branch, merge to main, merge release PR fast
    GH->>GH: release-please tags hotfix version (v1.2.4)
    GH->>Prod: prod deploy fires, pauses at approval gate
    Kristerpher->>GH: Approve immediately (no soak required for known-good hotfix)
    GH->>Prod: hotfix deployed
    Note over Kristerpher: optional: skip the tag flow via workflow_dispatch<br/>in deploy-heroku.yml (emergency path preserved, approval gate still applies)
```

Kristerpher flagged "docs" alongside raptor and antlers. The docs surface today is docs/ in-repo content, not a deployed site. If getraxx.com or docs.raxx.app is added later:
- The --branch flag differentiates staging from prod, and the production environment approval gate applies to the deploy workflow.

No workflow changes are needed today. When the docs site is scoped, add it to deploy.yml following the Antlers pattern.

Current state → Option B:
- Settings → Environments → production. Add Kristerpher as required reviewer. Save.
- workflow_dispatch targeting staging first.

Rollback from Option B:
Remove Kristerpher from required reviewers in the production environment settings. Deploys resume as automatic on tag push. Zero code change.

- admin access to the repo. Today that is only Kristerpher, which is acceptable.
- A deploy-heroku.yml manual dispatch to production bypasses the automated tag flow but still triggers the production environment gate (the deploy job uses the same environment block). This is correct: the gate fires on any prod deploy regardless of trigger.

These open questions must be decided before the implementation sub-card is claimed:
Approval notification channel. GitHub sends an email when the approval gate pauses a run. Is that sufficient, or do you want a Slack DM to D0AJ7K184TV as well? (A small notify job placed ahead of the gated deploy job can post to Slack before the environment gate pauses the run.)
Minimum soak duration. Today the window is "as long as you take." Should there be a minimum — e.g., a 30-minute wait step before the prod jobs fire, ensuring staging has had at least N minutes of traffic? Or is the implicit window (however long between feature merge and release PR merge) sufficient?
Docs surface. Is the docs soak model in scope now, or deferred until a docs site is actually provisioned?
Approval self-review. If the environment's "Prevent self-review" option is enabled, a required reviewer cannot approve a deployment that they triggered themselves. In practice, for push events, the workflow is triggered by the GITHUB_ACTIONS bot (release-please's tag push), not by Kristerpher's user, so self-approval should work either way. This should be confirmed with a dry-run once the gate is configured.
Console. Kristerpher said console is out of scope for this design. Confirm: the manual git subtree push deploy path for console gets no soak gate, and that is intentional.
| Phase | What | When |
|---|---|---|
| Dark | Read this doc, decide open questions | Now |
| Configure | Add required reviewer to production environment in GitHub settings | After open questions resolved |
| Verify | Trigger a canary release (even a patch bump) to confirm approval gate fires correctly | First release after configuration |
| GA | All subsequent prod deploys go through the approval gate | Ongoing |
No feature flags, no dark launch, no schema migration. This is a settings change plus human process.