Raxx · internal docs

Branch Promotion Strategy — Soak Gate for Raptor / Antlers / Docs

Status: Proposed
Date: 2026-04-25
Author: raxx-pm-bot (software-architect agent)
Refs: Kristerpher's directive 2026-04-25


1. Context

Raxx runs three externally-visible surfaces with distinct deploy mechanics today:

| Surface | Code path | Current staging trigger | Current prod trigger |
|---|---|---|---|
| Raptor (backend) | backend_v2/ | push to main → Heroku raxx-api-staging | tag v*.*.* → Heroku raxx-api-prod |
| Antlers (frontend) | frontend/trademaster_ui/ | push to main → CF Pages staging alias | tag v*.*.* → CF Pages raxx.app |
| Console | console/ | manual git subtree push | same; staging retired (#355) |
| Docs | docs/ | not deployed | not deployed (future target) |

The ask is an explicit soak gate: a window between "merged to main" and "promoted to prod" long enough to catch regressions, and a promotion step that requires a deliberate operator action rather than being a side effect of git push.

Confirmed finding — Antlers is already tag-gated, not main-push-to-prod.

deploy.yml lines 89–121 make this explicit: the deploy-frontend job deploys to staging on refs/heads/main and deploys to raxx.app (production) only on refs/tags/v. The initial assumption in the brief ("Antlers may already be safe") is confirmed true. Antlers and Raptor are on symmetric tag-gated models already.

The gap: both surfaces promote to prod the instant release-please pushes the tag. The soak window between "merged to main" and "tag cut" is whatever elapsed time separates those two events — today, that's however fast Kristerpher reviews and merges the release PR. There is no minimum soak duration, no explicit go/no-go gate, and no human step required between the release PR merge and the tag push.
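The tag-gated trigger model confirmed above (main goes to staging, v-tags go to production) can be modeled as a small ref classifier. This is an illustrative sketch only, not the contents of deploy.yml: the function name `deploy_target` is hypothetical, and the strict three-part version pattern is an assumption based on the `v*.*.*` notation used in the table.

```python
import re
from typing import Optional

# Assumption: prod tags look like v1.2.3 (the "v*.*.*" pattern from the table).
SEMVER_TAG = re.compile(r"^refs/tags/v\d+\.\d+\.\d+$")


def deploy_target(ref: str) -> Optional[str]:
    """Map a Git ref to the environment it deploys to, or None for no deploy."""
    if ref == "refs/heads/main":
        return "staging"      # every push to main lands on staging
    if SEMVER_TAG.match(ref):
        return "production"   # only a version tag reaches prod
    return None               # other branches and tags deploy nowhere


print(deploy_target("refs/heads/main"))
print(deploy_target("refs/tags/v1.2.3"))
print(deploy_target("refs/heads/feature/x"))
```

Nothing in this mapping depends on elapsed time or on an approval, which is the gap this document addresses.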


2. Invariants

These apply to the deployment pipeline itself.


3. What the three options actually mean for Raxx

Option A — Branch promotion (main → production PR)

main auto-deploys to staging. A human opens a PR from main → production. Merging that PR into production deploys to prod. The soak window is "time between feature merge and promotion PR merge."

References: GitLab Flow long-lived environment branches (https://docs.gitlab.com/ee/topics/gitlab_flow.html#production-branch-with-gitlab-flow).

Practical shape for Raxx:

  - Add a production branch, protected: require PR, require CI green, no direct pushes.
  - deploy.yml gains a trigger on push: branches: [production] that fires the prod deploy job.
  - release-please is retargeted to scan production rather than main for release PRs, or release-please is dropped from the prod-tag loop and replaced with manual tagging on production.
  - CF Pages --branch flag: currently passes github.ref_name. On production push it would pass production, which CF Pages would treat as the production alias if configured as such.

Key friction point: release-please runs on push to main. If the release PR now targets production, release-please needs its base branch changed. That is a one-line config change but it means release-please now opens a release PR from production → production (auto-commit + tag) which is unusual. More likely: release-please stays on main but tags are no longer the prod trigger — the production branch merge becomes the prod trigger instead, and tagging is decoupled from deployment.

Option B — Tag-only promotion (current model, hardened)

Both surfaces already gate prod on a semver tag. The gap is that tag creation is fast and automatic once the release PR merges. Hardening means slowing the release PR merge: add a required status check on the release PR that enforces a minimum soak time (N hours since last staging deploy), or require a manual approval step on the release PR before it can merge.

References: Trunk-based development with release trains (https://trunkbaseddevelopment.com/branch-for-release/).

Practical shape for Raxx:

  - Add a GitHub Environment approval gate on production environment (zero-cost on GitHub Pro, requires 1 required reviewer: Kristerpher).
  - When the tag deploy fires, Actions pauses at the deploy-prod / deploy-frontend jobs until Kristerpher clicks "Approve" in the GitHub UI.
  - Soak window = time between staging deploy and approval click. No minimum unless enforced by a timed gate job.
  - release-please flow is unchanged. No new branches. No workflow rewrites.
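The optional timed gate mentioned above (a minimum soak of N hours since the last staging deploy) reduces to a timestamp comparison. The sketch below is a minimal illustration under stated assumptions: `soak_check` is a hypothetical helper, the 4-hour minimum is an example value, and how the last staging deploy time is obtained is left out. The returned states reuse GitHub commit-status names ("pending", "success").

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def soak_check(last_staging_deploy: datetime, now: datetime,
               min_soak: timedelta) -> Tuple[str, str]:
    """Return a (state, description) pair for a soak status check.

    The state is "pending" until the minimum soak has elapsed, then "success".
    """
    elapsed = now - last_staging_deploy
    if elapsed >= min_soak:
        return "success", f"soaked {elapsed} (minimum {min_soak})"
    return "pending", f"{min_soak - elapsed} of soak remaining"


# Hypothetical values: staging deployed at 12:00 UTC, 4-hour minimum soak.
deployed = datetime(2026, 4, 25, 12, 0, tzinfo=timezone.utc)
print(soak_check(deployed, deployed + timedelta(hours=3), timedelta(hours=4)))
print(soak_check(deployed, deployed + timedelta(hours=5), timedelta(hours=4)))
```

Using timezone-aware timestamps throughout avoids mixing local and UTC times when the deploy time comes from CI metadata.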

Option C — Trunk + automated soak signal

Push to main → staging. An automated check (smoke suite + N-hour timer) creates the tag automatically if all signals are green. Less operator agency, more automation.

References: Trunk-based development (https://trunkbaseddevelopment.com/), Continuous Deployment patterns.

Practical shape for Raxx: A GitHub Actions scheduled workflow checks "has staging been green for N hours since last deploy?" and calls gh release create if yes. The operator can block a release by disabling the scheduled workflow or by pinning a "hold" label on the release PR.
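The decision the scheduled workflow would make can be sketched as a pure function. This is an assumption-laden illustration, not a proposed implementation: `StagingStatus` and `should_auto_tag` are hypothetical names, "green for N hours" is simplified to "smoke suite currently green and N hours elapsed since the last staging deploy", and the hold label is modeled as a boolean.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class StagingStatus:
    last_deploy: datetime  # when staging last received a build
    smoke_green: bool      # latest smoke suite result
    hold_label: bool       # hypothetical "hold" label on the release PR


def should_auto_tag(status: StagingStatus, now: datetime, soak_hours: int) -> bool:
    """Return True only when every automated signal allows a release."""
    if status.hold_label:
        return False       # operator veto always wins
    if not status.smoke_green:
        return False       # never tag a red staging
    return now - status.last_deploy >= timedelta(hours=soak_hours)


base = datetime(2026, 4, 25, 12, 0, tzinfo=timezone.utc)
ready = StagingStatus(last_deploy=base, smoke_green=True, hold_label=False)
held = StagingStatus(last_deploy=base, smoke_green=True, hold_label=True)
print(should_auto_tag(ready, base + timedelta(hours=6), soak_hours=4))
print(should_auto_tag(held, base + timedelta(hours=6), soak_hours=4))
```

Checking the hold label first keeps the operator's veto independent of the other signals, which matters because this option otherwise has no human checkpoint.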


4. Comparison table

| Axis | A: Branch promotion | B: Tag + approval gate | C: Trunk + auto-tag |
|---|---|---|---|
| Operator effort per release | High: open promo PR, wait for CI, merge | Low: one approval click in GH UI | Minimal: approve or do nothing |
| Time-to-prod once "ready" | Variable: depends on PR queue | Variable: depends on approval speed | Automatic after N hours |
| Risk of accidental prod deploy | Low: branch protection blocks direct push | Low: approval gate is mandatory | Medium: automation can misfire |
| Hotfix path | Create feature branch from production, merge to production directly (bypasses main soak), then backport to main | Tag a hotfix SHA manually, bypass approval gate (emergency) | Disable auto-tag, manual gh release create |
| Audit trail of what's in prod | Git log of production branch + PR history | GitHub release + tag annotation + environment approval log | GitHub release + CI run log |
| Compatibility with release-please | Requires changing release-please's target branch (a target-branch setting, not --release-type) | Fully compatible: zero changes to release-please | release-please is bypassed for tag creation |
| Rollback story | git revert on production + push (or force-push with protection bypass) | Tag a known-good SHA (v1.2.3-hotfix) | Same as B |
| Solo-operator overhead | Moderate: one extra PR to open and merge per release | Low: one button click per release | Lowest: only needed if blocking a release |
| Workflow change scope | Large: new branch, branch protection, deploy trigger changes, CF Pages config | Small: add environment: production required reviewer in GH settings | Medium: new scheduled workflow, auto-tag logic |

5. Recommendation: Option B with explicit approval gate

Rationale:

The existing infrastructure is already in the right shape. Both surfaces already gate prod on a semver tag. release-please already owns the tag lifecycle. Adding a required-reviewer approval gate on the production GitHub Environment is a single settings change — no workflow file rewrites, no new branches, no migration risk.

The soak window is determined by operator behavior: Kristerpher merges the release PR (which tags), sees the staging deploy succeed, verifies staging, then clicks "Approve" in the Actions environment gate. The window can be 5 minutes or 5 hours — it is an explicit operator decision each time, not an automatic timer.

Option A adds meaningful complexity for a solo operator: an extra PR-open-and-merge ceremony per release, branch protection on a long-lived production branch, and a non-trivial release-please reconfiguration. GitLab Flow's production branch exists primarily to handle parallel release streams — Raxx has one stream.

Option C trades operator agency for automation. Pre-launch, the risk of an automated deployment to prod without a human checkpoint is not worth the overhead savings.

What changes under Option B:

  1. GitHub UI: Settings → Environments → production — add required reviewer (Kristerpher).
  2. Both deploy-staging and deploy-prod jobs in deploy.yml already reference environment: staging / environment: production. The prod environment already exists and must be configured with the reviewer gate.
  3. Similarly in deploy-heroku.yml, the deploy job uses environment: ${{ needs.resolve.outputs.environment }} — the approval gate fires automatically for any dispatch to production.
  4. No workflow YAML changes required.
  5. No release-please config changes required.
  6. CF Pages: no changes. The pages deploy --branch flag already differentiates staging from prod via tag vs main push.

6. Soak sequence

sequenceDiagram
    participant Dev as Feature Developer
    participant GH as GitHub Actions
    participant Staging as raxx-api-staging / raxx-app (staging)
    participant Kristerpher as Operator (Kristerpher)
    participant Prod as raxx-api-prod / raxx.app

    Dev->>GH: merge feature PR to main
    GH->>Staging: deploy.yml — staging job fires
    GH->>Staging: smoke test passes
    Note over Staging: soak window begins<br/>(implicit — no timer)
    Kristerpher->>Staging: verify staging manually or wait for smoke
    Kristerpher->>GH: merge release-please PR
    GH->>GH: release-please pushes tag v*.*.*
    GH->>Prod: deploy.yml — prod job fires, pauses at environment gate
    GH-->>Kristerpher: email / Slack: "Waiting for approval to deploy to production"
    Kristerpher->>GH: click Approve in Actions UI
    GH->>Prod: deploy proceeds
    GH->>Prod: smoke test (5 retries)
    Prod-->>Kristerpher: deploy summary comment on release commit

Hotfix path:

sequenceDiagram
    participant Kristerpher as Operator
    participant GH as GitHub Actions
    participant Prod as raxx-api-prod / raxx.app

    Note over Kristerpher: regression detected in prod
    Kristerpher->>GH: push hotfix branch, merge to main, merge release PR fast
    GH->>GH: release-please tags hotfix version (v1.2.4)
    GH->>Prod: prod deploy fires, pauses at approval gate
    Kristerpher->>GH: Approve immediately (no soak required for known-good hotfix)
    GH->>Prod: hotfix deployed
    Note over Kristerpher: optional: bypass gate via workflow_dispatch<br/>in deploy-heroku.yml (emergency path preserved)

7. Docs surface

Kristerpher flagged "docs" alongside raptor and antlers. The docs surface today is docs/ in-repo content, not a deployed site. If getraxx.com or docs.raxx.app is added later:

No workflow changes are needed today. When the docs site is scoped, add it to deploy.yml following the Antlers pattern.


8. Migration plan

Current state → Option B:

  1. In GitHub UI: navigate to Settings → Environments → production. Add Kristerpher as required reviewer. Save.
  2. Verify that the next tag deploy pauses for approval before proceeding. Confirm via a staging-only test deploy using workflow_dispatch targeting staging first.
  3. No YAML changes, no branch creation, no release-please config changes.
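Before waiting for the next tag deploy, the settings change in step 1 could also be checked from the environment's configuration. The sketch below assumes the payload shape returned by GitHub's REST "Get an environment" endpoint (a `protection_rules` list whose `required_reviewers` entries carry a `reviewers` array); `required_reviewers` is a hypothetical helper, the login is a placeholder, and the sample payload is trimmed and hand-written rather than captured from the Raxx repository.

```python
from typing import Any, Dict, List


def required_reviewers(environment: Dict[str, Any]) -> List[str]:
    """Return logins of user reviewers configured on an environment payload."""
    logins: List[str] = []
    for rule in environment.get("protection_rules", []):
        if rule.get("type") != "required_reviewers":
            continue  # skip wait timers, branch policies, and other rules
        for entry in rule.get("reviewers", []):
            reviewer = entry.get("reviewer") or {}
            if entry.get("type") == "User" and "login" in reviewer:
                logins.append(reviewer["login"])
    return logins


# Hand-written sample; the login below is a placeholder, not a real account.
sample = {
    "name": "production",
    "protection_rules": [
        {
            "type": "required_reviewers",
            "reviewers": [{"type": "User", "reviewer": {"login": "kristerpher"}}],
        }
    ],
}
print(required_reviewers(sample))
print(required_reviewers({"name": "production", "protection_rules": []}))
```

An empty result for the production environment would mean the gate is not configured and tag pushes still deploy without pausing. This complements, but does not replace, the live verification in step 2.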

Rollback from Option B:

Remove Kristerpher from required reviewers in the production environment settings. Deploys resume as automatic on tag push. Zero code change.


9. Security considerations


10. Open questions for Kristerpher

These must be decided before the implementation sub-card is claimed.

  1. Approval notification channel. GitHub sends an email when the approval gate pauses. Is that sufficient, or do you want a Slack DM to D0AJ7K184TV as well? (A small notify step in the workflow can post to Slack before the environment gate fires.)

  2. Minimum soak duration. Today the window is "as long as you take." Should there be a minimum — e.g., a 30-minute wait step before the prod jobs fire, ensuring staging has had at least N minutes of traffic? Or is the implicit window (however long between feature merge and release PR merge) sufficient?

  3. Docs surface. Is the docs soak model in scope now, or deferred until a docs site is actually provisioned?

  4. Approval self-review. GitHub blocks a required reviewer from approving a deployment they triggered only when the environment's "Prevent self-review" option is enabled; with that option off, a sole reviewer can approve their own run. If the option is enabled, the expectation is that for push events the workflow is attributed to the account behind release-please's tag push (the GitHub Actions bot), not to Kristerpher's user, so self-approval should still work. This should be confirmed with a dry-run once the gate is configured.

  5. Console. Kristerpher said console is out of scope for this design. Confirm: the manual git subtree push deploy path for console gets no soak gate, and that is intentional.


11. Rollout plan

Phase What When
Dark Read this doc, decide open questions Now
Configure Add required reviewer to production environment in GitHub settings After open questions resolved
Verify Trigger a canary release (even a patch bump) to confirm approval gate fires correctly First release after configuration
GA All subsequent prod deploys go through the approval gate Ongoing

No feature flags, no dark launch, no schema migration. This is a settings change plus human process.