ADR 0115 — develop → release → main branching model

Status: Proposed
Date: 2026-06-29 UTC
Deciders: Kristerpher (operator directive), software-architect
Scope: All CI/CD workflows (.github/workflows/), branch-protection rules, deploy triggers, CI gate scripts


Context

The monorepo currently uses a single long-lived branch (main) as the integration surface, staging trigger, and production trigger simultaneously. The consequences are:

  1. Staging and production share a trigger. Push to main auto-deploys Raptor, Console, Velvet, Queue, and Antlers Next to staging. For Antlers Next (prod), deploy-antlers-next-prod.yml also fires on push to main (path-filtered to frontend/raxx-next/**), meaning a single merge to main can deploy to prod with no explicit promotion step.

  2. Heavy scans run on every feature PR. OWASP ZAP (security-zap.yml), Queue Docker smoke (queue-docker-smoke.yml — 25–30 min cold build on ubuntu), and Playwright e2e (e2e-smoke.yml, antlers-next-ci.yml) trigger on every PR that touches their path filter. With multiple daily PRs this compounds rapidly.

  3. Double-trigger billing. antlers-next-ci.yml and ios-ci.yml carry both a bare push: paths: trigger (no branches: filter) and a pull_request: paths: trigger. GitHub Actions fires both events when a developer pushes to a feature branch with an open PR — running typecheck + build + vitest + Playwright twice, and iOS builds (macOS runners at 10× ubuntu cost) twice.

  4. No explicit staging-to-prod promotion gate. The only gate between staging and prod is a manual workflow_dispatch call (for Heroku services) or the Antlers prod push-on-main auto-trigger. There is no branch that semantically means "this has been validated on staging and is cleared for prod."

Operator directive (verbatim): "develop → release → main … Release points to staging and main points to production. The heaviest scans should only be when we merge to main, because we don't have to churn the big issues."


Invariants

These apply to the pipeline design and are non-negotiable:


Branch roles

develop — integration branch (default branch)

All feature work targets develop. Agents branch from develop. All PR-time validation (cheap gates) runs here. develop never deploys anywhere automatically; it is purely an integration surface.

Why develop is the default branch: GitHub's default branch governs the PR base for new PRs, the branch shown on the repo home page, and the branch that git clone checks out. Making develop the default branch means opening a PR via the GitHub UI defaults to targeting develop, which is what every contributor (human and agent) should do.

release — staging promotion branch

A PR from develop → release is the explicit staging promotion step. Merging develop → release auto-deploys all services to their staging environments. Heavy boundary scans (ZAP against staging, Queue Docker smoke, full Playwright e2e) run as required checks on the develop → release PR so they gate the merge, not the post-merge deploy.

main — production branch

A PR from release → main is the explicit production promotion step. Merging release → main auto-deploys all services to production. The heaviest scans (nightly-class security scan, terraform plan with cloud credentials, iOS full build) run as required checks on release → main PRs. main is a receive-only branch; no feature work ever targets it directly.


Data model / state machine

feature/* ─── PR ──► develop ─── PR ──► release ─── PR ──► main
                       ↓                    ↓                ↓
                  (no deploy)          → STAGING          → PROD
                  cheap CI             medium CI          heavy CI

Merge strategy per boundary

| Boundary | Strategy | Rationale |
|---|---|---|
| feature → develop | Squash merge | Current convention; keeps develop linear |
| develop → release | Merge commit | Preserves develop's commit graph on release |
| release → main | Merge commit | Release SHA is the prod deploy artifact; must be traceable |
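
The strategy table above can be sketched as a small lookup for anyone merging with the GitHub CLI. This is an illustration only: the function name merge_flag_for_base is hypothetical, and the sketch assumes merges are performed with gh pr merge, whose --squash and --merge flags select the merge method. The hotfix path is not covered by the table and is not handled here.

```shell
#!/bin/sh
# Hypothetical helper: map a PR's base branch to the gh merge flag
# that matches the per-boundary strategy table.
merge_flag_for_base() {
  case "$1" in
    develop)      echo "--squash" ;;  # feature -> develop
    release|main) echo "--merge"  ;;  # develop -> release, release -> main
    *)            echo "unknown base branch: $1" >&2; return 1 ;;
  esac
}

# Example (not executed here; <number> is a placeholder):
#   gh pr merge <number> "$(merge_flag_for_base release)"
```

Keeping the mapping in one place means a change to the table only needs to be mirrored in a single function.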

CI trigger matrix

This table states the new trigger for every workflow that currently references main or has a double-trigger pattern. Workflows not listed are schedule-only or workflow_dispatch-only and require no trigger changes.

Lightweight gates (run on every PR, branch-agnostic)

| Workflow | New on: trigger | Gating |
|---|---|---|
| ci-pr.yml | pull_request: types: [opened, synchronize, reopened] (no branches filter; unchanged) | smoke_suite, base_branch_lint, stale-branch-guard, flag_promotion_check, migration-gate, pii-scan, sqitch_plan_lint, asset-manifest-check |
| pii-scan.yml | pull_request + push: branches: [develop, release, main] | Hard gate; remains branch-agnostic |
| lint-cf-tokens.yml | pull_request + push: branches: [develop, release, main] | Add develop + release to push filter |
| lint-cf-access-headers.yml | same pattern | same |
| lint-cf-pages-deploy-uniqueness.yml | same pattern | same |
| lint-workflow-secret-names.yml | same pattern | same |
| terraform-validate.yml | pull_request: (branch-agnostic) + push: branches: [release, main] | fmt + validate stays cheap at PR; runs post-merge on release/main |

Heavy boundary gates (run only at develop → release or release → main)

These workflows REMOVE their current broad pull_request: or bare-push: trigger and replace it with a boundary-scoped trigger:

| Workflow | Old trigger | New trigger | Cost saved |
|---|---|---|---|
| security-zap.yml | pull_request: (all PRs touching Antlers/backend) | pull_request: branches: [release, main] + schedule: | ZAP fires once per promotion, not per feature PR |
| queue-docker-smoke.yml | pull_request: (all PRs touching queue/) | pull_request: branches: [release, main] + paths filter | Saves 25–30 min Docker build per feature PR |
| e2e-smoke.yml | pull_request: (all PRs touching auth/options/frontend) | pull_request: branches: [release, main] + schedule: | Full Playwright fires once per promotion |
| ios-ci.yml | push: paths: [ios/**] + pull_request: paths: [ios/**] | pull_request: paths: [ios/**] ONLY (remove push trigger) | Eliminates push/PR double-billing on macOS runners (10× cost) |
| antlers-next-ci.yml | push: paths: [frontend/raxx-next/**] + pull_request: paths: [...] | pull_request: paths: [...] ONLY (remove push trigger) | Eliminates double-billing: typecheck + build + Playwright run once, not twice |

CI on integration branches (post-merge validation)

| Workflow | New trigger | Purpose |
|---|---|---|
| ci.yml (rename: ci-develop.yml) | push: branches: [develop] | Post-merge smoke on integration branch (replaces push-to-main) |
| ci-boundary.yml (NEW, minimal) | push: branches: [release, main] | Confirm deploy-ready SHA is clean after boundary merge; thin wrapper re-running pii-scan + gitleaks |

The existing ci.yml currently triggers on push: branches: [main]. After renaming to signal its new home, it triggers on push: branches: [develop]. A lightweight new workflow ci-boundary.yml handles post-merge checks on release and main (the boundary merge CI is primarily handled by the PR checks; ci-boundary.yml is a thin safety net).

YAML trigger shape — concrete examples

Eliminating the Antlers double-trigger (antlers-next-ci.yml):

# BEFORE
on:
  push:
    paths:
      - "frontend/raxx-next/**"
      - ".github/workflows/antlers-next-ci.yml"
  pull_request:
    paths:
      - "frontend/raxx-next/**"
      - ".github/workflows/antlers-next-ci.yml"

# AFTER — remove push trigger entirely
on:
  pull_request:
    paths:
      - "frontend/raxx-next/**"
      - ".github/workflows/antlers-next-ci.yml"

Moving ZAP to the release/main boundary (security-zap.yml):

# BEFORE
on:
  pull_request:
    types: [opened, synchronize, reopened]
    paths:
      - 'frontend/raxx-next/**'
      - 'backend_v2/api/**'
  schedule:
    - cron: '7 9 * * 1'

# AFTER
on:
  pull_request:
    types: [opened, synchronize, reopened]
    branches: [release, main]          # only fires on develop→release or release→main PRs
    paths:
      - 'frontend/raxx-next/**'
      - 'backend_v2/api/**'
  schedule:
    - cron: '7 9 * * 1'               # weekly schedule unchanged
  workflow_dispatch:                   # unchanged

Moving Queue Docker smoke to boundary (queue-docker-smoke.yml):

# AFTER — add branches filter; do NOT remove pull_request trigger
on:
  pull_request:
    types: [opened, synchronize, reopened]
    branches: [release, main]          # gates the promotion PR
    paths:
      - 'queue/**'
      - '.github/workflows/queue-docker-smoke.yml'

Deploy re-pointing

The core semantic change: release branch push → staging; main branch push → prod.

Dynamic services (Heroku-deployed)

| Workflow | Current auto-trigger | New auto-trigger | Dispatch remains? |
|---|---|---|---|
| deploy-heroku.yml | push: branches: [main] → staging | push: branches: [release] → staging AND push: branches: [main] → prod | Yes |
| deploy-console.yml | push: branches: [main] → staging | push: branches: [release] → staging | Yes |
| deploy-queue.yml | push: branches: [main] → staging | push: branches: [release] → staging | Yes |
| deploy-velvet.yml | push: branches: [main] → staging | push: branches: [release] → staging | Yes |

For deploy-heroku.yml specifically, the current model has one push trigger (main → staging) and a dispatch path (main → prod). The new model adds a second push trigger (main → prod) while keeping dispatch as the emergency override. The job condition logic (environment: staging vs environment: production) moves from the input to the branch context:

# Conceptual — implementation detail for feature-developer
on:
  push:
    branches:
      - release   # → staging auto
      - main      # → prod auto
  workflow_dispatch:
    inputs:
      environment: ...   # kept for emergency overrides
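
The move of the environment decision from the dispatch input to the branch context could look roughly like the following step-level helper. It is a sketch under assumptions: the function name resolve_deploy_env is hypothetical, and the optional second argument stands in for the workflow_dispatch environment input whose exact shape is left to the implementer. GITHUB_REF_NAME is the short ref name that GitHub Actions exposes to every step.

```shell
#!/bin/sh
# Hypothetical helper: pick the deploy environment.
#   $1 = short branch name (e.g. "$GITHUB_REF_NAME")
#   $2 = optional override, standing in for the dispatch input
resolve_deploy_env() {
  ref_name="$1"
  override="${2:-}"
  if [ -n "$override" ]; then
    echo "$override"          # emergency override wins
    return 0
  fi
  case "$ref_name" in
    release) echo "staging" ;;     # push to release -> staging
    main)    echo "production" ;;  # push to main -> prod
    *)       echo "no auto-deploy for ref: $ref_name" >&2; return 1 ;;
  esac
}

# Example inside a workflow step (not executed here):
#   env_name="$(resolve_deploy_env "$GITHUB_REF_NAME")"
```

Failing on any other branch keeps develop from ever deploying by accident, which matches its role as a pure integration surface.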

Antlers Next (Cloudflare Pages)

Critical note: CF Pages uses the term "production branch" (--branch=main in the wrangler deploy command) to mean the slot that serves the custom domain. This is a CF Pages API concept entirely distinct from the git branch named main. The --branch=main flag in the CF Pages deploy step does not change and should not be confused with the git main branch.

| Workflow | Current git trigger | New git trigger | CF Pages --branch flag |
|---|---|---|---|
| deploy-antlers-next-staging.yml | push: branches: [main] | push: branches: [release] | --branch=main (unchanged; CF Pages production slot on staging project) |
| deploy-antlers-next-prod.yml | push: branches: [main] | push: branches: [main] | --branch=main (unchanged) |

This resolves the current anomaly where Antlers prod deploys on every merge to main. After the migration, prod deploys only when release → main merges (an explicit promotion step), not on every feature landing.
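
To make the git-branch versus CF Pages --branch distinction concrete, here is a hedged sketch that only prints the deploy command. The function name, both project names, and the ./out build directory are placeholders invented for illustration; only the wrangler pages deploy subcommand and its --project-name and --branch options are taken from the Wrangler CLI.

```shell
#!/bin/sh
# Hypothetical helper: print the wrangler command for a given git branch.
# Project names and the build directory are placeholders, not repo values.
cf_pages_deploy_cmd() {
  case "$1" in
    release) project="antlers-next-staging" ;;  # hypothetical staging project
    main)    project="antlers-next-prod" ;;     # hypothetical prod project
    *)       echo "no Pages deploy for git branch: $1" >&2; return 1 ;;
  esac
  # --branch=main selects the CF Pages production slot in BOTH cases;
  # it is unrelated to which git branch triggered the workflow.
  echo "wrangler pages deploy ./out --project-name=$project --branch=main"
}
```

The git branch changes which project receives the build; the --branch value stays constant.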

Static-site / docs surfaces

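The following surfaces have no staging counterpart and deploy to production directly: getraxx, customer-docs, internal-docs, flag-docs, status-page, mockups, support, and antlers-cutover (the set named in Open questions, item 1). They remain on push: branches: [main] with no change.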

These sites are content-only or docs; they do not touch order flow or credentials. Keeping them on main is acceptable and avoids the overhead of a staging deploy cycle for docs content.

Cloudflare Workers / WAF


Branch protection specification

develop (new — required checks)

| Check | Required | Notes |
|---|---|---|
| smoke_suite (ci-pr.yml) | Yes (if code changed) | Light; < 5 min |
| base_branch_lint | Yes | Must compare against origin/develop (see Migration §Scripts) |
| stale-branch-guard | Yes | Must compare against origin/develop |
| migration-gate | Yes (if migrations changed) | Unchanged |
| pii-scan | Yes | Unchanged |
| flag_promotion_check | Yes (if flags changed) | Unchanged |
| gitleaks | Yes | Unchanged |

Required reviewers: 0 (ADR-0020 reviewer-gate waiver currently in effect). Direct pushes: disallowed. Force-push: disallowed.

release (new)

| Check | Required | Notes |
|---|---|---|
| All develop checks | Yes | ci-pr.yml runs branch-agnostic |
| security-zap | Yes | Antlers/backend path filter |
| queue-docker-smoke | Yes (if queue/ changed) | Path-filtered |
| e2e-smoke | Yes | Full Playwright |
| antlers-next-ci (full: typecheck + build + Playwright) | Yes (if Antlers changed) | PR trigger |

Required reviewers: 1 (Kristerpher). This is the staging promotion gate. Direct pushes: disallowed. Only source: PRs from develop.

main (update existing)

| Check | Required | Notes |
|---|---|---|
| All release checks | Yes | ci-pr.yml branch-agnostic |
| ios-ci | Yes (if ios/ changed) | Full iOS build at prod boundary |
| terraform-validate (+ plan with cloud creds) | Yes (if terraform/ changed) | Plan requires credentials |
| nightly-security-scan (manual dispatch mode) | Advisory | Run before major releases |

Required reviewers: 1 (Kristerpher). This is the production promotion gate. Direct pushes: disallowed. Only source: PRs from release (and emergency hotfix path described below).

Emergency hotfix path

When a production incident requires a fix bypassing the develop→release→main cycle:

  1. Branch from main HEAD (not develop).
  2. Open PR targeting main directly with label hotfix.
  3. Required reviewer approves. CI runs in reduced mode: merge is allowed once the lightweight-gate subset is green; heavy scans are advisory.
  4. After merge to main, cherry-pick the fix commit to release and then develop to keep branches in sync. This is a manual step — the sre-agent runbook (to be filed as sub-card) must document this procedure.
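
Step 4 above could be sketched as follows, pending the sre-agent runbook. Several things here are assumptions rather than decisions from this ADR: the fix is assumed to have landed on main as a single non-merge commit, the hotfix-backport/<target> branch naming is hypothetical, and the back-port is assumed to travel through PRs because direct pushes to release and develop are disallowed. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Hypothetical helper: back-port one hotfix commit to release, then develop.
#   $1 = SHA of the fix commit on main
backport_hotfix() {
  fix_sha="$1"
  for target in release develop; do
    branch="hotfix-backport/${target}"               # hypothetical naming
    run git fetch origin "$target" || return 1
    run git switch -c "$branch" "origin/${target}" || return 1
    run git cherry-pick -x "$fix_sha" || return 1    # -x records the source SHA
    run git push origin "$branch" || return 1
    run gh pr create --base "$target" --head "$branch" \
      --title "Back-port hotfix ${fix_sha} to ${target}" \
      --body "Cherry-pick of ${fix_sha} from main." || return 1
  done
}
```

A dry run such as DRY_RUN=1 backport_hotfix <sha> lists the ten commands in order, which may be a convenient way to review the procedure during an incident before executing it.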

Scripts that require updates

Two CI scripts hard-code origin/main and must be parameterized on the PR base branch:

scripts/ci/check_pr_base_clean.sh

Currently: PR_COMMITS=$(git log --format="%H" origin/main..HEAD 2>/dev/null)

New: accept the base branch via environment variable, defaulting to develop:

BASE_BRANCH="${GITHUB_BASE_REF:-develop}"
git fetch origin "${BASE_BRANCH}" --quiet
PR_COMMITS=$(git log --format="%H" "origin/${BASE_BRANCH}..HEAD" 2>/dev/null)

In ci-pr.yml's base_branch_lint job, GitHub Actions sets the GITHUB_BASE_REF environment variable automatically for pull_request events; it is empty for other events, which is why the script falls back to develop. No workflow change is needed; only the script changes.

scripts/ci/check_stale_branch.sh

Currently: hard-codes origin/main for fork-point and commit-count checks.

New:

BASE_BRANCH="${GITHUB_BASE_REF:-develop}"
FORK_SHA=$(git merge-base "origin/${BASE_BRANCH}" HEAD)
COMMITS_BEHIND=$(git rev-list --count "${FORK_SHA}..origin/${BASE_BRANCH}")

The ci-pr.yml step that fetches origin/main before running the stale check must also be updated:

- name: Fetch latest integration branch
  run: git fetch origin ${{ github.base_ref }} --quiet

_shared-conventions.md — branch base rule

The existing rule says: "Every agent-dispatched feature branch MUST be created directly from origin/main." After migration this becomes origin/develop.

The base_branch_lint CI job reference in that doc ("CI job base_branch_lint fails any PR where a commit in origin/main..HEAD also appears on another remote branch") must be updated to say origin/<base_branch> to reflect the parameterized script.

check_pr_base_clean.sh — exclusion of develop, release, main

The script excludes the current PR branch from its "commit appears on other branches" check. After the migration, it should also exclude the three integration branches (develop, release, main) from the "other branches" scan to avoid false positives when a PR branch forks from a point that is also the tip of one of the integration branches.
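
One way the exclusion described above might be expressed is a filter over the list of remote branches that contain a commit. This is a sketch, not the script's actual structure: the function name filter_other_branches is hypothetical, and it assumes the candidate branches arrive one per line in origin/<name> form.

```shell
#!/bin/sh
# Hypothetical helper: drop the PR's own branch and the three integration
# branches from a list of remote branches read from stdin (one per line).
#   $1 = PR head branch name without the "origin/" prefix
filter_other_branches() {
  pr_branch="$1"
  # -v inverts the match, -x matches whole lines, -F treats names literally.
  # grep exits 1 when nothing remains; that is not an error here.
  grep -v -x -F \
    -e "origin/${pr_branch}" \
    -e "origin/develop" \
    -e "origin/release" \
    -e "origin/main" || true
}

# Example (not executed here; $sha is a placeholder):
#   git branch -r --contains "$sha" --format='%(refname:short)' \
#     | filter_other_branches "$GITHUB_HEAD_REF"
```

Any branch name left in the output would be a genuine "other branch" hit; an empty result means the commit is clean.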


Migration plan (ordered, SRE-executable)

Operator sign-off is required before executing steps marked [GATE].

Phase 0 — pre-flight (no risk)

0a. Merge all in-flight PRs to main before any trigger changes. Any PR still open against main after step 1 must be retargeted manually.
0b. Confirm main is green on all CI checks.
0c. Snapshot current branch-protection settings to docs/ops/runbooks/branch-protection-snapshot-2026-NNNN.md for rollback reference.

Phase 1 — create branches [GATE: operator sign-off]

1a. Create develop from main HEAD:
    git fetch origin main
    git push origin origin/main:refs/heads/develop
1b. Create release from main HEAD:
    git push origin origin/main:refs/heads/release

At this point all three branches point to the same SHA. No behavior has changed.
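
The claim that all three branches point to the same SHA after Phase 1 can be checked mechanically. The following is a minimal sketch; the function name all_refs_same_sha is hypothetical, and it expects input in the tab-separated "<sha> <ref>" format that git ls-remote prints.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if stdin lists exactly $1 refs
# and every one of them has the same SHA.
all_refs_same_sha() {
  awk -v want="$1" '
    NF { n++; seen[$1] = 1 }          # count refs, remember distinct SHAs
    END {
      u = 0
      for (s in seen) u++
      exit !(n == want && u == 1)     # exit 0 only when both conditions hold
    }
  '
}

# Example (not executed here):
#   git ls-remote origin refs/heads/main refs/heads/develop refs/heads/release \
#     | all_refs_same_sha 3
```

Checking the ref count as well as SHA equality catches the case where one of the two pushes silently did not happen.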

Phase 2 — retarget in-flight PRs [GATE: operator sign-off]

2a. List all open PRs currently targeting main:
    gh pr list --base main --json number,title,headRefName
2b. For each open PR, change base to develop:
    gh pr edit <number> --base develop

Changing a PR's base emits a pull_request edited event, which is not among the types listed for ci-pr.yml above, so CI may not re-run against the new base until the next push to the PR branch. Authors should run git fetch origin develop && git rebase origin/develop locally to keep their branches current.
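
Steps 2a and 2b can be combined into a loop. The sketch below is illustrative: the function name retarget_prs is hypothetical, and DRY_RUN=1 prints the commands instead of running them. The same function with main as its argument would cover the Phase 2 rollback.

```shell
#!/bin/sh
# Hypothetical helper: retarget every PR number read from stdin.
#   $1 = new base branch
retarget_prs() {
  new_base="$1"
  while read -r number; do
    [ -n "$number" ] || continue            # skip blank lines
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "gh pr edit $number --base $new_base"
    else
      # </dev/null keeps gh from consuming the loop's stdin
      gh pr edit "$number" --base "$new_base" </dev/null || return 1
    fi
  done
}

# Example (not executed here):
#   gh pr list --base main --json number --jq '.[].number' | retarget_prs develop
```

Note that gh pr list returns at most 30 items unless --limit is raised, so a larger limit may be needed if many PRs are open.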

Phase 3 — update scripts and conventions

3a. Update scripts/ci/check_pr_base_clean.sh: parameterize on $GITHUB_BASE_REF.
3b. Update scripts/ci/check_stale_branch.sh: parameterize on $GITHUB_BASE_REF.
3c. Update ci-pr.yml stale-branch step fetch from origin main to origin ${{ github.base_ref }}.
3d. Update .claude/agents/_shared-conventions.md: "branch from origin/develop".
3e. Update memory entry feedback_pr_base_main.md: "develop is the new main for agents".

These changes ship as a single PR targeting develop (the first PR to target the new default branch). This PR is also the validation that the retargeted ci-pr.yml machinery works.

Phase 4 — deploy workflow re-pointing

4a. For each dynamic service workflow listed in the Deploy re-pointing section, change push: branches: [main] → push: branches: [release] (staging trigger) and add a push: branches: [main] → prod trigger where applicable.
4b. For deploy-antlers-next-staging.yml: change git trigger to push: branches: [release]; leave the CF Pages --branch=main flag unchanged.
4c. For deploy-antlers-next-prod.yml: trigger stays on push: branches: [main]; no change.
4d. Ship as a single PR targeting develop, then fast-track through release → main using the new flow (first use of the new promotion path).

Phase 5 — CI trigger matrix changes

5a. Remove bare push: trigger from antlers-next-ci.yml (keep pull_request: only).
5b. Remove bare push: paths: [ios/**] trigger from ios-ci.yml (keep pull_request: only).
5c. Add branches: [release, main] filter to security-zap.yml pull_request: trigger.
5d. Add branches: [release, main] filter to queue-docker-smoke.yml pull_request: trigger.
5e. Add branches: [release, main] filter to e2e-smoke.yml pull_request: trigger.
5f. Update ci.yml trigger from push: branches: [main] to push: branches: [develop]. (The same jobs, now running on the integration branch.)
5g. Add push: branches: [develop, release, main] to lint-*.yml and pii-scan.yml push filters.
5h. Create minimal ci-boundary.yml (pii-scan + gitleaks) on push: branches: [release, main].

Ship as a single PR targeting develop.

Phase 6 — branch protection [GATE: operator sign-off]

6a. Set protection on develop per spec above (Kristerpher applies via GitHub Settings).
6b. Set protection on release per spec above (1 required reviewer).
6c. Update main protection to add 1 required reviewer on PR (was: push-only).
6d. Verify required status checks match the new workflow names (the CI job names must match what branch protection lists).
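
The verification in step 6d could be approximated by comparing two lists of check names. This is a sketch under assumptions: the function name missing_checks and the file names are hypothetical, and the expected list would be written by hand from the branch protection specification.

```shell
#!/bin/sh
# Hypothetical helper: print expected check names that are absent
# from the actual set. Both arguments are files, one name per line.
missing_checks() {
  expected_file="$1"
  actual_file="$2"
  # -v inverts, -x matches whole lines, -F treats names literally,
  # -f reads the names to exclude from a file.
  grep -v -x -F -f "$actual_file" "$expected_file" || true
}

# Example (not executed here; file names are placeholders):
#   gh api "repos/{owner}/{repo}/branches/develop/protection/required_status_checks" \
#     --jq '.contexts[]' > actual.txt
#   missing_checks expected-develop.txt actual.txt
```

Empty output means every expected check is configured; any name printed is a check the spec requires but branch protection does not list.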

Phase 7 — change default branch [GATE: operator sign-off; IRREVERSIBLE without operator action]

7a. In GitHub Settings → Branches, change default branch from main to develop.
7b. Update any repository links in docs that reference the default branch implicitly (e.g., github.com/raxx-app/TradeMasterAPI/blob/main/... in runbooks).
7c. Announce to all active developers / agent configurations that develop is now the base for all new work.


Rollback

The migration is designed so each phase is independently reversible before the next phase begins.

| Phase | Rollback |
|---|---|
| Phase 1 | git push origin --delete develop release (no behaviors changed yet) |
| Phase 2 | Re-run gh pr edit <number> --base main for each retargeted PR |
| Phase 3 | Revert the PR that changed the scripts (standard git revert) |
| Phase 4 | Revert the deploy workflow PR (standard git revert); staging goes back to being triggered by main push, prod stays manual |
| Phase 5 | Revert the CI trigger PR |
| Phase 6 | Remove branch protection rules (Settings → Branches, delete rules) |
| Phase 7 (irreversible without operator action) | Change default branch back to main in Settings (requires operator UI action; cannot be done via API without admin PAT) |

If rolling back after Phase 7, all in-flight PRs targeting develop must be retargeted to main again (same procedure as Phase 2, in reverse).


Security considerations


Security / GDPR checklist


Open questions

  1. Static-site prod-on-main: getraxx, customer-docs, internal-docs, flag-docs, status-page, mockups, support, antlers-cutover all deploy to prod on push to main with no staging counterpart. Should any of these gain a staging counterpart (and thus trigger on release instead)? Waiting on operator decision before sub-card for static sites is filed.

  2. Merge strategy for develop → release: Merge commit is recommended (to preserve develop's squash history on release). Confirm with operator that linear history will not be required on the release branch.

  3. Release cadence: Is the develop → release promotion daily, on demand, or gated on a time-based soak? The ADR designs the mechanism; the cadence is an operator process decision.

  4. Hotfix path reviewer requirement: The spec says the required reviewer on main applies to all PRs including hotfixes. A hotfix during an incident may not allow time for review. Should hotfixes have a bypass policy (label hotfix + admin override)? Needs operator decision before branch-protection sub-card is filed.

  5. check_pr_base_clean.sh behavior on develop → release PRs: On a develop → release PR, the commits in origin/release..HEAD (the range the parameterized script inspects) include squash commits that originated on feature branches. The script checks for commits that "also appear on other remote branches." Once feature branches are deleted post-merge, this should be clean. If feature branches are NOT deleted, the script may flag them as contamination. Confirm that GitHub is configured to auto-delete head branches on merge.


Alternatives considered

Keep single-main, add manual dispatch gate for prod (current direction, hardened)

The existing branch-promotion-strategy.md (Option B) proposes hardening the current model with an Environment approval gate on dispatch. Rejected for this design because it does not address the CI cost problem — heavy scans still run on every feature PR.

develop + main (skip release)

Two-branch model: develop = staging trigger, main = prod trigger. Simpler, but loses the ability to accumulate multiple develop merges before a staging push, and conflates "integration" with "deployed to staging," which can create noise when staging deploys are frequent.

Trunk-based with feature flags

All work on main, feature flags gate exposure. Rejected because the monorepo's deploy complexity (8+ deploy targets, some with no staging) makes trunk-based prod deploys too risky pre-launch. The existing paper-first gate is a runtime invariant, not a deploy-pipeline gate.


Revisit when