ADR 0115 — develop → release → main branching model

Status: Proposed
Date: 2026-06-29 UTC
Deciders: Kristerpher (operator directive), software-architect
Scope: All CI/CD workflows (.github/workflows/), branch-protection rules, deploy triggers, CI gate scripts

Context

The monorepo currently uses a single long-lived branch (main) as the integration surface, staging trigger, and production trigger simultaneously. The consequences are:

Staging and production share a trigger. Push to main auto-deploys Raptor, Console, Velvet, Queue, and Antlers Next to staging. For Antlers Next (prod), deploy-antlers-next-prod.yml also fires on push to main (path-filtered to frontend/raxx-next/**), meaning a single merge to main can deploy to prod with no explicit promotion step.
Heavy scans run on every feature PR. OWASP ZAP (security-zap.yml), Queue Docker smoke (queue-docker-smoke.yml — 25–30 min cold build on ubuntu), and Playwright e2e (e2e-smoke.yml, antlers-next-ci.yml) trigger on every PR that touches their path filter. With multiple daily PRs this compounds rapidly.
Double-trigger billing. antlers-next-ci.yml and ios-ci.yml carry both a bare push: paths: trigger (no branches: filter) and a pull_request: paths: trigger. GitHub Actions fires both events when a developer pushes to a feature branch with an open PR — running typecheck + build + vitest + Playwright twice, and iOS builds (macOS runners at 10× ubuntu cost) twice.
No explicit staging-to-prod promotion gate. The only gate between staging and prod is a manual workflow_dispatch call (for Heroku services) or the Antlers prod push-on-main auto-trigger. There is no branch that semantically means "this has been validated on staging and is cleared for prod."

Operator directive (verbatim): "develop → release → main … Release points to staging and main points to production. The heaviest scans should only be when we merge to main, because we don't have to churn the big issues."

Invariants

These apply to the pipeline design and are non-negotiable:

Audit trail for every state change affecting money, permissions, or data access. Every prod deploy must be traceable to a specific SHA, actor identity, and timestamp. The GitHub Environment record plus the existing console audit callback satisfy this; the new model must not route around the environment gate.
No stored credentials. Deploy secrets (HEROKU_API_KEY, CF Pages tokens) remain in GitHub Environment secrets. No new credential storage surface is introduced.
Kill-switch must remain available. The existing workflow_dispatch path on every deploy workflow (manual deploy of any ref to any environment) is preserved as the emergency hotfix path.
Paper-first gating is a runtime invariant, not a deploy-pipeline gate. No change here.
GDPR / PII. The PII scan (pii-scan.yml) must remain a hard gate on all integration branches, not relaxed to main-only.

Branch roles

`develop` — integration branch (default branch)

All feature work targets develop. Agents branch from develop. All PR-time validation (cheap gates) runs here. develop never deploys anywhere automatically; it is purely an integration surface.

Why develop is the default branch: GitHub's default branch governs the PR base for new PRs, the branch shown on the repo home page, and the branch checked out by git clone. Making develop the default branch means that opening a PR via the GitHub UI defaults to targeting develop, which is what every contributor (human and agent) should do.

`release` — staging promotion branch

A PR from develop → release is the explicit staging promotion step. Merging develop → release auto-deploys all services to their staging environments. Heavy boundary scans (ZAP against staging, Queue Docker smoke, full Playwright e2e) run as required checks on the develop → release PR so they gate the merge, not the post-merge deploy.

`main` — production branch

A PR from release → main is the explicit production promotion step. Merging release → main auto-deploys all services to production. The heaviest scans (nightly-class security scan, terraform plan with cloud credentials, iOS full build) run as required checks on release → main PRs. main is a receive-only branch; no feature work ever targets it directly.

Data model / state machine

feature/* ─── PR ──► develop ─── PR ──► release ─── PR ──► main
                       ↓                    ↓                ↓
                  (no deploy)          → STAGING          → PROD
                  cheap CI             medium CI          heavy CI

Merge strategy per boundary

Boundary	Strategy	Rationale
feature → develop	Squash merge	Current convention; keeps develop linear
develop → release	Merge commit	Preserves develop's commit graph on release
release → main	Merge commit	Release SHA is the prod deploy artifact; must be traceable

CI trigger matrix

This table states the new trigger for every workflow that currently references main or has a double-trigger pattern. Workflows not listed are schedule-only or workflow_dispatch-only and require no trigger changes.

Lightweight gates (run on every PR, branch-agnostic)

Workflow	New `on:` trigger	Gating
ci-pr.yml	`pull_request: types: [opened, synchronize, reopened]` (no branches filter — unchanged)	smoke_suite, base_branch_lint, stale-branch-guard, flag_promotion_check, migration-gate, pii-scan, sqitch_plan_lint, asset-manifest-check
pii-scan.yml	`pull_request` + `push: branches: [develop, release, main]`	Hard gate; remains branch-agnostic
lint-cf-tokens.yml	`pull_request` + `push: branches: [develop, release, main]`	Add develop + release to push filter
lint-cf-access-headers.yml	same pattern	same
lint-cf-pages-deploy-uniqueness.yml	same pattern	same
lint-workflow-secret-names.yml	same pattern	same
terraform-validate.yml	`pull_request:` (branch-agnostic) + `push: branches: [release, main]`	fmt + validate stays cheap at PR; runs post-merge on release/main

Heavy boundary gates (run only at develop → release or release → main)

These workflows either narrow their current broad pull_request: trigger to a boundary-scoped one (branches: [release, main]) or drop a bare push: trigger that duplicates the pull_request: run:

Workflow	Old trigger	New trigger	Cost saved
security-zap.yml	`pull_request:` (all PRs touching Antlers/backend)	`pull_request: branches: [release, main]` + `schedule:`	ZAP fires once per promotion, not per feature PR
queue-docker-smoke.yml	`pull_request:` (all PRs touching queue/)	`pull_request: branches: [release, main]` + paths filter	Saves 25–30 min Docker build per feature PR
e2e-smoke.yml	`pull_request:` (all PRs touching auth/options/frontend)	`pull_request: branches: [release, main]` + `schedule:`	Full Playwright fires once per promotion
ios-ci.yml	`push: paths: [ios/]` + `pull_request: paths: [ios/]`	`pull_request: paths: [ios/**]` ONLY (remove push trigger)	Eliminates push/PR double-billing on macOS runners (10× cost)
antlers-next-ci.yml	`push: paths: [frontend/raxx-next/**]` + `pull_request: paths: [...]`	`pull_request: paths: [...]` ONLY (remove push trigger)	Eliminates double-billing: typecheck + build + Playwright run once, not twice

CI on integration branches (post-merge validation)

Workflow	New trigger	Purpose
ci.yml (rename: ci-develop.yml)	`push: branches: [develop]`	Post-merge smoke on integration branch (replaces push-to-main)
ci-boundary.yml (NEW — minimal)	`push: branches: [release, main]`	Confirm deploy-ready SHA is clean after boundary merge; thin wrapper re-running pii-scan + gitleaks

The existing ci.yml currently triggers on push: branches: [main]. After the rename, which signals its new home, it triggers on push: branches: [develop]. A new lightweight workflow, ci-boundary.yml, handles post-merge checks on release and main. Boundary merges are gated primarily by the PR checks; ci-boundary.yml is a thin safety net.

YAML trigger shape — concrete examples

Eliminating the Antlers double-trigger (antlers-next-ci.yml):

# BEFORE
on:
  push:
    paths:
      - "frontend/raxx-next/**"
      - ".github/workflows/antlers-next-ci.yml"
  pull_request:
    paths:
      - "frontend/raxx-next/**"
      - ".github/workflows/antlers-next-ci.yml"

# AFTER — remove push trigger entirely
on:
  pull_request:
    paths:
      - "frontend/raxx-next/**"
      - ".github/workflows/antlers-next-ci.yml"

Moving ZAP to the release/main boundary (security-zap.yml):

# BEFORE
on:
  pull_request:
    types: [opened, synchronize, reopened]
    paths:
      - 'frontend/raxx-next/**'
      - 'backend_v2/api/**'
  schedule:
    - cron: '7 9 * * 1'
  workflow_dispatch:

# AFTER
on:
  pull_request:
    types: [opened, synchronize, reopened]
    branches: [release, main]          # only fires on develop→release or release→main PRs
    paths:
      - 'frontend/raxx-next/**'
      - 'backend_v2/api/**'
  schedule:
    - cron: '7 9 * * 1'               # weekly schedule unchanged
  workflow_dispatch:                   # unchanged

Moving Queue Docker smoke to boundary (queue-docker-smoke.yml):

# AFTER — add branches filter; do NOT remove pull_request trigger
on:
  pull_request:
    types: [opened, synchronize, reopened]
    branches: [release, main]          # gates the promotion PR
    paths:
      - 'queue/**'
      - '.github/workflows/queue-docker-smoke.yml'

Deploy re-pointing

The core semantic change: release branch push → staging; main branch push → prod.

Dynamic services (Heroku-deployed)

Workflow	Current auto-trigger	New auto-trigger	Dispatch remains?
deploy-heroku.yml	`push: branches: [main]` → staging	`push: branches: [release]` → staging AND `push: branches: [main]` → prod	Yes
deploy-console.yml	`push: branches: [main]` → staging	`push: branches: [release]` → staging	Yes
deploy-queue.yml	`push: branches: [main]` → staging	`push: branches: [release]` → staging	Yes
deploy-velvet.yml	`push: branches: [main]` → staging	`push: branches: [release]` → staging	Yes

For deploy-heroku.yml specifically, the current model has one push trigger (main → staging) and a dispatch path (main → prod). The new model adds a second push trigger (main → prod) while keeping dispatch as the emergency override. The job condition logic (environment: staging vs environment: production) moves from the input to the branch context:

# Conceptual — implementation detail for feature-developer
on:
  push:
    branches:
      - release   # → staging auto
      - main      # → prod auto
  workflow_dispatch:
    inputs:
      environment: ...   # kept for emergency overrides

Antlers Next (Cloudflare Pages)

Critical note: CF Pages uses the term "production branch" (--branch=main in the wrangler deploy command) to mean the slot that serves the custom domain. This is a CF Pages API concept entirely distinct from the git branch named main. The --branch=main flag in the CF Pages deploy step does not change and should not be confused with the git main branch.

Workflow	Current git trigger	New git trigger	CF Pages `--branch` flag
deploy-antlers-next-staging.yml	`push: branches: [main]`	`push: branches: [release]`	`--branch=main` (unchanged — CF Pages production slot on staging project)
deploy-antlers-next-prod.yml	`push: branches: [main]`	`push: branches: [main]`	`--branch=main` (unchanged)

This resolves the current anomaly where Antlers prod deploys on every merge to main. After the migration, prod deploys only when release → main merges (an explicit promotion step), not on every feature landing.

Static-site / docs surfaces

The following surfaces have no staging counterpart and deploy to production directly. They remain on push: branches: [main] with no change:

deploy-getraxx.yml, deploy-customer-docs.yml, deploy-internal-docs.yml, deploy-flag-docs.yml, deploy-status-page.yml, deploy-support.yml, deploy-mockups.yml, deploy-antlers-cutover.yml

These sites are content-only or docs; they do not touch order flow or credentials. Keeping them on main is acceptable and avoids the overhead of a staging deploy cycle for docs content.

Cloudflare Workers / WAF

deploy-status-worker.yml, deploy-worker-waf-log-shipper.yml: re-point from push: branches: [main] to push: branches: [release] for staging deploy, and add push: branches: [main] for prod. Same pattern as Heroku services.

Branch protection specification

`develop` (new — required checks)

Check	Required	Notes
smoke_suite (ci-pr.yml)	Yes (if code changed)	Light; < 5 min
base_branch_lint	Yes	Must compare against `origin/develop` (see Migration §Scripts)
stale-branch-guard	Yes	Must compare against `origin/develop`
migration-gate	Yes (if migrations changed)	Unchanged
pii-scan	Yes	Unchanged
flag_promotion_check	Yes (if flags changed)	Unchanged
gitleaks	Yes	Unchanged

Required reviewers: 0 (ADR-0020 reviewer-gate waiver currently in effect). Direct pushes: disallowed. Force-push: disallowed.

`release` (new)

Check	Required	Notes
All develop checks	Yes	ci-pr.yml runs branch-agnostic
security-zap	Yes	Antlers/backend path filter
queue-docker-smoke	Yes (if queue/ changed)	Path-filtered
e2e-smoke	Yes	Full Playwright
antlers-next-ci (full: typecheck + build + Playwright)	Yes (if Antlers changed)	PR trigger

Required reviewers: 1 (Kristerpher). This is the staging promotion gate. Direct pushes: disallowed. Only source: PRs from develop.

`main` (update existing)

Check	Required	Notes
All release checks	Yes	ci-pr.yml branch-agnostic
ios-ci	Yes (if ios/ changed)	Full iOS build at prod boundary
terraform-validate (+ plan with cloud creds)	Yes (if terraform/ changed)	Plan requires credentials
nightly-security-scan (manual dispatch mode)	Advisory	Run before major releases

Required reviewers: 1 (Kristerpher). This is the production promotion gate. Direct pushes: disallowed. Only source: PRs from release (and emergency hotfix path described below).
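If the "only source" rules for release and main are enforced by a CI check rather than by repository settings, the check could look roughly like the sketch below. The function name is_allowed_pr is hypothetical, and treating the hotfix label as the only exception on main is an assumption based on the emergency path this ADR describes.

```shell
# Hypothetical check: is a PR from HEAD into BASE allowed by the branch roles?
# Arguments: $1 = head branch, $2 = base branch, $3 = PR label (optional).
is_allowed_pr() {
  head="$1"; base="$2"; label="${3:-}"
  case "$base" in
    # All feature work targets develop, so any head branch is accepted here.
    develop) return 0 ;;
    # Staging promotion: only develop may be merged into release.
    release) [ "$head" = "develop" ] ;;
    # Production promotion: release, or a PR carrying the hotfix label (assumed).
    main)    [ "$head" = "release" ] || [ "$label" = "hotfix" ] ;;
    # Any other base branch is outside this model.
    *)       return 1 ;;
  esac
}
```

A job could call is_allowed_pr "$GITHUB_HEAD_REF" "$GITHUB_BASE_REF" and fail the PR on a non-zero exit status. How the label reaches the script is left open here.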

Emergency hotfix path

When a production incident requires a fix bypassing the develop→release→main cycle:

Branch from main HEAD (not develop).
Open PR targeting main directly with label hotfix.
Required reviewer approves. CI runs in reduced mode: merge is allowed once the light-gate subset is green, and the heavy scans are advisory only.
After merge to main, cherry-pick the fix commit to release and then develop to keep branches in sync. This is a manual step — the sre-agent runbook (to be filed as sub-card) must document this procedure.
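The back-port in the last step can be sketched as a dry-run helper that prints the git commands instead of running them. This is an illustration only: the function name hotfix_backport_plan and the backport/ branch naming are hypothetical, and it assumes the hotfix landed on main as a single non-merge commit (a merge commit would need git cherry-pick -m 1). How the resulting branches are merged into release and develop is not specified by this ADR.

```shell
# Hypothetical dry-run: print the commands that back-port a hotfix commit.
# Argument: $1 = SHA of the fix commit on main.
hotfix_backport_plan() {
  sha="$1"
  [ -n "$sha" ] || { echo "usage: hotfix_backport_plan <sha>" >&2; return 1; }
  # Order follows the hotfix path: release first, then develop.
  for branch in release develop; do
    printf 'git fetch origin %s\n' "$branch"
    printf 'git switch -c backport/%s-%s origin/%s\n' "$branch" "$sha" "$branch"
    # -x records the original commit SHA in the new commit message.
    printf 'git cherry-pick -x %s\n' "$sha"
  done
}
```

Printing the plan first lets the operator review it during an incident before piping it to a shell.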

Scripts that require updates

Two CI scripts hard-code origin/main and must be parameterized on the PR base branch:

`scripts/ci/check_pr_base_clean.sh`

Currently: PR_COMMITS=$(git log --format="%H" origin/main..HEAD 2>/dev/null)

New: accept the base branch via environment variable, defaulting to develop:

BASE_BRANCH="${GITHUB_BASE_REF:-develop}"
git fetch origin "${BASE_BRANCH}" --quiet
PR_COMMITS=$(git log --format="%H" "origin/${BASE_BRANCH}..HEAD" 2>/dev/null)

In ci-pr.yml's base_branch_lint job, the GITHUB_BASE_REF environment variable is set automatically by GitHub Actions for pull_request events (it is empty for other events, which is why the script falls back to develop). No workflow change is needed; only the script changes.

`scripts/ci/check_stale_branch.sh`

Currently: hard-codes origin/main for fork-point and commit-count checks.

New:

BASE_BRANCH="${GITHUB_BASE_REF:-develop}"
FORK_SHA=$(git merge-base "origin/${BASE_BRANCH}" HEAD)
COMMITS_BEHIND=$(git rev-list --count "${FORK_SHA}..origin/${BASE_BRANCH}")

The ci-pr.yml step that fetches origin/main before running the stale check must also be updated:

- name: Fetch latest integration branch
  run: git fetch origin ${{ github.base_ref }} --quiet

`_shared-conventions.md` — branch base rule

The existing rule says: "Every agent-dispatched feature branch MUST be created directly from origin/main." After migration this becomes origin/develop.

The base_branch_lint CI job reference in that doc ("CI job base_branch_lint fails any PR where a commit in origin/main..HEAD also appears on another remote branch") must be updated to say origin/<base_branch> to reflect the parameterized script.

`check_pr_base_clean.sh` — exclusion of `develop`, `release`, `main`

The script excludes the current PR branch from its "commit appears on other branches" check. After the migration, it should also exclude the three integration branches (develop, release, main) from the "other branches" scan to avoid false positives when a PR branch forks from a point that is also the tip of one of the integration branches.
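The exclusion described above could be implemented as a filter over the list of remote branches that contain a commit. The sketch below assumes the script works on branch names of the form origin/<name>, one per line; the function name filter_candidate_branches is hypothetical and the real script may gather its branch list differently.

```shell
# Hypothetical filter: read remote branch names on stdin (one per line) and
# drop the PR's own branch plus the three integration branches.
# Argument: $1 = the PR head branch name, without the origin/ prefix.
filter_candidate_branches() {
  pr_branch="$1"
  # -v inverts the match, -x matches whole lines only, -F treats patterns as
  # fixed strings. "|| true" keeps an empty result from counting as a failure.
  grep -v -x -F \
    -e "origin/${pr_branch}" \
    -e "origin/develop" \
    -e "origin/release" \
    -e "origin/main" || true
}
```

A possible call site, again an assumption, is git branch -r --contains "$sha" --format='%(refname:short)' | filter_candidate_branches "$GITHUB_HEAD_REF". Any line that survives the filter is a branch that would count as contamination.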

Migration plan (ordered, SRE-executable)

Operator sign-off is required before executing steps marked [GATE].

Phase 0 — pre-flight (no risk)

0a. Merge all in-flight PRs to main before any trigger changes. Any PR still open against main after step 1 must be retargeted manually.
0b. Confirm main is green on all CI checks.
0c. Snapshot current branch-protection settings to docs/ops/runbooks/branch-protection-snapshot-2026-NNNN.md for rollback reference.

Phase 1 — create branches [GATE: operator sign-off]

1a. Create develop from main HEAD:

git fetch origin main
git push origin origin/main:refs/heads/develop

1b. Create release from main HEAD:

git push origin origin/main:refs/heads/release

At this point all three branches point to the same SHA. No behavior has changed.
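The claim that all three branches point to the same SHA can be checked before moving on to Phase 2. The helper below is a sketch with a hypothetical name, branches_in_sync; it reads lines in the "<sha> <ref>" layout that git ls-remote prints and succeeds only when the expected number of refs is present and they share one SHA.

```shell
# Hypothetical check: succeed only if stdin lists exactly $1 refs that all
# point at the same SHA. Input lines look like "<sha><TAB><ref>".
branches_in_sync() {
  expected="$1"
  lines="$(cat)"
  # Count non-empty lines (refs found) and distinct SHAs among them.
  total="$(printf '%s\n' "$lines" | grep -c . || true)"
  distinct="$(printf '%s\n' "$lines" | awk 'NF {print $1}' | sort -u | grep -c . || true)"
  [ "$total" = "$expected" ] && [ "$distinct" = "1" ]
}
```

For example: git ls-remote origin refs/heads/main refs/heads/develop refs/heads/release | branches_in_sync 3. Checking the ref count as well as the SHA catches the case where one of the pushes silently did not happen.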

Phase 2 — retarget in-flight PRs [GATE: operator sign-off]

2a. List all open PRs currently targeting main:

gh pr list --base main --json number,title,headRefName

2b. For each open PR, change base to develop:

gh pr edit <number> --base develop

GitHub will re-run CI against the new base. Authors should run git fetch origin develop && git rebase origin/develop locally to keep their branches current.
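With more than a handful of open PRs, steps 2a and 2b can be combined. The sketch below only prints the gh pr edit commands so they can be reviewed first; the function name retarget_plan is hypothetical.

```shell
# Hypothetical dry-run: read PR numbers on stdin, print one retarget command each.
# Argument: $1 = the new base branch.
retarget_plan() {
  new_base="$1"
  while IFS= read -r number; do
    # Skip blank lines and anything that is not a plain PR number.
    case "$number" in
      ''|*[!0-9]*) continue ;;
    esac
    printf 'gh pr edit %s --base %s\n' "$number" "$new_base"
  done
}
```

A possible invocation is gh pr list --base main --json number --jq '.[].number' | retarget_plan develop, with the output piped to sh once it looks right. Note that gh pr list returns a limited number of results by default, so --limit may need raising. The same helper with main as the argument covers the Phase 2 rollback.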

Phase 3 — update scripts and conventions

3a. Update scripts/ci/check_pr_base_clean.sh: parameterize on $GITHUB_BASE_REF.
3b. Update scripts/ci/check_stale_branch.sh: parameterize on $GITHUB_BASE_REF.
3c. Update ci-pr.yml stale-branch step fetch from origin main to origin ${{ github.base_ref }}.
3d. Update .claude/agents/_shared-conventions.md: "branch from origin/develop".
3e. Update memory entry feedback_pr_base_main.md: "develop is the new main for agents".

These changes ship as a single PR targeting develop (the first PR to target the new default branch). This PR is also the validation that the retargeted ci-pr.yml machinery works.

Phase 4 — deploy workflow re-pointing

4a. For each dynamic service workflow listed in the Deploy re-pointing section, change push: branches: [main] → push: branches: [release] (staging trigger) and add push: branches: [main] → prod trigger where applicable.
4b. For deploy-antlers-next-staging.yml: change git trigger to push: branches: [release]; leave the CF Pages --branch=main flag unchanged.
4c. For deploy-antlers-next-prod.yml: trigger stays on push: branches: [main]; no change.
4d. Ship as a single PR targeting develop, then fast-track through release → main using the new flow (first use of the new promotion path).

Phase 5 — CI trigger matrix changes

5a. Remove bare push: trigger from antlers-next-ci.yml (keep pull_request: only).
5b. Remove bare push: paths: [ios/**] trigger from ios-ci.yml (keep pull_request: only).
5c. Add branches: [release, main] filter to security-zap.yml pull_request: trigger.
5d. Add branches: [release, main] filter to queue-docker-smoke.yml pull_request: trigger.
5e. Add branches: [release, main] filter to e2e-smoke.yml pull_request: trigger.
5f. Update ci.yml trigger from push: branches: [main] → push: branches: [develop]. (The same jobs, now running on the integration branch.)
5g. Add push: branches: [develop, release, main] to lint-*.yml and pii-scan.yml push filters.
5h. Create minimal ci-boundary.yml (pii-scan + gitleaks) on push: branches: [release, main].

Ship as a single PR targeting develop.

Phase 6 — branch protection [GATE: operator sign-off]

6a. Set protection on develop per spec above (Kristerpher applies via GitHub Settings).
6b. Set protection on release per spec above (1 required reviewer).
6c. Update main protection to add 1 required reviewer on PR (was: push-only).
6d. Verify required status checks match the new workflow names (the CI job names must match what branch protection lists).
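Step 6d is easy to get wrong, because a required check whose name matches no job leaves PRs permanently pending. One way to verify it is to diff two plain-text lists, one check name per line. The sketch below assumes such lists already exist as files; the function name missing_required_checks is hypothetical, and how the lists are produced is left open.

```shell
# Hypothetical comparison: print every required check name that no CI job reports.
# Arguments: $1 = file of required check names, $2 = file of job names CI reports.
missing_required_checks() {
  required="$1"; available="$2"
  # -f reads patterns from the available list; -x -F match whole lines literally;
  # -v keeps only required names with no match. "|| true" tolerates an empty result.
  grep -v -x -F -f "$available" "$required" || true
}
```

Empty output means every required check has a matching job name. Any name printed should be fixed in either the protection rule or the workflow before the rule is enabled.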

Phase 7 — change default branch [GATE: operator sign-off; IRREVERSIBLE without operator action]

7a. In GitHub Settings → Branches, change default branch from main to develop.
7b. Update any repository links in docs that reference the default branch implicitly (e.g., github.com/raxx-app/TradeMasterAPI/blob/main/... in runbooks).
7c. Announce to all active developers / agent configurations that develop is now the base for all new work.

Rollback

The migration is designed so each phase is independently reversible before the next phase begins.

Phase	Rollback
Phase 1	`git push origin --delete develop release` (no behaviors changed yet)
Phase 2	Re-run `gh pr edit <number> --base main` for each retargeted PR
Phase 3	Revert the PR that changed the scripts (standard git revert)
Phase 4	Revert the deploy workflow PR (standard git revert); staging goes back to being triggered by main push, prod stays manual
Phase 5	Revert the CI trigger PR
Phase 6	Remove branch protection rules (Settings → Branches, delete rules)
Phase 7 (reversible only by operator action)	Change default branch back to `main` in Settings (requires operator UI action; cannot be done via API without admin PAT)

If rolling back after Phase 7, all in-flight PRs targeting develop must be retargeted to main again (same procedure as Phase 2, in reverse).

Security considerations

PII scan is non-negotiable on all branches. The pii-scan.yml push trigger must include develop, release, and main. Relaxing PII gates to main-only would mean PII could exist undetected on develop for an entire release cycle.
Audit trail integrity. The new model adds an explicit human approval step (required reviewer) on release and main PRs, which IMPROVES the audit trail versus the current push-to-main auto-deploy path.
Emergency hotfix path does not bypass PII or gitleaks. The hotfix PR targets main directly, but ci-pr.yml is branch-agnostic — it still runs all light gates.
No new credential surfaces. The workflow_dispatch paths on deploy workflows remain and continue to use the existing GitHub Environment secrets. No new secret storage is introduced.
Breach notification path. Unchanged — the nightly security scan and ZAP weekly schedule continue to operate and file issues.

PII collected: None — this ADR describes pipeline topology, not data collection.
Retention period: N/A.
Deletion on DSR: N/A.
Audit trail: Every deploy action is attributed to the branch SHA, the GitHub Actions run ID, and the approving reviewer. Merge commits on release and main are traceable to the originating develop squash commit. No regression.
Stored credentials: None introduced. Existing GitHub Environment secrets unchanged.
Breach notification path: ZAP (weekly schedule) and nightly-security-scan remain operative. No change to alert routing.
Secrets location + rotation: All in GitHub Environment secrets and Infisical vault. Rotatable without redeploy (existing policy unchanged).
Kill-switch: workflow_dispatch paths preserved on all deploy workflows. Prod deploys can be halted by removing the main push trigger from any workflow without touching the dispatch path.
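For the kill-switch item above, the manual dispatch can be built with the GitHub CLI. The helper below is a sketch that only prints the command; the function name emergency_deploy_command is hypothetical, and the input name environment with values staging and production is assumed from the conceptual deploy-heroku.yml trigger shown earlier.

```shell
# Hypothetical helper: print the gh command for a manual (emergency) deploy.
# Arguments: $1 = workflow file, $2 = branch or tag to run from, $3 = environment.
emergency_deploy_command() {
  workflow="$1"; ref="$2"; environment="$3"
  # Refuse anything but the two known environments (assumed names).
  case "$environment" in
    staging|production) ;;
    *) echo "environment must be staging or production" >&2; return 1 ;;
  esac
  printf 'gh workflow run %s --ref %s -f environment=%s\n' "$workflow" "$ref" "$environment"
}
```

For example, emergency_deploy_command deploy-heroku.yml main production prints the dispatch command for review. The run is still recorded against the GitHub Environment, so the audit-trail invariant holds on this path as well.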

Open questions

Static-site prod-on-main: getraxx, customer-docs, internal-docs, flag-docs, status-page, mockups, support, antlers-cutover all deploy to prod on push to main with no staging counterpart. Should any of these gain a staging counterpart (and thus trigger on release instead)? Waiting on operator decision before sub-card for static sites is filed.
Merge strategy for develop → release: Merge commit is recommended (to preserve develop's squash history on release). Confirm with the operator that linear history will not be required on the release branch.
Release cadence: Is the develop → release promotion daily, on demand, or gated on a time-based soak? The ADR designs the mechanism; the cadence is an operator process decision.
Hotfix path reviewer requirement: The spec says the required reviewer on main applies to all PRs including hotfixes. A hotfix during an incident may not allow time for review. Should hotfixes have a bypass policy (label hotfix + admin override)? Needs operator decision before branch-protection sub-card is filed.
check_pr_base_clean.sh behavior on develop → release PRs: On a develop → release PR the base is release, so the commits in origin/release..HEAD will include squash commits that were previously on feature branches. The script checks for commits that "also appear on other remote branches." Squash commits are new commits, so once feature branches are deleted after merge the check should be clean. If feature branches are NOT deleted and have since merged or rebased onto develop, the script may flag them as contamination. Confirm that the repository is configured to auto-delete head branches on merge.

Alternatives considered

Keep single-main, add manual dispatch gate for prod (current direction, hardened)

The existing branch-promotion-strategy.md (Option B) proposes hardening the current model with an Environment approval gate on dispatch. Rejected for this design because it does not address the CI cost problem — heavy scans still run on every feature PR.

develop + main (skip release)

Two-branch model: develop = staging trigger, main = prod trigger. Simpler, but loses the ability to accumulate multiple develop merges before a staging push, and conflates "integration" with "deployed to staging," which can create noise when staging deploys are frequent.

Trunk-based with feature flags

All work on main, feature flags gate exposure. Rejected because the monorepo's deploy complexity (8+ deploy targets, some with no staging) makes trunk-based prod deploys too risky pre-launch. The existing paper-first gate is a runtime invariant, not a deploy-pipeline gate.

Revisit when

Repo volume exceeds 50 PRs/day — at that point the develop→release PR promotion overhead may need automation (scheduled develop→release merge bot).
Antlers and Raptor are separated into independent repos — the branching model would then apply per-repo rather than monorepo-wide.
ADR-0020 reviewer-gate waiver is lifted — the required reviewer count on develop should be revisited at that point.