ADR 0115 — develop → release → main branching model
Status: Proposed
Date: 2026-06-29 UTC
Deciders: Kristerpher (operator directive), software-architect
Scope: All CI/CD workflows (.github/workflows/), branch-protection rules, deploy triggers, CI gate scripts
Context
The monorepo currently uses a single long-lived branch (main) as the integration
surface, staging trigger, and production trigger simultaneously. The consequences are:
- Staging and production share a trigger. A push to main auto-deploys Raptor, Console, Velvet, Queue, and Antlers Next to staging. For Antlers Next (prod), deploy-antlers-next-prod.yml also fires on push to main (path-filtered to frontend/raxx-next/**), meaning a single merge to main can deploy to prod with no explicit promotion step.
- Heavy scans run on every feature PR. OWASP ZAP (security-zap.yml), Queue Docker smoke (queue-docker-smoke.yml, a 25–30 min cold build on ubuntu), and Playwright e2e (e2e-smoke.yml, antlers-next-ci.yml) trigger on every PR that touches their path filter. With multiple PRs per day, the runner minutes add up quickly.
- Double-trigger billing. antlers-next-ci.yml and ios-ci.yml carry both a bare push: paths: trigger (no branches: filter) and a pull_request: paths: trigger. GitHub Actions fires both events when a developer pushes to a feature branch with an open PR, so typecheck + build + vitest + Playwright run twice, and so do the iOS builds (macOS runners at 10× ubuntu cost).
- No explicit staging-to-prod promotion gate. The only gate between staging and prod is a manual workflow_dispatch call (for Heroku services) or the Antlers prod push-on-main auto-trigger. No branch semantically means "this has been validated on staging and is cleared for prod."
Operator directive (verbatim): "develop → release → main … Release points to staging and main points to production. The heaviest scans should only be when we merge to main, because we don't have to churn the big issues."
Invariants
These apply to the pipeline design and are non-negotiable:
- Audit trail for every state change affecting money, permissions, or data access. Every prod deploy must be traceable to a specific SHA, actor identity, and timestamp. The GitHub Environment record plus the existing console audit callback satisfy this; the new model must not route around the environment gate.
- No stored credentials. Deploy secrets (HEROKU_API_KEY, CF Pages tokens) remain in GitHub Environment secrets. No new credential storage surface is introduced.
- Kill-switch must remain available. The existing workflow_dispatch path on every deploy workflow (manual deploy of any ref to any environment) is preserved as the emergency hotfix path.
- Paper-first gating is a runtime invariant, not a deploy-pipeline gate. No change here.
- GDPR / PII. The PII scan (pii-scan.yml) must remain a hard gate on all integration branches, not relaxed to main-only.
Branch roles
develop — integration branch (default branch)
All feature work targets develop. Agents branch from develop. All PR-time
validation (cheap gates) runs here. develop never deploys anywhere
automatically; it is purely an integration surface.
Why develop is the default branch: GitHub's default branch governs the base
preselected for new PRs, the branch shown on the repo home page, and the branch
checked out by a plain git clone. Making develop the default branch means a PR
opened via the GitHub UI targets develop by default, which is what every
contributor (human and agent) should do.
release — staging promotion branch
A PR from develop → release is the explicit staging promotion step.
Merging develop → release auto-deploys all services to their staging
environments. Heavy boundary scans (ZAP against staging, Queue Docker smoke,
full Playwright e2e) run as required checks on the develop → release PR so
they gate the merge, not the post-merge deploy.
main — production branch
A PR from release → main is the explicit production promotion step.
Merging release → main auto-deploys all services to production. The heaviest
scans (nightly-class security scan, terraform plan with cloud credentials,
iOS full build) run as required checks on release → main PRs. main is a
receive-only branch; no feature work ever targets it directly.
Data model / state machine
feature/* ─── PR ──► develop ─── PR ──► release ─── PR ──► main
                        ↓                  ↓                ↓
                   (no deploy)        → STAGING          → PROD
                    cheap CI           medium CI         heavy CI
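The promotion path in the diagram can be read as a small set of allowed head → base pairs. The following is a minimal sketch of that reading as a POSIX shell function. The function name promotion_allowed is hypothetical and is not an existing repo script, and the emergency hotfix path (which targets main directly) is deliberately left out.

```shell
# promotion_allowed PR_HEAD PR_BASE
# Exit status 0 when a PR from PR_HEAD into PR_BASE follows
# feature → develop → release → main; non-zero otherwise.
promotion_allowed() {
  pr_head=$1
  pr_base=$2
  case "$pr_base" in
    develop)
      # Any branch that is not itself an integration branch may target develop.
      case "$pr_head" in
        develop|release|main) return 1 ;;
        *)                    return 0 ;;
      esac
      ;;
    release) [ "$pr_head" = "develop" ] ;;   # staging promotion
    main)    [ "$pr_head" = "release" ] ;;   # production promotion
    *)       return 1 ;;
  esac
}

promotion_allowed feature/login develop && echo "feature/login -> develop: ok"
promotion_allowed develop main || echo "develop -> main: rejected"
```

A check like this could run as a light gate on every PR, but whether to enforce the path in CI or only through branch protection is a decision this sketch does not make.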
Merge strategy per boundary
| Boundary | Strategy | Rationale |
|---|---|---|
| feature → develop | Squash merge | Current convention; keeps develop linear |
| develop → release | Merge commit | Preserves develop's commit graph on release |
| release → main | Merge commit | Release SHA is the prod deploy artifact; must be traceable |
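As an illustration of the table above, the merge strategy can be derived from the PR's base branch alone. The sketch below assumes merges are performed with the GitHub CLI (gh pr merge, whose --squash and --merge flags select the strategy); the helper name merge_flag_for is hypothetical.

```shell
# merge_flag_for BASE
# Prints the `gh pr merge` strategy flag for a PR whose base branch is BASE.
merge_flag_for() {
  case "$1" in
    develop)      printf '%s\n' "--squash" ;;  # feature → develop
    release|main) printf '%s\n' "--merge" ;;   # promotion PRs keep a merge commit
    *)            echo "unknown base branch: $1" >&2; return 1 ;;
  esac
}

# Usage (assumes an authenticated gh CLI and a PR number in $pr):
#   gh pr merge "$pr" "$(merge_flag_for release)"
merge_flag_for develop
merge_flag_for main
```

Repository settings can also restrict which merge methods the UI offers, so a helper like this is a convenience rather than an enforcement mechanism.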
CI trigger matrix
This table states the new trigger for every workflow that currently references main
or has a double-trigger pattern. Workflows not listed are schedule-only or
workflow_dispatch-only and require no trigger changes.
Lightweight gates (run on every PR, branch-agnostic)
| Workflow | New on: trigger | Gating |
|---|---|---|
| ci-pr.yml | pull_request: types: [opened, synchronize, reopened] (no branches filter; unchanged) | smoke_suite, base_branch_lint, stale-branch-guard, flag_promotion_check, migration-gate, pii-scan, sqitch_plan_lint, asset-manifest-check |
| pii-scan.yml | pull_request + push: branches: [develop, release, main] | Hard gate; remains branch-agnostic |
| lint-cf-tokens.yml | pull_request + push: branches: [develop, release, main] | Add develop + release to push filter |
| lint-cf-access-headers.yml | same pattern | same |
| lint-cf-pages-deploy-uniqueness.yml | same pattern | same |
| lint-workflow-secret-names.yml | same pattern | same |
| terraform-validate.yml | pull_request: (branch-agnostic) + push: branches: [release, main] | fmt + validate stays cheap at PR; runs post-merge on release/main |
Heavy boundary gates (run only at develop → release or release → main)
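These workflows either replace their current broad pull_request: trigger with a
boundary-scoped one (security-zap.yml, queue-docker-smoke.yml, e2e-smoke.yml) or
drop the bare push: trigger that caused double-billing (ios-ci.yml,
antlers-next-ci.yml):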
| Workflow | Old trigger | New trigger | Cost saved |
|---|---|---|---|
| security-zap.yml | pull_request: (all PRs touching Antlers/backend) | pull_request: branches: [release, main] + schedule: | ZAP fires once per promotion, not per feature PR |
| queue-docker-smoke.yml | pull_request: (all PRs touching queue/) | pull_request: branches: [release, main] + paths filter | Saves 25–30 min Docker build per feature PR |
| e2e-smoke.yml | pull_request: (all PRs touching auth/options/frontend) | pull_request: branches: [release, main] + schedule: | Full Playwright fires once per promotion |
| ios-ci.yml | push: paths: [ios/**] + pull_request: paths: [ios/**] | pull_request: paths: [ios/**] ONLY (remove push trigger) | Eliminates push/PR double-billing on macOS runners (10× cost) |
| antlers-next-ci.yml | push: paths: [frontend/raxx-next/**] + pull_request: paths: [...] | pull_request: paths: [...] ONLY (remove push trigger) | Eliminates double-billing: typecheck + build + Playwright run once, not twice |
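To make the effect of the branches: filter concrete, the sketch below models only the base-branch part of the new pull_request triggers from the table. It is an illustration, not how GitHub Actions evaluates triggers internally; path filters are not modelled, and the function name pr_trigger_fires is hypothetical.

```shell
# pr_trigger_fires WORKFLOW PR_BASE
# Exit 0 if WORKFLOW's new pull_request trigger fires for a PR into PR_BASE.
# Only the branches: filter is modelled; paths: filters still apply on top.
pr_trigger_fires() {
  workflow=$1
  pr_base=$2
  case "$workflow" in
    security-zap.yml|queue-docker-smoke.yml|e2e-smoke.yml)
      # Boundary-scoped: promotion PRs only.
      case "$pr_base" in
        release|main) return 0 ;;
        *)            return 1 ;;
      esac
      ;;
    ios-ci.yml|antlers-next-ci.yml)
      # No branches: filter; only the duplicate push trigger was removed.
      return 0 ;;
    *)
      echo "workflow not in the heavy-gate table: $workflow" >&2
      return 2 ;;
  esac
}

pr_trigger_fires security-zap.yml develop || echo "ZAP skipped on a feature PR"
pr_trigger_fires security-zap.yml release && echo "ZAP runs on the staging promotion PR"
```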
CI on integration branches (post-merge validation)
| Workflow | New trigger | Purpose |
|---|---|---|
| ci.yml (rename: ci-develop.yml) | push: branches: [develop] | Post-merge smoke on integration branch (replaces push-to-main) |
| ci-boundary.yml (NEW, minimal) | push: branches: [release, main] | Confirm deploy-ready SHA is clean after boundary merge; thin wrapper re-running pii-scan + gitleaks |
The existing ci.yml currently triggers on push: branches: [main]. After the rename
to ci-develop.yml, which signals its new home, it triggers on push: branches:
[develop]. A lightweight new workflow, ci-boundary.yml, handles post-merge checks
on release and main. The PR checks remain the primary gate for boundary merges;
ci-boundary.yml is a thin safety net.
YAML trigger shape — concrete examples
Eliminating the Antlers double-trigger (antlers-next-ci.yml):
# BEFORE
on:
  push:
    paths:
      - "frontend/raxx-next/**"
      - ".github/workflows/antlers-next-ci.yml"
  pull_request:
    paths:
      - "frontend/raxx-next/**"
      - ".github/workflows/antlers-next-ci.yml"

# AFTER: remove push trigger entirely
on:
  pull_request:
    paths:
      - "frontend/raxx-next/**"
      - ".github/workflows/antlers-next-ci.yml"
Moving ZAP to the release/main boundary (security-zap.yml):
# BEFORE
on:
  pull_request:
    types: [opened, synchronize, reopened]
    paths:
      - 'frontend/raxx-next/**'
      - 'backend_v2/api/**'
  schedule:
    - cron: '7 9 * * 1'

# AFTER
on:
  pull_request:
    types: [opened, synchronize, reopened]
    branches: [release, main]   # only fires on develop→release or release→main PRs
    paths:
      - 'frontend/raxx-next/**'
      - 'backend_v2/api/**'
  schedule:
    - cron: '7 9 * * 1'         # weekly schedule unchanged
  workflow_dispatch:            # unchanged
Moving Queue Docker smoke to boundary (queue-docker-smoke.yml):
# AFTER: add branches filter; do NOT remove pull_request trigger
on:
  pull_request:
    types: [opened, synchronize, reopened]
    branches: [release, main]   # gates the promotion PR
    paths:
      - 'queue/**'
      - '.github/workflows/queue-docker-smoke.yml'
Deploy re-pointing
The core semantic change: release branch push → staging; main branch push → prod.
Dynamic services (Heroku-deployed)
| Workflow | Current auto-trigger | New auto-trigger | Dispatch remains? |
|---|---|---|---|
| deploy-heroku.yml | push: branches: [main] → staging | push: branches: [release] → staging AND push: branches: [main] → prod | Yes |
| deploy-console.yml | push: branches: [main] → staging | push: branches: [release] → staging | Yes |
| deploy-queue.yml | push: branches: [main] → staging | push: branches: [release] → staging | Yes |
| deploy-velvet.yml | push: branches: [main] → staging | push: branches: [release] → staging | Yes |
For deploy-heroku.yml specifically, the current model has one push trigger (main →
staging) and a dispatch path (main → prod). The new model adds a second push trigger
(main → prod) while keeping dispatch as the emergency override. The job condition
logic (environment: staging vs environment: production) moves from the input
to the branch context:
# Conceptual; implementation detail for feature-developer
on:
  push:
    branches:
      - release   # → staging auto
      - main      # → prod auto
  workflow_dispatch:
    inputs:
      environment: ...   # kept for emergency overrides
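One way a deploy job could derive its target from the branch context is sketched below. It assumes the job reads the GITHUB_REF_NAME environment variable that GitHub Actions sets on push events, and uses the environment names staging and production from the paragraph above. The function name deploy_env_for_ref is hypothetical, and the real deploy-heroku.yml may structure this differently.

```shell
# deploy_env_for_ref REF_NAME
# Prints the GitHub Environment that a push to REF_NAME should deploy to.
deploy_env_for_ref() {
  case "$1" in
    release) printf '%s\n' staging ;;
    main)    printf '%s\n' production ;;
    *)       return 1 ;;   # develop and feature branches never auto-deploy
  esac
}

# GITHUB_REF_NAME holds the short branch name on push events; the fallback
# to "develop" only matters when this runs outside GitHub Actions.
target=$(deploy_env_for_ref "${GITHUB_REF_NAME:-develop}") || target="none"
echo "deploy target: $target"
```

On workflow_dispatch runs the environment would still come from the dispatch input, so the branch mapping applies only to the push-triggered path.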
Antlers Next (Cloudflare Pages)
Critical note: CF Pages uses the term "production branch" (--branch=main in
the wrangler deploy command) to mean the slot that serves the custom domain. This
is a CF Pages API concept entirely distinct from the git branch named main. The
--branch=main flag in the CF Pages deploy step does not change and should
not be confused with the git main branch.
| Workflow | Current git trigger | New git trigger | CF Pages --branch flag |
|---|---|---|---|
| deploy-antlers-next-staging.yml | push: branches: [main] | push: branches: [release] | --branch=main (unchanged; CF Pages production slot on staging project) |
| deploy-antlers-next-prod.yml | push: branches: [main] | push: branches: [main] | --branch=main (unchanged) |
This resolves the current anomaly where Antlers prod deploys on every merge to main.
After the migration, prod deploys only when release → main merges (an explicit
promotion step), not on every feature landing.
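The separation between the git branch and the CF Pages --branch flag can be sketched as follows: the git branch selects which Pages project receives the deploy, while --branch=main stays constant. The project names and the ./out build directory below are hypothetical placeholders, not the repo's actual values, and the function only prints the command rather than running wrangler.

```shell
# pages_deploy_cmd GIT_BRANCH
# Prints the wrangler command a deploy job would run for a push to GIT_BRANCH.
pages_deploy_cmd() {
  case "$1" in
    release) project="antlers-next-staging" ;;  # hypothetical project name
    main)    project="antlers-next-prod" ;;     # hypothetical project name
    *)       return 1 ;;                        # no Pages deploy for other branches
  esac
  # --branch=main selects the production slot of $project in CF Pages;
  # it is not derived from the git branch passed in $1.
  printf 'wrangler pages deploy ./out --project-name=%s --branch=main\n' "$project"
}

pages_deploy_cmd release
pages_deploy_cmd main
```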
Static-site / docs surfaces
The following surfaces have no staging counterpart and deploy to production directly.
They remain on push: branches: [main] with no change:
- deploy-getraxx.yml, deploy-customer-docs.yml, deploy-internal-docs.yml, deploy-flag-docs.yml, deploy-status-page.yml, deploy-support.yml, deploy-mockups.yml, deploy-antlers-cutover.yml
These sites are content-only or docs; they do not touch order flow or credentials.
Keeping them on main is acceptable and avoids the overhead of a staging deploy cycle
for docs content.
Cloudflare Workers / WAF
- deploy-status-worker.yml, deploy-worker-waf-log-shipper.yml: re-point from push: branches: [main] to push: branches: [release] for staging deploy, and add push: branches: [main] for prod. Same pattern as Heroku services.
Branch protection specification
develop (new — required checks)
| Check | Required | Notes |
|---|---|---|
| smoke_suite (ci-pr.yml) | Yes (if code changed) | Light; < 5 min |
| base_branch_lint | Yes | Must compare against origin/develop (see Migration §Scripts) |
| stale-branch-guard | Yes | Must compare against origin/develop |
| migration-gate | Yes (if migrations changed) | Unchanged |
| pii-scan | Yes | Unchanged |
| flag_promotion_check | Yes (if flags changed) | Unchanged |
| gitleaks | Yes | Unchanged |
Required reviewers: 0 (ADR-0020 reviewer-gate waiver currently in effect). Direct pushes: disallowed. Force-push: disallowed.
release (new)
| Check | Required | Notes |
|---|---|---|
| All develop checks | Yes | ci-pr.yml runs branch-agnostic |
| security-zap | Yes | Antlers/backend path filter |
| queue-docker-smoke | Yes (if queue/ changed) | Path-filtered |
| e2e-smoke | Yes | Full Playwright |
| antlers-next-ci (full: typecheck + build + Playwright) | Yes (if Antlers changed) | PR trigger |
Required reviewers: 1 (Kristerpher). This is the staging promotion gate.
Direct pushes: disallowed. Only source: PRs from develop.
main (update existing)
| Check | Required | Notes |
|---|---|---|
| All release checks | Yes | ci-pr.yml branch-agnostic |
| ios-ci | Yes (if ios/ changed) | Full iOS build at prod boundary |
| terraform-validate (+ plan with cloud creds) | Yes (if terraform/ changed) | Plan requires credentials |
| nightly-security-scan (manual dispatch mode) | Advisory | Run before major releases |
Required reviewers: 1 (Kristerpher). This is the production promotion gate.
Direct pushes: disallowed. Only source: PRs from release (and emergency hotfix
path described below).
Emergency hotfix path
When a production incident requires a fix bypassing the develop→release→main cycle:
- Branch from main HEAD (not develop).
- Open PR targeting main directly with label hotfix.
- Required reviewer approves. CI runs in reduced mode (merge allowed on green light-gate subset; heavy scans advisory).
- After merge to main, cherry-pick the fix commit to release and then develop to keep branches in sync. This is a manual step; the sre-agent runbook (to be filed as sub-card) must document this procedure.
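The back-port step above might look like the sketch below, which prints the commands rather than running them. It assumes the fix landed on main as a single (non-merge) commit and that each back-port goes through a short-lived branch; the backport/ branch naming is hypothetical, and whether those branches land via PR or an admin override is left to the runbook.

```shell
# hotfix_backport_plan SHA
# Prints (does not run) the commands that copy a hotfix commit from main
# to release and then develop. -x records the source SHA in the new
# commit message, which helps keep the audit trail traceable.
hotfix_backport_plan() {
  sha=$1
  for target in release develop; do
    printf 'git fetch origin %s\n' "$target"
    printf 'git switch -c backport/%s-%s origin/%s\n' "$sha" "$target" "$target"
    printf 'git cherry-pick -x %s\n' "$sha"
    printf 'git push origin backport/%s-%s\n' "$sha" "$target"
  done
}

hotfix_backport_plan abc1234
```

If the hotfix reached main as a merge commit instead, git cherry-pick additionally needs -m 1 to pick the mainline parent.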
Scripts that require updates
Two CI scripts hard-code origin/main and must be parameterized on the PR base
branch:
scripts/ci/check_pr_base_clean.sh
Currently: PR_COMMITS=$(git log --format="%H" origin/main..HEAD 2>/dev/null)
New: accept the base branch via environment variable, defaulting to develop:
BASE_BRANCH="${GITHUB_BASE_REF:-develop}"
git fetch origin "${BASE_BRANCH}" --quiet
PR_COMMITS=$(git log --format="%H" "origin/${BASE_BRANCH}..HEAD" 2>/dev/null)
In ci-pr.yml's base_branch_lint job, the GITHUB_BASE_REF environment variable is
set automatically by GitHub Actions for pull_request events. No workflow change is
needed; only the script changes.
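The fallback behaviour of the ${GITHUB_BASE_REF:-develop} expansion is worth spelling out, because the :- form substitutes the default both when the variable is unset and when it is set but empty (as it can be outside pull_request events or in a local run). A minimal demonstration, with a hypothetical wrapper name resolve_base_branch:

```shell
# resolve_base_branch
# Prints the branch the PR targets, or "develop" when GITHUB_BASE_REF is
# unset or empty.
resolve_base_branch() {
  printf '%s\n' "${GITHUB_BASE_REF:-develop}"
}

GITHUB_BASE_REF=release
resolve_base_branch      # prints: release

GITHUB_BASE_REF=
resolve_base_branch      # prints: develop (set but empty)

unset GITHUB_BASE_REF
resolve_base_branch      # prints: develop (unset)
```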
scripts/ci/check_stale_branch.sh
Currently: hard-codes origin/main for fork-point and commit-count checks.
New:
BASE_BRANCH="${GITHUB_BASE_REF:-develop}"
FORK_SHA=$(git merge-base "origin/${BASE_BRANCH}" HEAD)
COMMITS_BEHIND=$(git rev-list --count "${FORK_SHA}..origin/${BASE_BRANCH}")
The ci-pr.yml step that fetches origin/main before running the stale check must
also be updated:
- name: Fetch latest integration branch
  run: git fetch origin ${{ github.base_ref }} --quiet
_shared-conventions.md — branch base rule
The existing rule says: "Every agent-dispatched feature branch MUST be created
directly from origin/main." After migration this becomes origin/develop.
The base_branch_lint CI job reference in that doc ("CI job base_branch_lint
fails any PR where a commit in origin/main..HEAD also appears on another remote
branch") must be updated to say origin/<base_branch> to reflect the parameterized
script.
check_pr_base_clean.sh — exclusion of develop, release, main
The script excludes the current PR branch from its "commit appears on other branches"
check. After the migration, it should also exclude the three integration branches
(develop, release, main) from the "other branches" scan to avoid false
positives when a PR branch forks from a point that is also the tip of one of the
integration branches.
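A sketch of the exclusion described above, written as a filter over a list of remote branch names. It assumes the script obtains that list from something like git branch -r --contains <sha> --format='%(refname:short)'; the function name is hypothetical and the actual structure of check_pr_base_clean.sh is not known here.

```shell
# other_branches_containing PR_BRANCH
# Reads remote branch names on stdin (one per line) and prints only those
# that should count as "other branches" for the contamination check.
other_branches_containing() {
  pr_branch=$1
  while IFS= read -r ref; do
    case "$ref" in
      origin/develop|origin/release|origin/main) ;;  # integration branches
      origin/HEAD|origin) ;;                         # symbolic default-branch ref
      "origin/$pr_branch") ;;                        # the PR's own branch
      "") ;;                                         # skip blank lines
      *) printf '%s\n' "$ref" ;;
    esac
  done
}

printf 'origin/develop\norigin/feature/a\norigin/feature/b\n' |
  other_branches_containing feature/a   # prints: origin/feature/b
```

Any line that survives the filter means the commit also lives on an unrelated branch, which is the condition the lint is meant to flag.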
Migration plan (ordered, SRE-executable)
Operator sign-off is required before executing steps marked [GATE].
Phase 0 — pre-flight (no risk)
0a. Merge all in-flight PRs to main before any trigger changes. Any PR still open
against main after Phase 1 must be retargeted manually (see Phase 2).
0b. Confirm main is green on all CI checks.
0c. Snapshot current branch-protection settings to docs/ops/runbooks/branch-protection-snapshot-2026-NNNN.md for rollback reference.
Phase 1 — create branches [GATE: operator sign-off]
1a. Create develop from main HEAD:
git fetch origin main
git push origin origin/main:refs/heads/develop
1b. Create release from main HEAD:
git push origin origin/main:refs/heads/release
At this point all three branches point to the same SHA. No behavior has changed.
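The "same SHA" claim can be checked mechanically before moving on. The sketch below assumes the output format of git ls-remote (SHA, a tab, then the ref name, one per line); the function name refs_share_one_sha is hypothetical.

```shell
# refs_share_one_sha
# Reads `git ls-remote` output on stdin; exit 0 when every listed ref
# points at the same SHA, non-zero otherwise (including empty input).
refs_share_one_sha() {
  distinct=$(cut -f1 | sort -u | wc -l | tr -d ' ')
  [ "$distinct" = "1" ]
}

# Usage against the real remote:
#   git ls-remote origin refs/heads/main refs/heads/develop refs/heads/release |
#     refs_share_one_sha && echo "all three branches match"
```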
Phase 2 — retarget in-flight PRs [GATE: operator sign-off]
2a. List all open PRs currently targeting main:
gh pr list --base main --json number,title,headRefName
2b. For each open PR, change base to develop:
gh pr edit <number> --base develop
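Changing the base emits a pull_request edited event, which workflows limited to types: [opened, synchronize, reopened] (such as ci-pr.yml) do not listen for, so CI runs against the new base on the next push to the PR branch. Authors should git fetch origin develop && git rebase origin/develop locally to keep their branches current.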
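Steps 2a and 2b can be chained so that no PR is missed. The sketch below is one possible shape, assuming an authenticated gh CLI; the function name retarget_prs and the DRY_RUN switch are hypothetical. It defaults to printing the commands so the list can be reviewed before anything is changed.

```shell
# retarget_prs NEW_BASE
# Reads PR numbers on stdin and retargets each one to NEW_BASE.
# With DRY_RUN=1 (the default) it only prints the gh commands.
retarget_prs() {
  new_base=$1
  while IFS= read -r number; do
    [ -n "$number" ] || continue
    if [ "${DRY_RUN:-1}" = "1" ]; then
      printf 'gh pr edit %s --base %s\n' "$number" "$new_base"
    else
      # </dev/null keeps gh from consuming the loop's stdin.
      gh pr edit "$number" --base "$new_base" </dev/null
    fi
  done
}

# Usage:
#   gh pr list --base main --limit 200 --json number --jq '.[].number' |
#     DRY_RUN=0 retarget_prs develop
```

gh pr list returns 30 results by default, so an explicit --limit is worth passing if more PRs may be open.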
Phase 3 — update scripts and conventions
3a. Update scripts/ci/check_pr_base_clean.sh — parameterize on $GITHUB_BASE_REF.
3b. Update scripts/ci/check_stale_branch.sh — parameterize on $GITHUB_BASE_REF.
3c. Update ci-pr.yml stale-branch step fetch from origin main to origin ${{ github.base_ref }}.
3d. Update .claude/agents/_shared-conventions.md — "branch from origin/develop".
3e. Update memory entry feedback_pr_base_main.md — "develop is the new main for agents".
These changes ship as a single PR targeting develop (the first PR to target the
new default branch). This PR is also the validation that the retargeted ci-pr.yml
machinery works.
Phase 4 — deploy workflow re-pointing
4a. For each dynamic service workflow listed in the Deploy re-pointing section,
change push: branches: [main] → push: branches: [release] (staging trigger)
and add push: branches: [main] → prod trigger where applicable.
4b. For deploy-antlers-next-staging.yml: change git trigger to push: branches: [release];
leave the CF Pages --branch=main flag unchanged.
4c. For deploy-antlers-next-prod.yml: trigger stays on push: branches: [main] — no change.
4d. Ship as a single PR targeting develop, then fast-track through release → main
using the new flow (first use of the new promotion path).
Phase 5 — CI trigger matrix changes
5a. Remove bare push: trigger from antlers-next-ci.yml (keep pull_request: only).
5b. Remove bare push: paths: [ios/**] trigger from ios-ci.yml (keep pull_request: only).
5c. Add branches: [release, main] filter to security-zap.yml pull_request: trigger.
5d. Add branches: [release, main] filter to queue-docker-smoke.yml pull_request: trigger.
5e. Add branches: [release, main] filter to e2e-smoke.yml pull_request: trigger.
5f. Update ci.yml trigger from push: branches: [main] → push: branches: [develop].
(The same jobs, now running on the integration branch.)
5g. Set the push: filter in lint-*.yml and pii-scan.yml to branches: [develop, release, main].
5h. Create minimal ci-boundary.yml (pii-scan + gitleaks) on push: branches: [release, main].
Ship as single PR targeting develop.
Phase 6 — branch protection [GATE: operator sign-off]
6a. Set protection on develop per spec above (Kristerpher applies via GitHub Settings).
6b. Set protection on release per spec above (1 required reviewer).
6c. Update main protection to add 1 required reviewer on PR (was: push-only).
6d. Verify required status checks match the new workflow names (the CI job names
must match what branch protection lists).
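Step 6d can be checked by diffing two lists: the check names that branch protection requires, and the check names CI actually reports. A required check that no workflow reports would block every merge. The sketch below assumes both lists are available as text files with one name per line; the function name missing_checks and the file names in the usage comment are hypothetical.

```shell
# missing_checks REQUIRED_FILE REPORTED_FILE
# Prints every check name listed in REQUIRED_FILE that does not appear in
# REPORTED_FILE (one name per line in each file).
missing_checks() {
  req_sorted=$(mktemp)
  rep_sorted=$(mktemp)
  sort -u "$1" > "$req_sorted"
  sort -u "$2" > "$rep_sorted"
  comm -23 "$req_sorted" "$rep_sorted"   # lines only in the required list
  rm -f "$req_sorted" "$rep_sorted"
}

# Usage (assumes an authenticated gh CLI run inside the repo and a token
# allowed to read branch protection):
#   gh api 'repos/{owner}/{repo}/branches/develop/protection/required_status_checks' \
#     --jq '.contexts[]' > required.txt
#   missing_checks required.txt reported.txt
```

Empty output means every required check has a matching job name.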
Phase 7 — change default branch [GATE: operator sign-off; IRREVERSIBLE without operator action]
7a. In GitHub Settings → Branches, change default branch from main to develop.
7b. Update any repository links in docs that reference the default branch implicitly
(e.g., github.com/raxx-app/TradeMasterAPI/blob/main/... in runbooks).
7c. Announce to all active developers / agent configurations that develop is now
the base for all new work.
Rollback
The migration is designed so each phase is independently reversible before the next phase begins.
| Phase | Rollback |
|---|---|
| Phase 1 | git push origin --delete develop release (no behaviors changed yet) |
| Phase 2 | Re-run gh pr edit <number> --base main for each retargeted PR |
| Phase 3 | Revert the PR that changed the scripts (standard git revert) |
| Phase 4 | Revert the deploy workflow PR (standard git revert); staging goes back to being triggered by main push, prod stays manual |
| Phase 5 | Revert the CI trigger PR |
| Phase 6 | Remove branch protection rules (Settings → Branches, delete rules) |
| Phase 7 (operator action required) | Change default branch back to main in Settings (requires operator UI action; cannot be done via API without admin PAT) |
If rolling back after Phase 7, all in-flight PRs targeting develop must be
retargeted to main again (same procedure as Phase 2, in reverse).
Security considerations
- PII scan is non-negotiable on all branches. The pii-scan.yml push trigger must include develop, release, and main. Relaxing PII gates to main-only would mean PII could exist undetected on develop for an entire release cycle.
- Audit trail integrity. The new model adds an explicit human approval step (required reviewer) on release and main PRs, which improves the audit trail versus the current push-to-main auto-deploy path.
- Emergency hotfix path does not bypass PII or gitleaks. The hotfix PR targets main directly, but ci-pr.yml is branch-agnostic, so it still runs all light gates.
- No new credential surfaces. The workflow_dispatch paths on deploy workflows remain and continue to use the existing GitHub Environment secrets. No new secret storage is introduced.
- Breach notification path. Unchanged: the nightly security scan and ZAP weekly schedule continue to operate and file issues.
Security / GDPR checklist
- PII collected: None — this ADR describes pipeline topology, not data collection.
- Retention period: N/A.
- Deletion on DSR: N/A.
- Audit trail: Every deploy action is attributed to the branch SHA, the GitHub Actions run ID, and the approving reviewer. Merge commits on release and main are traceable to the originating develop squash commit. No regression.
- Stored credentials: None introduced. Existing GitHub Environment secrets unchanged.
- Breach notification path: ZAP (weekly schedule) and nightly-security-scan remain operative. No change to alert routing.
- Secrets location + rotation: All in GitHub Environment secrets and Infisical vault. Rotatable without redeploy (existing policy unchanged).
- Kill-switch: workflow_dispatch paths preserved on all deploy workflows. Prod deploys can be halted by removing the main push trigger from any workflow without touching the dispatch path.
Open questions
- Static-site prod-on-main: getraxx, customer-docs, internal-docs, flag-docs, status-page, mockups, support, antlers-cutover all deploy to prod on push to main with no staging counterpart. Should any of these gain a staging counterpart (and thus trigger on release instead)? Waiting on operator decision before the sub-card for static sites is filed.
- Merge strategy for develop → release: Merge commit is recommended (to preserve develop's squash history on release). Confirm with the operator that linear history will not be required on the release branch.
- Release cadence: Is the develop → release promotion daily, on demand, or gated on a time-based soak? The ADR designs the mechanism; the cadence is an operator process decision.
- Hotfix path reviewer requirement: The spec says the required reviewer on main applies to all PRs including hotfixes. A hotfix during an incident may not allow time for review. Should hotfixes have a bypass policy (label hotfix + admin override)? Needs operator decision before the branch-protection sub-card is filed.
- check_pr_base_clean.sh behavior on develop → release PRs: On a develop → release PR the base is release, so the commits in origin/release..HEAD include squash commits for work that previously lived on feature branches. The script checks for commits that "also appear on other remote branches." If merged feature branches are deleted (GitHub's auto-delete head branches setting, which is opt-in), this should be clean. If feature branches are NOT deleted, the script may flag them as contamination. Confirm that the repository is configured to auto-delete head branches on merge.
Alternatives considered
Keep single-main, add manual dispatch gate for prod (current direction, hardened)
The existing branch-promotion-strategy.md (Option B) proposes hardening the
current model with an Environment approval gate on dispatch. Rejected for this
design because it does not address the CI cost problem — heavy scans still run
on every feature PR.
develop + main (skip release)
Two-branch model: develop = staging trigger, main = prod trigger. Simpler, but loses the ability to accumulate multiple develop merges before a staging push, and conflates "integration" with "deployed to staging," which can create noise when staging deploys are frequent.
Trunk-based with feature flags
All work on main, feature flags gate exposure. Rejected because the monorepo's
deploy complexity (8+ deploy targets, some with no staging) makes trunk-based
prod deploys too risky pre-launch. The existing paper-first gate is a runtime
invariant, not a deploy-pipeline gate.
Revisit when
- Repo volume exceeds 50 PRs/day — at that point the develop→release PR promotion overhead may need automation (scheduled develop→release merge bot).
- Antlers and Raptor are separated into independent repos — the branching model would then apply per-repo rather than monorepo-wide.
- ADR-0020 reviewer-gate waiver is lifted: the required reviewer count on develop should be revisited at that point.