Status: Accepted
Date: 2026-05-06 UTC
Refs: ADR-0020, ADR-0028, ADR-0035, docs/architecture/branch-promotion-strategy.md, operator complaint 2026-05-06
Trunk-based stays. Gitflow is rejected: long-lived branches worsen the drift problem rather than solving it, and they multiply merge cost on a solo-operator + parallel-agent fleet. The observed friction (favicon revert, card-details-popup missing from prod, staging/prod flag lag) traces to four specific gaps. First, there is no required-reviewer gate on the production Environment, even though #1202 shipped the runbook for enabling it. Second, there is no post-merge production-state checklist to catch static asset and flag divergence before the operator notices it live. Third, no dashboard surface shows which flags are staging-on / prod-off. Fourth, nothing stops a stale agent branch from silently reverting a concurrent merge. The hardening plan in this ADR closes these gaps. Switching branching models would not.
Raxx is operated by a solo founder (Kristerpher) with a fleet of parallel agent-produced PRs merging to a single main branch. As of 2026-05-06, the operator reports:
deploy-console.yml workflow_dispatch) that recently returned a 502, leaving prod state ambiguous.
The question raised: would Gitflow reduce this friction?
develop branch accumulating features before a batch release.
main and merge to main. This is enforced policy (feedback_pr_base_main.md). It is the correct policy. Any branching model that requires agents to target a different base creates coordination complexity and violates the established incident-driven rule (incidents #330 and #457).
feedback_pr_cancelled_checks_are_duplicates.md).
main → staging auto-deploys → operator approves prod gate. This is the correct model per ADR-0020 and ADR-0028.
These apply to this ADR and any sub-cards it produces:
main for short-lived feature work is correct; persistent environment branches are not.
origin/main. The base_branch_lint CI job enforces this. Nothing in this ADR relaxes it.
Retain trunk-based development. Reject Gitflow.
Harden the trunk model with four targeted fixes that close the diagnosed root causes.
| Gitflow benefit | Real cost in Raxx's situation | Failure mode this creates |
|---|---|---|
| develop branch isolates in-progress work from main | All agent PRs already branch from main and are individually gated by CI. There is no un-gated code accumulating on main. | Having agents target develop instead of main violates feedback_pr_base_main.md and the rule from incidents #330/#457. PRs would need re-targeting for every session. |
| Long-lived release/* branches allow last-minute stabilization | Raxx has one release stream and no release team. Release-please already owns the tag. A release/* branch gives the operator a second branch to watch with no additional safety signal. | Merge conflicts between develop → release/* and release/* → main compound the drift problem: two extra merge surfaces for the same code. |
| hotfix/* branches isolate prod fixes from in-progress work | The hotfix path under trunk-based is: merge to main, approve the gate immediately. This takes minutes. Gitflow hotfix branches are cut from main and must be merged back into both main and develop (and into any open release/* branch). The back-merge into develop can conflict with in-progress work. | Back-merge conflicts stall the hotfix workflow. If the back-merge is skipped, the next release from develop can reintroduce the bug. On a solo+agent setup, the agent that produced the hotfix is likely not the agent that opens the back-merge PR. Cross-agent back-merge coordination is a new failure surface. |
| Explicit integration gate via develop → release PR | CI already gates every PR to main. Staging auto-deploys on merge and provides an integration environment. A develop → release PR gate would add merge events for code already tested twice (PR CI and staging soak). | Triples the merge count per feature (feature → develop, develop → release/*, release/* → main, versus one merge to main today) with no new information, because the CI gate and staging soak already happen. |
| Visual separation of what's "ready" vs "in progress" | GitHub PR labels (status: staging-verified, status: ready-for-prod) and the flag promotion queue (ADR-0035) provide this without branch topology changes. | Branch proliferation that agents do not clean up becomes stale branch noise in the remote. Each stale branch is a merge conflict surface for the next agent session. |
Summary: Gitflow's benefits exist for multi-team, multi-release-stream organizations where main would otherwise be unusable as a stable base. None of those conditions hold for Raxx. Every Gitflow benefit Raxx would get is already provided by the existing CI gate + staging soak + ADR-0028 friction model. The costs are all real and specific to the solo+agent setup.
The reported friction does not trace to the branching model. It traces to four distinct gaps in the existing trunk model:
The required-reviewer gate on the production Environment is not yet toggled.
ADR-0020 selected the tag + environment approval gate model. #1202 shipped the runbook documenting exactly how to enable it. The toggle has not been set. Without the gate, staging-verified code can reach prod with no deliberate operator checkpoint between "staging looks good" and "prod updated." The favicon revert is consistent with a staging deploy overwriting a prod state that the operator never explicitly confirmed.
Root cause: configuration gap, not a branching-model gap.
After a PR merges and staging deploys, there is no structured step that asks: "did this change affect static assets? does prod still have the right asset hashes? are any flags that are staging-on but prod-off blocking this feature?" The card-details popup stall is consistent with a feature that passed staging CI but depends on a flag that was never promoted to prod (ADR-0035 describes exactly this failure mode).
Root cause: process gap, not a branching-model gap.
The operator cannot see, at a glance, how many and which flags are staging-on / prod-off. The promotion queue exists in the database (ADR-0035) but has no dashboard widget. Without this, the operator only learns of flag drift when a feature is visibly missing from prod.
Root cause: observability gap, not a branching-model gap.
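When multiple agents open PRs in parallel against main, a slow-moving PR's branch falls behind. The stale branch may also have touched files that a concurrent PR has since changed on main, for example when an agent rewrites a whole file or regenerates an asset from an old base. If the agent does not rebase before the PR is merged, the merge can silently restore the older content and revert the concurrent PR's change. The favicon incident is consistent with this pattern.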
Root cause: agent workflow discipline gap, not a branching-model gap. Gitflow does not fix this — it moves the conflict surface from main to develop, where it is equally possible and harder to observe.
Drift is defined as the aggregate gap between main's current state and production's observed state across three dimensions:
| Dimension | Metric | Alert threshold |
|---|---|---|
| Code drift | Hours since the oldest merge to main that has not yet been followed by a prod deploy | > 48 hours triggers a console warning |
| Flag drift | Count of flags where staging_enabled = true AND prod_enabled = false | > 0 flags pending promotion for > 24 hours triggers a console badge |
| Static asset drift | Count of files under dist/ or in the CF Pages asset manifest whose prod content hash != main's last-built hash | > 0 divergent asset hashes after a prod deploy is a deploy verification failure |
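The three thresholds can be evaluated mechanically once each input is available. The Python below is a minimal sketch. It reads the code-drift row as "age of the oldest merge to main not yet followed by a prod deploy." It also assumes a hypothetical DriftSnapshot that the console would populate from deploy history, the flag promotion queue, and an asset-hash comparison. DriftSnapshot, drift_alerts, and the alert names are illustrative only and do not exist in the codebase.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

# Thresholds from the drift table.
CODE_DRIFT_LIMIT = timedelta(hours=48)
FLAG_DRIFT_LIMIT = timedelta(hours=24)


@dataclass
class DriftSnapshot:
    """Hypothetical inputs; names are illustrative, not an existing console API."""

    now: datetime
    # Oldest merge to main not yet followed by a prod deploy (None = prod is current).
    oldest_undeployed_merge_at: Optional[datetime]
    # When each staging-on / prod-off flag entered that state.
    pending_flag_since: list[datetime] = field(default_factory=list)
    # Asset paths whose prod hash differs from main's last-built hash.
    divergent_assets: list[str] = field(default_factory=list)
    # Asset divergence only counts as a failure once a prod deploy has run.
    after_prod_deploy: bool = False


def drift_alerts(s: DriftSnapshot) -> list[str]:
    """Return the names of alerts whose thresholds are currently exceeded."""
    alerts: list[str] = []
    if (
        s.oldest_undeployed_merge_at is not None
        and s.now - s.oldest_undeployed_merge_at > CODE_DRIFT_LIMIT
    ):
        alerts.append("code-drift-warning")
    # Badge if at least one flag has been pending promotion for more than 24h.
    if any(s.now - since > FLAG_DRIFT_LIMIT for since in s.pending_flag_since):
        alerts.append("flag-drift-badge")
    if s.after_prod_deploy and s.divergent_assets:
        alerts.append("asset-verification-failure")
    return alerts


if __name__ == "__main__":
    now = datetime(2026, 5, 6, 12, 0, tzinfo=timezone.utc)
    snap = DriftSnapshot(
        now=now,
        oldest_undeployed_merge_at=now - timedelta(hours=50),
        pending_flag_since=[now - timedelta(hours=30), now - timedelta(hours=2)],
    )
    print(drift_alerts(snap))  # ['code-drift-warning', 'flag-drift-badge']
```

Keeping the check as a pure function over a snapshot lets the same logic back a console badge and a periodic Slack report without either one owning the data collection.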
These three metrics are observable without Gitflow. A Gitflow develop branch adds a fourth metric (hours since develop was merged to release) without reducing the other three.
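H1: Enable the required-reviewer gate on the production Environment
The runbook from #1202 documents the exact GitHub UI path. This is a settings toggle with zero YAML changes, provided the prod deploy jobs already declare the production environment, since GitHub applies required reviewers only to jobs that reference the environment. Effect: every prod deploy (code or flag) pauses for a deliberate Kristerpher approval before proceeding.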
Owner: operator (Kristerpher, settings toggle only). No sub-card needed — this is a one-minute action against the runbook in #1202.
H2: Post-merge production-state checklist
A runbook that covers the items below. It is not automated enforcement. It is a checklist the operator runs before approving the prod gate.
/flags/promotions in the console (or query console_flag_promotions WHERE state = 'pending'). Any feature that is staging-enabled but not prod-enabled must be a deliberate choice, not an oversight.
feedback_aws_workloads_use_ssm_not_vault.md) before approving the deploy.
heroku releases --app raxx-console-prod | head -3) and the smoke suite passes.
This runbook lives at docs/runbooks/post-merge-prod-checklist.md and is referenced from the prod deploy approval notification.
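The checklist question "does prod still have the right asset hashes?" can be answered with a script instead of by eye. Below is a minimal Python sketch. It assumes both sides are hashed the same way, with SHA-256 over file bytes. It also assumes a prod manifest of the same shape (relative path to hex digest) is obtained separately, for example by hashing files fetched from the live site. Cloudflare Pages' own upload manifest is not assumed to use this format. build_manifest and divergent_assets are hypothetical helpers.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def build_manifest(dist_dir: Path) -> dict[str, str]:
    """Map every file under dist_dir (relative POSIX path) to its SHA-256 hex digest."""
    manifest: dict[str, str] = {}
    for path in sorted(dist_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(dist_dir).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def divergent_assets(built: dict[str, str], prod: dict[str, str]) -> list[str]:
    """Paths missing on either side or whose hashes differ, sorted for stable output."""
    return sorted(p for p in set(built) | set(prod) if built.get(p) != prod.get(p))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dist = Path(tmp)
        (dist / "favicon.ico").write_bytes(b"new-icon")
        (dist / "index.html").write_text("<html></html>")
        built = build_manifest(dist)
        # Simulate prod still serving the old favicon.
        prod = {**built, "favicon.ico": hashlib.sha256(b"old-icon").hexdigest()}
        print(divergent_assets(built, prod))  # ['favicon.ico']
```

Treating a path that is missing on either side as divergent catches both a stale file left in prod and a new asset that never shipped.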
Sub-card needed: file a card for feature-developer to write the runbook and wire a link to it into the prod approval notification step.
H3: Flag-promotion-pending dashboard widget
A console dashboard widget (read-only, no action in v1) that shows:
staging_enabled = true AND prod_enabled = false
This gives the operator an at-a-glance view of drift without navigating to /flags/promotions. It surfaces the "card-details popup never reached prod" class of failure before the operator notices it in the live product.
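The widget's data can come from a single query against the promotion queue. The sketch below uses Python's sqlite3 as a stand-in for the console database. Only the table name console_flag_promotions and the state = 'pending' filter come from this ADR. The flag_key and requested_at columns, the pending_widget function, and the returned shape (count badge, stale count, list items) are assumptions for illustration.

```python
from __future__ import annotations

import sqlite3
from datetime import datetime, timedelta, timezone

# flag_key and requested_at are hypothetical columns; ADR-0035 defines the real schema.
PENDING_SQL = """
SELECT flag_key, requested_at
FROM console_flag_promotions
WHERE state = 'pending'
ORDER BY requested_at
"""


def pending_widget(conn: sqlite3.Connection, now: datetime,
                   stale_after: timedelta = timedelta(hours=24)) -> dict:
    """Build the count badge + list view for the flag-promotion-pending widget."""
    items = []
    for flag_key, requested_at in conn.execute(PENDING_SQL):
        age = now - datetime.fromisoformat(requested_at)
        items.append({
            "flag": flag_key,
            "age_hours": round(age.total_seconds() / 3600, 1),
            "stale": age > stale_after,  # mirrors the > 24h flag-drift threshold
        })
    return {
        "count": len(items),
        "stale_count": sum(1 for item in items if item["stale"]),
        "items": items,
    }


if __name__ == "__main__":
    now = datetime(2026, 5, 6, 12, 0, tzinfo=timezone.utc)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE console_flag_promotions (flag_key TEXT, state TEXT, requested_at TEXT)")
    conn.executemany(
        "INSERT INTO console_flag_promotions VALUES (?, ?, ?)",
        [
            ("flag_a", "pending", (now - timedelta(hours=30)).isoformat()),
            ("flag_b", "pending", (now - timedelta(hours=2)).isoformat()),
            ("flag_c", "promoted", (now - timedelta(hours=90)).isoformat()),
        ],
    )
    widget = pending_widget(conn, now)
    print(widget["count"], widget["stale_count"])  # 2 1
```

Ordering oldest-first puts the flags most likely to explain a "missing from prod" report at the top of the list view.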
Sub-card needed: file a card for feature-developer to add the dashboard widget.
H4: Stale-branch CI guard
Two changes:
main at the time the PR is opened for review and again before merge. Threshold: 10 commits or 48 hours behind, whichever limit is exceeded first. This forces the agent (or operator) to rebase before the PR can merge.
origin/main immediately before pushing the final commit. This is already implicit in feedback_commit_agent_docs_immediately.md (commit before any rebase storm) but needs to be a named step in the agent workflow conventions.
Sub-card needed: file a card for feature-developer to add the stale-branch CI guard workflow step.
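The guard reduces to two numbers per branch. Below is a minimal Python sketch. It assumes "hours behind" means the age of the oldest origin/main commit that the branch does not yet contain, which is one reading the ADR leaves open. is_stale, measure_staleness, and main are hypothetical names. The git invocations (rev-list --count, rev-list --reverse, show -s --format=%ct) are standard git.

```python
from __future__ import annotations

import subprocess
import sys
import time
from typing import Optional

# Thresholds from H4; exceeding either one fails the check.
MAX_COMMITS_BEHIND = 10
MAX_HOURS_BEHIND = 48.0


def is_stale(commits_behind: int, hours_behind: float,
             max_commits: int = MAX_COMMITS_BEHIND,
             max_hours: float = MAX_HOURS_BEHIND) -> bool:
    """True when the branch exceeds either limit (whichever is reached first)."""
    return commits_behind > max_commits or hours_behind > max_hours


def _git(*args: str) -> str:
    """Run a git command and return its trimmed stdout; raises on failure."""
    result = subprocess.run(["git", *args], check=True, capture_output=True, text=True)
    return result.stdout.strip()


def measure_staleness(base: str = "origin/main", now: Optional[float] = None) -> tuple[int, float]:
    """Count commits on base missing from HEAD and the age in hours of the oldest one."""
    now = time.time() if now is None else now
    behind = int(_git("rev-list", "--count", f"HEAD..{base}"))
    if behind == 0:
        return 0, 0.0
    # --reverse lists oldest first; %ct is the committer timestamp in Unix seconds.
    oldest = _git("rev-list", "--reverse", f"HEAD..{base}").splitlines()[0]
    committed_at = int(_git("show", "-s", "--format=%ct", oldest))
    return behind, (now - committed_at) / 3600


def main() -> int:
    """Exit status for CI: non-zero fails the required check and blocks merge."""
    behind, hours = measure_staleness()
    if is_stale(behind, hours):
        print(f"Branch is {behind} commits / {hours:.1f}h behind origin/main; "
              "rebase onto origin/main before merging.", file=sys.stderr)
        return 1
    print(f"OK: {behind} commits / {hours:.1f}h behind origin/main.")
    return 0
```

A CI step would call sys.exit(main()) after a checkout with full history, for example actions/checkout with fetch-depth: 0. A shallow clone cannot reliably count commits behind origin/main.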
| Phase | What | Gate |
|---|---|---|
| Immediate | Kristerpher enables required-reviewer gate on production Environment per #1202 runbook | Operator action only; no PR needed |
| Sprint 1 | H2: post-merge prod checklist runbook + link in approval notification | Sub-card #N1 |
| Sprint 1 | H3: flag-promotion-pending dashboard widget | Sub-card #N2 |
| Sprint 1 | H4: stale-branch CI guard | Sub-card #N3 |
| After Sprint 1 | Measure: code drift metric, flag drift count, asset hash divergence | Console dashboard or Slack report |
None block implementation. One for Kristerpher's awareness:
Fully analyzed in §4. Rejected. Long-lived branches worsen the drift problem, multiply merge cost, and add cross-agent coordination complexity without providing any safety property that trunk-based + the H1–H4 hardening plan does not already provide.
production branch
A long-lived production branch (GitLab Flow variant) was evaluated in docs/architecture/branch-promotion-strategy.md as Option A and rejected there. The reasoning holds here: it adds branch-protection ceremony and a second merge event per release with no safety signal beyond what the approval gate (H1) already provides. Specifically, agent PRs targeting main would need to be manually re-promoted to production. That is the exact coordination step that is missing today, and adding a branch makes it harder, not easier, to see what is pending.
Evaluated and rejected in ADR-0020. Still rejected here. Automated prod deploys without a human checkpoint are not appropriate for the current pre-launch posture. The operator's approval is the signal; a timer is not a substitute.
These are scoped for feature-developer. Do not claim until card-groomer has processed them.
| Card | Title | Depends on |
|---|---|---|
| #N1 | Write post-merge production-state checklist runbook + wire link into prod approval notification | #1202 (runbook already exists for deploy gate) |
| #N2 | Console dashboard: flag-promotion-pending widget (count badge + list view) | ADR-0035, console_flag_promotions table (#552) |
| #N3 | CI: stale-branch guard (fail PR if branch is > N commits or 48h behind main) | None |
Operator action (not a card): enable the required-reviewer gate on the production GitHub Environment per the runbook in #1202. This takes under two minutes and completes H1 immediately.