Raxx · internal docs


ADR 0050 — Trunk-based SDLC affirmed; Gitflow rejected; hardening plan for drift and revert friction

Status: Accepted
Date: 2026-05-06 UTC
Refs: ADR-0020, ADR-0028, ADR-0035, docs/architecture/branch-promotion-strategy.md, operator complaint 2026-05-06


TL;DR

Trunk-based stays. Gitflow is rejected — long-lived branches worsen the drift problem rather than solving it, and they multiply merge cost on a solo-operator + parallel-agent fleet. The observed friction (favicon revert, card-details-popup missing from prod, staging/prod flag lag) traces to three specific gaps: no required-reviewer gate on the production Environment (toggle available since #1202 shipped the runbook), no post-merge production-state checklist to catch static asset and flag divergence before the operator notices it live, and no dashboard surface showing which flags are staging-on / prod-off. These gaps are closed by the hardening plan in this ADR — not by switching branching models.


1. Context

Raxx is operated by a solo founder (Kristerpher) with a fleet of parallel agent-produced PRs merging to a single main branch. As of 2026-05-06, the operator reports:

The question raised: would Gitflow reduce this friction?

Raxx-specific context that matters for this decision


2. Invariants

These apply to this ADR and any sub-cards it produces:


3. Decision

Retain trunk-based development. Reject Gitflow.

Harden the trunk model with four targeted fixes that close the diagnosed root causes.


4. Gitflow rejection

Decision matrix

| Gitflow benefit | Real cost in Raxx's situation | Failure mode this creates |
| --- | --- | --- |
| develop branch isolates in-progress work from main | All agent PRs already branch from main and are individually gated by CI. There is no un-gated code accumulating on main. | Agents targeting develop instead of main would violate feedback_pr_base_main.md (see incidents #330/#457). PRs would need re-targeting for every session. |
| Long-lived release/* branches allow last-minute stabilization | Raxx has one release stream and no release team. Release-please already owns the tag. A release/* branch gives the operator a second branch to watch with no additional safety signal. | Merge conflicts between develop → release/* and release/* → main compound the drift problem; two extra merge surfaces for the same code. |
| hotfix/* branches isolate prod fixes from in-progress work | The hotfix path under trunk-based is: merge to main, approve gate immediately. This takes minutes. Gitflow hotfix branches require cherry-picks from main → develop and from main → the hotfix branch, both of which can conflict. | Cherry-pick conflicts delay hotfixes. On a solo+agent setup, the agent that produced the hotfix is likely not the agent that opened the backport PR. Cross-agent cherry-pick coordination is a new failure surface. |
| Explicit integration gate via develop → release PR | CI already gates every PR to main. Staging auto-deploys on merge and provides an integration environment. Adding a develop → release PR gate would be a third merge event for code already tested twice. | Increases merge count per feature by 3x with no new information (the CI gate and staging soak are already happening). |
| Visual separation of what's "ready" vs "in progress" | GitHub PR labels (status: staging-verified, status: ready-for-prod) and the flag promotion queue (ADR-0035) provide this without branch topology changes. | Branch proliferation that agents do not clean up becomes stale branch noise in the remote. Each stale branch is a merge conflict surface for the next agent session. |

Summary: Gitflow's benefits exist for multi-team, multi-release-stream organizations where main would otherwise be unusable as a stable base. None of those conditions hold for Raxx. Every Gitflow benefit Raxx would get is already provided by the existing CI gate + staging soak + ADR-0028 friction model. The costs are all real and specific to the solo+agent setup.


5. Root cause diagnosis

The reported friction does not trace to the branching model. It traces to three distinct gaps in the existing trunk model, plus one contributing factor (Gap 4):

Gap 1 — Required-reviewer gate on production Environment is not yet toggled

ADR-0020 selected the tag + environment approval gate model. #1202 shipped the runbook documenting exactly how to enable it. The toggle has not been set. Without the gate, staging-verified code can reach prod without a deliberate operator checkpoint between "staging looks good" and "prod updated." The favicon revert is consistent with a staging deploy overwriting a prod state that was never explicitly confirmed by the operator.

Root cause: configuration gap, not a branching-model gap.

Gap 2 — No post-merge production-state checklist

After a PR merges and staging deploys, there is no structured step that asks: "Did this change affect static assets? Does prod still have the right asset hashes? Are any flags that are staging-on but prod-off blocking this feature?" The card-details popup stall is consistent with a feature that passed staging CI but depends on a flag that was never promoted to prod (ADR-0035 describes exactly this failure mode).

Root cause: process gap, not a branching-model gap.

Gap 3 — No at-a-glance flag-drift surface

The operator cannot see, at a glance, how many and which flags are staging-on / prod-off. The promotion queue exists in the database (ADR-0035) but has no dashboard widget. Without this, the operator only learns of flag drift when a feature is visibly missing from prod.

Root cause: observability gap, not a branching-model gap.

Gap 4 — Agent rebase hygiene (contributing factor)

When multiple agents open PRs in parallel against main, a slow-moving PR's branch falls behind. If the agent that opened it does not rebase before the PR is merged, the merge commit may silently revert changes made by a concurrent PR that already landed. The favicon incident is consistent with this pattern.

Root cause: agent workflow discipline gap, not a branching-model gap. Gitflow does not fix this — it moves the conflict surface from main to develop, where it is equally possible and harder to observe.


6. Drift — measurable definition

Drift is defined as the aggregate gap between main's current state and production's observed state across three dimensions:

| Dimension | Metric | Alert threshold |
| --- | --- | --- |
| Code drift | Hours since last prod deploy vs. timestamp of last merge to main | > 48 hours without a prod deploy following any merge triggers a console warning |
| Flag drift | Count of flags where staging_enabled = true AND prod_enabled = false | > 0 flags pending promotion for > 24 hours triggers a console badge |
| Static asset drift | Count of files under dist/ or CF Pages asset manifest where prod content-hash != main's last-built hash | > 0 divergent asset hashes after a prod deploy is a deploy verification failure |

These three metrics are observable without Gitflow. A Gitflow develop branch adds a fourth metric (hours since develop was merged to release) without reducing the other three.
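The three thresholds can be evaluated mechanically once the inputs are collected. The following is a minimal sketch, not the console's implementation: the DriftSnapshot type, its field names, and the result keys are all hypothetical, and "following any merge" is read here as "measured from the oldest merge that prod has not yet received".

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class DriftSnapshot:
    # All field names are hypothetical; they are not an existing Raxx schema.
    # None means prod already contains every merge on main.
    oldest_undeployed_merge_at: Optional[datetime]
    # Age of each flag that is staging-on / prod-off.
    pending_flag_ages: List[timedelta] = field(default_factory=list)
    # Files whose prod content-hash differs from main's last-built hash.
    divergent_asset_count: int = 0


def evaluate_drift(snap: DriftSnapshot, now: datetime) -> dict:
    """Apply the three alert thresholds from the drift definition."""
    code_drift = (
        snap.oldest_undeployed_merge_at is not None
        and now - snap.oldest_undeployed_merge_at > timedelta(hours=48)
    )
    flag_drift = any(age > timedelta(hours=24) for age in snap.pending_flag_ages)
    asset_drift = snap.divergent_asset_count > 0
    return {
        "code_drift_warning": code_drift,
        "flag_drift_badge": flag_drift,
        "asset_drift_failure": asset_drift,
    }


now = datetime(2026, 5, 6, 12, 0, tzinfo=timezone.utc)
snap = DriftSnapshot(
    oldest_undeployed_merge_at=now - timedelta(hours=50),
    pending_flag_ages=[timedelta(hours=2), timedelta(hours=30)],
    divergent_asset_count=0,
)
print(evaluate_drift(snap, now))
```

Keeping the evaluation as a pure function of a snapshot means the same logic could back either a console dashboard or a Slack report.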


7. Hardening plan

H1 — Enable required-reviewer gate on production Environment

The runbook from #1202 documents the exact GitHub UI path. This is a settings toggle, zero YAML changes. Effect: every prod deploy (code or flag) pauses for a deliberate Kristerpher approval before proceeding.

Owner: operator (Kristerpher, settings toggle only). No sub-card needed — this is a one-minute action against the runbook in #1202.

H2 — Post-merge production-state checklist runbook

A runbook (not automated enforcement — a checklist the operator runs before approving the prod gate) that covers:

  1. Static asset diff. Compare the CF Pages asset manifest for the staging alias vs. the production alias. Any file where the content-hash differs is a candidate for silent revert. Flag these before approving.
  2. Flag-promotion-pending list. Open /flags/promotions in the console (or query console_flag_promotions WHERE state = 'pending'). Any feature that is staging-enabled but not prod-enabled must be a deliberate choice, not an oversight.
  3. Env-var seed check. For any PR that adds a new env var, confirm the var is seeded in both the Heroku prod config and SSM (per feedback_aws_workloads_use_ssm_not_vault.md) before approving the deploy.
  4. Deploy-status verify. After the prod deploy completes, confirm the Heroku release is green (heroku releases --app raxx-console-prod | head -3) and the smoke suite passes.

This runbook lives at docs/runbooks/post-merge-prod-checklist.md and is referenced from the prod deploy approval notification.

Sub-card needed: file a card for feature-developer to write the runbook and wire a link to it into the prod approval notification step.
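Step 1 of the checklist (static asset diff) reduces to comparing two path-to-hash mappings. The sketch below assumes each alias's manifest has already been loaded into a dict of {path: content_hash}; how the manifest is fetched from CF Pages is out of scope here, and the function name and result keys are illustrative only.

```python
from typing import Dict, List


def diff_asset_manifests(
    staging: Dict[str, str], prod: Dict[str, str]
) -> Dict[str, List[str]]:
    """Compare two {path: content_hash} manifests (hypothetical shape)."""
    shared = staging.keys() & prod.keys()
    return {
        # Same path, different content: candidates for a silent revert.
        "changed": sorted(p for p in shared if staging[p] != prod[p]),
        # Present only on the staging alias.
        "only_staging": sorted(staging.keys() - prod.keys()),
        # Present only on the production alias.
        "only_prod": sorted(prod.keys() - staging.keys()),
    }


staging_manifest = {"favicon.ico": "a1", "app.js": "b2", "new.css": "c3"}
prod_manifest = {"favicon.ico": "zz", "app.js": "b2", "old.css": "d4"}
print(diff_asset_manifests(staging_manifest, prod_manifest))
```

In this example favicon.ico lands in "changed", which is the kind of entry the operator would flag before approving the prod gate.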

H3 — Flag-promotion-pending dashboard widget

A console dashboard widget (read-only, no action in v1) that shows:

This gives the operator an at-a-glance view of drift without navigating to /flags/promotions. It surfaces the "card-details popup never reached prod" class of failure before the operator notices it in the live product.

Sub-card needed: file a card for feature-developer to add the dashboard widget.
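As a rough illustration of the data such a widget might render (a count badge plus a list view, per the action items), the sketch below summarizes pending promotions. Only the state = 'pending' filter comes from this ADR; the row shape and the flag_key and requested_at column names are assumptions, not the actual console_flag_promotions schema.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def pending_promotion_summary(
    rows: List[Dict[str, Any]], now: datetime
) -> Dict[str, Any]:
    """Build a read-only summary: badge count plus oldest-first list."""
    # 'flag_key' and 'requested_at' are hypothetical column names.
    pending = sorted(
        (r for r in rows if r["state"] == "pending"),
        key=lambda r: r["requested_at"],
    )
    items = [
        {
            "flag_key": r["flag_key"],
            "pending_hours": round(
                (now - r["requested_at"]).total_seconds() / 3600, 1
            ),
        }
        for r in pending
    ]
    return {"count": len(items), "items": items}


now = datetime(2026, 5, 6, 12, 0, tzinfo=timezone.utc)
rows = [
    {"flag_key": "card_details_popup", "state": "pending",
     "requested_at": now - timedelta(hours=30)},
    {"flag_key": "new_nav", "state": "promoted",
     "requested_at": now - timedelta(hours=5)},
    {"flag_key": "beta_banner", "state": "pending",
     "requested_at": now - timedelta(hours=3)},
]
print(pending_promotion_summary(rows, now))
```

Sorting oldest-first puts the flags most likely to have been forgotten at the top of the list.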

H4 — Agent rebase hygiene enforcement

Two changes:

  1. Stale-branch guard in CI. A workflow step that fails the PR if the feature branch is more than N commits behind main at the time the PR is opened for review. Threshold: 10 commits or 48 hours — whichever is larger. This forces the agent (or operator) to rebase before the PR can merge.
  2. Agent session convention. Each agent that opens a PR should rebase onto origin/main immediately before pushing the final commit. This is already implicit in feedback_commit_agent_docs_immediately.md (commit before any rebase storm) but needs to be a named step in the agent workflow conventions.

Sub-card needed: file a card for feature-developer to add the stale-branch CI guard workflow step.
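The decision logic of the stale-branch guard can be sketched independently of the CI wiring. This is an illustration under stated assumptions: the function name and parameters are hypothetical, and how the commit and hour limits combine is a choice made here (default: either limit exceeded means stale), with require_both=True as the more lenient alternative. The commit count could come from a command such as git rev-list --count HEAD..origin/main run in the PR checkout.

```python
def is_branch_stale(
    commits_behind: int,
    hours_behind: float,
    max_commits: int = 10,
    max_hours: float = 48.0,
    require_both: bool = False,
) -> bool:
    """Return True when the guard should fail the PR.

    commits_behind: commits on main that the branch does not contain.
    hours_behind: hours since the branch last incorporated main.
    require_both: hypothetical switch for how the two limits combine.
    """
    over_commits = commits_behind > max_commits
    over_hours = hours_behind > max_hours
    if require_both:
        return over_commits and over_hours
    return over_commits or over_hours


# A branch 12 commits behind but only 6 hours old.
print(is_branch_stale(commits_behind=12, hours_behind=6.0))
print(is_branch_stale(commits_behind=12, hours_behind=6.0, require_both=True))
```

Exposing the limits as parameters keeps the threshold adjustable after Sprint 1 without touching the decision logic.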


8. Rollout plan

| Phase | What | Gate |
| --- | --- | --- |
| Immediate | Kristerpher enables required-reviewer gate on production Environment per #1202 runbook | Operator action only; no PR needed |
| Sprint 1 | H2: post-merge prod checklist runbook + link in approval notification | Sub-card #N1 |
| Sprint 1 | H3: flag-promotion-pending dashboard widget | Sub-card #N2 |
| Sprint 1 | H4: stale-branch CI guard | Sub-card #N3 |
| After Sprint 1 | Measure: code drift metric, flag drift count, asset hash divergence | Console dashboard or Slack report |

9. Security considerations


10. Open questions

None block implementation. One for Kristerpher's awareness:

  1. Stale-branch threshold. The 10-commit / 48-hour threshold in H4 is a starting point. If the agent fleet is producing PRs faster than expected, 10 commits may be too tight and cause excessive forced-rebases. Adjust after Sprint 1 observability.

11. Alternatives considered

Gitflow

Fully analyzed in §4. Rejected. Long-lived branches worsen the drift problem, multiply merge cost, and add cross-agent coordination complexity without providing any safety property that trunk-based + the H1–H4 hardening plan does not already provide.

GitLab Flow with production branch

A long-lived production branch (GitLab Flow variant) was evaluated in docs/architecture/branch-promotion-strategy.md as Option A and rejected there. The reasoning holds here: it adds a branch-protection ceremony and a second merge event per release with no additional safety signal beyond what the approval gate (H1) already provides. Specifically, agent PRs targeting main would need to be manually re-promoted to production. That is the exact coordination step that is missing today, and adding a branch makes it harder, not easier, to see what is pending.

Automated soak timer (branch-promotion-strategy Option C)

Evaluated and rejected in ADR-0020. Still rejected here. Automated prod deploys without a human checkpoint are not appropriate for the current pre-launch posture. The operator's approval is the signal; a timer is not a substitute.


Action items (sub-cards to file)

These are scoped for feature-developer. Do not claim until card-groomer has processed them.

| Card | Title | Depends on |
| --- | --- | --- |
| #N1 | Write post-merge production-state checklist runbook + wire link into prod approval notification | #1202 (runbook already exists for deploy gate) |
| #N2 | Console dashboard: flag-promotion-pending widget (count badge + list view) | ADR-0035, console_flag_promotions table (#552) |
| #N3 | CI: stale-branch guard (fail PR if branch is > N commits or 48h behind main) | None |

Operator action (not a card): enable the required-reviewer gate on the production GitHub Environment per the runbook in #1202. This takes under two minutes and unblocks H1 immediately.