Raxx · internal docs

internal · gated

Alembic multi-head resolution — 2026-06-05

System: console (Alembic migration chain)
Owner: sre-agent
Last incident: 2026-06-05
Last reviewed: 2026-06-05

How to tell it's broken

How to diagnose (in order)

  1. alembic heads — lists all current heads; expected: exactly one.
  2. alembic history --verbose — trace the DAG to identify the branch point.
  3. For each head, inspect the migration file: what is its down_revision? If two files share the same down_revision and neither lists the other as a parent, you have a fork.
  4. Cross-reference with open PRs: gh pr list --state open --json number,headRefName then inspect each PR branch's migration files for conflicting down_revision.
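Steps 1 to 3 can be approximated offline, without a database or a working Alembic config, by reading the migration files directly. The following is a minimal sketch, not the team's tooling: the function names are hypothetical, and it assumes every migration assigns `revision` and `down_revision` on a single line with quoted string IDs, as in the merge template in this runbook. `alembic heads` remains the authoritative check.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches `revision = "0146"` or `revision: str = "0146"` at the start of a line.
REV_RE = re.compile(r'^revision\b[^=\n]*=[ \t]*["\']([^"\']+)["\']', re.M)
# Captures the right-hand side of the `down_revision` assignment (single line only).
DOWN_RE = re.compile(r'^down_revision\b[^=\n]*=[ \t]*(.+)$', re.M)
QUOTED_RE = re.compile(r'["\']([^"\']+)["\']')


def parse_migration(text):
    """Return (revision, [parent revisions]) or None if the file is not a migration."""
    rev = REV_RE.search(text)
    down = DOWN_RE.search(text)
    if rev is None or down is None:
        return None
    # `None` yields no parents; a tuple yields several.
    return rev.group(1), QUOTED_RE.findall(down.group(1))


def load_versions(directory):
    """Map revision -> parents for every *.py file in a versions directory."""
    chain = {}
    for path in sorted(Path(directory).glob("*.py")):
        parsed = parse_migration(path.read_text())
        if parsed is not None:
            chain[parsed[0]] = parsed[1]
    return chain


def find_heads(chain):
    """A head is a revision that no other revision lists as a parent."""
    referenced = {p for parents in chain.values() for p in parents}
    return sorted(r for r in chain if r not in referenced)


def find_forks(chain):
    """Return {parent: [children]} for every parent claimed by 2+ migrations."""
    children = defaultdict(list)
    for rev, parents in chain.items():
        for parent in parents:
            children[parent].append(rev)
    return {p: sorted(c) for p, c in children.items() if len(c) > 1}
```

Calling `find_heads(load_versions("console/migrations/versions"))` on a checkout should return a single revision when the chain is healthy. A fork reported by `find_forks` is only a problem while it is unresolved, that is, while `find_heads` still returns more than one revision.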

How this happened (2026-06-05 incident)

Two feature PRs were branched from the same main commit and both claimed down_revision = "0144":

Result: two Alembic heads on main — 0146 and 0148.

Three additional open PRs each also used down_revision = "0144", which would have added three more heads on merge: - PR #3323 (0147) — tax §1256 tagger - PR #3277 (0145) — console billing dashboard - PR #3300 (0149, 0150) — email verification

Known failure modes

Failure mode A: Two heads on main after back-to-back merges

Symptom: alembic heads returns 2+ lines; deploy fails.

Cause: Two PRs branched from the same commit both claim the same down_revision without a rebase/merge between them.

Fix:

  1. Create a merge migration (pure chain node, no DDL):
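# console/migrations/versions/XXXX_merge_<label>.py
"""Merge heads HEAD_A and HEAD_B (chain node only, no DDL)."""
from typing import Union

# Replace XXXX, HEAD_A and HEAD_B with the real revision IDs.
revision: str = "XXXX"
down_revision: Union[tuple, None] = ("HEAD_A", "HEAD_B")
branch_labels = None
depends_on = None

def upgrade() -> None:
    # Intentionally empty: this revision only joins the two heads.
    pass

def downgrade() -> None:
    pass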
  2. Set revision to the next available slot: check alembic heads and the migration filenames, including migrations in open PRs, and pick the next sequential number not yet in use.

  3. Commit the file on a new ops branch and open a PR against main.

  4. For every open PR whose down_revision points at one of the now-merged heads or at an earlier revision, update it to point at the new merge revision, then force-push.
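Picking the next free revision slot is easy to get wrong when open PRs have already claimed numbers that are not on main yet. A minimal sketch of that check follows. It assumes migration filenames start with a zero-padded numeric prefix followed by an underscore (as in 0151_merge_tax_heads.py); the function name and the sample filenames are hypothetical.

```python
import re


def next_revision_slot(filenames, width=4):
    """Return the next unused sequential revision number as a zero-padded string.

    `filenames` should include migration files from main AND from open PR
    branches, otherwise a number already claimed by a PR may be reused.
    """
    used = set()
    for name in filenames:
        match = re.match(r"(\d+)_", name)
        if match:
            used.add(int(match.group(1)))
    return str(max(used, default=0) + 1).zfill(width)


# Illustrative filenames only; the labels after the number are made up.
on_main = ["0144_base.py", "0146_feature_a.py", "0148_feature_b.py"]
in_open_prs = ["0145_x.py", "0147_y.py", "0149_z.py", "0150_z2.py"]
print(next_revision_slot(on_main + in_open_prs))
```

With only the files on main the function would return 0149, which an open PR has already claimed; passing both lists gives 0151, matching the slot used in this incident.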

Verification:

alembic heads   # must return exactly 1 line
alembic upgrade head --sql | head  # dry-run; must not error

Failure mode B: Open PR adds a new head on merge

Symptom: alembic heads returns 1 line today, but a PR in review has down_revision pointing at an older revision (not the current head).

Cause: The PR was branched before recent merges and not rebased.

Fix: In the PR branch, update the migration file's down_revision to the current single head, then force-push. If the PR already has a worktree isolation context, do the edit there and push.
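The down_revision edit itself is a one-line change, so it can be scripted when several PRs need re-pointing at once. The sketch below is an assumption-laden illustration rather than an existing helper: `repoint_down_revision` is a hypothetical name, and it only handles a single-parent down_revision assigned on one line.

```python
import re

# Group 1 keeps everything up to and including the `=`, so any type
# annotation on the assignment is preserved.
DOWN_LINE_RE = re.compile(r'^(down_revision\b[^=\n]*=[ \t]*).*$', re.M)


def repoint_down_revision(source, new_parent):
    """Return `source` with its down_revision replaced by `new_parent`."""
    new_source, replaced = DOWN_LINE_RE.subn(
        lambda m: f'{m.group(1)}"{new_parent}"', source, count=1
    )
    if replaced == 0:
        raise ValueError("no down_revision assignment found")
    return new_source


# Illustrative migration header, shaped like the template in this runbook.
before = (
    'revision: str = "0147"\n'
    'down_revision: Union[str, None] = "0144"\n'
    "branch_labels = None\n"
)
print(repoint_down_revision(before, "0151"))
```

Write the result back to the migration file in the PR branch, commit, and force-push as described above; the verification step below still applies.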

Verification:

git show origin/<branch>:console/migrations/versions/<file>.py | grep down_revision

Should show the current head revision ID.

Failure mode C: CI does not catch multi-head before deploy

Symptom: The two-head state landed on main without a gate.

Cause: No CI step runs alembic heads and fails the build on count > 1.

Fix (permanent — issue #3325): Add a CI lint job:

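set -eu
# Capture the output first so a failing alembic call aborts the job instead of counting as zero heads.
heads=$(alembic -c console/migrations/alembic.ini heads)
# Count non-empty lines; grep -c exits 1 when nothing matches, hence || true.
count=$(printf '%s\n' "$heads" | grep -c . || true)
if [ "$count" -gt 1 ]; then
  echo "ERROR: $count Alembic heads detected; merge required before deploy"
  exit 1
fi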

Wire this into the migration-lint or pre-deploy job.
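If the migration-lint job is driven from Python rather than shell, the same gate could look roughly like the sketch below. It is not the implementation tracked in issue #3325: the function names are hypothetical, and it assumes `alembic heads` prints one non-empty line per head on stdout (for example `0146 (head)`).

```python
import subprocess
import sys


def count_heads(heads_output: str) -> int:
    """Count non-empty lines in the stdout of `alembic heads`."""
    return sum(1 for line in heads_output.splitlines() if line.strip())


def check_single_head(config_path: str = "console/migrations/alembic.ini") -> int:
    """Return a process exit code: 0 if at most one head exists, 1 otherwise."""
    # check=True raises CalledProcessError if alembic itself fails, so a
    # broken config cannot be mistaken for "zero heads, all good".
    result = subprocess.run(
        ["alembic", "-c", config_path, "heads"],
        capture_output=True,
        text=True,
        check=True,
    )
    count = count_heads(result.stdout)
    if count > 1:
        print(
            f"ERROR: {count} Alembic heads detected; merge required before deploy",
            file=sys.stderr,
        )
        return 1
    return 0
```

A CI entry point would call `sys.exit(check_single_head())`. Keeping `count_heads` separate from the subprocess call makes the counting logic testable without Alembic installed.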

2026-06-05 resolution

Merge migration created: 0151_merge_tax_heads.py

  - down_revision = ("0146", "0148")
  - No DDL, pure chain node
  - Merged via PR #3326 (ops/alembic-merge-heads-2026-06-05)

Open PRs updated (down_revision → "0151"):

PR     Branch                                      File  Old down_revision  New down_revision
#3323  feature/tax-s1256-tagger                    0147  0144               0151
#3277  feature/console-billing-customer-dashboard  0145  0144               0151
#3300  feature/email-verification-e2e-3272         0149  0144               0151

Recommended merge order:

  1. PR #3326 (merge node; must land first).
  2. PR #3277, #3300, #3323 in any order. All three chain off 0151, so as soon as one of them lands, each of the other two would add a new head on merge. Merge them in quick succession, or open additional merge nodes as needed.
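The caveat about the follow-up PRs is easier to see with a small worked model of the chain. This is an illustrative sketch only: the dictionary is a simplified picture of the revisions named in this incident (everything before 0144 is omitted), and `heads` is a hypothetical helper, not an Alembic API.

```python
def heads(chain):
    """A head is a revision that no other revision lists as a parent."""
    referenced = {p for parents in chain.values() for p in parents}
    return sorted(r for r in chain if r not in referenced)


# State of main once the merge node from PR #3326 has landed.
on_main = {
    "0144": (),
    "0146": ("0144",),
    "0148": ("0144",),
    "0151": ("0146", "0148"),
}
print(heads(on_main))  # single head: 0151

# PR #3277 lands first with down_revision = "0151".
on_main["0145"] = ("0151",)
print(heads(on_main))  # still a single head: 0145

# PR #3323 lands unchanged, still pointing at 0151: the chain forks again.
forked = dict(on_main, **{"0147": ("0151",)})
print(heads(forked))  # two heads: 0145 and 0147

# Re-pointing PR #3323 at the current head (Failure mode B fix) avoids the fork.
linear = dict(on_main, **{"0147": ("0145",)})
print(heads(linear))  # single head: 0147
```

The model shows why the order among the three PRs does not matter but the gap between them does: every PR that lands moves the head, and any PR still pointing at 0151 after that needs either a new down_revision or its own merge node.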

Emergency stop

If a broken migration has been applied to a database:

alembic downgrade -1   # step back one revision

For multiple steps, repeat the command or use alembic downgrade <target_revision>. The migrations in this incident contain no schema DDL (flag promotion rows only), so downgrading them is safe.

Escalation

Escalate to operator if:

  - A migration with real DDL (schema changes) is involved in the multi-head
  - alembic downgrade fails with a database error
  - The merge node produces a cycle in the DAG (alembic history shows a loop)

Action items from this incident

#  Action                                                                               Owner                          Due         Issue
1  Add alembic heads count gate to CI migration-lint job                                sre-agent / feature-developer  2026-06-12  #3325
2  Add PR template checklist item: "down_revision set to current alembic heads output"  operator                       2026-06-12  #3325
3  Wire Sentry / CI alert if deploy job fails with multi-head error                     sre-agent                      2026-06-19  #3325