Migration-numbering runbook
System: Console (Alembic), Raptor (Alembic), Queue (sqitch)
Owner: sre-agent / operator (Kristerpher)
Last incident: 2026-06-05 — three parallel feature-dev PRs (#3277, #3322, #3323) all claimed Console migration 0145
Last reviewed: 2026-06-17
What this runbook covers
- How sequential migration numbers are assigned across the three migration chains
- What the
migration-collision-checkCI job detects and why - How to renumber a file when a collision is detected
- How to diagnose a collision that slipped through to main
For multi-head Alembic recovery (the downstream consequence of a merged collision),
see docs/ops/runbooks/console-migrations.md §Multi-head recovery.
The three migration chains
| Chain | Directory | Filename pattern | Sequence key |
|---|---|---|---|
| Console (Alembic) | console/migrations/versions/ |
NNNN_<slug>.py |
4-digit zero-padded prefix |
| Raptor (Alembic) | backend_v2/alembic/versions/ |
NNNN_<slug>.py |
4-digit zero-padded prefix |
| Queue (sqitch) | queue/migrations/sqitch/sqitch.plan |
NN-<slug> change entry |
2-digit prefix in plan file |
Each chain's sequence numbers must be globally unique within that chain. Two PRs claiming the same number in the same chain is a migration-number collision.
Why collisions happen
All feature-dev agents branch from main at dispatch time. Each agent picks the
next available number with:
ls console/migrations/versions/ | sort | tail -1
When two or more agents dispatch in the same wave, they all see the same "last number on main" and claim the same next number. The first PR to merge wins; the second breaks the Alembic chain and requires a renumber hotfix.
How the CI guard works
.github/workflows/migration-collision-check.yml runs on every PR that touches:
- console/migrations/versions/**
- backend_v2/alembic/versions/**
- queue/migrations/sqitch/sqitch.plan
It calls scripts/ci/check_migration_collision.py, which:
- Diffs this PR against its base SHA to find new migration files / sqitch entries.
- Extracts the sequence number from each (the leading
NNNNorNNprefix). - Calls
gh pr list --json number,filesto get all other open PRs' changed files. - Detects any other open PR with the same sequence number in the same chain.
- If a collision is found: exits 1 and posts a PR comment naming the colliding PR and the next available number.
The check completes in under 15 seconds (no DB, no Docker) and runs as a
first-pass gate before the heavier migration-gate job.
Running the guard locally
# From repo root. Replace <base-sha> with origin/main or the PR base commit.
python3 scripts/ci/check_migration_collision.py \
--base-sha origin/main \
--pr-number 0
No --post-comment flag in local runs — comment posting requires GH_TOKEN.
Expected clean output:
check_migration_collision: base='origin/main' pr=0
Found 1 new migration(s) in this PR:
[console] 0193 — console/migrations/versions/0193_promote_my_feature.py
Found 0 migration file(s) across 0 other open PR(s).
No migration-number collisions detected. OK.
Renumber procedure (when a collision is detected)
Follow this checklist when the migration-collision-check job fails OR when two
open PRs have the same migration number and you need to renumber one.
1. Identify the next available number
The CI comment tells you. Or compute manually:
# Console — find highest number currently on main
ls console/migrations/versions/ | grep -E '^\d{4}_' | sort | tail -1
# Raptor
ls backend_v2/alembic/versions/ | grep -E '^\d{4}_' | sort | tail -1
# Queue — find highest sqitch entry
grep -E '^\d{2,4}-' queue/migrations/sqitch/sqitch.plan | sort | tail -1
Check all open PRs on GitHub for any intermediate numbers already claimed but not yet merged. Use the next unclaimed number.
2. Alembic (Console or Raptor) — rename + update internals
# Example: rename 0145_promote_my_feature.py → 0193_promote_my_feature.py
cd console/migrations/versions
git mv 0145_promote_my_feature.py 0193_promote_my_feature.py
Open the renamed file and update two constants:
revision: str = "0193" # was "0145"
down_revision: Union[str, None] = "0192" # was whatever the old predecessor was
Also update _FLAG if this is a B1 promotion migration — the B1 gate's regex
looks for it within 500 chars of _INSERT_SQL.
Verify the chain is linear:
bash scripts/ci/check_alembic_single_head.sh --console-only
# Expected: Console: OK (exactly 1 head)
3. Sqitch — rename the entry in the plan
Sqitch change entries are inside queue/migrations/sqitch/sqitch.plan. You
cannot rename a sqitch change the same way as an Alembic file — sqitch tracks
deploy state by change name. If the change has not yet been deployed:
# Old entry
07-billing-invoice-refund-column 2026-06-17T00:00:00Z raxx-dev-bot <ops@raxx.app>
# New entry (rename to avoid the collision)
08-billing-invoice-refund-column 2026-06-17T00:00:00Z raxx-dev-bot <ops@raxx.app>
Also rename the corresponding SQL files:
cd queue/migrations/sqitch
mv deploy/07-billing-invoice-refund-column.sql deploy/08-billing-invoice-refund-column.sql
mv revert/07-billing-invoice-refund-column.sql revert/08-billing-invoice-refund-column.sql
mv verify/07-billing-invoice-refund-column.sql verify/08-billing-invoice-refund-column.sql
If the change has already been deployed to any environment, escalate to the operator — sqitch tracks deployed state by change name and renaming a deployed change requires manual sqitch state surgery.
4. Post-rename validation
# Guard script — should pass now
python3 scripts/ci/check_migration_collision.py --base-sha origin/main --pr-number 0
# Alembic single-head assert (no DB needed)
bash scripts/ci/check_alembic_single_head.sh
# Sqitch plan lint (catches bracket-timestamp and format errors)
bash scripts/ci/validate-sqitch-plan.sh queue/migrations/sqitch/sqitch.plan
5. Push + watch CI
Push the renamed file(s) to the PR branch. The migration-collision-check job
should now pass. The migration-gate job runs the full upgrade/downgrade
round-trip as a final correctness check.
Collision slipped to main — recovery
If a collision merged (second PR with same number lands on main before the guard was in place, or with a bypass):
- Detect: the
alembic-heads-gateCI job on any subsequent migration-touching PR will fail with "Console alembic has 2 heads". Or run locally:
bash
cd console && alembic -c migrations/alembic.ini heads
- Identify the two files sharing a
down_revision:
bash
grep -rn "^down_revision" console/migrations/versions/ | sort -t'"' -k2
The duplicated value appears twice.
-
Renumber the later file (the one from the PR that merged second), using the procedure in §2 above. Renumber to the next available number after the current head.
-
File a hotfix PR with the rename. Do not deploy until CI is green.
Full multi-head recovery steps are in docs/ops/runbooks/console-migrations.md
§Multi-head recovery.
CI coverage summary
| Job | Workflow | Trigger | What it checks |
|---|---|---|---|
migration-collision-check |
migration-collision-check.yml |
PR opens / pushes | Sequence-number uniqueness across open PRs (fast, no DB) |
alembic-heads-gate |
ci-pr.yml |
PR touches migration dirs | Single-head assertion (fast, no DB) |
migration-gate |
ci-pr.yml |
PR touches migration dirs | Full upgrade/downgrade round-trip (Postgres) |
The three jobs are complementary:
- migration-collision-check catches the collision before it merges.
- alembic-heads-gate catches it after a second PR merges (if the first was
already on main and the guard was bypassed).
- migration-gate provides the full correctness guarantee (actual SQL execution).
Related references
| Reference | Description |
|---|---|
| Issue #3325 | Feature request for this guard |
| PR #3277 | First PR to claim 0145 (merged) — triggered the guard request |
| PR #3322 | Renumbered from 0145 to avoid collision |
| PR #3323 | Renumbered from 0145 to avoid collision |
docs/ops/runbooks/console-migrations.md |
Alembic multi-head recovery |
scripts/ci/check_migration_collision.py |
Guard script |
scripts/ci/validate-sqitch-plan.sh |
Sqitch plan format linter |
scripts/ci/check_alembic_single_head.sh |
Alembic heads gate |