Raxx · internal docs

internal · gated

Migration-numbering runbook

System: Console (Alembic), Raptor (Alembic), Queue (sqitch) Owner: sre-agent / operator (Kristerpher) Last incident: 2026-06-05 — three parallel feature-dev PRs (#3277, #3322, #3323) all claimed Console migration 0145 Last reviewed: 2026-06-17


What this runbook covers

For multi-head Alembic recovery (the downstream consequence of a merged collision), see docs/ops/runbooks/console-migrations.md §Multi-head recovery.


The three migration chains

Chain Directory Filename pattern Sequence key
Console (Alembic) console/migrations/versions/ NNNN_<slug>.py 4-digit zero-padded prefix
Raptor (Alembic) backend_v2/alembic/versions/ NNNN_<slug>.py 4-digit zero-padded prefix
Queue (sqitch) queue/migrations/sqitch/sqitch.plan NN-<slug> change entry 2-digit prefix in plan file

Each chain's sequence numbers must be globally unique within that chain. Two PRs claiming the same number in the same chain is a migration-number collision.


Why collisions happen

All feature-dev agents branch from main at dispatch time. Each agent picks the next available number with:

ls console/migrations/versions/ | sort | tail -1

When two or more agents dispatch in the same wave, they all see the same "last number on main" and claim the same next number. The first PR to merge wins; the second breaks the Alembic chain and requires a renumber hotfix.


How the CI guard works

.github/workflows/migration-collision-check.yml runs on every PR that touches: - console/migrations/versions/** - backend_v2/alembic/versions/** - queue/migrations/sqitch/sqitch.plan

It calls scripts/ci/check_migration_collision.py, which:

  1. Diffs this PR against its base SHA to find new migration files / sqitch entries.
  2. Extracts the sequence number from each (the leading NNNN or NN prefix).
  3. Calls gh pr list --json number,files to get all other open PRs' changed files.
  4. Detects any other open PR with the same sequence number in the same chain.
  5. If a collision is found: exits 1 and posts a PR comment naming the colliding PR and the next available number.

The check completes in under 15 seconds (no DB, no Docker) and runs as a first-pass gate before the heavier migration-gate job.


Running the guard locally

# From repo root. Replace <base-sha> with origin/main or the PR base commit.
python3 scripts/ci/check_migration_collision.py \
  --base-sha origin/main \
  --pr-number 0

No --post-comment flag in local runs — comment posting requires GH_TOKEN.

Expected clean output:

check_migration_collision: base='origin/main'  pr=0
Found 1 new migration(s) in this PR:
  [console] 0193 — console/migrations/versions/0193_promote_my_feature.py
Found 0 migration file(s) across 0 other open PR(s).
No migration-number collisions detected. OK.

Renumber procedure (when a collision is detected)

Follow this checklist when the migration-collision-check job fails OR when two open PRs have the same migration number and you need to renumber one.

1. Identify the next available number

The CI comment tells you. Or compute manually:

# Console — find highest number currently on main
ls console/migrations/versions/ | grep -E '^\d{4}_' | sort | tail -1

# Raptor
ls backend_v2/alembic/versions/ | grep -E '^\d{4}_' | sort | tail -1

# Queue — find highest sqitch entry
grep -E '^\d{2,4}-' queue/migrations/sqitch/sqitch.plan | sort | tail -1

Check all open PRs on GitHub for any intermediate numbers already claimed but not yet merged. Use the next unclaimed number.

2. Alembic (Console or Raptor) — rename + update internals

# Example: rename 0145_promote_my_feature.py → 0193_promote_my_feature.py
cd console/migrations/versions
git mv 0145_promote_my_feature.py 0193_promote_my_feature.py

Open the renamed file and update two constants:

revision: str = "0193"                        # was "0145"
down_revision: Union[str, None] = "0192"      # was whatever the old predecessor was

Also update _FLAG if this is a B1 promotion migration — the B1 gate's regex looks for it within 500 chars of _INSERT_SQL.

Verify the chain is linear:

bash scripts/ci/check_alembic_single_head.sh --console-only
# Expected: Console: OK (exactly 1 head)

3. Sqitch — rename the entry in the plan

Sqitch change entries are inside queue/migrations/sqitch/sqitch.plan. You cannot rename a sqitch change the same way as an Alembic file — sqitch tracks deploy state by change name. If the change has not yet been deployed:

# Old entry
07-billing-invoice-refund-column 2026-06-17T00:00:00Z raxx-dev-bot <ops@raxx.app>

# New entry (rename to avoid the collision)
08-billing-invoice-refund-column 2026-06-17T00:00:00Z raxx-dev-bot <ops@raxx.app>

Also rename the corresponding SQL files:

cd queue/migrations/sqitch
mv deploy/07-billing-invoice-refund-column.sql deploy/08-billing-invoice-refund-column.sql
mv revert/07-billing-invoice-refund-column.sql revert/08-billing-invoice-refund-column.sql
mv verify/07-billing-invoice-refund-column.sql verify/08-billing-invoice-refund-column.sql

If the change has already been deployed to any environment, escalate to the operator — sqitch tracks deployed state by change name and renaming a deployed change requires manual sqitch state surgery.

4. Post-rename validation

# Guard script — should pass now
python3 scripts/ci/check_migration_collision.py --base-sha origin/main --pr-number 0

# Alembic single-head assert (no DB needed)
bash scripts/ci/check_alembic_single_head.sh

# Sqitch plan lint (catches bracket-timestamp and format errors)
bash scripts/ci/validate-sqitch-plan.sh queue/migrations/sqitch/sqitch.plan

5. Push + watch CI

Push the renamed file(s) to the PR branch. The migration-collision-check job should now pass. The migration-gate job runs the full upgrade/downgrade round-trip as a final correctness check.


Collision slipped to main — recovery

If a collision merged (second PR with same number lands on main before the guard was in place, or with a bypass):

  1. Detect: the alembic-heads-gate CI job on any subsequent migration-touching PR will fail with "Console alembic has 2 heads". Or run locally:

bash cd console && alembic -c migrations/alembic.ini heads

  1. Identify the two files sharing a down_revision:

bash grep -rn "^down_revision" console/migrations/versions/ | sort -t'"' -k2

The duplicated value appears twice.

  1. Renumber the later file (the one from the PR that merged second), using the procedure in §2 above. Renumber to the next available number after the current head.

  2. File a hotfix PR with the rename. Do not deploy until CI is green.

Full multi-head recovery steps are in docs/ops/runbooks/console-migrations.md §Multi-head recovery.


CI coverage summary

Job Workflow Trigger What it checks
migration-collision-check migration-collision-check.yml PR opens / pushes Sequence-number uniqueness across open PRs (fast, no DB)
alembic-heads-gate ci-pr.yml PR touches migration dirs Single-head assertion (fast, no DB)
migration-gate ci-pr.yml PR touches migration dirs Full upgrade/downgrade round-trip (Postgres)

The three jobs are complementary: - migration-collision-check catches the collision before it merges. - alembic-heads-gate catches it after a second PR merges (if the first was already on main and the guard was bypassed). - migration-gate provides the full correctness guarantee (actual SQL execution).


Reference Description
Issue #3325 Feature request for this guard
PR #3277 First PR to claim 0145 (merged) — triggered the guard request
PR #3322 Renumbered from 0145 to avoid collision
PR #3323 Renumbered from 0145 to avoid collision
docs/ops/runbooks/console-migrations.md Alembic multi-head recovery
scripts/ci/check_migration_collision.py Guard script
scripts/ci/validate-sqitch-plan.sh Sqitch plan format linter
scripts/ci/check_alembic_single_head.sh Alembic heads gate