System: Console Alembic migration gate + Raptor Alembic migration gate (CI job migration-gate in .github/workflows/ci-pr.yml)
Owner: sre-agent / operator
Last incident: 2026-05-11 (gate broken from day one — missing kris@ seed row, missing raptor_app role; fixed in PR #1654 and PR #1656)
Last reviewed: 2026-05-12
The migration-gate CI job (ci-pr.yml, added in PR #1623, closes issue #1552) runs automatically on every PR that touches:
console/migrations/** (Console Alembic tree)backend_v2/alembic/versions/** (Raptor Alembic tree)It spins up an ephemeral postgres:15 service container and runs a full round-trip on both migration trees:
upgrade head → assert single head → downgrade -1 → upgrade head
Each tree gets its own ephemeral database (console_ci, raptor_ci) so there is no schema bleed.
What it catches before merge:
| Failure class | First observed | Example PR |
|---|---|---|
Multiple heads (duplicate down_revision) |
2026-05-10 | #1548, #1550 |
| Postgres-incompatible expression (STABLE vs IMMUTABLE) | 2026-05-10 | #1551 |
Migration that errors on upgrade or downgrade |
2026-05-11 | #1654, #1656 |
| Missing seed data required by a migration | 2026-05-11 | #1654 |
| Missing Postgres role required by a migration GRANT | 2026-05-11 | #1656 |
The gate runs against an ephemeral database with no vault credentials. This bypasses the RDS IAM restriction that prevents CREATE ROLE WITH PASSWORD on Heroku Standard-0 (feedback: heroku_pg_rds_password_gotcha.md).
The ephemeral CI database starts empty. Two categories of "environment prerequisites" that exist in prod but not in a fresh Postgres must be supplied by the gate itself before running alembic upgrade head.
kris@moosequest.net admin row (required by migration 0016)Migration: console/migrations/versions/0016_raxx_platform_owners_group.py
Why it is required: Migration 0016 seeds the raxx-platform-owners RBAC group and enrolls the operator (kris@moosequest.net) as its first member. Step 3 of upgrade() executes SELECT id FROM admins WHERE email = :email. If the row is absent it raises:
RuntimeError: 0016 migration: operator 'kris@moosequest.net' not found in the admins table.
This is intentional on prod — the operator account must exist before the platform-owners group is seeded. In CI there is no prior data, so the gate supplies a stub row.
How the gate supplies it (two-phase upgrade, PR #1654):
Console — alembic upgrade to 0015 (pre-seed) — brings the schema up to the migration that creates the admins table.Console — seed kris@ admin row (required by 0016) — inserts the stub row:INSERT INTO admins (id, email, display_name, created_at, updated_at, disabled_at)
VALUES (
'00000000-0000-0000-0000-000000000001',
'kris@moosequest.net',
'CI Seed — do not use in prod',
'2026-01-01T00:00:00',
'2026-01-01T00:00:00',
NULL
)
ON CONFLICT (email) DO NOTHING;
Console — alembic upgrade head — completes from 0016 onward.Why ON CONFLICT DO NOTHING: The round-trip downgrade -1 / upgrade head must not fail on the second upgrade head. Migration 0016's downgrade() only removes rbac_user_groups membership — the admins row is untouched — so the seed is still present for the second upgrade.
Columns seeded: id, email, display_name, created_at, updated_at, disabled_at. The admins table has no password columns (schema from migration 0001). The stub UUID and timestamps are CI-only and are never used for authentication.
raptor_app Postgres role (required by migration 0002+)Migration: backend_v2/alembic/versions/0002_paper_orders.py (and all subsequent Raptor migrations that add tables)
Why it is required: Migration 0002 executes:
GRANT SELECT, INSERT, UPDATE, DELETE ON paper_orders TO raptor_app;
GRANT USAGE ON SEQUENCE paper_orders_id_seq TO raptor_app;
On prod, the raptor_app role is provisioned during the Postgres Standard-0 setup (RM-9), outside of Alembic. The CI ephemeral database has no such provisioning. Without the role, upgrade head fails with:
psycopg2.errors.UndefinedObject: role "raptor_app" does not exist
How the gate supplies it (PR #1656):
DO $$ BEGIN
IF NOT EXISTS (SELECT FROM pg_roles WHERE rolname = 'raptor_app') THEN
CREATE ROLE raptor_app NOLOGIN;
END IF;
END $$;
This runs before alembic upgrade head on the Raptor tree. The role has no login privilege and no initial permissions — the GRANT statements inside each migration attach the needed permissions.
Any migration that queries an existing table for lookup data OR executes a GRANT to a named role is a seed-dependent migration. The gate must be updated in the same PR as the migration. The pattern:
alembic upgrade <N-1> (up to the migration before the one with the dependency).ON CONFLICT DO NOTHING / IF NOT EXISTS).alembic upgrade head.Every seed INSERT must use ON CONFLICT DO NOTHING. Every role creation must use the IF NOT EXISTS guard (or the DO $$ BEGIN IF NOT EXISTS ... END $$ block for Postgres < 9.6 syntax, though CI uses Postgres 15). This ensures the gate remains idempotent across re-runs and round-trips.
migration-gate job in GitHub Actions shows FAILED for a PR touching console/migrations/** or backend_v2/alembic/versions/**.Console — alembic upgrade to 0015 (pre-seed) — pre-seed step failedConsole — seed kris@ admin row (required by 0016) — psql seed failedConsole — alembic upgrade head — a migration erroredConsole — assert single head — multiple heads detectedConsole — downgrade -1 (reversibility check) — downgrade erroredRaptor — create raptor_app role (required by 0002+) — role creation failedRaptor — alembic upgrade head — a Raptor migration erroredRaptor — assert single head — multiple Raptor heads detectedOpen the failing CI run. Navigate to the migration-gate job and expand the failing step. The Alembic stack trace or psql error is the primary signal.
Classify the error.
| Error message | Failure class |
|---|---|
RuntimeError: ... not found in the admins table |
Missing seed row (new migration requires a row that the gate does not supply yet) |
psycopg2.errors.UndefinedObject: role "..." does not exist |
Missing Postgres role (new migration GRANTs to a role the gate does not create yet) |
DETAIL: Multiple heads are present |
Duplicate down_revision — two migrations claim the same parent |
ERROR: cannot use a non-immutable function in index expression |
Postgres-incompatible expression in CREATE INDEX ... WHERE |
alembic.util.exc.CommandError: Can't locate revision ... |
Broken down_revision pointer — a migration references a revision ID that does not exist |
Any other psycopg2 error |
Migration-specific Postgres incompatibility |
```bash # Console cd console && alembic -c migrations/alembic.ini heads
# Raptor cd backend_v2 && alembic heads ```
Each output line marked (head) is a branch tip. Find the two migrations that share the same down_revision value.
alembic upgrade head. If it is missing, the fix is a new seed step in ci-pr.yml (see "Recovery steps" below).Symptom: Console — alembic upgrade head fails with:
RuntimeError: 0016 migration: operator 'kris@moosequest.net' not found in the admins table.
Cause: A new migration was added that runs before or alongside 0016 and depends on the same seed, OR the seed step was removed or reordered in a ci-pr.yml change.
Fix:
Verify the seed step is present and runs after alembic upgrade 0015 but before alembic upgrade head. If the seed step exists but the error still fires, the cause is likely a new migration that executes before 0016 and operates on data that requires a different seed row — diagnose by reading the new migration's upgrade() body.
If a completely new migration introduces its own seed requirement, add the corresponding seed step following the pattern established by PR #1654.
Verification: Re-run the migration-gate job. Console — seed kris@ admin row (required by 0016) must exit 0, followed by Console — alembic upgrade head exiting 0.
Symptom: Raptor — alembic upgrade head fails with:
psycopg2.errors.UndefinedObject: role "raptor_app" does not exist
LINE 1: GRANT SELECT, INSERT, UPDATE, DELETE ON <table> TO raptor_app
Cause: A new Raptor migration GRANTs to raptor_app (or another named role) that the gate's role-creation step does not cover. Most commonly this happens if a new Raptor migration GRANTs to a role other than raptor_app.
Fix:
Add a role-creation step to the Raptor section of the migration-gate job in ci-pr.yml, modeled on the existing raptor_app step:
DO $$ BEGIN
IF NOT EXISTS (SELECT FROM pg_roles WHERE rolname = '<new_role_name>') THEN
CREATE ROLE <new_role_name> NOLOGIN;
END IF;
END $$;
Place this step before Raptor — alembic upgrade head.
Verification: Re-run the migration-gate job. The new role-creation step must exit 0, and Raptor — alembic upgrade head must exit 0 after it.
Symptom: Console — assert single head or Raptor — assert single head fails with output like:
Console head count: 2
ERROR: Console alembic has 2 heads — expected exactly 1.
Or alembic upgrade head itself may error:
DETAIL: Multiple heads are present; please specify the target revision.
Cause: Two migrations were merged into main that both set down_revision to the same parent revision. This is the duplicate-revision collision pattern. Historical examples:
0022, 0024 both set down_revision = "0021" / "0022" respectively — caused by concurrent feature branches merging without rebasing.0037_promote_trace_mode_transitions.py and 0038_promote_demo_paper_fill.py both claimed revision "0038". Fixed by renumbering:2225e2f2): renamed 0037_promote_trace_mode_transitions.py → 0038_promote_trace_mode_transitions.py, updated revision = "0038".fd175f8e): renamed 0038_promote_demo_paper_fill.py → 0040_promote_demo_paper_fill.py, updated revision = "0040" and down_revision = "0039", producing the correct linear chain: ...0037 → 0038 (trace_mode) → 0039 (auth_backup_codes) → 0040 (demo_paper_fill).Fix:
Identify the two colliding migrations. Choose the one that was merged later (higher PR number) and renumber it to the next available revision ID:
XXXX_old_name.py → YYYY_old_name.py where YYYY is the next unused number.revision = "YYYY" inside the file.down_revision = "ZZZZ" to point to the correct predecessor.alembic -c console/migrations/alembic.ini heads should return exactly one (head).The fix can be committed directly to a PR branch. The migration-gate job on that PR will validate the chain.
Verification: Console — assert single head or Raptor — assert single head must report head count: 1.
Symptom: alembic upgrade head fails with:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.InvalidColumnDefinition)
ERROR: cannot use a non-immutable function in index expression
Cause: A migration creates a partial index with a WHERE clause that uses a non-IMMUTABLE function (e.g., CURRENT_TIMESTAMP, now()). Postgres requires index predicates to be IMMUTABLE. This was the failure mode in PR #1551.
Fix: Change the index predicate to use only IMMUTABLE expressions (literal values, IMMUTABLE-marked functions). If a time-bounded partial index is genuinely needed, use a fixed literal timestamp or restructure the index.
Verification: alembic upgrade head must exit 0. Confirm the index exists with the correct predicate:
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = '<table>';
down_revision pointerSymptom: alembic upgrade head fails with:
alembic.util.exc.CommandError: Can't locate revision identified by '<hash>'
Cause: A migration's down_revision references a revision ID that does not exist in the chain. This happens when a migration file is deleted, renamed without updating the successor's down_revision, or when a PR adds a migration that references a revision that has not merged yet.
Fix: Update the down_revision in the orphaned migration to point to the correct predecessor revision ID. Run alembic history to see the full chain and identify the gap.
Verification: alembic history --verbose must show a contiguous chain with no gaps.
migration-gate fails
│
├── Step: "Console — alembic upgrade to 0015 (pre-seed)"
│ └── Cause: schema error before 0015 — look at migrations 0001–0015
│ → Feature-dev owner of the failing migration must fix it.
│
├── Step: "Console — seed kris@ admin row (required by 0016)"
│ └── Cause: psql syntax error or admins table missing
│ → Check that 0015 actually created the admins table (gate step ordering)
│ → If admins table is missing: upgrade-to-0015 step may have silently exited 0 on an error
│ → Escalate to operator if the admins schema has been modified in a recent migration
│
├── Step: "Console — alembic upgrade head"
│ ├── RuntimeError: operator not found in admins table
│ │ → A new migration requires a seed row that the gate does not supply.
│ │ → SRE adds seed step to ci-pr.yml. Must ship in the same PR as the migration.
│ ├── psycopg2.errors.UndefinedObject (role)
│ │ → A new migration GRANTs to a role the gate does not create.
│ │ → SRE adds role-creation step to ci-pr.yml. Must ship in the same PR as the migration.
│ ├── Other psycopg2 error
│ │ → Migration-specific Postgres incompatibility — feature-dev fixes the migration.
│ └── alembic.util.exc.CommandError: Can't locate revision
│ → Broken down_revision pointer — feature-dev fixes the chain.
│
├── Step: "Console — assert single head" / "Raptor — assert single head"
│ └── head count > 1
│ → Duplicate revision collision. Identify and renumber the later migration.
│ → Feature-dev (whichever PR introduced the collision) fixes it.
│ → SRE verifies the fix resolves the gate before approving.
│
├── Step: "Console — downgrade -1" / "Raptor — downgrade -1"
│ └── Downgrade errored
│ → The migration's downgrade() function has a bug.
│ → Feature-dev fixes the downgrade() path. All migrations must be reversible.
│
├── Step: "Raptor — create raptor_app role (required by 0002+)"
│ └── DO block errored — likely psql connectivity issue or syntax
│ → Check the service container is healthy (rare; usually a transient CI issue)
│ → Re-run the job once. If it fails a second time, file a type:reliability issue.
│
└── Step: "Raptor — alembic upgrade head"
├── psycopg2.errors.UndefinedObject (role)
│ → New migration GRANTs to a role the gate does not create → SRE adds role step
└── Other error
→ Feature-dev fixes the migration.
ci-pr.yml in the same PR as the migration.ci-pr.yml in the same PR as the migration.upgrade() or downgrade() body contains a bug (Postgres-incompatible expression, wrong SQL, broken FK ordering).down_revision pointer — requires the migration author to confirm the intended predecessor.ci-pr.yml without a documented reason.Contact: operator (kris@moosequest.net) via Slack #ops or direct message (D0AJ7K184TV).
| Reference | Description |
|---|---|
| PR #1623 | Gate introduced (ci-pr.yml, migration-gate job) |
| PR #1654 | kris@ admin row seed step added (two-phase Console upgrade) |
| PR #1656 | raptor_app role creation step added |
| PR #1781 | Emergency recovery — 0037_promote_trace_mode_transitions renamed to 0038 (duplicate 0037 chain repair) |
| PR #1778 | 0038_promote_demo_paper_fill renamed to 0040 (duplicate 0038 collision resolved) |
| Issue #1655 | Tracking issue — documents the seed-step pattern |
| Issue #1552 | Original gate request |
| Issue #1548 | Duplicate down_revision hotfix #1 |
| Issue #1550 | Duplicate down_revision hotfix #2 |
| Issue #1551 | Postgres-incompatible index predicate hotfix |
feedback_heroku_pg_rds_password_gotcha.md |
Why the gate uses ephemeral Postgres instead of the RDS-backed Heroku Postgres |
feedback_gh_actions_transitive_skip.md |
Why every job downstream of detect-changes needs an explicit if: clause |