Raptor Prod Postgres Cutover SOP
Runbook ID: RM-10
Card: #1568
Epic: #1556
Staging SOP (RM-9): docs/ops/runbooks/raptor-postgres-staging-cutover.md
Roles runbook: docs/ops/runbooks/raptor-postgres-roles.md
Design doc: docs/architecture/raptor-postgres-migration/design.md
Migration plan: docs/architecture/raptor-postgres-migration/migration-plan.md
Target cutover deadline: 2026-05-16 23:59 UTC
Estimated wall-clock: 1–2 hours (including smoke validation)
Operator pre-cutover briefing
This SOP cuts raxx-api-prod from SQLite to Heroku Postgres Standard-0.
The 48-hour safety buffer before the 2026-05-23 UTC v1 launch is the
hard floor. Prod cutover must land by 2026-05-16 23:59 UTC to leave
the full buffer intact.
This is a pre-launch, zero-customer environment. No customer data
exists in SQLite and no customer data will be migrated. The rollback path
(reverting to SQLite) is valid only while the Postgres users table shows
count(*) = 0. Once any customer row exists, rollback is closed — you
must fix forward.
Do NOT execute this runbook concurrently with any other prod maintenance. Execute from a single terminal session. Keep a second terminal open for log tailing.
Downstream dependency: RM-11 (#1569) (SC-A1 role separation) gates on this cutover being complete. Do not start RM-11 until this runbook is signed off at the bottom.
Section 1 — Pre-flight checklist
Complete every item before entering the maintenance window. Each command output must be captured in the cutover record (see Section 8).
1.1 RM-9 staging soak verified clean
The RM-9 staging soak window closed 2026-05-13 22:55 UTC. Confirm no
Sentry errors have fired for raxx-api-staging since 2026-05-10 22:55 UTC
(the RM-9 staging cutover timestamp).
# Open Sentry → project: raxx-backend → filter by environment=staging
# Date range: 2026-05-10 22:55 UTC to now
# Expected: zero unresolved issues tagged environment=staging
Also run the staging log check:
heroku logs -a raxx-api-staging --since 72h | grep -i "sqlite\|OperationalError\|database is locked"
# Expected: no output
If Sentry shows errors or the grep returns hits: STOP. Do not proceed to prod. Investigate on staging first; file a blocking comment on #1568.
1.2 Heroku CLI authenticated
heroku auth:whoami
# Expected: your Heroku account email
# If not: heroku login
Confirm access to the raxx-app team and the raxx-api-prod app:
heroku apps --team raxx-app | grep raxx-api-prod
# Expected: raxx-api-prod
1.3 Prod Postgres Standard-0 provisioned and healthy
heroku addons -a raxx-api-prod
# Expected: one addon of type heroku-postgresql, plan standard-0
# Capture the addon name (e.g. postgresql-<adjective>-NNNNN) — you need it for the upgrade step if it is still Essential-tier
heroku pg:info -a raxx-api-prod
# Expected: Plan: Standard 0, Status: Available
# If Plan: Essential-0 or Essential-1 or Essential-2 — see Section 2.1 (upgrade step)
# If Status: anything other than Available — STOP, wait for Heroku support
1.4 DATABASE_URL is already a Postgres URL
Heroku sets DATABASE_URL automatically when the Postgres addon is attached
as the primary. Verify it points to Postgres, not SQLite:
heroku config:get DATABASE_URL -a raxx-api-prod
# Expected: postgresql://... (amazonaws.com:5432 in the hostname)
# If sqlite+pysqlite:///./raptor.db — STOP, addon is not attached as primary
1.5 Infisical credentials valid
Credential staleness has caused silent failures before. Verify before you start (see project memory note on session env staleness):
# Confirm any Infisical-sourced env vars you rely on during this runbook are fresh.
# At minimum, confirm you can reach the vault:
curl -fsS https://vault.raxx.app/api/status
# Expected: 200 with {"status":"ok"}
If the vault token has expired, refresh it via infisical login before
proceeding. Do not skip this — a stale token mid-runbook will cause a
confusing partial state.
1.6 Alembic migrations current on main
git fetch origin main
git log --oneline origin/main -5
# Confirm the latest RM-phase migrations are in the tip of main
1.7 Snapshot confirmation (rollback anchor)
Because the Heroku dyno filesystem is ephemeral, the SQLite file at
./raptor.db on the dyno is not accessible for backup via the CLI. The
rollback procedure (Section 6) reverts DATABASE_URL to the SQLite path,
which causes the dyno to recreate an empty database from schema on next
restart. This is safe only because we are pre-launch with zero customers.
Confirm this invariant before proceeding:
# Run this AFTER Postgres is confirmed available (Section 1.3):
heroku pg:psql -a raxx-api-prod -c "SELECT count(*) FROM users;" 2>/dev/null || echo "TABLE_NOT_FOUND"
# Expected: 0 rows (table may not exist yet if alembic has not run — TABLE_NOT_FOUND is also acceptable)
# If count > 0: STOP. Customer data exists. Rollback is NOT safe. Escalate immediately.
Record the output in the cutover log.
Section 2 — Maintenance window plan
2.1 Expected downtime
- Heroku maintenance mode: Raptor returns HTTP 503 to all callers during the window.
- Expected user-visible downtime: 5–15 minutes (restart + schema migration + smoke).
- Pre-launch context: no real users are affected. The operator is the only caller.
2.2 Enable maintenance mode
heroku maintenance:on -a raxx-api-prod
# Expected: Enabling maintenance mode for raxx-api-prod... done
heroku maintenance -a raxx-api-prod
# Expected: on
2.3 Rollback trigger
If any of the following occur, execute Section 6 (Rollback) immediately:
- More than 15 minutes elapse from
heroku restart(Section 4.8) without the health endpoint returning 200. heroku logs --tailshows a crash loop (Error R10,Error H10, or repeatedStopping all processeswithin the same dyno boot).- The smoke suite (Section 5) returns failures that cannot be diagnosed within 20 minutes.
Section 3 — Step-by-step Postgres upgrade (if still Essential-tier)
Skip this section if heroku pg:info (Section 1.3) already shows
Plan: Standard 0. If it shows Essential-tier, complete this section first.
Standard-0 is required before pg:credentials:create will work.
Essential-tier databases reject the command with:
Error: You can't create a custom credential on Essential-tier databases.
# Step 3.1 — Identify the addon name
heroku addons -a raxx-api-prod
# Note the addon name, e.g. postgresql-adjacent-27271
# Step 3.2 — Upgrade to Standard-0
heroku addons:upgrade <addon-name> heroku-postgresql:standard-0 -a raxx-api-prod
# Expected: Starting upgrade of postgresql-<name> to heroku-postgresql:standard-0... (free)
# Heroku will display a maintenance window warning — this is expected and acceptable pre-launch.
# Step 3.3 — Poll until upgrade completes
heroku pg:info -a raxx-api-prod
# Repeat every 60 seconds until: Plan: Standard 0, Status: Available
# Typical upgrade time: 2–5 minutes
Do not proceed to Section 4 until the addon shows Standard 0 and Available.
Section 4 — Step-by-step cutover commands
Run these in order. Capture every command and its output in the cutover record.
4.1 Run Alembic migrations on prod Postgres
This creates the full schema on the prod Postgres database. It is safe to run multiple times (Alembic is idempotent via revision tracking).
heroku run alembic upgrade head -a raxx-api-prod
# Expected output ends with:
# Running upgrade ... -> 0001_raptor_baseline, raptor baseline
# (or the current head revision if later RM phases added migrations)
#
# If error: "Can't locate revision identified by '...'"
# → Run: heroku run alembic stamp 0001_raptor_baseline -a raxx-api-prod
# → Then retry: heroku run alembic upgrade head -a raxx-api-prod
4.2 Verify schema is present
heroku pg:psql -a raxx-api-prod -c "\dt"
# Expected: ~35 tables listed (users, sessions, paper_orders, historical_bars,
# webauthn_credentials, backup_codes, customer_audit_events, ...)
# If fewer than 35 tables: migration did not fully apply — STOP, re-run step 4.1
4.3 Confirm zero customer rows
heroku pg:psql -a raxx-api-prod -c "SELECT count(*) FROM users;"
# Expected: 0
# If not 0: STOP. Do not proceed. Escalate.
4.4 Create the raptor_app credential
IMPORTANT — RDS password workaround. Heroku Postgres runs on AWS RDS.
The DATABASE_URL owner credential is NOT a member of the RDS rds_password
group. Attempting to create a role via CREATE ROLE ... WITH LOGIN PASSWORD
directly in psql will fail with:
ERROR: must be a member of rds_password to alter passwords
Do NOT use raw SQL to create the role. Use the Heroku CLI path, which goes through Heroku's RDS-aware credential management:
heroku pg:credentials:create DATABASE --name raptor_app -a raxx-api-prod
# Expected: Adding credential raptor_app to postgresql-<name>... done
#
# Argument order matters:
# - "DATABASE" is the positional addon identifier (literal string)
# - "--name raptor_app" is the credential/role name
# DO NOT swap them — the CLI returns "Unknown database: raptor_app" if the
# role name goes in the positional slot.
Reference: feedback_heroku_pg_rds_password_gotcha.md and
docs/ops/runbooks/raptor-postgres-roles.md (Failure mode A).
4.5 Retrieve and store the raptor_app connection URL
heroku pg:credentials:url DATABASE --name raptor_app -a raxx-api-prod
# Copy the "Connection URL:" value (postgres://raptor_app:...) shown.
# Handle this URL as a secret — it contains the credential password.
Set it in Heroku config. Stdout is silenced because the URL contains
the password — echoing it to the terminal is a secret leak
(feedback_heroku_config_set_echoes_secrets.md):
heroku config:set RAPTOR_APP_DATABASE_URL="<URL from above>" -a raxx-api-prod >/dev/null 2>&1
Verify the var is set without exposing the value:
heroku config:get RAPTOR_APP_DATABASE_URL -a raxx-api-prod | cut -c1-40
# Expected: postgres://raptor_app:... (first 40 chars only)
4.6 Apply DML grants and revoke migration-table access
Open a psql session as the DATABASE owner:
heroku pg:psql -a raxx-api-prod
Run this block as a single copy-paste unit:
-- Full DML on all application tables
-- (SELECT already granted by pg:credentials:create)
GRANT INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO raptor_app;
-- Sequence access (required for INSERT into SERIAL/BIGSERIAL columns)
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO raptor_app;
-- Deny access to migration-management tables —
-- raptor_app must never be able to stamp a migration as applied
REVOKE ALL ON TABLE alembic_version FROM raptor_app;
REVOKE ALL ON TABLE schema_migrations FROM raptor_app;
Still in psql, verify the grants landed:
SELECT rolname, rolcanlogin FROM pg_catalog.pg_roles WHERE rolname = 'raptor_app';
-- Expected: raptor_app | t
\dp historical_bars
-- Expected: raptor_app=arwd/... in the Access Privileges column
-- a=INSERT, r=SELECT, w=UPDATE, d=DELETE
\dp alembic_version
-- Expected: NO raptor_app entry (or an empty privilege string)
Exit psql (\q).
4.7 Tail logs in a second terminal (open before step 4.8)
Open a second terminal window and start tailing before the restart:
heroku logs -a raxx-api-prod --tail
Keep this window visible throughout steps 4.8–4.10.
4.8 Restart the dyno
heroku restart -a raxx-api-prod
# Wait ~30 seconds for the dyno to come up
In the log tail (second terminal), watch for:
Database engine initialized: dialect=postgresql
Watch against:
sqlite3.OperationalError
database is locked
role "raptor_app" does not exist
permission denied for table
If any of the "watch against" patterns appear within 2 minutes of restart: execute the rollback (Section 6) immediately.
4.9 Disable maintenance mode
Only after the dyno is up and the health endpoint responds (verify in step 5.1 below before running this):
heroku maintenance:off -a raxx-api-prod
# Expected: Disabling maintenance mode for raxx-api-prod... done
4.10 Capture current alembic head revision
heroku run alembic current -a raxx-api-prod
# Capture the revision hash shown. Record it in the cutover log.
# Example: INFO [alembic.runtime.migration] Running on postgresql
# Current revision for postgresql://...: 0001_raptor_baseline (head)
Section 5 — Post-cutover verification
Complete all items before declaring the cutover successful.
5.1 Health endpoint
curl -fsS https://api.raxx.app/healthz
# Expected: {"status":"ok","db":"postgresql","version":"..."}
# If db is "sqlite": DATABASE_URL is still pointing at SQLite — see rollback
# If HTTP error: check heroku logs for the reason
5.2 Trade-related read endpoints
# Historical data fetch (uses HistoricalBarRepository → Postgres)
curl -fsS "https://api.raxx.app/api/historical-data/AAPL?range=1d" \
-H "Cookie: <session-cookie-from-auth-smoke>"
# Expected: 200 with OHLCV bar data
# Billing snapshot (reads vendor_billing_snapshots table)
curl -fsS "https://api.raxx.app/api/billing/snapshot" \
-H "Cookie: <session-cookie-from-auth-smoke>"
# Expected: 200 (may return empty array pre-launch)
5.3 Auth flow smoke
# Register a test passkey
# POST /api/auth/register with a throwaway test user
# Login with that passkey
# POST /api/auth/login — expect 200 + session cookie
# Verify session: GET /api/system/whoami or equivalent authed endpoint
# Logout: POST /api/auth/logout — expect 200
# Verify in Postgres the test user row was created:
heroku pg:psql -a raxx-api-prod -c "SELECT id, email, created_at FROM users ORDER BY created_at DESC LIMIT 3;"
Or run the full smoke suite if available:
scripts/ci/run_smoke.sh --env=prod
5.4 Schema migration state
heroku run alembic current -a raxx-api-prod
# Expected revision must match what was captured in step 4.10.
# If it shows "(head)" that is correct.
# If it shows a revision hash below the expected head: a migration did not apply — re-run step 4.1.
Verify the table count:
heroku pg:psql -a raxx-api-prod -c "SELECT count(*) FROM information_schema.tables WHERE table_schema='public';"
# Expected: approximately 35 (exact number depends on RM-phase migrations merged to main)
5.5 Sentry: no init errors
Open Sentry → project: raxx-backend → environment: production.
Filter to the last 30 minutes. Confirm:
- Zero
OperationalErrorevents - Zero
ProgrammingErrorevents - Zero
role "raptor_app" does not existevents - Zero crash events (
Error R10,Error H10)
If any of these appear: do NOT proceed to monitoring phase. Investigate and either fix forward or execute rollback (Section 6) depending on severity.
5.6 Log scan for SQLite references
heroku logs -a raxx-api-prod --since 30m | grep -i "sqlite\|database is locked\|OperationalError"
# Expected: no output
Section 6 — Rollback procedure
This rollback is valid only pre-launch (zero customer rows on Postgres).
Once any real customer data exists in Postgres, this procedure destroys it.
Check SELECT count(*) FROM users on Postgres before running. If count > 0,
escalate to the operator — you must fix forward, not roll back.
The Heroku dyno filesystem is ephemeral. Reverting DATABASE_URL to the
SQLite path causes the dyno to start a new empty SQLite database from scratch
on next boot. This is acceptable pre-launch because no production customer
data existed in SQLite either.
Rollback sequence
# Step R1 — Enable maintenance mode (if not already on)
heroku maintenance:on -a raxx-api-prod
# Step R2 — Revert DATABASE_URL to SQLite path
heroku config:set DATABASE_URL="sqlite+pysqlite:///./raptor.db" -a raxx-api-prod >/dev/null 2>&1
# Step R3 — Restart dyno to pick up the reverted URL
heroku restart -a raxx-api-prod
# Step R4 — Verify
heroku config:get DATABASE_URL -a raxx-api-prod
# Expected: sqlite+pysqlite:///./raptor.db
curl -fsS https://api.raxx.app/healthz
# Expected: {"status":"ok","db":"sqlite"}
# Step R5 — Disable maintenance mode
heroku maintenance:off -a raxx-api-prod
Post-rollback actions
- Leave
RAPTOR_APP_DATABASE_URLand theraptor_appPostgres credential in place. Removing them is unnecessary and could complicate re-attempt. - File a postmortem comment on #1568 describing what failed and why.
- Do not re-attempt the prod cutover until the root cause is identified and resolved on staging.
Section 7 — Post-cutover monitoring
7.1 Active watch (first 4 hours)
Keep a terminal with heroku logs -a raxx-api-prod --tail running.
Check every 15–30 minutes for the first 4 hours:
- [ ] No
OperationalErrororProgrammingErrorin logs - [ ] No
permission denied for table(raptor_app grant drift) - [ ] No
sqlite3.OperationalErrorordatabase is locked - [ ] Health endpoint still returns
db: postgresql - [ ] Sentry shows zero new unresolved issues in
raxx-backendproduction
# Quick 4-hour check (run each time):
heroku logs -a raxx-api-prod --since 30m | grep -i "error\|sqlite\|permission denied\|OperationalError" | tail -20
curl -fsS https://api.raxx.app/healthz
7.2 Dashboard watch (first 24 hours)
At the 8-hour and 24-hour marks:
# Postgres connection health
heroku pg:info -a raxx-api-prod
# Watch: Connection Count (should not be near max)
# Watch: Status (must be Available)
# Log scan for SQLite remnants
heroku logs -a raxx-api-prod --since 24h | grep -i "sqlite\|OperationalError" | wc -l
# Expected: 0
# Table count stable
heroku pg:psql -a raxx-api-prod -c "SELECT count(*) FROM information_schema.tables WHERE table_schema='public';"
7.3 Escalation triggers
Escalate immediately (Slack #ops-alerts and direct ping to Kristerpher) if:
- Any
permission denied for tableappears in prod logs (raptor_app grant drift) - Sentry fires a new
OperationalErrorin the production environment - Postgres
Statusshows anything other thanAvailableinpg:info - Health endpoint returns
db: sqliteat any point after cutover - Connection count approaches the Standard-0 limit (25 connections max on Standard-0)
7.4 RM-11 gate
Do not start RM-11 (#1569) until:
- 4-hour active watch is complete with no issues
- Sentry shows zero new errors in the production environment
- Operator has signed off in Section 8
Section 8 — Cutover record and sign-off
Fill this in during execution. Copy-paste actual command output into each field. This record is the artifact for the RM-10 card close.
Cutover ID: RM-10-<YYYYMMDD-HHMM>UTC (e.g. RM-10-20260516-1430UTC)
Operator: ____________________________
Maintenance mode ON at (UTC): ____________________________
Step 4.1 alembic upgrade head output (last line):
___________________________________________________________________
Step 4.2 \dt row count: ____________________________
Step 4.3 users count: ____________________________
Step 4.4 pg:credentials:create output:
___________________________________________________________________
Step 4.5 RAPTOR_APP_DATABASE_URL prefix (first 40 chars):
___________________________________________________________________
Step 4.6 raptor_app rolcanlogin: ____________________________
Step 4.8 restart timestamp (UTC): ____________________________
"dialect=postgresql" seen in logs: [ ] yes [ ] no (if no — rollback)
Step 4.9 maintenance mode OFF at (UTC): ____________________________
Step 4.10 alembic current revision:
___________________________________________________________________
Section 5.1 healthz db field: ____________________________
Section 5.5 Sentry clear (y/n): ____________________________
4-hour watch ended at (UTC): ____________________________
4-hour watch result: [ ] clean [ ] issues found (see note)
Notes:
___________________________________________________________________
___________________________________________________________________
Operator sign-off (pre-cutover dry-read):
I have read this runbook end-to-end. Pre-flight checks are complete. RM-9 staging soak is verified clean.
Operator: ____ Date (UTC): _____
Ops sign-off (post-cutover, 4-hour watch complete):
Cutover is stable. Sentry clean. Health endpoint returns db: postgresql.
RM-11 gate is open.
Signed: ____ Date (UTC): _______
Connects to
- Staging cutover SOP:
docs/ops/runbooks/raptor-postgres-staging-cutover.md(RM-9) - Roles runbook:
docs/ops/runbooks/raptor-postgres-roles.md(used in Section 4.4–4.6) - Feedback:
feedback_heroku_pg_rds_password_gotcha.md(rds_password restriction) - Feedback:
feedback_heroku_config_set_echoes_secrets.md(stdout silencing) - RM-11 (role separation): #1569 — blocked on this SOP
- Epic: #1556
- ADR-0069 (SQLAlchemy 2.x + psycopg2-binary)
- ADR-0070 (pytest-postgresql for test fixtures)