Raptor Staging Postgres Cutover SOP

Card: #1567 (RM-9)
Epic: #1556
Design doc: docs/architecture/raptor-postgres-migration/migration-plan.md (Phase 3)
Target window: 2026-05-17 UTC (v1 launch 2026-05-23 UTC)

Scope

Cut Raptor on raxx-api-staging over from SQLite to Heroku Postgres (Standard-0, addon postgresql-adjacent-27271). Once this lands, staging Raptor reads from and writes to Postgres. SQLite remains as a fallback only on the local dev path.

Out of scope:

Prod cutover (RM-10)
FLAG_RAPTOR_APP_ROLE_SEPARATION enable (RM-11)
Timescale hypertable creation on trace_events / trace_workflows (SC-11)

Prerequisites

[ ] RM-1 through RM-8 merged on main; CI green
[ ] heroku pg:info -a raxx-api-staging shows Plan: Standard 0, addon postgresql-adjacent-27271, status Available
[ ] heroku config:get DATABASE_URL -a raxx-api-staging returns a Postgres URL (postgres:// or postgresql://; Heroku sets it automatically on attach)
[ ] Operator has terminal access + Heroku CLI authenticated to raxx-app team
[ ] Sentry alerts active on staging (verify on sentry.io project page)

Pre-checks (~5 min)

Run these and confirm output before touching anything:

# 1. Verify Postgres tier + addon
heroku pg:info -a raxx-api-staging
# Expect: Plan = Standard 0, addon = postgresql-adjacent-27271, status = Available

# 2. Verify DATABASE_URL points at Postgres
heroku config:get DATABASE_URL -a raxx-api-staging
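# Expect: postgres://...amazonaws.com:5432/... (Heroku normally emits the postgres:// scheme; postgresql:// is equally fine)
# SQLAlchemy 1.4+ rejects postgres://, so the app must normalize it to postgresql:// before creating the engine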
# If sqlite+pysqlite:///... → STOP, addon not attached as primary; escalate

# 3. Apply alembic baseline (if not already applied during Phase 1 smoke)
heroku run alembic upgrade head -a raxx-api-staging
# Expect: "Running upgrade  -> 0001_raptor_baseline, raptor baseline" (no "Running upgrade" line if the baseline is already applied)
# If error "Can't locate revision identified by ..." → STOP, alembic env broken

# 4. Verify schema present
heroku pg:psql -a raxx-api-staging -c "\dt"
# Expect: ~35 tables present (users, sessions, paper_orders, historical_bars, ...)

# 5. Verify zero customer rows (pre-launch invariant)
heroku pg:psql -a raxx-api-staging -c "SELECT count(*) FROM users;"
# Expect: 0
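
Pre-checks 2 and 5 can be turned into a scripted gate. The sketch below is one possible shape, not part of the Raptor repo: classify_db_url and parse_count are hypothetical helper names, and the parser assumes psql's default tabular output for a single count(*) query.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for scripting pre-checks 2 and 5 (names are illustrative).

# Map a DATABASE_URL to a coarse backend name based on its scheme.
classify_db_url() {
  case "$1" in
    postgres://*|postgresql://*|postgresql+*://*) echo "postgres" ;;
    sqlite:*|sqlite+*:*)                          echo "sqlite" ;;
    *)                                            echo "unknown" ;;
  esac
}

# Read psql's default tabular output on stdin and print the first
# line that consists only of an integer (the count value).
parse_count() {
  awk '/^[ \t]*[0-9]+[ \t]*$/ { gsub(/[ \t]/, ""); print; exit }'
}

# Example wiring (needs an authenticated Heroku CLI, so it is left commented out):
# url="$(heroku config:get DATABASE_URL -a raxx-api-staging)"
# [ "$(classify_db_url "$url")" = "postgres" ] || { echo "STOP: not Postgres" >&2; exit 1; }
# rows="$(heroku pg:psql -a raxx-api-staging -c 'SELECT count(*) FROM users;' | parse_count)"
# [ "$rows" = "0" ] || { echo "STOP: users table is not empty" >&2; exit 1; }
```

Both helpers are pure text functions, so they can be exercised against canned input before being pointed at staging.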

Cutover (~3 min)

# 6. Restart dynos so every process drops cached DB engine references and reconnects via DATABASE_URL
heroku restart -a raxx-api-staging

# 7. Tail logs in a second terminal
heroku logs -a raxx-api-staging --tail
# Watch for: "Database engine initialized: dialect=postgresql"
# Must not appear: any "sqlite3.OperationalError" or "database is locked"

# 8. Check the health endpoint once the dyno is back up (~30s)
curl -fsS https://api-staging.raxx.app/healthz
# Expect: {"status":"ok","db":"postgresql","version":"..."}
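
Because the dyno takes about 30 s to come back, step 8 may need a few attempts. A minimal polling sketch follows, assuming /healthz returns a flat JSON object with a "db" string field as shown above. health_db, wait_for_db, and the FETCH override hook are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical polling helper for step 8 (names are illustrative).

# Pull the "db" field out of a flat /healthz JSON body without needing jq.
health_db() {
  printf '%s' "$1" | sed -n 's/.*"db"[ ]*:[ ]*"\([^"]*\)".*/\1/p'
}

# Poll until the endpoint reports the expected backend or attempts run out.
# Arguments: expected db name, max attempts (default 10), delay in seconds (default 5).
# FETCH may be set to another command so the loop can be exercised offline.
wait_for_db() {
  expected="$1"; attempts="${2:-10}"; delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    body="$(${FETCH:-curl -fsS https://api-staging.raxx.app/healthz} 2>/dev/null)" || body=""
    [ "$(health_db "$body")" = "$expected" ] && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example: wait_for_db postgresql 12 5 || echo "healthz never reported postgresql" >&2
```

The sed expression only handles a flat, single-line body; if the health payload ever becomes nested, a real JSON parser would be the safer choice.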

Smoke validation (~10 min)

Run the staging-smoke script:

scripts/ci/run_smoke.sh --env=staging

Or manually:

[ ] POST /api/auth/register with a test passkey → 200
[ ] POST /api/auth/login with that passkey → 200, session cookie issued
[ ] GET /api/historical-data/AAPL?range=1d → 200 with bars
[ ] GET /api/billing/snapshot → 200 (may be empty pre-launch)
[ ] POST /api/auth/logout → 200, session revoked
[ ] Re-query: heroku pg:psql -a raxx-api-staging -c "SELECT count(*) FROM users;" → 1 (the test user registered above)

Soak (72 h before RM-10 prod cutover)

Leave staging running. Check at 24 / 48 / 72 h marks:

[ ] Sentry shows zero new error events from Raptor staging
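[ ] heroku logs -a raxx-api-staging -n 1500 | grep -i sqlite returns nothing (heroku logs has no --since flag and Logplex keeps only the most recent 1,500 lines; check a log drain, if one is configured, for the full 24 h window)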
[ ] heroku pg:info -a raxx-api-staging shows Status: Available and a connection count well below the plan limit
[ ] Manual smoke pass at each 24 h mark (auth + historical-data + billing)
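
The log check at each soak mark can be scripted so that a hit fails loudly and shows the offending lines. This is a sketch only: scan_for_sqlite is a hypothetical helper name, and the pattern list is taken from the markers named in the cutover step.

```shell
#!/usr/bin/env bash
# Hypothetical soak-check helper (name is illustrative).

# Read log text on stdin. Print any line that mentions SQLite or a lock error
# and return 1 if at least one such line was found; return 0 otherwise.
scan_for_sqlite() {
  if grep -i -E 'sqlite|database is locked'; then
    return 1
  fi
  return 0
}

# Example (needs an authenticated Heroku CLI, so it is left commented out):
# heroku logs -a raxx-api-staging -n 1500 | scan_for_sqlite || echo "SQLite traces found" >&2
```

Note that the exit status is inverted relative to grep: a clean log returns 0, which makes the helper easy to chain with && in a larger check script.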

Rollback (only valid pre-launch, while customer rows = 0)

# Revert DATABASE_URL to the SQLite path (output left visible so a rejected config:set is not missed)
heroku config:set DATABASE_URL="sqlite+pysqlite:///./raptor.db" -a raxx-api-staging
heroku restart -a raxx-api-staging

# Verify
heroku config:get DATABASE_URL -a raxx-api-staging
# Expect: sqlite+pysqlite:///./raptor.db

# Check the health endpoint to confirm Raptor is running on the SQLite fallback
curl -fsS https://api-staging.raxx.app/healthz
# Expect: {"status":"ok","db":"sqlite"}

Note: The Heroku dyno filesystem is ephemeral, so after reverting to SQLite the database file is lost on every dyno restart. Rollback is therefore acceptable only pre-launch (customer rows = 0). Once real customer data lives on Postgres, this rollback path is closed; fix forward instead.

Common failures + fixes

| Symptom | Root cause | Fix |
| --- | --- | --- |
| `alembic upgrade head` errors with "Can't locate revision" | Alembic env not properly initialized in Phase 1 | `heroku run alembic stamp 0001_raptor_baseline -a raxx-api-staging`, then retry |
| `psycopg2.OperationalError: SSL SYSCALL error` | Heroku Postgres SSL bundle mismatch | Verify `sslmode=require` in URL; restart dyno |
| `sqlite3.OperationalError: no such table` in logs after cutover | Some callsite still using `sqlite3.connect` directly | Audit with `git grep "sqlite3.connect" backend_v2/` (must be empty); revert and fix |
| Healthcheck returns `db: sqlite` after cutover | DATABASE_URL still points to SQLite | Verify `heroku config:get DATABASE_URL`; re-set with the `postgresql://` URL from `heroku pg:credentials:url` |
| Session creation fails post-cutover | `customer_session_service.py` jsonb mismatch | Check Sentry; likely the RM-4 port missed a column type |

Connects to

Design doc: docs/architecture/raptor-postgres-migration/design.md
Migration plan: docs/architecture/raptor-postgres-migration/migration-plan.md
ADR-0069 (SQLAlchemy 2.x + psycopg2-binary)
ADR-0070 (pytest-postgresql for test fixtures)
Blocks: RM-10 (prod cutover), RM-11 (role separation), SC-11 (Timescale hypertable)
Staging-as-runtime-dup principle: docs/architecture/principles/staging-is-a-runtime-dup.md