Raxx · internal docs

Raptor Staging Postgres Cutover SOP

Card: #1567 (RM-9)
Epic: #1556
Design doc: docs/architecture/raptor-postgres-migration/migration-plan.md (Phase 3)
Target window: 2026-05-17 UTC (v1 launch 2026-05-23 UTC)

Scope

Cut Raptor on raxx-api-staging over from SQLite to Heroku Postgres (Standard-0, addon postgresql-adjacent-27271). After this lands, staging Raptor reads from and writes to Postgres. SQLite remains only as the fallback on the local dev path.

Out of scope:

Prerequisites

Pre-checks (~5 min)

Run these and confirm output before touching anything:

# 1. Verify Postgres tier + addon
heroku pg:info -a raxx-api-staging
# Expect: Plan = Standard 0, addon = postgresql-adjacent-27271, status = Available

# 2. Verify DATABASE_URL points at Postgres
heroku config:get DATABASE_URL -a raxx-api-staging
# Expect: postgresql://...amazonaws.com:5432/...
# If sqlite+pysqlite:///... → STOP, addon not attached as primary; escalate

# 3. Apply alembic baseline (if not already applied during Phase 1 smoke)
heroku run alembic upgrade head -a raxx-api-staging
# Expect: "Running migrations: ... -> 0001_raptor_baseline, raptor baseline"
# If error "Can't locate revision identified by ..." → STOP, alembic env broken

# 4. Verify schema present
heroku pg:psql -a raxx-api-staging -c "\dt"
# Expect: ~35 tables present (users, sessions, paper_orders, historical_bars, ...)

# 5. Verify zero customer rows (pre-launch invariant)
heroku pg:psql -a raxx-api-staging -c "SELECT count(*) FROM users;"
# Expect: 0
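
The STOP condition in pre-check 2 can be made mechanical. The following is a minimal sketch, not part of the Raptor repo: classify_db_url and require_postgres_url are hypothetical helper names. It assumes a postgres:// scheme should be accepted alongside postgresql://, since Heroku config vars often use the shorter form.

```shell
# Sketch only: classify a DATABASE_URL by scheme so a wrapper can stop early.
# classify_db_url and require_postgres_url are hypothetical helpers.
classify_db_url() {
  case "$1" in
    postgres://*|postgresql://*|postgresql+*://*) echo "postgres" ;;
    sqlite://*|sqlite+*://*)                      echo "sqlite" ;;
    *)                                            echo "unknown" ;;
  esac
}

# Returns 0 only for a Postgres URL; otherwise prints the reason to stderr.
require_postgres_url() {
  kind="$(classify_db_url "$1")"
  if [ "$kind" != "postgres" ]; then
    echo "STOP: DATABASE_URL is $kind, not postgres; escalate" >&2
    return 1
  fi
}

# Intended use against staging (not executed here):
#   require_postgres_url "$(heroku config:get DATABASE_URL -a raxx-api-staging)"
```

The function only inspects the scheme prefix, so it never prints the credentials embedded in the URL.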

Cutover (~3 min)

# 6. Restart dyno so the app drops any cached DB engine and reconnects using the current DATABASE_URL
heroku restart -a raxx-api-staging

# 7. Tail logs in a second terminal
heroku logs -a raxx-api-staging --tail
# Watch for: "Database engine initialized: dialect=postgresql"
# Watch against: any "sqlite3.OperationalError" or "database is locked"

# 8. Check the health endpoint once the dyno is back up (~30s)
curl -fsS https://api-staging.raxx.app/healthz
# Expect: {"status":"ok","db":"postgresql","version":"..."}
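
Rather than waiting a fixed 30 seconds, the health check can be polled until it reports the expected backend. This is a rough sketch under stated assumptions: healthz_db and wait_for_db are hypothetical helpers, and the parser assumes the flat JSON shape shown in the Expect line (no jq required).

```shell
# Sketch only: poll /healthz until it reports the wanted database backend.
# healthz_db and wait_for_db are hypothetical helpers, not existing scripts.

# Extract the value of the "db" field from a flat JSON body.
healthz_db() {
  printf '%s' "$1" | sed -n 's/.*"db"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Args: url, wanted db value, max attempts (default 12), pause seconds (default 5).
wait_for_db() {
  url="$1"; want="$2"; tries="${3:-12}"; pause="${4:-5}"
  i=1
  while [ "$i" -le "$tries" ]; do
    # A failed request yields an empty body, which never matches.
    body="$(curl -fsS "$url" 2>/dev/null || true)"
    if [ "$(healthz_db "$body")" = "$want" ]; then
      echo "ok: db=$want after $i attempt(s)"
      return 0
    fi
    sleep "$pause"
    i=$((i + 1))
  done
  echo "FAIL: $url never reported db=$want" >&2
  return 1
}

# Intended use (not executed here):
#   wait_for_db https://api-staging.raxx.app/healthz postgresql
```

With the defaults this gives the dyno about a minute to come back before the check is declared failed.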

Smoke validation (~10 min)

Run the staging-smoke script:

scripts/ci/run_smoke.sh --env=staging

Or manually:

Soak (72 h before RM-10 prod cutover)

Leave staging running. Check at 24 / 48 / 72 h marks:

Rollback (only valid pre-launch, while customer rows = 0)

# Revert DATABASE_URL to SQLite path
heroku config:set DATABASE_URL="sqlite+pysqlite:///./raptor.db" -a raxx-api-staging >/dev/null 2>&1
heroku restart -a raxx-api-staging

# Verify
heroku config:get DATABASE_URL -a raxx-api-staging
# Expect: sqlite+pysqlite:///./raptor.db

# Re-run the health check to confirm Raptor runs on the SQLite fallback
curl -fsS https://api-staging.raxx.app/healthz
# Expect: {"status":"ok","db":"sqlite"}

Note: Heroku dyno filesystem is ephemeral. After reverting to SQLite, the database file lives on the dyno's local disk and is lost on every dyno restart. Rollback is acceptable only pre-launch (customer rows = 0). Once real customer data lives on Postgres, this rollback path is closed; fix forward instead.
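
The "customer rows = 0" precondition can be checked before any config change. A minimal sketch follows, assuming the count comes from the same SELECT used in pre-check 5 and that psql prints its default aligned output. users_count_from_psql and rollback_allowed are hypothetical helper names.

```shell
# Sketch only: gate the SQLite rollback on the users table being empty.
# users_count_from_psql and rollback_allowed are hypothetical helpers.

# Pull the first all-digit line out of psql's default aligned output.
users_count_from_psql() {
  printf '%s\n' "$1" \
    | sed -n 's/^[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' \
    | head -n 1
}

# Exit status: 0 = rollback allowed, 1 = rows present, 2 = count unreadable.
rollback_allowed() {
  case "$1" in
    0) return 0 ;;
    ''|*[!0-9]*) echo "STOP: could not read users count" >&2; return 2 ;;
    *) echo "STOP: $1 customer rows on Postgres; fix forward" >&2; return 1 ;;
  esac
}

# Intended use (not executed here):
#   out="$(heroku pg:psql -a raxx-api-staging -c 'SELECT count(*) FROM users;')"
#   rollback_allowed "$(users_count_from_psql "$out")" && echo "safe to roll back"
```

An unreadable count is treated as a failure, so a broken connection cannot be mistaken for an empty table.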

Common failures + fixes

| Symptom | Root cause | Fix |
| --- | --- | --- |
| alembic upgrade head errors with "Can't locate revision" | Alembic env not properly initialized in Phase 1 | heroku run alembic stamp 0001_raptor_baseline -a raxx-api-staging, then retry |
| psycopg2.OperationalError: SSL SYSCALL error | Heroku Postgres SSL bundle mismatch | Verify sslmode=require in URL; restart dyno |
| sqlite3.OperationalError: no such table in logs after cutover | Some callsite still using sqlite3.connect directly | Audit git grep "sqlite3.connect" backend_v2/ (must be empty); revert + fix |
| Healthcheck returns db: sqlite after cutover | DATABASE_URL still points to SQLite | Verify heroku config:get DATABASE_URL; re-set with postgresql:// URL from heroku pg:credentials:url |
| Session creation fails post-cutover | customer_session_service.py jsonb mismatch | Check Sentry; likely RM-4 port missed a column type |
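
The sqlite3.connect audit from the table can be wrapped so it fails loudly when a callsite remains. This is a sketch under assumptions: audit_sqlite_callsites is a hypothetical helper, and it uses plain grep restricted to *.py files so it also runs outside a git checkout. Inside the repo, the git grep form from the table is the equivalent check.

```shell
# Sketch only: fail when a direct sqlite3.connect callsite remains under a dir.
# audit_sqlite_callsites is a hypothetical helper, not an existing script.
audit_sqlite_callsites() {
  dir="$1"
  # -F matches the literal string; || true keeps "no match" from aborting.
  hits="$(grep -rnF --include='*.py' 'sqlite3.connect' "$dir" 2>/dev/null || true)"
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    echo "FAIL: direct sqlite3.connect callsites remain" >&2
    return 1
  fi
  echo "ok: no direct sqlite3.connect callsites in $dir"
}

# Intended use from the repo root (not executed here):
#   audit_sqlite_callsites backend_v2/
```

Each hit is printed as file:line:text, which gives the list of callsites to port before retrying the cutover.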

Connects to