Raptor Staging Postgres Cutover SOP
Card: #1567 (RM-9)
Epic: #1556
Migration plan: docs/architecture/raptor-postgres-migration/migration-plan.md (Phase 3)
Target window: 2026-05-17 UTC (v1 launch 2026-05-23 UTC)
Scope
Cut Raptor on raxx-api-staging from SQLite to Heroku Postgres (Standard-0, addon
postgresql-adjacent-27271). After this lands, staging Raptor reads + writes
Postgres. SQLite remains the fallback only on the local dev path.
Out of scope:
- Prod cutover (RM-10)
- `FLAG_RAPTOR_APP_ROLE_SEPARATION` enable (RM-11)
- Timescale hypertable creation on `trace_events` / `trace_workflows` (SC-11)
Prerequisites
- [ ] RM-1 through RM-8 merged on `main`; CI green
- [ ] `heroku pg:info -a raxx-api-staging` shows `Plan: Standard 0`, addon `postgresql-adjacent-27271`, status `Available`
- [ ] `heroku config:get DATABASE_URL -a raxx-api-staging` returns a `postgresql://` URL (Heroku auto-set on attach)
- [ ] Operator has terminal access + Heroku CLI authenticated to `raxx-app` team
- [ ] Sentry alerts active on staging (verify on `sentry.io` project page)
Pre-checks (~5 min)
Run these and confirm output before touching anything:
# 1. Verify Postgres tier + addon
heroku pg:info -a raxx-api-staging
# Expect: Plan = Standard 0, addon = postgresql-adjacent-27271, status = Available
# 2. Verify DATABASE_URL points at Postgres
heroku config:get DATABASE_URL -a raxx-api-staging
# Expect: postgresql://...amazonaws.com:5432/...
# If sqlite+pysqlite:///... → STOP, addon not attached as primary; escalate
# 3. Apply alembic baseline (if not already applied during Phase 1 smoke)
heroku run alembic upgrade head -a raxx-api-staging
# Expect: "Running upgrade  -> 0001_raptor_baseline, raptor baseline"
# If error "Can't locate revision identified by ..." → STOP, alembic env broken
# 4. Verify schema present
heroku pg:psql -a raxx-api-staging -c "\dt"
# Expect: ~35 tables present (users, sessions, paper_orders, historical_bars, ...)
# 5. Verify zero customer rows (pre-launch invariant)
heroku pg:psql -a raxx-api-staging -c "SELECT count(*) FROM users;"
# Expect: 0
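Pre-check 2 can be scripted so that a wrong scheme halts the run before anything else is touched. This is a minimal sketch: `is_postgres_url` is a hypothetical helper name, not something in the repo, and it accepts both `postgresql://` and the shorter `postgres://` alias in case the attached URL uses the latter.

```shell
# Hypothetical helper: succeeds only when the URL uses a Postgres scheme.
is_postgres_url() {
  case "$1" in
    postgresql://*|postgres://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example wiring (run by the operator; not executed when this file is sourced):
#   url="$(heroku config:get DATABASE_URL -a raxx-api-staging)"
#   is_postgres_url "$url" || { echo "STOP: DATABASE_URL is not Postgres" >&2; exit 1; }
```

The helper only inspects the scheme prefix, so it never prints the URL and leaves credentials out of the terminal scrollback.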
Cutover (~3 min)
# 6. Restart dyno to pick up any cached DB engine references
heroku restart -a raxx-api-staging
# 7. Tail logs in a second terminal
heroku logs -a raxx-api-staging --tail
# Watch for: "Database engine initialized: dialect=postgresql"
# Watch against: any "sqlite3.OperationalError" or "database is locked"
# 8. Check the health endpoint once the dyno is back up (~30s)
curl -fsS https://api-staging.raxx.app/healthz
# Expect: {"status":"ok","db":"postgresql","version":"..."}
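Because the dyno takes roughly 30 seconds to come back, the health check in step 8 may need a few retries. The sketch below polls until the `db` field reads `postgresql`. The helper names `healthz_db` and `wait_for_postgres`, the retry count, and the delay are all assumptions for illustration; the JSON shape is taken from the expected output above.

```shell
# Hypothetical helper: print the value of the "db" field from a healthz JSON body.
healthz_db() {
  printf '%s' "$1" | sed -n 's/.*"db"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: poll the health endpoint until it reports Postgres.
# Arguments: URL, max attempts (default 12), delay in seconds (default 5).
wait_for_postgres() {
  wfp_url="$1"
  wfp_attempts="${2:-12}"
  wfp_delay="${3:-5}"
  wfp_i=1
  while [ "$wfp_i" -le "$wfp_attempts" ]; do
    # A failed request (dyno still booting) is treated as an empty body.
    wfp_body="$(curl -fsS "$wfp_url" 2>/dev/null)" || wfp_body=""
    if [ "$(healthz_db "$wfp_body")" = "postgresql" ]; then
      echo "healthz reports db=postgresql after $wfp_i attempt(s)"
      return 0
    fi
    sleep "$wfp_delay"
    wfp_i=$((wfp_i + 1))
  done
  echo "healthz never reported db=postgresql" >&2
  return 1
}

# Example (run by the operator):
#   wait_for_postgres https://api-staging.raxx.app/healthz
```

The field is extracted with `sed` rather than a JSON parser so the sketch has no dependencies beyond curl; if `jq` is available on the operator's machine, it would be the more robust choice.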
Smoke validation (~10 min)
Run the staging-smoke script:
scripts/ci/run_smoke.sh --env=staging
Or manually:
- [ ] `POST /api/auth/register` with a test passkey → 200
- [ ] `POST /api/auth/login` with that passkey → 200, session cookie issued
- [ ] `GET /api/historical-data/AAPL?range=1d` → 200 with bars
- [ ] `GET /api/billing/snapshot` → 200 (may be empty pre-launch)
- [ ] `POST /api/auth/logout` → 200, session revoked
- [ ] Re-query: `heroku pg:psql -a raxx-api-staging -c "SELECT count(*) FROM users;"` → 1
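The GET checks in the manual list can be scripted with curl. This is a rough sketch, assuming the session cookie from the login step has already been saved to a cookie jar file. `http_status`, `expect_status`, `BASE_URL`, and `COOKIE_JAR` are illustrative names, not part of the repo. The passkey register and login calls are left out because their request payloads are not described here.

```shell
# Assumed defaults; override via the environment if staging differs.
BASE_URL="${BASE_URL:-https://api-staging.raxx.app}"
COOKIE_JAR="${COOKIE_JAR:-/tmp/raptor-smoke-cookies.txt}"

# Hypothetical helper: print only the HTTP status code for METHOD PATH.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' -X "$1" \
    -b "$COOKIE_JAR" -c "$COOKIE_JAR" "${BASE_URL}$2"
}

# Hypothetical helper: compare an actual status code against the expected one.
# Arguments: label, expected code, actual code.
expect_status() {
  if [ "$3" = "$2" ]; then
    echo "PASS $1 ($3)"
    return 0
  fi
  echo "FAIL $1: expected $2, got $3" >&2
  return 1
}

# Example (run by the operator after logging in):
#   expect_status "historical-data" 200 "$(http_status GET '/api/historical-data/AAPL?range=1d')"
#   expect_status "billing-snapshot" 200 "$(http_status GET '/api/billing/snapshot')"
```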
Soak (72 h before RM-10 prod cutover)
Leave staging running. Check at 24 / 48 / 72 h marks:
- [ ] Sentry shows zero new error events from Raptor staging
- [ ] `heroku logs -a raxx-api-staging --since 24h | grep -i sqlite` returns nothing
- [ ] `heroku pg:info -a raxx-api-staging` shows healthy connection count
- [ ] Manual smoke pass at each 24 h mark (auth + historical-data + billing)
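The log check at each soak mark can be turned into a pass/fail gate. This sketch reads log text on stdin so it works with whatever log source is used. `count_sqlite_lines` and `logs_are_clean` are hypothetical helper names, and the `-n 1500` in the usage comment is an assumed line count, not a value taken from this SOP.

```shell
# Hypothetical helper: count stdin lines that mention sqlite (case-insensitive).
# grep exits non-zero when nothing matches, so "|| true" keeps the exit status 0.
count_sqlite_lines() {
  grep -ci 'sqlite' || true
}

# Hypothetical helper: succeed only when stdin contains no sqlite references.
logs_are_clean() {
  [ "$(count_sqlite_lines)" -eq 0 ]
}

# Example (run by the operator):
#   heroku logs -a raxx-api-staging -n 1500 | logs_are_clean \
#     || echo "sqlite references found in staging logs" >&2
```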
Rollback (only valid pre-launch, while customer rows = 0)
# Revert DATABASE_URL to SQLite path
heroku config:set DATABASE_URL="sqlite+pysqlite:///./raptor.db" -a raxx-api-staging >/dev/null 2>&1
heroku restart -a raxx-api-staging
# Verify
heroku config:get DATABASE_URL -a raxx-api-staging
# Expect: sqlite+pysqlite:///./raptor.db
# Re-run the health check to confirm Raptor runs on the SQLite fallback
curl -fsS https://api-staging.raxx.app/healthz
# Expect: {"status":"ok","db":"sqlite"}
Note: Heroku dyno filesystem is ephemeral. Reverting to SQLite means the file is lost on any dyno restart. Rollback is acceptable only pre-launch (customer rows = 0). Once real customer data lives on Postgres, this rollback path is closed — fix-forward instead.
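Since the rollback is only valid while customer rows = 0, a guard in front of the `config:set` step can make that precondition explicit. This sketch assumes the `users` count is an adequate proxy for customer rows and that psql prints its default tabular output. `parse_count` and `rollback_allowed` are hypothetical helpers, not existing scripts.

```shell
# Hypothetical helper: extract the integer from psql's tabular output for
# "SELECT count(*) ..." read on stdin (header, dashes, value, row footer).
parse_count() {
  awk '/^[ \t]*[0-9]+[ \t]*$/ { gsub(/[ \t]/, ""); print; exit }'
}

# Hypothetical guard: allow the rollback only when the row count is exactly 0.
rollback_allowed() {
  if [ "$1" = "0" ]; then
    return 0
  fi
  echo "Rollback refused: users count is '$1', expected 0; fix forward instead" >&2
  return 1
}

# Example (run by the operator before reverting DATABASE_URL):
#   n="$(heroku pg:psql -a raxx-api-staging -c 'SELECT count(*) FROM users;' | parse_count)"
#   rollback_allowed "$n" || exit 1
```

An empty or unparseable count is also refused, so a failed query does not count as permission to roll back.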
Common failures + fixes
| Symptom | Root cause | Fix |
|---|---|---|
| `alembic upgrade head` errors with "Can't locate revision" | Alembic env not properly initialized in Phase 1 | `heroku run alembic stamp 0001_raptor_baseline -a raxx-api-staging`, then retry |
| `psycopg2.OperationalError: SSL SYSCALL error` | Heroku Postgres SSL bundle mismatch | Verify `sslmode=require` in URL; restart dyno |
| `sqlite3.OperationalError: no such table` in logs after cutover | Some callsite still using `sqlite3.connect` directly | Audit `git grep "sqlite3.connect" backend_v2/` (must be empty); revert + fix |
| Healthcheck returns `db: sqlite` after cutover | `DATABASE_URL` still points to SQLite | Verify `heroku config:get DATABASE_URL`; re-set with `postgresql://` URL from `heroku pg:credentials:url` |
| Session creation fails post-cutover | `customer_session_service.py` jsonb mismatch | Check Sentry; likely RM-4 port missed a column type |
Connects to
- Design doc: `docs/architecture/raptor-postgres-migration/design.md`
- Migration plan: `docs/architecture/raptor-postgres-migration/migration-plan.md`
- ADR-0069 (SQLAlchemy 2.x + psycopg2-binary)
- ADR-0070 (pytest-postgresql for test fixtures)
- Blocks: RM-10 (prod cutover), RM-11 (role separation), SC-11 (Timescale hypertable)
- Staging-as-runtime-dup principle: `docs/architecture/principles/staging-is-a-runtime-dup.md`