Raptor Postgres Migration — Phased Rollout Plan
Epic: #1556
Design doc: docs/architecture/raptor-postgres-migration/design.md
Target: v1 launch 2026-05-23 UTC
Total budget: 7 dev-days across Phases 1–4
Timeline Overview
2026-05-10 UTC Phase 0 — this design PR
2026-05-12 UTC Phase 1 — alembic + engine (RM-1, RM-2, RM-3)
2026-05-14 UTC Phase 2 — callsite ports (RM-4…RM-7) + fixtures (RM-8)
2026-05-17 UTC Phase 3 — staging cutover (RM-9), then prod cutover (RM-10)
2026-05-18 UTC Phase 4 — SC-A1 reactivation (RM-11)
2026-05-19 UTC Buffer — soak + final audit-v2/RBAC-v2/Queue unblock
2026-05-23 UTC v1 launch gate
Phase 0 — Design Pass (Today, 2026-05-10 UTC)
Owner: architect
Dev-days: 0 (design only)
Deliverables:
docs/architecture/raptor-postgres-migration/design.mddocs/architecture/raptor-postgres-migration/migration-plan.mddocs/architecture/raptor-postgres-migration/sub-cards-to-file.md- ADR-0069, ADR-0070
- GitHub sub-cards RM-1 through RM-11 filed (via PM agent)
Rollback: N/A (no code changes)
Smoke-test checklist:
- [ ] Design PR merges; docs readable on GitHub
- [ ] ADRs present in
docs/architecture/adr/ - [ ] Sub-cards filed and linked to #1556
Phase 1 — Alembic + SQLAlchemy + Connection Layer (~2 dev-days)
Cards: RM-1 (alembic init + engine module), RM-2 (psycopg2-binary dep), RM-3 (schema baseline migration)
Target merge: 2026-05-12 UTC
What lands
psycopg2-binaryadded tobackend_v2/requirements.txt(RM-2)SQLAlchemy>=2.0added tobackend_v2/requirements.txt(RM-1)alembic>=1.13added tobackend_v2/requirements.txt(RM-1)- New
backend_v2/alembic.ini+backend_v2/alembic/env.pythat readsDATABASE_URL(RM-1) backend_v2/db/database.pyrewritten:Databaseclass uses SQLAlchemy engine;connect()returnssqlalchemy.engine.Connectioncontext manager; SQLite pragma block applies only whenengine.dialect.name == "sqlite"(RM-1)backend_v2/db/__init__.pyupdated:init_persistence()constructs engine fromDATABASE_URLenv var or falls back to SQLite (RM-1)- Alembic baseline migration
backend_v2/alembic/versions/0001_raptor_baseline.py: full Postgres-typed DDL for all 35 tables (RM-3) - Old
.sqlmigration files inbackend_v2/db/migrations/remain; they are still used by the SQLite dev path. No deletion in this phase. - Updated
DEVELOPMENT.md(or equivalent) documentingDATABASE_URLlocal-dev switch (RM-2)
Dependent cards
- RM-1 must land before RM-3 (alembic env needed to write the migration)
- RM-2 can land with RM-1 (same PR)
- RM-3 can be a separate PR to keep diff size manageable
Rollback path
DATABASE_URL is not yet set on staging. The old SQLite path (database.py
fallback) still works. If the new engine module breaks the SQLite path: revert
the database.py PR. No data migration has occurred.
Smoke-test checklist
- [ ]
pytest backend_v2/passes on SQLite (no DATABASE_URL set) - [ ]
alembic upgrade headruns on a local/CI Postgres without error - [ ]
alembic downgrade basedrops all tables cleanly - [ ]
heroku run alembic upgrade head -a raxx-api-stagingsucceeds (dry run; tables created on staging PG) - [ ]
heroku pg:psql -a raxx-api-staging -c "\dt"shows all expected tables
Phase 2 — Callsite Migration (~2–3 dev-days)
Cards: RM-4 (auth routes), RM-5 (trade routes), RM-6 (market data routes), RM-7 (settings + admin), RM-8 (test fixtures)
Target merge: 2026-05-14–15 UTC
Parallelizable: RM-4, RM-5, RM-6, RM-7 can be worked in parallel by different devs; RM-8 can be drafted alongside
What lands per card
Each card follows the same pattern:
1. Replace SQLite ?-style params with SQLAlchemy text("... :p") + dict params
2. Replace conn.cursor() calls with conn.execute(text(...))
3. Replace sqlite3.Row dict-access with row._mapping[col] or row.col
4. Remove any connection.commit() calls within service functions — SQLAlchemy
connection context manager handles commit/rollback
5. Replace INTEGER boolean comparisons (== 1) with Python bool checks
6. Replace CURRENT_TIMESTAMP string defaults with datetime.now(timezone.utc)
RM-4 auth routes (auth.py, auth_email.py, webauthn_service.py,
backup_codes_service.py, bootstrap_token_service.py, customer_session_service.py):
Highest-touch. webauthn_service.py uses cursor() extensively — must be rewritten.
RM-5 trade routes (trading.py, paper_order_service.py, trading_runtime.py):
Medium-touch.
RM-6 market data routes (historical_data.py, symbols.py, market_data.py,
backtest.py, repositories.py): HistoricalBarRepository uses direct SQL; rewrite
to text() + SQLAlchemy. repositories.py is the main target.
RM-7 settings + admin (settings.py, admin_customers.py, onboarding.py,
billing.py, billing_alert_config.py, visual_feedback_settings.py,
marketing.py, support.py, support_customer_map_service.py,
postmark_inbound.py, postmark_delivery.py, status_webhook.py):
Lighter touch; these are mostly simple INSERT/SELECT patterns.
RM-8 test fixtures: conftest.py updated to detect DATABASE_URL. If
pointing to Postgres, provision pytest-postgresql engine + run Alembic baseline.
If absent, keep SQLite fixture path. CI job updated to install Postgres and set
DATABASE_URL=postgresql://localhost/raptor_test.
Rollback path
Each PR can be reverted independently. The connection module negotiates dialect — pre-cutover (DATABASE_URL still pointing to SQLite), any callsite revert is a safe no-op. After staging cutover (Phase 3), rollback requires also reverting DATABASE_URL.
Smoke-test checklist
- [ ] All existing
pytest backend_v2/tests pass on SQLite path - [ ]
pytest backend_v2/passes on Postgres path (CI withDATABASE_URLset) - [ ]
webauthn_service.pyintegration test: register → login → logout cycle - [ ] Auth route smoke:
POST /api/auth/register,POST /api/auth/login - [ ] Session validation smoke: cookie issued, validated, revoked
- [ ]
HistoricalBarRepositoryinsert + query test - [ ] Billing route smoke: vendor_billing_snapshots insert reads back correctly
Phase 3 — Staging Cutover (~1 dev-day)
Card: RM-9 (staging cutover SOP)
Target: 2026-05-17 UTC
Who runs it: operator (Kristerpher) following RM-9 runbook
Staging steps (documented in RM-9)
- Confirm all Phase 2 PRs merged and CI green.
- Confirm
alembic upgrade headwas run againstpostgresql-adjacent-27271(done in Phase 1 smoke test — verify tables exist). - Run
heroku pg:psql -a raxx-api-staging -c "SELECT count(*) FROM users;"— confirm 0 rows (pre-launch; no data to migrate). heroku config:set DATABASE_URL="$(heroku config:get RAPTOR_APP_DATABASE_URL -a raxx-api-staging)" -a raxx-api-staging >/dev/null 2>&1— wait, this is the wrong URL. The ownerDATABASE_URLis set by Heroku when the addon was provisioned; it is already pointing to Postgres. Verify:heroku config:get DATABASE_URL -a raxx-api-staging. If it already returnspostgresql://, the cutover is already done at the env level; only the Raptor code needed to support it (Phase 2).heroku restart -a raxx-api-stagingto pick up config.- Run smoke suite against staging: auth flow, historical data fetch, billing query.
- Soak 72 h.
Prod upgrade steps (documented in RM-10)
heroku addons:upgrade postgresql-<addon-name> standard-0 -a raxx-api-prod— upgrades Essential-0 to Standard-0. Heroku shows a maintenance window warning; pre-launch with no customers, this is acceptable.- Wait for Heroku addon state =
created. heroku pg:credentials:create DATABASE --name raptor_app -a raxx-api-prodheroku pg:credentials:url DATABASE --name raptor_app -a raxx-api-prod— copy URL; set asRAPTOR_APP_DATABASE_URL:heroku config:set RAPTOR_APP_DATABASE_URL="<URL>" -a raxx-api-prod >/dev/null 2>&1heroku run alembic upgrade head -a raxx-api-prod— creates fresh schema on prod Postgres.heroku restart -a raxx-api-prod- Run smoke suite against prod.
Rollback path
Pre-launch (zero customer rows): revert DATABASE_URL in Heroku config to
sqlite+pysqlite:///./raptor.db. The Heroku filesystem is ephemeral — this
means the SQLite file is lost on dyno restart — but pre-launch that is
acceptable. Post-launch, this rollback path closes (Postgres becomes the only
option once real rows exist).
Smoke-test checklist
- [ ]
heroku config:get DATABASE_URL -a raxx-api-stagingreturns postgresql:// URL - [ ]
heroku pg:psql -a raxx-api-staging -c "\dt"shows all expected tables - [ ] Staging health endpoint returns
{"status": "ok"} - [ ] Auth flow: register new passkey → login → revoke session
- [ ] Historical data: fetch bars for a symbol (Alpaca paper API)
- [ ] Billing: vendor_billing_snapshots readable
- [ ] No SQLite errors in Heroku logs (
heroku logs -a raxx-api-staging --tail)
Phase 4 — SC-A1 Reactivation (~1 dev-day)
Card: RM-11
Target: 2026-05-18 UTC
Depends on: Phase 3 complete, staging soak passed
What lands
This reactivates PR #1501 (SC-A1) which was paused at the staging cutover:
- Run GRANT/REVOKE block on staging Postgres:
-- Run via: heroku pg:psql -a raxx-api-staging
GRANT CONNECT ON DATABASE <dbname> TO raptor_app;
GRANT USAGE ON SCHEMA public TO raptor_app;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO raptor_app;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO raptor_app;
REVOKE ALL ON TABLE alembic_version FROM raptor_app;
-- schema_migrations does not exist on Postgres; skip
- Verify:
heroku pg:psql -a raxx-api-staging→SET ROLE raptor_app; SELECT 1 FROM users LIMIT 1; - Set flag:
heroku config:set FLAG_RAPTOR_APP_ROLE_SEPARATION=1 -a raxx-api-staging >/dev/null 2>&1 - Soak 24 h on staging.
- Repeat GRANT/REVOKE + flag for prod.
Rollback path
heroku config:set FLAG_RAPTOR_APP_ROLE_SEPARATION=0 -a raxx-api-staging >/dev/null 2>&1
The restricted role still exists but the app falls back to the owner credential.
Smoke-test checklist
- [ ]
FLAG_RAPTOR_APP_ROLE_SEPARATION=1on staging; no errors in logs - [ ] Auth flow still works (raptor_app can SELECT/INSERT users, webauthn_credentials)
- [ ]
raptor_appcannot ALTER alembic_version (verify:SET ROLE raptor_app; DROP TABLE alembic_version;→ permission denied) - [ ] Repeat all Phase 3 smoke checks
Phase 5 — Downstream Unblock
Target: 2026-05-19 UTC
No new migration cards — these are existing cards that were blocked on Raptor Postgres:
| Card | Status | Notes |
|---|---|---|
| SC-A2 (audit table writer) | Blocked on Phase 3 | Can begin porting once Postgres live |
| RBAC-V2 role assignment | Blocked on Phase 3 | Role grants need Postgres GRANT DDL |
| Queue Phase 1 | Blocked on Phase 3 | Co-located in Raptor; needs Postgres |
| SC-11 Timescale hypertable | Blocked on Phase 3 | Can't run on SQLite; sequence after Phase 3 |
| PR #1465 audit-v2 assumptions | Valid post-Phase 3 | ADR-0058/0059 now accurately describe Raptor |
Risk Register
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
audit_log schema conflict (004a vs 004b) |
High | Low | Baseline migration must use column union; dev must inspect live SQLite schema first |
webauthn_service.py cursor() rewrite breaks auth |
Medium | Critical | Explicit integration test (register → login cycle) required in RM-4 |
| Heroku Essential-0 prod upgrade has maintenance window | Low | Low | Pre-launch; no customers; operator schedules during off-hours |
Phase 2 callsites miss one ? placeholder |
Medium | Medium | CI Postgres test run in RM-8 catches this before merge |
| 13-day timeline slips if Phase 2 is serialized | High | High | RM-4, RM-5, RM-6, RM-7 must be parallelized across devs |
| Timescale SC-11 jumps ahead of migration | Low | High | SC-11 is explicitly blocked until Phase 3; add dependency in GitHub |