Raxx · internal docs

internal · gated ↑ index

Raptor Postgres Migration — Phased Rollout Plan

Epic: #1556
Design doc: docs/architecture/raptor-postgres-migration/design.md
Target: v1 launch 2026-05-23 UTC
Total budget: 7 dev-days across Phases 1–4


Timeline Overview

2026-05-10 UTC  Phase 0 — this design PR
2026-05-12 UTC  Phase 1 — alembic + engine (RM-1, RM-2, RM-3)
2026-05-14 UTC  Phase 2 — callsite ports (RM-4…RM-7) + fixtures (RM-8)
2026-05-17 UTC  Phase 3 — staging cutover (RM-9), then prod cutover (RM-10)
2026-05-18 UTC  Phase 4 — SC-A1 reactivation (RM-11)
2026-05-19 UTC  Buffer — soak + final audit-v2/RBAC-v2/Queue unblock
2026-05-23 UTC  v1 launch gate

Phase 0 — Design Pass (Today, 2026-05-10 UTC)

Owner: architect
Dev-days: 0 (design only)
Deliverables:

Rollback: N/A (no code changes)

Smoke-test checklist:


Phase 1 — Alembic + SQLAlchemy + Connection Layer (~2 dev-days)

Cards: RM-1 (alembic init + engine module), RM-2 (psycopg2-binary dep), RM-3 (schema baseline migration)
Target merge: 2026-05-12 UTC

What lands

Dependent cards

Rollback path

DATABASE_URL is not yet set on staging. The old SQLite path (database.py fallback) still works. If the new engine module breaks the SQLite path: revert the database.py PR. No data migration has occurred.

Smoke-test checklist


Phase 2 — Callsite Migration (~2–3 dev-days)

Cards: RM-4 (auth routes), RM-5 (trade routes), RM-6 (market data routes), RM-7 (settings + admin), RM-8 (test fixtures)
Target merge: 2026-05-14–15 UTC
Parallelizable: RM-4, RM-5, RM-6, RM-7 can be worked in parallel by different devs; RM-8 can be drafted alongside

What lands per card

Each card follows the same pattern: 1. Replace SQLite ?-style params with SQLAlchemy text("... :p") + dict params 2. Replace conn.cursor() calls with conn.execute(text(...)) 3. Replace sqlite3.Row dict-access with row._mapping[col] or row.col 4. Remove any connection.commit() calls within service functions — SQLAlchemy connection context manager handles commit/rollback 5. Replace INTEGER boolean comparisons (== 1) with Python bool checks 6. Replace CURRENT_TIMESTAMP string defaults with datetime.now(timezone.utc)

RM-4 auth routes (auth.py, auth_email.py, webauthn_service.py, backup_codes_service.py, bootstrap_token_service.py, customer_session_service.py): Highest-touch. webauthn_service.py uses cursor() extensively — must be rewritten.

RM-5 trade routes (trading.py, paper_order_service.py, trading_runtime.py): Medium-touch.

RM-6 market data routes (historical_data.py, symbols.py, market_data.py, backtest.py, repositories.py): HistoricalBarRepository uses direct SQL; rewrite to text() + SQLAlchemy. repositories.py is the main target.

RM-7 settings + admin (settings.py, admin_customers.py, onboarding.py, billing.py, billing_alert_config.py, visual_feedback_settings.py, marketing.py, support.py, support_customer_map_service.py, postmark_inbound.py, postmark_delivery.py, status_webhook.py): Lighter touch; these are mostly simple INSERT/SELECT patterns.

RM-8 test fixtures: conftest.py updated to detect DATABASE_URL. If pointing to Postgres, provision pytest-postgresql engine + run Alembic baseline. If absent, keep SQLite fixture path. CI job updated to install Postgres and set DATABASE_URL=postgresql://localhost/raptor_test.

Rollback path

Each PR can be reverted independently. The connection module negotiates dialect — pre-cutover (DATABASE_URL still pointing to SQLite), any callsite revert is a safe no-op. After staging cutover (Phase 3), rollback requires also reverting DATABASE_URL.

Smoke-test checklist


Phase 3 — Staging Cutover (~1 dev-day)

Card: RM-9 (staging cutover SOP)
Target: 2026-05-17 UTC
Who runs it: operator (Kristerpher) following RM-9 runbook

Staging steps (documented in RM-9)

  1. Confirm all Phase 2 PRs merged and CI green.
  2. Confirm alembic upgrade head was run against postgresql-adjacent-27271 (done in Phase 1 smoke test — verify tables exist).
  3. Run heroku pg:psql -a raxx-api-staging -c "SELECT count(*) FROM users;" — confirm 0 rows (pre-launch; no data to migrate).
  4. heroku config:set DATABASE_URL="$(heroku config:get RAPTOR_APP_DATABASE_URL -a raxx-api-staging)" -a raxx-api-staging >/dev/null 2>&1 — wait, this is the wrong URL. The owner DATABASE_URL is set by Heroku when the addon was provisioned; it is already pointing to Postgres. Verify: heroku config:get DATABASE_URL -a raxx-api-staging. If it already returns postgresql://, the cutover is already done at the env level; only the Raptor code needed to support it (Phase 2).
  5. heroku restart -a raxx-api-staging to pick up config.
  6. Run smoke suite against staging: auth flow, historical data fetch, billing query.
  7. Soak 72 h.

Prod upgrade steps (documented in RM-10)

  1. heroku addons:upgrade postgresql-<addon-name> standard-0 -a raxx-api-prod — upgrades Essential-0 to Standard-0. Heroku shows a maintenance window warning; pre-launch with no customers, this is acceptable.
  2. Wait for Heroku addon state = created.
  3. heroku pg:credentials:create DATABASE --name raptor_app -a raxx-api-prod
  4. heroku pg:credentials:url DATABASE --name raptor_app -a raxx-api-prod — copy URL; set as RAPTOR_APP_DATABASE_URL: heroku config:set RAPTOR_APP_DATABASE_URL="<URL>" -a raxx-api-prod >/dev/null 2>&1
  5. heroku run alembic upgrade head -a raxx-api-prod — creates fresh schema on prod Postgres.
  6. heroku restart -a raxx-api-prod
  7. Run smoke suite against prod.

Rollback path

Pre-launch (zero customer rows): revert DATABASE_URL in Heroku config to sqlite+pysqlite:///./raptor.db. The Heroku filesystem is ephemeral — this means the SQLite file is lost on dyno restart — but pre-launch that is acceptable. Post-launch, this rollback path closes (Postgres becomes the only option once real rows exist).

Smoke-test checklist


Phase 4 — SC-A1 Reactivation (~1 dev-day)

Card: RM-11
Target: 2026-05-18 UTC
Depends on: Phase 3 complete, staging soak passed

What lands

This reactivates PR #1501 (SC-A1) which was paused at the staging cutover:

  1. Run GRANT/REVOKE block on staging Postgres:
-- Run via: heroku pg:psql -a raxx-api-staging
GRANT CONNECT ON DATABASE <dbname> TO raptor_app;
GRANT USAGE ON SCHEMA public TO raptor_app;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO raptor_app;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO raptor_app;
REVOKE ALL ON TABLE alembic_version FROM raptor_app;
-- schema_migrations does not exist on Postgres; skip
  1. Verify: heroku pg:psql -a raxx-api-stagingSET ROLE raptor_app; SELECT 1 FROM users LIMIT 1;
  2. Set flag: heroku config:set FLAG_RAPTOR_APP_ROLE_SEPARATION=1 -a raxx-api-staging >/dev/null 2>&1
  3. Soak 24 h on staging.
  4. Repeat GRANT/REVOKE + flag for prod.

Rollback path

heroku config:set FLAG_RAPTOR_APP_ROLE_SEPARATION=0 -a raxx-api-staging >/dev/null 2>&1 The restricted role still exists but the app falls back to the owner credential.

Smoke-test checklist


Phase 5 — Downstream Unblock

Target: 2026-05-19 UTC
No new migration cards — these are existing cards that were blocked on Raptor Postgres:

Card Status Notes
SC-A2 (audit table writer) Blocked on Phase 3 Can begin porting once Postgres live
RBAC-V2 role assignment Blocked on Phase 3 Role grants need Postgres GRANT DDL
Queue Phase 1 Blocked on Phase 3 Co-located in Raptor; needs Postgres
SC-11 Timescale hypertable Blocked on Phase 3 Can't run on SQLite; sequence after Phase 3
PR #1465 audit-v2 assumptions Valid post-Phase 3 ADR-0058/0059 now accurately describe Raptor

Risk Register

Risk Likelihood Impact Mitigation
audit_log schema conflict (004a vs 004b) High Low Baseline migration must use column union; dev must inspect live SQLite schema first
webauthn_service.py cursor() rewrite breaks auth Medium Critical Explicit integration test (register → login cycle) required in RM-4
Heroku Essential-0 prod upgrade has maintenance window Low Low Pre-launch; no customers; operator schedules during off-hours
Phase 2 callsites miss one ? placeholder Medium Medium CI Postgres test run in RM-8 catches this before merge
13-day timeline slips if Phase 2 is serialized High High RM-4, RM-5, RM-6, RM-7 must be parallelized across devs
Timescale SC-11 jumps ahead of migration Low High SC-11 is explicitly blocked until Phase 3; add dependency in GitHub