Raptor SQLite → Postgres Migration — Design Doc

Epic: #1556
Status: Design locked 2026-05-10 UTC
Milestone: #6 — raxx.app v1 (due 2026-05-23 UTC)
Target branch: design/raptor-postgres-migration

1. Context

Raptor (backend_v2) has used sqlite3.connect since inception. The audit-v2, RBAC-v2, and Queue designs (ADRs 0058–0068, PRs #1465, #1501, #1510) all assumed a Heroku Postgres backend that did not exist for Raptor. SC-A1 (PR #1501) exposed this: the raptor_app credential role, GRANT/REVOKE append-only enforcement, and RLS policies are all inert against SQLite.

The operator locked Path B on 2026-05-10 UTC: migrate Raptor to Postgres as part of v1, before launch. Staging Postgres (Standard-0, addon postgresql-adjacent-27271, credential raptor_app) is already provisioned. Prod is still on Essential-0 and needs an upgrade.

This document covers the schema port strategy, connection layer, local-dev path, test fixture selection, alembic adoption, and migration sequencing.

2. Invariants

The following non-negotiable constraints govern every decision in this design:

No stored credentials. Postgres connection strings live in Heroku config vars or SSM — never in source files.
No stored passwords even in the migration itself. Role creation uses heroku pg:credentials:create (Heroku's RDS-aware path). Never CREATE ROLE … WITH LOGIN PASSWORD '…' via psql — RDS rejects it.
Audit trail for every state change. The migration itself is audited via schema_migrations (SQLite) transitioning to alembic_version (Postgres). Both tables are revoked from raptor_app.
Paper-first gating is unaffected. The migration changes the wire; it does not touch paper/live decision logic.
GDPR by default. PII columns (email, session hashes, ip prefixes) are carried forward with the same retention semantics. No new PII columns are introduced by the migration.
Standard-0 tier required. Essential-tier Postgres does not support pg:credentials:create. The SC-A1 raptor_app role-separation design is inoperable on Essential-tier. Prod must be upgraded before cutover.
Test suites must continue to pass against SQLite in CI during the transition, and against Postgres after Phase 2 lands. The abstraction must not break the existing test harness.

3. Schema Port Strategy

3.1 Type Map

SQLite type	Postgres type	Notes
`INTEGER PRIMARY KEY AUTOINCREMENT`	`BIGSERIAL PRIMARY KEY`	All `id` PKs
`TEXT` (general)	`TEXT`	Postgres TEXT is unbounded; no performance difference
`TEXT` (ISO-8601 datetime)	`TIMESTAMPTZ`	See §3.2
`TEXT` (JSON blob)	`JSONB`	See §3.3
`REAL` (price/amount)	`DOUBLE PRECISION`	financial display precision acceptable per ADR-0058
`NUMERIC(10,4)`	`NUMERIC(10,4)`	Postgres supports natively; no change
`INTEGER` (boolean 0/1)	`BOOLEAN`	Alembic migration uses `sa.Boolean()`
`BLOB` (public_key in webauthn_credentials)	`BYTEA`	Binary safe
`INTEGER` (general counts/IDs)	`BIGINT`	Future-proof; no cost on Postgres

3.2 Datetime Columns

Every column currently stored as TEXT ISO-8601 becomes TIMESTAMPTZ in Postgres. Alembic migration sets server_default=sa.func.now() to replace SQLite CURRENT_TIMESTAMP. Application code that inserts explicit UTC strings must be updated to pass Python datetime objects (SQLAlchemy handles the conversion).

Columns affected across all tables:

created_at, updated_at, applied_at — present in nearly every table
at_utc (customer_audit_events)
ts_emitted, ts_received (trace_events)
ts_started, ts_ended (trace_workflows)
fetched_at, period_start, period_end — vendor_billing_snapshots (period_start/end stay DATE not TIMESTAMPTZ; they are date-only values)
sent_at (vendor_billing_alert_log)
used_at, invalidated_at, generated_at (backup_codes)
expires_at, consumed_at, last_used_at (login_challenges, webauthn_credentials)
revoked_at, last_seen_at (customer_sessions)
minted_at, revoked_at (bootstrap_tokens_*)

3.3 JSONB Columns

Columns that store serialized JSON in SQLite become JSONB in Postgres.

Table	Column	Rationale
`customer_audit_events`	`target_resource`, `before_state`, `after_state`	Diff state, query-able
`trace_events`	`context_json`, `data_mask_json`	Sanitized context blobs
`onboarding_state`	`completed_steps`, `skipped_steps`, `step_data`	Per-step JSON arrays
`vendor_billing_snapshots`	`raw_payload_json`	Vendor response escape hatch
`postmark_delivery_events`	`payload_json`	Full webhook payload
`*_history`	`row_data`	Shadow row snapshots
`dsr_log`	`tables_affected`	JSON array of table names

Note: columns in trace_events currently annotated -- TIMESCALE: use TIMESTAMPTZ are covered by §3.2. The TIMESCALE hypertable promotion (SC-11) is out of scope for this migration.

3.4 Boolean Columns

SQLite stores booleans as INTEGER (0/1). Postgres has native BOOLEAN. The Alembic migration maps these directly. Application-layer code that compares against 1 or 0 must be updated to use True / False after the port.

Affected: tradable (symbols), backup_eligible, backup_state (webauthn_credentials), enabled (vendor_billing_alert_config), success (support_audit_log).

3.5 Partial Indexes

SQLite supports WHERE clauses on indexes. Postgres also supports partial indexes but uses different planner statistics. All partial indexes from the SQLite migrations must be recreated verbatim in the Alembic migration. No behavior changes — the partial index DDL translates directly.

3.6 Table Inventory

Every table created in migrations 001–019, plus schema_migrations:

Migration	Tables
001	`historical_bars`
002	`symbols`
003	`users`, `webauthn_credentials`
004a	`bootstrap_tokens_used`, `audit_log`
004b	`customer_sessions`, `audit_log`
005a	`postmark_inbound_dedup`
005b	`support_customer_map`
006	`support_audit_log`
007	`support_pending_submissions`
008a	`onboarding_state`
008b	`postmark_delivery_events`
008c	`trace_events`, `trace_workflows`
009	`bootstrap_tokens_pending`, `bootstrap_tokens_revoked`
010	ALTER `users` (add columns)
011	`vendor_billing_snapshots`, `vendor_billing_fixed`
012	`vendor_billing_alert_config`
013	`vendor_billing_alert_log`
014	ALTER `customer_sessions`, `login_challenges`
015	Postgres role setup (raptor_app) — operator runbook only
016	`customer_audit_events`, `freescout_ticket_cache`, `*_history` × 6, `audit_archival_runs`, `audit_integrity_log`, `dsr_log`
017	`backup_codes`
018	`email_verifications`, ALTER `users` (add `email_verified_at_utc`)
019	`quebec_waitlist`

Total: ~35 distinct tables/views plus schema_migrations (retired after alembic adoption).

Note: migrations 004a and 004b both try to create audit_log. The Alembic baseline must deduplicate — only one audit_log table exists at runtime (the 004b version is the one applied since it comes later and uses IF NOT EXISTS).

4. Connection Layer

4.1 Driver: psycopg2-binary

Choice: psycopg2-binary (not psycopg3 / psycopg).

Rationale:

SQLAlchemy's primary Postgres dialect is psycopg2. The existing Console (raxx-console-*) already ships psycopg2-binary — consistent dep footprint.
psycopg2-binary bundles libpq, so no system-level libpq install is required on Heroku dynos or CI.
psycopg3 is stable but SQLAlchemy 2.x asyncio-first patterns are not needed here (Raptor is a WSGI Flask app). Migrating to psycopg3 later is a one-line URL swap (postgresql+psycopg://).
psycopg2-binary is MIT-compatible. No licensing conflict with the TradeMasterAPI MIT license.

See ADR-0069 for the full driver comparison.

4.2 SQLAlchemy Version

Choice: SQLAlchemy 2.x (2.0.x series).

The connection layer is new — no legacy callsites to break. SQLAlchemy 2.x create_engine() + Connection.execute(text(...)) is the target API. The existing callsite pattern (with db.connect() as conn: conn.execute(...)) maps directly to SQLAlchemy's connection-as-context-manager idiom.

4.3 New `database.py` Contract

The new module (backend_v2/db/database.py) replaces the current sqlite3-based Database class with a SQLAlchemy-backed engine wrapper.

Public surface (unchanged from callers' perspective):

class Database:
    def connect(self) -> ContextManager[sqlalchemy.engine.Connection]:
        ...
    def apply_migrations(self) -> None:
        # For SQLite local dev: runs .sql files via text() (backward compat)
        # For Postgres: no-op (alembic runs in release dyno)
        ...

The connect() method returns a SQLAlchemy Connection context manager. Callers already use with db.connect() as conn: conn.execute(...) — no callsite signature changes are needed, but the parameter style changes from SQLite ? placeholders to SQLAlchemy text("... :param") named params.

4.4 Local Dev Path

DATABASE_URL drives everything:

Not set (default local dev): DATABASE_URL is absent → engine uses sqlite+pysqlite:///./dev.db (SQLite file in project root). No Postgres required for local dev.
Set to Postgres URL (CI, staging, prod): SQLAlchemy routes to psycopg2.
Set to SQLite URL explicitly: sqlite+pysqlite:///./test.db — used by test fixtures.

The init_persistence() function in backend_v2/db/__init__.py reads app.config["DATABASE_URL"] (resolved from env) and constructs the engine. SQLite pragma wiring (WAL, cache_size, etc.) applies only when the dialect is sqlite. This is detected via engine.dialect.name == "sqlite".

No local-dev developer needs Postgres. The test suite defaults to SQLite unless DATABASE_URL points elsewhere.

4.5 Alembic Adoption

Raptor gets its own alembic env, parallel to Console's:

backend_v2/
  alembic.ini
  alembic/
    env.py
    script.py.mako
    versions/
      0001_raptor_baseline.py   ← full schema from migrations 001-019
      0002_...                  ← future incremental

The alembic/env.py reads DATABASE_URL from the environment. On Heroku, Alembic runs in the release dyno using the owner credential (DATABASE_URL), never RAPTOR_APP_DATABASE_URL (the restricted role).

schema_migrations (the old hand-rolled table) is not migrated to Postgres. It exists in SQLite only. Alembic's alembic_version table is the canonical migration-state record on Postgres.

The baseline migration (0001_raptor_baseline.py) creates all 35 tables in one shot from the Postgres-typed schema. On a fresh Postgres DB, only this migration runs. On the already-provisioned staging DB, Alembic stamps the baseline version (no DDL run because tables already exist — operator applies alembic stamp as part of the cutover SOP).

Migration naming: NNNN_<slug>.py (4-digit zero-padded), independent of the old NNN_<slug>.sql files which remain in db/migrations/ for SQLite local-dev backward compat during the transition window.

5. Test Fixtures

Choice: pytest-postgresql (ephemeral Postgres, no container daemon).

Rationale:

pytest-postgresql spins up a temporary PostgreSQL process per test session using the locally installed pg_ctl. No Docker daemon required in CI.
GitHub Actions ubuntu-latest images ship with PostgreSQL; pg_ctl is available after sudo apt-get install postgresql (one CI step).
testcontainers requires Docker-in-Docker which adds complexity and pull-rate limits in CI. Heroku PR-app approach requires a Heroku add-on per PR (cost + provisioning latency — unacceptable for a 13-day sprint).
The existing SQLite-based tests continue to work: conftest.py detects DATABASE_URL — if absent, uses SQLite fixtures; if pointing to Postgres, uses pytest-postgresql fixtures. Same test bodies run in both modes.
Card #1552 (Postgres-aware migration CI) is the sub-card that wires this up.

See ADR-0070 for the full comparison.

6. Sequence Diagram — Cutover Flow

sequenceDiagram
    participant Dev as feature-dev
    participant CI as CI (GitHub Actions)
    participant Heroku as Heroku
    participant PG as Postgres (staging)

    Note over Dev,PG: Phase 1 — Alembic + SQLAlchemy lands
    Dev->>CI: PR: alembic init + engine module
    CI->>CI: run tests (SQLite path)
    CI->>Heroku: merge → release dyno
    Heroku->>PG: alembic upgrade head (baseline schema)
    Note over Heroku,PG: tables created on staging Postgres

    Note over Dev,PG: Phase 2 — Callsite ports land (one PR per service)
    Dev->>CI: PR: port auth routes
    CI->>CI: run tests (SQLite + Postgres)
    CI->>Heroku: merge → staging

    Note over Dev,PG: Phase 3 — Staging cutover
    Dev->>Heroku: heroku config:set DATABASE_URL=postgresql://... -a raxx-api-staging
    Heroku->>Heroku: dyno restart (picks up DATABASE_URL)
    Heroku->>PG: alembic upgrade head (no-op if baseline already ran)
    Dev->>Dev: smoke tests pass

    Note over Dev,PG: Phase 3b — Prod cutover
    Dev->>Heroku: upgrade raxx-api-prod to Standard-0
    Heroku->>Heroku: pg:credentials:create raptor_app
    Dev->>Heroku: heroku config:set DATABASE_URL=postgresql://... -a raxx-api-prod >/dev/null 2>&1
    Heroku->>PG: alembic upgrade head (fresh Postgres — all migrations run)
    Dev->>Dev: smoke tests pass

    Note over Dev,PG: Phase 4 — SC-A1 reactivation
    Dev->>Heroku: GRANT/REVOKE on all tables TO raptor_app
    Dev->>Heroku: heroku config:set FLAG_RAPTOR_APP_ROLE_SEPARATION=1 >/dev/null 2>&1

7. Migrations

Phase 0 (this PR): Design only. No schema changes.

Phase 1: Alembic baseline + engine module

One Alembic migration: 0001_raptor_baseline.py. Creates all tables in Postgres-typed DDL (see §3 type map). Idempotent — running against an empty DB creates everything; running against the staging DB where tables already exist requires alembic stamp head (operator SOP step, documented in migration-plan.md).

Rollback: alembic downgrade base drops all tables. Safe pre-launch because SQLite data is the live path until DATABASE_URL is pointed to Postgres.

Phase 2: Callsite ports

No schema changes. Pure Python — replace conn.execute("SELECT ...", (?, ?)) with conn.execute(text("SELECT ... :param"), {"param": val}). Each PR is independently rollbackable by reverting to SQLite URL.

Phase 3: Cutover

No schema changes. Environment variable flip (DATABASE_URL → Postgres). Rollback: revert DATABASE_URL to SQLite path (pre-launch, no data loss risk).

Phase 4: raptor_app role + flag flip

Post-migration DDL: GRANT/REVOKE per SC-A1 runbook. Rollback: set FLAG_RAPTOR_APP_ROLE_SEPARATION=0. The restricted role still exists but is not used.

8. Rollout Plan

dark    → Phase 1 lands; SQLite still active locally and on staging
         (DATABASE_URL not set to Postgres yet)
flag    → Phase 2 callsite ports land; staging DATABASE_URL flipped to Postgres
         (staging soak: 72 h)
beta    → Phase 3 prod cutover; prod DATABASE_URL set
ga      → Phase 4 raptor_app role + audit-v2 / RBAC-v2 / Queue unblock

No feature flag gates the migration itself — the switch is the DATABASE_URL value. The callsite PRs are safe to merge before the URL is flipped because the new SQLAlchemy code handles both dialects.

9. Security Considerations

9.1 PII

The migration carries forward all existing PII (email addresses, ip_prefix fields, session hashes). No new PII columns are added. Retention windows are unchanged. GDPR deletion paths (ON DELETE CASCADE chains) remain intact — FK constraints now enforced at the Postgres layer rather than application layer.

9.2 Credentials

DATABASE_URL (owner credential) lives in Heroku config only. Never in files.
RAPTOR_APP_DATABASE_URL (restricted credential) lives in Heroku config only. Created via heroku pg:credentials:create (not CREATE ROLE … WITH PASSWORD).
Neither URL appears in source, CI logs, or alembic migration files.

9.3 Audit Trail

alembic_version replaces schema_migrations as the migration-state record on Postgres. Both are revoked from raptor_app — the restricted runtime role cannot modify migration state.

Every state change to money-affecting or permission-affecting tables remains audited via customer_audit_events (SC-A3, post-Phase 4).

9.4 Kill Switch

If the Postgres cutover causes regressions: revert DATABASE_URL to sqlite+pysqlite:///./raptor.db. Pre-launch, no customer data is lost (SQLite and Postgres are functionally equivalent at zero rows). The Heroku config rollback takes effect on the next dyno restart (~30 s).

9.5 Breach Notification

No new breach surface introduced. Postgres-at-rest encryption is included in Standard-0 (AES-256 EBS). Postgres TLS in transit is enforced by Heroku's sslmode=require default in DATABASE_URL.

9.6 Secrets Rotatable Without Redeploy

Postgres password rotation: heroku pg:credentials:rotate DATABASE -a raxx-api-* rotates the owner credential; heroku pg:credentials:rotate DATABASE --name raptor_app rotates the restricted credential. Both update the Heroku config var automatically. App picks up on next dyno restart.

10. Open Questions

audit_log table dedup: Migrations 004a and 004b both CREATE TABLE IF NOT EXISTS audit_log with slightly different schemas (004a adds ip_prefix; 004b does not). At runtime only one version exists (whichever ran second). The Alembic baseline must reconcile these into a single canonical schema. Feature-dev must inspect the live SQLite schema before writing the baseline migration. Recommend: use the union of all columns (both schemas merged) marked nullable where added later.
webauthn_service.py uses conn.cursor() directly: The service uses cursor = db_connection.cursor() and cursor.execute(). SQLAlchemy connections do not have a .cursor() method in the 2.x API. The callsite port (RM-4, auth routes) must rewrite these to conn.execute(text(...)). This is the highest-touch change in Phase 2 — flag for careful review.
Timescale hypertable promotion (SC-11): trace_events and trace_workflows have -- TIMESCALE: annotations throughout. SC-11 targets these tables. This migration does NOT enable Timescale. SC-11 should be sequenced after the Raptor Postgres migration lands (SC-11 cannot run on SQLite at all). Confirm SC-11 is sequenced post-Phase 3.