
Queue — Migration Plan

Status: Design v1
Owner: software-architect
Date: 2026-05-09 UTC
Milestone: #6 — raxx.app v1 — first non-operator user (due 2026-05-23 UTC)


Phase 1 — Queue Shell + API Surface (v1, ~5 dev-days)

Goal: Queue codebase exists with all endpoints implemented, backed by the existing Raptor Postgres DB via queue_ namespaced tables. No Raptor callers switched yet.

Scope:
- Create queue/ directory structure (Flask app factory, blueprints, services, middleware)
- Write Queue DB migrations: queue_customers, queue_webauthn_credentials, queue_sessions, queue_email_verifications, queue_backup_codes, queue_webauthn_challenges, queue_customer_roles
- Port/refactor existing implementations from open PRs into Queue's module structure (see in-flight PR disposition doc)
- Implement all endpoints in api-contract.md
- Implement service-to-service auth middleware (QUEUE_SERVICE_TOKEN_*)
- Implement JWT mint + RS256 signing (QUEUE_JWT_SIGNING_KEY from SSM)
- Mount Queue blueprint in Raptor's app factory at /api/v1/ behind FLAG_QUEUE_V1 (mount + service-token check sketched after this list)
- All endpoints dark (FLAG_QUEUE_V1=off on staging and prod)
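A minimal sketch of the flag-gated mount and the per-caller token check, assuming Flask. create_app, the X-Queue-Service-Token header, and the _RAPTOR suffix on the token env var are placeholders, not decisions from this plan.

```python
# Sketch only: flag-gated blueprint mount plus a service-token check.
# create_app, the header name, and the _RAPTOR suffix are assumptions.
import hmac
import os

from flask import Blueprint, Flask, abort, request

queue_bp = Blueprint("queue", __name__)


@queue_bp.before_request
def require_service_token():
    """Reject service-to-service calls that lack a valid shared token."""
    presented = request.headers.get("X-Queue-Service-Token", "")
    expected = os.environ.get("QUEUE_SERVICE_TOKEN_RAPTOR", "")
    # Constant-time comparison; never reveal which side was wrong.
    if not (expected and hmac.compare_digest(presented, expected)):
        abort(401)


def create_app() -> Flask:
    """Raptor's app factory with Queue mounted dark behind FLAG_QUEUE_V1."""
    app = Flask(__name__)
    # While FLAG_QUEUE_V1=off, the blueprint is not registered, so every
    # /api/v1/ route 404s on staging and prod.
    if os.environ.get("FLAG_QUEUE_V1") == "on":
        app.register_blueprint(queue_bp, url_prefix="/api/v1")
    return app
```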

Migrations (additive only; no drops):
- queue/db/migrations/001_queue_core_tables.sql — creates all queue_* tables
- queue/db/migrations/002_queue_customer_roles.sql — creates queue_customer_roles and seeds the antlers-user default role (illustrative sketch below)
- These run as part of the normal Raptor migration job (same DB, same migration runner)
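For orientation only, a rough sketch of what 002 might contain, held as Python string constants; columns, types, and the seed shape are placeholders, not the agreed schema.

```python
# Sketch only: possible contents of 002_queue_customer_roles.sql.
# Columns, types, and the seed semantics are placeholders.
QUEUE_CUSTOMER_ROLES_DDL = """
CREATE TABLE IF NOT EXISTS queue_customer_roles (
    id          BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES queue_customers (id),
    role        TEXT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (customer_id, role)
);
"""

# Seed the antlers-user default role; whether this targets a roles catalog or
# per-customer rows is an open detail, so treat this as illustrative only.
SEED_ANTLERS_USER_DEFAULT = """
INSERT INTO queue_customer_roles (customer_id, role)
SELECT id, 'antlers-user' FROM queue_customers
ON CONFLICT (customer_id, role) DO NOTHING;
"""
```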

Rollback: Migrations are additive. Rolling back means dropping queue_* tables (no data yet — pre-cutover) and disabling the flag. Zero risk to existing Raptor data.

Dependent PRs:
- #1502 (audit schema): port customer_audit_events migration to run under Queue's migration runner
- #1503 (WebAuthn login): port to queue/api/routes/auth.py, adapt to queue_* tables
- #1505 (session v2): port to queue/api/services/session_service.py
- #1506 (audit writer): port audit_writer_service.py to Queue, update to use QUEUE_SERVICE_TOKEN_* auth
- #1507 (backup codes): port to queue/api/routes/auth.py
- #1508 (email verification): port to queue/api/routes/auth.py

Estimated dev-days: 5


Phase 2 — Cutover: Raptor + Console Point at Queue (v1, ~3 dev-days)

Goal: All Raptor and Antlers auth callsites use Queue endpoints. Raptor's legacy auth blueprints disabled. Dual-mode middleware validates parity during transition.

Scope:
- Deploy dual-mode middleware in Raptor: on every auth request, call both the old Raptor auth path and the new Queue path, log disagreements, but use the old path's result. Provides a parity soak period (see the sketch after this list).
- Once parity is confirmed (48h soak on staging), flip FLAG_QUEUE_V1=on to make Queue the authoritative path.
- Disable Raptor legacy auth blueprints (FLAG_RAPTOR_AUTH_LEGACY=off) — all /api/auth/* in Raptor return 404.
- Console reads RBAC via Queue's /api/v1/rbac/* endpoints instead of direct DB queries.
- Update Antlers: all auth API calls point to Queue's /api/v1/auth/* paths.
- Verify: Raptor's validate_session() now verifies Queue-issued JWTs offline (RS256 public key in env).
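A minimal sketch of the dual-mode wrapper, assuming both auth paths are injectable callables returning comparable results; legacy_authenticate and queue_authenticate are placeholders for Raptor's actual entry points, and the logger name is an assumption.

```python
# Sketch only: dual-mode parity wrapper. The two auth callables, the result
# shape, and the logger name are assumptions; the legacy result stays
# authoritative for the whole soak.
import logging

logger = logging.getLogger("raptor.auth.parity")


def make_dual_mode_authenticator(legacy_authenticate, queue_authenticate):
    """Wrap the legacy auth path; shadow-call Queue and log disagreements."""

    def authenticate(request):
        legacy_result = legacy_authenticate(request)
        try:
            queue_result = queue_authenticate(request)
        except Exception:
            # Queue errors must never affect the legacy path during the soak.
            logger.exception("queue shadow call failed path=%s", request.path)
            return legacy_result

        if queue_result != legacy_result:
            logger.warning(
                "auth parity mismatch path=%s legacy=%r queue=%r",
                request.path, legacy_result, queue_result,
            )
        # Old path remains authoritative until FLAG_QUEUE_V1 is flipped on.
        return legacy_result

    return authenticate
```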

Data migration (one-time copy into queue_* tables, not an in-place rename):
- At cutover, run queue/db/migrations/003_backfill_existing_customers.sql (sketched after this list):
  - INSERT INTO queue_customers SELECT ... FROM customers WHERE ... (copy existing rows)
  - INSERT INTO queue_webauthn_credentials SELECT ... FROM webauthn_credentials WHERE ...
  - INSERT INTO queue_sessions SELECT ... FROM customer_sessions WHERE revoked_at IS NULL
  - Mark old rows as migrated via migrated_to_queue=true column (added in migration 003)
- Dual-read period: Queue reads from queue_* tables; Raptor legacy reads from old tables. Both are live briefly.
- After 24h soak: Raptor legacy reads disabled; old tables remain (not dropped) for 30 days.
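A sketch of the 003 backfill under assumptions: the column lists and filters below stand in for the elided ones above, and conn is a psycopg2-style connection run by the migration job.

```python
# Sketch only: the 003 backfill as a single transaction. Column lists and
# filters are placeholders for the elided ones in the plan; conn is assumed
# to be a psycopg2-style connection.
BACKFILL_STATEMENTS = [
    # Added in migration 003 so old rows can be marked as migrated.
    "ALTER TABLE customers ADD COLUMN IF NOT EXISTS migrated_to_queue boolean NOT NULL DEFAULT false;",
    # Copy existing rows into the queue_* namespace (placeholder columns).
    """INSERT INTO queue_customers (id, email, created_at)
       SELECT id, email, created_at FROM customers
       ON CONFLICT (id) DO NOTHING;""",
    """INSERT INTO queue_webauthn_credentials (id, customer_id, credential_id, public_key)
       SELECT id, customer_id, credential_id, public_key FROM webauthn_credentials
       ON CONFLICT (id) DO NOTHING;""",
    """INSERT INTO queue_sessions (id, customer_id, created_at, expires_at)
       SELECT id, customer_id, created_at, expires_at FROM customer_sessions
       WHERE revoked_at IS NULL
       ON CONFLICT (id) DO NOTHING;""",
    # Mark the source rows so dual-read tooling can tell them apart.
    "UPDATE customers SET migrated_to_queue = true;",
]


def run_backfill(conn):
    """Execute the backfill atomically; commit happens when the block exits."""
    with conn:
        with conn.cursor() as cur:
            for stmt in BACKFILL_STATEMENTS:
                cur.execute(stmt)
```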

Rollback: Flip FLAG_QUEUE_V1=off and FLAG_RAPTOR_AUTH_LEGACY=on. Old tables still contain data. Re-migration is not required.

Estimated dev-days: 3

v1 total dev-days (Phase 1 + 2): ~8


Phase 3 — DB Extraction (post-v1, ~5 dev-days)

Goal: queue_* tables move from Raptor's Postgres to Queue's own Postgres instance. Queue no longer depends on Raptor's DB.

Scope:
- Provision raxx-queue-db Heroku Postgres add-on
- DATABASE_URL_QUEUE env var on Queue's dyno
- Run Queue schema migrations on new DB (from scratch, queue-owned)
- Bulk migrate data: queue_customers, queue_sessions, queue_webauthn_credentials, etc. using a fenced migration script with row-count validation (validation sketched after this list)
- Blue/green cutover: Queue reads from old DB with writes to both; flip to new DB read+write; verify; remove old writes
- customer_audit_events moves in this phase (see OQ-1)
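A sketch of the row-count validation step of the fenced script, assuming psycopg2-style connections to both databases; the table list here is abbreviated.

```python
# Sketch only: post-copy row-count validation between Raptor's DB and
# raxx-queue-db. Connections are psycopg2-style; the table list is partial.
TABLES = [
    "queue_customers",
    "queue_sessions",
    "queue_webauthn_credentials",
]


def validate_row_counts(old_conn, new_conn):
    """Fail loudly if any table's row count differs between the two DBs."""
    mismatches = []
    for table in TABLES:
        counts = []
        for conn in (old_conn, new_conn):
            with conn.cursor() as cur:
                cur.execute(f"SELECT count(*) FROM {table}")
                counts.append(cur.fetchone()[0])
        if counts[0] != counts[1]:
            mismatches.append((table, counts[0], counts[1]))
    if mismatches:
        raise RuntimeError(f"bulk migrate row-count mismatch: {mismatches}")
```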

Rollback: Flip Queue's DATABASE_URL_QUEUE env var back to Raptor's DB. Old tables are still present.

Estimated dev-days: 5


Phase 4 — Extract to Own Heroku App (post-v1, ~3 dev-days)

Goal: Queue runs as a standalone Heroku app (raxx-queue) rather than a blueprint inside Raptor's dyno.

Scope:
- New Heroku app raxx-queue with Procfile and requirements.txt
- queue.raxx.app DNS entry and Heroku domain config
- Service-to-service auth: Raptor, Console, Velvet, Reasonator all call https://queue.raxx.app/api/v1/ (caller sketch after this list)
- Remove Queue blueprint from Raptor's app factory
- Update QUEUE_BASE_URL env var on all callers
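A sketch of a caller-side helper once QUEUE_BASE_URL points at the standalone app; the header name, timeout, token env var suffix, and example path are assumptions, not part of this plan.

```python
# Sketch only: caller-side helper for service-to-service requests to the
# standalone app. Header name, timeout, and token env var suffix are
# assumptions; only QUEUE_BASE_URL and the /api/v1/ prefix come from the plan.
import os

import requests


def queue_get(path, token_env="QUEUE_SERVICE_TOKEN_RAPTOR"):
    """GET a Queue /api/v1/ resource using the shared service token."""
    base = os.environ.get("QUEUE_BASE_URL", "https://queue.raxx.app")
    resp = requests.get(
        f"{base}/api/v1/{path.lstrip('/')}",
        headers={"X-Queue-Service-Token": os.environ[token_env]},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()


# Example (path is illustrative): how Console might read RBAC after Phase 4.
# roles = queue_get("rbac/customers/123/roles")
```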

Rollback: Re-mount Queue blueprint in Raptor, revert QUEUE_BASE_URL.

Estimated dev-days: 3


Phase 5 — SAML + Multi-Tenancy (future, unscoped)

Goal: Queue becomes the SAML SP for enterprise customers. Multi-tenant customer records.

Scope (not yet designed):
- SAML assertion parsing + group-to-role mapping
- Tenant isolation (tenant_id FK on all Queue tables)
- Multi-tenant RBAC (roles scoped per tenant)

Estimated dev-days: TBD — requires its own design doc.


Summary Table

| Phase | Label          | Key deliverable                           | Dev-days | Timeline                           |
|-------|----------------|-------------------------------------------|----------|------------------------------------|
| 1     | Queue shell    | All Queue endpoints dark; schema migrated | 5        | v1 (2026-05-09 to ~2026-05-15 UTC) |
| 2     | Cutover        | Raptor + Antlers fully on Queue           | 3        | v1 (2026-05-15 to 2026-05-23 UTC)  |
| 3     | DB extraction  | Queue owns its own Postgres               | 5        | post-v1                            |
| 4     | App extraction | Queue is standalone Heroku app            | 3        | post-v1                            |
| 5     | SAML           | Enterprise IdP integration                | TBD      | future                             |