Raxx · internal docs


Queue — Identity, Session, RBAC, and Customer Service

Status: Design v1 (sub-cards pending filing)
Owner: software-architect
Date: 2026-05-09 UTC
Milestone: #6, raxx.app v1, first non-operator user (due 2026-05-23 UTC)
Related ADRs: ADR-0065, ADR-0066, ADR-0067, ADR-0068
Sibling docs:
- docs/architecture/rbac-v2/design.md: RBAC V2 (now Queue's responsibility)
- docs/architecture/customer-audit-unified/design.md: audit chain (Queue owns the writer for customer audit dimensions)
- docs/architecture/auth.md: original auth design (superseded at the boundary by this doc)
- docs/architecture/session-engine.md: session engine (now Queue's implementation)
Refs: operator decision 2026-05-09 UTC, project_queue_identity_service.md


1. Context

Raxx today is a single-operator developer tool. Milestone #6 introduces the first non-operator customer account. That step requires a clear owner for identity, sessions, passkey credentials, RBAC, and customer records — data that is currently scattered across Raptor's SQLite (backend_v2/) and the Console's Postgres DB.

On 2026-05-09 UTC the operator chose Option C: a dedicated identity/customer/RBAC service called Queue. Queue is the single source of truth for the data listed above: identity, sessions, passkey credentials, RBAC, and customer records.

Queue does not own: trades, positions, orders (Raptor), rotation jobs (Velvet), sentiment scoring (Reasonator), or Console operator UI/console audit.


2. Invariants

The following are non-negotiable and override any design choice below.

| # | Invariant |
|---|---|
| I-1 | No stored credentials. Queue never stores passwords, plaintext recovery tokens, TOTP seeds, or any value that could replay a user secret. WebAuthn public keys and COSE keys are not credentials in this sense. Recovery codes are stored as one-way HMAC hashes. |
| I-2 | Passkeys / WebAuthn only. No password path, no SMS OTP, no email OTP as auth factor. Queue enforces this at the schema level: no column exists for a password hash. |
| I-3 | Email is the single contact channel, and only after verification. No phone, no SMS, no push. |
| I-4 | GDPR by default. Customer PII (email, display name, IP prefix, geoblock metadata) has explicit retention periods, DSR erasure paths, portability export, and DPA-ready audit logging. |
| I-5 | Audit trail for every state change that affects money, permissions, or data access. Every grant, revoke, session mint, session revoke, registration, erasure, and passkey change writes an audit row. |
| I-6 | Paper-first gating is enforced by Queue. Queue issues session tokens that carry the paper_first_gate claim. Live-trading paths check this claim; Queue never issues a live-enabled token without the gate being satisfied. |
| I-7 | Credentials into infra, not into code. All Queue secrets (DB URL, signing keys, KMS key ARN, service-to-service tokens) live in SSM (AWS workloads) or Infisical (vendor tokens), never in repo files. |
| I-8 | Quebec geo-block at signup. Queue enforces the geo-block at the POST /api/v1/customers (registration) endpoint, rejecting jurisdiction=QC with a friendly 422. The block is a configurable env flag so it can be lifted when fr-CA launches. |
| I-9 | Fail-closed on Queue outage. If Queue is unreachable, Raptor must reject all authenticated requests (return 503) rather than fail-open. There is no credential cache in Raptor that grants access when Queue is down. |
| I-10 | All timestamps UTC. |
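
As an illustration of I-8, the signup geo-block can be sketched as a pure function gated by an environment flag. This is a minimal sketch, not the shipped check: the flag name QUEUE_GEO_BLOCK_QC, the response body shape, and the decision to treat both 'QC' and 'CA-QC' as Quebec are assumptions made for the example.

```python
import os
from typing import Mapping, Optional, Tuple

# Hypothetical flag name; the doc only says the block is "a configurable env flag".
GEO_BLOCK_FLAG = "QUEUE_GEO_BLOCK_QC"
# Assumption: accept both spellings that appear in this doc for Quebec.
QUEBEC_CODES = {"QC", "CA-QC"}


def geo_block_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """The block is on unless the flag is explicitly set to '0'."""
    env = os.environ if env is None else env
    return env.get(GEO_BLOCK_FLAG, "1") != "0"


def check_signup_jurisdiction(
    jurisdiction: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[int, dict]:
    """Return (http_status, body) for the jurisdiction part of signup."""
    code = (jurisdiction or "").strip().upper()
    if geo_block_enabled(env) and code in QUEBEC_CODES:
        return 422, {
            "error": "jurisdiction_not_supported",
            "geo_block_reason": "QC_PRE_FR_LAUNCH",
            "message": "Raxx is not yet available in Quebec.",
        }
    return 200, {"ok": True}
```

Defaulting the flag to "on" keeps the block in force if the variable is missing, so lifting it at fr-CA launch is an explicit configuration change.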

3. Greenfield vs Strangler-Fig

Recommendation: Strangler-Fig for v1, with an explicit extraction roadmap

Decision: Queue ships in v1 as a Flask blueprint bundle (queue/) deployed into the same Heroku app as Raptor (backend_v2). It is backed by Raptor's existing Postgres DB, with a queue_ namespace prefix on new tables. Existing Raptor auth tables (customer_sessions, webauthn_credentials, customers, etc.) are migrated in place, with columns renamed to match Queue's schema contract. Queue exposes the /api/v1/ identity endpoints. Once the Queue endpoints are live, Raptor's own auth blueprints are deprecated (feature-flagged to return 404).

Why not Greenfield:
- A new Heroku app + new DB requires 5-7 dev-days for infrastructure alone, consuming the entire remaining v1 timeline.
- Data migration from Raptor's SQLite/Postgres to a new DB across two apps in a 14-day window is high-risk.
- PRs #1502, #1503, #1505, #1506, #1507, #1508 are all open and mostly done. Their work is not lost; it becomes Queue's internal implementation.

Why Strangler-Fig works:
- Queue is architecturally real: it has a defined API surface, service-to-service auth, and owns its schema namespace.
- Data extraction to a standalone Heroku app is Phase 4 (post-v1), with a clean migration plan.
- The contract layer is what matters for iOS, Antlers, and SAML, not the physical hosting.

See ADR-0065 for the full decision record.


4. Queue Codebase Location

queue/                       ← sibling to backend_v2/, console/
  app.py                     ← Flask application factory
  api/
    routes/
      auth.py                ← WebAuthn register/login, sessions
      customers.py           ← customer records + GDPR DSR
      rbac.py                ← grants, permission checks
      audit.py               ← customer audit event writer
    services/
      webauthn_service.py
      session_service.py
      rbac_service.py
      audit_writer_service.py
      email_service.py
    middleware/
      service_auth.py        ← validates inbound service tokens
      rate_limiter.py
  db/
    migrations/              ← queue_* table migrations
  tests/

In v1, Queue runs as a Flask app on the same Heroku dyno as Raptor. Two layouts are possible: a separate process in the same process group listening on a different port, or a Blueprint mounted at /api/v1/ in Raptor's app factory. See ADR-0066 for the choice.


5. Data Model

All Queue-owned tables are prefixed queue_ to make namespace ownership unambiguous during the co-location phase. Post-extraction they are renamed to their canonical names.

5.1 Core tables

-- Customers (replaces/renames existing 'customers' table)
CREATE TABLE queue_customers (
    id                  UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    email               TEXT        UNIQUE NOT NULL,
    email_verified_at   TIMESTAMPTZ NULL,
    display_name        TEXT        NULL,
    created_at          TIMESTAMPTZ NOT NULL DEFAULT now(),
    deleted_at          TIMESTAMPTZ NULL,       -- soft-delete for DSR audit
    jurisdiction        TEXT        NULL,        -- 'US' | 'CA' | 'CA-QC' (blocked at signup)
    geo_block_reason    TEXT        NULL,        -- e.g. 'QC_PRE_FR_LAUNCH'
    paper_first_cycles  INTEGER     NOT NULL DEFAULT 0,
    paper_first_gate_met BOOLEAN    NOT NULL DEFAULT false,
    schema_version      INTEGER     NOT NULL DEFAULT 1
);

-- WebAuthn credentials (replaces existing webauthn_credentials)
CREATE TABLE queue_webauthn_credentials (
    id              TEXT        PRIMARY KEY,    -- credential_id, base64url
    customer_id     UUID        NOT NULL REFERENCES queue_customers(id) ON DELETE CASCADE,
    public_key      BYTEA       NOT NULL,       -- COSE key; not a secret
    sign_count      BIGINT      NOT NULL DEFAULT 0, -- WebAuthn signCount is unsigned 32-bit; exceeds INTEGER range
    transports      TEXT        NULL,           -- csv: 'usb,nfc,internal'
    aaguid          TEXT        NULL,
    device_label    TEXT        NULL,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    last_used_at    TIMESTAMPTZ NULL,
    backup_eligible BOOLEAN     NOT NULL DEFAULT false,
    backup_state    BOOLEAN     NOT NULL DEFAULT false
);

-- Sessions (replaces/extends existing customer_sessions)
CREATE TABLE queue_sessions (
    id                  TEXT        PRIMARY KEY,   -- 256-bit random, stored hashed
    customer_id         UUID        NOT NULL REFERENCES queue_customers(id) ON DELETE CASCADE,
    credential_id       TEXT        REFERENCES queue_webauthn_credentials(id),
    token_hash          TEXT        NOT NULL,      -- SHA-256 of bearer; the raw token is never stored
    issued_at           TIMESTAMPTZ NOT NULL DEFAULT now(),
    idle_timeout_secs   INTEGER     NOT NULL DEFAULT 1800,   -- 30 min idle
    absolute_expires_at TIMESTAMPTZ NOT NULL,                -- 12h hard ceiling
    revoked_at          TIMESTAMPTZ NULL,
    fresh_until         TIMESTAMPTZ NOT NULL,                -- step-up expiry
    last_seen_at        TIMESTAMPTZ NULL,
    ip_prefix           TEXT        NULL,          -- /24 IPv4 or /48 IPv6 (minimized PII)
    user_agent          TEXT        NULL
);

-- Email verifications
CREATE TABLE queue_email_verifications (
    id          UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id UUID        NOT NULL REFERENCES queue_customers(id) ON DELETE CASCADE,
    email       TEXT        NOT NULL,
    token_hash  TEXT        NOT NULL,    -- SHA-256 of single-use link; raw token never stored
    expires_at  TIMESTAMPTZ NOT NULL,   -- 15 min
    consumed_at TIMESTAMPTZ NULL,
    purpose     TEXT        NOT NULL CHECK (purpose IN ('initial','recovery','rectification')),
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Backup / recovery codes
CREATE TABLE queue_backup_codes (
    id          UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id UUID        NOT NULL REFERENCES queue_customers(id) ON DELETE CASCADE,
    code_hmac   TEXT        NOT NULL,   -- HMAC-SHA-256 of raw code; raw code never stored
    batch_id    UUID        NOT NULL,   -- all codes in one generate() call share a batch_id
    used_at     TIMESTAMPTZ NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_queue_backup_codes_customer ON queue_backup_codes(customer_id);

-- WebAuthn registration challenges (short-lived, TTL-enforced)
CREATE TABLE queue_webauthn_challenges (
    challenge_hash  TEXT        PRIMARY KEY,   -- SHA-256 of raw challenge
    customer_id     UUID        NULL,          -- null during registration (pre-user)
    purpose         TEXT        NOT NULL CHECK (purpose IN ('register','login','add_device')),
    expires_at      TIMESTAMPTZ NOT NULL,      -- 60s
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);
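
The token_hash and code_hmac columns above store only derived values. A minimal sketch of how those values could be produced with the Python standard library follows; the function names are hypothetical, and the HMAC key is passed in as a parameter on the assumption that it is loaded from SSM per I-7.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def new_bearer_token() -> Tuple[str, str]:
    """Return (raw_token, token_hash). Only token_hash is persisted."""
    raw = secrets.token_urlsafe(32)  # 32 random bytes = 256 bits
    token_hash = hashlib.sha256(raw.encode("ascii")).hexdigest()
    return raw, token_hash


def hash_presented_token(raw: str) -> str:
    """Hash a token presented by a client, for lookup against token_hash."""
    return hashlib.sha256(raw.encode("ascii")).hexdigest()


def backup_code_hmac(raw_code: str, key: bytes) -> str:
    """HMAC-SHA-256 of a recovery code; the raw code is never stored."""
    return hmac.new(key, raw_code.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_backup_code(candidate: str, stored_hmac: str, key: bytes) -> bool:
    """Constant-time comparison against the stored code_hmac value."""
    return hmac.compare_digest(backup_code_hmac(candidate, key), stored_hmac)
```

A plain SHA-256 is reasonable for the bearer and email tokens because they are high-entropy random values; the shorter, human-typed recovery codes get a keyed HMAC so that a leaked table alone is not enough to brute-force them.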

5.2 RBAC tables

Queue owns all RBAC tables. The existing Console Postgres tables (rbac_groups, rbac_roles, etc. from migration 0021) are the current source of truth for operator-side RBAC. In Phase 3 these tables are extracted to Queue's DB alongside customer RBAC. In Phase 1, Queue reads/writes them directly (co-located DB).

Queue introduces customer-facing RBAC (queue_customer_roles) that is separate from operator RBAC:

-- Customer product-tier roles (antlers-user, antlers-founders, antlers-pro, antlers-support-readonly)
CREATE TABLE queue_customer_roles (
    id          UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id UUID        NOT NULL REFERENCES queue_customers(id) ON DELETE CASCADE,
    role        TEXT        NOT NULL,    -- one of the values in chk_queue_customer_role below
    granted_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    granted_by  TEXT        NOT NULL,   -- 'system' | 'operator:<admin_id>'
    revoked_at  TIMESTAMPTZ NULL,
    CONSTRAINT chk_queue_customer_role CHECK (
        role IN ('antlers-user','antlers-founders','antlers-pro','antlers-support-readonly')
    )
);

Operator RBAC (groups, roles, permissions, grants, ticket-scoped) remains in the existing rbac_* tables until Phase 3 extraction.


6. Session Token Strategy

Decision: Signed JWT issued by Queue; Raptor verifies offline. (See ADR-0067.)

This is the critical performance decision. Two options:

Chosen: Signed JWT (RS256, 15-min TTL). Rationale:


7. Service-to-Service Auth

Raptor, Console, Velvet, and Reasonator call Queue using per-service HMAC-signed tokens:
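
The wire format of these tokens is not specified in this section. As a minimal sketch only, assuming a hypothetical `service.timestamp.signature` layout, a per-service secret loaded from SSM, and a 60-second clock-skew allowance, signing and verification could look like this:

```python
import hashlib
import hmac
import time
from typing import Mapping, Optional

MAX_SKEW_SECS = 60  # hypothetical replay/clock-skew allowance


def _signature(service: str, ts: int, secret: bytes) -> str:
    message = f"{service}.{ts}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_service_token(service: str, secret: bytes, now: Optional[int] = None) -> str:
    """Build a token of the form '<service>.<unix_ts>.<hex hmac>'."""
    ts = int(time.time()) if now is None else now
    return f"{service}.{ts}.{_signature(service, ts, secret)}"


def verify_service_token(
    token: str,
    secrets_by_service: Mapping[str, bytes],
    now: Optional[int] = None,
) -> Optional[str]:
    """Return the calling service name if the token is valid, else None."""
    try:
        service, ts_str, sig = token.split(".")
        ts = int(ts_str)
    except ValueError:
        return None  # wrong number of parts or non-numeric timestamp
    secret = secrets_by_service.get(service)
    if secret is None:
        return None  # unknown caller
    if not hmac.compare_digest(_signature(service, ts, secret), sig):
        return None
    current = int(time.time()) if now is None else now
    if abs(current - ts) > MAX_SKEW_SECS:
        return None  # stale or future-dated token
    return service
```

Keeping one secret per calling service means a single caller can be revoked by rotating its SSM parameter without affecting the others.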


8. Failure Modes

| Failure | Behavior |
|---|---|
| Queue service down (co-located, same dyno) | Process crash takes both Raptor and Queue down. Returns 503 to all clients. Session JWTs already in flight continue to be valid for up to 15 minutes (offline verification). |
| Queue DB unreachable | Queue returns 503 to all new auth requests. Existing valid JWTs continue to work until TTL expires. No fail-open. |
| KMS unreachable (audit HMAC) | Audit writes queue a retry (in-memory or Postgres job table). Auth and session operations continue normally. KMS failure does not block login. |
| JWT signing key rotation mid-flight | Dual-accept window of 5 minutes. Raptor accepts tokens signed by either the current or previous key during the window. |

When Queue is down, no new sessions can be issued. This is intentional: the alternative (fail-open) risks unauthenticated access. The outage surface is mitigated by co-location, since Queue shares a dyno with Raptor and therefore has the same availability Raptor has today.
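
The dual-accept window for signing key rotation can be sketched as a small function that decides which key IDs a verifier should currently trust. This is an illustrative sketch; the function name and the use of key IDs (kid) to distinguish the current and previous keys are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Set

DUAL_ACCEPT_WINDOW = timedelta(minutes=5)


def accepted_key_ids(
    current_kid: str,
    previous_kid: Optional[str],
    rotated_at: datetime,
    now: datetime,
) -> Set[str]:
    """Key IDs whose signatures should be accepted at time `now` (UTC)."""
    accepted = {current_kid}
    in_window = rotated_at <= now < rotated_at + DUAL_ACCEPT_WINDOW
    if previous_kid is not None and in_window:
        accepted.add(previous_kid)
    return accepted


rotated = datetime(2026, 5, 9, 12, 0, tzinfo=timezone.utc)
during = accepted_key_ids("k2", "k1", rotated, rotated + timedelta(minutes=4))
after = accepted_key_ids("k2", "k1", rotated, rotated + timedelta(minutes=5))
```

With a 15-minute token TTL and a 5-minute window, a token signed by the previous key shortly before rotation would stop verifying once the window closes, even though its own TTL has not elapsed; the holder would then need a fresh token.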


9. Customer Audit Events — Queue vs Raptor

Decision: Queue owns the audit writer for all three dimensions of customer audit events.

Rationale: audit events are fundamentally about customer identity and access. The audit writer in PR #1506 lives in Raptor today only because Queue did not exist. Once Queue is live, POST /api/internal/v1/audit/event in Queue replaces the Raptor audit endpoint. Raptor becomes a caller of Queue's audit writer, not the owner.

The customer_audit_events table (schema from PR #1502/migration 016) stays in Raptor's Postgres DB during Phase 1-2 (co-location). It moves to Queue's DB in Phase 3.


10. Console Scope After Queue

Console becomes a thin operator UI layer. Queue's admin surface (/api/v1/admin/*) exposes customer management, RBAC admin, and audit query. Console calls Queue's admin endpoints rather than reading the DB directly. Console retains:

Console loses: direct customer-table reads, direct RBAC table writes. These move to Queue's admin API.


11. Registration + Signup Sequence

sequenceDiagram
    participant U as User (Antlers)
    participant Q as Queue
    participant R as Raptor
    participant M as Postmark

    U->>Q: POST /api/v1/auth/webauthn/register/begin {email, jurisdiction}
    Q->>Q: Check jurisdiction (reject QC with 422)
    Q->>Q: Generate WebAuthn challenge, store queue_webauthn_challenges
    Q-->>U: PublicKeyCredentialCreationOptions

    U->>U: Browser prompts Face ID / YubiKey
    U->>Q: POST /api/v1/auth/webauthn/register/complete {attestation}
    Q->>Q: py-webauthn verify attestation
    Q->>Q: INSERT queue_customers + queue_webauthn_credentials
    Q->>Q: INSERT queue_customer_roles (antlers-user)
    Q->>Q: Emit customer_audit_events (customer.registered)
    Q->>M: Send verification email (single-use token)
    Q-->>U: {customer_id, needs_email_verification: true}

    U->>Q: POST /api/v1/auth/email/verify {code}
    Q->>Q: Stamp queue_customers.email_verified_at
    Q->>Q: Emit customer_audit_events (email.verified)
    Q-->>U: {verified: true}

12. Login + Session Mint Sequence

sequenceDiagram
    participant U as User (Antlers)
    participant Q as Queue
    participant R as Raptor

    U->>Q: POST /api/v1/auth/webauthn/login/begin
    Q-->>U: PublicKeyCredentialRequestOptions (allowCredentials:[])

    U->>U: Browser shows passkey picker
    U->>Q: POST /api/v1/auth/webauthn/login/complete {assertion}
    Q->>Q: Verify assertion, update sign_count
    Q->>Q: Mint queue_sessions row + sign JWT (RS256, 15-min TTL)
    Q->>Q: Emit customer_audit_events (session.issued)
    Q-->>U: Set-Cookie (HttpOnly) + {jwt, customer_id, roles}

    U->>R: GET /api/v1/portfolio (Authorization: Bearer <jwt>)
    R->>R: Verify JWT offline (RS256 public key)
    R-->>U: Portfolio data
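
The payload of the JWT minted in this sequence can be sketched as follows. Only the claim logic is shown; RS256 signing and signature verification are deliberately omitted and would be handled by a JWT library. The paper_first_gate claim and the 15-minute TTL come from this doc; sub, iat, and exp are the standard registered JWT claims; the sid claim and both function names are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable

JWT_TTL = timedelta(minutes=15)


def build_session_claims(
    customer_id: str,
    session_id: str,
    roles: Iterable[str],
    paper_first_gate_met: bool,
    now: datetime,
) -> dict:
    """Claims Queue would sign at session mint (signing not shown)."""
    issued = int(now.timestamp())
    return {
        "sub": customer_id,
        "sid": session_id,  # hypothetical claim linking back to queue_sessions
        "roles": sorted(roles),
        "paper_first_gate": bool(paper_first_gate_met),
        "iat": issued,
        "exp": issued + int(JWT_TTL.total_seconds()),
    }


def may_trade_live(claims: dict, now: datetime) -> bool:
    """Check a live-trading path might run AFTER the signature is verified."""
    if int(now.timestamp()) >= claims["exp"]:
        return False  # expired token: reject regardless of other claims
    return claims.get("paper_first_gate") is True
```

Because the gate travels inside the signed token, Raptor can enforce I-6 without a round trip to Queue on each request.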

13. Rollout Plan

| Phase | Gate | Description |
|---|---|---|
| Dark | FLAG_QUEUE_V1=off | Queue code deployed; all endpoints return 404. Migrations run. |
| Internal | FLAG_QUEUE_V1=staging | Queue endpoints live on staging. Raptor auth blueprints remain active. Dual-mode middleware logs both paths. |
| Beta | FLAG_QUEUE_V1=beta | Queue endpoints live on prod. Raptor auth blueprints still active (fallback). |
| Cutover | FLAG_QUEUE_V1=on, FLAG_RAPTOR_AUTH_LEGACY=off | Raptor auth blueprints return 404. Queue is the sole auth surface. |
| Cleanup | (post-v1) | Remove legacy Raptor auth blueprints, migrate tables to Queue's own DB. |

Each FLAG_QUEUE_* flag must have a console_flag_promotions row before being promoted to production.
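
The phase table can be read as a mapping from flag values to which auth surfaces respond. The sketch below is one way to express that mapping; the function names and the `environment` parameter are hypothetical, and it assumes that beta and on make Queue live in every environment while unknown flag values fail closed.

```python
def queue_endpoints_live(flag_queue_v1: str, environment: str) -> bool:
    """True if Queue endpoints should respond; otherwise they return 404."""
    flag = flag_queue_v1.strip().lower()
    if flag == "staging":
        return environment == "staging"
    if flag in ("beta", "on"):
        return True
    return False  # "off" and any unrecognized value keep Queue dark


def legacy_raptor_auth_live(flag_raptor_auth_legacy: str) -> bool:
    """Raptor's legacy auth blueprints stay active until explicitly turned off."""
    return flag_raptor_auth_legacy.strip().lower() != "off"


def auth_surfaces(flag_queue_v1: str, flag_raptor_auth_legacy: str, environment: str) -> dict:
    """Summarize which auth surfaces are serving for a given flag state."""
    return {
        "queue": queue_endpoints_live(flag_queue_v1, environment),
        "raptor_legacy": legacy_raptor_auth_live(flag_raptor_auth_legacy),
    }
```

For example, `auth_surfaces("on", "off", "prod")` describes the Cutover row: Queue serves and the legacy blueprints do not.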


14. Security Considerations

PII collected:
- email (registration, verification): verified email only
- display_name (optional, customer-supplied)
- ip_prefix (/24 or /48; minimized): on sessions only
- jurisdiction: country/province code for geo-block
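
The ip_prefix minimization can be done with the standard ipaddress module. A minimal sketch, with a hypothetical function name, that truncates a client address to /24 for IPv4 or /48 for IPv6 before it is stored:

```python
import ipaddress


def minimize_ip(addr: str) -> str:
    """Truncate an IP address to its /24 (IPv4) or /48 (IPv6) network."""
    ip = ipaddress.ip_address(addr)
    prefix_len = 24 if ip.version == 4 else 48
    # strict=False lets ip_network zero the host bits instead of raising.
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network)


print(minimize_ip("203.0.113.77"))         # 203.0.113.0/24
print(minimize_ip("2001:db8:abcd:12::1"))  # 2001:db8:abcd::/48
```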

Retention:
- Customer records: active until DSR erasure request + 30-day cooling period
- Sessions: purged at absolute_expires_at; audit shadow retained 2 years
- Audit events: 2 years (GDPR Art. 30 obligation)
- Backup codes: purged with customer record on DSR erasure
- WebAuthn challenges: 60s TTL enforced at application layer + nightly cleanup job

DSR erasure path:
- Soft-delete queue_customers.deleted_at
- Purge email and display_name after 30-day cooling period
- Pseudonymize customer_id in audit rows (replace with dsr_pseudonym_<hash>)
- Revoke all active sessions
- Invalidate all passkeys
- Emit customer.erased audit event (retained with pseudonym for 2 years)
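
The hash used for dsr_pseudonym_<hash> is not specified here. As a sketch under that caveat, a keyed HMAC-SHA-256 over the customer UUID is one option: it is deterministic, so all audit rows for one erased customer still correlate, but it cannot be recomputed from a known UUID without the key. The function name and the key source are assumptions.

```python
import hashlib
import hmac
import uuid


def dsr_pseudonym(customer_id: uuid.UUID, key: bytes) -> str:
    """Stable pseudonym to write into audit rows in place of customer_id."""
    digest = hmac.new(key, customer_id.bytes, hashlib.sha256).hexdigest()
    return f"dsr_pseudonym_{digest}"


# Hypothetical usage: the key would come from SSM, never from code (I-7).
example_id = uuid.UUID("00000000-0000-4000-8000-000000000001")
pseudonym = dsr_pseudonym(example_id, key=b"example-only-key")
```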

Audit trail: Every Queue endpoint that mutates customer, session, credential, or RBAC state emits to customer_audit_events. Audit writes use HMAC-SHA-256 + AWS KMS (alias/raxx-audit-hmac, ARN in SSM at /raxx/audit/hmac-key-arn) per ADR-0058 and the KMS budget approved 2026-05-09 UTC.

Breach response:
- Any breach.* action in customer_audit_events triggers the GitHub Actions breach-notification pipeline (per auth.md §8).
- 72-hour GDPR Art. 33 clock starts on first breach audit write.
- ops@raxx.app paged within 15 minutes.
- Per-service tokens revocable without redeploy by rotating SSM params and restarting dynos.

Kill-switch:
- FLAG_QUEUE_V1=off disables all Queue endpoints.
- QUEUE_REVOKE_ALL=1 revokes all customer sessions (writes audit row per session).
- AUTH_DISABLED=1 (existing Raptor flag) returns 503 on all auth attempts.

No stored credentials: enforced at schema level — no columns named password, secret, otp_seed, recovery_token exist in Queue tables. CI grep (scripts/ci/check_no_credential_fields.sh) covers queue/ directory.
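
The contents of scripts/ci/check_no_credential_fields.sh are not shown in this doc. Purely as an illustration of the kind of check it describes, the Python sketch below scans migration SQL for column names containing a forbidden word. The regex-based column detection is a simplification that assumes the layout used in section 5 (lowercase column name followed by an uppercase type), and substring matching is stricter than an exact-name match.

```python
import re
from typing import List

FORBIDDEN_WORDS = ("password", "secret", "otp_seed", "recovery_token")

# Matches lines like "    email   TEXT ..." : lowercase identifier, then an
# uppercase type keyword. Lines starting with CREATE/CONSTRAINT do not match.
COLUMN_RE = re.compile(r"^\s*([a-z_][a-z0-9_]*)\s+[A-Z]", re.MULTILINE)


def forbidden_columns(sql: str) -> List[str]:
    """Return sorted column names that contain any forbidden word."""
    names = set(COLUMN_RE.findall(sql))
    return sorted(n for n in names if any(w in n for w in FORBIDDEN_WORDS))


clean_sql = """
CREATE TABLE queue_backup_codes (
    id          UUID PRIMARY KEY,
    code_hmac   TEXT NOT NULL
);
"""
bad_sql = """
CREATE TABLE queue_customers (
    id            UUID PRIMARY KEY,
    password_hash TEXT NOT NULL
);
"""
```

A CI wrapper would exit non-zero whenever `forbidden_columns` returns a non-empty list for any file under queue/db/migrations/.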


15. Open Questions

These require operator decision before the corresponding sub-cards can be claimed.

OQ-1: Audit events physical location (post-extraction)
Queue logically owns audit events. Currently customer_audit_events lives in Raptor's Postgres (migration 016, PR #1502). Should extraction to Queue's own DB (Phase 3) also move the audit table, or keep audit in Raptor's DB (closer to the writers) with Queue as the API owner only?
Recommendation: move to Queue's DB in Phase 3 for clean ownership. If kept in Raptor's DB, the audit API is still Queue's but Queue queries cross-DB.

OQ-2: Console scope shrinkage
When Queue owns customers, Console's customer-admin endpoints become proxies to Queue's admin API. Is the operator comfortable with Console having no direct DB reads for customer data? This is the correct long-term posture but requires that Queue's admin API expose sufficient query depth for Console's operator workflows.

OQ-3: iOS (#167) timing
Does iOS v1 launch with Queue API or the older direct-Raptor pattern? If iOS launches simultaneously with Queue, all iOS auth work must target Queue's endpoints. If iOS launches post-v1, it can ignore this design for now. Decision needed before filing iOS sub-cards.

OQ-4: Session token revocation window
The JWT approach means a revoked session remains technically valid for up to 15 minutes (the JWT TTL). Is this acceptable, or does the operator require instant revocation? Instant revocation requires Raptor either to call Queue on every request (adds latency) or to consult a short-lived token blocklist (adds a Redis dependency).
Recommendation: accept the 15-minute window for v1; add a blocklist if a security incident demands it.
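
To make the blocklist option in OQ-4 concrete, here is a minimal in-memory sketch. It is a stand-in for the Redis-backed version the question mentions, not a proposal to ship this class; the class and method names are hypothetical. The key observation is that an entry only needs to live as long as the JWT TTL, because after that the revoked token expires on its own.

```python
from typing import Dict

JWT_TTL_SECS = 15 * 60


class RevocationBlocklist:
    """Tracks revoked session IDs until their JWTs would have expired anyway."""

    def __init__(self, ttl_secs: int = JWT_TTL_SECS) -> None:
        self._ttl_secs = ttl_secs
        self._expires_at: Dict[str, float] = {}

    def revoke(self, session_id: str, now: float) -> None:
        """Block the session for one full token lifetime from `now`."""
        self._expires_at[session_id] = now + self._ttl_secs

    def is_revoked(self, session_id: str, now: float) -> bool:
        expires_at = self._expires_at.get(session_id)
        if expires_at is None:
            return False
        if now >= expires_at:
            # Any token for this session has expired by now; drop the entry.
            del self._expires_at[session_id]
            return False
        return True


blocklist = RevocationBlocklist()
blocklist.revoke("sess-1", now=1_000.0)
```

With Redis, the same behavior falls out of setting each key with a 900-second expiry, which keeps the blocklist bounded by the number of revocations in the last 15 minutes.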