Queue Phase 1 — C++ Foundation + Billing

Status: Design v1
Owner: software-architect
Date: 2026-05-11 UTC
Governing ADR: ADR-0076 (C++ language selection, timeline assessment, stack picks)
Milestone: raxx.app v1 (first paid customer)
Refs:
  - docs/architecture/queue/design.md: full Queue identity service (Phase 2+)
  - docs/architecture/stripe-customer-billing.md v3: billing schema (unchanged)
  - ADR-0071: Queue as billing authority
  - ADR-0076: C++ selection + timeline


1. Context

This document governs Queue Phase 1: the minimum C++ service to host billing safely. It is not the full Queue identity service. That is Phase 2 (queue/design.md).

Why Phase 1 is billing-only: The operator's 2026-05-11 UTC decisions established two things simultaneously — Queue ships in C++, and billing ships in Queue for v1. Queue's full identity service (WebAuthn, sessions, JWT, RBAC, audit consolidation) is architecturally correct but out of scope for Phase 1. Phase 1 establishes:

  1. The C++ build infrastructure (Dockerfile, CMake, vcpkg, Heroku container stack)
  2. The Postgres schema for billing (6 migrations covering 7 tables and 1 view, unchanged from stripe-customer-billing.md v3)
  3. The minimum HTTP surface to process Stripe webhooks, fan out to Raptor's mirror, and serve Console reads

Raptor's Python auth layer continues handling customer authentication during Phase 1. Phase 2 brings WebAuthn and session management into Queue.


2. Invariants

All TradeMasterAPI invariants apply. The following are specifically material to this service:

| # | Invariant |
| --- | --- |
| I-1 | No stored credentials. STRIPE_RESTRICTED_KEY and STRIPE_WEBHOOK_SECRET are read from Infisical at process startup, held in memory, never written to the database or any log. |
| I-2 | All timestamps UTC. Every timestamp column is TIMESTAMPTZ. Every log line includes an ISO 8601 UTC timestamp. |
| I-3 | Audit trail for every money-state change. All mutations to billing_customer, billing_subscription, billing_invoice emit a row to billing_action_log with KMS HMAC chain integrity. |
| I-4 | GDPR by default. Billing PII has a 7-year retention floor, DSR anonymization path, and breach-notification coverage. |
| I-5 | Fail-closed. FLAG_QUEUE_BILLING=false returns 503 on all billing routes. FLAG_BILLING_AUDIT_WRITES=false halts new billing_action_log writes on chain break. |
| I-6 | Memory safety discipline. No raw new/delete. All resources RAII-managed. No raw char* for PII. AddressSanitizer and UBSan enabled in CI debug build. |
| I-7 | No inline secrets. All secrets from Infisical at startup. scripts/ci/check_no_credential_fields.sh grep covers queue/. |
| I-8 | Stripe is authoritative for subscription state. Queue's DB reflects Stripe state via webhook upsert + nightly reconciler. Queue never overrides Stripe except via explicit operator action logged to billing_action_log. |
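
The fail-closed behaviour in I-5 can be illustrated with a small sketch. It assumes that only the literal value "true" enables billing and that a missing or malformed flag is treated as off; the function names `flag_enabled` and `billing_gate_status` are hypothetical and not taken from the Queue codebase.

```cpp
#include <string_view>

// Fail-closed flag parsing (assumption: only the exact string "true" enables
// the feature; nullptr, "", "1", "TRUE" and anything else count as disabled).
inline bool flag_enabled(const char* raw_value) {
    return raw_value != nullptr && std::string_view(raw_value) == "true";
}

// Hypothetical helper: the HTTP status the flag gate would force for a billing
// route, or 0 when the request may continue to the handler.
inline int billing_gate_status(const char* flag_queue_billing) {
    return flag_enabled(flag_queue_billing) ? 0 : 503;
}
```

A caller would pass `std::getenv("FLAG_QUEUE_BILLING")` (from `<cstdlib>`), so an unset variable yields 503 rather than an open route.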

3. Repository Layout

queue/
  CMakeLists.txt              ← root build file; vcpkg toolchain included
  vcpkg.json                  ← dependency manifest: drogon, libpqxx, nlohmann-json,
                                 jwt-cpp, spdlog, sentry-native, curl
  Dockerfile                  ← multi-stage: build stage (gcc:13-bookworm),
                                 runtime stage (debian:bookworm-slim)
  Procfile                    ← not used in Heroku container mode (CMD in Dockerfile)
  .dockerignore

  src/
    main.cpp                  ← Drogon app startup; load env; set routes; app().run()
    config/
      app_config.hpp          ← typed config struct; populated from env at startup
      secrets.cpp / .hpp      ← Infisical fetch at startup; no secrets in global state
    controllers/
      health_controller.cpp   ← GET /health
      billing_customer_controller.cpp
      billing_subscription_controller.cpp
      billing_webhook_controller.cpp
      billing_internal_controller.cpp ← /api/internal/* (mirror-sync, console reads)
    services/
      stripe_client.cpp / .hpp    ← libcurl wrapper for Stripe REST API
      webhook_processor.cpp / .hpp ← HMAC verify, dedup, upsert, mirror fan-out
      billing_action_log.cpp / .hpp ← KMS HMAC chain writer
    middleware/
      internal_auth_filter.cpp ← Bearer token validation for /api/internal/*
      flag_gate_filter.cpp     ← Returns 503 when FLAG_QUEUE_BILLING=false
    db/
      db_client.hpp            ← Thin wrapper around Drogon's DbClient; RAII transactions
    util/
      hmac_util.cpp / .hpp     ← OpenSSL EVP HMAC-SHA-256 for webhook signature verify

  include/
    queue/                    ← public headers (types, error codes, response shapes)

  tests/
    unit/
      test_hmac_util.cpp
      test_webhook_processor.cpp
      test_stripe_client_mock.cpp
    integration/
      test_billing_webhook_integration.cpp
      docker-compose.test.yml  ← postgres:16 + queue binary against real DB

  migrations/
    sqitch/
      sqitch.conf
      sqitch.plan
      deploy/
        01-billing-schema.sql
        02-billing-subscription-mirror.sql
        03-billing-action-log.sql
        04-billing-processed-events.sql
        05-billing-reconcile-log.sql
        06-billing-reliability-view.sql
      revert/
        (reverse order of deploy/)
      verify/
        (assert tables exist + check indexes)

  ops/
    backfill_billing_from_raptor.py   ← post-launch data migration script (Python OK for one-shot ops)
    reconciler_check.sh               ← calls /api/internal/billing/reconcile; run by GH Actions cron

  sops/
    rotation/
      stripe-restricted-key.md
      queue-service-tokens.md

4. Data Model

Schema is unchanged from stripe-customer-billing.md v3. Restated here for completeness with C++-specific notes.

Migration chain (sqitch)

| Migration | Tables | Notes |
| --- | --- | --- |
| 01-billing-schema | billing_customer, billing_subscription, billing_invoice | Core billing tables |
| 02-billing-subscription-mirror | billing_subscription_mirror | PII-free; Raptor reads this |
| 03-billing-action-log | billing_action_log | Money-state audit; KMS HMAC chain |
| 04-billing-processed-events | processed_stripe_events | Idempotency dedup table |
| 05-billing-reconcile-log | billing_reconcile_log | Nightly reconciler drift log |
| 06-billing-reliability-view | v_customer_payment_reliability | Derived view; see stripe-customer-billing.md §4.6 |

All migrations are additive in Phase 1. sqitch revert drops them in reverse order. No data loss on revert (pre-cutover; no real billing data yet).

C++ model types

Each billing table maps to a plain C++ struct:

// src/models/billing_customer.hpp
#include <optional>
#include <string>

struct BillingCustomer {
    std::string id;                // UUID string
    std::string queue_customer_id;
    std::string stripe_customer_id;
    std::string billing_email;     // PII; never logged
    std::optional<std::string> billing_name;  // PII; optional
    std::optional<std::string> address_line1; // PII
    std::optional<std::string> address_line2;
    std::optional<std::string> address_city;
    std::optional<std::string> address_state;
    std::optional<std::string> address_postal_code;
    std::optional<std::string> address_country; // retained post-erasure
    std::optional<std::string> default_pm_last4;
    std::optional<std::string> default_pm_brand;
    std::string customer_segment; // enum: 'founders'|'organic'|...
    std::optional<std::string> acquisition_source;
    std::string stripe_created_at; // ISO 8601 UTC
    std::string created_at;
    std::string updated_at;
};

PII fields are annotated in comments. The logging layer has a log_safe() variant that omits all PII-annotated fields.
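
A minimal sketch of what a `log_safe()` variant might look like follows. It uses an abridged copy of the struct so the example stands alone, and the choice of which fields count as loggable (id, stripe_customer_id, customer_segment, updated_at) is an assumption for illustration, not the actual logging-layer contract.

```cpp
#include <string>

// Abridged copy of BillingCustomer, reduced to the fields this example needs.
struct BillingCustomer {
    std::string id;
    std::string stripe_customer_id;
    std::string billing_email;     // PII; never logged
    std::string customer_segment;
    std::string updated_at;
};

// Hypothetical log_safe(): formats only fields assumed to be non-PII.
// billing_email is deliberately never read here.
inline std::string log_safe(const BillingCustomer& c) {
    return "billing_customer id=" + c.id +
           " stripe_customer_id=" + c.stripe_customer_id +
           " customer_segment=" + c.customer_segment +
           " updated_at=" + c.updated_at;
}
```

Keeping the allow-list inside one function means a newly added PII field stays out of the logs unless someone explicitly adds it to the formatter.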


5. API Surface (Phase 1)

All responses are JSON. All errors:

{ "error": { "code": "machine_readable", "message": "human readable" } }

Health

GET /health

No auth. Returns 200 within 1 second or Heroku restarts the dyno.

{ "status": "ok", "service": "queue", "version": "0.1.0" }

If DB connection fails: returns 503 {"error":{"code":"db_unavailable","message":"..."}}.

Billing — public surface (called by Raptor/Console via service token)

POST /api/v1/billing/customers

Auth: Bearer service token (QUEUE_SERVICE_TOKEN_RAPTOR or QUEUE_SERVICE_TOKEN_CONSOLE)

Body: Full BillingCustomer fields (JSON).

Response 201: {"id":"<uuid>"} — creates billing_customer row; emits billing_action_log entry.

GET /api/v1/billing/customers/:id

Auth: Bearer service token.

Response 200: BillingCustomer JSON (all fields, including PII — caller is Console). Returns 404 if not found.

GET /api/v1/billing/subscriptions/:queue_customer_id

Auth: Bearer service token.

Response 200:

{
  "subscription": { /* BillingSubscription */ },
  "plan_tier": "founders",
  "status": "active",
  "current_period_end": "2026-06-11T21:00:00Z"
}

Returns 404 if no active subscription. Returns most recent active/trialing row.

Stripe Webhook Receiver

POST /api/v1/billing/webhook

Auth: None (Stripe calls this). Stripe-Signature header verified via HMAC-SHA-256 (STRIPE_WEBHOOK_SECRET).

Body: Stripe event JSON.
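
The signature check can be sketched as follows, under stated assumptions. Stripe's header has the form `t=<timestamp>,v1=<hex signature>`, and the signed message is the timestamp, a dot, and the raw request body. The HMAC-SHA-256 computation itself is injected as a callback here (in Queue it would come from hmac_util, whose interface is not shown in this document), and the names `parse_stripe_signature`, `verify_stripe_signature`, and `HmacHexFn` are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

struct StripeSignature {
    std::string timestamp;        // value of the t= element
    std::vector<std::string> v1;  // every v1= signature (hex)
};

// Splits "t=...,v1=...,v1=..." into its parts; unknown keys are ignored.
inline std::optional<StripeSignature> parse_stripe_signature(std::string_view header) {
    StripeSignature sig;
    while (!header.empty()) {
        const auto comma = header.find(',');
        const std::string_view item = header.substr(0, comma);
        header = (comma == std::string_view::npos) ? std::string_view{} : header.substr(comma + 1);
        const auto eq = item.find('=');
        if (eq == std::string_view::npos) continue;
        const auto key = item.substr(0, eq);
        const auto value = item.substr(eq + 1);
        if (key == "t") sig.timestamp = std::string(value);
        else if (key == "v1") sig.v1.emplace_back(value);
    }
    if (sig.timestamp.empty() || sig.v1.empty()) return std::nullopt;
    return sig;
}

// Compares without exiting early on the first differing byte.
inline bool constant_time_equal(std::string_view a, std::string_view b) {
    if (a.size() != b.size()) return false;
    unsigned char diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        diff = static_cast<unsigned char>(diff | (a[i] ^ b[i]));
    return diff == 0;
}

// Assumed callback shape: returns lowercase hex HMAC-SHA-256 of message under key.
using HmacHexFn = std::function<std::string(std::string_view key, std::string_view message)>;

inline bool verify_stripe_signature(std::string_view header, std::string_view raw_body,
                                    std::string_view secret, const HmacHexFn& hmac_sha256_hex) {
    const auto sig = parse_stripe_signature(header);
    if (!sig) return false;
    const std::string signed_payload = sig->timestamp + "." + std::string(raw_body);
    const std::string expected = hmac_sha256_hex(secret, signed_payload);
    bool ok = false;
    for (const auto& candidate : sig->v1)
        ok = constant_time_equal(candidate, expected) || ok;
    return ok;
}
```

The sketch omits the timestamp tolerance check (rejecting events whose `t=` value is too old), which a production verifier would normally add to limit replay.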

Processing pipeline (steps 3-8 run inside a single DB transaction; the mirror fan-out in step 9 runs after COMMIT, as shown in the sequence in section 6):

  1. Verify Stripe-Signature header → 400 immediately on failure (security event; Sentry CRIT)
  2. Parse event JSON (nlohmann/json)
  3. Check processed_stripe_events for event.id → 200 immediately if already seen (idempotent)
  4. Route by event type: customer.*, customer.subscription.*, invoice.*
  5. Upsert billing row (LWW guard on updated_at)
  6. Detect tier downgrade; set feature_locked_at if new tier < previous tier
  7. Insert processed_stripe_events row
  8. Insert billing_action_log row (KMS HMAC chain)
  9. Fan out mirror sync to Raptor (POST /api/internal/billing/mirror-sync via libcurl)
  10. Return 200 to Stripe
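
The last-writer-wins guard and the downgrade check in the pipeline can be sketched as below. Two assumptions are made loudly: timestamps are compared as strings, which is only valid if every value uses one fixed-width ISO 8601 UTC format, and the tier ranking is supplied by the caller because this document does not define a tier order. The names `lww_should_apply` and `is_downgrade` are hypothetical.

```cpp
#include <map>
#include <string>
#include <string_view>

// LWW guard: apply the incoming row only if it is strictly newer than the stored one.
// Assumes both values share one fixed-width ISO 8601 UTC format (for example
// "2026-05-11T21:00:00Z"), so lexicographic order equals chronological order.
inline bool lww_should_apply(std::string_view stored_updated_at,
                             std::string_view incoming_updated_at) {
    return incoming_updated_at > stored_updated_at;
}

// Downgrade detection against a caller-supplied ranking (higher number = higher tier).
// Assumption: a tier missing from the ranking is not treated as a downgrade.
inline bool is_downgrade(const std::map<std::string, int>& rank,
                         const std::string& previous_tier,
                         const std::string& new_tier) {
    const auto prev = rank.find(previous_tier);
    const auto next = rank.find(new_tier);
    if (prev == rank.end() || next == rank.end()) return false;
    return next->second < prev->second;
}
```

When `is_downgrade` returns true, the webhook processor would set feature_locked_at as part of the same upsert. Whether an equal timestamp should overwrite is a design choice; this sketch skips it so that a replayed event cannot rewrite an identical row.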

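Response 200 once the transaction commits, including when the mirror fan-out fails (that failure is logged, not surfaced to Stripe). An already-processed event also returns 200. If the transaction fails, return 500 so that Stripe retries: the transaction is atomic, so nothing was persisted, and returning 200 at that point would silently drop the event.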
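
The status rules for the webhook route can be summarized in one mapping. This is a sketch, not the controller's actual code; the enum `WebhookOutcome` and the function `http_status_for` are hypothetical names.

```cpp
// Possible results of handling one Stripe event.
enum class WebhookOutcome {
    InvalidSignature,   // Stripe-Signature check failed
    Duplicate,          // event.id already in processed_stripe_events
    Committed,          // transaction committed (mirror fan-out may still have failed)
    TransactionFailed   // DB transaction rolled back
};

// Maps an outcome to the HTTP status returned to Stripe.
constexpr int http_status_for(WebhookOutcome outcome) {
    switch (outcome) {
        case WebhookOutcome::InvalidSignature:  return 400;
        case WebhookOutcome::Duplicate:         return 200;
        case WebhookOutcome::Committed:         return 200;
        case WebhookOutcome::TransactionFailed: return 500;
    }
    return 500;  // unreachable for valid enum values; fail toward a retry
}
```

A mirror-sync failure has no entry of its own because it happens after COMMIT and does not change the response.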

Handled event types: customer.created, customer.updated, customer.deleted, customer.subscription.created, customer.subscription.updated, customer.subscription.deleted, invoice.created, invoice.updated, invoice.payment_succeeded, invoice.payment_failed, invoice.voided

Internal surface (mTLS not in Phase 1; Bearer token)

POST /api/internal/billing/mirror-sync

Auth: Bearer QUEUE_SERVICE_TOKEN_RAPTOR

Purpose: Called by Queue's own webhook processor after a subscription upsert; also callable by the nightly reconciler. Updates Raptor's billing_subscription_mirror.

Body:

{
  "queue_customer_id": "<uuid>",
  "plan_tier": "founders",
  "status": "active",
  "current_period_end": "2026-06-11T21:00:00Z",
  "updated_at": "2026-05-11T21:00:00Z"
}

Response 204 on success. Queue fans out via libcurl POST to Raptor's RAPTOR_BASE_URL/api/internal/billing/mirror-sync with the same payload. Failure is logged (Sentry WARN) but does not fail the webhook transaction.

POST /api/internal/billing/reconcile

Auth: Bearer service token (GH Actions bot token or Console service token)

Purpose: Triggers nightly reconciliation. Calls Stripe API, compares to DB, writes billing_reconcile_log rows. Does not auto-correct.

Response 200:

{ "mismatches_found": 0, "checked_subscriptions": 47 }
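
The compare-and-report step of reconciliation might look like the sketch below. It assumes both sides have already been loaded into maps keyed by subscription id, compares only plan_tier and status, and never writes corrections. The types `SubscriptionState`, `Mismatch`, `ReconcileResult` and the function `reconcile` are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

struct SubscriptionState {
    std::string plan_tier;
    std::string status;
};

struct Mismatch {
    std::string subscription_id;
    std::string field;
    std::string stripe_value;
    std::string db_value;
};

struct ReconcileResult {
    int checked_subscriptions = 0;
    std::vector<Mismatch> mismatches;  // one entry per candidate billing_reconcile_log row
};

// Walks Stripe's view (authoritative per I-8) and records every difference
// found in the DB view. Nothing is corrected; the result is only reported.
inline ReconcileResult reconcile(const std::map<std::string, SubscriptionState>& stripe,
                                 const std::map<std::string, SubscriptionState>& db) {
    ReconcileResult result;
    for (const auto& [id, remote] : stripe) {
        ++result.checked_subscriptions;
        const auto local = db.find(id);
        if (local == db.end()) {
            result.mismatches.push_back({id, "row", "present", "missing"});
            continue;
        }
        if (local->second.plan_tier != remote.plan_tier)
            result.mismatches.push_back({id, "plan_tier", remote.plan_tier, local->second.plan_tier});
        if (local->second.status != remote.status)
            result.mismatches.push_back({id, "status", remote.status, local->second.status});
    }
    return result;
}
```

In the response body, mismatches_found would correspond to `mismatches.size()`. Rows that exist in the DB but not in Stripe are not detected by this one-directional walk.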

6. Webhook Sequence

sequenceDiagram
    participant S as Stripe
    participant WH as Queue /api/v1/billing/webhook
    participant QDB as Queue-DB (Postgres)
    participant KMS as AWS KMS
    participant R as Raptor /api/internal/billing/mirror-sync

    S->>WH: POST event (Stripe-Signature header)
    WH->>WH: HMAC verify (OpenSSL EVP + STRIPE_WEBHOOK_SECRET)
    alt Signature invalid
        WH-->>S: 400 (security event; Sentry CRIT)
    else Signature valid
        WH->>QDB: BEGIN TRANSACTION
        WH->>QDB: SELECT FROM processed_stripe_events WHERE event_id = ?
        alt Already processed
            WH->>QDB: ROLLBACK
            WH-->>S: 200 (idempotent)
        else New event
            WH->>QDB: UPSERT billing_* row (LWW guard on updated_at)
            WH->>KMS: GenerateMac(previous_hash || row_payload)
            KMS-->>WH: hmac_hash
            WH->>QDB: INSERT billing_action_log (hmac_chain_hash)
            WH->>QDB: INSERT processed_stripe_events
            WH->>QDB: COMMIT
            WH->>R: POST /api/internal/billing/mirror-sync (libcurl; fire-and-log)
            WH-->>S: 200
        end
    end
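
The hash chaining in the GenerateMac step can be sketched with the MAC function injected as a callback, since the KMS client interface is not described here. Each entry's hash covers the previous hash concatenated with the row payload, so altering any earlier row invalidates every later hash. The names `MacFn`, `ChainEntry`, `append_entry`, `verify_chain`, and the genesis value used for the first row are assumptions.

```cpp
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Assumed callback shape standing in for the KMS GenerateMac call.
using MacFn = std::function<std::string(std::string_view message)>;

struct ChainEntry {
    std::string payload;          // serialized billing_action_log row
    std::string hmac_chain_hash;  // MAC over previous_hash || payload
};

// Builds the next entry from the previous entry's hash and the new row payload.
inline ChainEntry append_entry(std::string_view previous_hash,
                               std::string_view row_payload,
                               const MacFn& mac) {
    std::string message(previous_hash);
    message.append(row_payload);
    return ChainEntry{std::string(row_payload), mac(message)};
}

// Recomputes every hash from the genesis value; returns false at the first break.
inline bool verify_chain(const std::vector<ChainEntry>& entries,
                         std::string_view genesis_hash,
                         const MacFn& mac) {
    std::string previous(genesis_hash);
    for (const auto& entry : entries) {
        if (mac(previous + entry.payload) != entry.hmac_chain_hash) return false;
        previous = entry.hmac_chain_hash;
    }
    return true;
}
```

A verifier like this is what would detect the chain break that FLAG_BILLING_AUDIT_WRITES=false responds to.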

7. Internal Auth Model

Phase 1 uses Bearer tokens, not mTLS. Each calling service has a dedicated token:

| Service | Token env var (Queue side) | Set on service |
| --- | --- | --- |
| Raptor | QUEUE_SERVICE_TOKEN_RAPTOR | Raptor env: QUEUE_BEARER_TOKEN=<same> |
| Console | QUEUE_SERVICE_TOKEN_CONSOLE | Console env: QUEUE_BEARER_TOKEN=<same> |
| GH Actions reconciler | QUEUE_SERVICE_TOKEN_CRON | GH Actions secret |

Tokens are loaded at startup into an in-memory std::unordered_set<std::string>. The internal_auth_filter Drogon middleware checks the Authorization: Bearer <token> header against this set before routing to any /api/internal/* handler.
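
The header check itself can be sketched without any framework code. This assumes the header value has already been extracted from the request; the Drogon filter wiring is omitted, and the names `bearer_token` and `is_authorized` are hypothetical.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <unordered_set>

// Extracts the token from "Bearer <token>"; returns nullopt for any other shape,
// including a missing prefix or an empty token.
inline std::optional<std::string_view> bearer_token(std::string_view authorization) {
    constexpr std::string_view prefix = "Bearer ";
    if (authorization.size() <= prefix.size() ||
        authorization.substr(0, prefix.size()) != prefix) {
        return std::nullopt;
    }
    return authorization.substr(prefix.size());
}

// True only when the header carries a token present in the in-memory set.
inline bool is_authorized(const std::unordered_set<std::string>& tokens,
                          std::string_view authorization) {
    const auto token = bearer_token(authorization);
    return token.has_value() && tokens.count(std::string(*token)) > 0;
}
```

A hash-set lookup is not a constant-time comparison. If timing side channels are a concern for these tokens, comparing against each configured token with a constant-time equality function would be one alternative.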

Token rotation: update Infisical, restart raxx-queue-{prod,staging} dyno. No redeploy required.


8. Build + Deploy Pipeline

Dockerfile (multi-stage)

Stage 1 — build (gcc:13-bookworm)
  - apt-get: cmake, ninja, libssl-dev, libcurl4-openssl-dev, libpq-dev, uuid-dev, git
  - vcpkg install (from vcpkg.json manifest) — layer-cached in CI via GitHub Actions cache
  - cmake configure + ninja build
  - strip binary

Stage 2 — runtime (debian:bookworm-slim)
  - apt-get: libssl3, libcurl4, libpq5 (runtime deps only)
  - COPY --from=build /app/queue_server /usr/local/bin/queue_server
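  - CMD ["sh", "-c", "exec queue_server --port \"$PORT\""]
    (exec-form CMD does not expand environment variables, so $PORT must be expanded by a shell)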

Heroku stack:set container is required. The heroku.yml declares a single web process type.

GH Actions workflow (.github/workflows/queue-deploy.yml)

Trigger: push to main (path filter: queue/**)

Jobs:
1. build-test:
   - Restore vcpkg cache (key: vcpkg-${{ hashFiles('queue/vcpkg.json') }})
   - docker build --target build (outputs test binary; stage name must match COPY --from=build in the Dockerfile)
   - Run unit tests (ctest)
   - Run integration tests (docker-compose.test.yml)
   - Save vcpkg cache

2. build-release (needs: build-test):
   - docker build --target runtime-stage
   - docker push registry.heroku.com/raxx-queue-staging/web
   - heroku container:release web -a raxx-queue-staging
   - Wait for health check: curl https://raxx-queue-staging.herokuapp.com/health

3. promote-to-prod (needs: build-release, manual approval gate):
   - docker tag ... registry.heroku.com/raxx-queue-prod/web
   - docker push registry.heroku.com/raxx-queue-prod/web
   - heroku container:release web -a raxx-queue-prod
   - Wait for health check

9. Migrations Deployment

sqitch runs in the Heroku release phase. Add to heroku.yml:

release:
  command:
    - sqitch deploy --verify db:pg:$DATABASE_URL

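sqitch is installed in the runtime Docker image, so it must be added to the runtime-stage packages alongside the libraries listed in section 8. Migration failures abort the release: under the Heroku release phase contract, a non-zero exit means the new release is not deployed and the previous release keeps running.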

Rollback: sqitch revert is available via heroku run sqitch revert db:pg:$DATABASE_URL. This drops all Phase 1 tables — only safe pre-cutover with no real billing data. Post-cutover rollback is a data migration, not a sqitch revert.


10. Rollout Plan

| Phase | Gate | What changes |
| --- | --- | --- |
| Dark | FLAG_QUEUE_BILLING=false (default) | Queue deployed; all billing routes return 503. Migrations applied. Health check responds. |
| Internal flag | FLAG_QUEUE_BILLING=true on staging only | Billing routes live on raxx-queue-staging. Stripe test-mode webhook pointed at staging. |
| Integration test | Stripe test-mode events replay cleanly; webhook idempotency confirmed; mirror sync to Raptor staging verified | |
| Beta | FLAG_QUEUE_BILLING=true on prod | Live webhook; Stripe live-mode endpoint registered to raxx-queue-prod. Console reads prod Queue API. |
| GA | 48h soak with no P0/P1 billing incidents | Flag gate removed; always-on. |

11. Security Considerations

| Question | Answer |
| --- | --- |
| What PII does this collect? | billing_email, billing_name, address fields in billing_customer. See stripe-customer-billing.md §7.1. |
| What is the retention period? | 7 years post-customer-deletion (SOC2/tax compliance floor). After 7 years: anonymize in-place. |
| How is it deleted on DSR? | Anonymize in-place: billing_email → tombstone token; address fields → NULL. Invoice rows retained for tax. Tracked in #1630. |
| What is logged for audit? | All money-state mutations in billing_action_log with KMS HMAC chain. Stripe event dedup in processed_stripe_events. spdlog INFO for every successful webhook event (no PII in log lines; only event_id, event_type, stripe_customer_id). |
| Does any part store a credential that could be replayed? | No. STRIPE_RESTRICTED_KEY and STRIPE_WEBHOOK_SECRET read from Infisical at startup; held in process memory only; never written to DB or logs. |
| What happens on breach? | 72h GDPR Art. 33 notification. Queue-DB billing tables added to breach-scope inventory. Existing breach-notification automation path handles the notification. |
| Where are secrets? | Infisical /Raxx/Queue/Billing/Stripe/ for Stripe keys; /Raxx/Queue/ for service tokens. All rotatable without redeploy. |
| Kill-switch? | FLAG_QUEUE_BILLING=false returns 503 on all billing routes. FLAG_BILLING_AUDIT_WRITES=false halts KMS chain writes on chain break (W-KMS scenario). |
| Memory safety? | AddressSanitizer + UBSan in CI debug build. No raw new/delete. RAII throughout. No raw char* for PII strings. |

12. Open Questions

OQ-1 — Language confirmation after timeline numbers: The honest estimate for C++ Phase 1 billing from scratch is 22–32 days. 2026-05-23 UTC is not achievable. Operator said timeline is a target; this is the real number. Does the operator confirm: proceed with C++ and accept the slip? (Operator's 21:27 UTC statement points to yes; confirmed in ADR-0076 OQ-1.)

OQ-2 — Phase 1 identity scope: This design defers WebAuthn and sessions to Phase 2. Raptor's Python auth handles customer authentication during Phase 1. Is the operator comfortable with this split?

OQ-3 — Postgres instance for Queue: Does Queue share Raptor's Postgres instance (Heroku Standard-0 add-on) in Phase 1, or does it get its own? Sharing is simpler for Phase 1 but couples the two services at the DB layer. Own instance is cleaner but costs ~$50/mo more. Recommendation: share in Phase 1 (billing tables live in Queue's schema namespace); own instance in Phase 3 (per queue/migration-plan.md).

OQ-4 — DSR and retention for v1 launch:

#1630 (DSR) and #1631 (retention) can be deferred post-launch only if the privacy policy includes an explicit carve-out ("billing DSR available via support@raxx.app; automated self-service in development"). BLR and PM must confirm before Queue goes to prod-beta.