
ADR-0076: Queue Phase 1 + Billing v1 — C++ Implementation

Status: Accepted
Date: 2026-05-11 UTC
Supersedes: ADR-0076 prior revision (aggressive 12-day Python plan, 2026-05-11 UTC — that file is replaced by this one)
Also supersedes: ADR-0075 (v1-without-billing framing), ADR-0073 (Raptor stopgap)
Refs:
- ADR-0071 (Queue-as-authority — confirmed, unchanged)
- docs/architecture/queue/queue-phase1-design.md (execution design governed by this ADR)
- docs/architecture/stripe-customer-billing.md (v3 schema — unchanged)
- docs/architecture/queue/design.md (full Queue identity design — Phase 2+)
- Cards: #406, #407, #408, #409, #1630, #1631, #1632, #1633, #1635


Context

Operator decisions (2026-05-11 UTC, in sequence)

  1. 21:08 UTC — "Billing is going to Queue, if it means slippage, so be it. Do not put billing in stopgap mode." (ADR-0075)
  2. 21:18 UTC — "No, v1 comes with billing..." (billing is not deferred; ships in v1)
  3. 21:20 UTC — "Hold 2026-05-23 launch — compress Queue + billing into 12 days" (prior ADR-0076 framing)
  4. 21:25 UTC — "I would honestly put queue in C++."
  5. 21:27 UTC — "Just proceed with C++ — we don't need to manage the time like you are trying to do."

Decision 5 supersedes decision 3 on framing. The aggressive 12-day plan was the architect's framing, not an operator directive. The operator's actual directive is: build Queue correctly, in C++, prioritizing foundation quality and architectural integrity over launch-date pressure. The timeline is a target, not a constraint.

This ADR replaces the prior ADR-0076 which was filed under decision 3's framing.

Why C++ is the right choice for Queue

Queue is the most critical service on the platform: it will own customer identity, passkey credentials, sessions, RBAC, billing authority, and the audit chain for money-state changes. The operator's language-tier philosophy (project_language_tier_philosophy.md) designated C++ and Rust as tier-1 languages — originally framed as post-v1 rewrites, but the operator has now elevated Queue to tier-1 directly for v1.

The case for C++ over Python for Queue:

| Property | Python (Flask) | C++ (Drogon) |
| --- | --- | --- |
| Memory efficiency at scale | GIL limits true parallelism; high per-request memory | Coroutine-based async; low per-connection overhead |
| Static type safety | Runtime errors; type hints not enforced | Compile-time enforcement; no class of runtime type errors |
| Predictable latency | GC pauses; GIL contention at auth-hot-path scale | No GC; deterministic memory; microsecond auth hot path |
| Foundation longevity | Python Queue would need rewriting — this was already planned | C++ Queue is the long-term service; no planned rewrite |
| Auth hot path | ~5–15 ms per JWT verify under load | Sub-millisecond JWT verify; OpenSSL HMAC inline |

Queue is the right service to build in C++ first. The trade-off is real: C++ builds slower and requires more engineering discipline per endpoint. That cost is accepted in exchange for a foundation that does not need rewriting in 12 months.


Decision

Queue ships in C++ for v1. Billing ships in Queue for v1. The 2026-05-23 UTC date is a target; if quality requires more time, the operator will accept a slip. The architect does not pad estimates to fit a deadline and does not cut corners to meet one.

The full execution design is in docs/architecture/queue/queue-phase1-design.md.


C++ Stack Selections

All picks documented with alternatives considered. Library version pinning is left to feature-developer within the constraints stated.

HTTP Framework: Drogon

Selected: Drogon

Rationale:

- Coroutine-based async I/O (C++20 co_await); built for high-concurrency auth workloads
- Integrated JSON (uses JsonCpp; can switch to nlohmann/json via adapter)
- Built-in ORM (DrogonORM / Mapper) with Postgres support — reduces boilerplate per endpoint
- TLS via OpenSSL — full HTTPS support, webhook HMAC verification in the same process
- Heroku-compatible: ships as a single compiled binary; ./server --port $PORT works with Heroku's Procfile
- Active maintenance; C++17/C++20; test suite; documented production use in the Chinese tech sector

Alternatives rejected:

- Crow: simpler, but single-threaded sync model; no async ORM; limited production reference
- Pistache: async, but smaller community; last major release 2022; bus-factor risk
- Boost.Beast: excellent but raw — building an HTTP framework on top of Beast adds weeks of scaffolding before a single billing endpoint can be written; it is a library, not a framework
- oat++: framework + ORM, but limited Postgres async support; smaller community than Drogon

Risk: Drogon has a small core maintainer team (2–3 primary contributors). If the project goes unmaintained: (a) the codebase still compiles — C++ doesn't rot like managed-runtime libraries; (b) vendoring a pinned version into queue/vendor/ provides a permanent escape hatch. This is documented as Risk R-1 in the risk register below.
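
For concreteness, a minimal sketch of the Phase 1 health-check endpoint in Drogon. This is a sketch only: the version string and port fallback are placeholders, not the final scaffold.

```cpp
// Minimal Drogon app serving the Phase 1 health check.
// Assumptions: "0.0.1" and the 8080 fallback port are placeholders.
#include <drogon/drogon.h>
#include <cstdlib>

int main() {
    drogon::app().registerHandler(
        "/health",
        [](const drogon::HttpRequestPtr &,
           std::function<void(const drogon::HttpResponsePtr &)> &&callback) {
            Json::Value body;  // Drogon's integrated JSON layer is JsonCpp
            body["status"] = "ok";
            body["service"] = "queue";
            body["version"] = "0.0.1";  // placeholder; real value injected at build time
            callback(drogon::HttpResponse::newHttpJsonResponse(body));
        },
        {drogon::Get});

    // Heroku injects the listen port via $PORT; fall back to 8080 locally.
    const char *port = std::getenv("PORT");
    drogon::app()
        .addListener("0.0.0.0", port ? static_cast<uint16_t>(std::atoi(port)) : 8080)
        .run();
}
```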

Postgres Driver: libpqxx + connection pool

Selected: libpqxx (official C++ wrapper over libpq) with a hand-managed pool (bounded size, RAII-guarded leases)

Rationale:

- libpqxx is the de facto standard C++ Postgres driver; actively maintained; version 7.x supports async (non-blocking) queries
- Drogon's built-in DB client also uses libpq under the hood — Drogon's DbClient is acceptable instead of bare libpqxx if feature-developer prefers; either is correct
- No third-party connection pool library is needed: Drogon provides app().createDbClient() with pool size config; if using bare libpqxx, a pool of 10–20 connections is sufficient for v1 Heroku Standard-0
- DATABASE_URL from the Heroku env parses as a PostgreSQL connection string; libpqxx accepts it directly

Alternative rejected: using libpq (the raw C API) directly — verbose and error-prone without the C++ wrapper; libpqxx wraps it with RAII, prepared statements, and transaction objects at no runtime cost.
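
A sketch of the hand-managed pool shape described above (bounded size, RAII-guarded leases). The class and its names are illustrative rather than a final design; Drogon's DbClient remains the sanctioned alternative.

```cpp
// Bounded connection pool sketch. Connection string comes straight from
// Heroku's DATABASE_URL; pool size is fixed at construction.
#include <pqxx/pqxx>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

class PgPool {
public:
    PgPool(const std::string &url, std::size_t size) {
        for (std::size_t i = 0; i < size; ++i)
            idle_.push_back(std::make_unique<pqxx::connection>(url));
    }

    // RAII lease: the connection returns to the pool when the lease dies.
    class Lease {
    public:
        Lease(PgPool &pool, std::unique_ptr<pqxx::connection> conn)
            : pool_(pool), conn_(std::move(conn)) {}
        ~Lease() { if (conn_) pool_.release(std::move(conn_)); }
        pqxx::connection &operator*() { return *conn_; }
    private:
        PgPool &pool_;
        std::unique_ptr<pqxx::connection> conn_;
    };

    Lease acquire() {
        std::unique_lock lock(mu_);
        cv_.wait(lock, [this] { return !idle_.empty(); });  // block when exhausted
        auto conn = std::move(idle_.back());
        idle_.pop_back();
        return Lease(*this, std::move(conn));  // guaranteed elision (C++17)
    }

private:
    void release(std::unique_ptr<pqxx::connection> conn) {
        { std::lock_guard lock(mu_); idle_.push_back(std::move(conn)); }
        cv_.notify_one();
    }
    std::mutex mu_;
    std::condition_variable cv_;
    std::vector<std::unique_ptr<pqxx::connection>> idle_;
};

// Usage sketch: a read inside a RAII transaction.
//   PgPool pool(std::getenv("DATABASE_URL"), 10);
//   auto lease = pool.acquire();
//   pqxx::work txn(*lease);
//   auto row = txn.exec_params1(
//       "SELECT status FROM billing_subscription WHERE customer_id = $1", customer_id);
//   txn.commit();
```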

Migration Tool: sqitch

Selected: sqitch (SQL-native migration tool)

Rationale:

- Raw SQL migrations, no DSL to learn, no ORM coupling — works identically regardless of which C++ ORM layer Queue uses
- sqitch tracks migrations in a Postgres sqitch schema; deploy/revert/verify commands are explicit and auditable
- Works in CI (Docker or Heroku release phase): sqitch deploy --verify db:pg:$DATABASE_URL
- Each migration has a corresponding verify script — the CI migration step fails if the schema doesn't match expectations (catches botched migrations before prod)
- Feature-developer can author migrations in SQL without touching the C++ build system

Alternatives rejected:

- Alembic: Python-only; introduces a Python dependency into a C++ project for migration tooling alone
- Liquibase: Java dependency; XML/YAML DSL; heavyweight for this service's complexity
- Embedded C++ migration runner: reinventing the wheel; sqitch is purpose-built and battle-tested

Migration directory layout: queue/migrations/sqitch/ with sqitch.conf, sqitch.plan, deploy/, revert/, verify/ subdirectories.

Stripe Integration: libcurl + OpenSSL HMAC

Selected: libcurl for outbound HTTPS calls to Stripe API; OpenSSL EVP for HMAC-SHA-256 webhook signature verification

Rationale:

- No official Stripe C++ SDK exists. The two realistic options are (a) libcurl + JSON parsing and (b) a Boost.Beast HTTPS client. libcurl is the correct choice: it is battle-tested, ships on every Heroku stack, and handles TLS, redirects, and connection pooling without additional scaffolding.
- OpenSSL HMAC for webhook verification: STRIPE_WEBHOOK_SECRET from Infisical → HMAC_SHA256(secret, payload) → compare with the Stripe-Signature header. This is 20 lines of C++ with no additional library.
- The Stripe API surface needed for v1 billing is minimal: POST to create customer, POST to create subscription, GET subscription status, GET customer. All are simple JSON request/response over HTTPS — libcurl handles this without a dedicated SDK.

Alternative rejected: Boost.Beast as HTTPS client — correct technically, but adds the full Boost dependency (gigabytes of headers) for a use case that libcurl handles in kilobytes. libcurl is already present on the Heroku stack.
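
The "20 lines of C++" sketched. This assumes Stripe's documented Stripe-Signature scheme (the signed payload is `<timestamp>.<raw body>`; v1 signatures are hex-encoded HMAC-SHA-256); header parsing and the replay-window timestamp check are omitted here.

```cpp
// Webhook signature check sketch. Caller has already split the
// Stripe-Signature header into its t= and v1= values.
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <openssl/crypto.h>
#include <cstdio>
#include <string>

bool verify_stripe_signature(const std::string &secret,
                             const std::string &timestamp,  // "t=" value
                             const std::string &raw_body,
                             const std::string &v1_hex)     // "v1=" value
{
    const std::string signed_payload = timestamp + "." + raw_body;

    unsigned char mac[EVP_MAX_MD_SIZE];
    unsigned int mac_len = 0;
    HMAC(EVP_sha256(),
         secret.data(), static_cast<int>(secret.size()),
         reinterpret_cast<const unsigned char *>(signed_payload.data()),
         signed_payload.size(), mac, &mac_len);

    // Hex-encode our MAC (snprintf is bounds-checked), then compare in
    // constant time to avoid a timing oracle.
    char hex[EVP_MAX_MD_SIZE * 2 + 1];
    for (unsigned int i = 0; i < mac_len; ++i)
        std::snprintf(hex + 2 * i, 3, "%02x", mac[i]);

    return v1_hex.size() == 2 * mac_len &&
           CRYPTO_memcmp(hex, v1_hex.data(), v1_hex.size()) == 0;
}
```

A production handler would additionally reject events whose timestamp falls outside a small tolerance window, to block replays.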

JSON: nlohmann/json

Selected: nlohmann/json (single-header, json.hpp)

Rationale:

- Header-only: zero build complexity; a single #include <nlohmann/json.hpp>
- Stripe webhook bodies are JSON; API request/response bodies are JSON; this is the correct fit
- Well-known, widely maintained, 40k+ GitHub stars, clear semantics for null/optional handling
- Not the fastest parser (simdjson is faster), but Queue is not a bulk-JSON-parsing service; the bottleneck is auth crypto and DB I/O, not JSON deserialization

Alternatives rejected:

- simdjson: faster parsing but a more complex API; the performance gain does not matter for Queue's workload (auth lookups, webhook upsert, billing reads)
- RapidJSON: faster than nlohmann but a more verbose API with worse null/error-handling ergonomics
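
For the selected library, a sketch of the defensive parse pattern a webhook body would go through. The envelope fields mirror Stripe's event shape but are illustrative here.

```cpp
// Non-throwing parse with explicit field checks: a malformed body is
// rejected outright, never partially processed.
#include <nlohmann/json.hpp>
#include <optional>
#include <string>

struct StripeEvent {
    std::string id;
    std::string type;
};

std::optional<StripeEvent> parse_event(const std::string &raw_body) {
    // parse(..., nullptr, false) returns a "discarded" value instead of throwing.
    nlohmann::json doc =
        nlohmann::json::parse(raw_body, nullptr, /*allow_exceptions=*/false);
    if (doc.is_discarded() || !doc.contains("id") || !doc.contains("type"))
        return std::nullopt;
    return StripeEvent{doc["id"].get<std::string>(),
                       doc["type"].get<std::string>()};
}
```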

JWT: jwt-cpp library

Selected: jwt-cpp (header-only, OpenSSL backend)

Rationale:

- Header-only C++ JWT library; RS256 (RSA + SHA-256) signing and verification; maps directly to Queue's existing RS256 JWT design (from design.md §6 and ADR-0067)
- Uses the OpenSSL backend that Heroku stacks already provide — no additional shared library
- Expiry, issuer, and audience validation built in; custom claims supported
- The existing Queue design uses RS256 with a private key in SSM (QUEUE_JWT_SIGNING_KEY) and a public key in env on Raptor — jwt-cpp implements this directly

Alternative rejected: rolling a custom OpenSSL JWT — possible, but it requires implementing base64url encoding, JSON header parsing, and claim validation. jwt-cpp is 200 lines of header; rolling our own is 400 lines that need security review. Use the library.
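
What "use the library" looks like: an RS256 verification sketch with jwt-cpp. The issuer value is a placeholder; the real claim requirements come from design.md §6.

```cpp
// RS256 verification sketch. The public key is the PEM that Raptor holds
// in env per the existing design; "queue" is a placeholder issuer.
#include <jwt-cpp/jwt.h>
#include <string>

bool verify_queue_jwt(const std::string &token,
                      const std::string &rsa_public_key_pem) {
    try {
        auto decoded = jwt::decode(token);
        jwt::verify()
            .allow_algorithm(jwt::algorithm::rs256(rsa_public_key_pem, "", "", ""))
            .with_issuer("queue")  // placeholder
            .verify(decoded);      // throws on bad signature, expiry, or claims
        return true;
    } catch (const std::exception &) {
        return false;  // any verification failure is a hard reject
    }
}
```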

Build System: CMake + Heroku container registry

Selected: CMake for the build system; Docker container deployed via Heroku container registry (not a buildpack)

Rationale:

CMake: the standard C++ build system; Drogon requires CMake; widely understood; vcpkg and Conan both integrate with CMake for dependency management. The alternatives (Bazel, Meson) would require more toolchain setup without adding value at this project's scale.

Heroku container registry (not a native C++ buildpack): no production-reliable native C++ buildpack exists for Heroku; the community heroku-buildpack-cxx buildpacks are unmaintained (last commit 2020). The correct approach for C++ on Heroku is:

  1. Build the binary in a multi-stage Dockerfile (build stage: gcc:13-bookworm; runtime stage: debian:bookworm-slim)
  2. Push the container image to Heroku Container Registry
  3. Set the stack with heroku stack:set container
  4. GH Actions workflow: docker build → docker push → heroku container:release

This is the standard pattern for non-Ruby/Python/Node Heroku apps and is fully supported. Full-rebuild times in CI will be 3–8 minutes; subsequent builds with layer caching, 1–2 minutes.

Dependency management: vcpkg (manifests mode: vcpkg.json in queue/) for Drogon, libpqxx, jwt-cpp, nlohmann/json. vcpkg packages these all correctly and is the approach Drogon's own documentation recommends.

Testing: GoogleTest + CTest

Selected: GoogleTest for unit and integration tests; CTest as the test runner (CMake-integrated); separate queue/tests/integration/ suite that runs against a real Postgres instance

Rationale:

- GoogleTest is the standard C++ testing library; Drogon's own test suite uses it
- CTest integrates with CMake's make test / ctest commands; CI runs ctest --output-on-failure
- Integration tests: a docker-compose.test.yml in queue/ brings up a Postgres container and runs the suite against it; GH Actions uses this compose file

Alternatives considered: Catch2 and doctest — both header-only and simpler to integrate, but with smaller communities and fewer CI integrations than GoogleTest. GoogleTest's mock framework (gMock) is also useful for mocking the Stripe HTTP client in unit tests.
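
A sketch of the shape of the minimum-viable tests, reusing the hypothetical verify_stripe_signature function from the Stripe section's sketch above.

```cpp
// Unit-test sketch: a signature computed over a different body must not
// validate. verify_stripe_signature is the sketch from the Stripe section.
#include <gtest/gtest.h>
#include <string>

bool verify_stripe_signature(const std::string &secret,
                             const std::string &timestamp,
                             const std::string &raw_body,
                             const std::string &v1_hex);

TEST(WebhookSignature, RejectsTamperedPayload) {
    const std::string secret = "whsec_test_dummy";
    const std::string ts = "1700000000";
    const std::string body = R"({"id":"evt_123","type":"invoice.paid"})";

    // 64 'a' chars is the right length for a hex SHA-256 MAC but wrong value.
    EXPECT_FALSE(verify_stripe_signature(secret, ts, body, std::string(64, 'a')));
}

// main() can be omitted when the test target links gtest_main.
```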

Logging: spdlog

Selected: spdlog with structured JSON sink for production; stdout sink for development

Rationale:

- Header-only or compiled; integrates with Drogon via a custom logger adapter (Drogon supports custom loggers)
- Structured JSON output is required for Heroku log drains (Papertrail, Logplex)
- Sentry C++ SDK for error tracking: sentry-native (official, maintained by Sentry)
- SENTRY_DSN from Infisical at startup; spdlog ERROR-level events are forwarded to Sentry via the sentry-native integration
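
A sketch of the JSON-shaped production sink. One assumption to flag: spdlog ships no built-in JSON sink, so a pattern string approximating one JSON object per line is the common workaround; a custom formatter is the robust version if messages can contain quotes.

```cpp
// Pattern-based "JSON lines" output for the Heroku log drain. The message
// itself carries the correlation fields named in R-5 below.
#include <spdlog/spdlog.h>

void init_logging() {
    spdlog::set_pattern(
        R"({"ts":"%Y-%m-%dT%H:%M:%S.%e%z","level":"%l","logger":"%n","msg":"%v"})");

    // Example billing-webhook line with the R-5 correlation fields
    // (values here are illustrative placeholders).
    spdlog::info("event_id={} stripe_customer_id={} request_id={}",
                 "evt_123", "cus_456", "req_789");
}
```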


Scope Boundaries

Phase 1 — C++ Queue Foundation + Billing (what this ADR governs)

The minimum Queue needed to safely host billing. NOT the full identity service. Phase 1 is ruthlessly scoped to what billing needs:

Hard floor — must be present for v1 billing to work:

| Deliverable | Notes |
| --- | --- |
| queue/ directory scaffold: CMakeLists.txt, Dockerfile, vcpkg.json | Foundation; everything blocks on this |
| sqitch migration chain: billing tables (billing_customer, billing_subscription, billing_invoice, processed_stripe_events, billing_action_log, billing_subscription_mirror) | Schema is in stripe-customer-billing.md v3; unchanged |
| Health check: GET /health → {"status":"ok","service":"queue","version":"..."} | Heroku restarts the dyno if the health check gets no response within 60s |
| Customer CRUD: POST /api/v1/billing/customers, GET /api/v1/billing/customers/:id | Created by webhook handler; read by Console |
| Subscription CRUD: GET /api/v1/billing/subscriptions/:customer_id | Read by Console and Raptor mirror sync |
| Stripe webhook receiver: POST /api/v1/billing/webhook (HMAC verify + idempotent upsert) | Primary billing state update path |
| JIT mirror sync: POST /api/internal/billing/mirror-sync (Queue → Raptor fan-out) | Raptor's paywall reads its mirror; Queue pushes updates |
| Internal API auth: Bearer token from Infisical (QUEUE_SERVICE_TOKEN_RAPTOR, QUEUE_SERVICE_TOKEN_CONSOLE) | No mTLS in Phase 1; bearer tokens sufficient |
| STRIPE_RESTRICTED_KEY, STRIPE_WEBHOOK_SECRET in Infisical /Raxx/Queue/Billing/Stripe/ | Operator provisioning action |
| Heroku apps: raxx-queue-prod, raxx-queue-staging | Operator provisioning action |
| GH Actions deploy workflow | CI/CD for C++ container build + Heroku container release |
| FLAG_QUEUE_BILLING env var (kill-switch) | Returns 503 on all billing routes when set to false (see the filter sketch after this table) |
| Sentry integration (DSN from Infisical) | Error tracking from day one |
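
A sketch of the FLAG_QUEUE_BILLING kill-switch from the table above, written as a Drogon filter. Registration on the billing route group is omitted, and reading the env var per request is a simplification; the real service would normalize and cache the flag at startup.

```cpp
// Kill-switch filter: short-circuits all billing routes with 503 when
// FLAG_QUEUE_BILLING=false, otherwise passes through to the handler.
#include <drogon/HttpFilter.h>
#include <cstdlib>
#include <cstring>

class BillingKillSwitch : public drogon::HttpFilter<BillingKillSwitch> {
public:
    void doFilter(const drogon::HttpRequestPtr &,
                  drogon::FilterCallback &&reject,
                  drogon::FilterChainCallback &&pass) override {
        const char *flag = std::getenv("FLAG_QUEUE_BILLING");
        if (flag && std::strcmp(flag, "false") == 0) {
            auto resp = drogon::HttpResponse::newHttpResponse();
            resp->setStatusCode(drogon::k503ServiceUnavailable);
            reject(resp);  // billing is switched off
            return;
        }
        pass();  // flag absent or true: continue to the handler
    }
};
```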

Phase 1 explicitly excludes (Phase 2):

- WebAuthn registration/login endpoints (those live in queue/design.md Phase 2)
- Session management and JWT minting (Phase 2)
- RBAC endpoints (Phase 2)
- Audit event writer (Phase 2; the existing Raptor audit writer handles v1)
- Customer self-service APIs (/api/v1/me, passkey management)
- DSR erasure implementation (tracked in #1630; deferred post-launch per privacy-policy carve-out)
- PII retention automation (tracked in #1631; the 7-year window means no immediate risk)

Phase 2 — Full Identity Service (post-v1, separate epic)

Full Queue per queue/design.md: WebAuthn, sessions, JWT, RBAC, audit consolidation. This is the original Queue Phase 1 scope, renamed Phase 2 now that a billing-only Phase 1 ships first.


Timeline Assessment (honest)

The prior ADR-0076 compressed this into a 12-day sprint framing. That framing is overridden. Here is the honest estimate in C++:

| Workstream | Estimate | Notes |
| --- | --- | --- |
| Queue scaffold: CMakeLists, Dockerfile, vcpkg, Drogon hello-world, Heroku stack:container setup, GH Actions build+deploy | 4–6 days | First time doing C++ on Heroku is the riskiest unknown. The multi-stage Dockerfile for Drogon + dependencies is non-trivial. CI layer caching setup adds a day. No Python analog here — this is net-new infrastructure. |
| sqitch migration setup + billing schema migrations | 1–2 days | sqitch tooling setup + 6 tables. Schema is already designed in stripe-customer-billing.md; translation to SQL is mechanical. |
| Stripe service layer in C++ (libcurl + nlohmann/json wrapper for Stripe API) | 3–4 days | No SDK. Wrapping libcurl for HTTPS POST/GET + JSON response parsing + error handling. Stripe API error shapes are complex. |
| Stripe webhook handler (HMAC verify + idempotent upsert + mirror fan-out) | 2–3 days | OpenSSL HMAC is straightforward; the upsert logic and LWW guard in libpqxx is the work. |
| Customer/subscription/invoice CRUD endpoints (6 endpoints) | 2–3 days | Drogon route handlers + Postgres queries via Drogon DbClient. |
| JIT mirror sync endpoint + Raptor mirror migration | 1–2 days | Queue-side endpoint is simple; the Raptor mirror migration is blocked on #1556 completing. |
| Internal API auth middleware (Bearer token validation) | 1 day | Load the token allowlist from env at startup; validate on /api/internal/* routes. |
| Console billing UI reads Queue API (Python/React — not C++) | 2–3 days | Console calls Queue's HTTP API; this is Python HTTP calls + React components. Cards #408 and #409. |
| JIT paywall middleware in Raptor (Python — trivial) | 1 day | Raptor reads its local billing_subscription_mirror; fail-closed logic per ADR-0071. |
| GoogleTest suite: unit tests for service layer + webhook handler | 2–3 days | Mocking libcurl + DB in C++ is not trivial. Minimum viable coverage: webhook signature verify, idempotency guard, LWW upsert logic. |
| Integration test suite + staging soak | 2–3 days | docker-compose.test.yml + GH Actions integration test job. Stripe test-mode webhook replay. 48h staging soak. |
| Prod cutover + flag-flip + monitoring | 1–2 days | Deploy to raxx-queue-prod; Stripe live webhook endpoint registration; Sentry alert baseline. |
| DSR + retention | Post-launch | #1630, #1631 deferred per cut-line 2 with privacy-policy carve-out. |
| Founders backfill + tier transition + billing dashboard | Post-launch | #1632, #1635, #1633. |
| Total realistic v1 estimate | 22–32 days | Honest range. The scaffold + build infrastructure is the high-variance item. |

Honest assessment: 22–32 days is the realistic range. 12 days is not achievable for C++ from scratch with the quality standard the operator has set. The operator has explicitly said the date is a target, not a constraint, and does not want the timeline managed — this is the real number.

The 2026-05-23 UTC launch date is not achievable for v1 billing-in-Queue in C++. Stating this plainly so the operator can make an informed decision.

If the operator accepts the slip: filing sub-cards with the estimates above, targeting earliest v1 launch at end of June 2026 UTC.

If the operator wants to revisit the language choice for the Phase 1 billing-only scope: the Python Queue scaffold (Flask, 5 days) could ship billing by 2026-05-23 UTC. Phase 2+ (identity service, sessions, RBAC) would then be the C++ rewrite. This is the architect flagging the option — the decision belongs to the operator.


Risk Register

R-1: Drogon maintainer bus factor

Risk: Drogon's core maintainer team is small (2–3 primary contributors). If the project goes unmaintained, Queue is dependent on a dead C++ HTTP framework.

Likelihood: Medium (open-source bus factor is a real phenomenon)

Impact: High if unmaintained mid-development; low once Queue is shipped (C++ compiled binaries don't rot; only new development is affected)

Mitigations:

- Vendor Drogon at a pinned commit in queue/vendor/ using CMake FetchContent with a fixed SHA. Queue then never breaks on upstream changes.
- Drogon's abstractions are thin enough that switching to Crow or Beast would require reworking routes (1–2 weeks of work), not a full rewrite.
- Monitor upstream: any 90-day absence of commits from the primary maintainer triggers an architectural review.

R-2: C++ on Heroku build infrastructure — first-time tax

Risk: No battle-tested path for C++ containers on Heroku exists within this project. The multi-stage Dockerfile, vcpkg dependency resolution, and GH Actions layer caching for a 2GB+ build tree require setup time that is difficult to estimate.

Likelihood: High (the first container build WILL have problems)

Impact: Could absorb the first 1–2 weeks before a single HTTP handler is running

Mitigations:

- Start with the absolute minimum Drogon hello-world container; prove the Heroku health check passes before writing a single billing line.
- Use vcpkg binary caching in GH Actions (GitHub Actions cache + VCPKG_DEFAULT_BINARY_CACHE) to avoid recompiling Drogon on every CI push. Without this, CI builds will take 20–30 minutes.
- The SRE agent provisions the Heroku apps before feature-developer starts coding — the deploy target exists before the code is written.

R-3: Memory safety bugs handling billing PII

Risk: C++ without bounds-checked containers, smart pointer discipline, or sanitizer runs can produce use-after-free, buffer overflow, or double-free bugs in code that handles billing PII and Stripe webhook bodies.

Likelihood: Low in carefully written C++ with modern idioms; higher in rushed code

Impact: Critical — a memory safety bug in a billing path can corrupt PII, produce incorrect billing state, or be exploited

Mitigations:

- RAII-only resource management: no raw new/delete. Smart pointers (unique_ptr, shared_ptr) everywhere. All DB transactions RAII-guarded (see the upsert sketch after this list).
- AddressSanitizer (-fsanitize=address) and UBSan (-fsanitize=undefined) enabled in the DEBUG build; CI runs tests against the sanitizer build.
- Code-review invariant: every PR touching billing table writes requires review of the DB transaction scope, null handling, and string bounds.
- No raw char* string handling for any PII field. Use std::string throughout; never strcpy, sprintf, or gets.
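
A sketch of that discipline applied to the webhook write path: parameterized SQL only, RAII transaction scope, and the processed_stripe_events dedup guard inside the same transaction. Column names are illustrative; the real schema is in stripe-customer-billing.md v3, and the full LWW guard would also compare event timestamps.

```cpp
// Idempotent webhook apply sketch: no string concatenation of PII into SQL,
// and the transaction rolls back automatically on any exception.
#include <pqxx/pqxx>
#include <string>

bool apply_stripe_event(pqxx::connection &conn,
                        const std::string &event_id,
                        const std::string &stripe_customer_id,
                        const std::string &status) {
    pqxx::work txn(conn);  // RAII: rollback if we throw before commit

    // Idempotency guard: a replayed event inserts nothing and we stop here.
    auto dedup = txn.exec_params(
        "INSERT INTO processed_stripe_events (event_id) VALUES ($1) "
        "ON CONFLICT (event_id) DO NOTHING",
        event_id);
    if (dedup.affected_rows() == 0) return false;  // already processed

    // Simplified upsert; the production version adds the LWW timestamp guard.
    txn.exec_params(
        "INSERT INTO billing_subscription (stripe_customer_id, status) "
        "VALUES ($1, $2) "
        "ON CONFLICT (stripe_customer_id) DO UPDATE SET status = EXCLUDED.status",
        stripe_customer_id, status);

    txn.commit();
    return true;
}
```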

R-4: Velocity risk — C++ endpoint throughput is slower than Python

Risk: Adding a new Queue endpoint in C++ takes meaningfully longer than in Python. As billing requirements evolve post-launch, Queue's development pace will be slower than the rest of the Python stack.

Likelihood: Certain

Impact: Medium — post-launch velocity for Queue-specific features is constrained

Mitigations:

- Accept this as the cost of the tier-1 quality bar for Queue. It is not a surprise.
- Offset with Drogon's controller scaffolding (code generation for common patterns).
- Phase 2+ identity service endpoints can be templated once the first billing endpoints establish the pattern.

R-5: Debugging C++ at 3am with unfamiliar tooling

Risk: If a prod incident occurs on Queue (billing webhook failure, Stripe HMAC mismatch, mirror sync lag), the operator or on-call agent needs to debug a C++ binary under Heroku's dyno logging model — not a Python stack trace.

Likelihood: Certain to occur eventually

Impact: Medium — debugging is slower in C++ without the right tooling in place

Mitigations:

- The Sentry C++ SDK captures exceptions and stack traces with symbolication — the same triage experience as Python's Sentry integration once the DSN is wired and debug symbols are uploaded.
- spdlog structured JSON logging with request-ID correlation: every billing webhook log line carries event_id, stripe_customer_id, and request_id, so log-grep in Papertrail finds the incident context immediately.
- Drogon's access log captures HTTP request/response timing; heroku logs --tail -a raxx-queue-prod provides the real-time stream.
- Runbooks for all P0 billing failure modes (documented in stripe-customer-billing.md §8) do not require reading C++ code — they are operational procedures (check the Sentry alert, check the mirror row, run the reconciler).

R-6: Talent risk

Risk: Agents and the operator are primarily Python-fluent. A critical billing bug in C++ may require C++-fluent expertise that is slower to spin up than a Python fix.

Likelihood: Low to Medium

Impact: Could extend incident resolution time in worst case

Mitigations:

- Billing-layer C++ is not complex C++. It is: route handler → validate request → SQL upsert → return JSON. The complexity lives in the correctness guarantees (HMAC, idempotency, LWW), not in the language features used.
- The code must be readable to a C++-literate person in a hurry. No template metaprogramming, no complex generic code in billing paths. Straightforward C++17.


Consequences

Positive:

- Queue is built on a foundation that does not require rewriting. The tier-1 language decision is made correctly at v1, not retrofitted post-launch.
- Memory efficiency and predictable latency for the auth hot path (Phase 2+) are achieved at v1 quality.
- The billing schema (unchanged from the v3 design) ships in its permanent home — Queue-DB — with no post-launch migration.

Negative:

- The 2026-05-23 UTC launch date is not achievable. Billing-in-Queue-in-C++ realistically ships late June 2026 UTC.
- The team pays C++ infrastructure costs (build system, Dockerfile, sanitizer CI, slower endpoint throughput) from day one.
- If the operator reconsiders and wants Python for the billing-only Phase 1, this ADR records that option clearly (see the timeline section above).


Alternatives Considered

Python (Flask) for Phase 1 billing; C++ rewrite for Phase 2 identity service

This option was the architect's prior framing (ADR-0076 revision 1, the aggressive 12-day Python plan).

Arguments for:

- The Phase 1 billing scope (6 endpoints, 6 tables, Stripe webhook) is simple enough that the language does not change the correctness guarantees.
- Python ships billing by 2026-05-23 UTC with reasonable quality.
- C++ discipline (RAII, sanitizers, GoogleTest) is still applied to Phase 2, where auth latency matters most.

Arguments against (operator's position):

- Stopgaps tend to outlive their stated lifespan (as the Raptor-stopgap-for-billing proved).
- A Python billing Queue that then needs a C++ rewrite introduces a migration mid-customer-data — the same risk the operator rejected for Raptor.
- Building the C++ scaffold correctly once is better than building it twice.

Status: Operator explicitly said to proceed with C++. This option is recorded for historical clarity.

Rust instead of C++

Rust provides memory safety guarantees at compile time that C++ achieves only via discipline and sanitizers.

Arguments for: Stronger memory safety story; async support (Tokio); Axum is an excellent HTTP framework.

Arguments against: Operator said C++ specifically. The tier-1 philosophy doc (project_language_tier_philosophy.md) names Rust and C++ as co-equals. The architect interprets the operator's explicit "put Queue in C++" statement as a C++ directive, not a Rust-or-C++ choice. If the operator wants to revisit this, the architect's preference would be Rust (Axum + sqlx + tokio) for better memory safety guarantees.

Status: Operator stated C++ explicitly. C++ it is.


Sub-Cards That Block v1 Launch

The following must be completed before Queue v1 billing is live:

| Card | Title | Dependency |
| --- | --- | --- |
| QP-C1 | Queue C++ scaffold: Dockerfile, CMakeLists, vcpkg, Drogon hello-world, Heroku container setup | None; Day 0 |
| QP-C2 | GH Actions deploy pipeline for C++ Queue container | QP-C1 |
| QP-C3 | sqitch migration setup + billing schema | QP-C1; blocked on Heroku apps being provisioned |
| QP-C4 | Stripe service layer in C++ (libcurl + nlohmann/json) | QP-C1 |
| QP-C5 | Stripe webhook handler: HMAC verify + idempotent upsert + mirror fan-out | QP-C3, QP-C4 |
| QP-C6 | Billing CRUD endpoints: customer/subscription/invoice read | QP-C3 |
| QP-C7 | Internal API auth middleware + JIT mirror-sync endpoint | QP-C1 |
| QP-C8 | GoogleTest suite: unit tests + integration test suite | QP-C4, QP-C5 |
| #406 | Stripe service layer (billing sub-card; now C++ implementation notes) | QP-C4 |
| #407 | Stripe webhook handler | QP-C5 |
| #408 | Console customer list view (reads Queue API) | QP-C6 |
| #409 | Console customer detail view | QP-C6 |

Operator actions required before QP-C1 can be claimed:

- Provision the raxx-queue-prod and raxx-queue-staging Heroku apps with heroku stack:set container
- Create the Infisical path /Raxx/Queue/Billing/Stripe/ and populate STRIPE_RESTRICTED_KEY and STRIPE_WEBHOOK_SECRET (test keys first)
- Create the Infisical path /Raxx/Queue/ and populate QUEUE_SERVICE_TOKEN_RAPTOR, QUEUE_SERVICE_TOKEN_CONSOLE, SENTRY_DSN_QUEUE


Security / GDPR Checklist

| Question | Answer |
| --- | --- |
| What PII does this collect? | billing_email, billing_name, address fields in billing_customer. See stripe-customer-billing.md §7.1 for the full inventory. |
| What is the retention period? | 7 years post-customer-deletion (SOC2/tax compliance). |
| How is it deleted on DSR? | Anonymize in place per stripe-customer-billing.md §7.2. Tracked in #1630 (deferred post-launch with privacy-policy carve-out). |
| What is logged for audit? | All money-state mutations in billing_action_log with KMS HMAC chain. Stripe event dedup in processed_stripe_events. |
| Does any part store a credential that could be replayed? | No. STRIPE_RESTRICTED_KEY and STRIPE_WEBHOOK_SECRET are fetched from Infisical at process startup, held in memory, never written to DB. |
| What happens on breach? | 72h GDPR Art. 33 notification to affected customers and the DPA. Billing tables in Queue-DB are part of the breach-scope inventory. |
| Where are secrets? | Infisical /Raxx/Queue/Billing/Stripe/. Rotatable without redeploy (restart the dyno after rotation). |
| Is there a kill-switch? | FLAG_QUEUE_BILLING=false returns 503 on all billing routes. FLAG_BILLING_AUDIT_WRITES=false is the circuit-breaker on KMS chain break. |
| Are secrets rotatable without redeploy? | Yes — Infisical secrets are read at startup via the Infisical API. Restart the dyno after rotation. |

Open Questions (require operator decisions)

OQ-1 — Language choice confirmation for billing-only Phase 1: The architect's honest assessment is that C++ billing from scratch = ~22–32 days, meaning 2026-05-23 UTC is not achievable. The operator said the timeline is a target. Does the operator confirm: proceed with C++ and accept the slip? Or does the operator want to revisit Python for billing-only Phase 1, with C++ beginning in Phase 2 for the identity service?

This is not the architect pushing back on C++. This is the architect surfacing an explicit decision point with honest numbers. The operator's 21:27 UTC statement ("Just proceed with C++") is clear; this question is filed so the operator can confirm after seeing the numbers.

OQ-2 — Rust vs. C++ (informational, not blocking): Rust (Axum + sqlx + tokio) provides the same tier-1 performance characteristics as C++ with memory safety enforced at compile time rather than by discipline + sanitizers. The architect notes this option exists. Operator said C++; this is simply documented.

OQ-3 — Phase 1 identity features in scope? Queue Phase 1 as defined here is billing-only. WebAuthn, sessions, and JWT minting are Phase 2. This means Raptor's Python auth layer continues handling customer authentication for the v1 billing launch period. Is the operator comfortable with this split, or should Phase 1 include identity as well (which would add significant scope and further extend the timeline)?


Addendum — Stripe Key Provisioning (2026-05-12 UTC)

Added by sre-agent 2026-05-12 UTC after operator provisioned Stripe keys. See full audit log: docs/incidents/2026-05-12-stripe-keys-verified.md.

Key type at /Raxx/Queue/Billing/Stripe/STRIPE_RESTRICTED_KEY

The ADR describes this as a restricted key with specific permissions. The key provisioned on 2026-05-12 is an rk_test_... Stripe restricted key (test mode), copied from /MooseQuest/stripe/STRIPE_RAXX_DEV_BOT_KEY. The rk_test_... format confirms it is a genuine Stripe restricted key, not a full-access secret key — the architecture's least-privilege intent is met.

Operator action before #1681 is claimed: verify the permission set on this key in the Stripe dashboard (Developers → API keys → find the restricted key) matches the ADR's list: Customers (write), Subscriptions (write), Invoices (write), Charges (read), Webhooks (read).

Live-mode key swap procedure

When moving from test mode to live mode for launch:

  1. Stripe dashboard → switch to live mode → Developers → API keys → Create restricted key with the same permission set.
  2. Write the new rk_live_... key to /Raxx/Queue/Billing/Stripe/STRIPE_RESTRICTED_KEY (prod env), overwriting the test key.
  3. heroku restart -a raxx-queue-prod — Queue reads Infisical at startup; restart is required to pick up the new value.
  4. Create the production Stripe webhook endpoint (live mode) and store the new STRIPE_WEBHOOK_SECRET.

The STRIPE_SECRET_KEY (sk_test_...) that the operator also stored at /MooseQuest/stripe/ is a full-access key. It should NOT be promoted to the Queue service path. It exists as a break-glass for operator dashboard operations only.

STRIPE_WEBHOOK_SECRET — stored at staging path, pending promotion to service path

Operator confirmed creation 2026-05-12 ~01:25 UTC. SRE verified 2026-05-12 UTC:

The webhook secret exists at the operator's staging area. It must be promoted to the service path before QP-C5 / #1682 (webhook handler) can use it. Correct sequencing:

  1. QP-C5 webhook handler is built and deployed to raxx-queue-staging.
  2. Operator registers the staging webhook endpoint in the Stripe dashboard (pointing to the staging handler URL).
  3. Operator stores the resulting signing secret at /Raxx/Queue/Billing/Stripe/STRIPE_WEBHOOK_SECRET (prod env).
  4. Optionally: copy the existing /MooseQuest/stripe/STRIPE_WEBHOOK_SECRET to the service path now as a placeholder — but this key may refer to a different endpoint registration than the one the staging handler will validate against. Safer to wait until the handler is deployed and the endpoint is registered, then write the correct secret.

This is a non-blocker for QP-C4 / #1681 (service layer — outbound calls only, does not consume the webhook secret).