Stripe Billing Gap Analysis — 2026-05-13 UTC

Design topic: Gap analysis for #1682 (Stripe webhook handler) + full paid-subscription readiness for v1 launch.
Parent epic: #403 (Billing console: operator-facing customer billing)
Governing ADR: [ADR-0076](https://internal-docs.raxx.app/architecture/adr/0076-queue-phase1-billing-v1-aggressive-12day.html)
Date: 2026-05-13 UTC
Author: architect-agent
Refs: #1682, #1681, #1680, [ADR-0075](https://internal-docs.raxx.app/architecture/adr/0075-billing-stays-in-queue-operator-override.html), [ADR-0073](https://internal-docs.raxx.app/architecture/adr/0073-stripe-v1-home-decision.html), stripe-customer-billing.md


Invariants That Apply

These platform invariants from the project charter are load-bearing for this design:


Section A — What Is Actually Shipped

The following is verified by reading source code, not by trusting closed issue titles.

Queue C++ service (raxx-queue-prod / raxx-queue-staging)

| Component | File(s) | Status | Verified by |
| --- | --- | --- | --- |
| C++ scaffold (Drogon, CMakeLists, Dockerfile, vcpkg) | queue/CMakeLists.txt, queue/Dockerfile, queue/vcpkg.json | SHIPPED | File presence + contents |
| Heroku container stack + heroku.yml | queue/heroku.yml | SHIPPED | File presence |
| sqitch billing schema (6 migrations) | queue/migrations/sqitch/deploy/01-*.sql through 06-*.sql with matching revert/verify | SHIPPED | All 6 migration files + verify scripts present |
| Stripe C++ service layer | queue/src/stripe/stripe_client.cpp/.h, customer_service.cpp/.h, subscription_service.cpp/.h, invoice_service.cpp/.h, stripe_error.h | SHIPPED | Full implementation files present; closed #1681 |
| Internal auth middleware | queue/src/middleware/internal_auth_filter.cpp/.h | SHIPPED | File presence |
| Mirror-sync endpoint | queue/src/handlers/mirror_sync.cpp/.h | SHIPPED | File presence |
| Reconcile endpoint | queue/src/handlers/reconcile.cpp/.h | SHIPPED | File presence |
| Health handler | queue/src/health_handler.cpp/.h | SHIPPED | File presence |
| Billing CRUD endpoints (customer/subscription/invoice read) | queue/src/handlers/billing/customers_handler.*, subscriptions_handler.*, invoices_handler.* | SHIPPED | Handler files + route registrations present; closed #1683 |
| Connection pool | queue/src/db/connection_pool.cpp/.h | SHIPPED | File presence |
| DB schema + query headers | queue/src/db/schema.h, queue/src/db/queries.h | SHIPPED | File presence |
| GoogleTest unit + integration suite scaffolding | queue/tests/unit/test_hmac_util.cpp, queue/tests/unit/test_webhook_processor.cpp, queue/tests/integration/test_billing_webhook_integration.cpp | SHIPPED (test scaffolding only; implementations are stubs) | [PR #1830](https://github.com/raxx-app/TradeMasterAPI/pull/1830), closed #1685 |
| GH Actions deploy pipeline | closed #1679 | SHIPPED | Issue closed |
| FLAG_QUEUE_BILLING in feature_flags.yaml | backend_v2/api/feature_flags.yaml line ~1994 | SHIPPED | File read |
| Stripe keys in Infisical /Raxx/Queue/Billing/Stripe/ | ADR-0076 addendum 2026-05-12 | SHIPPED (test mode, staging path) | ADR-0076 addendum documents SRE verification |

Console (operator-facing)

| Component | Status |
| --- | --- |
| Console billing blueprint (/billing, /billing/alert-config) | SHIPPED, but these are infrastructure billing (epic #757), not customer billing (#403) |
| Console api_billing.py (GET /api/billing/summary) | SHIPPED; infra billing only, not Stripe customer billing |
| Console customer list view (#408) | NOT SHIPPED; blocked on Queue billing API |
| Console customer detail view (#409) | NOT SHIPPED; blocked on Queue billing API |

Antlers (customer-facing)

| Component | Status |
| --- | --- |
| Subscription/plan selection UI | NOT SHIPPED; no Billing/ or Subscription/ page found under frontend/trademaster_ui/src/pages/ |
| Stripe Checkout or Payment Element | NOT SHIPPED |
| Customer billing snapshot view | NOT SHIPPED |
| Cancellation flow | NOT SHIPPED |
| PricingTeaser on getraxx landing | SHIPPED; static teaser with "Free to start. Pro when it pays for itself." copy; links to #pricing hash, no live pricing page |

Raptor (backend_v2)

| Component | Status |
| --- | --- |
| backend_v2/api/routes/subscription.py | EXISTS, but this is a mock/stub with hardcoded plan data and no Stripe integration. Not the real billing path. |
| billing_subscription_mirror table + JIT paywall middleware | NOT SHIPPED; no code found in Raptor for mirror table or paywall gate |
| Stripe webhook handler in Raptor | NOT SHIPPED; the original area:raptor plan (ADR-0073) was rejected; webhook lives in Queue per ADR-0075/0076 |

Section B — #1682 Claim vs. Reality

#1682 claims: "Implement POST /api/v1/billing/webhook — Stripe webhook receiver."

| #1682 AC | Actual State |
| --- | --- |
| HMAC signature verification | VERIFIED-MISSING. No hmac_util.h or webhook_handler.h in queue/src/. Test files test_hmac_util.cpp and test_webhook_processor.cpp explicitly say "these tests define the interface contract for a class that will live in webhook_processor.h; replace this stub when the real header lands." The stub is complete; the implementation is not. |
| Idempotency guard (dedup on event_id) | VERIFIED-MISSING. Same: test stub exists, implementation does not. |
| LWW upsert with updated_at guard | VERIFIED-MISSING. Test covers the algorithm; no production code path. |
| Tier downgrade detection + feature_locked_at | VERIFIED-MISSING. |
| Mirror fan-out to Raptor | PARTIAL. queue/src/handlers/mirror_sync.cpp exists (closed #1684), but the webhook handler that would call it does not exist yet. |
| Unit tests (HMAC, LWW, downgrade) | PARTIAL. Test scaffolding with stubs exists; full implementation tests are placeholders. |
| Integration test: full webhook replay against Postgres | PARTIAL. Integration test file exists with explicit GTEST_SKIP placeholders pending #1682. |
| STRIPE_WEBHOOK_SECRET in vault | PARTIAL. Key exists at /MooseQuest/stripe/STRIPE_WEBHOOK_SECRET; NOT yet promoted to /Raxx/Queue/Billing/Stripe/STRIPE_WEBHOOK_SECRET per ADR-0076 addendum. |

Verdict on #1682: The card is NOT stale. It accurately identifies genuinely missing work. The 2-3 dev-day estimate is plausible for experienced C++ in this codebase given the test scaffolding already defines the interface contract and the Stripe service layer underneath is shipped.

What "several Stripe PRs already landed" actually means: Seven closed Queue PRs (#1678, #1679, #1680, #1681, #1683, #1684, #1685) built the foundation layers below the webhook handler. The webhook handler itself (#1682) was intentionally sequenced last in the chain and is genuinely the remaining blocker.
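The two missing guards from the table above (dedup on event_id, LWW upsert with an updated_at guard) can be illustrated with a minimal sketch. It is written in Python for brevity, not in the C++ that Queue uses, and it is not the interface the test stubs define. `MirrorStore` and `apply_event` are hypothetical names, and the in-memory set and dict stand in for processed_stripe_events and billing_subscription.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class MirrorStore:
    """In-memory stand-in for processed_stripe_events and billing_subscription."""
    processed_event_ids: Set[str] = field(default_factory=set)
    subscriptions: Dict[str, dict] = field(default_factory=dict)


def apply_event(store: MirrorStore, event_id: str, subscription_id: str,
                status: str, updated_at: int) -> str:
    """Apply one subscription event and report what happened."""
    # Idempotency guard: a replayed event_id is acknowledged but not re-applied.
    if event_id in store.processed_event_ids:
        return "duplicate"
    store.processed_event_ids.add(event_id)

    # LWW guard: overwrite only when the incoming state is strictly newer.
    current = store.subscriptions.get(subscription_id)
    if current is not None and current["updated_at"] >= updated_at:
        return "stale"

    store.subscriptions[subscription_id] = {"status": status, "updated_at": updated_at}
    return "applied"
```

In a real handler both checks would presumably run inside one database transaction so that a crash between them cannot mark an event processed without applying it; the sketch does not model that.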


Section C — Pre-Launch Slice (must land by 2026-05-23 UTC)

The minimum viable paid-subscription path per ADR-0076's hard floor:

  1. A customer can subscribe to a paid plan.
  2. Stripe charges them and creates a subscription.
  3. Queue processes the webhook and stores billing state.
  4. Feature gates flip on for that customer (via billing_subscription_mirror read in Raptor).
  5. Payment fails → grace period → suspension.
  6. Customer can cancel (Stripe Portal is acceptable for v1).

| Gap | Required for Hard Floor | Notes |
| --- | --- | --- |
| Stripe webhook handler (#1682) | YES: without this, no billing state is written to Queue | The single remaining Queue-side blocker |
| Subscribe endpoint (POST /api/v1/billing/subscribe) | YES: customer must be able to start a subscription | Not yet filed as a card; implied in ADR-0076 hard floor but no specific issue exists |
| billing_subscription_mirror in Raptor + JIT paywall middleware | YES: Raptor must block un-subscribed customers from gated features | No code exists in Raptor for this; no issue filed for Raptor side of mirror-sync |
| Stripe Portal session endpoint (POST /api/v1/billing/portal) | YES (for cancellation; Portal is the v1 solution) | No Queue-side handler exists; no issue filed |
| STRIPE_WEBHOOK_SECRET promoted to service path | YES: webhook handler won't work without it | Operator action required |
| Stripe products/prices created in test mode | YES: stripe_price_id values must exist to create subscriptions | Operator action; no tracking issue beyond #1632 (founders tier backfill post-launch) |
| Customer-facing subscribe UI in Antlers | YES: customers need a path to start a subscription | No file exists; no issue filed for v1 |
| Payment failure handling | YES for hard floor | Webhook handler (#1682) covers invoice.payment_failed event; dunning email is separate |
| Payment failure email via Postmark | SHOULD: Postmark is approved out of sandbox | No issue filed; implied by dunning strategy but post-v1 in ADR-0076 |
| Console customer list view (#408) | ADR-0076 says "ships if schedule holds": v1 target, not hard floor | Blocked on Queue billing API being stable |
| Console customer detail view (#409) | Same: v1 target, not hard floor | Blocked on #408 |
| FLAG_QUEUE_BILLING flip procedure | YES: kill-switch must be documented and tested | Flag definition exists; flip SOP not documented |

Section D — Post-Launch (defer)

These are confirmed post-launch per ADR-0076 cut-lines, or are not required for the minimum paid-subscription path:


These are pre-launch cards only. PM files them; do not create as GitHub issues from this document.

Card E-1: Stripe webhook handler (QP-C5) [#1682 — REWORK DESCRIPTION]

Title: feat(queue): Stripe webhook handler - HMAC verify, idempotent upsert, mirror fan-out

Body: Implement queue/src/handlers/billing/webhook_handler.h/.cpp and queue/src/stripe/hmac_util.h/.cpp. The test stubs in queue/tests/unit/test_webhook_processor.cpp and queue/tests/unit/test_hmac_util.cpp already define the interface contract; replace the stub #include comments with the real implementation headers. Promote STRIPE_WEBHOOK_SECRET from /MooseQuest/stripe/ to /Raxx/Queue/Billing/Stripe/STRIPE_WEBHOOK_SECRET before claiming this card.

AC: All ACs in the current #1682 body plus: timestamp tolerance check (5-minute window per ADR-0076 §6.3), billing_action_log row written per processed event, Sentry CRIT on HMAC failure.

Size: M (2-3 days)

Risk: HIGH (primary billing state update path; timing attack surface)

Dependencies: #1680 (schema, SHIPPED), #1681 (Stripe client, SHIPPED), STRIPE_WEBHOOK_SECRET in vault (operator action)
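For orientation, the HMAC check and the 5-minute tolerance in this card might look roughly like the sketch below. It follows Stripe's documented signature scheme: a `Stripe-Signature` header of the form `t=<unix timestamp>,v1=<hex signature>`, where the signature is HMAC-SHA256 over `<timestamp>.<raw body>` keyed with the endpoint signing secret. It is written in Python for brevity and is not the hmac_util interface the C++ test stubs define; the function name and the choice to reject timestamps in either direction are assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_SECONDS = 300  # 5-minute window named in the card


def verify_stripe_signature(payload: bytes, header: str, secret: str,
                            now: Optional[int] = None,
                            tolerance: int = TOLERANCE_SECONDS) -> bool:
    """Return True only for a well-formed, fresh, correctly signed payload."""
    # Split "t=...,v1=...[,v1=...]" into the timestamp and candidate signatures.
    timestamp = None
    candidates = []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not candidates:
        return False
    if not (timestamp.isascii() and timestamp.isdigit()):
        return False

    # Replay protection: reject timestamps outside the window (either direction).
    current = int(time.time()) if now is None else now
    if abs(current - int(timestamp)) > tolerance:
        return False

    # Recompute the signature over "<timestamp>.<raw body>".
    signed = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()

    # compare_digest is constant-time, which addresses the timing attack surface.
    return any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates)
```

The signature must be computed over the raw request bytes, before any JSON parsing, since re-serializing the body would change the bytes being signed.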

Card E-2: Subscribe endpoint

Title: feat(queue): POST /api/v1/billing/subscribe - create Stripe customer + subscription

Body: Implement the customer-facing subscription creation endpoint in Queue. On POST with {plan_tier, stripe_payment_method_id}: (1) call CustomerService::create() to provision the Stripe customer, (2) call SubscriptionService::create() with the price ID for the selected tier, (3) insert billing_customer + billing_subscription rows, (4) return {client_secret, subscription_id} for Antlers to confirm payment. Price IDs (STRIPE_PRICE_ID_PRO, STRIPE_PRICE_ID_PRO_PLUS) read from env; never hardcoded. Idempotency: check for existing active subscription before creating; return 409 with SUBSCRIPTION_ALREADY_ACTIVE if found.

AC: Happy path creates Stripe customer + subscription; 409 on duplicate active subscription; price IDs loaded from env; billing_customer + billing_subscription rows written to Queue-DB; billing_action_log row appended.

Size: M (2-3 days)

Risk: HIGH (money path; must be tested against Stripe test mode)

Dependencies: E-1 (webhook must exist to receive the subscription.created confirmation event), #1681 (Stripe client, SHIPPED)
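A rough sketch of the control flow this card describes, under stated assumptions: it is Python rather than Queue's C++, `FakeBilling` is a hypothetical in-memory stand-in for the Stripe service layer and Queue-DB, the tier keys `pro` and `pro_plus` are illustrative, and every status code other than the 409 is an assumption rather than something the card specifies.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, Mapping, Tuple

# Env var names come from the card; the tier keys are assumed for illustration.
PRICE_ENV_BY_TIER = {
    "pro": "STRIPE_PRICE_ID_PRO",
    "pro_plus": "STRIPE_PRICE_ID_PRO_PLUS",
}


@dataclass
class FakeBilling:
    """Hypothetical stand-in for the Stripe service layer plus Queue-DB rows."""
    active: Dict[str, str] = field(default_factory=dict)  # account_id -> subscription_id
    counter: int = 0

    def create_subscription(self, account_id: str, price_id: str) -> Tuple[str, str]:
        self.counter += 1
        subscription_id = f"sub_fake_{self.counter}"
        self.active[account_id] = subscription_id
        return subscription_id, f"secret_fake_{self.counter}"


def subscribe(billing: FakeBilling, account_id: str, plan_tier: str,
              env: Mapping[str, str] = os.environ) -> Tuple[int, dict]:
    """Return (http_status, body) for a subscribe request."""
    env_name = PRICE_ENV_BY_TIER.get(plan_tier)
    if env_name is None:
        return 400, {"error": "UNKNOWN_PLAN_TIER"}  # assumed error shape

    # Price IDs are read from the environment, never hardcoded.
    price_id = env.get(env_name)
    if not price_id:
        return 500, {"error": "PRICE_ID_NOT_CONFIGURED"}  # assumed error shape

    # Idempotency: refuse to create a second active subscription.
    if account_id in billing.active:
        return 409, {"error": "SUBSCRIPTION_ALREADY_ACTIVE"}

    subscription_id, client_secret = billing.create_subscription(account_id, price_id)
    return 201, {"subscription_id": subscription_id, "client_secret": client_secret}
```

Checking the price ID before the duplicate check means a misconfigured environment fails loudly even for existing subscribers; the opposite order would be equally defensible.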

Card E-3: Stripe Portal session endpoint

Title: feat(queue): POST /api/v1/billing/portal - Stripe Customer Portal session

Body: Implement the endpoint that creates a Stripe Customer Portal session for the authenticated customer. Calls stripe.BillingPortal.Session.create({customer: stripe_customer_id, return_url: ...}). Returns the portal URL. This is the v1 mechanism for cancellation, payment method update, and invoice history; no native UI is required for launch.

AC: Authenticated customer can retrieve a portal URL; URL is valid for redirect to Stripe-hosted portal; billing_action_log row written on each portal session creation.

Size: S (1 day)

Risk: MEDIUM

Dependencies: E-2 (customer must exist in Stripe), #1681 (SHIPPED)

Card E-4: billing_subscription_mirror in Raptor + JIT paywall middleware

Title: feat(raptor): billing_subscription_mirror table + paywall middleware

Body: Add the Alembic migration for billing_subscription_mirror (5 columns: queue_customer_id, plan_tier, status, current_period_end, updated_at) to Raptor's Postgres chain. This is the PII-free read-local table Raptor uses for the paywall check. Add middleware that intercepts requests to gated routes, reads the mirror row for the authenticated customer, and returns 402 if status NOT IN ('active', 'trialing') or the mirror row is missing (fail-closed). The mirror table is populated by Queue's mirror-sync fan-out after each webhook event. This card is no longer blocked on the #1556 Raptor Postgres migration, which is confirmed shipped per project context.

AC: Migration creates billing_subscription_mirror; paywall middleware returns 402 for non-active customers; fail-closed on missing row with Sentry CRIT alert billing.mirror.missing_row; active customers pass through; test coverage for active/inactive/missing states.

Size: M (2-3 days)

Risk: HIGH (paywall correctness is a revenue gate; fail-closed makes it customer-visible if broken)

Dependencies: E-1 (Queue must push mirror-sync on webhook events), Raptor Postgres migration (SHIPPED)
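The fail-closed rule in this card reduces to a small decision function. The sketch below shows only that decision, with the mirror row passed in as a mapping; `paywall_status` is a hypothetical name, and the real middleware would also need the database read and the Sentry alert, neither of which is modeled here.

```python
from typing import Mapping, Optional

# Statuses that pass the paywall, per the card.
ALLOWED_STATUSES = frozenset({"active", "trialing"})


def paywall_status(mirror_row: Optional[Mapping[str, str]]) -> int:
    """Return the HTTP status the paywall would produce for a gated route."""
    if mirror_row is None:
        # Fail-closed: a missing mirror row blocks access rather than allowing it.
        return 402
    if mirror_row.get("status") not in ALLOWED_STATUSES:
        return 402
    return 200
```

Keeping the decision separate from the I/O makes the three states named in the AC (active, inactive, missing) straightforward to unit test.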

Card E-5: Customer-facing subscribe flow in Antlers

Title: feat(antlers): subscription plan selection + Stripe Payment Element checkout

Body: Add a /app/subscribe page (or onboarding step) in Antlers that shows the Free/Pro/Pro+ tier cards per docs/marketing/pricing-v2.md ($0/$29/$79). On plan selection (Pro or Pro+): embed Stripe Payment Element using the client secret from Queue's POST /api/v1/billing/subscribe. On payment confirmation: redirect to dashboard with a success toast. If the customer is on Free, show the current plan; show Pro/Pro+ options (not grayed; per feedback_hide_dont_gray_unavailable_features.md, only show the upgrade options that apply). Feature-flag gated behind FLAG_ANTLERS_SUBSCRIBE.

AC: Free → Pro upgrade flow completes a Stripe payment in test mode; Payment Element renders; subscription confirmation visible on dashboard; no hardcoded price IDs (read from env via API); correct tier reflected after webhook round-trip.

Size: L (3-4 days)

Risk: MEDIUM (Stripe Payment Element integration complexity; dependent on E-2 and E-1)

Dependencies: E-1 (webhook), E-2 (subscribe endpoint)

Card E-6: Payment failure handling — grace period email

Title: feat(queue+postmark): invoice.payment_failed → grace period notification email

Body: In the webhook handler (E-1 scope, or a follow-on card if scope is tight), on invoice.payment_failed: (1) write billing_invoice row with status=open, (2) send a transactional email via Postmark to billing_email with invoice link and retry instructions. Postmark is approved out of sandbox (2026-05-09). Grace period: customer retains access for 7 days after first payment failure; access suspends on second failure (set billing_subscription.status = 'past_due' → 'unpaid'). Suspension: mirror-sync flips billing_subscription_mirror.status to past_due; paywall blocks at grace end.

AC: invoice.payment_failed event triggers Postmark email to customer billing_email; grace period starts; second failure suspends access via mirror-sync; email never logs billing_email; test mode: no real emails sent (Postmark test mode).

Size: M (2-3 days)

Risk: MEDIUM (email deliverability and grace period logic; GDPR: billing_email treated as PII throughout)

Dependencies: E-1 (webhook handler), E-4 (paywall enforces suspension via mirror)
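One possible reading of the grace rule in this card, sketched as a pure function. The 7-day window and the two-failure threshold are taken from the card body but remain open under OQ-2, so both are parameters here rather than constants; `access_allowed` and its arguments are hypothetical names, not an existing interface.

```python
from datetime import datetime, timedelta
from typing import Optional


def access_allowed(first_failure_at: Optional[datetime], failure_count: int,
                   now: datetime, grace_days: int = 7,
                   max_failures: int = 2) -> bool:
    """Decide whether a customer keeps access after payment failures."""
    # No failures recorded: nothing to enforce.
    if first_failure_at is None or failure_count == 0:
        return True
    # Reaching the failure threshold suspends access immediately.
    if failure_count >= max_failures:
        return False
    # Otherwise access lasts until the grace window closes.
    return now < first_failure_at + timedelta(days=grace_days)
```

Whichever policy the operator settles on, keeping it in one function like this would let the webhook handler and the tests share a single definition of "suspended".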

Card E-7: FLAG_QUEUE_BILLING flip SOP + staging soak

Title: ops(queue): FLAG_QUEUE_BILLING staging soak + flip SOP

Body: Document and test the kill-switch procedure. Staging soak: with FLAG_QUEUE_BILLING=true on raxx-queue-staging, run Stripe CLI webhook replay against all 11 handled event types. Verify: HMAC rejection returns 400, idempotent replay returns 200, subscription row state matches expected, mirror-sync fan-out updates Raptor mirror. Produce docs/ops/runbooks/billing/flag-queue-billing-flip.md with prod flip procedure and rollback. Operator action: register staging webhook endpoint in Stripe dashboard (test mode) to get real HMAC events.

AC: All 11 Stripe event types pass replay test; HMAC rejection verified; mirror row state verified after each event; SOP document exists at named path.

Size: S (1 day, mostly ops)

Risk: LOW after E-1 ships

Dependencies: E-1, E-4

Sequencing for 2026-05-23 UTC

Day 1-3:    E-1 (webhook handler) ← critical path blocker; everything else depends on this
Day 2-4:    E-2 (subscribe endpoint) — can start in parallel with E-1 after Day 1
Day 2-3:    E-3 (portal session) — 1-day task, parallelize
Day 3-5:    E-4 (Raptor mirror + paywall) — starts after E-1 is merged to staging
Day 4-7:    E-5 (Antlers subscribe flow) — starts after E-2 is on staging
Day 4-6:    E-6 (payment failure email) — can start after E-1 is merged
Day 7-8:    E-7 (soak + SOP) — last; requires all above
Day 8-10:   Buffer + prod cutover + live-mode key swap (deferred until EIN)

With 1-2 feature-developers working in parallel:

- E-1 on critical path: 2-3 days
- E-2, E-3, E-4 in parallel: adds 2-3 days
- E-5, E-6 in parallel: adds 3-4 days
- E-7: 1 day
- Total realistic: 8-11 days

The 8-day low end fits the 10-day window to 2026-05-23 UTC if E-1 starts today (2026-05-13 UTC) and both developers are dedicated; the 11-day high end overruns the window by one day. There is zero slip budget on E-1.
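The estimate can be checked with a few lines of arithmetic. The phase ranges below are the ones listed above; the phase labels are just dictionary keys for this sketch.

```python
from datetime import date

# (low, high) day estimates per phase, as listed in the sequencing notes.
phases = {
    "E-1": (2, 3),
    "E-2/E-3/E-4": (2, 3),
    "E-5/E-6": (3, 4),
    "E-7": (1, 1),
}

low_total = sum(low for low, _ in phases.values())
high_total = sum(high for _, high in phases.values())
window_days = (date(2026, 5, 23) - date(2026, 5, 13)).days

print(f"estimate: {low_total}-{high_total} days, window: {window_days} days")
print(f"slack at low end: {window_days - low_total} days")
print(f"overrun at high end: {max(0, high_total - window_days)} days")
```

The low end leaves two days of slack and the high end runs one day past the window, which is why any slip on E-1 is critical.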


Section F — Re-scope Recommendation for #1682

Recommendation: Keep #1682 open and update its description.

The card accurately describes the remaining work. What needs updating:

  1. Remove the blocked label. The blockers are resolved: #1680 is closed (schema shipped), #1681 is closed (Stripe client shipped). The STRIPE_WEBHOOK_SECRET vault path is the only remaining operator action; that is a same-day unblock once the operator promotes the secret.

  2. Add explicit file targets. Update the card body to name the files that must be created:
     - queue/src/stripe/hmac_util.h + queue/src/stripe/hmac_util.cpp
     - queue/src/handlers/billing/webhook_handler.h + queue/src/handlers/billing/webhook_handler.cpp
     - Replace stub #include comment in queue/tests/unit/test_hmac_util.cpp
     - Replace stub #include comment in queue/tests/unit/test_webhook_processor.cpp
     - Promote integration test GTEST_SKIP placeholders to real assertions

  3. Add timestamp tolerance AC. The current ACs do not mention the 5-minute timestamp tolerance window from ADR-0076 §6.3. Add: "Webhook events older than 5 minutes are rejected (configurable tolerance)."

  4. Add billing_action_log AC. The HMAC and idempotency ACs are present but the audit trail AC is missing. Add: "Each processed event writes a row to billing_action_log with KMS HMAC chain."

  5. Note the STRIPE_WEBHOOK_SECRET operator action explicitly. The PM grooming comment (2026-05-11) called this out as a blocker; it needs to be in the card body as a pre-flight requirement, not just in ADR comments.


Section G — Test-Mode Strategy

How the code handles test-mode vs. live-mode keys

The Queue StripeClient is constructed via StripeClient::fromEnv() which reads STRIPE_RESTRICTED_KEY from the process environment at startup. The key is rk_test_... format in staging; it will be rk_live_... format in prod after EIN lands. No code checks the prefix — the switch is purely an env-var swap.

The code does not hard-code sk_test_* or pk_test_* prefix checks. Verified by reading stripe_client.h: the constructor takes a raw api_key string with no prefix validation. The base URL (https://api.stripe.com/v1) is the same for both modes — Stripe routes to test vs. live based on the key, not the URL. No code changes required for the live-mode flip.
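The env-var-only switch can be pictured with a small analogue of the client construction. This is a Python illustration, not the C++ StripeClient; the class name `StripeClientSketch` and the error raised on a missing key are assumptions. The env var name, the Bearer header, and the base URL are the ones described above.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping

STRIPE_BASE_URL = "https://api.stripe.com/v1"  # same URL for test and live mode


@dataclass(frozen=True)
class StripeClientSketch:
    """Illustrative analogue of a client built from the process environment."""
    api_key: str
    base_url: str = STRIPE_BASE_URL

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "StripeClientSketch":
        key = env.get("STRIPE_RESTRICTED_KEY", "")
        if not key:
            # Assumed behavior: fail at startup rather than on the first request.
            raise RuntimeError("STRIPE_RESTRICTED_KEY is not set")
        # Deliberately no prefix check: rk_test_ and rk_live_ keys are treated alike.
        return cls(api_key=key)

    def auth_header(self) -> Dict[str, str]:
        return {"Authorization": f"Bearer {self.api_key}"}
```

Because nothing in the client branches on the key prefix, a test-mode and a live-mode instance differ only in the header value they send.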

Migration when EIN lands

Stripe products and prices are mode-specific. The test-mode stripe_price_id values stored in billing_subscription rows will not exist in live mode. The migration procedure:

  1. Operator actions (one-time, at live-mode activation):
     a. Stripe dashboard → switch to live mode → Products → recreate Pro and Pro+ products with the same names and descriptions.
     b. Create a Price object under each product matching the pricing-v2.md values ($29/mo Pro, $79/mo Pro+).
     c. Note the live price_XXXX IDs.
     d. Write live price IDs to Infisical at /Raxx/Queue/Billing/Stripe/STRIPE_PRICE_ID_PRO and /Raxx/Queue/Billing/Stripe/STRIPE_PRICE_ID_PRO_PLUS.
     e. Create live-mode Stripe webhook endpoint pointing at raxx-queue-prod handler URL; copy the signing secret to /Raxx/Queue/Billing/Stripe/STRIPE_WEBHOOK_SECRET (prod env).
     f. Swap STRIPE_RESTRICTED_KEY at /Raxx/Queue/Billing/Stripe/ from rk_test_... to rk_live_....
     g. heroku restart -a raxx-queue-prod (Queue reads secrets at startup).

  2. Code changes required: none. The stripe_price_id values in billing_subscription rows created before live mode are test values; they will be superseded by new subscriptions created against the live price IDs. The founders tier backfill (#1632) runs post-live-mode-activation to populate live price IDs on any existing rows.

  3. One risk: test-mode webhook events after live-mode switch. If staging (FLAG_QUEUE_BILLING=true) keeps pointing at a test-mode webhook registration, it will still receive test events signed with the old secret. This is the intended behavior: staging and prod use different webhook endpoint registrations with different secrets, and the env-var paths are already scoped per app (raxx-queue-staging vs raxx-queue-prod).

No hard-coded key prefix checks

Confirmed by source read: stripe_client.h constructor accepts any string as api_key. stripe_client.cpp (implementation) passes the key as a Bearer token header. No runtime check that the key begins with sk_test_ or rk_test_. The live-mode swap is a pure env-var operation.


Security + GDPR Checklist

| Question | Answer |
| --- | --- |
| What PII does this collect? | billing_email, billing_name, address fields in billing_customer (Queue-DB). See stripe-customer-billing.md §7.1. |
| What is the retention period? | 7 years post-customer-deletion (SOC2/tax). |
| How is it deleted on DSR? | Anonymize in-place per stripe-customer-billing.md §7.2. DSR flow tracked in #1630 (deferred post-launch with privacy-policy carve-out). |
| What is logged for audit? | All money-state mutations in billing_action_log with KMS HMAC chain. Stripe event dedup in processed_stripe_events. |
| Does any part store a credential that could be replayed? | No. STRIPE_RESTRICTED_KEY and STRIPE_WEBHOOK_SECRET are fetched from Infisical at startup; held in process memory only. |
| What happens on breach? | 72h GDPR Art. 33 notification to affected customers and DPA per existing automation. billing_customer PII is in scope; billing tables must be added to breach-scope inventory before launch. |
| Where are secrets? | Infisical /Raxx/Queue/Billing/Stripe/. STRIPE_WEBHOOK_SECRET at staging path only; must be promoted to service path (operator action). |
| Is there a kill-switch? | FLAG_QUEUE_BILLING=false on raxx-queue-prod returns 503 on all billing routes. FLAG_BILLING_AUDIT_WRITES=false circuit-breaker on KMS chain break. |

Open Questions (require operator decision before sub-cards are claimed)

  1. OQ-1 (blocking E-5): Checkout integration shape. Does v1 use Stripe Checkout (hosted redirect) or Payment Element (embedded iframe)? Payment Element gives better UX continuity; Checkout is faster to implement. Decision needed before E-5 can be scoped.

  2. OQ-2 (blocking E-6): Grace period policy. Is the grace period 7 days, or different? Is it one failure or two before suspension? ADR-0076 and stripe-customer-billing.md are silent on the exact grace window. This affects the webhook handler's business logic.

  3. OQ-3 (blocking E-5, partially): Pricing page. Does the subscribe flow live inside Antlers at a known route (e.g. /subscribe), or is it a modal on the onboarding wizard, or something else? The getraxx/PricingTeaser.js links to #pricing which does not exist yet. Is the Antlers subscribe page the same as the getraxx.com pricing page, or separate?

  4. OQ-4 (blocking E-4): Raptor Postgres migration chain head. The billing_subscription_mirror Alembic migration must be filed as the next revision after the current chain head. Feature-developer must confirm the next revision number before claiming E-4.

  5. OQ-5 (informational): Billing tables in breach-scope inventory. stripe-customer-billing.md §7.5 calls for billing tables to be added to the breach-notification inventory before launch. Is there a documented inventory to update, or does this need a new card?

  6. OQ-6 (informational): docs/marketing/pricing-v2.md mentions SMS/webhook alerts for Pro+ tier but TradeMaster invariants say email is the single contact channel. The SMS line in the feature matrix (Email alerts | Yes | Yes | Yes + SMS/webhook) conflicts with the no-SMS invariant. This should be corrected in the pricing doc before the subscribe UI ships — otherwise the marketed feature set does not match what will be built.