ADR-0078: Queue Cloudflare Edge Protection
Status: Accepted
Date: 2026-05-11 UTC
Refs:
- Parent card: #1733
- Design doc: docs/architecture/queue-cf-edge-design.md
- WAF threat model: docs/security/waf-threat-model-2026-05-11.md HIGH-WAF-2
- ADR-0076: Queue C++ Phase 1
- ADR-0077: WAF strategy (sister ADR, just-merged)
Context
Queue (raxx-queue-{prod,staging}.herokuapp.com) is the centralized identity / RBAC / customer / audit service. As of 2026-05-11 UTC, the Heroku origin URL is directly routable from the public internet. No Cloudflare proxy sits in front of it.
The WAF threat model (HIGH-WAF-2) identified this as a critical architecture gap:
- All CF WAF managed rulesets are bypassed
- Cloudflare Bot Fight Mode is bypassed
- CF rate-limiting rules are bypassed
- Future CF Access policy (admin endpoints) is bypassed before it exists
- Stripe webhook delivery, backup-code redemption, and billing CRUD are all reachable via the direct Heroku URL
The existing Raptor service (api.raxx.app) had the same gap, which is addressed by FLAG_ENFORCE_CF_ORIGIN and cloudflare_origin_guard.py. This ADR extends that pattern to Queue's C++ service.
Decision
Queue is placed behind Cloudflare as a proxied CNAME (queue.raxx.app → raxx-queue-prod.herokuapp.com, proxied=true). A C++ Drogon HttpFilter (CfOriginGuard) is added to Queue's middleware stack, gated by FLAG_ENFORCE_QUEUE_CF_ORIGIN (default false, flipped after soak). A Terraform module (terraform/queue/) manages the DNS records and WAF ruleset.
Three alternatives were evaluated:
| Alternative | Why rejected |
|---|---|
| CF Tunnel (cloudflared daemon in Heroku dyno) | Adds a cloudflared binary dependency to the Dockerfile; tunnel provisioning adds operator overhead; tunnel credentials add a new secret path. CF Tunnel is correct for services that cannot be reached via public DNS. Heroku is reachable, so a CNAME is simpler and follows the established platform pattern. |
| No change (accept bypass) | Unacceptable. Queue owns identity, billing state, and audit chain. Direct-origin bypass means WAF and rate limits provide no protection on auth or billing endpoints. |
| CF Access service tokens for service-to-service | The BFM/service-token interaction (incident 2026-05-12, feedback_cf_access_does_not_bypass_bot_fight_mode.md) established that CF Access service tokens do not bypass Bot Fight Mode. A paired WAF skip rule is required either way. Using the existing InternalAuthFilter Bearer token as the WAF skip rule predicate is simpler: one WAF rule instead of CF Access + WAF rule, no new CF Access application to manage. |
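The "WAF skip rule predicate" in the last row can be modeled as a plain function, which may help when writing test expectations before the real Cloudflare rule exists. This is a minimal sketch, not the rule itself: the function name bfmSkipApplies and the lower-cased header map are hypothetical, and the path prefix and Stripe webhook path are taken from elsewhere in this ADR. Like the WAF rule, it only checks header presence; InternalAuthFilter still validates the token.

```cpp
#include <map>
#include <string>

// Assumption for this sketch: header names are already lower-cased by the caller.
using Headers = std::map<std::string, std::string>;

// True when s begins with prefix.
inline bool startsWith(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical model of the Bot Fight Mode skip predicate described in this ADR.
inline bool bfmSkipApplies(const std::string& path, const Headers& headers) {
    // Service-to-service: /api/internal/* with an Authorization header present.
    if (startsWith(path, "/api/internal/") && headers.count("authorization") > 0) {
        return true;
    }
    // Stripe webhook: Stripe-Signature header present on the webhook path.
    if (path == "/api/v1/billing/webhook" && headers.count("stripe-signature") > 0) {
        return true;
    }
    return false;
}
```

A request to /api/internal/customers without an Authorization header is not skipped, so Bot Fight Mode still applies to it.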
Architecture Commitments
- DNS: queue.raxx.app and queue-staging.raxx.app are CNAME records in the raxx.app Cloudflare zone, proxied=true. Managed by terraform/queue/dns.tf.
- Origin guard: queue/src/middleware/cf_origin_guard.cpp implements a Drogon HttpFilter that checks for CF-Connecting-IP header presence. Gated by FLAG_ENFORCE_QUEUE_CF_ORIGIN. Allowlisted path: /health. Runs before InternalAuthFilter in the filter chain.
- WAF ruleset: terraform/queue/waf.tf manages a Cloudflare ruleset on the raxx.app zone covering Queue's hostnames. Phase 1 rules: service-to-service BFM skip, Stripe webhook BFM skip, billing webhook rate limit, global rate limit, CF Managed Ruleset (Block). OWASP CRS in Log mode during Phase 3 soak, promoted to Block afterward.
- Bot Fight Mode / service-to-service: WAF Priority-1 skip rule covers /api/internal/* paths where Authorization header is present. Stripe webhook paths use a Stripe-Signature header skip. These are tighter than a zone-wide BFM disable. InternalAuthFilter remains the authoritative gate for service identity.
- CF Access: NOT applied in Phase 1. Queue has no human callers. Phase 2+ admin endpoints receive a CF Access policy when they ship (see OQ-2 in design doc).
- Rollout: 5-phase plan (Phase 0 operator actions → Phase 4 guard ON prod) with two 48-hour soaks. No enforcement until Phase 4.
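The origin guard commitment above reduces to a small decision function. The following is a minimal sketch of that logic using only the standard library; it is not the contents of cf_origin_guard.cpp. The names GuardDecision and cfOriginGuard and the lower-cased header map are hypothetical. In the real service this decision would presumably sit inside the Drogon filter, with Allow passing the request down the filter chain and Reject returning an error response whose status code this ADR does not specify.

```cpp
#include <map>
#include <string>

// Hypothetical result type for this sketch.
enum class GuardDecision { Allow, Reject };

// Assumption: header names are already lower-cased by the caller.
using Headers = std::map<std::string, std::string>;

// enforceFlag models FLAG_ENFORCE_QUEUE_CF_ORIGIN (default false).
inline GuardDecision cfOriginGuard(bool enforceFlag,
                                   const std::string& path,
                                   const Headers& headers) {
    // Flag off: the guard is a no-op, so nothing is enforced before Phase 4.
    if (!enforceFlag) {
        return GuardDecision::Allow;
    }
    // Allowlisted path: /health stays reachable without Cloudflare.
    if (path == "/health") {
        return GuardDecision::Allow;
    }
    // Phase 1 check: presence of CF-Connecting-IP only, no value validation.
    if (headers.count("cf-connecting-ip") > 0) {
        return GuardDecision::Allow;
    }
    return GuardDecision::Reject;
}
```

With the flag on, a request to a billing path without CF-Connecting-IP is rejected, while /health is allowed regardless of headers. With the flag off, every request is allowed.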
Consequences
Positive:
- Closes HIGH-WAF-2 from the WAF threat model.
- Queue auth endpoints (Phase 2) will be protected by CF WAF rate limits before they ship, not after.
- Consistent with the platform's established defense pattern (matches Raptor's FLAG_ENFORCE_CF_ORIGIN).
- Terraform-managed DNS and WAF rules are auditable and version-controlled.
Negative:
- Heroku-to-Heroku calls (Raptor → Queue) traverse the CF edge, adding ~5–20ms latency. This is accepted: Phase 1 billing endpoints are not latency-critical, and Phase 2 JWT verification is offline (Raptor caches the JWKS public key, so per-request auth does not call Queue).
- CF-Connecting-IP can be spoofed by a caller that targets the Heroku URL directly and fabricates the header. The Phase 1 origin guard accepts this risk, matching the Raptor pattern, which uses a header-only check. Phase 2 hardening adds CIDR validation against Cloudflare's published IP ranges.
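The Phase 2 CIDR validation mentioned above could look roughly like the following. This is a sketch under stated assumptions, not a committed design: all names (Cidr4, parseIpv4, inCidr, isFromAllowedEdge) are hypothetical, only IPv4 is handled, and the ranges in the usage note are RFC 5737 documentation placeholders rather than Cloudflare's actual ranges, which would have to be loaded from Cloudflare's published list. Which address to feed in (the socket peer or a router-supplied forwarded address) is left open here.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// An IPv4 network in host byte order plus its prefix length (0..32).
struct Cidr4 {
    std::uint32_t network;
    int prefixLen;
};

// Parse a dotted-quad IPv4 address; returns nullopt on malformed input.
inline std::optional<std::uint32_t> parseIpv4(const std::string& s) {
    std::uint32_t result = 0;
    int octets = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        if (octets == 4) return std::nullopt;  // more than four octets
        std::uint32_t value = 0;
        std::size_t digits = 0;
        while (i < s.size() && s[i] >= '0' && s[i] <= '9') {
            value = value * 10 + static_cast<std::uint32_t>(s[i] - '0');
            if (++digits > 3) return std::nullopt;
            ++i;
        }
        if (digits == 0 || value > 255) return std::nullopt;
        result = (result << 8) | value;
        ++octets;
        if (i < s.size()) {
            if (s[i] != '.') return std::nullopt;
            ++i;
            if (i == s.size()) return std::nullopt;  // trailing dot
        }
    }
    if (octets != 4) return std::nullopt;
    return result;
}

// True when ip falls inside the given network.
inline bool inCidr(std::uint32_t ip, const Cidr4& c) {
    if (c.prefixLen <= 0) return true;           // /0 matches everything
    if (c.prefixLen >= 32) return ip == c.network;
    const std::uint32_t mask = ~std::uint32_t{0} << (32 - c.prefixLen);
    return (ip & mask) == (c.network & mask);
}

// True when the textual address parses and matches any allowed range.
inline bool isFromAllowedEdge(const std::string& ip,
                              const std::vector<Cidr4>& ranges) {
    const auto parsed = parseIpv4(ip);
    if (!parsed) return false;
    for (const auto& r : ranges) {
        if (inCidr(*parsed, r)) return true;
    }
    return false;
}
```

With placeholder ranges 198.51.100.0/24 and 203.0.113.0/25, the address 203.0.113.100 matches and 203.0.113.200 does not. A malformed address is treated as not allowed, so the check fails closed.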
Alternatives Considered
CF Tunnel (cloudflared)
cloudflared creates an outbound tunnel from the Heroku dyno to Cloudflare, eliminating the need for a public Heroku URL. Traffic flows: CF edge → tunnel → Heroku dyno (no inbound port needed).
Arguments for: Heroku origin URL would be unreachable by construction (tunnel replaces it). No need for CF-Connecting-IP guard — the only inbound path is the tunnel.
Arguments against:
- Requires cloudflared binary in the Queue Dockerfile (new non-audited dependency, adds to container image size and build time).
- Tunnel provisioning requires a CLOUDFLARE_TUNNEL_TOKEN secret (new Infisical path, new rotation concern).
- Operator must register the tunnel in the CF Zero Trust dashboard before the first deploy.
- cloudflared process management inside a Heroku dyno is non-standard. If the tunnel process crashes, the dyno becomes unreachable, and restart behavior differs from that of a crashed web process.
- No operational precedent at Raxx. The CNAME proxy pattern is used by four other Heroku apps; the team knows how to operate it.
Verdict: Rejected. Correct for services with no public endpoint (e.g., a private DB proxy). Heroku apps have a public URL by design; the CNAME approach is simpler.
Status quo (no CF proxy)
The status quo is accepted for the billing-only Phase 1 window on the grounds that InternalAuthFilter rejects unauthenticated calls. It is rejected as a permanent posture because:
- Stripe webhook endpoint (/api/v1/billing/webhook) is partially public (it must accept Stripe's delivery IPs).
- Phase 2 auth endpoints (backup-codes/redeem etc.) are password-equivalent and must have rate limiting before they ship.
- No WAF means no SQLi/XSS protection for JSON parsing paths.
Sequencing with ADR-0077 (WAF Strategy)
ADR-0077 established the 5-layer defense model for the platform. This ADR places Queue into that model at Layers 1–3. The two ADRs are sister decisions: ADR-0077 sets the framework, ADR-0078 applies it to the new service that the WAF threat model identified as uncovered.
The WAF Terraform for other surfaces (Raptor api.raxx.app, etc.) is managed by the WAF strategy rollout (separate sre-agent cards). The Queue Terraform module (terraform/queue/) is a standalone module to keep Queue's infra encapsulated and independently appliable.