ADR 0077 — Cloudflare WAF as Layer 1 of Raxx Layered Defense
Status: Accepted
Date: 2026-05-11 UTC
Deciders: Kristerpher (product owner), software-architect
Scope: All Raxx zones — raxx.app, getraxx.com, operator surfaces
Design doc: docs/architecture/waf-strategy.md
Refs: ADR-0031 (surface classes), ADR-0042 (auth unification, CF Access), ADR-0051 (layered controls), raxx-app-track-b.md (origin guard)
Context
As of 2026-05-11, Raxx has Cloudflare proxying all traffic to customer and operator surfaces but no WAF ruleset, no edge rate limiting, and no bot management policy. Application-layer rate limiting exists in Flask middleware (Layer 4), and the origin guard feature flag (FLAG_ENFORCE_CF_ORIGIN) exists but is disabled because there was no edge-level guarantee that all traffic arrives through CF.
The pre-launch window is the correct moment to add WAF:
- Adding WAF in log-only mode carries zero customer risk and provides 7+ days of baseline telemetry before launch.
- WAF deployment is the prerequisite for safely enabling FLAG_ENFORCE_CF_ORIGIN: once all legitimate traffic is guaranteed to pass through CF edge rules, rejecting direct-Heroku requests at origin is safe.
- api.raxx.app carries WebAuthn credential exchange and paper order submission, both high-value attack targets that benefit from edge-layer filtering before they reach Raptor dynos.
- The Quebec registration geo-block (pre-launch compliance requirement, project_quebec_geoblock_decision.md) is most cleanly implemented as a WAF custom rule rather than application-layer middleware, since it needs to run before any session is established.
Decision
D1: Cloudflare WAF is Layer 1 of the five-layer Raxx defense stack
The defense stack (detailed in waf-strategy.md §3) is:
- CF WAF (edge): managed rulesets + custom rules + rate limits
- CF Access (operator surfaces): Google Workspace IDP gate
- Heroku origin guard (FLAG_ENFORCE_CF_ORIGIN): reject non-CF requests at origin
- Flask app middleware: per-user rate limits, audit log, RBAC enforcement
- Postgres: row-level access, append-only audit chain
No single layer is load-bearing. F3 (CF outage) is the design test: app-layer must absorb abuse traffic without WAF.
D2: WAF is Terraform-managed in a new terraform/modules/cf-waf/ module
The module exposes per-surface variables for rule actions, rate limit thresholds, allow-lists, and Logpush destination. All production WAF changes go through Terraform — no direct CF dashboard edits after initial apply, to prevent drift (ADR-0051 pattern). The module is instantiated per zone (raxx.app, getraxx.com), not as a single monolithic stack.
The CF Terraform stack for WAF (terraform/waf/) is a separate state file from the CF Access stack (terraform/cf-access/), minimizing blast radius per stack.
D3: WAF rules follow a phased rollout: log → challenge → block
WAF rules must never be deployed directly to block mode on prod. The rollout sequence (documented in waf-strategy.md §8) is:
- Phase 1: log on staging (7 days, <1% false-positive gate)
- Phase 2: challenge on staging (72h, zero legitimate-flow challenges)
- Phase 3: block on staging (72h, confirmed attack simulations blocked)
- Phase 4: log → block on prod (7-day soak per action level)
- Phase 5: FLAG_ENFORCE_CF_ORIGIN flip (after Phase 4 complete)
Phase transitions require explicit operator sign-off. They are not automated.
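The gate logic implied by the phase list can be sketched as a small check that refuses to advance unless the soak time, the gate criterion, and the operator sign-off all hold. This is a minimal illustration, not the project's tooling: the names Phase, PhaseReport, can_advance, and false_positive_gate are hypothetical, and the authoritative gate criteria are those in waf-strategy.md §8.

```python
from dataclasses import dataclass
from enum import IntEnum


class Phase(IntEnum):
    """Rollout phases from D3, in order (member names are illustrative)."""
    STAGING_LOG = 1
    STAGING_CHALLENGE = 2
    STAGING_BLOCK = 3
    PROD_LOG_TO_BLOCK = 4
    ORIGIN_GUARD_FLIP = 5


# Minimum soak, in hours, before leaving each phase (values taken from D3).
# Phase 4 soaks 7 days per action level; a single level is modelled here.
MIN_SOAK_HOURS = {
    Phase.STAGING_LOG: 7 * 24,
    Phase.STAGING_CHALLENGE: 72,
    Phase.STAGING_BLOCK: 72,
    Phase.PROD_LOG_TO_BLOCK: 7 * 24,
}


@dataclass
class PhaseReport:
    phase: Phase
    soak_hours: float
    gate_passed: bool       # e.g. false-positive rate under 1% for Phase 1
    operator_signoff: bool  # transitions are never automated


def false_positive_gate(flagged_legitimate: int, total_requests: int) -> bool:
    """Phase 1 gate: strictly fewer than 1% of requests are false positives."""
    if total_requests == 0:
        return False  # no telemetry is no evidence, so the gate fails
    return flagged_legitimate / total_requests < 0.01


def can_advance(report: PhaseReport) -> bool:
    """True only if soak time, gate criterion, and sign-off all hold."""
    if report.phase is Phase.ORIGIN_GUARD_FLIP:
        return False  # final phase: nothing to advance to
    return (
        report.soak_hours >= MIN_SOAK_HOURS[report.phase]
        and report.gate_passed
        and report.operator_signoff
    )
```

A report with a full soak and a passing gate but no sign-off still returns False from can_advance, which mirrors the rule that phase transitions are a human decision.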
D4: CF Access service tokens are explicitly skip-listed from bot management
WAF (Layer 1) runs before CF Access (Layer 2). Machine callers (Velvet, CI runners, internal automation) use CF Access service tokens with no browser fingerprint, which produces high bot scores. A custom WAF skip rule matches cf-access-client-id header presence and exempts those requests from bot challenges. This is the designated bypass — not a general bot rule weakening.
D5: Webhook callers (Postmark, future Stripe) use IP allow-list bypass + app-layer HMAC verification
Third-party webhook sources cannot use CF Access service tokens. They are allowed through WAF via an IP-range-based skip rule (managed as a Terraform variable). App-layer HMAC signature verification is the actual trust gate — WAF bypass is a performance optimization only. If Postmark IP ranges rotate without notice (Failure Mode F5), the WAF passes the request and app-layer signature verification rejects invalid payloads. This two-gate model ensures webhook processing is resilient to IP range drift.
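The two-gate model can be illustrated with a short sketch in which the IP check only models whether the edge skip rule would apply, while the signature check alone decides acceptance. Everything here is an assumption for illustration: the function names are hypothetical, the signing scheme (HMAC-SHA256 over the raw body, hex-encoded) depends on what each provider actually supports, and the CIDR in the usage note is a documentation range, not a Postmark range.

```python
import hashlib
import hmac
import ipaddress


def ip_in_allow_list(client_ip: str, allowed_cidrs: list[str]) -> bool:
    """Model of the edge skip rule: is the caller inside an allow-listed range?"""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


def signature_is_valid(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """App-layer trust gate: constant-time comparison of an HMAC-SHA256 digest."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def accept_webhook(
    secret: bytes,
    body: bytes,
    signature_hex: str,
    client_ip: str,
    allowed_cidrs: list[str],
) -> bool:
    """Accept a webhook based on its signature only.

    The allow-list result is computed for observability (it tells us whether
    the request skipped edge rules) but never grants trust on its own.
    """
    _skipped_edge_rules = ip_in_allow_list(client_ip, allowed_cidrs)
    return signature_is_valid(secret, body, signature_hex)
```

With allowed_cidrs set to ["203.0.113.0/24"], a correctly signed payload from an address outside that range is still accepted by accept_webhook, and a badly signed payload from inside it is still rejected, which is the property that makes IP range drift a non-event for trust decisions.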
D6: FLAG_ENFORCE_CF_ORIGIN is enabled only after Phase 4 WAF block mode is soaking on prod
This sequencing is an explicit decision, not an implementation detail. Flipping the origin guard before WAF block mode is established would give false confidence: without WAF in block mode, an attacker could rotate exit nodes to route around rate limits and still reach Raptor through CF-proxied paths.
After Phase 4 block mode has soaked for 7 days, the edge is enforcing coarse rate limits and attack signatures. At that point, origin guard adds meaningful defense: any request reaching raxx-api-prod.herokuapp.com directly is a bypass attempt, and Raptor should reject it.
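One way the origin-side rejection could look is sketched below. This ADR does not specify how Raptor recognizes CF-originated traffic (that detail belongs to raxx-app-track-b.md), so the mechanism shown here is an assumption: a shared secret that the edge injects into a request header. The header name X-Origin-Auth and the function origin_guard_allows are hypothetical.

```python
import hmac


def origin_guard_allows(
    headers: dict[str, str],
    flag_enabled: bool,
    expected_secret: str,
) -> bool:
    """Return True if the request may proceed past the origin guard.

    Assumption: the edge adds a shared-secret header to every proxied
    request. The real detection mechanism may differ.
    """
    if not flag_enabled:
        return True  # FLAG_ENFORCE_CF_ORIGIN off: behave as before the flip
    if not expected_secret:
        return False  # fail closed if the guard is on but misconfigured
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    presented = normalized.get("x-origin-auth", "")
    return hmac.compare_digest(presented.encode(), expected_secret.encode())
```

In a Flask app this check would typically run in a before_request hook and return a 403 on failure; the pure function is shown on its own so the fail-closed behavior is easy to verify in isolation.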
D7: WAF Logpush excludes request body and cookie headers — always
Logpush field list (waf-strategy.md §4 logpush.tf) explicitly excludes ClientRequestBody and ClientRequestHeaders["cookie"]. These fields may contain WebAuthn attestation objects and session tokens respectively. Exporting them to an external destination (S3 or otherwise) would constitute credential-adjacent data exfiltration, violating ADR-0002 (no stored credentials). This exclusion is non-negotiable and must be verified in the terraform plan diff when the Logpush job is first created.
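The verification step could be backed by a small automated check run against the field list before apply. The sketch below assumes the field list has already been extracted from the plan as a list of strings; how it is extracted is not covered here. The function names are hypothetical, and the field names in the usage note other than the two excluded ones are illustrative.

```python
# The two fields D7 forbids in any Logpush job.
FORBIDDEN_LOGPUSH_FIELDS = (
    "ClientRequestBody",
    'ClientRequestHeaders["cookie"]',
)


def find_forbidden_fields(fields: list[str]) -> list[str]:
    """Return every entry that names a forbidden field.

    Matching ignores case and surrounding whitespace so that a differently
    cased or padded entry cannot slip through.
    """
    forbidden = {name.lower() for name in FORBIDDEN_LOGPUSH_FIELDS}
    return [field for field in fields if field.strip().lower() in forbidden]


def assert_logpush_fields_safe(fields: list[str]) -> None:
    """Raise ValueError if the field list would export a forbidden field."""
    hits = find_forbidden_fields(fields)
    if hits:
        raise ValueError(f"Logpush field list contains forbidden fields: {hits}")
```

Calling assert_logpush_fields_safe(["ClientIP", "RayID"]) returns without error, while adding "ClientRequestBody" to the list raises ValueError. A check like this complements the manual review of the plan diff and does not replace it.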
Consequences
Positive
- Attack signatures, OWASP Top-10 patterns, and credential-stuffing patterns are caught at the edge before consuming Raptor dynos.
- The Quebec registration geo-block (pre-launch compliance) is implemented cleanly at Layer 1 — no Flask middleware change needed for QC block.
- FLAG_ENFORCE_CF_ORIGIN can be safely enabled after Phase 4, completing the origin hardening story from Track B.
- WAF Logpush creates an audit record of every edge-blocked request, which feeds into the layered audit trail.
- Phased rollout means zero customer risk during the log-only phase, which spans the launch window.
Negative / risks
- False positive risk on OWASP rules for JSON API. OWASP CRS applies rules designed for form-based web apps; some rules trigger on valid JSON payloads with certain field names. Mitigation: owasp_sensitivity = "low" on the api.raxx.app zone, plus a 7-day log-only soak with explicit false-positive rate gate.
- Terraform WAF management adds operational complexity. Every WAF change requires a Terraform PR and apply. This is intentional (drift prevention) but slower than dashboard edits. The emergency escape hatch is rate_limit_action = "simulate" (single-variable rollback).
- Postmark IP range drift is an ongoing operational dependency. The postmark_ip_list variable must be updated when Postmark publishes IP range changes. This is managed manually; a Velvet-style subscription model for third-party IP ranges is out of scope for v1.
- CF outage reveals app-layer weaknesses. Flask rate limiting is effective but not as hardened as CF's Anycast network. During a CF outage, Raptor dynos are directly exposed to the internet. This is an accepted residual risk: the origin guard (once enabled) provides some protection by rejecting non-CF traffic, but Heroku's own edge is the fallback.
Neutral
- This decision does not change ADR-0001 (passkeys), ADR-0002 (no stored credentials), or ADR-0003 (GDPR). WAF is an additive perimeter layer.
- CF Access (ADR-0042) is unaffected. WAF and CF Access are complementary, not overlapping in function.
- AWS WAF on the email delivery API Gateway is a separate decision tracked in SC-WAF-08.
Alternatives Considered
Alternative A: App-layer-only defense (no CF WAF)
Rejected. App-layer rate limiting is per-user/session — it requires the request to reach Raptor. Volumetric attacks consume dyno capacity even when rate-limited at the app layer. CF WAF's edge-level filtering absorbs volumetric attacks before they consume origin resources. Additionally, OWASP signature matching at the app layer would require maintaining a WAF library in the Python codebase, adding ongoing CVE surface.
Alternative B: AWS WAF on API Gateway only
Rejected as the primary WAF. The email delivery API Gateway is not the customer-facing surface — api.raxx.app (CF-proxied) is. AWS WAF on a CF-fronted Heroku app would not intercept traffic at the edge; it would add an extra hop after Heroku receives it, which is not where volumetric attacks need to be absorbed. AWS WAF is appropriate for the bare API Gateway surface (SC-WAF-08 evaluation).
Alternative C: CF WAF managed via CF dashboard (no Terraform)
Rejected. Dashboard-driven WAF changes produce drift (ADR-0051 incident precedent: lockout on 2026-05-04 from dashboard drift on CF Access). Without Terraform state, rollback requires re-entering every rule manually. Drift detection (ADR-0051 Layer B/C pattern) cannot extend to WAF rules unless they are in a state-managed system.
Alternative D: Deploy WAF in block mode immediately on staging
Rejected. Log-only phase is not caution theater — it produces the false-positive baseline data needed to calibrate thresholds and OWASP sensitivity before any customer-visible action (challenge or block) is taken. The 7-day log-only soak is a gate criterion, not a suggestion. Skipping it risks blocking legitimate traffic at launch.
References
- docs/architecture/waf-strategy.md — full design doc
- [ADR-0002](https://internal-docs.raxx.app/architecture/adr/0002-no-stored-credentials.html) — no stored credentials (applies to Logpush field exclusions)
- [ADR-0003](https://internal-docs.raxx.app/architecture/adr/0003-gdpr-by-default.html) — GDPR by default (applies to WAF log PII)
- [ADR-0031](https://internal-docs.raxx.app/architecture/adr/0031-platform-auth-posture.html) — surface classes
- [ADR-0042](https://internal-docs.raxx.app/architecture/adr/0042-auth-unification-hybrid-model.html) — CF Access service tokens, hybrid auth
- [ADR-0051](https://internal-docs.raxx.app/architecture/adr/0051-drift-prevention-layered-controls.html) — layered controls, Terraform-only change policy
- raxx-app-track-b.md: FLAG_ENFORCE_CF_ORIGIN context
- docs/security/waf-threat-model-2026-05-12.md: security-agent threat model (cross-reference when landed)
- Cloudflare WAF Managed Rules documentation
- OWASP Core Rule Set documentation
- Project memory: reference_cloudflare_tokens.md, feedback_aws_workloads_use_ssm_not_vault.md