ADR 0077 — Cloudflare WAF as Layer 1 of Raxx Layered Defense

Status: Accepted
Date: 2026-05-11 UTC
Deciders: Kristerpher (product owner), software-architect
Scope: All Raxx zones (raxx.app, getraxx.com, operator surfaces)
Design doc: docs/architecture/waf-strategy.md
Refs: ADR-0031 (surface classes), ADR-0042 (auth unification, CF Access), ADR-0051 (layered controls), raxx-app-track-b.md (origin guard)

Context

As of 2026-05-11, Cloudflare (CF) proxies all traffic to Raxx customer and operator surfaces, but there is no WAF ruleset, no edge rate limiting, and no bot management policy. Application-layer rate limiting exists in Flask middleware (Layer 4). The origin guard feature flag (FLAG_ENFORCE_CF_ORIGIN) also exists but is disabled because there is no edge-level guarantee that all traffic arrives through CF.

The pre-launch window is the correct moment to add WAF:

Adding WAF in log-only mode carries zero customer risk and provides 7+ days of baseline telemetry before launch.
WAF deployment is the prerequisite for safely enabling FLAG_ENFORCE_CF_ORIGIN — once all legitimate traffic is guaranteed to pass through CF edge rules, rejecting direct-Heroku requests at origin is safe.
api.raxx.app carries WebAuthn credential exchange and paper order submission, both high-value attack targets that benefit from edge-layer filtering before they reach Raptor dynos.
The Quebec registration geo-block (pre-launch compliance requirement, project_quebec_geoblock_decision.md) is most cleanly implemented as a WAF custom rule rather than application-layer middleware, since it needs to run before any session is established.

Decision

D1: Cloudflare WAF is Layer 1 of the five-layer Raxx defense stack

The defense stack (detailed in waf-strategy.md §3) is:

1. CF WAF (edge): managed rulesets + custom rules + rate limits
2. CF Access (operator surfaces): Google Workspace IDP gate
3. Heroku origin guard (FLAG_ENFORCE_CF_ORIGIN): reject non-CF requests at origin
4. Flask app middleware: per-user rate limits, audit log, RBAC enforcement
5. Postgres: row-level access, append-only audit chain

No single layer is solely load-bearing. Failure Mode F3 (CF outage) is the design test: the app layer must absorb abuse traffic without the WAF.

D2: WAF is Terraform-managed in a new `terraform/modules/cf-waf/` module

The module exposes per-surface variables for rule actions, rate limit thresholds, allow-lists, and Logpush destination. All production WAF changes go through Terraform — no direct CF dashboard edits after initial apply, to prevent drift (ADR-0051 pattern). The module is instantiated per zone (raxx.app, getraxx.com), not as a single monolithic stack.

The CF Terraform stack for WAF (terraform/waf/) is a separate state file from the CF Access stack (terraform/cf-access/), minimizing blast radius per stack.

D3: WAF rules follow a phased rollout: log → challenge → block

WAF rules must never be deployed directly to block mode on prod. The rollout sequence (documented in waf-strategy.md §8) is:

Phase 1: log on staging (7 days, <1% false-positive gate)
Phase 2: challenge on staging (72h, zero legitimate-flow challenges)
Phase 3: block on staging (72h, confirmed attack simulations blocked)
Phase 4: log → block on prod (7-day soak per action level)
Phase 5: FLAG_ENFORCE_CF_ORIGIN flip (after Phase 4 complete)

Phase transitions require explicit operator sign-off. They are not automated.
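The gate criteria above can be expressed as a small check that an operator tool might run before a phase change. This is a minimal sketch, not part of waf-strategy.md: the phase identifiers, the `SoakReport` fields, and both function names are hypothetical, and the thresholds are copied from the phase list. Operator sign-off stays an explicit input because transitions are not automated.

```python
from dataclasses import dataclass
from typing import Optional

# Phase order mirrors the rollout sequence; the identifiers are illustrative.
PHASES = [
    "staging_log",
    "staging_challenge",
    "staging_block",
    "prod_log_to_block",
    "origin_guard_flip",
]


@dataclass
class SoakReport:
    """Hypothetical summary of one soak period."""
    hours_soaked: float
    false_positive_rate: float = 0.0   # fraction of legitimate requests flagged
    legitimate_challenges: int = 0     # legitimate flows that hit a challenge
    attack_sims_blocked: bool = False  # attack simulations confirmed blocked


def gate_passed(phase: str, report: SoakReport) -> bool:
    """Return True when the exit criteria for `phase` are met."""
    if phase == "staging_log":
        return report.hours_soaked >= 7 * 24 and report.false_positive_rate < 0.01
    if phase == "staging_challenge":
        return report.hours_soaked >= 72 and report.legitimate_challenges == 0
    if phase == "staging_block":
        return report.hours_soaked >= 72 and report.attack_sims_blocked
    if phase == "prod_log_to_block":
        # 7-day soak per action level; a caller would evaluate each level in turn.
        return report.hours_soaked >= 7 * 24
    return False  # the final phase has no successor


def next_phase(phase: str, report: SoakReport, operator_signoff: bool) -> Optional[str]:
    """Return the next phase, or None if the transition is not allowed."""
    idx = PHASES.index(phase)
    if idx == len(PHASES) - 1:
        return None
    if not operator_signoff or not gate_passed(phase, report):
        return None
    return PHASES[idx + 1]
```

In this sketch a passing soak without sign-off still returns `None`, which keeps the human approval step as a hard requirement rather than a default.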

D4: CF Access service tokens are explicitly skip-listed from bot management

WAF (Layer 1) runs before CF Access (Layer 2). Machine callers (Velvet, CI runners, internal automation) use CF Access service tokens and present no browser fingerprint, so bot management scores them as likely automated (on Cloudflare's scale, a low bot score). A custom WAF skip rule matches on the presence of the cf-access-client-id header and exempts those requests from bot challenges. This is the designated bypass, not a general weakening of bot rules.
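The matching logic of the skip rule can be modelled in a few lines, which is useful for unit-testing expectations before the real rule is written in Cloudflare's rule language. This is an illustrative sketch only: the function name is hypothetical, and the check covers header presence, as the decision states. Presence alone authenticates nothing; token validation remains the job of CF Access at Layer 2.

```python
from typing import Mapping

# Header named in the decision text; HTTP header names are case-insensitive.
SERVICE_TOKEN_HEADER = "cf-access-client-id"


def skip_bot_challenge(headers: Mapping[str, str]) -> bool:
    """Return True when the request carries the service-token header.

    Models the skip rule's match condition only. It does not validate the
    token value; that is left to CF Access (Layer 2).
    """
    return any(name.lower() == SERVICE_TOKEN_HEADER for name in headers)
```

For example, `skip_bot_challenge({"CF-Access-Client-Id": "example.access"})` matches, while a request with only a `User-Agent` header does not.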

D5: Webhook callers (Postmark, future Stripe) use IP allow-list bypass + app-layer HMAC verification

Third-party webhook sources cannot use CF Access service tokens. They are allowed through WAF via an IP-range-based skip rule (managed as a Terraform variable). App-layer HMAC signature verification is the actual trust gate — WAF bypass is a performance optimization only. If Postmark IP ranges rotate without notice (Failure Mode F5), the WAF passes the request and app-layer signature verification rejects invalid payloads. This two-gate model ensures webhook processing is resilient to IP range drift.
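The two-gate model can be sketched as two independent checks. The code below is a minimal illustration under stated assumptions: the function names are hypothetical, the signature is assumed to be a hex-encoded HMAC-SHA256 of the raw request body, and the CIDR range in the usage note is a documentation range, not a real Postmark range. The provider's actual signing scheme should be taken from its own documentation.

```python
import hashlib
import hmac
import ipaddress
from typing import Iterable


def ip_in_allow_list(source_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Gate 1 (edge): does the source IP fall inside an allow-listed range?

    Models the WAF skip rule. A miss does not reject the request; it only
    means the request is not exempted from the normal edge rules.
    """
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


def signature_valid(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Gate 2 (app layer): constant-time check of an HMAC-SHA256 signature.

    This is the actual trust decision. It must run on the raw request bytes,
    before any JSON parsing or re-serialization.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

With `allowed_cidrs = ["203.0.113.0/24"]`, a request from `198.51.100.1` fails gate 1 but can still be accepted if its signature verifies, which is the behavior the decision relies on when IP ranges drift.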

D6: `FLAG_ENFORCE_CF_ORIGIN` is enabled only after Phase 4 WAF block mode is soaking on prod

This sequencing is an explicit decision, not an implementation detail. Flipping the origin guard before WAF block mode is established would give false confidence: the guard only rejects requests that bypass CF, so without WAF in block mode an attacker could rotate exit nodes to evade rate limits and still reach Raptor through CF-proxied paths.

After Phase 4 block mode has soaked for 7 days, the edge is enforcing coarse rate limits and attack signatures. At that point, origin guard adds meaningful defense: any request reaching raxx-api-prod.herokuapp.com directly is a bypass attempt, and Raptor should reject it.
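How the origin guard recognizes CF traffic is defined in raxx-app-track-b.md and is not restated here. Purely as an illustration of a flag-gated guard, the sketch below assumes one common approach: CF injects a shared-secret header that the origin compares in constant time. The header name `x-origin-auth` and the function name are hypothetical and should not be read as the actual Track B mechanism.

```python
import hmac
from typing import Mapping

# Hypothetical header that the edge would add to every proxied request.
ORIGIN_AUTH_HEADER = "x-origin-auth"


def origin_guard_allows(
    headers: Mapping[str, str],
    flag_enabled: bool,
    expected_secret: str,
) -> bool:
    """Return True if the request may proceed past the origin guard.

    With the flag off, the guard is a no-op. With the flag on, a request
    lacking the expected secret is treated as a direct-to-origin bypass.
    """
    if not flag_enabled:
        return True
    # Normalize header names, since HTTP header names are case-insensitive.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(ORIGIN_AUTH_HEADER, "")
    return hmac.compare_digest(supplied.encode(), expected_secret.encode())
```

In a Flask app, a check like this would typically sit in a `before_request` hook and reject the request when it returns `False`.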

The Logpush field list (waf-strategy.md §4, logpush.tf) explicitly excludes ClientRequestBody and ClientRequestHeaders["cookie"]. These fields may contain WebAuthn attestation objects and session tokens respectively. Exporting them to an external destination (S3 or otherwise) would constitute credential-adjacent data exfiltration, violating ADR-0002 (no stored credentials). This exclusion is non-negotiable and must be verified in the terraform plan diff when the Logpush job is first created.
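The manual verification of the plan diff could be backed by a small automated check. The sketch below assumes the planned field list has already been extracted into a list of strings; the function name is hypothetical, and the two forbidden names are taken verbatim from the paragraph above rather than from Cloudflare's field reference.

```python
import re
from typing import Iterable, List

# Excluded fields as named in this ADR, stored in normalized form.
FORBIDDEN_FIELDS = {
    "clientrequestbody",
    'clientrequestheaders["cookie"]',
}


def forbidden_logpush_fields(fields: Iterable[str]) -> List[str]:
    """Return every planned field that matches an excluded field.

    Matching ignores case, whitespace, and quote style so that trivial
    spelling variants cannot slip past the check.
    """
    hits = []
    for field in fields:
        normalized = re.sub(r"\s+", "", field).lower().replace("'", '"')
        if normalized in FORBIDDEN_FIELDS:
            hits.append(field)
    return hits
```

A CI step could fail the pipeline whenever this returns a non-empty list for the planned Logpush job.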

Consequences

Positive

Attack signatures, OWASP Top-10 patterns, and credential-stuffing patterns are caught at the edge before consuming Raptor dynos.
The Quebec registration geo-block (pre-launch compliance) is implemented cleanly at Layer 1 — no Flask middleware change needed for QC block.
FLAG_ENFORCE_CF_ORIGIN can be safely enabled after Phase 4, completing the origin hardening story from Track B.
WAF Logpush creates an audit record of every edge-blocked request, which feeds into the layered audit trail.
Phased rollout means zero customer risk during the log-only phase, which spans the pre-launch window.

Negative / risks

False positive risk on OWASP rules for the JSON API. The OWASP CRS applies rules designed for form-based web apps, and some rules trigger on valid JSON payloads with certain field names. Mitigation: owasp_sensitivity = "low" for the api.raxx.app surface, plus a 7-day log-only soak with an explicit false-positive rate gate.
Terraform WAF management adds operational complexity. Every WAF change requires a Terraform PR and apply. This is intentional (drift prevention) but slower than dashboard edits. The emergency escape hatch is rate_limit_action = "simulate" (single-variable rollback).
Postmark IP range drift is an ongoing operational dependency. The postmark_ip_list variable must be updated when Postmark publishes IP range changes. This is managed manually; a Velvet-style subscription model for third-party IP ranges is out of scope for v1.
CF outage reveals app-layer weaknesses. Flask rate limiting is effective but not as hardened as CF's Anycast network. During a CF outage, Raptor dynos are directly exposed to the internet. This is an accepted residual risk — the origin guard (once enabled) provides some protection by rejecting non-CF traffic, but Heroku's own edge is the fallback.

Neutral

This decision does not change ADR-0001 (passkeys), ADR-0002 (no stored credentials), or ADR-0003 (GDPR). WAF is an additive perimeter layer.
CF Access (ADR-0042) is unaffected. WAF and CF Access are complementary, not overlapping in function.
AWS WAF on the email delivery API Gateway is a separate decision tracked in SC-WAF-08.

Alternatives Considered

Alternative A: App-layer-only defense (no CF WAF)

Rejected. App-layer rate limiting is per-user/session — it requires the request to reach Raptor. Volumetric attacks consume dyno capacity even when rate-limited at the app layer. CF WAF's edge-level filtering absorbs volumetric attacks before they consume origin resources. Additionally, OWASP signature matching at the app layer would require maintaining a WAF library in the Python codebase, adding ongoing CVE surface.

Alternative B: AWS WAF on API Gateway only

Rejected as the primary WAF. The email delivery API Gateway is not the customer-facing surface — api.raxx.app (CF-proxied) is. AWS WAF on a CF-fronted Heroku app would not intercept traffic at the edge; it would add an extra hop after Heroku receives it, which is not where volumetric attacks need to be absorbed. AWS WAF is appropriate for the bare API Gateway surface (SC-WAF-08 evaluation).

Alternative C: CF WAF managed via CF dashboard (no Terraform)

Rejected. Dashboard-driven WAF changes produce drift (ADR-0051 incident precedent: lockout on 2026-05-04 from dashboard drift on CF Access). Without Terraform state, rollback requires re-entering every rule manually. Drift detection (ADR-0051 Layer B/C pattern) cannot extend to WAF rules unless they are in a state-managed system.

Alternative D: Deploy WAF in block mode immediately on staging

Rejected. The log-only phase is not caution theater: it produces the false-positive baseline data needed to calibrate thresholds and OWASP sensitivity before any customer-visible action (challenge or block) is taken. The 7-day log-only soak is a gate criterion, not a suggestion. Skipping it risks blocking legitimate traffic at launch.

References

docs/architecture/waf-strategy.md — full design doc
[ADR-0002](https://internal-docs.raxx.app/architecture/adr/0002-no-stored-credentials.html) — no stored credentials (applies to Logpush field exclusions)
[ADR-0003](https://internal-docs.raxx.app/architecture/adr/0003-gdpr-by-default.html) — GDPR by default (applies to WAF log PII)
[ADR-0031](https://internal-docs.raxx.app/architecture/adr/0031-platform-auth-posture.html) — surface classes
[ADR-0042](https://internal-docs.raxx.app/architecture/adr/0042-auth-unification-hybrid-model.html) — CF Access service tokens, hybrid auth
[ADR-0051](https://internal-docs.raxx.app/architecture/adr/0051-drift-prevention-layered-controls.html) — layered controls, Terraform-only change policy
raxx-app-track-b.md — FLAG_ENFORCE_CF_ORIGIN context
docs/security/waf-threat-model-2026-05-12.md — security-agent threat model (cross-reference when landed)
Cloudflare WAF Managed Rules documentation
OWASP Core Rule Set documentation
Project memory: reference_cloudflare_tokens.md, feedback_aws_workloads_use_ssm_not_vault.md