ADR 0077 — Cloudflare WAF as Layer 1 of Raxx Layered Defense

Status: Accepted
Date: 2026-05-11 UTC
Deciders: Kristerpher (product owner), software-architect
Scope: All Raxx zones (raxx.app, getraxx.com, operator surfaces)
Design doc: docs/architecture/waf-strategy.md
Refs: ADR-0031 (surface classes), ADR-0042 (auth unification, CF Access), ADR-0051 (layered controls), raxx-app-track-b.md (origin guard)


Context

As of 2026-05-11, Raxx has Cloudflare proxying all traffic to customer and operator surfaces but no WAF ruleset, no edge rate limiting, and no bot management policy. Application-layer rate limiting exists in Flask middleware (Layer 4), and the origin guard feature flag (FLAG_ENFORCE_CF_ORIGIN) exists but is disabled because there was no edge-level guarantee that all traffic arrives through CF.

The pre-launch window is the correct moment to add WAF:

  1. Adding WAF in log-only mode carries zero customer risk and provides 7+ days of baseline telemetry before launch.
  2. WAF deployment is the prerequisite for safely enabling FLAG_ENFORCE_CF_ORIGIN — once all legitimate traffic is guaranteed to pass through CF edge rules, rejecting direct-Heroku requests at origin is safe.
  3. api.raxx.app carries WebAuthn credential exchange and paper order submission, both high-value attack targets that benefit from edge-layer filtering before they reach Raptor dynos.
  4. The Quebec registration geo-block (pre-launch compliance requirement, project_quebec_geoblock_decision.md) is most cleanly implemented as a WAF custom rule rather than application-layer middleware, since it needs to run before any session is established.
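The geo-block in item 4 depends only on attributes known at the edge before any session exists. A minimal Python sketch of that decision follows. It models the logic only and is not the WAF rule itself; the EdgeRequest type and the "/register" path prefix are hypothetical, since this ADR does not list the real registration routes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeRequest:
    """Subset of request attributes available before any session exists."""
    path: str
    country: str  # ISO 3166-1 alpha-2 country code, e.g. "CA"
    region: str   # ISO 3166-2 subdivision code without the country prefix, e.g. "QC"


# Hypothetical registration path prefix; the real routes are not given in this ADR.
REGISTRATION_PREFIXES = ("/register",)


def blocks_quebec_registration(req: EdgeRequest) -> bool:
    """Return True when a registration request originates from Quebec."""
    is_registration = req.path.startswith(REGISTRATION_PREFIXES)
    is_quebec = req.country == "CA" and req.region == "QC"
    return is_registration and is_quebec
```

Because the check needs no cookie or user identity, it can run at Layer 1 rather than in Flask middleware.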

Decision

D1: Cloudflare WAF is Layer 1 of the five-layer Raxx defense stack

The defense stack (detailed in waf-strategy.md §3) is:

  1. CF WAF (edge): managed rulesets + custom rules + rate limits
  2. CF Access (operator surfaces): Google Workspace IDP gate
  3. Heroku origin guard (FLAG_ENFORCE_CF_ORIGIN): reject non-CF requests at origin
  4. Flask app middleware: per-user rate limits, audit log, RBAC enforcement
  5. Postgres: row-level access, append-only audit chain

No single layer is load-bearing. Failure Mode F3 (CF outage) is the design test: the app layer must absorb abuse traffic without the WAF.

D2: WAF is Terraform-managed in a new terraform/modules/cf-waf/ module

The module exposes per-surface variables for rule actions, rate limit thresholds, allow-lists, and Logpush destination. All production WAF changes go through Terraform — no direct CF dashboard edits after initial apply, to prevent drift (ADR-0051 pattern). The module is instantiated per zone (raxx.app, getraxx.com), not as a single monolithic stack.

The CF Terraform stack for WAF (terraform/waf/) uses a separate state file from the CF Access stack (terraform/cf-access/), which minimizes the blast radius of each stack.

D3: WAF rules follow a phased rollout: log → challenge → block

WAF rules must never be deployed directly in block mode on prod. The rollout sequence is documented in waf-strategy.md §8.

Phase transitions require explicit operator sign-off. They are not automated.
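The log → challenge → block progression and the sign-off requirement can be modelled as a small state machine. This is an illustrative sketch only: RuleMode and advance are hypothetical names, and the real phase definitions live in waf-strategy.md §8, which is not reproduced here.

```python
from enum import Enum
from typing import Optional


class RuleMode(Enum):
    LOG = "log"
    CHALLENGE = "challenge"
    BLOCK = "block"


# Allowed forward transitions: exactly one step at a time, never log -> block.
_NEXT = {RuleMode.LOG: RuleMode.CHALLENGE, RuleMode.CHALLENGE: RuleMode.BLOCK}


def advance(current: RuleMode, signed_off_by: Optional[str]) -> RuleMode:
    """Move a rule to its next mode; refuse without a named operator sign-off."""
    if not signed_off_by or not signed_off_by.strip():
        raise PermissionError("phase transition requires explicit operator sign-off")
    if current not in _NEXT:
        raise ValueError(f"{current.value} is the final mode; nothing to advance to")
    return _NEXT[current]
```

Making the operator name a required argument means no code path can promote a rule without one, which reflects the "not automated" constraint.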

D4: CF Access service tokens are explicitly skip-listed from bot management

WAF (Layer 1) runs before CF Access (Layer 2). Machine callers (Velvet, CI runners, internal automation) use CF Access service tokens and present no browser fingerprint, which produces high bot scores. A custom WAF skip rule matches on the presence of the cf-access-client-id header and exempts those requests from bot challenges. This is the one designated bypass, not a general weakening of bot rules.
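The skip condition is a presence check on one header. The Python sketch below models that predicate for illustration; it is not the Cloudflare rule expression, and skips_bot_management is a hypothetical name.

```python
from typing import Mapping

SERVICE_TOKEN_HEADER = "cf-access-client-id"  # header name taken from the text above


def skips_bot_management(headers: Mapping[str, str]) -> bool:
    """True when the request carries the service-token header (any letter case)."""
    return any(name.lower() == SERVICE_TOKEN_HEADER for name in headers)
```

The predicate looks only at presence and never validates the token value. Under the layering described in D1, validation would remain the job of CF Access at Layer 2.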

D5: Webhook callers (Postmark, future Stripe) use IP allow-list bypass + app-layer HMAC verification

Third-party webhook sources cannot use CF Access service tokens. They are allowed through WAF via an IP-range-based skip rule (managed as a Terraform variable). App-layer HMAC signature verification is the actual trust gate — WAF bypass is a performance optimization only. If Postmark IP ranges rotate without notice (Failure Mode F5), the WAF passes the request and app-layer signature verification rejects invalid payloads. This two-gate model ensures webhook processing is resilient to IP range drift.
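The two gates can be sketched as two independent checks. This is a generic illustration under stated assumptions: HMAC-SHA256 with a hex digest is assumed here and is not confirmed as the scheme any particular provider uses, and the function names are hypothetical.

```python
import hashlib
import hmac
from ipaddress import ip_address, ip_network
from typing import Iterable


def edge_skip_applies(source_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Gate 1 (edge): does the caller fall inside an allow-listed range?"""
    addr = ip_address(source_ip)
    return any(addr in ip_network(cidr) for cidr in allowed_cidrs)


def signature_is_valid(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Gate 2 (app): constant-time comparison of an HMAC-SHA256 hex digest."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

Only the second function decides whether a payload is trusted. The first affects which edge rules run and nothing more, so a stale allow-list degrades filtering without granting trust.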

D6: FLAG_ENFORCE_CF_ORIGIN is enabled only after Phase 4 WAF block mode is soaking on prod

This sequencing is an explicit decision, not an implementation detail. Flipping the origin guard before WAF block mode is established would give false confidence: without WAF in block mode, an attacker could rotate exit nodes to evade rate limits and still reach Raptor through CF-proxied paths.

After Phase 4 block mode has soaked for 7 days, the edge is enforcing coarse rate limits and attack signatures. At that point, origin guard adds meaningful defense: any request reaching raxx-api-prod.herokuapp.com directly is a bypass attempt, and Raptor should reject it.
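The sequencing rule reduces to a date comparison that could be run before the flag is flipped. A minimal sketch, assuming the block-mode start time is recorded somewhere an operator can read it; origin_guard_may_be_enabled is a hypothetical helper, not an existing Raxx function.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REQUIRED_SOAK = timedelta(days=7)  # soak period stated in D6


def origin_guard_may_be_enabled(block_mode_since: Optional[datetime], now: datetime) -> bool:
    """True once WAF block mode has been live on prod for the full soak period."""
    if block_mode_since is None:
        return False  # block mode not reached yet
    return now - block_mode_since >= REQUIRED_SOAK
```

Both timestamps should be timezone-aware (for example, built with tzinfo=timezone.utc), since Python raises TypeError when subtracting an aware datetime from a naive one.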

The Logpush field list (waf-strategy.md §4, logpush.tf) explicitly excludes ClientRequestBody and ClientRequestHeaders["cookie"]. These fields may contain WebAuthn attestation objects and session tokens, respectively. Exporting them to an external destination (S3 or otherwise) would constitute credential-adjacent data exfiltration and violate ADR-0002 (no stored credentials). This exclusion is non-negotiable and must be verified in the terraform plan diff when the Logpush job is first created.
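The plan-diff verification could be backed by a small automated check. The sketch below assumes the planned field list has already been extracted from the plan as strings; forbidden_fields_present is a hypothetical helper, and the field names are copied from the paragraph above.

```python
from typing import Iterable, List

# Field names exactly as written in the paragraph above.
FORBIDDEN_FIELDS = ("ClientRequestBody", 'ClientRequestHeaders["cookie"]')


def forbidden_fields_present(planned_fields: Iterable[str]) -> List[str]:
    """Return the planned Logpush fields that must never be exported."""
    forbidden = {name.lower() for name in FORBIDDEN_FIELDS}
    return [field for field in planned_fields if field.strip().lower() in forbidden]
```

An empty result means the list is clean. Any non-empty result should fail the review before the job is applied.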


Consequences

Positive

Negative / risks

Neutral


Alternatives Considered

Alternative A: App-layer-only defense (no CF WAF)

Rejected. App-layer rate limiting is per-user/session — it requires the request to reach Raptor. Volumetric attacks consume dyno capacity even when rate-limited at the app layer. CF WAF's edge-level filtering absorbs volumetric attacks before they consume origin resources. Additionally, OWASP signature matching at the app layer would require maintaining a WAF library in the Python codebase, adding ongoing CVE surface.
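The capacity argument can be seen in a toy per-user limiter: every request is counted at origin before a verdict exists, so rejected requests still cost work. This is an illustrative fixed-window sketch, not the Raxx middleware; the class and attribute names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class FixedWindowLimiter:
    """Per-user fixed-window counter, as an app-layer limiter might keep it."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.requests_seen = 0  # every request that reached the app, allowed or not
        self._counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, user_id: str, now_seconds: float) -> bool:
        self.requests_seen += 1  # origin work happens before the verdict
        window = int(now_seconds // self.window_seconds)
        self._counts[(user_id, window)] += 1
        return self._counts[(user_id, window)] <= self.limit
```

The requests_seen counter grows with total traffic regardless of how many requests are allowed. That growth is the origin cost that edge filtering avoids.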

Alternative B: AWS WAF on API Gateway only

Rejected as the primary WAF. The email delivery API Gateway is not the customer-facing surface — api.raxx.app (CF-proxied) is. AWS WAF on a CF-fronted Heroku app would not intercept traffic at the edge; it would add an extra hop after Heroku receives it, which is not where volumetric attacks need to be absorbed. AWS WAF is appropriate for the bare API Gateway surface (SC-WAF-08 evaluation).

Alternative C: CF WAF managed via CF dashboard (no Terraform)

Rejected. Dashboard-driven WAF changes produce drift (ADR-0051 incident precedent: lockout on 2026-05-04 from dashboard drift on CF Access). Without Terraform state, rollback requires re-entering every rule manually. Drift detection (ADR-0051 Layer B/C pattern) cannot extend to WAF rules unless they are in a state-managed system.

Alternative D: Deploy WAF in block mode immediately on staging

Rejected. The log-only phase is not caution theater: it produces the false-positive baseline data needed to calibrate thresholds and OWASP sensitivity before any customer-visible action (challenge or block) is taken. The 7-day log-only soak is a gate criterion, not a suggestion. Skipping it risks blocking legitimate traffic at launch.


References