CF WAF Layered Defense Strategy

Status: Draft
Date: 2026-05-11 UTC
Owner: software-architect
ADR: 0077
Refs:
- [ADR-0031](https://internal-docs.raxx.app/architecture/adr/0031-platform-auth-posture.html): surface classes
- [ADR-0042](https://internal-docs.raxx.app/architecture/adr/0042-auth-unification-hybrid-model.html): auth posture, CF Access service tokens
- [ADR-0051](https://internal-docs.raxx.app/architecture/adr/0051-drift-prevention-layered-controls.html): layered controls pattern
- raxx-app-track-b.md: FLAG_ENFORCE_CF_ORIGIN origin guard context
- docs/security/waf-threat-model-2026-05-12.md: security-agent threat model (cross-reference when landed)
- Existing Terraform: terraform/modules/cf-access-getraxx/, terraform/cf-access/

1. Context

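Raxx exposes three internet-facing perimeters, served from two Cloudflare zones (`raxx.app` and `getraxx.com`; the operator surfaces are `raxx.app` subdomains):

Perimeter	Primary surface	Surface class (ADR-0031)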
`raxx.app`	Antlers SPA + `api.raxx.app` (Raptor)	Class 1 (customer-facing)
`getraxx.com`	Marketing / pre-launch site	Class 1 (pre-launch CF-Access gated)
Operator surfaces (`console.raxx.app`, `vault.raxx.app`, `tickets.raxx.app`)	Console, Infisical vault, FreeScout	Class 2/3 (operator, CF Access gated)

As of 2026-05-11, the edge has Cloudflare proxying, CF Access for operator surfaces, and a feature-flagged origin guard (FLAG_ENFORCE_CF_ORIGIN, currently OFF on raxx-api-prod). There is no CF WAF ruleset in place. Application-layer rate limiting exists in the Flask middleware, but the edge applies no coarse rate limiting, no bot management, no OWASP ruleset, and no geo-blocking beyond what CF Access provides on gated surfaces.

This gap is pre-launch-critical because:

api.raxx.app carries order-submission and WebAuthn credential exchange — both attract credential stuffing and brute-force attacks.
WAF deployment sets up a safe flip of FLAG_ENFORCE_CF_ORIGIN to ON, after the prod soak in §8. Once all legitimate traffic is confirmed to arrive through CF, rejecting direct-Heroku requests adds a hard layer with zero customer impact.
The paper-first gate (invariant) makes execution safety a design constraint, not just a post-launch concern.

2. Invariants

All project-level invariants apply to this design. The ones most relevant to WAF:

No stored credentials. WAF rules must not log request bodies that may contain WebAuthn attestation objects, auth tokens, or session cookies. The Logpush field list must exclude these fields so they are never exported.
Paper-first gating. WAF is a perimeter control, not a substitute for execution safety. The paper-mode gate in Raptor is preserved regardless of WAF state. WAF failing open is acceptable (see F3); order-submission paths must still be paper-gated server-side.
Audit trail. Every WAF-triggered action (block, challenge, rate-limit) that touches an authenticated session must be attributable to an IP and timestamp. CF Logpush satisfies this requirement for edge events.
Credentials into infra, not code. CF API tokens used by Terraform are read from Infisical at apply time — never hardcoded. Logpush destination credentials (S3, etc.) live in SSM (per feedback_aws_workloads_use_ssm_not_vault.md).
GDPR by default. WAF logs carry IP addresses (PII). The retention period must be bounded, and the Logpush destination must be DPA-ready. Breach notification applies if WAF log storage is compromised.
Security is a design constraint. WAF rules are designed in this document before implementation. Feature-developer does not make WAF policy choices; they implement the Terraform module against this spec.

3. Layered Defense Architecture

Internet (attacker / legitimate traffic)
        |
        v
┌─────────────────────────────────────────────────────────────────┐
│  LAYER 1: Cloudflare Edge — WAF + Bot Management + Rate Limit   │
│  • OWASP Core Rule Set (managed)                                │
│  • CF Managed Ruleset (CVE/threat-intel)                        │
│  • Custom rules: geo-block QC signup, webhook bypass            │
│  • Bot Fight Mode (configurable challenge threshold)             │
│  • Per-surface coarse rate limits (per-IP)                      │
│  • Log-only → challenge → block per rollout phase               │
└────────────────────────┬────────────────────────────────────────┘
                         │ (only non-blocked traffic passes)
                         v
┌─────────────────────────────────────────────────────────────────┐
│  LAYER 2: CF Access — operator surfaces only                    │
│  • Class 2/3 surfaces: Google Workspace IDP + MFA gate          │
│  • Class 1 (raxx.app, getraxx.com): no CF Access gate           │
│  • Service tokens for machine callers (Velvet, CI)              │
│  NOTE: WAF runs before CF Access. Service tokens must be on     │
│  WAF skip-list to avoid bot-rule false positives.               │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         v
┌─────────────────────────────────────────────────────────────────┐
│  LAYER 3: Heroku origin guard — FLAG_ENFORCE_CF_ORIGIN          │
│  • Raptor rejects requests lacking CF-Connecting-IP header      │
│  • Flip to ON only after WAF Phase 4 soak (see §8)             │
│  • Console (raxx-console-*) same pattern                        │
│  • Direct-Heroku URL access blocked at origin                   │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         v
┌─────────────────────────────────────────────────────────────────┐
│  LAYER 4: App-layer — Flask middleware                          │
│  • Fine-grained rate limiting: per-user / per-session           │
│  • Idempotency keys on order submission                          │
│  • Audit log: every state change                                │
│  • WebAuthn RP origin validation (FIDO2 spec invariant)         │
│  • RBAC enforcement (Queue)                                     │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         v
┌─────────────────────────────────────────────────────────────────┐
│  LAYER 5: Data — Postgres + RBAC                                │
│  • Row-level access checks via RBAC model ([ADR-0020](https://internal-docs.raxx.app/architecture/adr/0020-branch-promotion-soak-gate.html))            │
│  • Append-only audit table (KMS hash chain — [ADR-0022](https://internal-docs.raxx.app/architecture/adr/0022-event-log-append-only-hash-chain.html))          │
│  • No plaintext credentials ever persisted                      │
└─────────────────────────────────────────────────────────────────┘

What each layer catches and what passes through

Layer	Catches	Passes through (load on next layer)
1 — CF WAF	Known attack signatures, OWASP Top-10 patterns, volumetric attacks, geo-restricted signup, QC block, bot crawlers, webhook IP spoofing	Legitimate traffic, novel attacks not in managed rulesets, highly distributed low-volume attacks
2 — CF Access	Unauthenticated operator-surface access, non-allowlisted Google accounts	Machine callers with valid service tokens, authenticated operators
3 — Origin guard	Direct-to-Heroku requests bypassing CF edge (once flag is ON)	All CF-proxied traffic (the flag gates this layer)
4 — App middleware	Per-user abuse, session-level rate limits, malformed auth payloads, CSRF	Authenticated requests that pass rate limits
5 — DB	Privilege escalation attempts, unauthorized row access	Authorized data reads and writes

Design invariant: no single layer is load-bearing. F3 (CF outage) is the canonical test: if Layer 1 disappears, Layer 4 rate limiting must absorb abuse-level traffic, even at the cost of a degraded experience.

4. Terraform Module Design: `terraform/modules/cf-waf/`

Module structure

terraform/modules/cf-waf/
├── main.tf          # cloudflare_ruleset resources
├── rate_limits.tf   # cloudflare_rate_limit per surface
├── zone_settings.tf # zone security settings (security level, BIC, challenge TTL)
├── logpush.tf       # cloudflare_logpush_job destination
├── variables.tf
├── outputs.tf
└── versions.tf

Resources

`main.tf` — Managed rulesets and custom rules

# Phase: http_request_firewall_managed — OWASP + CF Managed
resource "cloudflare_ruleset" "managed_waf" {
  zone_id     = var.zone_id
  name        = "Managed WAF rules — ${var.surface_name}"
  description = "OWASP CRS + CF Managed Ruleset for ${var.surface_name}"
  kind        = "zone"
  phase       = "http_request_firewall_managed"

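  rules {
    # CF Managed Ruleset. Managed rulesets are deployed with action "execute";
    # log-only vs. enforcing mode comes from overriding the action of its rules.
    action = "execute"
    action_parameters {
      id = "efb7b8c949ac4650a09736fc376e9aee"
      overrides {
        action = var.managed_ruleset_action   # "log" | "managed_challenge" | "block"
      }
    }
    expression  = "true"
    description = "CF Managed Ruleset"
    enabled     = true
  }

  rules {
    # OWASP Core Rule Set. Only one action_parameters block is allowed per rule.
    # CRS uses anomaly scoring: individual rules add to a score, and the
    # anomaly-score rule acts once the threshold is reached. Action and
    # sensitivity are therefore overridden on that rule. The ID below is the one
    # used in Cloudflare's Terraform examples; verify it against the current
    # ruleset before apply.
    action = "execute"
    action_parameters {
      id = "4814384a9e5d4991b9815dcfc25d2f1f"
      overrides {
        rules {
          id     = "6179ae15870a4bb7b2d480d4843b323c"
          action = var.owasp_action   # "log" | "managed_challenge" | "block"
          # OWASP sensitivity: low for API surfaces, medium for SPA
          # (low / medium / high = anomaly-score thresholds 60 / 40 / 25)
          score_threshold = { low = 60, medium = 40, high = 25 }[var.owasp_sensitivity]
        }
      }
    }
    expression  = "true"
    description = "OWASP Core Rule Set"
    enabled     = true
  }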
}

# Phase: http_request_firewall_custom — custom rules per surface
resource "cloudflare_ruleset" "custom_waf" {
  zone_id     = var.zone_id
  name        = "Custom WAF rules — ${var.surface_name}"
  description = "Custom rules for ${var.surface_name}"
  kind        = "zone"
  phase       = "http_request_firewall_custom"

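  # Rule 1: Postmark webhook bypass. For known Postmark IPs on the webhook path,
  # skip the remaining custom rules, the managed rulesets, and Super Bot Fight Mode.
  # var.postmark_ip_list is list(string), so it is rendered as an inline set {a b c}.
  rules {
    action = "skip"
    action_parameters {
      ruleset = "current"
      phases  = ["http_request_firewall_managed", "http_request_sbfm"]
    }
    expression  = "(ip.src in {${join(" ", var.postmark_ip_list)}} and http.request.uri.path eq \"/api/webhooks/postmark\")"
    description = "Postmark inbound: skip WAF for known Postmark delivery IPs"
    enabled     = true
  }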

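  # Rule 2: CF Access service tokens. Skip bot rules for machine callers.
  # Only Super Bot Fight Mode (phase http_request_sbfm) can be skipped; plain
  # Bot Fight Mode cannot be bypassed by custom rules.
  # This rule runs before CF Access validates the token, so the attacker controls
  # header presence. Scope the rule to Access-protected hosts or exact client IDs
  # where possible.
  rules {
    action      = "skip"
    action_parameters { phases = ["http_request_sbfm"] }
    expression  = "(len(http.request.headers[\"cf-access-client-id\"]) > 0)"
    description = "CF Access service tokens: skip bot challenge"
    enabled     = true
  }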

  # Rule 3: Quebec signup geo-block (pre-launch blocker — ADR project_quebec_geoblock_decision)
  rules {
    action      = "block"
    # subdivision_1_iso_code holds the ISO 3166-2 code, i.e. "CA-QC", not "QC"
    expression  = "(ip.geoip.subdivision_1_iso_code eq \"CA-QC\" and http.request.uri.path contains \"/api/auth/register\")"
    description = "Quebec: block registration (Bill 96 compliance gate)"
    enabled     = var.enable_qc_block
  }

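  # Rule 4: Block requests carrying a *.herokuapp.com Host header (belt-and-suspenders only).
  # Requests sent straight to the Heroku hostname never traverse this zone. The origin
  # guard (Layer 3), not this rule, is what stops direct-Heroku access.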
  rules {
    action      = "block"
    expression  = "(http.host matches r\".*\\.herokuapp\\.com\")"
    description = "Block direct Heroku URL access (belt-and-suspenders)"
    enabled     = true
  }

  # Rule 5: Elevated threat score on auth endpoints (per-IP rate limits for these paths live in rate_limits.tf)
  rules {
    action      = var.auth_challenge_action   # "log" | "managed_challenge" | "block"
    expression  = "(http.request.uri.path contains \"/api/auth/\" and cf.threat_score gt 30)"
    description = "Elevated threat score on auth paths: challenge"
    enabled     = true
  }
}
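`postmark_ip_list` is a `list(string)`, but a Rules-language expression needs an inline set of the form `{a b c}`. The list therefore has to be rendered before it is spliced into Rule 1. Below is a minimal Python sketch of that rendering, usable as a pre-commit check on the tfvars values. The helper names are hypothetical. The addresses in the usage note come from documentation ranges (RFC 5737), not Postmark's published IPs.

```python
import ipaddress


def build_ip_set_expression(cidrs: list[str]) -> str:
    """Render IPs/CIDRs as a Cloudflare Rules-language inline set: {a b c}."""
    if not cidrs:
        # An empty set would make the skip rule match nothing; fail loudly instead.
        raise ValueError("postmark_ip_list must not be empty")
    normalized = set()
    for raw in cidrs:
        # ip_network validates and canonicalizes each entry (raises ValueError on
        # garbage); strict=False accepts bare hosts such as "198.51.100.7".
        normalized.add(str(ipaddress.ip_network(raw.strip(), strict=False)))
    return "{" + " ".join(sorted(normalized)) + "}"


def postmark_skip_expression(cidrs: list[str], path: str = "/api/webhooks/postmark") -> str:
    """Hypothetical helper producing the Rule 1 expression for a given IP list."""
    return f'(ip.src in {build_ip_set_expression(cidrs)} and http.request.uri.path eq "{path}")'
```

For example, `postmark_skip_expression(["198.51.100.7", "192.0.2.0/24"])` produces `(ip.src in {192.0.2.0/24 198.51.100.7/32} and http.request.uri.path eq "/api/webhooks/postmark")`. Sorting and de-duplicating keep the output stable when the tfvars list is reordered.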

`rate_limits.tf`

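# NOTE: cloudflare_rate_limit drives Cloudflare's legacy rate-limiting API and
# exists only in provider v4. Confirm it is still available for these zones
# (the successor is a cloudflare_ruleset in the http_ratelimit phase).

locals {
  # The legacy API accepts `timeout` only for the "simulate" and "ban" modes;
  # challenge modes use the zone's challenge passage time instead.
  rate_limit_timeout_allowed = contains(["simulate", "ban"], var.rate_limit_action)
}

# Coarse per-IP rate limit, all surfaces.
# The leading "*" in url_pattern also matches subdomains (e.g. api.raxx.app), not only the apex.
resource "cloudflare_rate_limit" "global" {
  zone_id   = var.zone_id
  threshold = var.global_rate_limit_threshold   # e.g. 500 req/10s per IP
  period    = 10
  match {
    request { url_pattern = "*${var.zone_hostname}/*" }
  }
  action {
    mode    = var.rate_limit_action   # "simulate" | "challenge" | "ban"
    timeout = local.rate_limit_timeout_allowed ? 60 : null
  }
  description = "Global coarse rate limit — ${var.surface_name}"
}

# Auth path rate limit, tighter for /api/auth/*
resource "cloudflare_rate_limit" "auth_endpoints" {
  zone_id   = var.zone_id
  threshold = var.auth_rate_limit_threshold   # e.g. 20 req/60s per IP
  period    = 60
  match {
    request { url_pattern = "*${var.zone_hostname}/api/auth/*" }
  }
  action {
    mode    = var.rate_limit_action
    timeout = local.rate_limit_timeout_allowed ? 300 : null
  }
  description = "Auth endpoints rate limit — ${var.surface_name}"
}

# Order submission rate limit, protecting paper-trade paths
resource "cloudflare_rate_limit" "order_submission" {
  zone_id   = var.zone_id
  threshold = var.order_rate_limit_threshold   # e.g. 10 req/60s per IP
  period    = 60
  match {
    request {
      url_pattern = "*${var.zone_hostname}/api/trading/orders"
      methods     = ["POST"]
    }
  }
  action {
    mode    = var.rate_limit_action
    timeout = local.rate_limit_timeout_allowed ? 300 : null
  }
  description = "Order submission rate limit — ${var.surface_name}"
}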
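A simplified per-IP model helps to reason about how `threshold`, `period`, and the ban `timeout` interact. The Python sketch below models the semantics the module configures as a fixed window. It does not reproduce Cloudflare's internal counting, and the class name is hypothetical.

```python
from collections import defaultdict


class FixedWindowIpLimiter:
    """Hypothetical model: allow `threshold` requests per `period` seconds per IP,
    then refuse that IP for `timeout` seconds (mirrors the module's knobs only)."""

    def __init__(self, threshold: int, period: int, timeout: int):
        if timeout < period:
            # The legacy rate-limit API also requires timeout >= period.
            raise ValueError("timeout must be >= period")
        self.threshold = threshold
        self.period = period
        self.timeout = timeout
        self._window_start: dict[str, float] = {}
        self._count: dict[str, int] = defaultdict(int)
        self._banned_until: dict[str, float] = {}

    def allow(self, ip: str, now: float) -> bool:
        # A banned IP is refused until its timeout expires, window or not.
        if self._banned_until.get(ip, 0.0) > now:
            return False
        start = self._window_start.get(ip)
        if start is None or now - start >= self.period:
            # Start a fresh counting window for this IP.
            self._window_start[ip] = now
            self._count[ip] = 0
        self._count[ip] += 1
        if self._count[ip] > self.threshold:
            self._banned_until[ip] = now + self.timeout
            return False
        return True
```

Take the auth example values (20 requests per 60 s, 300 s timeout) and an IP sending one request per second. That IP gets 20 requests through and is refused on the 21st. It stays refused until the timeout expires, even though a new 60 s window would otherwise have started.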

`zone_settings.tf`

# Provider v4 manages zone settings through one override resource.
# Per-setting cloudflare_zone_setting resources are the v5 equivalent; the rest
# of this module uses v4-only syntax and resources.
resource "cloudflare_zone_settings_override" "security" {
  zone_id = var.zone_id

  settings {
    security_level = var.security_level          # "medium" | "high" | "under_attack"
    browser_check  = "on"
    challenge_ttl  = var.challenge_ttl_seconds   # operator decision: 1800 (30m) default
  }
}

`logpush.tf`

locals {
  # http_requests fields exported for WAF events. Deliberately absent: the
  # request body (may contain WebAuthn data) and the cookie header (session
  # tokens). The dataset has no "Datetime" field; EdgeStartTimestamp is the timestamp.
  waf_log_fields = [
    "EdgeStartTimestamp", "ClientIP", "ClientRequestHost",
    "ClientRequestURI", "ClientRequestMethod",
    "FirewallMatchesActions", "FirewallMatchesRuleIDs",
    "EdgeResponseStatus", "BotScore", "BotScoreSrc",
    "ClientASN", "ClientCountry",
  ]
}

resource "cloudflare_logpush_job" "waf_events" {
  zone_id          = var.zone_id
  name             = "waf-events-${var.surface_name}"
  # S3 destinations also need the ownership_challenge token from destination validation.
  destination_conf = var.logpush_destination_conf   # "s3://<bucket>/<path>?..."

  dataset          = "http_requests"
  filter           = "{\"where\":{\"and\":[{\"key\":\"FirewallMatchesActions\",\"operator\":\"!empty\"}]}}"

  # The v4 provider has no `fields` argument; the field list goes in logpull_options.
  logpull_options = "fields=${join(",", local.waf_log_fields)}&timestamps=rfc3339"

  enabled = true
}
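The redaction requirement (no bodies, cookies, or headers among exported fields) can be enforced mechanically by checking the field list before apply. Below is a minimal Python sketch. The substring denylist and function name are assumptions. The match is deliberately conservative: a harmless field with "Body" or "Header" in its name would also be flagged and would need an explicit allowance.

```python
# Substrings that must never appear in an exported Logpush field name (case-insensitive).
FORBIDDEN_FIELD_SUBSTRINGS = ("body", "cookie", "header", "authorization")


def forbidden_logpush_fields(fields: list[str]) -> list[str]:
    """Return fields that could export bodies, cookies, or headers.

    An empty list means the field list satisfies the no-stored-credentials invariant.
    """
    return [
        field
        for field in fields
        if any(bad in field.lower() for bad in FORBIDDEN_FIELD_SUBSTRINGS)
    ]
```

Run in CI against the module's field list, the check should return an empty list. Any non-empty result blocks the apply.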

Variables the module exposes per surface

Variable	Type	Purpose
`zone_id`	string	CF zone ID
`surface_name`	string	Human label (e.g. "raxx-app", "getraxx")
`zone_hostname`	string	Zone apex hostname
`managed_ruleset_action`	string	"log" / "managed_challenge" / "block"
`owasp_action`	string	"log" / "managed_challenge" / "block"
`owasp_sensitivity`	string	"low" / "medium" / "high"
`auth_challenge_action`	string	"log" / "managed_challenge" / "block"
`global_rate_limit_threshold`	number	Requests per 10s per IP
`auth_rate_limit_threshold`	number	Requests per 60s per IP on auth paths
`order_rate_limit_threshold`	number	Requests per 60s per IP on order paths
`rate_limit_action`	string	"simulate" / "challenge" / "ban"
`security_level`	string	CF security level
`challenge_ttl_seconds`	number	Challenge passage lifetime
`postmark_ip_list`	list(string)	Postmark delivery IP ranges
`enable_qc_block`	bool	Enable Quebec registration block
`logpush_destination_conf`	string	Logpush destination URI (from SSM)
`bot_fight_mode`	string	"off" / "on" / "super" — operator decision
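Terraform `validation` blocks on each variable are the native way to reject bad values at plan time. As an alternative for CI, the same allowed-value sets can be checked against a parsed tfvars map in Python. The sets below mirror the table above, including `managed_challenge`, which Phase 2 uses for the ruleset actions. The function name is hypothetical.

```python
ALLOWED_VALUES = {
    "managed_ruleset_action": {"log", "managed_challenge", "block"},
    "owasp_action": {"log", "managed_challenge", "block"},
    "owasp_sensitivity": {"low", "medium", "high"},
    "auth_challenge_action": {"log", "managed_challenge", "block"},
    "rate_limit_action": {"simulate", "challenge", "ban"},
    "bot_fight_mode": {"off", "on", "super"},
}
POSITIVE_INT_VARS = (
    "global_rate_limit_threshold",
    "auth_rate_limit_threshold",
    "order_rate_limit_threshold",
    "challenge_ttl_seconds",
)


def validate_waf_vars(tfvars: dict) -> list[str]:
    """Return human-readable errors; an empty list means the values are acceptable."""
    errors = []
    for name, allowed in ALLOWED_VALUES.items():
        if name in tfvars and tfvars[name] not in allowed:
            errors.append(f"{name}={tfvars[name]!r} is not one of {sorted(allowed)}")
    for name in POSITIVE_INT_VARS:
        value = tfvars.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if value is not None and (isinstance(value, bool) or not isinstance(value, int) or value <= 0):
            errors.append(f"{name} must be a positive integer, got {value!r}")
    return errors
```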

Outputs

output "managed_waf_ruleset_id"  { value = cloudflare_ruleset.managed_waf.id }
output "custom_waf_ruleset_id"   { value = cloudflare_ruleset.custom_waf.id }
output "logpush_job_id"          { value = cloudflare_logpush_job.waf_events.id }
output "rate_limit_global_id"    { value = cloudflare_rate_limit.global.id }
output "rate_limit_auth_id"      { value = cloudflare_rate_limit.auth_endpoints.id }
output "rate_limit_orders_id"    { value = cloudflare_rate_limit.order_submission.id }

Per-zone instantiation

The module is instantiated twice in terraform/waf/main.tf:

module "waf_raxx_app" {
  source = "../modules/cf-waf"
  zone_id      = var.raxx_app_zone_id
  surface_name = "raxx-app"
  zone_hostname = "raxx.app"
  owasp_sensitivity        = "low"    # API surface — low reduces false positives on JSON bodies
  managed_ruleset_action   = "log"    # start log-only (Phase 1)
  auth_challenge_action    = "log"
  rate_limit_action        = "simulate"
  enable_qc_block          = true
  # ... remaining vars from tfvars
}

module "waf_getraxx" {
  source = "../modules/cf-waf"
  zone_id      = var.getraxx_zone_id
  surface_name = "getraxx"
  zone_hostname = "getraxx.com"
  owasp_sensitivity        = "medium"  # SPA/marketing — medium is fine
  managed_ruleset_action   = "log"
  auth_challenge_action    = "log"
  rate_limit_action        = "simulate"
  enable_qc_block          = false     # no signup on getraxx.com
  # ... remaining vars from tfvars
}

5. Integration with Existing Controls

5.1 `FLAG_ENFORCE_CF_ORIGIN`

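Currently false on raxx-api-prod. When ON, the flag makes Raptor reject any request that lacks a CF-Connecting-IP header, hardening the origin. CF-Connecting-IP is an ordinary request header, and a client that connects directly to the Heroku hostname can set it itself. The check therefore stops accidental or naive direct access, not a deliberate bypass.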

Why it was left off: Without a WAF, CF proxying was not guaranteed to be the only traffic path. Direct-Heroku access was a legitimate fallback.

Why WAF deployment unlocks it: Once WAF rules are in block mode (Phase 4), all legitimate traffic arrives through CF. CF always injects CF-Connecting-IP on proxied requests. At that point, any request without the header is either a bypass attempt or an internal test tool — both of which should be blocked at the origin.
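Raptor's middleware is not shown in this ADR. Below is a minimal WSGI sketch of the check described above. It assumes the flag is read from the environment on each request; WSGI exposes the header as `HTTP_CF_CONNECTING_IP`. The class name is hypothetical. As noted above, a direct client can forge the header, so the sketch only rejects requests that omit it.

```python
import os


class CfOriginGuard:
    """Hypothetical WSGI middleware: when FLAG_ENFORCE_CF_ORIGIN is on, return 403
    for requests that lack the CF-Connecting-IP header."""

    def __init__(self, app, flag_env: str = "FLAG_ENFORCE_CF_ORIGIN"):
        self.app = app
        self.flag_env = flag_env

    def _enforced(self) -> bool:
        return os.environ.get(self.flag_env, "false").strip().lower() in ("1", "true", "on")

    def __call__(self, environ, start_response):
        if self._enforced() and not environ.get("HTTP_CF_CONNECTING_IP"):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"direct origin access is not allowed\n"]
        # Flag off, or request came through CF: hand off to the wrapped app.
        return self.app(environ, start_response)
```

In a Flask app this would wrap the WSGI callable, e.g. `app.wsgi_app = CfOriginGuard(app.wsgi_app)`. Flipping the flag back to `false` disables the check without a code change.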

Order of operations:

Phase 1-3: WAF log → challenge → block on staging
Phase 4:   WAF block mode on prod (7-day soak)
Phase 5:   Flip FLAG_ENFORCE_CF_ORIGIN ON on raxx-api-prod + raxx-console-*
           (dedicated sub-card, requires operator action via heroku config:set)

The flag flip sub-card includes a smoke test: curl -I https://api.raxx.app/health must still return 200, and curl -I https://raxx-api-prod.herokuapp.com/health must return 403.

5.2 CF Access and WAF interaction

WAF (Layer 1) executes before CF Access (Layer 2) at the edge. This creates two concerns:

Aggressive bot rules blocking CF Access service tokens. Service tokens send a CF-Access-Client-Id header but have no browser fingerprint, so they will likely receive low (bot-like) bot scores. Custom WAF Rule 2 (see §4) explicitly skips bot challenges for requests carrying cf-access-client-id. This applies to Velvet, CI runners, and any other machine caller using a service token.
WAF blocking CF Access login redirects. CF Access login flows hit <domain>/cdn-cgi/access/ paths. These paths are automatically excluded from WAF managed rules by CF (they are CF-internal infrastructure). No explicit exclusion is needed, but feature-developer should verify this holds for the specific ruleset versions during Phase 1 soak.

5.3 App-layer rate limiter boundary

Dimension	CF WAF rate limit	App-layer rate limit
Granularity	Per source IP	Per authenticated user / session
Coverage	All traffic, pre-auth	Post-auth only
Action	Block or challenge at edge	HTTP 429 + audit log
Context	Blind to user identity	Full RBAC context

The two layers are complementary, not redundant. An IP-based WAF limit catches volumetric attacks before they consume Raptor dynos. A user-based app limit catches abuse by authenticated users (e.g., a logged-in user hammering order submissions).

Threshold calibration principle: WAF thresholds should be set at ~10x the expected legitimate peak for that surface. App-layer thresholds are set at the product policy level. These are independent knobs.
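Once the observed per-IP peak is expressed in the rule's own period, the ~10x rule is simple arithmetic. Below is a Python sketch with hypothetical helper names. The rescaling step assumes the rate is roughly uniform within the longer window; a bursty peak measured per minute understates the worst 10 s.

```python
import math


def rescale(count: float, from_seconds: int, to_seconds: int) -> float:
    """Convert a per-IP count measured over one window length into another,
    assuming a uniform rate within the longer window."""
    return count * to_seconds / from_seconds


def waf_threshold(peak_per_ip_in_period: float, headroom: float = 10.0) -> int:
    """Edge threshold: ~10x the legitimate per-IP peak for the rule's period, rounded up."""
    if peak_per_ip_in_period <= 0:
        raise ValueError("peak must be positive")
    return math.ceil(peak_per_ip_in_period * headroom)
```

An observed per-IP peak of 50 requests per 10 s yields the 500 req/10s example used for `global_rate_limit_threshold`. A peak of 2 requests per 60 s on auth paths yields the 20 req/60s example for `auth_rate_limit_threshold`.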

5.4 Postmark and third-party webhook bypass

Postmark inbound webhook (/api/webhooks/postmark) and any future Stripe/payment webhook require bypass of aggressive bot rules. These callers are server-to-server with no browser fingerprint and no CF Access service token.

Bypass strategy: the WAF skip rule (Rule 1 in the custom ruleset) matches known Postmark delivery IP ranges AND the exact webhook path. The IP list is managed as a Terraform variable sourced from Postmark's published IP range document. If Postmark rotates IPs without notice (Failure Mode F5), the skip rule stops matching. Webhook requests then fall through to the full WAF and may be challenged or blocked until `postmark_ip_list` is updated. Because the skip rule trusts only source IP and path, app-layer signature verification must still reject any payload that fails verification. Defense in depth applies here too.

For webhook callers that support HMAC signature validation (Stripe, etc.), signature verification at the app layer is the primary trust gate. The WAF skip only keeps bot and managed rules from producing false positives on server-to-server traffic; it does not establish trust. Signature verification must succeed or the request is rejected at Layer 4, regardless of WAF bypass.
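Below is a minimal Python sketch of the Layer 4 check. It assumes a generic scheme: HMAC-SHA256 over the raw request body, hex-encoded in a header. Header names and the signed content differ per provider; Stripe, for example, signs a timestamped payload. Whether Postmark signs its webhooks at all should be confirmed against its documentation before relying on this. The function name is hypothetical.

```python
import hashlib
import hmac


def verify_webhook_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Generic HMAC-SHA256 check over the raw request body (hypothetical scheme)."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    provided = signature_hex.strip().lower()
    # compare_digest runs in time independent of where the inputs differ.
    return hmac.compare_digest(expected.encode("ascii"), provided.encode("utf-8"))
```

Verification must run over the raw bytes before any JSON parsing; re-serializing a parsed body can change the bytes and break the signature.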

5.5 AWS API Gateway (execute-api.amazonaws.com)

Raxx's email delivery Lambda stack uses AWS API Gateway (execute-api.amazonaws.com). This endpoint is NOT behind Cloudflare — it is a bare AWS endpoint that callers (Postmark inbound bridge, SNS notifications) reach directly.

Decision: CF WAF does not protect execute-api.amazonaws.com. AWS WAF on the API Gateway is the appropriate control for that surface. However, as of this design, the Lambda stack is internal-to-AWS (SNS → SQS → Lambda), with the API Gateway only exposed to Postmark's inbound bridge IP range via an API Gateway resource policy. The resource policy serves as a coarse equivalent to WAF allowlisting.

Feature-developer implementing the WAF card should note this gap. A dedicated sub-card (SC-WAF-08) tracks whether AWS WAF on the email API Gateway is needed.

6. Failure Modes

ID	Failure	Detection	Recovery	Prevention
F1	WAF false positive: legitimate customer blocked (e.g., OWASP rule triggers on valid JSON body)	Customer error report; elevated 403 count in Logpush; WAF event in CF dashboard	Set the offending rule to "log" with a per-rule override (or the whole ruleset with `managed_ruleset_action = "log"` / `owasp_action = "log"`) and `terraform apply`; acknowledge the customer support ticket	Phase 1 log-only soak for 7 days; false-positive gate <1% before advancing
F2	WAF false negative: real attack passes all rules	Audit log gap analysis; anomalous order-submission spike in app metrics	Tighten specific rule; escalate to CF support for managed rule update	Overlapping layers — app-layer rate limit catches abuse that WAF misses
F3	CF edge outage: WAF disappears entirely	CF status page; Heroku dyno CPU/memory spike; customer error reports	App-layer rate limiter becomes load-bearing; paper-first gate remains enforced; escalate to CF support; evaluate "under attack" mode on recovery	No single layer is load-bearing; app-layer is always active
F4	WAF rate limit too tight + Stripe/payment webhook backlog → cascading payment failures	Stripe webhook delivery failure alerts; payment processing lag	Immediately set `rate_limit_action = "simulate"` (applies to all rate limits in the zone); raise the threshold; process the backlog	Extend the webhook bypass (Rule 1 currently covers only Postmark) to payment webhooks before they go live, and exempt webhook paths from the per-IP rate limits
F5	Postmark IP range rotates → customer support emails blocked	Postmark delivery failure bounce alerts; FreeScout ticket creation spike fails	Add new IP range to `postmark_ip_list` in tfvars + `terraform apply`; app-layer signature verification remains as fallback	HMAC signature verification is independent of IP allowlist; failed-signature requests rejected at Layer 4 regardless
F6	CF Access service token blocked by bot rules (new service token not on skip list)	Service returning 403/429 on machine-caller paths; Velvet distribution failures	Add new token header pattern to skip rule; or add token's CF Access Client ID to WAF bypass expression	Rule 2 (service token skip) is broad — matches any non-empty `cf-access-client-id`
F7	`FLAG_ENFORCE_CF_ORIGIN` flipped ON prematurely (before WAF Phase 4 soak)	Direct-Heroku smoke tests fail; monitoring tools using `.herokuapp.com` URLs break	Flip flag back to `false` via `heroku config:set FLAG_ENFORCE_CF_ORIGIN=false`; no redeploy needed	Origin guard flip is a dedicated sub-card with explicit gate criteria
F8	WAF logpush destination (S3) reaches retention limit → logs deleted before forensic use	S3 lifecycle rule triggers deletion; incident investigation finds log gap	Extend S3 lifecycle rule; restore from Glacier if available	Retention period must be set before Phase 1; operator decision required (§10)
F9	OWASP ruleset version update (CF auto-updates managed rules) → new false positives in prod	Spike in 403 responses; WAF event log shows new rule IDs	Set new rule to "log" mode via override; evaluate; promote back to "block"	Monitor WAF event log daily; CF changelog alerts on rule version bumps
F10	Quebec geo-block rule (`enable_qc_block = true`) blocks legitimate non-QC customer via VPN exit node in QC	Customer reports registration failure; CF country logged as `CA-QC`	Operator can temporarily set `enable_qc_block = false` + `terraform apply`; advise customer to disable VPN	Accept this UX tradeoff as per `project_quebec_geoblock_decision.md` — geo-block is the chosen compliance path
F11	Terraform state drift: WAF rule changed in CF dashboard (not via TF) → next `terraform apply` reverts it	`terraform plan` shows unexpected diff; CF dashboard vs TF state diverges	Import changed resource into TF state; document change; re-apply	Enforce IaC-only WAF changes; no direct CF dashboard edits after Phase 1 apply
F12	CF Logpush IAM credentials (S3) expire → WAF log gap	No new log files in S3 bucket for >30 min; CloudWatch S3 put metrics flatline	Velvet rotates Logpush S3 credentials; re-enable logpush job	Velvet enrollment expansion (ADR-0051 Layer C) covers S3 IAM credentials
F13	Bot Fight Mode flags a legitimate API client (e.g., mobile app with unusual TLS fingerprint)	Low (bot-like) bot scores in Logpush; CF edge serves a challenge page to the mobile client	Lower Bot Fight Mode strictness from "super" to "on"; or add mobile UA pattern to skip rule	Phase 0 decision: start with "on" (not "super"); validate before tightening
F14	WAF custom rule expression error (syntax mistake in a Rules-language expression embedded in the Terraform HCL) → `terraform apply` fails	TF apply error (the CF API validates expressions, so the error may surface only at apply, not at plan)	Correct the expression; re-apply; no customer impact because the failed apply published nothing	Validate Rules-language expressions in the CF dashboard expression editor before committing them to the TF module
F15	Logpush pushes raw session cookies in exported fields → credential leak	Security audit of Logpush field list	Immediately disable logpush job; rotate all active session tokens; audit which cookies were exported; notify affected users per GDPR breach timeline	Logpush field list in this design explicitly excludes `cookie` header; ADR-0002 (no stored credentials) applies to log destinations
F16	Per-surface WAF thresholds are miscalibrated for a traffic spike (e.g., marketing campaign) → legitimate customers rate-limited	Elevated 429 responses; customer complaints; traffic spike correlates with marketing event	Temporarily raise `global_rate_limit_threshold` + `terraform apply`; consider pre-event threshold lift procedure	Establish threshold review procedure before planned traffic events

7. Sequence Diagrams

Legitimate customer request (WAF pass-through)

sequenceDiagram
    participant C as Customer Browser
    participant WAF as CF Edge (WAF + Rate Limit)
    participant CFa as CF Access (operator surfaces)
    participant R as Raptor (api.raxx.app)
    participant DB as Postgres / Queue

    C->>WAF: POST /api/auth/login/verify
    WAF->>WAF: Evaluate managed + custom rules
    WAF->>WAF: Bot score check (score above bot threshold, i.e. likely human)
    WAF->>WAF: Rate limit check (under threshold)
    Note over WAF: PASS — request forwarded
    WAF->>R: Forward with CF-Connecting-IP injected
    R->>R: FLAG_ENFORCE_CF_ORIGIN check (CF-Connecting-IP present)
    R->>R: WebAuthn verification
    R->>DB: Write audit_log row
    R-->>C: 200 Set-Cookie: session=...

Attack blocked at edge

sequenceDiagram
    participant A as Attacker
    participant WAF as CF Edge (WAF)
    participant R as Raptor

    A->>WAF: POST /api/auth/login/verify (credential stuffing, 500 req/min)
    WAF->>WAF: Rate limit: 500 req/60s > threshold (20)
    WAF->>WAF: Log WAF event (FirewallMatchesActions: block)
    WAF-->>A: 429 Too Many Requests (rate limit in ban mode)
    Note over R: Request never reaches Raptor

Webhook bypass

sequenceDiagram
    participant PM as Postmark Delivery
    participant WAF as CF Edge (WAF)
    participant R as Raptor

    PM->>WAF: POST /api/webhooks/postmark (from Postmark IP range)
    WAF->>WAF: Custom Rule 1: ip.src in postmark_ip_list AND path eq /api/webhooks/postmark
    WAF->>WAF: SKIP — bypass managed rules + bot challenge
    WAF->>R: Forward
    R->>R: HMAC signature verification (Postmark signing secret)
    Note over R: Signature valid → process; invalid → 403

8. Rollout Plan

Phase 0 — Operator account-level settings (operator action, no Terraform)

Gate criteria: must complete before Phase 1 Terraform apply

[ ] Enable CF Logpush in the CF account (Zero Trust → Settings → Logpush or zone-level Logpush settings).
[ ] Decide Logpush destination: S3 bucket in raxx-prod AWS account, or Sentry, or both. (Operator decision required — see §10.) Create S3 bucket + IAM write credentials in SSM before Phase 1.
[ ] Set Bot Fight Mode policy account-wide. Start with "on" (not "super"). (Operator decision required — see §10.)
[ ] Confirm CF API token scope: CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN must have Zone:WAF:Edit and Zone:Logs:Edit scopes, plus Zone Settings edit for zone_settings.tf. Add scopes if missing (per reference_cloudflare_tokens.md; do not confuse with the DNS edit token).
[ ] Validate CF zone IDs for raxx.app and getraxx.com against what is in Infisical at /MooseQuest/cloudflare/.

Phase 1 — Log-only mode on staging (Terraform apply)

Duration: 7 days minimum
Gate criteria to advance: false-positive rate on legitimate test traffic <1% of total requests

Deploy terraform/modules/cf-waf/ with all actions set to "log" / "simulate".
Enable Logpush on staging zones.
Run synthetic probes against staging (SC-WAF-06: dedicated synthetic probe card).
Review WAF event log daily: identify false positives by correlating FirewallMatchesRuleIDs with legitimate request patterns.
Adjust owasp_sensitivity and any per-rule overrides as needed.
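Once synthetic-probe traffic is labelled, the Phase 1 gate can be computed from Logpush records. Below is a Python sketch. It assumes each record is a dict holding the exported `FirewallMatchesActions` field plus a hypothetical `legitimate` flag set by the probe harness (SC-WAF-06). Following the gate wording, the denominator is total requests.

```python
def false_positive_rate(records: list[dict]) -> float:
    """Share of all requests that were legitimate yet matched a WAF action."""
    if not records:
        raise ValueError("no records to evaluate")
    false_positives = sum(
        1 for r in records if r.get("legitimate") and r.get("FirewallMatchesActions")
    )
    return false_positives / len(records)


def phase1_gate_passes(records: list[dict], max_rate: float = 0.01) -> bool:
    """Phase 1 advances only if the false-positive rate is strictly below 1%."""
    return false_positive_rate(records) < max_rate
```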

Phase 2 — Challenge mode on staging

Duration: 72 hours minimum
Gate criteria to advance: zero legitimate customer-sim flows challenged; bot/scanner traffic challenged successfully

Set managed_ruleset_action and owasp_action to "managed_challenge", and rate_limit_action to "challenge".
Re-run synthetic probes. All must pass without a challenge page.
Confirm CF Access service token flows (Velvet, CI) are not challenged.
Review bot score distribution in Logpush.

Phase 3 — Block mode on staging

Duration: 72 hours minimum
Gate criteria to advance: zero false blocks on legitimate traffic; at least one confirmed block of a simulated attack

Set managed_ruleset_action and owasp_action to "block", and rate_limit_action to "ban".
Simulate common attack patterns (SQLi payload, OWASP test suite) and verify blocks.
Confirm Postmark webhook bypass works end-to-end.
Sign-off from operator before advancing.

Phase 4 — Prod rollout (log → block, per surface)

Timeline: After Phase 3 sign-off. Target 2026-05-23 UTC (pre-launch).

Step	Action	Duration
4a	Deploy WAF module to prod zones in `log` mode	Day 0
4b	Monitor prod Logpush; gate: false-positive rate <1% vs staging baseline	7 days
4c	Prod → challenge mode	Day 8
4d	Monitor challenge rate; gate: no legitimate customers challenged	48h
4e	Prod → block mode	Day 11
4f	Monitor 7 days; gate: no customer support tickets attributable to WAF	7 days
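Summing the minimum durations gives the earliest calendar date for each step. The Python sketch below assumes three things: Phase 0 is complete and Phase 1 starts on the ADR date (2026-05-11), every gate passes on the first attempt, and Phase 4 step days count from Phase 3 sign-off as in the table above. The dictionary keys are illustrative.

```python
from datetime import date, timedelta

# Minimum staging durations in days (Phases 2 and 3: 72 hours each).
STAGING_MIN_DAYS = {"phase1_log": 7, "phase2_challenge": 3, "phase3_block": 3}
# Prod step start days, counted from step 4a (Day 0).
PROD_STEP_DAY = {"4a_log": 0, "4c_challenge": 8, "4e_block": 11}
PROD_4F_SOAK_DAYS = 7


def earliest_schedule(phase1_start: date) -> dict[str, date]:
    schedule: dict[str, date] = {}
    cursor = phase1_start
    for phase, days in STAGING_MIN_DAYS.items():
        schedule[phase] = cursor  # start date of this staging phase
        cursor += timedelta(days=days)
    # The earliest Phase 3 sign-off doubles as prod Day 0.
    for step, offset in PROD_STEP_DAY.items():
        schedule[step] = cursor + timedelta(days=offset)
    # Phase 5 waits for the 7-day 4f monitoring window after block mode.
    schedule["5_origin_flag"] = schedule["4e_block"] + timedelta(days=PROD_4F_SOAK_DAYS)
    return schedule
```

Under those assumptions, step 4a lands on 2026-05-24, one day after the 2026-05-23 target. Prod block mode (4e) follows on 2026-06-04, and the Phase 5 flag flip can happen no earlier than 2026-06-11.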

Phase 5 — `FLAG_ENFORCE_CF_ORIGIN` flip

Separate sub-card (SC-WAF-07). Executes only after Phase 4f gate passes.

heroku config:set FLAG_ENFORCE_CF_ORIGIN=true -a raxx-api-prod
heroku config:set FLAG_ENFORCE_CF_ORIGIN=true -a raxx-console-prod

Smoke test: curl -I https://api.raxx.app/health → 200. curl -I https://raxx-api-prod.herokuapp.com/health → 403.
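Scripting the two curl checks gives the sub-card a repeatable pass/fail. Below is a Python sketch using only the standard library. `fetch` is injectable so the logic can be tested offline, and the helper names are hypothetical.

```python
import urllib.error
import urllib.request

EXPECTED = {
    "https://api.raxx.app/health": 200,
    "https://raxx-api-prod.herokuapp.com/health": 403,
}


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request (4xx/5xx included, not raised)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def smoke_test(fetch=head_status) -> dict[str, tuple[int, int]]:
    """Return {url: (expected, actual)} for every URL whose status did not match."""
    failures = {}
    for url, expected in EXPECTED.items():
        actual = fetch(url)
        if actual != expected:
            failures[url] = (expected, actual)
    return failures
```

An empty dict means both checks passed. Anything else lists `(expected, actual)` per URL; for example, a `200` from the Heroku URL means the flag flip did not take effect.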

9. Migrations

No application schema changes. No new Postgres tables.

Terraform state:

New module terraform/modules/cf-waf/ — no existing state to import.
New root stack terraform/waf/ — instantiates the module per zone.
Existing CF Access Terraform (terraform/cf-access/) is unchanged. WAF Terraform is a separate stack with its own state file to minimize blast radius.

Rollback at any phase: Set all module action variables to "log" / "simulate" and re-apply. This puts WAF in observation-only mode without removing any resources. Full rollback is terraform destroy on terraform/waf/ — removes all WAF rulesets and rate limits. CF Access and origin guard are unaffected.

10. Operator Decisions Required (Open Questions)

OQ1 through OQ3 block Phase 0 and therefore Phase 1: the feature-developer cannot start SC-WAF-01 until the operator resolves them. OQ4 blocks the Phase 1 → Phase 2 transition. OQ5 does not block Phase 1.

#	Question	Blocking?	Stakes
OQ1	Logpush destination: S3 bucket only, or Sentry WAF event integration too? S3 enables long-term forensics; Sentry enables real-time alerting.	Blocks Phase 0	S3 is the baseline; Sentry adds ~$0/mo at current volume but requires Sentry project setup
OQ2	WAF log retention period: 90 days (matches ADR-0051 ops log baseline) or longer for financial audit compliance?	Blocks Phase 0	Longer retention = higher S3 cost; GDPR requires a defined retention period
OQ3	Bot Fight Mode strictness: "on" vs "super"? "super" challenges more aggressively, including TLS fingerprint analysis, with higher false-positive risk on API clients.	Blocks Phase 0	Recommend "on" to start; revisit after Phase 1 data
OQ4	Challenge vs block decision for Auth paths: should elevated-threat-score auth requests get a challenge page (adds user friction) or a hard block?	Blocks Phase 1→Phase 2	Challenge = friction but allows legitimate users through; block = cleaner but risks false lockouts
OQ5	Allow-list management process: Terraform-only (requires PR per change) vs operator can add IPs/ASNs via CF dashboard with post-hoc TF import?	Does not block Phase 1	TF-only is safer (audit trail, drift prevention per ADR-0051); dashboard-then-import is faster for emergencies

11. Security Considerations

PII

WAF Logpush exports include ClientIP (full IPv4/IPv6). IP addresses are personal data under GDPR.

Retention: Bounded by OQ2 (operator decision). Recommend 90 days as default.
Storage: S3 bucket must have server-side encryption (SSE-S3 or SSE-KMS) and access restricted to the raxx-waf-logs-reader IAM role. Public access must be blocked.
DSR deletion: WAF logs are infrastructure logs, not application-user records. IP-only security logs can rely on legitimate interest as their lawful basis (GDPR Art. 6(1)(f); Recital 49 names network and information security as a legitimate interest). Legitimate interest is a lawful basis, not an exemption from erasure. If a DSR erasure request is received, the operator should therefore assess whether the requesting user's IP can be identified and whether deletion from the S3 logs is warranted. This is low-risk at v1 scale.
Redaction: ClientRequestBody and the cookie header are never exported in the Logpush field list (see §4 logpush.tf). This is the primary mitigation against WebAuthn credential object or session token leakage.
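The redaction guarantee rests entirely on the Logpush field list, so it can be checked in CI before terraform apply. The guard below is hypothetical. ClientRequestBody comes from this ADR; the other denylist entries are assumptions to verify against Cloudflare's Logpush field reference for the dataset in use.

```python
"""Hypothetical CI guard for the Logpush field list."""
import sys
from typing import Iterable, List

# Fields that must never be exported. ClientRequestBody comes from this
# ADR; the other two are assumptions (Cloudflare custom-field exports)
# to confirm against the Logpush field reference.
DENYLIST = {"ClientRequestBody", "Cookies", "RequestHeaders"}


def forbidden_fields(fields: Iterable[str]) -> List[str]:
    """Return denylisted names found in fields (case-insensitive, sorted)."""
    deny = {name.lower() for name in DENYLIST}
    return sorted({f.strip() for f in fields if f.strip().lower() in deny})


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit('usage: check_logpush_fields.py "ClientIP,ClientRequestPath,..."')
    found = forbidden_fields(sys.argv[1].split(","))
    if found:
        print("forbidden Logpush fields: " + ", ".join(found), file=sys.stderr)
        sys.exit(1)
```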

Credential replay risk

WAF does not store credentials. The only credential-adjacent data in WAF logs is ClientIP and path (e.g., /api/auth/login/verify). Neither can be used to replay an authentication attempt.

If WAF Logpush data is compromised (for example, the S3 bucket is exposed), an attacker learns which IPs hit authentication paths, and when. That is a correlation risk, not a credential-replay risk. GDPR breach notification still applies per ADR-0003: the breachNotification flow must be triggered within 72 hours of confirmed S3 exposure.
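The 72-hour clock is simple arithmetic, but timezone mistakes are easy to make under incident pressure. Below is a minimal sketch of the deadline calculation. It assumes the confirmation time is recorded as a timezone-aware timestamp. The function names are illustrative and are not part of the breachNotification flow.

```python
"""Sketch: GDPR Art. 33 notification deadline from the confirmation time."""
from datetime import datetime, timedelta, timezone

# GDPR Art. 33(1): notify without undue delay and, where feasible, no later
# than 72 hours after becoming aware (here: confirmed S3 exposure).
NOTIFY_WINDOW = timedelta(hours=72)


def notification_deadline(confirmed_at: datetime) -> datetime:
    """Return the deadline in UTC; naive datetimes are rejected."""
    if confirmed_at.tzinfo is None or confirmed_at.utcoffset() is None:
        raise ValueError("confirmed_at must be timezone-aware (use UTC)")
    return (confirmed_at + NOTIFY_WINDOW).astimezone(timezone.utc)


def hours_remaining(confirmed_at: datetime, now: datetime) -> float:
    """Hours left before the deadline (negative once it has passed)."""
    return (notification_deadline(confirmed_at) - now).total_seconds() / 3600


if __name__ == "__main__":
    confirmed = datetime.now(timezone.utc)
    print("notify by", notification_deadline(confirmed).isoformat())
```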

Kill-switch

Per layer:

Layer	Kill-switch	Time to effect
WAF rules	`terraform apply` with all actions set to `"log"`	~30s (CF propagates ruleset changes globally)
Rate limits	`terraform apply` with `rate_limit_action = "simulate"`	~30s
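Origin guard	`heroku config:set FLAG_ENFORCE_CF_ORIGIN=false -a raxx-api-prod` (repeat with `-a raxx-console-prod`)	~10s plus dyno restart (no rebuild; config:set creates a new release and restarts dynos)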
Logpush	`cloudflare_logpush_job.enabled = false` + apply	~30s

Secret rotation

Logpush S3 IAM credentials: enrolled in Velvet (ADR-0051 Layer C enrollment expansion, SC-WAF-09).
CF API token used by Terraform: rotated quarterly via Velvet; stored in Infisical at /MooseQuest/cloudflare/.
WAF rules themselves are not credentials. They are policy rather than secrets and are out of scope for rotation.

AWS WAF gap

execute-api.amazonaws.com (email delivery API Gateway) is not CF-proxied. This design does not cover that surface. SC-WAF-08 tracks the evaluation.

12. Sub-cards

See §8 (rollout) for sequencing. Cards listed here for reference:

Card	Title	Phase
SC-WAF-00	Phase 0 operator actions: CF account WAF settings + Logpush destination	Operator prerequisite
SC-WAF-01	Terraform: `terraform/modules/cf-waf/` module + `terraform/waf/` root stack, log mode	Phase 1
SC-WAF-02	Custom rules per surface: QC geo-block, service-token bypass, webhook bypass	Phase 1 (part of SC-WAF-01 or follow-on)
SC-WAF-03	WAF log-only soak: Logpush → S3, false-positive analysis, staging review	Phase 1 soak
SC-WAF-04	Cutover to challenge mode on staging	Phase 2
SC-WAF-05	Cutover to block mode on staging + prod log → block rollout	Phase 3 + 4
SC-WAF-06	Synthetic probes: per-surface flows that must pass WAF without challenge	Parallel with Phase 1
SC-WAF-07	`FLAG_ENFORCE_CF_ORIGIN` flip on raxx-api-prod + raxx-console-prod	Phase 5 (post Phase 4f)
SC-WAF-08	Evaluation: AWS WAF on email delivery API Gateway	Independent
SC-WAF-09	Velvet: enroll Logpush S3 IAM credentials in rotation	Independent (pairs with ADR-0051 SC-N6)