Raxx · internal docs


CF WAF Layered Defense Strategy

Status: Draft
Date: 2026-05-11 UTC
Owner: software-architect
ADR: 0077
Refs:
- ADR-0031: surface classes
- ADR-0042: auth posture, CF Access service tokens
- ADR-0051: layered controls pattern
- raxx-app-track-b.md: FLAG_ENFORCE_CF_ORIGIN origin guard context
- docs/security/waf-threat-model-2026-05-12.md: security-agent threat model (cross-reference when landed)
- Existing Terraform: terraform/modules/cf-access-getraxx/, terraform/cf-access/


1. Context

Raxx exposes three zone-level perimeters to the internet:

| Zone | Primary surface | Surface class (ADR-0031) |
|---|---|---|
| raxx.app | Antlers SPA + api.raxx.app (Raptor) | Class 1 (customer-facing) |
| getraxx.com | Marketing / pre-launch site | Class 1 (pre-launch CF-Access gated) |
| Operator surfaces (console.raxx.app, vault.raxx.app, tickets.raxx.app) | Console, Infisical vault, FreeScout | Class 2/3 (operator, CF Access gated) |

As of 2026-05-11, the edge has Cloudflare proxying, CF Access for operator surfaces, and a feature-flagged origin guard (FLAG_ENFORCE_CF_ORIGIN, currently OFF on raxx-api-prod). There is no CF WAF ruleset in place. Application-layer rate limiting exists in the Flask middleware, but the edge applies no coarse rate limiting, no bot management, no OWASP ruleset, and no geo-blocking beyond what CF Access provides on gated surfaces.

This gap is pre-launch-critical because:


2. Invariants

All project-level invariants apply to this design. The ones most relevant to WAF:

  1. No stored credentials. WAF rules must not log request bodies that may contain WebAuthn attestation objects, auth tokens, or session cookies. Logpush must redact these fields at export time.
  2. Paper-first gating. WAF is a perimeter control, not a substitute for execution safety. The paper-mode gate in Raptor is preserved regardless of WAF state. WAF failing open is acceptable (see F3); order-submission paths must still be paper-gated server-side.
  3. Audit trail. Every WAF-triggered action (block, challenge, rate-limit) that touches an authenticated session must be attributable to an IP and timestamp. CF Logpush satisfies this requirement for edge events.
  4. Credentials into infra, not code. CF API tokens used by Terraform are read from Infisical at apply time — never hardcoded. Logpush destination credentials (S3, etc.) live in SSM (per feedback_aws_workloads_use_ssm_not_vault.md).
  5. GDPR by default. WAF logs carry IP addresses (PII). Retention period must be bounded. Logpush destination is DPA-ready. Breach notification applies if WAF log storage is compromised.
  6. Security is a design constraint. WAF rules are designed in this document before implementation. Feature-developer does not make WAF policy choices; they implement the Terraform module against this spec.

3. Layered Defense Architecture

Internet (attacker / legitimate traffic)
        |
        v
┌─────────────────────────────────────────────────────────────────┐
│  LAYER 1: Cloudflare Edge — WAF + Bot Management + Rate Limit   │
│  • OWASP Core Rule Set (managed)                                │
│  • CF Managed Ruleset (CVE/threat-intel)                        │
│  • Custom rules: geo-block QC signup, webhook bypass            │
│  • Bot Fight Mode (configurable challenge threshold)             │
│  • Per-surface coarse rate limits (per-IP)                      │
│  • Log-only → challenge → block per rollout phase               │
└────────────────────────┬────────────────────────────────────────┘
                         │ (only non-blocked traffic passes)
                         v
┌─────────────────────────────────────────────────────────────────┐
│  LAYER 2: CF Access — operator surfaces only                    │
│  • Class 2/3 surfaces: Google Workspace IDP + MFA gate          │
│  • Class 1 (raxx.app, getraxx.com): no CF Access gate           │
│  • Service tokens for machine callers (Velvet, CI)              │
│  NOTE: WAF runs before CF Access. Service tokens must be on     │
│  WAF skip-list to avoid bot-rule false positives.               │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         v
┌─────────────────────────────────────────────────────────────────┐
│  LAYER 3: Heroku origin guard — FLAG_ENFORCE_CF_ORIGIN          │
│  • Raptor rejects requests lacking CF-Connecting-IP header      │
│  • Flip to ON only after WAF Phase 4 soak (see §8)             │
│  • Console (raxx-console-*) same pattern                        │
│  • Direct-Heroku URL access blocked at origin                   │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         v
┌─────────────────────────────────────────────────────────────────┐
│  LAYER 4: App-layer — Flask middleware                          │
│  • Fine-grained rate limiting: per-user / per-session           │
│  • Idempotency keys on order submission                          │
│  • Audit log: every state change                                │
│  • WebAuthn RP origin validation (FIDO2 spec invariant)         │
│  • RBAC enforcement (Queue)                                     │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         v
┌─────────────────────────────────────────────────────────────────┐
│  LAYER 5: Data — Postgres + RBAC                                │
│  • Row-level access checks via RBAC model (ADR-0020)            │
│  • Append-only audit table (KMS hash chain — ADR-0022)          │
│  • No plaintext credentials ever persisted                      │
└─────────────────────────────────────────────────────────────────┘

What each layer catches and what passes through

| Layer | Catches | Passes through (load on next layer) |
|---|---|---|
| 1 (CF WAF) | Known attack signatures, OWASP Top-10 patterns, volumetric attacks, geo-restricted signup, QC block, bot crawlers, webhook IP spoofing | Legitimate traffic, novel attacks not in managed rulesets, highly distributed low-volume attacks |
| 2 (CF Access) | Unauthenticated operator-surface access, non-allowlisted Google accounts | Machine callers with valid service tokens, authenticated operators |
| 3 (Origin guard) | Direct-to-Heroku requests bypassing CF edge (once flag is ON) | All CF-proxied traffic (the flag gates this layer) |
| 4 (App middleware) | Per-user abuse, session-level rate limits, malformed auth payloads, CSRF | Authenticated requests that pass rate limits |
| 5 (DB) | Privilege escalation attempts, unauthorized row access | Authorized data reads and writes |

Design invariant: no single layer is load-bearing. F3 (CF outage) is the canonical test: Layer 4 rate limiting must absorb abuse-level traffic if Layer 1 disappears, even if it serves a degraded experience.


4. Terraform Module Design: terraform/modules/cf-waf/

Module structure

terraform/modules/cf-waf/
├── main.tf          # cloudflare_ruleset resources
├── rate_limits.tf   # cloudflare_rate_limit per surface
├── zone_settings.tf # cloudflare_zone_setting (security level, BIC, etc.)
├── logpush.tf       # cloudflare_logpush_job destination
├── variables.tf
├── outputs.tf
└── versions.tf

Resources

main.tf — Managed rulesets and custom rules

# Phase: http_request_firewall_managed — OWASP + CF Managed
resource "cloudflare_ruleset" "managed_waf" {
  zone_id     = var.zone_id
  name        = "Managed WAF rules — ${var.surface_name}"
  description = "OWASP CRS + CF Managed Ruleset for ${var.surface_name}"
  kind        = "zone"
  phase       = "http_request_firewall_managed"

  rules {
    # CF Managed Ruleset. Managed rulesets are deployed with action "execute";
    # the log/block posture is applied as an override on the ruleset's rules.
    action = "execute"
    action_parameters {
      id = "efb7b8c949ac4650a09736fc376e9aee"
      overrides {
        action = var.managed_ruleset_action   # "log" | "block"
      }
    }
    expression  = "true"
    description = "CF Managed Ruleset"
    enabled     = true
  }

  rules {
    # OWASP Core Rule Set. A rule takes a single action_parameters block,
    # so the ruleset ID and the overrides live together.
    action = "execute"
    action_parameters {
      id = "4814384a9e5d4991b9815dcfc25d2f1f"
      overrides {
        action = var.owasp_action   # "log" | "block"
        # OWASP sensitivity: low for API surfaces, medium for SPA
        sensitivity_level = var.owasp_sensitivity
      }
    }
    expression  = "true"
    description = "OWASP Core Rule Set"
    enabled     = true
  }
}

# Phase: http_request_firewall_custom — custom rules per surface
resource "cloudflare_ruleset" "custom_waf" {
  zone_id     = var.zone_id
  name        = "Custom WAF rules — ${var.surface_name}"
  description = "Custom rules for ${var.surface_name}"
  kind        = "zone"
  phase       = "http_request_firewall_custom"

  # Rule 1: Postmark webhook bypass — allow known Postmark IPs before bot rules
  rules {
    action      = "skip"
    action_parameters { ruleset = "current" }
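    # postmark_ip_list is list(string); render it as an inline set, e.g. {192.0.2.0/24 198.51.100.7}
    expression  = "(ip.src in {${join(" ", var.postmark_ip_list)}} and http.request.uri.path eq \"/api/webhooks/postmark\")"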
    description = "Postmark inbound: skip WAF for known Postmark delivery IPs"
    enabled     = true
  }

  # Rule 2: CF Access service tokens — skip bot rules for machine callers
  rules {
    action      = "skip"
    action_parameters { rulesets = ["managed_bot_challenge"] }
    expression  = "(http.request.headers[\"cf-access-client-id\"] ne \"\")"
    description = "CF Access service tokens: skip bot challenge"
    enabled     = true
  }

  # Rule 3: Quebec signup geo-block (pre-launch blocker — ADR project_quebec_geoblock_decision)
  rules {
    action      = "block"
    # Subdivision codes are ISO 3166-2 values, so Quebec is "CA-QC" (see F10)
    expression  = "(ip.geoip.subdivision_1_iso_code eq \"CA-QC\" and http.request.uri.path contains \"/api/auth/register\")"
    description = "Quebec: block registration (Bill 96 compliance gate)"
    enabled     = var.enable_qc_block
  }

  # Rule 4: Block requests to *.herokuapp.com direct URL (belt-and-suspenders for origin guard)
  rules {
    action      = "block"
    expression  = "(http.host matches r\".*\\.herokuapp\\.com\")"
    description = "Block direct Heroku URL access (belt-and-suspenders)"
    enabled     = true
  }

  # Rule 5: Threat-score gate on auth endpoints (per-IP rate limits live in rate_limits.tf)
  rules {
    action      = var.auth_challenge_action   # "log" | "managed_challenge" | "block"
    expression  = "(http.request.uri.path contains \"/api/auth/\" and cf.threat_score gt 30)"
    description = "Elevated threat score on auth paths: challenge"
    enabled     = true
  }
}

rate_limits.tf

# Coarse per-IP rate limit — all surfaces
resource "cloudflare_rate_limit" "global" {
  zone_id   = var.zone_id
  threshold = var.global_rate_limit_threshold   # e.g. 500 req/10s per IP
  period    = 10
  match {
    request { url_pattern = "${var.zone_hostname}/*" }
  }
  action {
    mode    = var.rate_limit_action   # "simulate" | "challenge" | "ban"
    timeout = 60
  }
  description = "Global coarse rate limit — ${var.surface_name}"
}

# Auth path rate limit — tighter for /api/auth/*
resource "cloudflare_rate_limit" "auth_endpoints" {
  zone_id   = var.zone_id
  threshold = var.auth_rate_limit_threshold   # e.g. 20 req/60s per IP
  period    = 60
  match {
    request { url_pattern = "${var.zone_hostname}/api/auth/*" }
  }
  action {
    mode    = var.rate_limit_action
    timeout = 300
  }
  description = "Auth endpoints rate limit — ${var.surface_name}"
}

# Order submission rate limit — protect paper-trade paths
resource "cloudflare_rate_limit" "order_submission" {
  zone_id   = var.zone_id
  threshold = var.order_rate_limit_threshold   # e.g. 10 req/60s per IP
  period    = 60
  match {
    request {
      url_pattern = "${var.zone_hostname}/api/trading/orders"
      methods     = ["POST"]
    }
  }
  action {
    mode    = var.rate_limit_action
    timeout = 300
  }
  description = "Order submission rate limit — ${var.surface_name}"
}

zone_settings.tf

resource "cloudflare_zone_setting" "security_level" {
  zone_id = var.zone_id
  setting = "security_level"
  value   = var.security_level   # "medium" | "high" | "under_attack"
}

resource "cloudflare_zone_setting" "browser_check" {
  zone_id = var.zone_id
  setting = "browser_check"
  value   = "on"
}

resource "cloudflare_zone_setting" "challenge_ttl" {
  zone_id = var.zone_id
  setting = "challenge_ttl"
  value   = var.challenge_ttl_seconds   # operator decision: 1800 (30m) default
}

logpush.tf

resource "cloudflare_logpush_job" "waf_events" {
  zone_id          = var.zone_id
  name             = "waf-events-${var.surface_name}"
  destination_conf = var.logpush_destination_conf   # "s3://<bucket>/<path>?..."

  dataset          = "http_requests"
  filter           = "{\"where\":{\"and\":[{\"key\":\"FirewallMatchesActions\",\"operator\":\"!empty\"}]}}"

  # Fields: timestamp, ClientIP, ClientRequestHost, ClientRequestURI,
  #         ClientRequestMethod, FirewallMatchesActions, FirewallMatchesRuleIDs,
  #         EdgeResponseStatus
  # Redacted: ClientRequestBody (never exported — may contain WebAuthn data)
  # Redacted: ClientRequestHeaders["cookie"] (session tokens)
  fields = join(",", [
    "Datetime", "ClientIP", "ClientRequestHost",
    "ClientRequestURI", "ClientRequestMethod",
    "FirewallMatchesActions", "FirewallMatchesRuleIDs",
    "EdgeResponseStatus", "BotScore", "BotScoreSrc",
    "ClientASN", "ClientCountry"
  ])

  enabled = true
}
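A plan-time guard could keep the no-stored-credentials invariant from regressing when someone edits the field list. The sketch below is one possible approach and is not part of the module as designed: the resource name `logpush_field_guard`, the locals, and the contents of `forbidden_logpush_fields` are assumptions, and the forbidden names should be checked against Cloudflare's current http_requests field reference. It relies on `terraform_data` and `precondition`, which need Terraform 1.4 or later.

```hcl
locals {
  # Same list as the fields argument of cloudflare_logpush_job.waf_events.
  logpush_fields = [
    "Datetime", "ClientIP", "ClientRequestHost",
    "ClientRequestURI", "ClientRequestMethod",
    "FirewallMatchesActions", "FirewallMatchesRuleIDs",
    "EdgeResponseStatus", "BotScore", "BotScoreSrc",
    "ClientASN", "ClientCountry",
  ]

  # ASSUMPTION: names of header- and cookie-bearing Logpush fields.
  # Verify against the dataset's field reference before relying on this.
  forbidden_logpush_fields = ["Cookies", "RequestHeaders", "ResponseHeaders"]

  # Any overlap means a credential-bearing field would be exported.
  leaked_logpush_fields = setintersection(
    toset(local.logpush_fields),
    toset(local.forbidden_logpush_fields),
  )
}

# Hypothetical guard resource: it manages nothing and exists only to fail
# the plan when the field list contains a forbidden field.
resource "terraform_data" "logpush_field_guard" {
  lifecycle {
    precondition {
      condition     = length(local.leaked_logpush_fields) == 0
      error_message = "Logpush field list must not export header or cookie fields."
    }
  }
}
```

With this in place, the job's `fields` argument could read `join(",", local.logpush_fields)` so the guard and the export always see the same list.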

Variables the module exposes per surface

| Variable | Type | Purpose |
|---|---|---|
| zone_id | string | CF zone ID |
| surface_name | string | Human label (e.g. "raxx-app", "getraxx") |
| zone_hostname | string | Zone apex hostname |
| managed_ruleset_action | string | "log" / "block" |
| owasp_action | string | "log" / "block" |
| owasp_sensitivity | string | "low" / "medium" / "high" |
| auth_challenge_action | string | "log" / "managed_challenge" / "block" |
| global_rate_limit_threshold | number | Requests per 10s per IP |
| auth_rate_limit_threshold | number | Requests per 60s per IP on auth paths |
| order_rate_limit_threshold | number | Requests per 60s per IP on order paths |
| rate_limit_action | string | "simulate" / "challenge" / "ban" |
| security_level | string | CF security level |
| challenge_ttl_seconds | number | Challenge passage lifetime |
| postmark_ip_list | list(string) | Postmark delivery IP ranges |
| enable_qc_block | bool | Enable Quebec registration block |
| logpush_destination_conf | string | Logpush destination URI (from SSM) |
| bot_fight_mode | string | "off" / "on" / "super" (operator decision) |
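Because several of these variables accept only a fixed set of strings, `variables.tf` could reject typos at plan time rather than at the Cloudflare API. The following is a minimal sketch for a subset of the variables, using the allowed values from the table above; the descriptions and error messages are illustrative wording, and no defaults are set because the design does not specify any.

```hcl
variable "managed_ruleset_action" {
  type        = string
  description = "Action applied to the CF Managed Ruleset"

  validation {
    condition     = contains(["log", "block"], var.managed_ruleset_action)
    error_message = "managed_ruleset_action must be \"log\" or \"block\"."
  }
}

variable "auth_challenge_action" {
  type        = string
  description = "Action for elevated-threat-score requests on auth paths"

  validation {
    condition     = contains(["log", "managed_challenge", "block"], var.auth_challenge_action)
    error_message = "auth_challenge_action must be \"log\", \"managed_challenge\", or \"block\"."
  }
}

variable "rate_limit_action" {
  type        = string
  description = "Mode for all cloudflare_rate_limit resources in the module"

  validation {
    condition     = contains(["simulate", "challenge", "ban"], var.rate_limit_action)
    error_message = "rate_limit_action must be \"simulate\", \"challenge\", or \"ban\"."
  }
}

variable "owasp_sensitivity" {
  type        = string
  description = "OWASP sensitivity: low for API surfaces, medium for SPA"

  validation {
    condition     = contains(["low", "medium", "high"], var.owasp_sensitivity)
    error_message = "owasp_sensitivity must be \"low\", \"medium\", or \"high\"."
  }
}

variable "auth_rate_limit_threshold" {
  type        = number
  description = "Requests per 60s per IP on auth paths"

  validation {
    # Thresholds are request counts: positive whole numbers only.
    condition     = var.auth_rate_limit_threshold > 0 && floor(var.auth_rate_limit_threshold) == var.auth_rate_limit_threshold
    error_message = "auth_rate_limit_threshold must be a positive integer."
  }
}
```

The remaining enumerated variables (`owasp_action`, `bot_fight_mode`) and the other thresholds would follow the same pattern.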

Outputs

output "managed_waf_ruleset_id"  { value = cloudflare_ruleset.managed_waf.id }
output "custom_waf_ruleset_id"   { value = cloudflare_ruleset.custom_waf.id }
output "logpush_job_id"          { value = cloudflare_logpush_job.waf_events.id }
output "rate_limit_global_id"    { value = cloudflare_rate_limit.global.id }
output "rate_limit_auth_id"      { value = cloudflare_rate_limit.auth_endpoints.id }
output "rate_limit_orders_id"    { value = cloudflare_rate_limit.order_submission.id }

Per-zone instantiation

The module is instantiated twice in terraform/waf/main.tf:

module "waf_raxx_app" {
  source = "../modules/cf-waf"
  zone_id      = var.raxx_app_zone_id
  surface_name = "raxx-app"
  zone_hostname = "raxx.app"
  owasp_sensitivity        = "low"    # API surface — low reduces false positives on JSON bodies
  managed_ruleset_action   = "log"    # start log-only (Phase 1)
  auth_challenge_action    = "log"
  rate_limit_action        = "simulate"
  enable_qc_block          = true
  # ... remaining vars from tfvars
}

module "waf_getraxx" {
  source = "../modules/cf-waf"
  zone_id      = var.getraxx_zone_id
  surface_name = "getraxx"
  zone_hostname = "getraxx.com"
  owasp_sensitivity        = "medium"  # SPA/marketing — medium is fine
  managed_ruleset_action   = "log"
  auth_challenge_action    = "log"
  rate_limit_action        = "simulate"
  enable_qc_block          = false     # no signup on getraxx.com
  # ... remaining vars from tfvars
}

5. Integration with Existing Controls

5.1 FLAG_ENFORCE_CF_ORIGIN

Currently false on raxx-api-prod. This flag makes Raptor reject any request missing a valid CF-Connecting-IP header, hardening the origin.

Why it was left off: Without a WAF, CF proxying was not guaranteed to be the only traffic path. Direct-Heroku access was a legitimate fallback.

Why WAF deployment unlocks it: Once WAF rules are in block mode (Phase 4), all legitimate traffic arrives through CF. CF always injects CF-Connecting-IP on proxied requests. At that point, any request without the header is either a bypass attempt or an internal test tool — both of which should be blocked at the origin.

Order of operations:

Phase 1-3: WAF log → challenge → block on staging
Phase 4:   WAF block mode on prod (7-day soak)
Phase 5:   Flip FLAG_ENFORCE_CF_ORIGIN ON on raxx-api-prod + raxx-console-*
           (dedicated sub-card, requires operator action via heroku config:set)

The flag flip sub-card includes a smoke test: curl -I https://api.raxx.app/health must still return 200, and curl -I https://raxx-api-prod.herokuapp.com/health must return 403.

5.2 CF Access and WAF interaction

WAF (Layer 1) executes before CF Access (Layer 2) at the edge. This creates two concerns:

  1. Aggressive bot rules blocking CF Access service tokens. Service tokens send a CF-Access-Client-Id header but have no browser fingerprint, so bot detection is likely to classify them as automated traffic. Custom WAF Rule 2 (see §4) explicitly skips bot challenges for requests carrying cf-access-client-id. This applies to Velvet, CI runners, and any other machine caller using a service token.

  2. WAF blocking CF Access login redirects. CF Access login flows hit <domain>/cdn-cgi/access/ paths. These paths are automatically excluded from WAF managed rules by CF (they are CF-internal infrastructure). No explicit exclusion is needed, but feature-developer should verify this holds for the specific ruleset versions during Phase 1 soak.

5.3 App-layer rate limiter boundary

| Dimension | CF WAF rate limit | App-layer rate limit |
|---|---|---|
| Granularity | Per source IP | Per authenticated user / session |
| Coverage | All traffic, pre-auth | Post-auth only |
| Action | Block or challenge at edge | HTTP 429 + audit log |
| Context | Blind to user identity | Full RBAC context |

The two layers are complementary, not redundant. An IP-based WAF limit catches volumetric attacks before they consume Raptor dynos. A user-based app limit catches abuse by authenticated users (e.g., a logged-in user hammering order submissions).

Threshold calibration principle: WAF thresholds should be set at ~10x the expected legitimate peak for that surface. App-layer thresholds are set at the product policy level. These are independent knobs.
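As a worked example of the 10x principle, the thresholds could be derived from measured per-IP peaks instead of being typed in directly. This is a sketch only: `expected_peak_per_ip` and `waf_headroom_factor` are hypothetical names, and the peak values are assumptions back-derived from the example thresholds in rate_limits.tf (500, 20, 10), not measured traffic.

```hcl
variable "expected_peak_per_ip" {
  description = "ASSUMED legitimate per-IP peaks; replace with measured values"
  type = object({
    global_per_10s = number
    auth_per_60s   = number
    orders_per_60s = number
  })
  default = {
    global_per_10s = 50
    auth_per_60s   = 2
    orders_per_60s = 1
  }
}

locals {
  # Headroom multiplier from the calibration principle (~10x legitimate peak).
  waf_headroom_factor = 10

  # 50 * 10 = 500 requests per 10s
  global_rate_limit_threshold = ceil(var.expected_peak_per_ip.global_per_10s * local.waf_headroom_factor)
  # 2 * 10 = 20 requests per 60s
  auth_rate_limit_threshold = ceil(var.expected_peak_per_ip.auth_per_60s * local.waf_headroom_factor)
  # 1 * 10 = 10 requests per 60s
  order_rate_limit_threshold = ceil(var.expected_peak_per_ip.orders_per_60s * local.waf_headroom_factor)
}
```

Deriving the values this way keeps the WAF knob tied to observed traffic while leaving app-layer thresholds to product policy, as described above.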

5.4 Postmark and third-party webhook bypass

Postmark inbound webhook (/api/webhooks/postmark) and any future Stripe/payment webhook require bypass of aggressive bot rules. These callers are server-to-server with no browser fingerprint and no CF Access service token.

Bypass strategy: WAF skip rule (Rule 1 in custom ruleset) matches known Postmark delivery IP ranges AND the exact webhook path. The IP list is managed as a Terraform variable sourced from Postmark's published IP range document. When Postmark rotates IPs without notice (Failure Mode F5), requests from the new ranges no longer match the skip rule and are evaluated by the full WAF, where bot rules may block them until postmark_ip_list is updated. App-layer signature verification is independent of the IP list and still rejects invalid payloads, so defense in depth applies here too.
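Since the skip rule is only as good as the IP list behind it, the variable could validate its entries and the expression could be built in one place. The sketch below assumes the list holds CIDR strings; the local names and the output are hypothetical, and the addresses in the usage note are RFC 5737 documentation ranges, not Postmark's real ranges.

```hcl
variable "postmark_ip_list" {
  type        = list(string)
  description = "Postmark delivery IP ranges in CIDR notation"

  validation {
    # cidrhost() fails on anything that is not a valid IPv4/IPv6 prefix,
    # so can() turns a malformed entry into a plan-time error.
    condition = (
      length(var.postmark_ip_list) > 0 &&
      alltrue([for cidr in var.postmark_ip_list : can(cidrhost(cidr, 0))])
    )
    error_message = "postmark_ip_list must be non-empty and contain only CIDR entries, e.g. 192.0.2.10/32."
  }
}

locals {
  # Cloudflare inline sets are space-separated values inside braces.
  postmark_ip_set = format("{%s}", join(" ", var.postmark_ip_list))

  postmark_skip_expression = "(ip.src in ${local.postmark_ip_set} and http.request.uri.path eq \"/api/webhooks/postmark\")"
}

output "postmark_skip_expression" {
  value = local.postmark_skip_expression
}
```

With `postmark_ip_list = ["192.0.2.10/32", "198.51.100.0/24"]`, the rendered expression is `(ip.src in {192.0.2.10/32 198.51.100.0/24} and http.request.uri.path eq "/api/webhooks/postmark")`.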

For webhook callers that support HMAC signature validation (Stripe, etc.), signature verification at the app layer is the primary trust gate; the WAF skip is a performance optimization. Signature verification must succeed or the request is rejected at Layer 4, regardless of WAF bypass.

5.5 AWS API Gateway (execute-api.amazonaws.com)

Raxx's email delivery Lambda stack uses AWS API Gateway (execute-api.amazonaws.com). This endpoint is NOT behind Cloudflare — it is a bare AWS endpoint that callers (Postmark inbound bridge, SNS notifications) reach directly.

Decision: CF WAF does not protect execute-api.amazonaws.com. AWS WAF on the API Gateway is the appropriate control for that surface. However, as of this design, the Lambda stack is internal-to-AWS (SNS → SQS → Lambda), with the API Gateway only exposed to Postmark's inbound bridge IP range via an API Gateway resource policy. The resource policy serves as a coarse equivalent to WAF allowlisting.

Feature-developer implementing the WAF card should note this gap. A dedicated sub-card (SC-WAF-08) tracks whether AWS WAF on the email API Gateway is needed.


6. Failure Modes

| ID | Failure | Detection | Recovery | Prevention |
|---|---|---|---|---|
| F1 | WAF false positive: legitimate customer blocked (e.g., OWASP rule triggers on valid JSON body) | Customer error report; elevated 403 count in Logpush; WAF event in CF dashboard | Roll back specific rule to "log" mode via terraform apply with managed_ruleset_action = "log"; acknowledge customer support ticket | Phase 1 log-only soak for 7 days; false-positive gate <1% before advancing |
| F2 | WAF false negative: real attack passes all rules | Audit log gap analysis; anomalous order-submission spike in app metrics | Tighten specific rule; escalate to CF support for managed rule update | Overlapping layers: app-layer rate limit catches abuse that WAF misses |
| F3 | CF edge outage: WAF disappears entirely | CF status page; Heroku dyno CPU/memory spike; customer error reports | App-layer rate limiter becomes load-bearing; paper-first gate remains enforced; escalate to CF support; evaluate "under attack" mode on recovery | No single layer is load-bearing; app-layer is always active |
| F4 | WAF rate limit too tight + Stripe/payment webhook backlog → cascading payment failures | Stripe webhook delivery failure alerts; payment processing lag | Immediately set rate_limit_action = "simulate" on affected rule; expand threshold; process backlog | Webhook bypass rules in custom ruleset (Rule 1); dedicated webhook rate limit exemption |
| F5 | Postmark IP range rotates → customer support emails blocked | Postmark delivery failure bounce alerts; FreeScout ticket creation spike fails | Add new IP range to postmark_ip_list in tfvars + terraform apply; app-layer signature verification remains as fallback | HMAC signature verification is independent of IP allowlist; failed-signature requests rejected at Layer 4 regardless |
| F6 | CF Access service token blocked by bot rules (new service token not on skip list) | Service returning 403/429 on machine-caller paths; Velvet distribution failures | Add new token header pattern to skip rule; or add token's CF Access Client ID to WAF bypass expression | Rule 2 (service token skip) is broad: matches any non-empty cf-access-client-id |
| F7 | FLAG_ENFORCE_CF_ORIGIN flipped ON prematurely (before WAF Phase 4 soak) | Direct-Heroku smoke tests fail; monitoring tools using .herokuapp.com URLs break | Flip flag back to false via heroku config:set FLAG_ENFORCE_CF_ORIGIN=false; no redeploy needed | Origin guard flip is a dedicated sub-card with explicit gate criteria |
| F8 | WAF logpush destination (S3) reaches retention limit → logs deleted before forensic use | S3 lifecycle rule triggers deletion; incident investigation finds log gap | Extend S3 lifecycle rule; restore from Glacier if available | Retention period must be set before Phase 1; operator decision required (§10) |
| F9 | OWASP ruleset version update (CF auto-updates managed rules) → new false positives in prod | Spike in 403 responses; WAF event log shows new rule IDs | Set new rule to "log" mode via override; evaluate; promote back to "block" | Monitor WAF event log daily; CF changelog alerts on rule version bumps |
| F10 | Quebec geo-block rule (enable_qc_block = true) blocks legitimate non-QC customer via VPN exit node in QC | Customer reports registration failure; CF country logged as CA-QC | Operator can temporarily set enable_qc_block = false + terraform apply; advise customer to disable VPN | Accept this UX tradeoff as per project_quebec_geoblock_decision.md; geo-block is the chosen compliance path |
| F11 | Terraform state drift: WAF rule changed in CF dashboard (not via TF) → next terraform apply reverts it | terraform plan shows unexpected diff; CF dashboard vs TF state diverges | Import changed resource into TF state; document change; re-apply | Enforce IaC-only WAF changes; no direct CF dashboard edits after Phase 1 apply |
| F12 | CF Logpush IAM credentials (S3) expire → WAF log gap | No new log files in S3 bucket for >30 min; CloudWatch S3 put metrics flatline | Velvet rotates Logpush S3 credentials; re-enable logpush job | Velvet enrollment expansion (ADR-0051 Layer C) covers S3 IAM credentials |
| F13 | Bot Fight Mode flags a legitimate API client (e.g., mobile app with unusual TLS fingerprint) | Elevated bot score in Logpush; app returns CAPTCHA challenge to mobile client | Lower Bot Fight Mode strictness from "super" to "on"; or add mobile UA pattern to skip rule | Phase 0 decision: start with "on" (not "super"); validate before tightening |
| F14 | WAF custom rule expression error (syntax mistake in Terraform HCL) → terraform apply fails | TF apply error at plan or apply step | Correct HCL expression; re-apply; no customer impact because apply failed before publishing | HCL expression syntax must be tested in CF dashboard sandbox before committing to TF module |
| F15 | Logpush pushes raw session cookies in exported fields → credential leak | Security audit of Logpush field list | Immediately disable logpush job; rotate all active session tokens; audit which cookies were exported; notify affected users per GDPR breach timeline | Logpush field list in this design explicitly excludes cookie header; ADR-0002 (no stored credentials) applies to log destinations |
| F16 | Per-surface WAF thresholds are miscalibrated for a traffic spike (e.g., marketing campaign) → legitimate customers rate-limited | Elevated 429 responses; customer complaints; traffic spike correlates with marketing event | Temporarily raise global_rate_limit_threshold + terraform apply; consider pre-event threshold lift procedure | Establish threshold review procedure before planned traffic events |

7. Sequence Diagrams

Legitimate customer request (WAF pass-through)

sequenceDiagram
    participant C as Customer Browser
    participant WAF as CF Edge (WAF + Rate Limit)
    participant CFa as CF Access (operator surfaces)
    participant R as Raptor (api.raxx.app)
    participant DB as Postgres / Queue

    C->>WAF: POST /api/auth/login/verify
    WAF->>WAF: Evaluate managed + custom rules
    WAF->>WAF: Bot score check (score < threshold)
    WAF->>WAF: Rate limit check (under threshold)
    Note over WAF: PASS — request forwarded
    WAF->>R: Forward with CF-Connecting-IP injected
    R->>R: FLAG_ENFORCE_CF_ORIGIN check (CF-Connecting-IP present)
    R->>R: WebAuthn verification
    R->>DB: Write audit_log row
    R-->>C: 200 Set-Cookie: session=...

Attack blocked at edge

sequenceDiagram
    participant A as Attacker
    participant WAF as CF Edge (WAF)
    participant R as Raptor

    A->>WAF: POST /api/auth/login/verify (credential stuffing, 500 req/min)
    WAF->>WAF: Rate limit: 500 req/60s > threshold (20)
    WAF->>WAF: Log WAF event (FirewallMatchesActions: block)
    WAF-->>A: 429 Too Many Requests (Cloudflare challenge page)
    Note over R: Request never reaches Raptor

Webhook bypass

sequenceDiagram
    participant PM as Postmark Delivery
    participant WAF as CF Edge (WAF)
    participant R as Raptor

    PM->>WAF: POST /api/webhooks/postmark (from Postmark IP range)
    WAF->>WAF: Custom Rule 1: ip.src in postmark_ip_list AND path eq /api/webhooks/postmark
    WAF->>WAF: SKIP — bypass managed rules + bot challenge
    WAF->>R: Forward
    R->>R: HMAC signature verification (Postmark signing secret)
    Note over R: Signature valid → process; invalid → 403

8. Rollout Plan

Phase 0 — Operator account-level settings (operator action, no Terraform)

Gate criteria: must complete before Phase 1 Terraform apply

Phase 1 — Log-only mode on staging (Terraform apply)

Duration: 7 days minimum
Gate criteria to advance: false-positive rate on legitimate test traffic <1% of total requests

Phase 2 — Challenge mode on staging

Duration: 72 hours minimum
Gate criteria to advance: zero legitimate customer-sim flows challenged; bot/scanner traffic challenged successfully

Phase 3 — Block mode on staging

Duration: 72 hours minimum
Gate criteria to advance: zero false blocks on legitimate traffic; at least one confirmed block of a simulated attack

Phase 4 — Prod rollout (log → block, per surface)

Timeline: After Phase 3 sign-off. Target 2026-05-23 UTC (pre-launch).

| Step | Action | Duration |
|---|---|---|
| 4a | Deploy WAF module to prod zones in log mode | Day 0 |
| 4b | Monitor prod Logpush; gate: false-positive rate <1% vs staging baseline | 7 days |
| 4c | Prod → challenge mode | Day 8 |
| 4d | Monitor challenge rate; gate: no legitimate customers challenged | 48h |
| 4e | Prod → block mode | Day 11 |
| 4f | Monitor 7 days; gate: no customer support tickets attributable to WAF | 7 days |

Phase 5 — FLAG_ENFORCE_CF_ORIGIN flip

Separate sub-card (SC-WAF-07). Executes only after Phase 4f gate passes.

heroku config:set FLAG_ENFORCE_CF_ORIGIN=true -a raxx-api-prod >/dev/null 2>&1
heroku config:set FLAG_ENFORCE_CF_ORIGIN=true -a raxx-console-prod >/dev/null 2>&1

Smoke test: curl -I https://api.raxx.app/health → 200. curl -I https://raxx-api-prod.herokuapp.com/health → 403.
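The same smoke test could also run on every plan as a Terraform `check` block, so a regression in either path surfaces without a manual curl. This is a sketch under assumptions: the check and data source names are hypothetical, it needs Terraform 1.5 or later plus the hashicorp/http provider, and check failures are reported as warnings rather than blocking the apply.

```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    http = {
      source = "hashicorp/http"
    }
  }
}

# Edge path: a CF-proxied request must still succeed after the flip.
check "origin_guard_edge_path" {
  data "http" "edge_health" {
    url    = "https://api.raxx.app/health"
    method = "HEAD" # same request type as curl -I
  }

  assert {
    condition     = data.http.edge_health.status_code == 200
    error_message = "api.raxx.app/health did not return 200 through the CF edge."
  }
}

# Direct path: the origin guard must reject requests that bypass CF.
check "origin_guard_direct_path" {
  data "http" "direct_health" {
    url    = "https://raxx-api-prod.herokuapp.com/health"
    method = "HEAD"
  }

  assert {
    condition     = data.http.direct_health.status_code == 403
    error_message = "Direct Heroku URL did not return 403; origin guard may be off."
  }
}
```

Before Phase 5 the second check is expected to warn, since the flag is still off; that warning is a useful reminder of the open sub-card rather than an error.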


9. Migrations

No application schema changes. No new Postgres tables.

Terraform state:

Rollback at any phase: Set all module action variables to "log" / "simulate" and re-apply. This puts WAF in observation-only mode without removing any resources. Full rollback is terraform destroy on terraform/waf/ — removes all WAF rulesets and rate limits. CF Access and origin guard are unaffected.
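The observation-only rollback could be reduced to a single switch in the root stack, so that an operator does not have to edit each action variable under pressure. The sketch below is an assumption about how terraform/waf/ might be wired, not the designed interface: `observe_only`, the root-level action variables, and `local.effective` are hypothetical names.

```hcl
variable "observe_only" {
  type        = bool
  description = "HYPOTHETICAL kill-switch: true forces every action to log/simulate"
  default     = false
}

# Root-level action variables, normally set per rollout phase in tfvars.
variable "managed_ruleset_action" { type = string }
variable "owasp_action" { type = string }
variable "auth_challenge_action" { type = string }
variable "rate_limit_action" { type = string }

locals {
  # Values actually passed to the cf-waf module. When observe_only is true,
  # the per-phase settings are ignored and nothing is blocked or challenged.
  effective = {
    managed_ruleset_action = var.observe_only ? "log" : var.managed_ruleset_action
    owasp_action           = var.observe_only ? "log" : var.owasp_action
    auth_challenge_action  = var.observe_only ? "log" : var.auth_challenge_action
    rate_limit_action      = var.observe_only ? "simulate" : var.rate_limit_action
  }
}
```

Each module block would then read `local.effective.managed_ruleset_action` and so on, and a rollback becomes `terraform apply -var observe_only=true` with no resources removed.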


10. Operator Decisions Required (Open Questions)

OQ1 through OQ3 block Phase 0 and therefore Phase 1: feature-developer cannot start SC-WAF-01 until the operator resolves them. OQ4 gates the Phase 1 to Phase 2 transition, and OQ5 does not block Phase 1.

| # | Question | Blocking? | Stakes |
|---|---|---|---|
| OQ1 | Logpush destination: S3 bucket only, or Sentry WAF event integration too? S3 enables long-term forensics; Sentry enables real-time alerting. | Blocks Phase 0 | S3 is the baseline; Sentry adds ~$0/mo at current volume but requires Sentry project setup |
| OQ2 | WAF log retention period: 90 days (matches ADR-0051 ops log baseline) or longer for financial audit compliance? | Blocks Phase 0 | Longer retention = higher S3 cost; GDPR requires a defined retention period |
| OQ3 | Bot Fight Mode strictness: "on" vs "super"? "super" challenges more aggressively including TLS fingerprint analysis; higher false-positive risk on API clients | Blocks Phase 0 | Recommend "on" to start; revisit after Phase 1 data |
| OQ4 | Challenge vs block decision for Auth paths: should elevated-threat-score auth requests get a challenge page (adds user friction) or a hard block? | Blocks Phase 1→Phase 2 | Challenge = friction but allows legitimate users through; block = cleaner but risks false lockouts |
| OQ5 | Allow-list management process: Terraform-only (requires PR per change) vs operator can add IPs/ASNs via CF dashboard with post-hoc TF import? | Does not block Phase 1 | TF-only is safer (audit trail, drift prevention per ADR-0051); dashboard-then-import is faster for emergencies |

11. Security Considerations

PII

WAF Logpush exports include ClientIP (full IPv4/IPv6). This is PII under GDPR.
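One way to keep retention bounded is an S3 lifecycle rule on the Logpush bucket, managed alongside the WAF stack. The following is a sketch only: the variable names are hypothetical, the AWS provider is assumed to be configured elsewhere, and the 90-day default is just one of the options under OQ2, which remains an operator decision.

```hcl
variable "waf_log_bucket_name" {
  type        = string
  description = "HYPOTHETICAL: name of the S3 bucket receiving WAF Logpush output"
}

variable "waf_log_retention_days" {
  type        = number
  description = "Days to keep WAF logs; pending operator decision OQ2"
  default     = 90
}

resource "aws_s3_bucket_lifecycle_configuration" "waf_logs" {
  bucket = var.waf_log_bucket_name

  rule {
    id     = "expire-waf-logs"
    status = "Enabled"

    # Empty filter: the rule applies to every object in the bucket.
    filter {}

    # Objects (and the client IPs they contain) are deleted after the
    # retention window, which gives the GDPR retention bound a concrete value.
    expiration {
      days = var.waf_log_retention_days
    }
  }
}
```

Setting the window here, rather than in the console, keeps the retention period reviewable in the same PR flow as the WAF rules.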

Credential replay risk

WAF does not store credentials. The only credential-adjacent data in WAF logs is ClientIP and path (e.g., /api/auth/login/verify). Neither can be used to replay an authentication attempt.

If WAF logpush is compromised (S3 bucket exposed), an attacker learns which IPs authenticated when — a correlation attack, not a credential replay. GDPR breach notification applies per ADR-0003. The breachNotification flow must be triggered within 72 hours of confirmed S3 exposure.

Kill-switch

Per layer:

| Layer | Kill-switch | Time to effect |
|---|---|---|
| WAF rules | terraform apply with all actions set to "log" | ~30s (CF propagates ruleset changes globally) |
| Rate limits | terraform apply with rate_limit_action = "simulate" | ~30s |
| Origin guard | heroku config:set FLAG_ENFORCE_CF_ORIGIN=false | ~10s (no redeploy) |
| Logpush | cloudflare_logpush_job.enabled = false + apply | ~30s |

Secret rotation

AWS WAF gap

execute-api.amazonaws.com (email delivery API Gateway) is not CF-proxied. This design does not cover that surface. SC-WAF-08 tracks the evaluation.


12. Sub-cards

See §8 (rollout) for sequencing. Cards listed here for reference:

| Card | Title | Phase |
|---|---|---|
| SC-WAF-00 | Phase 0 operator actions: CF account WAF settings + Logpush destination | Operator prerequisite |
| SC-WAF-01 | Terraform: terraform/modules/cf-waf/ module + terraform/waf/ root stack, log mode | Phase 1 |
| SC-WAF-02 | Custom rules per surface: QC geo-block, service-token bypass, webhook bypass | Phase 1 (part of SC-WAF-01 or follow-on) |
| SC-WAF-03 | WAF log-only soak: Logpush → S3, false-positive analysis, staging review | Phase 1 soak |
| SC-WAF-04 | Cutover to challenge mode on staging | Phase 2 |
| SC-WAF-05 | Cutover to block mode on staging + prod log → block rollout | Phase 3 + 4 |
| SC-WAF-06 | Synthetic probes: per-surface flows that must pass WAF without challenge | Parallel with Phase 1 |
| SC-WAF-07 | FLAG_ENFORCE_CF_ORIGIN flip on raxx-api-prod + raxx-console-prod | Phase 5 (post Phase 4f) |
| SC-WAF-08 | Evaluation: AWS WAF on email delivery API Gateway | Independent |
| SC-WAF-09 | Velvet: enroll Logpush S3 IAM credentials in rotation | Independent (pairs with ADR-0051 SC-N6) |
SC-WAF-09 Velvet: enroll Logpush S3 IAM credentials in rotation Independent (pairs with ADR-0051 SC-N6)