Status: Draft
Date: 2026-05-11 UTC
Owner: software-architect
ADR: 0077
Refs:
- ADR-0031 — surface classes
- ADR-0042 — auth posture, CF Access service tokens
- ADR-0051 — layered controls pattern
- raxx-app-track-b.md — FLAG_ENFORCE_CF_ORIGIN origin guard context
- docs/security/waf-threat-model-2026-05-12.md — security-agent threat model (cross-reference when landed)
- Existing Terraform: terraform/modules/cf-access-getraxx/, terraform/cf-access/
Raxx exposes three zone-level perimeters to the internet:
| Zone | Primary surface | Surface class (ADR-0031) |
|---|---|---|
| raxx.app | Antlers SPA + api.raxx.app (Raptor) | Class 1 (customer-facing) |
| getraxx.com | Marketing / pre-launch site | Class 1 (pre-launch CF-Access gated) |
| Operator surfaces (console.raxx.app, vault.raxx.app, tickets.raxx.app) | Console, Infisical vault, FreeScout | Class 2/3 (operator, CF Access gated) |
As of 2026-05-11, the edge has Cloudflare proxying, CF Access for operator surfaces, and a feature-flagged origin guard (FLAG_ENFORCE_CF_ORIGIN, currently OFF on raxx-api-prod). There is no CF WAF ruleset in place. Application-layer rate limiting exists in the Flask middleware, but the edge applies no coarse rate limiting, no bot management, no OWASP ruleset, and no geo-blocking beyond what CF Access provides on gated surfaces.
This gap is pre-launch-critical because:
- api.raxx.app carries order-submission and WebAuthn credential exchange; both attract credential stuffing and brute-force attacks.
- FLAG_ENFORCE_CF_ORIGIN ON: if all legitimate traffic arrives through CF, rejecting direct-Heroku requests adds a hard layer with zero customer impact.

All project-level invariants apply to this design. The ones most relevant to WAF:

feedback_aws_workloads_use_ssm_not_vault.md).

Internet (attacker / legitimate traffic)
|
v
┌─────────────────────────────────────────────────────────────────┐
│ LAYER 1: Cloudflare Edge — WAF + Bot Management + Rate Limit │
│ • OWASP Core Rule Set (managed) │
│ • CF Managed Ruleset (CVE/threat-intel) │
│ • Custom rules: geo-block QC signup, webhook bypass │
│ • Bot Fight Mode (configurable challenge threshold) │
│ • Per-surface coarse rate limits (per-IP) │
│ • Log-only → challenge → block per rollout phase │
└────────────────────────┬────────────────────────────────────────┘
│ (only non-blocked traffic passes)
v
┌─────────────────────────────────────────────────────────────────┐
│ LAYER 2: CF Access — operator surfaces only │
│ • Class 2/3 surfaces: Google Workspace IDP + MFA gate │
│ • Class 1 (raxx.app, getraxx.com): no CF Access gate │
│ • Service tokens for machine callers (Velvet, CI) │
│ NOTE: WAF runs before CF Access. Service tokens must be on │
│ WAF skip-list to avoid bot-rule false positives. │
└────────────────────────┬────────────────────────────────────────┘
│
v
┌─────────────────────────────────────────────────────────────────┐
│ LAYER 3: Heroku origin guard — FLAG_ENFORCE_CF_ORIGIN │
│ • Raptor rejects requests lacking CF-Connecting-IP header │
│ • Flip to ON only after WAF Phase 4 soak (see §8) │
│ • Console (raxx-console-*) same pattern │
│ • Direct-Heroku URL access blocked at origin │
└────────────────────────┬────────────────────────────────────────┘
│
v
┌─────────────────────────────────────────────────────────────────┐
│ LAYER 4: App-layer — Flask middleware │
│ • Fine-grained rate limiting: per-user / per-session │
│ • Idempotency keys on order submission │
│ • Audit log: every state change │
│ • WebAuthn RP origin validation (FIDO2 spec invariant) │
│ • RBAC enforcement (Queue) │
└────────────────────────┬────────────────────────────────────────┘
│
v
┌─────────────────────────────────────────────────────────────────┐
│ LAYER 5: Data — Postgres + RBAC │
│ • Row-level access checks via RBAC model (ADR-0020) │
│ • Append-only audit table (KMS hash chain — ADR-0022) │
│ • No plaintext credentials ever persisted │
└─────────────────────────────────────────────────────────────────┘
| Layer | Catches | Passes through (load on next layer) |
|---|---|---|
| 1 — CF WAF | Known attack signatures, OWASP Top-10 patterns, volumetric attacks, geo-restricted signup, QC block, bot crawlers, webhook IP spoofing | Legitimate traffic, novel attacks not in managed rulesets, highly distributed low-volume attacks |
| 2 — CF Access | Unauthenticated operator-surface access, non-allowlisted Google accounts | Machine callers with valid service tokens, authenticated operators |
| 3 — Origin guard | Direct-to-Heroku requests bypassing CF edge (once flag is ON) | All CF-proxied traffic (the flag gates this layer) |
| 4 — App middleware | Per-user abuse, session-level rate limits, malformed auth payloads, CSRF | Authenticated requests that pass rate limits |
| 5 — DB | Privilege escalation attempts, unauthorized row access | Authorized data reads and writes |
Design invariant: no single layer is load-bearing. F3 (CF outage) is the canonical test: Layer 4 rate limiting must absorb abuse-level traffic if Layer 1 disappears, even if it serves a degraded experience.
terraform/modules/cf-waf/

terraform/modules/cf-waf/
├── main.tf # cloudflare_ruleset resources
├── rate_limits.tf # cloudflare_rate_limit per surface
├── zone_settings.tf # cloudflare_zone_setting (security level, BIC, etc.)
├── logpush.tf # cloudflare_logpush_job destination
├── variables.tf
├── outputs.tf
└── versions.tf
main.tf: Managed rulesets and custom rules

# Phase: http_request_firewall_managed (OWASP + CF Managed)
resource "cloudflare_ruleset" "managed_waf" {
  zone_id     = var.zone_id
  name        = "Managed WAF rules - ${var.surface_name}"
  description = "OWASP CRS + CF Managed Ruleset for ${var.surface_name}"
  kind        = "zone"
  phase       = "http_request_firewall_managed"

  rules {
    # CF Managed Ruleset. A managed ruleset is deployed with the "execute"
    # action; the log/block posture is applied as an override.
    action = "execute"
    action_parameters {
      id = "efb7b8c949ac4650a09736fc376e9aee"
      overrides {
        action = var.managed_ruleset_action # "log" | "block"
      }
    }
    expression  = "true"
    description = "CF Managed Ruleset"
    enabled     = true
  }

  rules {
    # OWASP Core Rule Set. Only one action_parameters block is allowed per
    # rule, so the ruleset id and the overrides live in the same block.
    action = "execute"
    action_parameters {
      id = "4814384a9e5d4991b9815dcfc25d2f1f"
      overrides {
        action = var.owasp_action # "log" | "block"
        # OWASP sensitivity: low for API surfaces, medium for SPA
        sensitivity_level = var.owasp_sensitivity
      }
    }
    expression  = "true"
    description = "OWASP Core Rule Set"
    enabled     = true
  }
}
# Phase: http_request_firewall_custom (custom rules per surface)
resource "cloudflare_ruleset" "custom_waf" {
  zone_id     = var.zone_id
  name        = "Custom WAF rules - ${var.surface_name}"
  description = "Custom rules for ${var.surface_name}"
  kind        = "zone"
  phase       = "http_request_firewall_custom"

  # Rule 1: Postmark webhook bypass. Allow known Postmark IPs before bot rules.
  rules {
    action = "skip"
    action_parameters { ruleset = "current" }
    # postmark_ip_list is a list(string), which cannot be interpolated into a
    # string directly; join it into an inline set of the form {a b c}.
    expression  = "(ip.src in {${join(" ", var.postmark_ip_list)}} and http.request.uri.path eq \"/api/webhooks/postmark\")"
    description = "Postmark inbound: skip WAF for known Postmark delivery IPs"
    enabled     = true
  }

  # Rule 2: CF Access service tokens. Skip bot rules for machine callers.
  rules {
    action = "skip"
    action_parameters { rulesets = ["managed_bot_challenge"] }
    expression  = "(http.request.headers[\"cf-access-client-id\"] ne \"\")"
    description = "CF Access service tokens: skip bot challenge"
    enabled     = true
  }

  # Rule 3: Quebec signup geo-block (pre-launch blocker, ADR project_quebec_geoblock_decision).
  # Confirm the subdivision value format during the Phase 1 log soak; F10 records it as CA-QC.
  rules {
    action      = "block"
    expression  = "(ip.geoip.subdivision_1_iso_code eq \"QC\" and http.request.uri.path contains \"/api/auth/register\")"
    description = "Quebec: block registration (Bill 96 compliance gate)"
    enabled     = var.enable_qc_block
  }

  # Rule 4: Block requests to *.herokuapp.com direct URL (belt-and-suspenders for origin guard)
  rules {
    action      = "block"
    expression  = "(http.host matches r\".*\\.herokuapp\\.com\")"
    description = "Block direct Heroku URL access (belt-and-suspenders)"
    enabled     = true
  }

  # Rule 5: Path-based rate-limit trigger for auth endpoints (see rate_limits.tf)
  rules {
    action      = var.auth_challenge_action # "log" | "managed_challenge" | "block"
    expression  = "(http.request.uri.path contains \"/api/auth/\" and cf.threat_score gt 30)"
    description = "Elevated threat score on auth paths: challenge"
    enabled     = true
  }
}
rate_limits.tf

# Coarse per-IP rate limit (all surfaces)
resource "cloudflare_rate_limit" "global" {
  zone_id   = var.zone_id
  threshold = var.global_rate_limit_threshold # e.g. 500 req/10s per IP
  period    = 10
  match {
    request { url_pattern = "${var.zone_hostname}/*" }
  }
  action {
    mode    = var.rate_limit_action # "simulate" | "challenge" | "ban"
    timeout = 60
  }
  description = "Global coarse rate limit - ${var.surface_name}"
}

# Auth path rate limit (tighter for /api/auth/*)
resource "cloudflare_rate_limit" "auth_endpoints" {
  zone_id   = var.zone_id
  threshold = var.auth_rate_limit_threshold # e.g. 20 req/60s per IP
  period    = 60
  match {
    request { url_pattern = "${var.zone_hostname}/api/auth/*" }
  }
  action {
    mode    = var.rate_limit_action
    timeout = 300
  }
  description = "Auth endpoints rate limit - ${var.surface_name}"
}

# Order submission rate limit (protect paper-trade paths)
resource "cloudflare_rate_limit" "order_submission" {
  zone_id   = var.zone_id
  threshold = var.order_rate_limit_threshold # e.g. 10 req/60s per IP
  period    = 60
  match {
    request {
      url_pattern = "${var.zone_hostname}/api/trading/orders"
      methods     = ["POST"]
    }
  }
  action {
    mode    = var.rate_limit_action
    timeout = 300
  }
  description = "Order submission rate limit - ${var.surface_name}"
}
zone_settings.tf

resource "cloudflare_zone_setting" "security_level" {
  zone_id = var.zone_id
  setting = "security_level"
  value   = var.security_level # "medium" | "high" | "under_attack"
}

resource "cloudflare_zone_setting" "browser_check" {
  zone_id = var.zone_id
  setting = "browser_check"
  value   = "on"
}

resource "cloudflare_zone_setting" "challenge_ttl" {
  zone_id = var.zone_id
  setting = "challenge_ttl"
  value   = var.challenge_ttl_seconds # operator decision: 1800 (30m) default
}
logpush.tf

resource "cloudflare_logpush_job" "waf_events" {
  zone_id          = var.zone_id
  name             = "waf-events-${var.surface_name}"
  destination_conf = var.logpush_destination_conf # "s3://<bucket>/<path>?..."
  dataset          = "http_requests"
  filter           = "{\"where\":{\"and\":[{\"key\":\"FirewallMatchesActions\",\"operator\":\"!empty\"}]}}"

  # Fields: timestamp, ClientIP, ClientRequestHost, ClientRequestURI,
  #         ClientRequestMethod, FirewallMatchesActions, FirewallMatchesRuleIDs,
  #         EdgeResponseStatus
  # Redacted: ClientRequestBody (never exported; may contain WebAuthn data)
  # Redacted: ClientRequestHeaders["cookie"] (session tokens)
  fields = join(",", [
    "Datetime", "ClientIP", "ClientRequestHost",
    "ClientRequestURI", "ClientRequestMethod",
    "FirewallMatchesActions", "FirewallMatchesRuleIDs",
    "EdgeResponseStatus", "BotScore", "BotScoreSrc",
    "ClientASN", "ClientCountry"
  ])
  enabled = true
}
| Variable | Type | Purpose |
|---|---|---|
| zone_id | string | CF zone ID |
| surface_name | string | Human label (e.g. "raxx-app", "getraxx") |
| zone_hostname | string | Zone apex hostname |
| managed_ruleset_action | string | "log" / "block" |
| owasp_action | string | "log" / "block" |
| owasp_sensitivity | string | "low" / "medium" / "high" |
| auth_challenge_action | string | "log" / "managed_challenge" / "block" |
| global_rate_limit_threshold | number | Requests per 10s per IP |
| auth_rate_limit_threshold | number | Requests per 60s per IP on auth paths |
| order_rate_limit_threshold | number | Requests per 60s per IP on order paths |
| rate_limit_action | string | "simulate" / "challenge" / "ban" |
| security_level | string | CF security level |
| challenge_ttl_seconds | number | Challenge passage lifetime |
| postmark_ip_list | list(string) | Postmark delivery IP ranges |
| enable_qc_block | bool | Enable Quebec registration block |
| logpush_destination_conf | string | Logpush destination URI (from SSM) |
| bot_fight_mode | string | "off" / "on" / "super" (operator decision) |
output "managed_waf_ruleset_id" { value = cloudflare_ruleset.managed_waf.id }
output "custom_waf_ruleset_id" { value = cloudflare_ruleset.custom_waf.id }
output "logpush_job_id" { value = cloudflare_logpush_job.waf_events.id }
output "rate_limit_global_id" { value = cloudflare_rate_limit.global.id }
output "rate_limit_auth_id" { value = cloudflare_rate_limit.auth_endpoints.id }
output "rate_limit_orders_id" { value = cloudflare_rate_limit.order_submission.id }
The module is instantiated twice in terraform/waf/main.tf:
module "waf_raxx_app" {
  source                 = "../modules/cf-waf"
  zone_id                = var.raxx_app_zone_id
  surface_name           = "raxx-app"
  zone_hostname          = "raxx.app"
  owasp_sensitivity      = "low" # API surface: low reduces false positives on JSON bodies
  managed_ruleset_action = "log" # start log-only (Phase 1)
  auth_challenge_action  = "log"
  rate_limit_action      = "simulate"
  enable_qc_block        = true
  # ... remaining vars from tfvars
}

module "waf_getraxx" {
  source                 = "../modules/cf-waf"
  zone_id                = var.getraxx_zone_id
  surface_name           = "getraxx"
  zone_hostname          = "getraxx.com"
  owasp_sensitivity      = "medium" # SPA/marketing: medium is fine
  managed_ruleset_action = "log"
  auth_challenge_action  = "log"
  rate_limit_action      = "simulate"
  enable_qc_block        = false # no signup on getraxx.com
  # ... remaining vars from tfvars
}
FLAG_ENFORCE_CF_ORIGIN

Currently false on raxx-api-prod. This flag makes Raptor reject any request missing a valid CF-Connecting-IP header, hardening the origin.
Why it was left off: Without a WAF, CF proxying was not guaranteed to be the only traffic path. Direct-Heroku access was a legitimate fallback.
Why WAF deployment unlocks it: Once WAF rules are in block mode (Phase 4), all legitimate traffic arrives through CF. CF always injects CF-Connecting-IP on proxied requests. At that point, any request without the header is either a bypass attempt or an internal test tool — both of which should be blocked at the origin.
Order of operations:
Phase 1-3: WAF log → challenge → block on staging
Phase 4: WAF block mode on prod (7-day soak)
Phase 5: Flip FLAG_ENFORCE_CF_ORIGIN ON on raxx-api-prod + raxx-console-*
(dedicated sub-card, requires operator action via heroku config:set)
The flag flip sub-card includes a smoke test: curl -I https://api.raxx.app/health must still return 200, and curl -I https://raxx-api-prod.herokuapp.com/health must return 403.
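The origin guard check described in this section can be sketched as a pure function. This is a minimal illustration rather than Raptor's actual middleware, which the source does not show: the function name `origin_guard_status` is hypothetical, and treating "valid" as "parses as an IPv4 or IPv6 address" is an assumption.

```python
import ipaddress
from typing import Mapping


def origin_guard_status(headers: Mapping[str, str], enforce: bool) -> int:
    """Return the status the guard would yield: 200 to continue, 403 to reject.

    Hypothetical helper. `enforce` stands in for FLAG_ENFORCE_CF_ORIGIN.
    """
    if not enforce:
        # Flag OFF: the guard is inert and every request continues.
        return 200
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    client_ip = lowered.get("cf-connecting-ip", "").strip()
    try:
        # Assumption: a "valid" header is one that parses as an IP address.
        ipaddress.ip_address(client_ip)
    except ValueError:
        return 403
    return 200
```

With the flag ON, a proxied request carrying `CF-Connecting-IP: 203.0.113.7` continues, while a direct-Heroku request without the header is rejected with 403, which matches the two smoke-test expectations above. A direct caller can set this header itself, so the sketch mirrors the check only as the section describes it.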
WAF (Layer 1) executes before CF Access (Layer 2) at the edge. This creates two concerns:
1. Aggressive bot rules blocking CF Access service tokens. Service tokens send a CF-Access-Client-Id header but have no browser fingerprint, so bot detection is likely to classify them as automated traffic. Custom WAF Rule 2 (see §4) explicitly skips bot challenges for requests carrying cf-access-client-id. This applies to Velvet, CI runners, and any other machine caller using a service token.
2. WAF blocking CF Access login redirects. CF Access login flows hit <domain>/cdn-cgi/access/ paths. These paths are automatically excluded from WAF managed rules by CF (they are CF-internal infrastructure). No explicit exclusion is needed, but feature-developer should verify this holds for the specific ruleset versions during Phase 1 soak.
| Dimension | CF WAF rate limit | App-layer rate limit |
|---|---|---|
| Granularity | Per source IP | Per authenticated user / session |
| Coverage | All traffic, pre-auth | Post-auth only |
| Action | Block or challenge at edge | HTTP 429 + audit log |
| Context | Blind to user identity | Full RBAC context |
The two layers are complementary, not redundant. An IP-based WAF limit catches volumetric attacks before they consume Raptor dynos. A user-based app limit catches abuse by authenticated users (e.g., a logged-in user hammering order submissions).
Threshold calibration principle: WAF thresholds should be set at ~10x the expected legitimate peak for that surface. App-layer thresholds are set at the product policy level. These are independent knobs.
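As a worked example of the calibration principle, the arithmetic can be written as a small helper. This is a sketch under stated assumptions: the function name and the per-IP peak figures are hypothetical, chosen only so that the results line up with the example thresholds quoted in the rate_limits.tf comments.

```python
import math


def waf_threshold(peak_requests_per_period: float, headroom: float = 10.0) -> int:
    """Edge threshold as a multiple of the expected legitimate per-IP peak.

    `peak_requests_per_period` is the busiest legitimate request count from a
    single IP within one rate-limit period (10s or 60s in this design).
    """
    if peak_requests_per_period <= 0:
        raise ValueError("peak_requests_per_period must be positive")
    # Round up so a fractional peak never yields a threshold below the target.
    return math.ceil(peak_requests_per_period * headroom)


# Hypothetical peaks: 50 requests per 10s globally, 2 per 60s on auth paths.
global_threshold = waf_threshold(50)
auth_threshold = waf_threshold(2)
```

Under these assumed peaks the helper returns 500 and 20. The app-layer limits are not derived this way; they come from product policy, which is why the two knobs stay independent.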
Postmark inbound webhook (/api/webhooks/postmark) and any future Stripe/payment webhook require bypass of aggressive bot rules. These callers are server-to-server with no browser fingerprint and no CF Access service token.
Bypass strategy: WAF skip rule (Rule 1 in custom ruleset) matches known Postmark delivery IP ranges AND the exact webhook path. The IP list is managed as a Terraform variable sourced from Postmark's published IP range document. When Postmark rotates IPs without notice (Failure Mode F5), the skip rule stops matching and the request is evaluated by the regular WAF and bot rules, where it may be challenged or blocked until postmark_ip_list is updated. App-layer signature verification is independent of the IP allowlist and still rejects invalid payloads, so defense in depth applies here too.
For webhook callers that support HMAC signature validation (Stripe, etc.), signature verification at the app layer is the primary trust gate; the WAF skip is a performance optimization. Signature verification must succeed or the request is rejected at Layer 4, regardless of WAF bypass.
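The Layer 4 trust gate described above can be sketched with the standard library. This is an illustration only: the function name is hypothetical, and HMAC-SHA256 over the raw body with a hex-encoded signature is an assumption, since each provider defines its own signing scheme and header.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, body: bytes, received_hex: str) -> bool:
    """Check a hex-encoded HMAC-SHA256 signature over the raw request body.

    Hypothetical helper; the scheme (SHA-256, hex, whole body) is assumed.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    candidate = received_hex.strip().lower()
    # compare_digest runs in constant time, so the comparison does not leak
    # how many leading characters of the signature were correct.
    return hmac.compare_digest(expected.encode(), candidate.encode())
```

The check runs on the raw bytes as received, before any JSON parsing, and a failed check leads to rejection whether or not the request matched the WAF skip rule.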
Raxx's email delivery Lambda stack uses AWS API Gateway (execute-api.amazonaws.com). This endpoint is NOT behind Cloudflare — it is a bare AWS endpoint that callers (Postmark inbound bridge, SNS notifications) reach directly.
Decision: CF WAF does not protect execute-api.amazonaws.com. AWS WAF on the API Gateway is the appropriate control for that surface. However, as of this design, the Lambda stack is internal-to-AWS (SNS → SQS → Lambda), with the API Gateway only exposed to Postmark's inbound bridge IP range via an API Gateway resource policy. The resource policy serves as a coarse equivalent to WAF allowlisting.
Feature-developer implementing the WAF card should note this gap. A dedicated sub-card (SC-WAF-08) tracks whether AWS WAF on the email API Gateway is needed.
| ID | Failure | Detection | Recovery | Prevention |
|---|---|---|---|---|
| F1 | WAF false positive: legitimate customer blocked (e.g., OWASP rule triggers on valid JSON body) | Customer error report; elevated 403 count in Logpush; WAF event in CF dashboard | Roll back specific rule to "log" mode via terraform apply with managed_ruleset_action = "log"; acknowledge customer support ticket | Phase 1 log-only soak for 7 days; false-positive gate <1% before advancing |
| F2 | WAF false negative: real attack passes all rules | Audit log gap analysis; anomalous order-submission spike in app metrics | Tighten specific rule; escalate to CF support for managed rule update | Overlapping layers — app-layer rate limit catches abuse that WAF misses |
| F3 | CF edge outage: WAF disappears entirely | CF status page; Heroku dyno CPU/memory spike; customer error reports | App-layer rate limiter becomes load-bearing; paper-first gate remains enforced; escalate to CF support; evaluate "under attack" mode on recovery | No single layer is load-bearing; app-layer is always active |
| F4 | WAF rate limit too tight + Stripe/payment webhook backlog → cascading payment failures | Stripe webhook delivery failure alerts; payment processing lag | Immediately set rate_limit_action = "simulate" on affected rule; expand threshold; process backlog | Webhook bypass rules in custom ruleset (Rule 1); dedicated webhook rate limit exemption |
| F5 | Postmark IP range rotates → customer support emails blocked | Postmark delivery failure bounce alerts; FreeScout ticket creation spike fails | Add new IP range to postmark_ip_list in tfvars + terraform apply; app-layer signature verification remains as fallback | HMAC signature verification is independent of IP allowlist; failed-signature requests rejected at Layer 4 regardless |
| F6 | CF Access service token blocked by bot rules (new service token not on skip list) | Service returning 403/429 on machine-caller paths; Velvet distribution failures | Add new token header pattern to skip rule; or add token's CF Access Client ID to WAF bypass expression | Rule 2 (service token skip) is broad — matches any non-empty cf-access-client-id |
| F7 | FLAG_ENFORCE_CF_ORIGIN flipped ON prematurely (before WAF Phase 4 soak) | Direct-Heroku smoke tests fail; monitoring tools using .herokuapp.com URLs break | Flip flag back to false via heroku config:set FLAG_ENFORCE_CF_ORIGIN=false; no redeploy needed | Origin guard flip is a dedicated sub-card with explicit gate criteria |
| F8 | WAF logpush destination (S3) reaches retention limit → logs deleted before forensic use | S3 lifecycle rule triggers deletion; incident investigation finds log gap | Extend S3 lifecycle rule; restore from Glacier if available | Retention period must be set before Phase 1; operator decision required (§10) |
| F9 | OWASP ruleset version update (CF auto-updates managed rules) → new false positives in prod | Spike in 403 responses; WAF event log shows new rule IDs | Set new rule to "log" mode via override; evaluate; promote back to "block" | Monitor WAF event log daily; CF changelog alerts on rule version bumps |
| F10 | Quebec geo-block rule (enable_qc_block = true) blocks legitimate non-QC customer via VPN exit node in QC | Customer reports registration failure; CF country logged as CA-QC | Operator can temporarily set enable_qc_block = false + terraform apply; advise customer to disable VPN | Accept this UX tradeoff as per project_quebec_geoblock_decision.md; geo-block is the chosen compliance path |
| F11 | Terraform state drift: WAF rule changed in CF dashboard (not via TF) → next terraform apply reverts it | terraform plan shows unexpected diff; CF dashboard vs TF state diverges | Import changed resource into TF state; document change; re-apply | Enforce IaC-only WAF changes; no direct CF dashboard edits after Phase 1 apply |
| F12 | CF Logpush IAM credentials (S3) expire → WAF log gap | No new log files in S3 bucket for >30 min; CloudWatch S3 put metrics flatline | Velvet rotates Logpush S3 credentials; re-enable logpush job | Velvet enrollment expansion (ADR-0051 Layer C) covers S3 IAM credentials |
| F13 | Bot Fight Mode flags a legitimate API client (e.g., mobile app with unusual TLS fingerprint) | Elevated bot score in Logpush; app returns CAPTCHA challenge to mobile client | Lower Bot Fight Mode strictness from "super" to "on"; or add mobile UA pattern to skip rule | Phase 0 decision: start with "on" (not "super"); validate before tightening |
| F14 | WAF custom rule expression error (syntax mistake in Terraform HCL) → terraform apply fails | TF apply error at plan or apply step | Correct HCL expression; re-apply; no customer impact because apply failed before publishing | HCL expression syntax must be tested in CF dashboard sandbox before committing to TF module |
| F15 | Logpush pushes raw session cookies in exported fields → credential leak | Security audit of Logpush field list | Immediately disable logpush job; rotate all active session tokens; audit which cookies were exported; notify affected users per GDPR breach timeline | Logpush field list in this design explicitly excludes cookie header; ADR-0002 (no stored credentials) applies to log destinations |
| F16 | Per-surface WAF thresholds are miscalibrated for a traffic spike (e.g., marketing campaign) → legitimate customers rate-limited | Elevated 429 responses; customer complaints; traffic spike correlates with marketing event | Temporarily raise global_rate_limit_threshold + terraform apply; consider pre-event threshold lift procedure | Establish threshold review procedure before planned traffic events |
sequenceDiagram
participant C as Customer Browser
participant WAF as CF Edge (WAF + Rate Limit)
participant CFa as CF Access (operator surfaces)
participant R as Raptor (api.raxx.app)
participant DB as Postgres / Queue
C->>WAF: POST /api/auth/login/verify
WAF->>WAF: Evaluate managed + custom rules
WAF->>WAF: Bot score check (score < threshold)
WAF->>WAF: Rate limit check (under threshold)
Note over WAF: PASS — request forwarded
WAF->>R: Forward with CF-Connecting-IP injected
R->>R: FLAG_ENFORCE_CF_ORIGIN check (CF-Connecting-IP present)
R->>R: WebAuthn verification
R->>DB: Write audit_log row
R-->>C: 200 Set-Cookie: session=...
sequenceDiagram
participant A as Attacker
participant WAF as CF Edge (WAF)
participant R as Raptor
A->>WAF: POST /api/auth/login/verify (credential stuffing, 500 req/min)
WAF->>WAF: Rate limit: 500 req/60s > threshold (20)
WAF->>WAF: Log WAF event (FirewallMatchesActions: block)
WAF-->>A: 429 Too Many Requests (Cloudflare challenge page)
Note over R: Request never reaches Raptor
sequenceDiagram
participant PM as Postmark Delivery
participant WAF as CF Edge (WAF)
participant R as Raptor
PM->>WAF: POST /api/webhooks/postmark (from Postmark IP range)
WAF->>WAF: Custom Rule 1: ip.src in postmark_ip_list AND path eq /api/webhooks/postmark
WAF->>WAF: SKIP — bypass managed rules + bot challenge
WAF->>R: Forward
R->>R: HMAC signature verification (Postmark signing secret)
Note over R: Signature valid → process; invalid → 403
Phase 0

Gate criteria: must complete before Phase 1 Terraform apply

- raxx-prod AWS account, or Sentry, or both. (Operator decision required; see §10.) Create S3 bucket + IAM write credentials in SSM before Phase 1.
- CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN must have Zone:WAF:Edit and Zone:Logs:Edit scopes. Add scopes if missing (per reference_cloudflare_tokens.md; do not confuse with DNS edit token).
- raxx.app and getraxx.com against what is in Infisical at /MooseQuest/cloudflare/.

Phase 1 (log mode)

Duration: 7 days minimum

Gate criteria to advance: false-positive rate on legitimate test traffic <1% of total requests

- terraform/modules/cf-waf/ with all actions set to "log" / "simulate".
- FirewallMatchesRuleIDs with legitimate request patterns.
- owasp_sensitivity and any per-rule overrides as needed.

Phase 2 (challenge mode)

Duration: 72 hours minimum

Gate criteria to advance: zero legitimate customer-sim flows challenged; bot/scanner traffic challenged successfully

- managed_ruleset_action = "managed_challenge", rate_limit_action = "challenge".

Phase 3 (block mode)

Duration: 72 hours minimum

Gate criteria to advance: zero false blocks on legitimate traffic; at least one confirmed block of a simulated attack

- managed_ruleset_action = "block", rate_limit_action = "ban".

Phase 4 (prod rollout)

Timeline: After Phase 3 sign-off. Target 2026-05-23 UTC (pre-launch).
| Step | Action | Duration |
|---|---|---|
| 4a | Deploy WAF module to prod zones in log mode | Day 0 |
| 4b | Monitor prod Logpush; gate: false-positive rate <1% vs staging baseline | 7 days |
| 4c | Prod → challenge mode | Day 8 |
| 4d | Monitor challenge rate; gate: no legitimate customers challenged | 48h |
| 4e | Prod → block mode | Day 11 |
| 4f | Monitor 7 days; gate: no customer support tickets attributable to WAF | 7 days |
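The <1% false-positive gate used in the log-mode steps can be computed from exported WAF events. The sketch below is one way to do it under stated assumptions: legitimate traffic is identified by a set of known synthetic-probe source IPs (a hypothetical labelling method), each event is a dict keyed by the Logpush field names from logpush.tf, and the total request count comes from a separate source because the Logpush filter exports only requests that matched a rule.

```python
from typing import Iterable, Mapping, Set


def false_positive_rate(
    events: Iterable[Mapping[str, object]],
    probe_ips: Set[str],
    total_requests: int,
) -> float:
    """Share of all requests that were known-good probes flagged by the WAF."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    false_positives = sum(
        1
        for event in events
        # A false positive is a probe request with at least one matched action.
        if event.get("ClientIP") in probe_ips and event.get("FirewallMatchesActions")
    )
    return false_positives / total_requests


def gate_passes(rate: float, limit: float = 0.01) -> bool:
    """Advance only when the rate is strictly below the 1% limit."""
    return rate < limit
```

For example, one flagged probe request out of 200 total requests gives a rate of 0.005 and passes the gate; the same single event out of 50 total requests gives 0.02 and does not.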
FLAG_ENFORCE_CF_ORIGIN flip

Separate sub-card (SC-WAF-07). Executes only after Phase 4f gate passes.
heroku config:set FLAG_ENFORCE_CF_ORIGIN=true -a raxx-api-prod >/dev/null 2>&1
heroku config:set FLAG_ENFORCE_CF_ORIGIN=true -a raxx-console-prod >/dev/null 2>&1
Smoke test: curl -I https://api.raxx.app/health → 200. curl -I https://raxx-api-prod.herokuapp.com/health → 403.
No application schema changes. No new Postgres tables.
Terraform state:
- terraform/modules/cf-waf/: no existing state to import.
- terraform/waf/: instantiates the module per zone.
- terraform/cf-access/) is unchanged. WAF Terraform is a separate stack with its own state file to minimize blast radius.

Rollback at any phase: Set all module action variables to "log" / "simulate" and re-apply. This puts WAF in observation-only mode without removing any resources. Full rollback is terraform destroy on terraform/waf/, which removes all WAF rulesets and rate limits. CF Access and origin guard are unaffected.
These block Phase 0 and therefore Phase 1. Feature-developer cannot start SC-WAF-01 until the operator resolves them.
| # | Question | Blocking? | Stakes |
|---|---|---|---|
| OQ1 | Logpush destination: S3 bucket only, or Sentry WAF event integration too? S3 enables long-term forensics; Sentry enables real-time alerting. | Blocks Phase 0 | S3 is the baseline; Sentry adds ~$0/mo at current volume but requires Sentry project setup |
| OQ2 | WAF log retention period: 90 days (matches ADR-0051 ops log baseline) or longer for financial audit compliance? | Blocks Phase 0 | Longer retention = higher S3 cost; GDPR requires a defined retention period |
| OQ3 | Bot Fight Mode strictness: "on" vs "super"? "super" challenges more aggressively including TLS fingerprint analysis; higher false-positive risk on API clients | Blocks Phase 0 | Recommend "on" to start; revisit after Phase 1 data |
| OQ4 | Challenge vs block decision for Auth paths: should elevated-threat-score auth requests get a challenge page (adds user friction) or a hard block? | Blocks Phase 1→Phase 2 | Challenge = friction but allows legitimate users through; block = cleaner but risks false lockouts |
| OQ5 | Allow-list management process: Terraform-only (requires PR per change) vs operator can add IPs/ASNs via CF dashboard with post-hoc TF import? | Does not block Phase 1 | TF-only is safer (audit trail, drift prevention per ADR-0051); dashboard-then-import is faster for emergencies |
WAF Logpush exports include ClientIP (full IPv4/IPv6). This is PII under GDPR.
- raxx-waf-logs-reader IAM role. Public access must be blocked.
- ClientRequestBody and the cookie header are never exported in the Logpush field list (see §4 logpush.tf). This is the primary mitigation against WebAuthn credential object or session token leakage.

WAF does not store credentials. The only credential-adjacent data in WAF logs is ClientIP and path (e.g., /api/auth/login/verify). Neither can be used to replay an authentication attempt.
If WAF logpush is compromised (S3 bucket exposed), an attacker learns which IPs authenticated when — a correlation attack, not a credential replay. GDPR breach notification applies per ADR-0003. The breachNotification flow must be triggered within 72 hours of confirmed S3 exposure.
Per layer:
| Layer | Kill-switch | Time to effect |
|---|---|---|
| WAF rules | terraform apply with all actions set to "log" | ~30s (CF propagates ruleset changes globally) |
| Rate limits | terraform apply with rate_limit_action = "simulate" | ~30s |
| Origin guard | heroku config:set FLAG_ENFORCE_CF_ORIGIN=false | ~10s (no redeploy) |
| Logpush | cloudflare_logpush_job.enabled = false + apply | ~30s |
- /MooseQuest/cloudflare/.
- execute-api.amazonaws.com (email delivery API Gateway) is not CF-proxied. This design does not cover that surface. SC-WAF-08 tracks the evaluation.
See §8 (rollout) for sequencing. Cards listed here for reference:
| Card | Title | Phase |
|---|---|---|
| SC-WAF-00 | Phase 0 operator actions: CF account WAF settings + Logpush destination | Operator prerequisite |
| SC-WAF-01 | Terraform: terraform/modules/cf-waf/ module + terraform/waf/ root stack, log mode | Phase 1 |
| SC-WAF-02 | Custom rules per surface: QC geo-block, service-token bypass, webhook bypass | Phase 1 (part of SC-WAF-01 or follow-on) |
| SC-WAF-03 | WAF log-only soak: Logpush → S3, false-positive analysis, staging review | Phase 1 soak |
| SC-WAF-04 | Cutover to challenge mode on staging | Phase 2 |
| SC-WAF-05 | Cutover to block mode on staging + prod log → block rollout | Phase 3 + 4 |
| SC-WAF-06 | Synthetic probes: per-surface flows that must pass WAF without challenge | Parallel with Phase 1 |
| SC-WAF-07 | FLAG_ENFORCE_CF_ORIGIN flip on raxx-api-prod + raxx-console-prod | Phase 5 (post Phase 4f) |
| SC-WAF-08 | Evaluation: AWS WAF on email delivery API Gateway | Independent |
| SC-WAF-09 | Velvet: enroll Logpush S3 IAM credentials in rotation | Independent (pairs with ADR-0051 SC-N6) |