WAF synthetic probes runbook
System: Cloudflare WAF synthetic probe runner (SC-WAF-06)
Owner: operator
Last incident: n/a (initial setup, SC-WAF-06 #1739)
Last reviewed: 2026-05-12
Purpose
Before advancing the Cloudflare WAF from log-only (Phase 1) to challenge/block mode (SC-WAF-07, #1741), this probe suite provides a continuous signal that all public Raxx surfaces are reachable and that the WAF is not flagging legitimate customer traffic as would-block.
The probes exercise representative flows on each surface using the same
headers that real customers send, plus:
- User-Agent: RaxxProbe/1.0 (+https://raxx.app/probe)
- X-Raxx-Probe: sc-waf-06-synthetic
The cf-waf module (merged via #1795) includes a WAF skip rule that
allows probe requests through without challenge. If a probe gets a 403,
that skip rule is broken — which is itself a signal worth surfacing.
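As a rough illustration of the header contract, the sketch below builds (but does not send) a probe request with the two probe-identifying headers. The header values come from this runbook; the helper name `build_probe_request` and the use of `urllib` are assumptions for illustration and may not match how `scripts/waf/probe.py` is written.

```python
import urllib.request

# Header values documented in this runbook; the WAF skip rule keys on X-Raxx-Probe.
PROBE_HEADERS = {
    "User-Agent": "RaxxProbe/1.0 (+https://raxx.app/probe)",
    "X-Raxx-Probe": "sc-waf-06-synthetic",
}


def build_probe_request(url: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request tagged with the probe headers.

    Hypothetical helper: it illustrates the header contract only.
    """
    return urllib.request.Request(url, headers=dict(PROBE_HEADERS), method=method)


if __name__ == "__main__":
    req = build_probe_request("https://getraxx.com/faq")
    # urllib stores header names via str.capitalize(), e.g. "X-raxx-probe".
    for name, value in req.header_items():
        print(f"{name}: {value}")
```

Sending it would be `urllib.request.urlopen(req, timeout=10)`; a 403 response raises `urllib.error.HTTPError`, whose `.code` attribute carries the status.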
Surfaces probed
| Surface | Flows | Zone key |
|---|---|---|
| getraxx.com | landing `/`, FAQ `/faq`, pricing `/pricing` | `getraxx` |
| raxx.app | sign-in `/signin`, dashboard `/dashboard`, health `/health` | `raxx-app` |
| support.raxx.app | root `/`, help `/help` | `raxx-app` (subdomain) |
| docs.raxx.app | root `/`, getting-started `/getting-started` | `raxx-app` (subdomain) |
How to tell it's broken
- Symptom 1: The `WAF Synthetic Probes` workflow is red in GitHub Actions.
- Symptom 2: A Slack DM to Kristerpher (`D0AJ7K184TV`) with `waf-synthetic-probe-failure`.
- Symptom 3: A probe returns HTTP 403: a WAF false positive on legitimate traffic, or a broken probe skip rule (see Failure mode A).
- Symptom 4: A probe returns HTTP 200 but the expected content is absent: the surface is up, but the WAF or an upstream change is rewriting or stripping the response.
- Symptom 5: `waf_would_block=true` in `artifacts/waf-probe-results.json`. The WAF is in log mode, so requests pass, but the WAF event log records a would-block action. Advancing to challenge/block (SC-WAF-07) would break this flow.
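Symptoms 1 and 2 are workflow-level; symptoms 3 to 5 can be read off individual probe results. A minimal sketch of that per-probe mapping, assuming the result fields named in this runbook (`passed`, `status_code`, `error`, `waf_would_block`) and assuming `error` carries strings such as `content-mismatch` or `request-error`. The helper and its labels are hypothetical.

```python
from typing import Optional

# Human-readable meaning of each label, taken from the symptom list.
SYMPTOMS = {
    "symptom-3": "HTTP 403: WAF false positive or broken probe skip rule",
    "symptom-4": "HTTP 200 but expected content absent",
    "symptom-5": "would-block logged while the WAF is in log mode",
    "other": "probe failed for another reason, e.g. request-error",
}


def classify_symptom(result: dict) -> Optional[str]:
    """Return the first matching symptom label for one probe result, or None.

    Hypothetical helper; checks are ordered 403, content mismatch, would-block.
    """
    status = result.get("status_code")
    error = result.get("error") or ""
    if status == 403:
        return "symptom-3"
    if status == 200 and "content-mismatch" in error:
        return "symptom-4"
    if result.get("waf_would_block"):
        return "symptom-5"
    if result.get("passed") is False:
        return "other"
    return None  # healthy probe
```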
How to diagnose (in order)
1. Open the failing workflow run and expand the `Run WAF synthetic probes` step. The step summary shows a table of failed probe names and WAF false-positive names.
2. Download the `waf-probe-results-<run-id>` artifact and open the JSON. Each probe has `passed`, `status_code`, `error`, `waf_would_block`, and `waf_rule_id`.
3. For `status_code=403`:
   - Go to CF dashboard → raxx.app (or getraxx.com) → Security → WAF → Events.
   - Filter by the time window and look for the probe UA (`RaxxProbe/1.0`).
   - Check whether the Priority-1 skip rule (CF-Access skip) or the `X-Raxx-Probe` skip rule fired or is missing.
   - If the skip rule is absent, the cf-waf module state may have drifted. Run `cd terraform/waf && terraform plan` to detect drift.
4. For `waf_would_block=true` with `status_code=200` (WAF in log mode):
   - The request passed only because the WAF is in Phase 1 (log-only). Once SC-WAF-07 (#1741) flips to challenge/block, this flow will break.
   - Identify the `waf_rule_id` from the probe result and look up the rule in the CF WAF Events dashboard. Determine whether it is: (a) a legitimate false positive (the rule is too broad): file a ticket to tune the rule or add an exception before advancing the WAF phase; or (b) a probe configuration error (the expected content or URL is wrong): fix the probe definition in `scripts/waf/probe.py`.
5. For `content-mismatch` errors (HTTP 200 but wrong body):
   - The surface is up, but the WAF or an upstream change altered the response.
   - Check the surface directly in a browser to confirm the expected content.
   - If the expected string is stale (page redesign), update `expected_strings` in `scripts/waf/probe.py` and open a PR.
6. For `request-error` (network failure):
   - Check Cloudflare status (also covers CF Pages): https://www.cloudflarestatus.com
   - Check Heroku status: https://status.heroku.com
   - If a surface is down, escalate per the relevant surface runbook.
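The diagnosis order above can be applied to the downloaded artifact in one pass. A minimal sketch, assuming the artifact is a JSON list of probe objects that each carry a `name` field alongside the fields listed in step 2; if the real file wraps the list in an object, adjust the loading line. The script and helper names are hypothetical.

```python
import json
import sys
from collections import defaultdict


def triage(results: list) -> dict:
    """Bucket probe results into diagnosis branches, in runbook order."""
    buckets = defaultdict(list)
    for r in results:
        name = r.get("name", "<unnamed>")
        error = r.get("error") or ""
        if r.get("status_code") == 403:
            buckets["403"].append(name)
        elif r.get("waf_would_block"):
            buckets["would-block"].append(f"{name} (rule {r.get('waf_rule_id')})")
        elif "content-mismatch" in error:
            buckets["content-mismatch"].append(name)
        elif "request-error" in error:
            buckets["request-error"].append(name)
    return dict(buckets)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 triage.py waf-probe-results.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        for branch, names in triage(json.load(fh)).items():
            print(f"{branch}: {', '.join(names)}")
```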
Known failure modes
Failure mode A: X-Raxx-Probe skip rule absent or disabled
Symptom: Probes return HTTP 403. waf_would_block=true with no WAF log event
(block fires immediately, not in log mode).
Cause: The Priority-1 skip rule keyed on X-Raxx-Probe was removed or
disabled. This can happen if someone manually edited the CF dashboard (violates
ADR-0077 D2) or if Terraform state drifted.
Fix:
```bash
cd terraform/waf
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_WAF_EDIT \
  --path /MooseQuest/cloudflare/ --plain)
terraform plan -out=tfplan
# Review: expect the skip rule to be added/re-enabled
terraform apply tfplan
```
Verification: Re-run the probe workflow manually (workflow_dispatch). All
probes return HTTP 200. No 403s.
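For the "Review" step, reading the plan by eye is sufficient. If a scripted check helps, the sketch below lists every resource the saved plan would change, using the machine-readable output of `terraform show -json tfplan` (its `resource_changes[].change.actions` fields). The helper names are hypothetical, and the default paths assume the plan was saved as `tfplan` in `terraform/waf` as in the fix above.

```python
import json
import subprocess


def planned_changes(plan_json: dict) -> list:
    """Return (address, actions) for every resource the plan would change."""
    changes = []
    for rc in plan_json.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if actions and actions != ["no-op"]:
            changes.append((rc["address"], actions))
    return changes


def load_plan(plan_file: str = "tfplan", workdir: str = "terraform/waf") -> dict:
    # `terraform show -json <planfile>` prints the saved plan as JSON.
    out = subprocess.run(
        ["terraform", "show", "-json", plan_file],
        cwd=workdir, check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Usage: `for address, actions in planned_changes(load_plan()): print(address, actions)`. Anything other than the skip-rule ruleset appearing in the list is a reason to stop and investigate before `terraform apply`.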
Failure mode B: WAF false-positive on probe flow (would-block)
Symptom: waf_would_block=true for a probe. WAF is in log mode so HTTP 200
is returned, but the CF event log records a would-block action.
Cause: An OWASP or CF Managed rule is firing on a legitimate request. The
probe URL or body triggered a rule match.
Impact: If WAF advances to Phase 2 (challenge/block) before this is resolved,
customers on this flow will be challenged or blocked.
Fix:
1. Identify the waf_rule_id from the probe result JSON.
2. Look up the rule in the CF WAF Events dashboard.
3. Determine the correct remediation:
   - If the probe request is genuinely triggering a rule that should not fire on legitimate traffic: add a per-rule exception (an `overrides` block for that rule ID) in terraform/modules/cf-waf/main.tf. Open a PR.
   - If the probe itself uses a URL or body that looks like an attack: fix the probe configuration in scripts/waf/probe.py.
4. Do NOT advance WAF to Phase 2 (SC-WAF-07) while any probe has waf_would_block=true.
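Steps 1 and 2 go faster when would-block probes are grouped by rule. A minimal sketch, assuming each probe result has `name`, `waf_would_block`, and `waf_rule_id` fields; the helper is hypothetical. One rule firing on several unrelated flows suggests an overly broad rule (the first remediation), while a single hit on one probe is more often a probe-configuration issue, though that is a heuristic, not a rule.

```python
from collections import defaultdict


def would_block_by_rule(results: list) -> dict:
    """Group probe names flagged waf_would_block=true by their waf_rule_id."""
    by_rule = defaultdict(list)
    for r in results:
        if r.get("waf_would_block"):
            rule = r.get("waf_rule_id") or "<unknown>"
            by_rule[rule].append(r.get("name", "<unnamed>"))
    return dict(by_rule)
```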
Failure mode C: Probe content mismatch after page redesign
Symptom: HTTP 200 but content-mismatch error. expected_strings no longer
appear on the page.
Cause: Marketing or product redesigned a surface and removed content the probe
was checking for.
Fix: Update the probe's expected_strings list in scripts/waf/probe.py.
Open a PR. This is a probe maintenance task, not a WAF incident.
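Before editing `expected_strings`, it helps to confirm which markers are actually missing from the live HTML. A minimal sketch of a case-insensitive substring check, assuming that is how the probe compares content (consistent with the guidance in "Adding a new surface", but not confirmed against `scripts/waf/probe.py`). The helper is hypothetical.

```python
def missing_strings(body: str, expected_strings: list) -> list:
    """Return the expected strings absent from body, compared case-insensitively."""
    haystack = body.casefold()
    return [s for s in expected_strings if s.casefold() not in haystack]
```

Feed it the raw HTML (for example from `curl -s`), not the browser-rendered DOM, since the probe sees the page before JavaScript hydration.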
Failure mode D: Surface offline (DNS, CF Pages, Heroku)
Symptom: request-error or HTTP 5xx across multiple probes for one surface.
Cause: The surface itself is down (not a WAF issue).
Fix: Diagnose the surface using its dedicated runbook:
- raxx.app API: docs/ops/runbooks/heroku.md
- getraxx.com: CF Pages deploy log
- support.raxx.app: docs/ops/runbooks/freescout.md
- docs.raxx.app: docs/ops/runbooks/docs-customer-deploy.md
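The "multiple probes for one surface" test can be made explicit. A minimal sketch, assuming each probe result carries a `surface` field (an assumption; the runbook does not list it) and treating `request-error` or any 5xx as a hard failure. The threshold of 2 is an illustrative default, chosen because every surface in the table has at least two flows.

```python
from collections import defaultdict


def is_hard_failure(r: dict) -> bool:
    status = r.get("status_code")
    return "request-error" in (r.get("error") or "") or (
        isinstance(status, int) and status >= 500
    )


def surfaces_likely_down(results: list, min_failures: int = 2) -> list:
    """Surfaces where at least min_failures probes hit request-error or HTTP 5xx."""
    counts = defaultdict(int)
    for r in results:
        if is_hard_failure(r):
            counts[r.get("surface", "<unknown>")] += 1
    return sorted(s for s, n in counts.items() if n >= min_failures)
```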
Failure mode E: CF API token for WAF log check expired
Symptom: waf_would_block=false for all probes even when WAF events exist.
No error in probe output. WAF log check returns no data.
Cause: CF_WAF_PROBE_READ_TOKEN secret expired or was revoked.
Impact: WAF false-positive detection is silently disabled. HTTP 200 / content
checks still run; only the would-block detection is dark.
Fix:
1. Mint a new CF API token with Zone:Read + Zone:WAF:Read scope.
2. Store in repo secrets as CF_WAF_PROBE_READ_TOKEN.
3. Re-run the probe workflow to verify WAF log check resumes.
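Because this failure is silent, checking the token directly before and after rotation is useful. A minimal sketch using Cloudflare's token-verify endpoint (`/client/v4/user/tokens/verify`, which applies to user-owned tokens; account-owned tokens use an account-scoped variant). Treating any HTTP error as "not usable" is a simplifying assumption, and the helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request

VERIFY_URL = "https://api.cloudflare.com/client/v4/user/tokens/verify"


def token_is_active(payload: dict) -> bool:
    """Interpret a token-verify response body: usable only if status is active."""
    result = payload.get("result") or {}
    return bool(payload.get("success")) and result.get("status") == "active"


def verify_token(token: str, timeout: float = 10.0) -> bool:
    req = urllib.request.Request(VERIFY_URL, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return token_is_active(json.load(resp))
    except urllib.error.HTTPError:
        # Assumption: a rejected token (expired, revoked, malformed) returns 4xx.
        return False
```

Usage: `verify_token(os.environ["CF_API_TOKEN"])` after exporting the token as in "How to run probes manually".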
How to run probes manually
```bash
# Run all surfaces (no WAF log check; no CF token required)
python3 scripts/waf/probe.py

# Run one surface only
python3 scripts/waf/probe.py --surfaces getraxx

# Run with WAF log check (requires CF read token + zone IDs)
export CF_API_TOKEN=$(infisical secrets get CF_WAF_PROBE_READ_TOKEN \
  --path /MooseQuest/cloudflare/ --plain)
export CF_ZONE_IDS="getraxx=<getraxx_zone_id>,raxx-app=<raxx_app_zone_id>"
python3 scripts/waf/probe.py --json

# JSON output to file (useful for matching against the CI artifact)
python3 scripts/waf/probe.py --output /tmp/probe-results.json
python3 -m json.tool /tmp/probe-results.json
```
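A malformed `CF_ZONE_IDS` value is an easy way to lose the WAF log check without an obvious error. A minimal sketch for sanity-checking the value before running the probes, based on the `key=zone_id,key=zone_id` format shown above; whether `probe.py` tolerates stray whitespace is unknown, so this parser is deliberately strict about empty keys or IDs. The helper is hypothetical.

```python
import os


def parse_zone_ids(raw: str) -> dict:
    """Parse "key=zone_id,key=zone_id" into {key: zone_id}; blank input gives {}."""
    zones = {}
    for pair in filter(None, (p.strip() for p in raw.split(","))):
        key, sep, zone_id = pair.partition("=")
        if not sep or not key.strip() or not zone_id.strip():
            raise ValueError(f"malformed CF_ZONE_IDS entry: {pair!r}")
        zones[key.strip()] = zone_id.strip()
    return zones


if __name__ == "__main__":
    # Prints the zone keys found, e.g. ['getraxx', 'raxx-app'].
    print(sorted(parse_zone_ids(os.environ.get("CF_ZONE_IDS", ""))))
```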
Adding a new surface
To add a new surface to the probe list:
- Open `scripts/waf/probe.py`.
- Add a new entry to the `SURFACES` list, following the existing pattern:

```python
{
    "surface": "my-surface",
    "zone_key": "raxx-app",  # CF zone that owns this subdomain
    "flows": [
        {
            "name": "my-surface-root",
            "url": "https://my-surface.raxx.app/",
            "method": "GET",
            "expected_strings": ["my content marker"],
            "allow_redirect": True,
            "expected_status": [200],
        },
    ],
},
```

- For `expected_strings`: choose strings that are stable (not dependent on the release version), safe to match case-insensitively, and present in the page HTML even before JavaScript hydration.
- For `zone_key`: use the key that maps to the Cloudflare zone ID in `CF_ZONE_IDS` (repo secret). Subdomains of `raxx.app` use `raxx-app`. New apex domains need a new zone key and a corresponding entry in `CF_ZONE_IDS`.
- Open a PR. In the PR body, note the new surface and the expected strings chosen.
- After the PR merges, manually trigger the `WAF Synthetic Probes` workflow to confirm the new probe passes before the next scheduled run.
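A quick pre-PR check can catch a typo in `zone_key` or a missing field. A minimal sketch that treats every key in the pattern above as required; whether `probe.py` actually requires all of them is an assumption, and `validate_surface` is a hypothetical helper.

```python
# Assumption: every key shown in the SURFACES pattern is required per flow.
REQUIRED_FLOW_KEYS = {
    "name", "url", "method", "expected_strings", "allow_redirect", "expected_status",
}


def validate_surface(entry: dict, known_zone_keys: set) -> list:
    """Return problems found in a SURFACES entry; an empty list means it looks OK."""
    problems = []
    if entry.get("zone_key") not in known_zone_keys:
        problems.append(f"unknown zone_key {entry.get('zone_key')!r}")
    flows = entry.get("flows") or []
    if not flows:
        problems.append("no flows defined")
    for flow in flows:
        missing = REQUIRED_FLOW_KEYS - set(flow)
        if missing:
            problems.append(f"{flow.get('name', '<unnamed>')}: missing {sorted(missing)}")
        elif not flow["expected_strings"]:
            problems.append(f"{flow['name']}: expected_strings is empty")
    return problems
```

Pass the zone keys that exist in `CF_ZONE_IDS` (currently `getraxx` and `raxx-app`) as `known_zone_keys`.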
Secrets required
| Secret name | Description | Scope |
|---|---|---|
| `CF_WAF_PROBE_READ_TOKEN` | CF API token for WAF Firewall Events read access | Zone:Read, Zone:WAF:Read on all probe zones |
| `CF_ZONE_IDS_PROBE` | Comma-separated `key=zone_id` pairs for probe surfaces | Not a secret, but stored as a repo secret for convenience |
| `SLACK_BOT_TOKEN` | Slack bot OAuth token for DM notifications | `chat:write` |
CF_WAF_PROBE_READ_TOKEN and CF_ZONE_IDS_PROBE are optional — the probes
degrade gracefully (HTTP 200 + content check only, no WAF log check) when absent.
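The degradation rule can be stated as a small decision function. A minimal sketch, assuming the probe reads the token and zone map from `CF_API_TOKEN` and `CF_ZONE_IDS` as in "How to run probes manually" (how CI maps the repo secrets onto these names is not covered here). Note that this is a presence check only: an expired token still selects full mode, which is exactly Failure mode E.

```python
import os


def probe_mode(env) -> str:
    """Choose the probe mode from the environment (presence check only)."""
    token = (env.get("CF_API_TOKEN") or "").strip()
    zones = (env.get("CF_ZONE_IDS") or "").strip()
    if token and zones:
        return "full"  # HTTP status + content + WAF would-block log check
    return "http-only"  # HTTP status + content only; would-block detection is off


if __name__ == "__main__":
    mode = probe_mode(os.environ)
    print(f"probe mode: {mode}")
    if mode == "http-only":
        print("warning: WAF would-block detection is disabled")
```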
WAF phase gate dependency
The probe workflow is a hard gate for SC-WAF-07 (#1741).
Before the operator flips managed_ruleset_action from "log" to
"managed_challenge" or "block":
- [ ] The probe workflow has at least 3 consecutive passing runs since the WAF was last modified.
- [ ] No probe has `waf_would_block=true` in the most recent run.
- [ ] All surfaces show HTTP 200 with expected content.
See docs/ops/runbooks/waf.md §Phase advancement for the full gate checklist.
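The three checklist items can be evaluated mechanically once the relevant runs are collected. A minimal sketch, assuming the caller supplies the probe results of every run since the WAF was last modified, oldest first, each run as a list of result dicts with `passed`, `status_code`, and `waf_would_block`. Fetching those runs from GitHub Actions is out of scope, and `phase_gate_ready` is a hypothetical helper, not a replacement for the full checklist in the WAF runbook.

```python
from typing import List, Tuple

ProbeRun = List[dict]  # one workflow run = list of probe-result dicts


def run_passed(run: ProbeRun) -> bool:
    # An empty run proves nothing, so it does not count as a pass.
    return bool(run) and all(p.get("passed") for p in run)


def phase_gate_ready(runs: List[ProbeRun], required_passes: int = 3) -> Tuple[bool, List[str]]:
    """Evaluate the SC-WAF-07 checklist; returns (ready, reasons_not_ready)."""
    reasons = []
    recent = runs[-required_passes:]
    if len(recent) < required_passes or not all(run_passed(r) for r in recent):
        reasons.append(f"fewer than {required_passes} consecutive passing runs")
    latest = runs[-1] if runs else []
    if any(p.get("waf_would_block") for p in latest):
        reasons.append("latest run has waf_would_block=true")
    if any(p.get("status_code") != 200 for p in latest):
        reasons.append("latest run has a non-200 probe")
    return (not reasons, reasons)
```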
Cross-references
- Probe script: `scripts/waf/probe.py`
- Workflow: `.github/workflows/waf-synthetic-probe.yml`
- WAF runbook: `docs/ops/runbooks/waf.md`
- WAF module (frozen): `terraform/modules/cf-waf/`
- WAF root stack (frozen): `terraform/waf/`
- SC-WAF-01 (#1737): WAF module (merged via #1795)
- SC-WAF-06 (#1739): this probe suite
- SC-WAF-07 (#1741): enforce flag flip (depends on probe green)