Raxx · internal docs

internal · gated

WAF synthetic probes runbook

System: Cloudflare WAF, synthetic probe runner (SC-WAF-06)
Owner: operator
Last incident: n/a (initial setup, SC-WAF-06 #1739)
Last reviewed: 2026-05-12

Purpose

Before the Cloudflare WAF advances from log-only (Phase 1) to challenge/block mode (SC-WAF-07, #1741), this probe suite provides a continuous signal that all public Raxx surfaces are reachable and that the WAF is not flagging legitimate customer traffic as would-block.

The probes exercise representative flows on each surface using the same headers that real customers send, plus:

  - User-Agent: RaxxProbe/1.0 (+https://raxx.app/probe)
  - X-Raxx-Probe: sc-waf-06-synthetic

The cf-waf module (merged via #1795) includes a WAF skip rule that allows probe requests through without challenge. If a probe gets a 403, that skip rule is broken — which is itself a signal worth surfacing.
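As a rough illustration of how a probe identifies itself, the sketch below builds (but does not send) a request carrying the two probe headers. The helper name build_probe_request is hypothetical and is not taken from scripts/waf/probe.py; only the header values come from this runbook.

```python
import urllib.request

# Header values copied from this runbook; everything else is illustrative.
PROBE_HEADERS = {
    "User-Agent": "RaxxProbe/1.0 (+https://raxx.app/probe)",
    "X-Raxx-Probe": "sc-waf-06-synthetic",
}


def build_probe_request(url: str, method: str = "GET") -> urllib.request.Request:
    """Return an unsent request that carries the probe identification headers."""
    return urllib.request.Request(url, headers=PROBE_HEADERS, method=method)


if __name__ == "__main__":
    # Print the headers that would be sent; no network traffic happens here.
    req = build_probe_request("https://getraxx.com/")
    for name, value in req.header_items():
        print(f"{name}: {value}")
```

A request built this way should match the X-Raxx-Probe skip rule, so a 403 on it points at the rule rather than at the request.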

Surfaces probed

Surface           Flows                                                   Zone key
getraxx.com       landing /, FAQ /faq, pricing /pricing                   getraxx
raxx.app          sign-in /signin, dashboard /dashboard, health /health   raxx-app
support.raxx.app  root /, help /help                                      raxx-app (subdomain)
docs.raxx.app     root /, getting-started /getting-started                raxx-app (subdomain)

How to tell it's broken

How to diagnose (in order)

  1. Open the failing workflow run. Expand the Run WAF synthetic probes step. The step summary shows a table of failed probe names and WAF false-positive names.

  2. Download the waf-probe-results-<run-id> artifact. Open the JSON. Each probe has passed, status_code, error, waf_would_block, and waf_rule_id.

  3. For status_code=403:
     - Go to CF dashboard → raxx.app (or getraxx.com) → Security → WAF → Events.
     - Filter by the time window. Look for the probe UA (RaxxProbe/1.0).
     - Check whether the Priority-1 skip rule (CF-Access skip) or the X-Raxx-Probe skip rule fired or is missing.
     - If the skip rule is absent, the cf-waf module state may have drifted. Run cd terraform/waf && terraform plan to detect drift.

  4. For waf_would_block=true with status_code=200 (WAF in log mode):
     - The request passed because the WAF is in Phase 1 (log-only). But after SC-WAF-07 (#1741) flips to challenge/block, this flow will break.
     - Identify the waf_rule_id from the probe result. Look up the rule in the CF WAF Events dashboard. Determine if it is:
       (a) a legitimate false positive (the rule is too broad): file a ticket to tune the rule or add an exception before advancing the WAF phase, or
       (b) a probe configuration error (the expected content or URL is wrong): fix the probe definition in scripts/waf/probe.py.

  5. For content-mismatch errors (HTTP 200 but wrong body):
     - The surface is up but the WAF or an upstream change altered the response.
     - Check the surface directly in a browser to confirm expected content.
     - If the expected string is stale (page redesign), update expected_strings in scripts/waf/probe.py and open a PR.

  6. For request-error (network failure):
     - Check Cloudflare status: https://www.cloudflarestatus.com
     - Check Heroku status: https://status.heroku.com
     - Check CF Pages status: https://www.cloudflarestatus.com
     - If a surface is down, escalate per the relevant surface runbook.
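The branching in steps 3 to 6 can be sketched as a small triage function over one probe result. This is only an illustration: the field names come from step 2, but the assumption that the error field contains the literal labels content-mismatch and request-error is mine, and triage is a hypothetical helper rather than part of scripts/waf/probe.py.

```python
from typing import Any, Mapping


def triage(probe: Mapping[str, Any]) -> str:
    """Map one probe result to the diagnosis step that applies to it."""
    status = probe.get("status_code")
    # Assumption: error is a string (or None) that embeds the error label.
    error = str(probe.get("error") or "")

    if status == 403:
        return "step 3: check the skip rule in CF WAF Events"
    if probe.get("waf_would_block") and status == 200:
        rule = probe.get("waf_rule_id") or "unknown"
        return f"step 4: would-block by rule {rule}"
    if "content-mismatch" in error:
        return "step 5: compare expected_strings with the live page"
    if "request-error" in error:
        return "step 6: check provider status pages"
    if probe.get("passed"):
        return "ok"
    return "unclassified: inspect the raw result"
```

The order of the checks mirrors the order of the steps, so a 403 is always routed to the skip-rule check first, whatever the other fields say.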

Known failure modes

Failure mode A: X-Raxx-Probe skip rule absent or disabled

Symptom: Probes return HTTP 403. waf_would_block=true with no WAF log event (block fires immediately, not in log mode).

Cause: The Priority-1 skip rule keyed on X-Raxx-Probe was removed or disabled. This can happen if someone manually edited the CF dashboard (violates ADR-0077 D2) or if Terraform state drifted.

Fix:

cd terraform/waf
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_WAF_EDIT \
  --path /MooseQuest/cloudflare/ --plain)
terraform plan -out=tfplan
# Review: expect the skip rule to be added/re-enabled
terraform apply tfplan

Verification: Re-run the probe workflow manually (workflow_dispatch). All probes return HTTP 200. No 403s.

Failure mode B: WAF false-positive on probe flow (would-block)

Symptom: waf_would_block=true for a probe. WAF is in log mode so HTTP 200 is returned, but the CF event log records a would-block action.

Cause: An OWASP or CF Managed rule is firing on a legitimate request. The probe URL or body triggered a rule match.

Impact: If WAF advances to Phase 2 (challenge/block) before this is resolved, customers on this flow will be challenged or blocked.

Fix:

  1. Identify the waf_rule_id from the probe result JSON.
  2. Look up the rule in the CF WAF Events dashboard.
  3. Determine the correct remediation:
     - If the probe request is genuinely triggering a rule that should not fire on legitimate traffic: add a per-rule exception in terraform/modules/cf-waf/main.tf (overrides block for that rule ID). Open a PR.
     - If the probe itself is using a URL/body that looks like an attack: fix the probe configuration in scripts/waf/probe.py.
  4. Do NOT advance WAF to Phase 2 (SC-WAF-07) while any probe has waf_would_block=true.
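The rule in the last fix step (no phase advance while any probe has waf_would_block=true) is easy to check mechanically. The sketch below assumes the results are available as a list of per-probe dictionaries and that each carries a name field; both are assumptions about the artifact layout, and would_block_probes is a hypothetical helper.

```python
from typing import Any, Iterable, List, Mapping


def would_block_probes(results: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return the names of probes whose result has waf_would_block set."""
    return [
        str(r.get("name", "<unnamed>"))  # "name" is an assumed field
        for r in results
        if r.get("waf_would_block")
    ]


if __name__ == "__main__":
    # Made-up sample data, not real probe output.
    sample = [
        {"name": "getraxx-landing", "waf_would_block": False},
        {"name": "raxx-app-signin", "waf_would_block": True, "waf_rule_id": "example"},
    ]
    blocked = would_block_probes(sample)
    print("safe to advance" if not blocked else f"blocked by: {', '.join(blocked)}")
```

An empty list is necessary for advancing the WAF phase, not sufficient: it says nothing about probes that failed for other reasons.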

Failure mode C: Probe content mismatch after page redesign

Symptom: HTTP 200 but content-mismatch error. expected_strings no longer appear on the page.

Cause: Marketing or product redesigned a surface and removed content the probe was checking for.

Fix: Update the probe's expected_strings list in scripts/waf/probe.py. Open a PR. This is a probe maintenance task, not a WAF incident.

Failure mode D: Surface offline (DNS, CF Pages, Heroku)

Symptom: request-error or HTTP 5xx across multiple probes for one surface.

Cause: The surface itself is down (not a WAF issue).

Fix: Diagnose the surface using its dedicated runbook:

  - raxx.app API: docs/ops/runbooks/heroku.md
  - getraxx.com: CF Pages deploy log
  - support.raxx.app: docs/ops/runbooks/freescout.md
  - docs.raxx.app: docs/ops/runbooks/docs-customer-deploy.md

Failure mode E: CF API token for WAF log check expired

Symptom: waf_would_block=false for all probes even when WAF events exist. No error in probe output. WAF log check returns no data.

Cause: CF_WAF_PROBE_READ_TOKEN secret expired or was revoked.

Impact: WAF false-positive detection is silently disabled. HTTP 200 / content checks still run; only the would-block detection is dark.

Fix:

  1. Mint a new CF API token with Zone:Read + Zone:WAF:Read scope.
  2. Store in repo secrets as CF_WAF_PROBE_READ_TOKEN.
  3. Re-run the probe workflow to verify WAF log check resumes.
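Because this failure is silent, it can help to check the token directly. The sketch below calls Cloudflare's token verification endpoint and reports whether the token is active. It assumes a user-owned API token, and nothing in this runbook indicates that scripts/waf/probe.py performs such a check; the function names are hypothetical.

```python
import json
import urllib.error
import urllib.request

VERIFY_URL = "https://api.cloudflare.com/client/v4/user/tokens/verify"


def token_is_active(payload: dict) -> bool:
    """Interpret a verify response body; anything but 'active' counts as bad."""
    result = payload.get("result") or {}
    return bool(payload.get("success")) and result.get("status") == "active"


def verify_token(token: str, timeout: float = 10.0) -> bool:
    """Call the verify endpoint; HTTP errors (e.g. 401) count as 'not active'.

    Network-level failures (urllib.error.URLError) are left to propagate so
    that an outage is not mistaken for a revoked token.
    """
    req = urllib.request.Request(
        VERIFY_URL, headers={"Authorization": f"Bearer {token}"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return token_is_active(json.load(resp))
    except urllib.error.HTTPError:
        return False
```

An active token can still lack the Zone:WAF:Read scope, so a passing check narrows the problem down but does not rule out a permissions issue.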

How to run probes manually

# Run all surfaces (no WAF log check — no CF token required)
python3 scripts/waf/probe.py

# Run one surface only
python3 scripts/waf/probe.py --surfaces getraxx

# Run with WAF log check (requires CF read token + zone IDs)
export CF_API_TOKEN=$(infisical secrets get CF_WAF_PROBE_READ_TOKEN \
  --path /MooseQuest/cloudflare/ --plain)
export CF_ZONE_IDS="getraxx=<getraxx_zone_id>,raxx-app=<raxx_app_zone_id>"
python3 scripts/waf/probe.py --json

# JSON output to file (useful for CI artifact matching)
python3 scripts/waf/probe.py --output /tmp/probe-results.json
python3 -m json.tool /tmp/probe-results.json
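The CF_ZONE_IDS value above is a comma-separated list of key=zone_id pairs. A minimal sketch of parsing that format follows; parse_zone_ids is a hypothetical helper, and the real parsing in scripts/waf/probe.py may be stricter or more lenient.

```python
from typing import Dict


def parse_zone_ids(raw: str) -> Dict[str, str]:
    """Parse 'key=zone_id,key=zone_id' into a dict, rejecting malformed pairs."""
    zones: Dict[str, str] = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing commas and empty input
        key, sep, zone_id = pair.partition("=")
        if not sep or not key.strip() or not zone_id.strip():
            raise ValueError(f"malformed zone pair: {pair!r}")
        zones[key.strip()] = zone_id.strip()
    return zones
```

Failing loudly on a malformed pair is a deliberate choice in this sketch: silently skipping one would leave a zone without WAF log coverage.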

Adding a new surface

To add a new surface to the probe list:

  1. Open scripts/waf/probe.py.
  2. Add a new entry to the SURFACES list. Follow the existing pattern:
{
    "surface": "my-surface",
    "zone_key": "raxx-app",  # CF zone that owns this subdomain
    "flows": [
        {
            "name": "my-surface-root",
            "url": "https://my-surface.raxx.app/",
            "method": "GET",
            "expected_strings": ["my content marker"],
            "allow_redirect": True,
            "expected_status": [200],
        },
    ],
},
  3. For expected_strings: choose strings that are stable (not release-version dependent), matched case-insensitively, and present in the page HTML even before JavaScript hydration.

  4. For zone_key: use the key that maps to the Cloudflare zone ID in CF_ZONE_IDS (stored as the CF_ZONE_IDS_PROBE repo secret). Subdomains of raxx.app use raxx-app. New apex domains need a new zone key and a corresponding entry in CF_ZONE_IDS.

  5. Open a PR. Include a note in the PR body indicating the new surface and the expected strings chosen.

  6. After the PR merges, manually trigger the WAF Synthetic Probes workflow to confirm the new probe passes before the next scheduled run.
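Before opening the PR, candidate expected_strings can be checked against the raw page HTML. The sketch below does a case-insensitive substring match, which follows the guidance above; whether scripts/waf/probe.py matches in exactly this way is an assumption, and missing_strings is a hypothetical helper.

```python
from typing import Iterable, List


def missing_strings(html: str, expected: Iterable[str]) -> List[str]:
    """Return the expected strings that do not occur in html, ignoring case."""
    haystack = html.casefold()
    return [s for s in expected if s.casefold() not in haystack]


if __name__ == "__main__":
    # Made-up HTML standing in for a server-rendered page body.
    page = "<html><body><h1>My Content Marker</h1></body></html>"
    print(missing_strings(page, ["my content marker", "pricing"]))
```

Feed it the HTML as served (for example from curl), not the DOM from a browser's inspector, so that strings injected by JavaScript hydration are not counted as present.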

Secrets required

Secret name              Description                                            Scope
CF_WAF_PROBE_READ_TOKEN  CF API token for WAF Firewall Events read access       Zone:Read, Zone:WAF:Read on all probe zones
CF_ZONE_IDS_PROBE        Comma-separated key=zone_id pairs for probe surfaces   Not a secret but stored as repo secret for convenience
SLACK_BOT_TOKEN          Slack bot OAuth token for DM notifications             chat:write scope

CF_WAF_PROBE_READ_TOKEN and CF_ZONE_IDS_PROBE are optional — the probes degrade gracefully (HTTP 200 + content check only, no WAF log check) when absent.
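The degradation rule can be pictured as a single predicate: the WAF log check runs only when both values are present. The sketch assumes the script reads them from the CF_API_TOKEN and CF_ZONE_IDS environment variables, as in the manual-run commands in this runbook; waf_log_check_enabled is a hypothetical name.

```python
import os
from typing import Mapping, Optional


def waf_log_check_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True only when both the API token and the zone ID map are non-empty."""
    env = os.environ if env is None else env
    has_token = bool(env.get("CF_API_TOKEN", "").strip())
    has_zones = bool(env.get("CF_ZONE_IDS", "").strip())
    return has_token and has_zones
```

When this returns False, waf_would_block=false in the results means "not checked", not "no would-block events".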

WAF phase gate dependency

The probe workflow is a hard gate for SC-WAF-07 (#1741).

Before the operator flips managed_ruleset_action from "log" to "managed_challenge" or "block":

See docs/ops/runbooks/waf.md §Phase advancement for the full gate checklist.

Cross-references