WAF runbook
System: Cloudflare WAF — raxx.app zone (raxx.app, api.raxx.app, console.raxx.app, vault.raxx.app, tickets.raxx.app) + getraxx.com zone (getraxx.com, www.getraxx.com)
Owner: operator
Last incident: 2026-06-19 (BFM re-enabled with WAF skip rules; see docs/incidents/2026-06-19-bfm-restored.md)
Last reviewed: 2026-06-19
How to tell it's broken
- Symptom 1: Customers reporting HTTP 403 on paths that should succeed (false positive). Check CF WAF Events dashboard and correlate rule IDs with recent Logpush data.
- Symptom 2: Attack traffic reaching Raptor despite WAF being in block mode. Check Logpush for missing FirewallMatchesActions on attack-pattern requests.
- Symptom 3: Postmark webhook delivery failures. FirewallMatchesActions contains block on /api/webhooks/postmark from Postmark IP ranges.
- Symptom 4: CF Access service token calls returning 403/challenge from console, Velvet, or CI. Check that the cf-access-client-id skip rule is present in the custom ruleset.
- Symptom 4b: Infisical CLI returning CF error 1010 on vault.raxx.app. This is the vault-specific BFM issue fixed in #2143; check that the Priority 0.5 vault skip rule is present and enabled (see §CF-Access skip rules below and docs/ops/runbooks/vault-access.md).
- Symptom 4c: billing-retention-cron or trace-integrity-cron GH Actions workflow returns HTTP 403 managed challenge on api.raxx.app/api/internal/jobs/*. Check the Priority 0.6 Raptor internal jobs skip rule is present and enabled (ref: raptor_internal_jobs_ci_skip). See §Failure mode C2 below.
- Symptom 5: terraform plan shows unexpected diffs to CF Access resources (the WAF stack should NOT touch terraform/cf-access/ state).
- Symptom 6: Operator cannot reach console.raxx.app, vault.raxx.app, or tickets.raxx.app. WAF rate limit firing before CF Access (operator IP hit the 60/min ceiling). Check the operator_surface_hostnames rate limit rule in the raxx.app zone rate limit ruleset.
- Symptom 7: terraform apply fails with "ruleset already exists" on the raxx.app http_request_firewall_custom phase. This is the cross-stack conflict; see §Cross-stack ruleset migration before proceeding.
How to diagnose (in order)
- Check CF WAF Events dashboard: raxx.app or getraxx.com zone → Security → WAF. Filter by last 30 min. Expected: zero blocking actions in Phase 1 (log-only).
- Check Logpush S3 bucket (once SC-WAF-00 is complete): FirewallMatchesActions field. A block action in Phase 1 indicates a rule error.
- Correlate FirewallMatchesRuleIDs with the ruleset IDs from terraform output. Identify which ruleset (managed vs custom vs rate limit) fired.
- For Postmark webhook failures: GET /zones/{zone_id}/rulesets/{custom_waf_ruleset_id} and verify the Postmark IP ranges in rule Priority 2 match the current Postmark IP list.
- For service token block: GET /zones/{zone_id}/rulesets/{custom_waf_ruleset_id} and verify the Priority 1 skip rule expression is (len(http.request.headers["cf-access-client-id"]) gt 0) and is enabled.
- Check Terraform state drift: cd terraform/waf && terraform plan. Any non-zero diff against a known-good apply indicates dashboard drift (Failure Mode F11).
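The Logpush correlation in the steps above can be scripted. The following is a minimal sketch, assuming the Logpush objects are newline-delimited JSON and that the job includes the ClientRequestHost, ClientRequestPath, FirewallMatchesActions, and FirewallMatchesRuleIDs fields. The function name find_blocks and the sample records are illustrative only and do not exist in this repo.

```python
import json
from collections import Counter


def find_blocks(ndjson_lines):
    """Return (host, path, rule_id) for every matched rule whose action is "block"."""
    hits = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        actions = rec.get("FirewallMatchesActions") or []
        rule_ids = rec.get("FirewallMatchesRuleIDs") or []
        # Assumption: the two arrays are index-aligned (actions[i] belongs to rule_ids[i]).
        for action, rule_id in zip(actions, rule_ids):
            if action == "block":
                hits.append((rec.get("ClientRequestHost"),
                             rec.get("ClientRequestPath"),
                             rule_id))
    return hits


# Made-up sample records; the rule IDs are placeholders.
sample = [
    '{"ClientRequestHost": "api.raxx.app", "ClientRequestPath": "/api/webhooks/postmark",'
    ' "FirewallMatchesActions": ["block"], "FirewallMatchesRuleIDs": ["rule-a"]}',
    '{"ClientRequestHost": "api.raxx.app", "ClientRequestPath": "/health",'
    ' "FirewallMatchesActions": ["log", "skip"], "FirewallMatchesRuleIDs": ["rule-b", "rule-c"]}',
    '{"ClientRequestHost": "getraxx.com", "ClientRequestPath": "/",'
    ' "FirewallMatchesActions": [], "FirewallMatchesRuleIDs": []}',
]

blocks = find_blocks(sample)
by_rule = Counter(rule_id for _, _, rule_id in blocks)
print(blocks)
print(by_rule.most_common())
```

In Phase 1 the expected result is an empty list; any entry is a rule ID to look up against the ruleset IDs from terraform output.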
Token setup
This stack requires a CF API token with Zone:WAF:Edit + Zone:Logs:Edit on both zones.
Verify your token has the correct scopes before applying:
curl -s -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
https://api.cloudflare.com/client/v4/user/tokens/verify | python3 -m json.tool
The CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN documented in terraform/README.md was
confirmed to NOT have WAF:Edit scope as of 2026-04-30 (see cloudflare-rate-limiting.md).
If that token has not been updated, mint a new WAF-scoped token:
- CF dashboard → My Profile → API Tokens → Create Token
- Permissions: Zone > WAF > Edit, Zone > Logs > Edit
- Zone resources: Include > Specific zone > raxx.app AND getraxx.com (both)
- Store in Infisical: POST /api/v3/secrets/raw/CF_WAF_EDIT at path /MooseQuest/cloudflare/
- Export at apply time:
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_WAF_EDIT --path /MooseQuest/cloudflare/ --plain)
Known failure modes
Failure mode A: False positive — legitimate customer blocked (F1)
Symptom: Customer reports 403 on a valid request. WAF Events log shows an OWASP or CF Managed rule firing on a legitimate path.
Cause: OWASP CRS triggering on valid JSON body or API field names containing SQL/XSS patterns. Most common on api.raxx.app with complex order payloads.
Fix:
# Identify the rule ID from WAF Events or Logpush
# Edit terraform/waf/terraform.tfvars: set owasp_action = "log" to revert to observation
# Or apply a per-rule override in terraform/modules/cf-waf/main.tf overrides block
cd terraform/waf
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_WAF_EDIT --path /MooseQuest/cloudflare/ --plain)
terraform plan -out=tfplan
terraform apply tfplan
Verification: Customer can complete the previously blocked action. WAF Events shows "log" not "block" for the rule.
Phase impact: Rolling back to log is always safe.
Docs: waf-strategy.md §8 Phase 1, Failure Mode F1.
Failure mode B: Postmark webhook blocked (F5)
Symptom: Postmark webhook delivery failures. FreeScout inbound email stops. Logpush shows block on /api/webhooks/postmark.
Cause: Postmark rotated their delivery IP ranges without notice.
Fix:
# Get current Postmark IP ranges from:
# https://postmarkapp.com/support/article/800-ips-for-rate-limiting-or-firewall-rules
# Update terraform/waf/main.tf postmark_ip_ranges in both module calls
# Then:
cd terraform/waf
terraform plan -out=tfplan
terraform apply tfplan
Verification: curl -X POST https://api.raxx.app/api/webhooks/postmark from a Postmark IP returns 200 (not 403). Logpush shows no block on this path.
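Before editing postmark_ip_ranges, it can help to confirm that the source IP seen in Logpush really falls outside the configured ranges. The following is a minimal sketch using only the standard library. The CIDR values are RFC 5737 documentation ranges used as placeholders, not real Postmark IPs, and the helper name ip_in_ranges is hypothetical.

```python
import ipaddress


def ip_in_ranges(client_ip, cidr_ranges):
    """Return True if client_ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr, strict=False) for cidr in cidr_ranges)


# Placeholder ranges (RFC 5737 documentation blocks), NOT the real Postmark list.
configured_ranges = ["192.0.2.0/24", "198.51.100.17/32"]

print(ip_in_ranges("192.0.2.44", configured_ranges))   # inside the first range
print(ip_in_ranges("203.0.113.9", configured_ranges))  # outside both ranges
```

If the blocked source IP is outside every configured range but appears on the current Postmark list, the ranges in Terraform are stale and the fix above applies.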
Failure mode C: CF Access service token challenged or blocked (F6)
Symptom: Velvet, CI, or Console machine calls to Queue or Raptor returning 403 or CAPTCHA challenge.
Cause: New service token not matching the skip rule, or BFM skip rule accidentally disabled.
Fix:
# Verify the skip rule in CF dashboard:
# raxx.app zone → Security → WAF → Custom rules → "Priority 1 — skip BFM..."
# Confirm: expression = (len(http.request.headers["cf-access-client-id"]) gt 0)
# Confirm: Action = Skip, Status = Enabled
# If the rule is present but not working, verify the service token is sending
# the CF-Access-Client-Id header. Trace with:
curl -v -H "CF-Access-Client-Id: <token-id>" -H "CF-Access-Client-Secret: <token-secret>" \
https://api.raxx.app/health
Verification: Machine caller returns expected response (not 403/challenge). Logpush shows no block on affected path.
Failure mode D: Rate limit too tight — Stripe/payment webhook backlog (F4)
Symptom: Stripe webhook delivery failures. Payment processing lag. Rate limit action fires on /api/v1/billing/webhook.
Cause: Rate limit threshold on global or order path too tight during a Stripe event replay burst.
Fix:
# Immediately revert rate_limit_action to "log" (observation mode):
# In terraform.tfvars: rate_limit_action = "log"
cd terraform/waf
terraform plan -out=tfplan
terraform apply tfplan
Verification: Stripe webhook delivery resumes. Check Stripe dashboard for webhook retry status.
Failure mode E: Terraform state drift (F11)
Symptom: terraform plan shows diff for a resource that was not intentionally changed. Indicates a direct CF dashboard edit (not via Terraform).
Fix:
cd terraform/waf
# Review the diff carefully. If the dashboard state is correct:
# Import the changed resource into TF state and update main.tf to match.
# If TF state is correct:
terraform apply -target=<resource_address>
Prevention: All WAF changes must go through Terraform. No direct CF dashboard edits after first apply (ADR-0077 D2, ADR-0051).
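Drift detection can be automated with terraform plan -detailed-exitcode, which reports via its exit code whether the plan is empty. The following is a minimal sketch, assuming CLOUDFLARE_API_TOKEN and the TF_VAR_* zone IDs are already exported as described elsewhere in this runbook. The helper names classify_plan_exit and check_drift are hypothetical.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_EXIT_MEANING = {
    0: "no changes",
    1: "error",
    2: "changes present (possible drift)",
}


def classify_plan_exit(code):
    """Map a terraform plan exit code to a short human-readable meaning."""
    return PLAN_EXIT_MEANING.get(code, "unknown exit code")


def check_drift(stack_dir="terraform/waf"):
    """Run terraform plan in stack_dir (no apply) and classify the result."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=stack_dir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)


print(classify_plan_exit(2))
```

A result of "changes present" only says that a diff exists; the diff itself still needs the manual review described above before deciding which side is correct.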
Phase advancement
Phase transitions require explicit operator sign-off. Do not advance phases autonomously.
| Phase | tfvars change | Gate criteria |
|---|---|---|
| Phase 1 → Phase 2 | managed_ruleset_action = "managed_challenge", rate_limit_action = "managed_challenge" | 7-day log soak; false-positive rate <1% |
| Phase 2 → Phase 3 | managed_ruleset_action = "block", rate_limit_action = "block" | 72h; zero legitimate flows challenged |
| Phase 4 → Phase 5 | n/a (flag flip: FLAG_ENFORCE_CF_ORIGIN) | 7-day Phase 4 soak; SC-WAF-07 (#1741) |
Always run terraform plan and review before terraform apply on any phase change.
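The Phase 1 → Phase 2 gate can be expressed as a small check to support the operator's sign-off decision. The following is a sketch only. It assumes the false-positive rate is defined as confirmed false positives divided by total requests observed during the soak, which this runbook does not specify, and the function name phase1_gate_met is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def phase1_gate_met(soak_start, now, false_positives, total_requests,
                    min_soak=timedelta(days=7), max_fp_rate=0.01):
    """Return True when both Phase 1 -> Phase 2 gate criteria hold.

    Assumption: false-positive rate = false_positives / total_requests.
    """
    if total_requests <= 0:
        return False  # no traffic observed, nothing to judge
    soaked = (now - soak_start) >= min_soak
    fp_rate = false_positives / total_requests
    return soaked and fp_rate < max_fp_rate


start = datetime(2026, 6, 1, tzinfo=timezone.utc)
print(phase1_gate_met(start, start + timedelta(days=8), 40, 10_000))  # 0.4% after 8 days
print(phase1_gate_met(start, start + timedelta(days=3), 40, 10_000))  # soak too short
```

A passing result informs the sign-off; it does not replace it.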
Emergency stop (kill-switch)
Fastest rollback: set all actions to log/simulate and apply. ~30s CF propagation.
cd terraform/waf
# Edit terraform.tfvars:
# managed_ruleset_action = "log"
# owasp_action = "log"
# auth_challenge_action = "log"
# rate_limit_action = "simulate"
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_WAF_EDIT --path /MooseQuest/cloudflare/ --plain)
terraform plan -out=tfplan
terraform apply tfplan
Full removal (removes all WAF rulesets and rate limits; CF Access unaffected):
cd terraform/waf
terraform destroy
Note: terraform destroy removes WAF only. It does not touch terraform/cf-access/ (separate state file).
Pre-apply token verification (REQUIRED — do not skip)
Before running terraform apply, verify the active CF API token is valid AND has the
required scopes. A token that passes the /verify check but is missing a scope will
produce a silent partial-apply: some resources succeed while others return 403, leaving
the WAF in an indeterminate state. This step prevents that class of incident (see #2378
for a missed-scope apply that required a full state reconciliation).
Step A — token liveness:
curl -sS -H "Authorization: Bearer $CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN" \
https://api.cloudflare.com/client/v4/user/tokens/verify
Expected: {"result": {"status": "active"}, ...}
If the response contains "code": 10000 (auth error) or "code": 7003 (no route to
resource), STOP. Do not run terraform apply. Surface the full response body to the
operator and request a token refresh before proceeding.
Step B — account-level Firewall Services scope:
curl -sS -H "Authorization: Bearer $CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN" \
"https://api.cloudflare.com/client/v4/accounts/22b5c35090724fbf05db6d4f501ac821/firewall/access_rules/rules?per_page=1"
Expected: HTTP 200 with "success": true. An errors array containing code 10000 or
7003 means the token lacks account-level Firewall Services scope. STOP and escalate.
Step C — zone-level WAF scope:
curl -sS -H "Authorization: Bearer $CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN" \
"https://api.cloudflare.com/client/v4/zones/f12dbb5cac57d5591a5058874498a6d1/rulesets?per_page=1"
Expected: HTTP 200 with "success": true. Code 10000 or 7003 means the token lacks
Zone WAF scope on raxx.app. STOP and escalate.
All three steps must return clean before proceeding to terraform apply. If the WAF
token (CF_WAF_EDIT) passes Steps A–C but CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN
does not, see docs/ops/runbooks/cloudflare-tokens.md — these are distinct tokens with
distinct scopes. Export the correct token before running Terraform.
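The pass/fail decision for Steps A–C can be made mechanical by parsing each response body. The following is a minimal sketch, assuming the standard Cloudflare API v4 envelope with success, errors, and result keys. The function name token_check and the sample bodies are illustrative.

```python
import json

# Error codes this runbook treats as an auth or scope failure (Steps A-C).
STOP_CODES = {10000, 7003}


def token_check(body_text, require_active=False):
    """Classify a Cloudflare API v4 response body for the pre-apply check.

    require_active=True additionally demands result.status == "active" (Step A).
    """
    body = json.loads(body_text)
    codes = [err.get("code") for err in (body.get("errors") or [])]
    proceed = body.get("success") is True and not codes
    if require_active:
        result = body.get("result")
        proceed = proceed and isinstance(result, dict) and result.get("status") == "active"
    return {
        "proceed": proceed,
        "codes": codes,
        "auth_or_scope_failure": any(code in STOP_CODES for code in codes),
    }


# Illustrative bodies, trimmed to the keys the check reads.
ok_body = '{"success": true, "errors": [], "result": {"status": "active"}}'
bad_body = '{"success": false, "errors": [{"code": 10000, "message": "Authentication error"}], "result": null}'

print(token_check(ok_body, require_active=True))
print(token_check(bad_body))
```

Any response with proceed set to False means STOP, whether or not the error code is one of the two listed above.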
How to run this stack
Full apply (both zones — requires cross-stack migration first for raxx.app)
Prerequisite: the cross-stack state migration in §Cross-stack ruleset migration must be complete before applying raxx.app. getraxx.com has no conflict and can be applied at any time.
cd terraform/waf
# 0. Run pre-apply token verification (§Pre-apply token verification above) FIRST.
# 1. Set the CF WAF-scoped API token
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_WAF_EDIT \
--path /MooseQuest/cloudflare/ --plain)
# 2. Inject zone IDs via environment (do not edit terraform.tfvars values)
export TF_VAR_raxx_app_zone_id=$(infisical secrets get CF_ZONE_ID_RAXX_APP \
--path /MooseQuest/cloudflare/ --plain)
export TF_VAR_getraxx_zone_id=$(infisical secrets get CF_ZONE_ID_GETRAXX \
--path /MooseQuest/cloudflare/ --plain)
# 3. If SC-WAF-00 is complete and CF plan upgrade (#2386) resolved, set Logpush vars from SSM:
# export TF_VAR_logpush_destination_conf=$(aws ssm get-parameter \
# --name /raxx/waf/logpush_destination_conf --with-decryption \
# --query Parameter.Value --output text)
# export TF_VAR_logpush_ownership_challenge=$(aws ssm get-parameter \
# --name /raxx/waf/logpush_ownership_challenge --with-decryption \
# --query Parameter.Value --output text)
# 4. Init + plan + apply
terraform init
terraform plan -out=tfplan
# Review: all changes must be additive; no modifications to cf-access/ resources
terraform apply tfplan
# 5. Verify
terraform output
# Check CF dashboard: Security → WAF → Custom rules
# Expected: all rules show mode "log"; no blocking actions
getraxx.com only (no cross-stack conflict — safe to run immediately)
getraxx.com has no prior custom-phase ruleset. This subset of resources can be applied without completing the raxx.app state migration. Useful during the T-4 launch window when the full migration is pending.
cd terraform/waf
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_WAF_EDIT \
--path /MooseQuest/cloudflare/ --plain)
export TF_VAR_raxx_app_zone_id=$(infisical secrets get CF_ZONE_ID_RAXX_APP \
--path /MooseQuest/cloudflare/ --plain)
export TF_VAR_getraxx_zone_id=$(infisical secrets get CF_ZONE_ID_GETRAXX \
--path /MooseQuest/cloudflare/ --plain)
terraform init
# Target getraxx module resources only:
terraform plan -target=module.waf_getraxx -out=tfplan-getraxx
# Review plan: expect 3 new resources (managed_waf, custom_waf, rate_limits)
# and 1 zone_settings_override for getraxx.com. No raxx.app resources in plan.
terraform apply tfplan-getraxx
After applying getraxx.com, complete the cross-stack state migration (§Cross-stack ruleset migration) to unblock the raxx.app apply.
CF-Access skip rules (BFM bypass for machine callers)
CF WAF (Layer 1) evaluates before CF Access (Layer 2). Machine callers using CF Access service tokens egress from AWS/Azure ASNs which score high on bot detection. Without explicit skip rules, BFM fires a managed challenge before the service-token is authenticated — returning HTTP 403 to CI and vault tooling.
The raxx.app custom WAF ruleset contains three CF-Access skip rules, applied in
priority order:
Priority 0.5 — Vault Infisical auth skip
Skips the full current ruleset for Infisical CLI machine-identity (universal-auth)
requests to vault.raxx.app/api/v1/auth/* that carry a CF-Access-Client-Id header.
Without this rule, BFM trips on AWS ASN egress (AS14618/AS16509) and returns CF error 1010 before CF Access can authenticate the service token. Root cause of #680.
Controlled by vault_infisical_auth_skip_enabled = true in terraform/waf/main.tf.
Enabled for raxx.app only; false for getraxx.com.
CF expression:
(http.host eq "vault.raxx.app" and starts_with(http.request.uri.path, "/api/v1/auth/") and len(http.request.headers["cf-access-client-id"]) gt 0)
Priority 0.6 — Raptor internal jobs skip (#3621)
Skips the full current ruleset for GH Actions compliance crons
(billing-retention-cron, trace-integrity-cron) posting to
api.raxx.app/api/internal/jobs/* that carry a CF-Access-Client-Id header.
Without this rule, BFM fires on Azure ASN egress (AS8075 / eastus) before CF Access can authenticate the service token — returning HTTP 403 managed challenge. Root cause of the cron failures described in #3621. Prerequisite for re-enabling BFM (#3634).
Controlled by raptor_internal_jobs_skip_enabled = true in terraform/waf/main.tf.
Enabled for raxx.app only; false for getraxx.com.
CF expression (ref: raptor_internal_jobs_ci_skip):
(http.host eq "api.raxx.app" and starts_with(http.request.uri.path, "/api/internal/jobs/") and len(http.request.headers["cf-access-client-id"]) gt 0)
Verification after apply:
# Trigger compliance crons; both must return HTTP 200 (not 403):
gh workflow run billing-retention-cron.yml
gh workflow run trace-integrity-cron.yml
# Or probe directly with a valid CF Access service token:
curl -sI \
-H "CF-Access-Client-Id: <client_id_from_vault>" \
-H "CF-Access-Client-Secret: <client_secret_from_vault>" \
https://api.raxx.app/api/internal/jobs/billing-retention
# Expected: HTTP 200 (or 405 Method Not Allowed if GET; the 403 challenge is gone)
Priority 1 — Generic CF-Access service token skip
Skips BFM, hot-linking protection, UA blocking, and security-level gate for ALL
requests that carry a non-empty CF-Access-Client-Id header, zone-wide. This is
the broadest skip and covers Velvet, console, and CI runners for all paths.
CF expression:
(len(http.request.headers["cf-access-client-id"]) gt 0)
See docs/ops/runbooks/vault-access.md for vault-specific verification commands.
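To reason about which skip rule a given machine call should hit, the three expressions above can be modelled locally. The following is a simplified sketch, not Cloudflare's evaluator: it treats a non-empty CF-Access-Client-Id value as satisfying the len(...) gt 0 condition, and it omits the Priority 0 freescout rule. The function name matching_skip_rule is hypothetical.

```python
def matching_skip_rule(host, path, headers):
    """Return the ref of the first skip rule (in priority order) that matches, or None."""
    # Header names are matched case-insensitively, so normalise to lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    if len(lowered.get("cf-access-client-id", "")) == 0:
        return None  # all three rules require the header
    if host == "vault.raxx.app" and path.startswith("/api/v1/auth/"):
        return "vault_infisical_auth_ci_skip"       # Priority 0.5
    if host == "api.raxx.app" and path.startswith("/api/internal/jobs/"):
        return "raptor_internal_jobs_ci_skip"       # Priority 0.6
    return "generic_cf_access_service_token_skip"   # Priority 1


print(matching_skip_rule("vault.raxx.app", "/api/v1/auth/universal-auth/login",
                         {"CF-Access-Client-Id": "example-id"}))
print(matching_skip_rule("api.raxx.app", "/api/internal/jobs/billing-retention", {}))
```

A None result for a machine caller means the request carries no client ID header, which is the first thing to check in Failure mode C.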
Failure mode C2 — internal jobs path challenged (F6b)
Symptom: billing-retention-cron or trace-integrity-cron GH Actions workflow
returns HTTP 403 managed challenge on api.raxx.app/api/internal/jobs/*.
Cause: Priority 0.6 skip rule missing or disabled; BFM enabled on zone.
Fix:
# Verify rule is present and enabled in CF dashboard:
# raxx.app → Security → WAF → Custom rules → ref: raptor_internal_jobs_ci_skip
# Confirm: action=skip, enabled=true
# If missing: re-apply terraform/waf with raptor_internal_jobs_skip_enabled = true
cd terraform/waf
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_WAF_EDIT \
--path /MooseQuest/cloudflare/ --plain)
export TF_VAR_raxx_app_zone_id=$(infisical secrets get CF_ZONE_ID_RAXX_APP \
--path /MooseQuest/cloudflare/ --plain)
export TF_VAR_getraxx_zone_id=$(infisical secrets get CF_ZONE_ID_GETRAXX \
--path /MooseQuest/cloudflare/ --plain)
terraform plan -out=tfplan
terraform apply tfplan
Verification: Re-run the affected cron workflow; confirm HTTP 200 from endpoint.
Logpush setup dependency (SC-WAF-00)
The logpush_destination_conf and logpush_ownership_challenge variables are empty
by default. The Logpush job is not created until SC-WAF-00 (#1736) completes.
SC-WAF-00 operator actions:
1. Create S3 bucket for WAF logs (raxx-waf-logs-prod recommended).
2. Create IAM user with s3:PutObject on that bucket only.
3. Run Cloudflare ownership challenge for the destination.
4. Store destination conf and challenge token in SSM:
- /raxx/waf/logpush_destination_conf
- /raxx/waf/logpush_ownership_challenge
5. Re-apply this stack with the SSM values injected via TF_VAR_*.
Cross-stack ruleset migration (operator state operations — post #2328)
Code migration status: COMPLETE as of 2026-05-17 (Issue #2183, PR #2527). State migration: pending operator execution (Issue #2378, Option C locked 2026-05-19).
The freescout_lambda_skip rule has been moved out of terraform/cf-access/freescout_service_token.tf and into terraform/modules/cf-waf/main.tf as a Priority 0 dynamic rule inside cloudflare_ruleset.custom_waf. The cloudflare_ruleset.freescout_lambda_skip resource declaration has been removed from terraform/cf-access. PR #2527 also added the Priority 0.5 vault Infisical auth skip rule into the same module.
What remains: two state operations. These require live CF credentials (#2328 token refresh) and cannot be executed until tokens are valid. Until then, terraform plan on terraform/cf-access will show the ruleset as a planned destroy — do not run terraform apply on cf-access until Step 2 below is complete.
Automated migration script (recommended):
# Run from repo root — handles Steps 1–5 interactively with guards:
bash scripts/waf-state-migrate-raxx-app.sh
# If the migration fails at any step or terraform plan shows drift:
bash scripts/waf-state-migrate-raxx-app-rollback.sh
Manual steps (for reference or partial recovery):
Step 1 — import the live zone-default ruleset into terraform/waf state:
cd terraform/waf
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_ACCESS_MGMT \
--path /MooseQuest/cloudflare/ --plain)
export TF_VAR_raxx_app_zone_id="f12dbb5cac57d5591a5058874498a6d1"
export TF_VAR_getraxx_zone_id=$(infisical secrets get CF_ZONE_ID_GETRAXX \
--path /MooseQuest/cloudflare/ --plain)
terraform init
terraform import module.waf_raxx_app.cloudflare_ruleset.custom_waf \
zones/f12dbb5cac57d5591a5058874498a6d1/17dc768ccadf4d02ae279e133b7b5bfd
Step 2 — remove from terraform/cf-access state (does NOT destroy the CF resource):
cd terraform/cf-access
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_ACCESS_MGMT \
--path /MooseQuest/cloudflare/ --plain)
terraform init
terraform state rm cloudflare_ruleset.freescout_lambda_skip
Step 3 — plan both stacks; both must show zero diff on the ruleset. Switch to the WAF-scoped token:
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_WAF_EDIT \
--path /MooseQuest/cloudflare/ --plain)
# Run from the repo root; the subshells keep the working directory unchanged between plans.
(cd terraform/waf && terraform plan -out=tfplan-waf-migration)
(cd terraform/cf-access && terraform plan)
Expected terraform/waf plan: new resources for managed WAF, rate limits, zone settings — zero destroys. The custom_waf ruleset shows as an in-place update with Priority 0 (freescout Lambda skip), Priority 0.5 (vault Infisical auth skip, added by #2527), and Priorities 1–5 all present.
Expected terraform/cf-access plan: zero changes (ruleset resource gone from both config and state).
If either plan shows the ruleset as a destroy, STOP and run the rollback script.
Step 4 — apply terraform/waf only:
cd terraform/waf
terraform apply tfplan-waf-migration
Step 5 — verify in CF dashboard:
- raxx.app zone → Security → WAF → Custom rules: Priority 0 skip rule present for tickets.raxx.app/api with CF-Access-Client-Id condition.
- Priority 0.5 skip rule present for vault.raxx.app/api/v1/auth/ (Infisical CLI auth).
- All other custom rules (Priority 1–5) present and in log mode.
- No second custom ruleset for raxx.app (CF enforces one per phase per zone).
- Run terraform plan on both stacks post-apply: both must show "No changes."
Rollback: if the apply produces unexpected drift or any rule is missing post-apply:
bash scripts/waf-state-migrate-raxx-app-rollback.sh
The rollback removes the ruleset from terraform/waf state and re-imports it into terraform/cf-access state. The CF ruleset itself is not modified.
Tracking: Required before #1735 and #2378 can be closed.
Bot Fight Mode
Current state (as of 2026-06-19T00:00:00Z)
fight_mode = true — re-enabled by sre-agent after WAF CF-Access skip rules were applied. See docs/incidents/2026-06-19-bfm-restored.md and Issue #3634.
Live skip rules in custom ruleset 17dc768ccadf4d02ae279e133b7b5bfd (raxx.app zone):
| Priority | ref | Expression summary | bic skip |
|---|---|---|---|
| 0 | c8c0b91d4e2a4f99bc62237ad6a498b9 | tickets.raxx.app/api + CF-Access-Client-Id | yes |
| 0.5 | vault_infisical_auth_ci_skip | vault.raxx.app/api/v1/auth/ + CF-Access-Client-Id | yes |
| 0.6 | raptor_internal_jobs_ci_skip | api.raxx.app/api/internal/jobs/ + CF-Access-Client-Id | yes |
| 1 | generic_cf_access_service_token_skip | any host/path + CF-Access-Client-Id | yes |
Note: These rules were applied directly via the CF Rulesets API (WAF token CF_WAF_EDIT_RAXX_APP) bypassing Terraform, because the cross-stack state migration (#2378 Option C) was not yet executed. See §Terraform state drift note below.
Re-enable path (for future use — BFM is already ON):
If BFM is ever disabled again (operator-authorized emergency only), re-enable by:
- Confirm all four skip rules (Priority 0, 0.5, 0.6, and 1; see the live skip rules table above and §CF-Access skip rules) are still live in custom ruleset 17dc768ccadf4d02ae279e133b7b5bfd.
- Re-enable BFM via CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN from vault:
python3 << 'EOF'
import os, urllib.request, urllib.parse, json
vault = os.environ['INFISICAL_HOST']
cid = os.environ['CF_ACCESS_CLIENT_ID']; csec = os.environ['CF_ACCESS_CLIENT_SECRET']
icid = os.environ['INFISICAL_CLIENT_ID']; icsec = os.environ['INFISICAL_CLIENT_SECRET']
pid = os.environ['INFISICAL_PROJECT_ID']
ua = 'raxx-sre-agent/1.0'
body = json.dumps({'clientId': icid, 'clientSecret': icsec}).encode()
req = urllib.request.Request(vault+'/api/v1/auth/universal-auth/login', data=body,
method='POST', headers={'Content-Type':'application/json',
'CF-Access-Client-Id':cid,'CF-Access-Client-Secret':csec,'User-Agent':ua})
with urllib.request.urlopen(req, timeout=15) as r: itok = json.loads(r.read())['accessToken']
url = vault+'/api/v3/secrets/raw/CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN?workspaceId='+pid+'&environment=prod&secretPath=%2FMooseQuest%2Fcloudflare'
req2 = urllib.request.Request(url, headers={'Authorization':'Bearer '+itok,
'CF-Access-Client-Id':cid,'CF-Access-Client-Secret':csec,'User-Agent':ua})
with urllib.request.urlopen(req2, timeout=15) as r: tok = json.loads(r.read())['secret']['secretValue']
put = urllib.request.Request(
'https://api.cloudflare.com/client/v4/zones/f12dbb5cac57d5591a5058874498a6d1/bot_management',
data=json.dumps({'fight_mode': True}).encode(), method='PUT',
headers={'Authorization':'Bearer '+tok,'Content-Type':'application/json','User-Agent':ua})
with urllib.request.urlopen(put, timeout=30) as r: resp = json.loads(r.read())
print('fight_mode:', resp['result']['fight_mode'])
EOF
- Verify vault auth probe returns 200 (see §Vault Infisical auth skip verification below).
- Spot-check public paths.
- Document in docs/incidents/.
Terraform state drift note (post-2026-06-19)
The three new skip rules (vault_infisical_auth_ci_skip, raptor_internal_jobs_ci_skip, generic_cf_access_service_token_skip) were applied via direct CF API call, NOT via Terraform. terraform plan on terraform/waf will show these rules as a planned addition (drift). Running terraform apply will be a no-op in practice because the rules already exist — but the TF state must be reconciled before the next Terraform WAF apply or TF will error on the ruleset.
Action required before next terraform/waf apply: complete the cross-stack state migration (§Cross-stack ruleset migration). Until then, avoid running terraform apply on terraform/waf for raxx.app resources.
Vault Infisical auth skip verification
# With BFM ON — this must return 200 (not CF 403/challenge):
python3 -c "
import os, urllib.request, json
vault = os.environ['INFISICAL_HOST']
cid = os.environ['CF_ACCESS_CLIENT_ID']; csec = os.environ['CF_ACCESS_CLIENT_SECRET']
icid = os.environ['INFISICAL_CLIENT_ID']; icsec = os.environ['INFISICAL_CLIENT_SECRET']
body = json.dumps({'clientId': icid, 'clientSecret': icsec}).encode()
req = urllib.request.Request(vault+'/api/v1/auth/universal-auth/login', data=body,
method='POST', headers={'Content-Type':'application/json',
'CF-Access-Client-Id':cid,'CF-Access-Client-Secret':csec,'User-Agent':'raxx-sre-agent/1.0'})
with urllib.request.urlopen(req, timeout=15) as r:
    resp = json.loads(r.read())
print('accessToken present:', 'accessToken' in resp)
"
# Expected: accessToken present: True
How to disable BFM (operator-authorized emergency only)
Token: CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN (vault: /MooseQuest/cloudflare/) — has Zone:Bot Management:Write scope (CF group id 3b94c49258ec4573b06d51d99b6416c0).
Endpoint: PUT /zones/f12dbb5cac57d5591a5058874498a6d1/bot_management with body {"fight_mode": false}.
Always capture before/after state. Document the disable timestamp (UTC) and the re-secure path in docs/ops/incidents/.
Risk window: While BFM is off, automated scanners are no longer challenged by the BFM JS challenge layer. Custom WAF rulesets (OWASP, rate limits) and CF Access remain active. Minimize window length.
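Capturing before/after state around the toggle can reuse one request builder for both the read and the write. The following is a minimal sketch, assuming the token has already been fetched from vault as in the re-enable script above. The helper names bfm_request and send are hypothetical, and the block builds requests without sending anything until send is called.

```python
import json
import urllib.request

ZONE_ID = "f12dbb5cac57d5591a5058874498a6d1"  # raxx.app zone ID used throughout this runbook
BOT_MGMT_URL = f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/bot_management"


def bfm_request(token, fight_mode=None):
    """Build a GET (fight_mode=None) or PUT request for the zone's bot_management config."""
    headers = {
        "Authorization": "Bearer " + token,
        "Content-Type": "application/json",
        "User-Agent": "raxx-sre-agent/1.0",
    }
    if fight_mode is None:
        return urllib.request.Request(BOT_MGMT_URL, headers=headers, method="GET")
    body = json.dumps({"fight_mode": bool(fight_mode)}).encode()
    return urllib.request.Request(BOT_MGMT_URL, data=body, headers=headers, method="PUT")


def send(req):
    """Send a prepared request and return the decoded JSON body."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())


# Intended sequence (not executed here; requires a live token):
#   before = send(bfm_request(tok))
#   send(bfm_request(tok, fight_mode=False))
#   after = send(bfm_request(tok))
disable = bfm_request("example-token", fight_mode=False)
print(disable.get_method(), disable.full_url)
```

Recording the before and after bodies alongside the UTC timestamp gives the incident write-up the evidence this section asks for.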
Cross-references
- Design: docs/architecture/waf-strategy.md
- ADR: docs/architecture/adr/0077-cloudflare-waf-layered-defense.md
- Threat model: docs/security/waf-threat-model-2026-05-11.md
- Cloudflare tokens: docs/ops/runbooks/cloudflare-tokens.md
- Rate limiting runbook: docs/ops/runbooks/cloudflare-rate-limiting.md
- Vault access runbook: docs/ops/runbooks/vault-access.md (vault skip rule detail)
- Origin guard: docs/architecture/raxx-app-track-b.md, SC-WAF-07 (#1741)
- SC-WAF-00 (Phase 0 operator prereqs): issue #1736
- SC-WAF-06 (synthetic probes): issue #1739
- SC-WAF-07 (enforce flag flip): issue #1741
- CF provider import bug / cross-stack options: issue #2378
- CF plan upgrade for Logpush: issue #2386
- Migration script: scripts/waf-state-migrate-raxx-app.sh
- Rollback script: scripts/waf-state-migrate-raxx-app-rollback.sh
Escalation
Wake the operator when:
- A WAF rule is blocking customers in Phase 1 (log mode should never block)
- CF WAF Events shows a block action that cannot be explained by the ruleset
- terraform destroy is being considered (impacts all WAF protection for both zones)
- Any incident that involves the WAF affecting payment or order submission flows