Raxx · internal docs

internal · gated

WAF runbook

System: Cloudflare WAF for the raxx.app zone (raxx.app, api.raxx.app, console.raxx.app, vault.raxx.app, tickets.raxx.app) and the getraxx.com zone (getraxx.com, www.getraxx.com)
Owner: operator
Last incident: 2026-06-19 (BFM re-enabled with WAF skip rules; see docs/incidents/2026-06-19-bfm-restored.md)
Last reviewed: 2026-06-19

How to tell it's broken

How to diagnose (in order)

  1. Check CF WAF Events dashboard — raxx.app or getraxx.com zone → Security → WAF. Filter by last 30 min. Expected: zero blocking actions in Phase 1 (log-only).
  2. Check Logpush S3 bucket (once SC-WAF-00 is complete) — FirewallMatchesActions field. A block action in Phase 1 indicates a rule error.
  3. Correlate FirewallMatchesRuleIDs with the ruleset IDs from terraform output. Identify which ruleset (managed vs custom vs rate limit) fired.
  4. For Postmark webhook failures: GET /zones/{zone_id}/rulesets/{custom_waf_ruleset_id} — verify the Postmark IP ranges in rule Priority 2 match the current Postmark IP list.
  5. For service token block: GET /zones/{zone_id}/rulesets/{custom_waf_ruleset_id} — verify Priority 1 skip rule expression is (len(http.request.headers["cf-access-client-id"]) gt 0) and is enabled.
  6. Check Terraform state drift: cd terraform/waf && terraform plan. Any non-zero diff against a known-good apply indicates dashboard drift (Failure Mode F11).
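Steps 2 and 3 can be partly scripted once Logpush is live. The sketch below is a minimal example. It assumes each Logpush object has been decompressed (Logpush output is typically gzip-compressed, so open it with gzip.open(path, "rt")) and contains newline-delimited JSON records. It also assumes FirewallMatchesActions and FirewallMatchesRuleIDs are parallel arrays (same length, same order). The helper name is hypothetical. It counts block actions per rule ID, and those IDs can then be correlated with the ruleset IDs from terraform output.

```python
import json
from collections import Counter
from typing import Iterable


# Hypothetical helper; not part of existing Raxx tooling.
def blocking_rule_counts(ndjson_lines: Iterable[str]) -> Counter:
    """Count 'block' actions per rule ID across Logpush NDJSON records.

    Assumes FirewallMatchesActions and FirewallMatchesRuleIDs are parallel
    arrays on each record (index i of one describes index i of the other).
    """
    counts: Counter = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        actions = record.get("FirewallMatchesActions") or []
        rule_ids = record.get("FirewallMatchesRuleIDs") or []
        for action, rule_id in zip(actions, rule_ids):
            if action == "block":
                counts[rule_id] += 1
    return counts
```

In Phase 1 (log-only), any non-empty result points at a rule error worth escalating.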

Token setup

This stack requires a CF API token with Zone:WAF:Edit + Zone:Logs:Edit on both zones.

Verify your token has the correct scopes before applying:

curl -s -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  https://api.cloudflare.com/client/v4/user/tokens/verify | python3 -m json.tool

The CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN documented in terraform/README.md was confirmed to NOT have WAF:Edit scope as of 2026-04-30 (see cloudflare-rate-limiting.md). If that token has not been updated, mint a new WAF-scoped token:

  1. CF dashboard → My Profile → API Tokens → Create Token
  2. Permissions: Zone > WAF > Edit, Zone > Logs > Edit
  3. Zone resources: Include > Specific zone > raxx.app AND getraxx.com (both)
  4. Store in Infisical: POST /api/v3/secrets/raw/CF_WAF_EDIT at path /MooseQuest/cloudflare/
  5. Export at apply time: export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_WAF_EDIT --path /MooseQuest/cloudflare/ --plain)

Known failure modes

Failure mode A: False positive — legitimate customer blocked (F1)

Symptom: Customer reports a 403 on a valid request. The WAF Events log shows an OWASP or CF Managed rule firing on a legitimate path.

Cause: OWASP CRS rules triggering on valid JSON bodies or API field names that contain SQL/XSS-like patterns. Most common on api.raxx.app with complex order payloads.

Fix:

# Identify the rule ID from WAF Events or Logpush
# Edit terraform/waf/terraform.tfvars: set owasp_action = "log" to revert to observation
# Or apply a per-rule override in terraform/modules/cf-waf/main.tf overrides block
cd terraform/waf
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_WAF_EDIT --path /MooseQuest/cloudflare/ --plain)
terraform plan -out=tfplan
terraform apply tfplan

Verification: Customer can complete the previously blocked action. WAF Events shows "log", not "block", for the rule.

Phase impact: Rolling back to log is always safe.

Docs: waf-strategy.md §8 Phase 1, Failure Mode F1.

Failure mode B: Postmark webhook blocked (F5)

Symptom: Postmark webhook delivery failures; FreeScout inbound email stops. Logpush shows a block on /api/webhooks/postmark.

Cause: Postmark rotated their delivery IP ranges without notice.

Fix:

# Get current Postmark IP ranges from:
# https://postmarkapp.com/support/article/800-ips-for-rate-limiting-or-firewall-rules
# Update terraform/waf/main.tf postmark_ip_ranges in both module calls
# Then:
cd terraform/waf
terraform plan -out=tfplan
terraform apply tfplan

Verification: A webhook POST to https://api.raxx.app/api/webhooks/postmark that originates from a Postmark IP returns 200 (not 403). A curl from a non-Postmark IP does not exercise the allowlist. Logpush shows no block on this path.
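Before editing Terraform, it helps to confirm whether the blocked source IPs (ClientIP in Logpush or WAF Events) really fall outside the ranges currently set in postmark_ip_ranges. The sketch below is a minimal example that uses Python's standard ipaddress module. The helper name is hypothetical. The addresses in any example should be documentation ranges, not Postmark's real IPs.

```python
import ipaddress
from typing import Iterable, List


# Hypothetical helper; not part of existing Raxx tooling.
def ips_outside_allowlist(source_ips: Iterable[str], allowed_ranges: Iterable[str]) -> List[str]:
    """Return the source IPs that are not inside any configured CIDR range."""
    # strict=False tolerates CIDRs written with host bits set (e.g. 192.0.2.10/24).
    networks = [ipaddress.ip_network(r, strict=False) for r in allowed_ranges]
    outside = []
    for raw in source_ips:
        ip = ipaddress.ip_address(raw)
        # Membership is simply False when IP versions differ (v4 vs v6).
        if not any(ip in net for net in networks):
            outside.append(raw)
    return outside
```

A non-empty result for IPs that Postmark currently publishes is a strong sign that postmark_ip_ranges needs updating in both module calls.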

Failure mode C: CF Access service token challenged or blocked (F6)

Symptom: Velvet, CI, or Console machine calls to Queue or Raptor return 403 or a CAPTCHA challenge.

Cause: A new service token whose requests do not match the skip rule, or the BFM skip rule accidentally disabled.

Fix:

# Verify the skip rule in CF dashboard:
# raxx.app zone → Security → WAF → Custom rules → "Priority 1 — skip BFM..."
# Confirm: expression = (len(http.request.headers["cf-access-client-id"]) gt 0)
# Confirm: Action = Skip, Status = Enabled

# If the rule is present but not working, verify the service token is sending
# the CF-Access-Client-Id header. Trace with:
curl -v -H "CF-Access-Client-Id: <token-id>" -H "CF-Access-Client-Secret: <token-secret>" \
  https://api.raxx.app/health

Verification: Machine caller returns expected response (not 403/challenge). Logpush shows no block on affected path.
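Diagnosis step 5 and the dashboard checks above can also be run against the API response. The sketch below is a minimal example. It assumes the JSON body returned by GET /zones/{zone_id}/rulesets/{custom_waf_ruleset_id} contains a result object with a rules array, and that each rule carries action, expression, enabled, and ref fields. check_skip_rule is a hypothetical helper.

```python
from typing import Any, Dict, List

GENERIC_SKIP_EXPR = '(len(http.request.headers["cf-access-client-id"]) gt 0)'


def _normalize(expr: str) -> str:
    # Collapse whitespace so formatting differences are not reported as mismatches.
    return " ".join(expr.split())


# Hypothetical helper; not part of existing Raxx tooling.
def check_skip_rule(ruleset_response: Dict[str, Any], expected_expr: str = GENERIC_SKIP_EXPR) -> List[str]:
    """Return a list of problems; an empty list means the skip rule looks healthy."""
    rules = (ruleset_response.get("result") or {}).get("rules", [])
    matches = [r for r in rules if _normalize(r.get("expression", "")) == _normalize(expected_expr)]
    if not matches:
        return ["no rule with the expected expression"]
    problems = []
    for rule in matches:
        rid = rule.get("ref") or rule.get("id", "?")
        if rule.get("action") != "skip":
            problems.append(f"{rid}: action is {rule.get('action')!r}, expected 'skip'")
        # The Rulesets API treats an omitted 'enabled' field as true.
        if not rule.get("enabled", True):
            problems.append(f"{rid}: rule is disabled")
    return problems
```

Pass the parsed response body (json.loads of the GET output). To verify the other skip rules, pass their expressions as expected_expr.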

Failure mode D: Rate limit too tight — Stripe/payment webhook backlog (F4)

Symptom: Stripe webhook delivery failures and payment processing lag. The rate limit action fires on /api/v1/billing/webhook.

Cause: The rate limit threshold on the global or order path is too tight for a Stripe event replay burst.

Fix:

# Immediately revert rate_limit_action to "log" (observation mode):
# In terraform.tfvars: rate_limit_action = "log"
cd terraform/waf
terraform plan -out=tfplan
terraform apply tfplan

Verification: Stripe webhook delivery resumes. Check Stripe dashboard for webhook retry status.
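Before raising the threshold, it helps to measure the real peak request count on /api/v1/billing/webhook during the replay burst. The sketch below is a minimal example. It assumes a list of request timestamps in epoch seconds (for example, converted from EdgeStartTimestamp in Logpush) and the rule's counting period in seconds. It computes the maximum number of requests in any sliding window of that length. This is only an approximation of how Cloudflare counts; it does not reproduce Cloudflare's counter. The helper name is hypothetical.

```python
from collections import deque
from typing import Iterable


# Hypothetical helper; not part of existing Raxx tooling.
def peak_requests_in_window(timestamps: Iterable[float], period_s: float) -> int:
    """Maximum number of requests inside any half-open window of period_s seconds."""
    window: deque = deque()
    peak = 0
    for ts in sorted(timestamps):
        window.append(ts)
        # Evict requests that are period_s or more older than the newest one.
        while ts - window[0] >= period_s:
            window.popleft()
        peak = max(peak, len(window))
    return peak
```

If the peak exceeds the configured requests-per-period for that path, the threshold is too tight for replay bursts even after the immediate revert to "log".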

Failure mode E: Terraform state drift (F11)

Symptom: terraform plan shows a diff for a resource that was not intentionally changed.

Cause: A direct CF dashboard edit made outside Terraform.

Fix:

cd terraform/waf
# Review the diff carefully. If the dashboard state is correct:
# update main.tf to match the live configuration until plan shows no diff;
# import the resource first only if it was created in the dashboard and is not in TF state.
# If TF state is correct:
terraform apply -target=<resource_address>

Prevention: All WAF changes must go through Terraform. No direct CF dashboard edits after first apply (ADR-0077 D2, ADR-0051).
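Reviewing drift by eye is error-prone on a large ruleset. The sketch below is a minimal example. It assumes the plan was saved with terraform plan -out=tfplan and rendered with terraform show -json tfplan. That JSON contains a resource_drift array listing objects changed outside Terraform, each with an address and change.actions. summarize_drift is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


# Hypothetical helper; not part of existing Raxx tooling.
def summarize_drift(plan: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (address, actions) pairs for resources Terraform saw change outside its control."""
    drifted = []
    for rc in plan.get("resource_drift", []):
        actions = ",".join(rc.get("change", {}).get("actions", []))
        drifted.append((rc["address"], actions))
    return drifted
```

Feed it json.load(sys.stdin) with terraform show -json tfplan piped in. Each listed address is a candidate for either updating main.tf (dashboard correct) or a targeted apply (TF correct).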

Phase advancement

Phase transitions require explicit operator sign-off. Do not advance phases autonomously.

Phase | tfvars change | Gate criteria
Phase 1 → Phase 2 | managed_ruleset_action = "managed_challenge", rate_limit_action = "managed_challenge" | 7-day log soak; false-positive rate <1%
Phase 2 → Phase 3 | managed_ruleset_action = "block", rate_limit_action = "block" | 72h; zero legitimate flows challenged
Phase 4 → Phase 5 | n/a (flag flip: FLAG_ENFORCE_CF_ORIGIN) | 7-day Phase 4 soak; SC-WAF-07 (#1741)

Always run terraform plan and review before terraform apply on any phase change.

Emergency stop (kill-switch)

Fastest rollback: set all actions to log/simulate and apply. ~30s CF propagation.

cd terraform/waf
# Edit terraform.tfvars:
#   managed_ruleset_action = "log"
#   owasp_action           = "log"
#   auth_challenge_action  = "log"
#   rate_limit_action      = "simulate"
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_WAF_EDIT --path /MooseQuest/cloudflare/ --plain)
terraform plan -out=tfplan
terraform apply tfplan
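The four tfvars edits can be applied mechanically, which lowers the chance of a typo under pressure. The sketch below is a minimal example. It assumes each variable already appears exactly once in terraform.tfvars, on its own line, in the form name = "value". set_tfvars is a hypothetical helper. The edited file must still go through terraform plan review before apply.

```python
import re
from typing import Dict

# Kill-switch values from the tfvars edit above.
KILL_SWITCH = {
    "managed_ruleset_action": "log",
    "owasp_action": "log",
    "auth_challenge_action": "log",
    "rate_limit_action": "simulate",
}


# Hypothetical helper; not part of existing Raxx tooling.
def set_tfvars(text: str, values: Dict[str, str]) -> str:
    """Rewrite name = "value" lines in tfvars text; raise if a name is not found exactly once."""
    for name, value in values.items():
        # Anchor at line start so e.g. owasp_action does not match x_owasp_action.
        pattern = re.compile(rf'^([ \t]*{re.escape(name)}[ \t]*=[ \t]*)"[^"]*"', re.MULTILINE)
        text, count = pattern.subn(lambda m, v=value: f'{m.group(1)}"{v}"', text)
        if count != 1:
            raise ValueError(f"expected exactly one assignment for {name}, found {count}")
    return text
```

Usage: read terraform.tfvars, pass the text with KILL_SWITCH, write the result back, then run the plan/apply commands above. Failing loudly on a missing variable is deliberate, because a silently skipped key would leave that action enforcing.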

Full removal (removes all WAF rulesets and rate limits; CF Access unaffected):

cd terraform/waf
terraform destroy

Note: terraform destroy removes WAF only. It does not touch terraform/cf-access/ (separate state file).

Pre-apply token verification (REQUIRED — do not skip)

Before running terraform apply, verify the active CF API token is valid AND has the required scopes. A token that passes the /verify check but is missing a scope will produce a silent partial-apply: some resources succeed while others return 403, leaving the WAF in an indeterminate state. This step prevents that class of incident (see #2378 for a missed-scope apply that required a full state reconciliation).

Step A — token liveness:

curl -sS -H "Authorization: Bearer $CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN" \
  https://api.cloudflare.com/client/v4/user/tokens/verify

Expected: {"result": {"status": "active"}, ...}

If the response contains "code": 10000 (auth error) or "code": 7003 (no route to resource), STOP. Do not run terraform apply. Surface the full response body to the operator and request a token refresh before proceeding.

Step B — account-level Firewall Services scope:

curl -sS -H "Authorization: Bearer $CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN" \
  "https://api.cloudflare.com/client/v4/accounts/22b5c35090724fbf05db6d4f501ac821/firewall/access_rules/rules?per_page=1"

Expected: HTTP 200 with "success": true. An errors array containing code 10000 or 7003 means the token lacks account-level Firewall Services scope. STOP and escalate.

Step C — zone-level WAF scope:

curl -sS -H "Authorization: Bearer $CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN" \
  "https://api.cloudflare.com/client/v4/zones/f12dbb5cac57d5591a5058874498a6d1/rulesets?per_page=1"

Expected: HTTP 200 with "success": true. Code 10000 or 7003 means the token lacks Zone WAF scope on raxx.app. STOP and escalate.

All three steps must return clean before proceeding to terraform apply. CF_WAF_EDIT and CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN are distinct tokens with distinct scopes (see docs/ops/runbooks/cloudflare-tokens.md). If CF_WAF_EDIT passes Steps A–C (run with its value in place of $CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN) but the automation token does not, export the token that passed as CLOUDFLARE_API_TOKEN before running Terraform.
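The STOP conditions in Steps A–C reduce to a check on the JSON envelope Cloudflare returns: a success flag plus an errors array of objects with a code field. The sketch below is a minimal example that assumes the response bodies from the three curl calls have been captured and parsed. preflight_verdict is a hypothetical helper. It is deliberately stricter than the text above: any unsuccessful response is a STOP, not only codes 10000 and 7003.

```python
from typing import Any, Dict

# Codes this runbook treats as a bad token or missing scope.
SCOPE_ERROR_CODES = {10000, 7003}


# Hypothetical helper; not part of existing Raxx tooling.
def preflight_verdict(step: str, body: Dict[str, Any]) -> str:
    """Return 'ok' or a 'STOP: ...' message for pre-apply Step A, B, or C."""
    codes = [err.get("code") for err in body.get("errors") or []]
    if step == "A":
        # Step A also requires the token itself to report status "active".
        passed = bool(body.get("success")) and (body.get("result") or {}).get("status") == "active"
    else:
        passed = bool(body.get("success"))
    if passed:
        return "ok"
    hits = sorted(SCOPE_ERROR_CODES.intersection(codes))
    if hits:
        return f"STOP: step {step} returned error codes {hits}; escalate"
    return f"STOP: step {step} failed (errors={codes}); surface the full body to the operator"
```

Proceed to terraform apply only when all three steps return "ok".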


How to run this stack

Full apply (both zones — requires cross-stack migration first for raxx.app)

Prerequisite: the cross-stack state migration in §Cross-stack ruleset migration must be complete before applying raxx.app. getraxx.com has no conflict and can be applied at any time.

cd terraform/waf

# 0. Run pre-apply token verification (§Pre-apply token verification above) FIRST.

# 1. Set the CF WAF-scoped API token
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_WAF_EDIT \
  --path /MooseQuest/cloudflare/ --plain)

# 2. Inject zone IDs via environment (do not edit terraform.tfvars values)
export TF_VAR_raxx_app_zone_id=$(infisical secrets get CF_ZONE_ID_RAXX_APP \
  --path /MooseQuest/cloudflare/ --plain)
export TF_VAR_getraxx_zone_id=$(infisical secrets get CF_ZONE_ID_GETRAXX \
  --path /MooseQuest/cloudflare/ --plain)

# 3. If SC-WAF-00 is complete and CF plan upgrade (#2386) resolved, set Logpush vars from SSM:
#    export TF_VAR_logpush_destination_conf=$(aws ssm get-parameter \
#      --name /raxx/waf/logpush_destination_conf --with-decryption \
#      --query Parameter.Value --output text)
#    export TF_VAR_logpush_ownership_challenge=$(aws ssm get-parameter \
#      --name /raxx/waf/logpush_ownership_challenge --with-decryption \
#      --query Parameter.Value --output text)

# 4. Init + plan + apply
terraform init
terraform plan -out=tfplan
# Review: all changes must be additive; no modifications to cf-access/ resources
terraform apply tfplan

# 5. Verify
terraform output
# Check CF dashboard: Security → WAF → Custom rules
# Expected: all rules show mode "log"; no blocking actions
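The review rule in step 4 (all changes additive) can be checked against the machine-readable plan. The sketch below is a minimal example. It assumes terraform show -json tfplan output, whose resource_changes array lists change.actions for each address (values such as create, read, update, delete, no-op; a replacement appears as delete plus create). non_additive_changes is a hypothetical helper.

```python
from typing import Any, Dict, List


# Hypothetical helper; not part of existing Raxx tooling.
def non_additive_changes(plan: Dict[str, Any], allow_update: bool = False) -> List[str]:
    """List addresses whose planned actions are not purely additive."""
    allowed = {"no-op", "read", "create"} | ({"update"} if allow_update else set())
    problems = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # Replacements show up as ["delete", "create"] or ["create", "delete"].
        if not set(actions) <= allowed:
            problems.append(f'{rc["address"]}: {"+".join(actions)}')
    return problems
```

allow_update=True fits the Step 3 migration plan in §Cross-stack ruleset migration, where custom_waf is expected as an in-place update. A delete or replacement should stop the apply either way.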

getraxx.com only (no cross-stack conflict — safe to run immediately)

getraxx.com has no prior custom-phase ruleset. This subset of resources can be applied without completing the raxx.app state migration. Useful during the T-4 launch window when the full migration is pending.

cd terraform/waf
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_WAF_EDIT \
  --path /MooseQuest/cloudflare/ --plain)
export TF_VAR_raxx_app_zone_id=$(infisical secrets get CF_ZONE_ID_RAXX_APP \
  --path /MooseQuest/cloudflare/ --plain)
export TF_VAR_getraxx_zone_id=$(infisical secrets get CF_ZONE_ID_GETRAXX \
  --path /MooseQuest/cloudflare/ --plain)

terraform init
# Target getraxx module resources only:
terraform plan -target=module.waf_getraxx -out=tfplan-getraxx
# Review plan: expect 3 new resources (managed_waf, custom_waf, rate_limits)
# and 1 zone_settings_override for getraxx.com. No raxx.app resources in plan.
terraform apply tfplan-getraxx

After applying getraxx.com, complete the cross-stack state migration (§Cross-stack ruleset migration) to unblock the raxx.app apply.

CF-Access skip rules (BFM bypass for machine callers)

CF WAF (Layer 1) evaluates before CF Access (Layer 2). Machine callers using CF Access service tokens egress from AWS/Azure ASNs, which score high on bot detection. Without explicit skip rules, BFM issues a managed challenge before the service token is authenticated, returning HTTP 403 to CI and vault tooling.

The raxx.app custom WAF ruleset contains three CF-Access skip rules, applied in priority order:

Priority 0.5 — Vault Infisical auth skip

Skips the full current ruleset for Infisical CLI machine-identity (universal-auth) requests to vault.raxx.app/api/v1/auth/* that carry a CF-Access-Client-Id header.

Without this rule, BFM trips on AWS ASN egress (AS14618/AS16509) and returns CF error 1010 before CF Access can authenticate the service token. Root cause of #680.

Controlled by vault_infisical_auth_skip_enabled = true in terraform/waf/main.tf. Enabled for raxx.app only; false for getraxx.com.

CF expression:

(http.host eq "vault.raxx.app" and starts_with(http.request.uri.path, "/api/v1/auth/") and len(http.request.headers["cf-access-client-id"]) gt 0)

Priority 0.6 — Raptor internal jobs skip (#3621)

Skips the full current ruleset for GH Actions compliance crons (billing-retention-cron, trace-integrity-cron) posting to api.raxx.app/api/internal/jobs/* that carry a CF-Access-Client-Id header.

Without this rule, BFM fires on Azure ASN egress (AS8075, eastus) before CF Access can authenticate the service token, returning an HTTP 403 managed challenge. Root cause of the cron failures described in #3621. Prerequisite for re-enabling BFM (#3634).

Controlled by raptor_internal_jobs_skip_enabled = true in terraform/waf/main.tf. Enabled for raxx.app only; false for getraxx.com.

CF expression (ref: raptor_internal_jobs_ci_skip):

(http.host eq "api.raxx.app" and starts_with(http.request.uri.path, "/api/internal/jobs/") and len(http.request.headers["cf-access-client-id"]) gt 0)

Verification after apply:

# Trigger the compliance crons; both runs must succeed (endpoint returns HTTP 200, not 403):
gh workflow run billing-retention-cron.yml
gh workflow run trace-integrity-cron.yml

# Or probe directly with a valid CF Access service token:
curl -sI \
  -H "CF-Access-Client-Id: <client_id_from_vault>" \
  -H "CF-Access-Client-Secret: <client_secret_from_vault>" \
  https://api.raxx.app/api/internal/jobs/billing-retention
# Expected: HTTP 200, or 405 Method Not Allowed because curl -I sends HEAD; either way, not the 403 challenge

Priority 1 — Generic CF-Access service token skip

Skips BFM, hot-linking protection, UA blocking, and security-level gate for ALL requests that carry a non-empty CF-Access-Client-Id header, zone-wide. This is the broadest skip and covers Velvet, console, and CI runners for all paths.

CF expression:

(len(http.request.headers["cf-access-client-id"]) gt 0)

See docs/ops/runbooks/vault-access.md for vault-specific verification commands.
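To reason about which skip rule a given machine request will hit, the three expressions above can be mirrored in plain code. The sketch below is a minimal example. It assumes header names are supplied in lowercase (Cloudflare exposes http.request.headers with lowercase keys). It also assumes that, as with len(...) gt 0, a header counts as present when it has at least one value. It emulates only these three expressions, not the Priority 0 freescout rule, and it is not Cloudflare's rules engine. first_skip_rule is a hypothetical helper; the refs come from the live skip-rule table.

```python
from typing import Dict, List, Optional

CLIENT_ID = "cf-access-client-id"


def _has_client_id(headers: Dict[str, List[str]]) -> bool:
    # Mirrors len(http.request.headers["cf-access-client-id"]) gt 0.
    return len(headers.get(CLIENT_ID, [])) > 0


# Hypothetical helper; not part of existing Raxx tooling.
def first_skip_rule(host: str, path: str, headers: Dict[str, List[str]]) -> Optional[str]:
    """Return the ref of the first skip rule (by priority) the request would match."""
    if not _has_client_id(headers):
        return None  # every rule requires the CF-Access-Client-Id header
    if host == "vault.raxx.app" and path.startswith("/api/v1/auth/"):
        return "vault_infisical_auth_ci_skip"       # Priority 0.5
    if host == "api.raxx.app" and path.startswith("/api/internal/jobs/"):
        return "raptor_internal_jobs_ci_skip"       # Priority 0.6
    return "generic_cf_access_service_token_skip"   # Priority 1, any host/path
```

A request without the CF-Access-Client-Id header matches none of these rules and stays subject to BFM.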

Failure mode C2 — internal jobs path challenged (F6b)

Symptom: The billing-retention-cron or trace-integrity-cron GH Actions workflow gets an HTTP 403 managed challenge on api.raxx.app/api/internal/jobs/*.

Cause: Priority 0.6 skip rule missing or disabled while BFM is enabled on the zone.

Fix:

# Verify rule is present and enabled in CF dashboard:
# raxx.app → Security → WAF → Custom rules → ref: raptor_internal_jobs_ci_skip
# Confirm: action=skip, enabled=true

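# If missing: re-apply terraform/waf with raptor_internal_jobs_skip_enabled = true.
# Until the cross-stack state migration is complete, read §Terraform state drift note
# before running terraform apply on raxx.app resources.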
cd terraform/waf
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_WAF_EDIT \
  --path /MooseQuest/cloudflare/ --plain)
export TF_VAR_raxx_app_zone_id=$(infisical secrets get CF_ZONE_ID_RAXX_APP \
  --path /MooseQuest/cloudflare/ --plain)
export TF_VAR_getraxx_zone_id=$(infisical secrets get CF_ZONE_ID_GETRAXX \
  --path /MooseQuest/cloudflare/ --plain)
terraform plan -out=tfplan
terraform apply tfplan

Verification: Re-run the affected cron workflow; confirm HTTP 200 from endpoint.

Logpush setup dependency (SC-WAF-00)

The logpush_destination_conf and logpush_ownership_challenge variables are empty by default. The Logpush job is not created until SC-WAF-00 (#1736) completes.

SC-WAF-00 operator actions:

  1. Create S3 bucket for WAF logs (raxx-waf-logs-prod recommended).
  2. Create IAM user with s3:PutObject on that bucket only.
  3. Run the Cloudflare ownership challenge for the destination.
  4. Store the destination conf and challenge token in SSM:
     - /raxx/waf/logpush_destination_conf
     - /raxx/waf/logpush_ownership_challenge
  5. Re-apply this stack with the SSM values injected via TF_VAR_*.

Cross-stack ruleset migration (operator state operations — post #2328)

Code migration status: COMPLETE as of 2026-05-17 (Issue #2183, PR #2527). State migration: pending operator execution (Issue #2378, Option C locked 2026-05-19).

The freescout_lambda_skip rule has been moved out of terraform/cf-access/freescout_service_token.tf and into terraform/modules/cf-waf/main.tf as a Priority 0 dynamic rule inside cloudflare_ruleset.custom_waf. The cloudflare_ruleset.freescout_lambda_skip resource declaration has been removed from terraform/cf-access. PR #2527 also added the Priority 0.5 vault Infisical auth skip rule into the same module.

What remains: two state operations. These require live CF credentials (#2328 token refresh) and cannot be executed until tokens are valid. Until then, terraform plan on terraform/cf-access will show the ruleset as a planned destroy — do not run terraform apply on cf-access until Step 2 below is complete.

Automated migration script (recommended):

# Run from repo root — handles Steps 1–5 interactively with guards:
bash scripts/waf-state-migrate-raxx-app.sh

# If the migration fails at any step or terraform plan shows drift:
bash scripts/waf-state-migrate-raxx-app-rollback.sh

Manual steps (for reference or partial recovery). Start each step from the repo root; every step begins by changing into a stack directory.

Step 1 — import the live zone-default ruleset into terraform/waf state:

cd terraform/waf
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_ACCESS_MGMT \
  --path /MooseQuest/cloudflare/ --plain)
export TF_VAR_raxx_app_zone_id="f12dbb5cac57d5591a5058874498a6d1"
export TF_VAR_getraxx_zone_id=$(infisical secrets get CF_ZONE_ID_GETRAXX \
  --path /MooseQuest/cloudflare/ --plain)
terraform init
terraform import module.waf_raxx_app.cloudflare_ruleset.custom_waf \
  zones/f12dbb5cac57d5591a5058874498a6d1/17dc768ccadf4d02ae279e133b7b5bfd

Step 2 — remove from terraform/cf-access state (does NOT destroy the CF resource):

cd terraform/cf-access
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_ACCESS_MGMT \
  --path /MooseQuest/cloudflare/ --plain)
terraform init
terraform state rm cloudflare_ruleset.freescout_lambda_skip

Step 3 — plan both stacks; both must show zero diff on the ruleset. Switch to the WAF-scoped token:

export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_WAF_EDIT \
  --path /MooseQuest/cloudflare/ --plain)
# Subshells keep both paths relative to the repo root.
(cd terraform/waf && terraform plan -out=tfplan-waf-migration)
(cd terraform/cf-access && terraform plan)

Expected terraform/waf plan: new resources for managed WAF, rate limits, zone settings — zero destroys. The custom_waf ruleset shows as an in-place update with Priority 0 (freescout Lambda skip), Priority 0.5 (vault Infisical auth skip, added by #2527), and Priorities 1–5 all present.

Expected terraform/cf-access plan: zero changes (ruleset resource gone from both config and state).

If either plan shows the ruleset as a destroy, STOP and run the rollback script.

Step 4 — apply terraform/waf only:

cd terraform/waf
terraform apply tfplan-waf-migration

Step 5 — verify in CF dashboard:

Rollback: if the apply produces unexpected drift or any rule is missing post-apply:

bash scripts/waf-state-migrate-raxx-app-rollback.sh

The rollback removes the ruleset from terraform/waf state and re-imports it into terraform/cf-access state. The CF ruleset itself is not modified.

Tracking: Required before #1735 and #2378 can be closed.

Bot Fight Mode

Current state (as of 2026-06-19T00:00:00Z)

fight_mode = true — re-enabled by sre-agent after WAF CF-Access skip rules were applied. See docs/incidents/2026-06-19-bfm-restored.md and Issue #3634.

Live skip rules in custom ruleset 17dc768ccadf4d02ae279e133b7b5bfd (raxx.app zone):

Priority | ref | Expression summary | Skips BIC (Browser Integrity Check)
0 | c8c0b91d4e2a4f99bc62237ad6a498b9 | tickets.raxx.app/api + CF-Access-Client-Id | yes
0.5 | vault_infisical_auth_ci_skip | vault.raxx.app/api/v1/auth/ + CF-Access-Client-Id | yes
0.6 | raptor_internal_jobs_ci_skip | api.raxx.app/api/internal/jobs/ + CF-Access-Client-Id | yes
1 | generic_cf_access_service_token_skip | any host/path + CF-Access-Client-Id | yes

Note: These rules were applied directly via the CF Rulesets API (WAF token CF_WAF_EDIT_RAXX_APP) bypassing Terraform, because the cross-stack state migration (#2378 Option C) was not yet executed. See §Terraform state drift note below.

Re-enable path (for future use — BFM is already ON):

If BFM is ever disabled again (operator-authorized emergency only), re-enable by:

  1. Confirm that all four skip rules in the live skip-rule table above (Priorities 0, 0.5, 0.6, and 1) are still present and enabled in custom ruleset 17dc768ccadf4d02ae279e133b7b5bfd.
  2. Re-enable BFM via CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN from vault:
python3 << 'EOF'
import os, json, urllib.parse, urllib.request

ZONE_ID = 'f12dbb5cac57d5591a5058874498a6d1'  # raxx.app
vault = os.environ['INFISICAL_HOST']
cid = os.environ['CF_ACCESS_CLIENT_ID']; csec = os.environ['CF_ACCESS_CLIENT_SECRET']
icid = os.environ['INFISICAL_CLIENT_ID']; icsec = os.environ['INFISICAL_CLIENT_SECRET']
pid = os.environ['INFISICAL_PROJECT_ID']
ua = 'raxx-sre-agent/1.0'
# CF Access service-token headers carry the request past CF Access on vault.raxx.app.
cf_access = {'CF-Access-Client-Id': cid, 'CF-Access-Client-Secret': csec, 'User-Agent': ua}

# 1. Log in to Infisical with the machine identity (universal auth).
body = json.dumps({'clientId': icid, 'clientSecret': icsec}).encode()
req = urllib.request.Request(vault + '/api/v1/auth/universal-auth/login', data=body,
  method='POST', headers={'Content-Type': 'application/json', **cf_access})
with urllib.request.urlopen(req, timeout=15) as r:
  itok = json.loads(r.read())['accessToken']

# 2. Fetch CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN from /MooseQuest/cloudflare (prod).
query = urllib.parse.urlencode({'workspaceId': pid, 'environment': 'prod',
                                'secretPath': '/MooseQuest/cloudflare'})
req2 = urllib.request.Request(
  vault + '/api/v3/secrets/raw/CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN?' + query,
  headers={'Authorization': 'Bearer ' + itok, **cf_access})
with urllib.request.urlopen(req2, timeout=15) as r:
  tok = json.loads(r.read())['secret']['secretValue']

# 3. Turn Bot Fight Mode back on for the raxx.app zone.
put = urllib.request.Request(
  'https://api.cloudflare.com/client/v4/zones/' + ZONE_ID + '/bot_management',
  data=json.dumps({'fight_mode': True}).encode(), method='PUT',
  headers={'Authorization': 'Bearer ' + tok, 'Content-Type': 'application/json', 'User-Agent': ua})
with urllib.request.urlopen(put, timeout=30) as r:
  resp = json.loads(r.read())
print('fight_mode:', resp['result']['fight_mode'])
EOF
  3. Verify vault auth probe returns 200 (see §Vault Infisical auth skip verification below).
  4. Spot-check public paths.
  5. Document in docs/incidents/.

Terraform state drift note (post-2026-06-19)

The three new skip rules (vault_infisical_auth_ci_skip, raptor_internal_jobs_ci_skip, generic_cf_access_service_token_skip) were applied via direct CF API call, NOT via Terraform. terraform plan on terraform/waf will show these rules as a planned addition (drift). Although the rules already exist live, do not treat terraform apply as a no-op: the TF state must be reconciled before the next Terraform WAF apply, or TF will error on the ruleset.

Action required before next terraform/waf apply: complete the cross-stack state migration (§Cross-stack ruleset migration). Until then, avoid running terraform apply on terraform/waf for raxx.app resources.

Vault Infisical auth skip verification

# With BFM ON — this must return 200 (not CF 403/challenge):
python3 -c "
import os, urllib.request, json
vault = os.environ['INFISICAL_HOST']
cid = os.environ['CF_ACCESS_CLIENT_ID']; csec = os.environ['CF_ACCESS_CLIENT_SECRET']
icid = os.environ['INFISICAL_CLIENT_ID']; icsec = os.environ['INFISICAL_CLIENT_SECRET']
body = json.dumps({'clientId': icid, 'clientSecret': icsec}).encode()
req = urllib.request.Request(vault+'/api/v1/auth/universal-auth/login', data=body,
  method='POST', headers={'Content-Type':'application/json',
  'CF-Access-Client-Id':cid,'CF-Access-Client-Secret':csec,'User-Agent':'raxx-sre-agent/1.0'})
with urllib.request.urlopen(req, timeout=15) as r:
  resp = json.loads(r.read())
  print('accessToken present:', 'accessToken' in resp)
"
# Expected: accessToken present: True

How to disable BFM (operator-authorized emergency only)

Token: CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN (vault: /MooseQuest/cloudflare/) — has Zone:Bot Management:Write scope (CF group id 3b94c49258ec4573b06d51d99b6416c0).

Endpoint: PUT /zones/f12dbb5cac57d5591a5058874498a6d1/bot_management with body {"fight_mode": false}.

Always capture before/after state. Document the disable timestamp (UTC) and the re-secure path in docs/ops/incidents/.

Risk window: While BFM is off, automated scanners are no longer challenged by the BFM JS challenge layer. Custom WAF rulesets (OWASP, rate limits) and CF Access remain active. Minimize window length.
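The before/after capture can be scripted with the same standard-library approach as the re-enable script above. The sketch below is a minimal example. It builds, but does not send, the GET and PUT requests against /zones/{zone_id}/bot_management. It also formats a one-line change record with a UTC timestamp for the incident doc, assuming the response body carries result.fight_mode. The helper names are hypothetical, and the caller must supply the token from vault.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Optional

API = "https://api.cloudflare.com/client/v4"


# Hypothetical helper; not part of existing Raxx tooling.
def bot_management_request(zone_id: str, token: str, fight_mode: Optional[bool] = None) -> urllib.request.Request:
    """Build a GET (fight_mode is None) or a PUT setting fight_mode; nothing is sent."""
    url = f"{API}/zones/{zone_id}/bot_management"
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json",
               "User-Agent": "raxx-sre-agent/1.0"}
    if fight_mode is None:
        return urllib.request.Request(url, headers=headers, method="GET")
    body = json.dumps({"fight_mode": fight_mode}).encode()
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


# Hypothetical helper; not part of existing Raxx tooling.
def change_record(before: dict, after: dict) -> str:
    """One-line UTC record built from two bot_management API response bodies."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{ts} fight_mode {before['result']['fight_mode']} -> {after['result']['fight_mode']}"


# Usage (network call; token from vault):
#   with urllib.request.urlopen(bot_management_request(zone_id, tok), timeout=30) as r:
#       before = json.loads(r.read())
```

Capture before with the GET, send the PUT, capture after with a second GET, and paste change_record(before, after) into the incident doc.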

Cross-references

Escalation

Wake the operator when:

  - A WAF rule is blocking customers in Phase 1 (log mode should never block)
  - CF WAF Events shows a block action that cannot be explained by the ruleset
  - terraform destroy is being considered (impacts all WAF protection for both zones)
  - Any incident involves the WAF affecting payment or order submission flows