Raxx · internal docs

internal · gated

RCA: Bot Fight Mode (BFM) restored on raxx.app after WAF skip rules for CF-Access were applied

Incident ID: 2026-06-19-bfm-restored
Date: 2026-06-19
Severity: SEV-2
Duration: ~8 days total disable window (2026-06-18 disable → 2026-06-19 restore); sre-agent execution time ~15 minutes.
Blast radius: No production outage during the disable window. With BFM off, automated scanners went unchallenged for ~8 days. No customer-facing impact observed. The CI→vault breakage that caused the disable predates this session; this session restores both BFM and the CI→vault path.
Author: sre-agent

Summary

Bot Fight Mode on the raxx.app Cloudflare zone was disabled on 2026-06-18 (operator-authorized) because GitHub Actions runners originating from AWS/Azure ASNs were receiving CF error 1010 before their CF-Access service-token headers could be validated, which blocked all CI→vault workflows. The permanent fix required WAF skip rules to be in place before BFM could be re-enabled. This RCA documents the 2026-06-19 session that applied those skip rules via the CF Rulesets API and restored fight_mode=true, with the CI→vault golden path confirmed working.
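The two API steps described above (skip rule first, then the BFM toggle) can be sketched as ordered request bodies. This is a minimal sketch, not the rules applied in this session: the rule expression, the `HOSTNAME`/`ZONE_ID`/`RULESET_ID` placeholders, and the skip target (`products: ["bic"]`, `phases: ["http_request_sbfm"]`) are assumptions, so confirm which Cloudflare feature actually emits the 1010 before reusing them. The endpoint paths follow the public Cloudflare v4 API (Rulesets `.../rulesets/{ruleset_id}/rules` and `.../bot_management`); the script only prints the bodies and sends nothing.

```python
"""Sketch: Cloudflare request bodies for a CF-Access skip rule, then BFM on.

All IDs, the hostname, and the skip target are hypothetical placeholders.
"""
import json

API = "https://api.cloudflare.com/client/v4"


def access_header_expr(host: str) -> str:
    """Match requests to one host that carry both CF-Access service-token headers."""
    return (
        f'http.host == "{host}" and '
        'any(http.request.headers.names[*] == "cf-access-client-id") and '
        'any(http.request.headers.names[*] == "cf-access-client-secret")'
    )


def skip_rule(expression: str, description: str) -> dict:
    """Body for adding one skip rule to a zone custom-rules ruleset."""
    return {
        "description": description,
        "expression": expression,
        "action": "skip",
        # Assumed skip target; adjust to whatever actually produces the 1010.
        "action_parameters": {"products": ["bic"], "phases": ["http_request_sbfm"]},
    }


def requests_to_send(zone_id: str, ruleset_id: str, host: str) -> list:
    """Ordered (method, url, body) triples: the skip rule must land before BFM is re-enabled."""
    return [
        (
            "POST",
            f"{API}/zones/{zone_id}/rulesets/{ruleset_id}/rules",
            skip_rule(access_header_expr(host), "Skip bot checks for CF-Access service tokens"),
        ),
        ("PUT", f"{API}/zones/{zone_id}/bot_management", {"fight_mode": True}),
    ]


if __name__ == "__main__":
    # Placeholder values; bodies are printed for review, nothing is sent.
    for method, url, body in requests_to_send("ZONE_ID", "RULESET_ID", "HOSTNAME"):
        print(method, url)
        print(json.dumps(body, indent=2))
```

Keying the skip on header names alone would let any client bypass bot checks simply by sending those headers, which is why the sketch also pins the host; the token values themselves are still checked by CF-Access, assuming the host sits behind an Access application.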

Timeline (all times UTC)

Impact

What went well

What didn't go well

Root cause analysis

Detection

Resolution

Action items

# | Action | Owner | Due | Issue
1 | Mint a CF_BOT_MGMT_RAXX_APP token (dedicated Bot Management scope) and store it in vault at /MooseQuest/cloudflare/; eliminates reliance on the broad automation token for BFM toggles | operator | 2026-06-26 | #3634 Action B
2 | Complete the cross-stack TF state migration (#2378 Option C): import the new skip rules into terraform/waf state so terraform plan shows zero drift; prevents the next terraform apply from clobbering the live skip rules | sre-agent (requires op token sign-off) | 2026-06-26 | #2378
3 | Add a synthetic vault-auth probe to GH Actions (runs every 30m, alerts on non-200) to detect a BFM false positive within one probe interval (≤30 minutes) | sre-agent | 2026-07-03 | new
4 | Close issue #3634 with a link to this RCA | sre-agent | 2026-06-19 | #3634

References