RCA — Bot Fight Mode disabled on raxx.app (operator-authorized pre-launch window)
Incident ID: 2026-06-18-bfm-disabled-window Date: 2026-06-18 Severity: SEV-2 Duration: Ongoing until re-secure path is completed (see Action items) Blast radius: GitHub Actions Azure-ASN CI runners blocked from vault.raxx.app; Sprint readiness gate and BCP export workflows failing with HTTP 403 at vault universal-auth/login Author: sre-agent
Summary
Cloudflare Bot Fight Mode (BFM) on the raxx.app zone was scoring GitHub Actions runners (AWS/Azure ASN egress) as bot traffic and returning CF error 1010 before CF Access could authenticate service-token headers. This blocked all CI workflows that load secrets from vault.raxx.app. The operator authorized disabling BFM as a pre-launch bridge while the permanent WAF CF-Access skip rules (Priority 0.5 vault + Priority 1 generic, tracked in #2328 and #2378) are deployed. BFM was disabled at 2026-06-18T11:22:19Z UTC via CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN which carries Zone:Bot Management:Write scope.
Timeline (all times UTC)
- 2026-06-18T11:20:00Z — Operator reports CI→vault auth failures (HTTP 403 at vault.raxx.app/api/v1/auth/universal-auth/login) blocking Sprint readiness gate and prod deploy workflows.
- 2026-06-18T11:20:30Z — sre-agent begins investigation. Enumerates CF token inventory from Infisical vault at /MooseQuest/cloudflare/.
- 2026-06-18T11:21:00Z — Probed GET /zones/f12dbb5cac57d5591a5058874498a6d1/bot_management with all vaulted tokens.
CLOUDFLARE_RAXX_AUTOMATION_API_TOKENreturns HTTP 200 with fight_mode=True.CF_WAF_EDIT_RAXX_APPandCLOUDFLARE_ACCESS_MGMT_TOKENreturn HTTP 403. - 2026-06-18T11:21:30Z — Confirmed
CLOUDFLARE_RAXX_AUTOMATION_API_TOKENhasBot Management Write (id=3b94c49258ec4573b06d51d99b6416c0)via /user/tokens self-inspection. - 2026-06-18T11:22:17Z — PRE-CHANGE state recorded: fight_mode=True, enable_js=True, using_latest_model=True, ai_bots_protection=disabled.
- 2026-06-18T11:22:19Z — PUT /zones/f12dbb5cac57d5591a5058874498a6d1/bot_management {"fight_mode": false} returned HTTP 200, success=true, fight_mode=False.
- 2026-06-18T11:22:21Z — POST-CHANGE GET confirmed: fight_mode=False.
- 2026-06-18T11:22:35Z — Golden path probe: POST vault.raxx.app/api/v1/auth/universal-auth/login returned HTTP 200 with valid accessToken. PASS.
- 2026-06-18T11:23:00Z — Re-ran failing BCP vault export workflow (run ID 27755471839).
- 2026-06-18T11:25:00Z — Re-run result: "Authenticate to Infisical (universal auth via REST)" step now PASSES (was FAIL before BFM disable). Workflow fails later at "Import GPG public key" — a pre-existing separate issue unrelated to BFM (see
docs/ops/incidents/2026-06-15-bcp-vault-snapshot-gpg-key-missing.md). Vault auth blocker confirmed cleared.
Impact
- Users affected: 0 (pre-launch; no customers)
- User-visible symptoms: none (CI-internal)
- Data integrity: ok
- Revenue / billing: ok
- CI workflows impacted: Sprint readiness gate, BCP vault export (run 27755471839), any workflow hitting vault.raxx.app from Azure/AWS ASN runners
What went well
CLOUDFLARE_RAXX_AUTOMATION_API_TOKENin vault carried the requiredBot Management Writescope — no minting needed.- Vault token inventory enumeration via Infisical REST API was complete and fast.
- Before/after state was captured with timestamps before any change was applied.
- Golden path verification confirmed recovery via both a direct probe and a CI workflow re-run within 3 minutes of the change.
What didn't go well
- The WAF Terraform skip rules (Priority 0.5 vault + Priority 1 generic) that would have made BFM disablement unnecessary are pending state migration (#2328, #2378). BFM disable is a bridge, not the permanent fix.
- The
CF_WAF_EDIT_RAXX_APP__SCOPESannotation in vault does not includeZone:Bot Management:Edit— the scope inventory indocs/ops/runbooks/cloudflare-tokens.mddoes not document which vaulted token carries Bot Management Write scope, requiring a discovery enumeration at incident time.
Root cause analysis
- Contributing factor 1: BFM evaluates before CF Access — CF Bot Fight Mode runs at the WAF layer, before CF Access service-token authentication. AWS/Azure ASN egress IPs (GitHub Actions runners) score as bot-origin traffic, causing CF error 1010 on vault.raxx.app/api/v1/auth/* before the CF-Access-Client-Id header is ever examined. This was the root cause of prior incident #680 (fixed with a WAF skip rule), and recurs whenever BFM is enabled without skip rules covering the vault auth path.
- Contributing factor 2: WAF skip rules blocked by state migration — The permanent fix (Priority 0.5 vault skip + Priority 1 generic skip in terraform/waf) requires a cross-stack state migration (#2378, Option C locked 2026-05-19). That migration has not completed, leaving CI runners exposed to BFM on every vault auth call.
- Contributing factor 3: No Bot Management scope in token index — The cloudflare-tokens.md inventory table does not document which token carries
Zone:Bot Management:Writescope. This added a discovery enumeration step at incident time.
Detection
- What alerted us: Operator report of CI→vault HTTP 403 failures in Sprint readiness gate
- How long between cause and detection: unknown (BCP vault export has a separate pre-existing failure obscuring this signal)
- How to detect faster next time: Add a synthetic probe that authenticates to vault from a public-IP (non-operator) endpoint daily and alerts ops@ on any non-200
Resolution
- What was changed:
fight_modeon raxx.app zone set fromtruetofalsevia PUT /zones/f12dbb5cac57d5591a5058874498a6d1/bot_management - Token used:
CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN(vault path: /MooseQuest/cloudflare/; CF permission:Bot Management Write, group id3b94c49258ec4573b06d51d99b6416c0) - Validation: GET /bot_management confirms fight_mode=False; vault auth probe returns HTTP 200 with accessToken; CI re-run of BCP vault export (run 27755471839) passes vault auth step
Exposure window and risk
BFM fight_mode=False means Cloudflare's bot heuristics no longer automatically challenge high-bot-score traffic on raxx.app. This degrades WAF posture:
- Automated scanning (vulnerability scanners, credential-stuffing bots) will no longer be challenged by BFM. The custom WAF rulesets (OWASP, rate limits) and CF Access remain active and provide the primary defense.
- JS challenge bypass — BFM uses JS challenges that real browsers pass; turning it off removes this layer for endpoints like api.raxx.app and console.raxx.app.
- Pre-launch risk is lower than post-launch — no customers yet. This window must be minimized.
- Acceptable because: pre-launch, no customer data, custom WAF + CF Access + rate limits remain active.
Re-secure path (in order)
- Complete cross-stack state migration:
docs/ops/runbooks/waf.md§Cross-stack ruleset migration (Issue #2378, Option C). Requires validCF_WAF_EDIT_RAXX_APPtoken andCF_ACCESS_MGMTtoken. - Apply Priority 0.5 vault Infisical auth skip rule and Priority 1 CF-Access-Client-Id generic skip rule via
terraform/waf(Issues #2328, #2378). - Verify both skip rules are live: vault auth probe from a non-operator IP returns 200; CF WAF Events shows no blocks on /api/v1/auth/*.
- Re-enable BFM:
PUT /zones/f12dbb5cac57d5591a5058874498a6d1/bot_management {"fight_mode": true}usingCLOUDFLARE_RAXX_AUTOMATION_API_TOKEN. - Verify: vault auth probe still returns 200 (the skip rule must preempt BFM for vault path).
- Close Issue #3634.
Action items
| # | Action | Owner | Due | Issue |
|---|---|---|---|---|
| 1 | Complete WAF cross-stack state migration and apply Priority 0.5 + Priority 1 skip rules | operator | 2026-06-25 | #2378 |
| 2 | Re-enable BFM fight_mode after skip rules are verified live | sre-agent (after #2378) | 2026-06-25 | #3634 |
| 3 | Document CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN Bot Management Write scope in cloudflare-tokens.md inventory |
sre-agent | 2026-06-20 | n/a |
| 4 | Add daily synthetic vault-auth probe from non-operator IP to detect BFM regressions before they block CI | sre-agent | 2026-06-25 | n/a |
References
- Runbook:
docs/ops/runbooks/waf.md§Bot Fight Mode - Vault access runbook:
docs/ops/runbooks/vault-access.md - Prior incident #680 (2026-05-15 — BFM blocking Infisical CLI; same root cause class)
- WAF state migration:
docs/ops/runbooks/waf.md§Cross-stack ruleset migration - Issue #2328 (Firewall Services token refresh + skip rule deploy)
- Issue #2378 (cross-stack state migration, Option C)
- Issue #3634 (BFM re-enable tracking)
- Token used:
CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN(vault: /MooseQuest/cloudflare/) - Memory:
feedback_cf_access_does_not_bypass_bot_fight_mode.md - Memory:
feedback_cf_access_service_token_needs_non_identity.md