Raxx · internal docs

internal · gated

RCA — WebAuthn validator boot regression on raxx-api-prod

Incident ID: 2026-05-21-prod-webauthn-boot-fail
Date: 2026-05-21
Severity: SEV-1
Duration: 6m 8s (19:00:01 UTC first crash → 19:06:09 UTC rollback complete)
Blast radius: raxx-api-prod web dyno; operator-only (CF Access perimeter active; no customer traffic)
Author: sre-agent

Summary

Setting FLAG_WEBAUTHN_REGISTRATION=1 and FLAG_AUTH_WEBAUTHN_LOGIN=1 on raxx-api-prod caused the web dyno to crash-loop at boot. The validator in create_app() raised RuntimeError("WebAuthn config validation failed — refusing to start: WEBAUTHN_RP_ID is not set"), even though WEBAUTHN_RP_ID=raxx.app was present in the Heroku config and visible to one-off dynos.

The cause was load order inside create_app(): app.config.from_pyfile('config.py', silent=True) runs after from_mapping() and overwrote the correctly mapped WEBAUTHN_RP_ID value with ''. The regression tests added in PR #2590 did not catch this because they loaded configuration through the test_config branch, not the production from_pyfile branch. The dyno was restored within about 6 minutes via heroku config:unset.
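The overwrite can be reproduced without Flask. The sketch below is a stdlib-only approximation of Flask's Config.from_pyfile (execute the file, copy every UPPERCASE name into the config, and with silent=True ignore only a missing file); it is not the Raxx code. ENV_KEYS, build_config, validate_webauthn and the env_wins switch are hypothetical names for illustration, and the env_wins ordering is one possible mitigation, not necessarily the fix that shipped.

```python
import types
from typing import Mapping

# Hypothetical list of env-backed keys; the real create_app() presumably maps more than this.
ENV_KEYS = ("WEBAUTHN_RP_ID",)


def from_pyfile(config: dict, path: str, silent: bool = False) -> bool:
    """Rough model of Flask's Config.from_pyfile: exec the file, copy UPPERCASE names."""
    module = types.ModuleType("config")
    module.__file__ = path
    try:
        with open(path, "rb") as fh:
            exec(compile(fh.read(), path, "exec"), module.__dict__)
    except FileNotFoundError:
        if silent:
            return False  # silent=True swallows a missing file, not an empty value
        raise
    for key in dir(module):
        if key.isupper():
            config[key] = getattr(module, key)  # last writer wins, even when the value is ''
    return True


def build_config(env: Mapping[str, str], pyfile: str, env_wins: bool = False) -> dict:
    """Load env-derived values and the instance file in one of two orders."""
    mapped = {key: env[key] for key in ENV_KEYS if env.get(key)}
    config: dict = {}
    if env_wins:
        # Hypothetical mitigation: instance file first, then non-empty env values on top.
        from_pyfile(config, pyfile, silent=True)
        config.update(mapped)
    else:
        # Order described in this RCA: from_mapping() first, then from_pyfile() overwrites.
        config.update(mapped)
        from_pyfile(config, pyfile, silent=True)
    return config


def validate_webauthn(config: Mapping[str, object]) -> None:
    """Stand-in for the boot-time validator: an empty RP ID counts as unset."""
    if not config.get("WEBAUTHN_RP_ID"):
        raise RuntimeError("WebAuthn config validation failed: WEBAUTHN_RP_ID is not set")
```

A regression test only covers this incident if it writes an instance config.py containing WEBAUTHN_RP_ID = '' and boots through the from_pyfile path; a test that passes test_config never reaches the overwrite.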

Timeline (all times UTC)

Impact

What went well

What didn't go well

Root cause analysis

Detection

Resolution

Action items

| # | Action | Owner | Due | Issue |
|---|--------|-------|-----|-------|
| 1 | Write BOOTSTRAP_TOKEN_SIGNING_KEY to the Infisical vault (it was set on prod but missing from the vault) | operator | 2026-05-22 | |
| 2 | Add a pre-flip smoke step to the WebAuthn runbook: heroku run -a raxx-api-prod -- python -c "from api import create_app; create_app()" before any flag flip that touches boot-time validators | sre-agent | 2026-05-23 | |
| 3 | Deploy to prod after this fix merges (workflow_dispatch + approval per ADR-0020) | operator | 2026-05-22 | |
| 4 | Update .gitignore to explicitly ignore backend_v2/instance/config.py so that a developer's local copy is never accidentally committed and shipped in a slug | developer | 2026-05-28 | |
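Action item 2 amounts to booting the app once, in a throwaway process, with the flags that are about to be flipped. A minimal Python sketch, assuming create_app() reads the flags from os.environ at boot (this RCA does not say how the flags are read); smoke_boot and PENDING_FLAGS are hypothetical names:

```python
import os
import traceback
from typing import Callable, Mapping

# Flag values the upcoming flip will set (the two flags from this incident).
PENDING_FLAGS = {
    "FLAG_WEBAUTHN_REGISTRATION": "1",
    "FLAG_AUTH_WEBAUTHN_LOGIN": "1",
}


def smoke_boot(factory: Callable[[], object], overrides: Mapping[str, str]) -> int:
    """Call the app factory once with `overrides` applied to os.environ.

    Returns 0 if boot succeeds and 1 if the factory raises. The previous
    environment is restored afterwards, so only the current process is affected.
    """
    saved = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        factory()
    except Exception:
        traceback.print_exc()  # surface the validator's message in the one-off dyno log
        return 1
    finally:
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old
    return 0
```

On a one-off dyno this would be called as smoke_boot(create_app, PENDING_FLAGS) with create_app imported from api; a non-zero result means the flip would crash-loop the web dyno. Setting the variables inside the one-off process does not change the Heroku config, so the live web dyno is unaffected.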

References