Raxx · internal docs


RCA — WebAuthn registration challenge miss on every signup attempt

Incident ID: 2026-05-25-signup-challenge-store-miss
Date: 2026-05-25
Severity: SEV-1
Duration: ~3h active (00:56 UTC first attempt → 03:37 UTC fix deployed and verified)
Blast radius: Operator (single user; no customers yet, pre-launch)
Author: sre-agent

Summary

Every attempt to complete WebAuthn passkey registration via the /register/begin-with-token → /register/verify-with-token flow (Option C bootstrap-link path) failed at the verify step: _pop_challenge(user_id) returned None, producing a 400 "Passkey verification failed" within 1 ms. Root cause: the in-process _challenge_store dict in webauthn_service.py is per-process state. After gunicorn's --preload fork, each process holds its own copy-on-write copy of the dict, so a challenge written by the process that handled begin is not visible to the process that handles verify. Fix: a Redis-backed challenge store on the already-provisioned Heroku Redis instance (REDIS_URL was already set on prod).
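For illustration, a minimal sketch of what a Redis-backed challenge store could look like. This is not the code from PR #2728. The class name, key prefix, and 300-second TTL are assumptions made for the example, and `getdel` requires redis-py 4.0+ and a Redis server at 6.2 or later (on older servers a GET plus DEL in a transaction would be needed instead).

```python
import os
from typing import Optional

CHALLENGE_TTL_SECONDS = 300          # assumed TTL; not taken from the incident
KEY_PREFIX = "webauthn:challenge:"   # hypothetical key namespace


class RedisChallengeStore:
    """Stores one pending WebAuthn challenge per user in Redis.

    `client` is anything exposing redis-py's `set(name, value, ex=...)`
    and `getdel(name)`, so every gunicorn process sees the same data.
    """

    def __init__(self, client, ttl: int = CHALLENGE_TTL_SECONDS):
        self._client = client
        self._ttl = ttl

    def put_challenge(self, user_id: str, challenge: bytes) -> None:
        # `ex` sets an expiry so abandoned registrations clean themselves up.
        self._client.set(KEY_PREFIX + user_id, challenge, ex=self._ttl)

    def pop_challenge(self, user_id: str) -> Optional[bytes]:
        # GETDEL reads and deletes atomically, so a challenge is single-use.
        return self._client.getdel(KEY_PREFIX + user_id)


def store_from_env() -> RedisChallengeStore:
    """Builds a store from REDIS_URL (redis-py is imported lazily)."""
    import redis  # third-party: redis-py

    return RedisChallengeStore(redis.Redis.from_url(os.environ["REDIS_URL"]))
```

Because the store takes the client as a constructor argument, it can be exercised in unit tests with a small in-memory stand-in instead of a live Redis.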

Timeline (all times UTC)

Impact

What went well

What didn't go well

Root cause analysis

Detection

Resolution

Action items

| # | Action | Owner | Due | Issue |
|---|--------|-------|-----|-------|
| 1 | Merge PR #2728, deploy v47, verify signup end-to-end | operator | 2026-05-25 | #2728 |
| 2 | Add webauthn.challenge_miss Sentry alert / Heroku log drain alert | sre-agent | 2026-06-01 | TBD |
| 3 | Add runbook entry: gunicorn --preload + module-level mutable state is unsafe; all shared state must use Redis or Postgres | sre-agent | 2026-05-26 | TBD |
| 4 | Require begin-with-token + verify-with-token to ship in same PR (documented in release checklist) | operator | 2026-05-30 | TBD |
| 5 | Enable root logger INFO output in production so logger.info(...) calls in auth routes appear in Heroku logs | dev | 2026-06-01 | TBD |
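As a starting point for action item 5, a minimal sketch of root logger configuration. Python's root logger defaults to WARNING, which is why `logger.info(...)` calls are dropped unless something raises the level. The function name and log format here are assumptions, and where this gets called (app factory, gunicorn config, etc.) is left open.

```python
import logging
import sys

# Hypothetical format; adjust to whatever the log drain expects.
LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s: %(message)s"


def configure_root_logging(level: int = logging.INFO) -> logging.Logger:
    """Sets the root logger level and makes sure it has a stdout handler."""
    root = logging.getLogger()
    root.setLevel(level)
    if not root.handlers:
        # Heroku captures stdout/stderr, so a stream handler is sufficient.
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        root.addHandler(handler)
    return root


if __name__ == "__main__":
    configure_root_logging()
    logging.getLogger("auth.routes").info("challenge stored for user_id=%s", "user-1")
```

Module loggers created with `logging.getLogger(__name__)` inherit the root level unless they set their own, so one call at startup covers the auth routes.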

References