DET-DATA-002 — audit log hash chain break
Rule ID: DET-DATA-002
Title: KMS HMAC hash chain verification failure across customer_audit_events
Category: data
Last validated: 2026-06-04 (initial catalog, dormant)
State: dormant — activates when the KMS HMAC hash chain ships (gated on SC-A11 + SC-A14 deploy per project_kms_audit_chain_approved)
Telemetry source
- Postgres table customer_audit_events with event_hash + prev_event_hash columns (schema present per backend_v2/tests/test_customer_audit_writer_1483.py:594,689).
- The hash chain itself: each row's event_hash = HMAC(KMS_KEY, prev_event_hash || row_canonical_form). Verification recomputes the chain and compares.
- KMS key: stored per project_kms_audit_chain_approved (~$2/mo approved). Not yet provisioned.
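The chain formula above can be sketched in a few lines. This is an illustration under stated assumptions, not the production writer: it assumes HMAC-SHA256, a sorted-key JSON canonical form, hashes stored as hex text and concatenated as ASCII, an all-zero genesis value for the first row, and a local byte string standing in for the KMS-held key. The names DEMO_KEY, GENESIS, canonical_form and compute_event_hash are hypothetical.

```python
import hashlib
import hmac
import json

# Assumption: a local byte string stands in for the KMS-held key. In a
# KMS-backed deployment the MAC would presumably be computed by the service.
DEMO_KEY = b"demo-key-not-a-real-secret"

# Assumption: the first row links to an all-zero "genesis" hash.
GENESIS = "0" * 64


def canonical_form(row: dict) -> bytes:
    """Serialize a row deterministically (sorted keys, no extra whitespace)."""
    return json.dumps(row, sort_keys=True, separators=(",", ":")).encode("utf-8")


def compute_event_hash(key: bytes, prev_event_hash: str, row: dict) -> str:
    """event_hash = HMAC(key, prev_event_hash || row_canonical_form)."""
    message = prev_event_hash.encode("ascii") + canonical_form(row)
    return hmac.new(key, message, hashlib.sha256).hexdigest()


# Each row's hash depends on the previous row's hash, which forms the chain.
first = compute_event_hash(DEMO_KEY, GENESIS, {"id": 1, "action": "login"})
second = compute_event_hash(DEMO_KEY, first, {"id": 2, "action": "export"})
```

Because each hash covers the previous one, editing any earlier row changes every hash that should follow it, which is what makes a single mismatch meaningful.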
Statistical method + baseline window
- Method: deterministic chain verification, no statistics. Walk rows in created_at order, recompute each event_hash from prev_event_hash + canonical_form, compare to stored value.
- Baseline window: since-last-verification anchor. Default check window: last 24 hours; full-table audit weekly.
- Fire condition: ANY mismatch between computed and stored event_hash. ANY missing prev_event_hash linkage. No tolerance: a single observation is the signal.
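The walk described above might look like the following sketch. It assumes HMAC-SHA256 over the ASCII hex of the previous hash followed by an already-serialized canonical form, and it takes rows pre-sorted in created_at order. AuditRow, expected_hash, first_chain_break, KEY and ANCHOR are hypothetical names, and the local key is a stand-in for the KMS-held key.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional

KEY = b"demo-key"      # assumption: stand-in for the KMS-held key
ANCHOR = "0" * 64      # assumption: hash at the last-verified anchor


@dataclass
class AuditRow:
    row_id: int
    canonical: bytes                 # canonical form, already serialized
    prev_event_hash: Optional[str]
    event_hash: str


def expected_hash(key: bytes, prev_hash: str, canonical: bytes) -> str:
    return hmac.new(key, prev_hash.encode("ascii") + canonical, hashlib.sha256).hexdigest()


def first_chain_break(rows, key, anchor_hash):
    """Return (row_id, reason) for the first break, or None if the chain holds.

    rows must already be sorted in created_at order.
    """
    prev = anchor_hash
    for row in rows:
        if row.prev_event_hash != prev:
            return row.row_id, "prev_event_hash linkage missing or wrong"
        if expected_hash(key, prev, row.canonical) != row.event_hash:
            return row.row_id, "event_hash mismatch"
        prev = row.event_hash
    return None


def build_chain(payloads):
    """Build a valid chain from raw payloads (demo helper)."""
    rows, prev = [], ANCHOR
    for i, payload in enumerate(payloads, start=1):
        h = expected_hash(KEY, prev, payload)
        rows.append(AuditRow(i, payload, prev, h))
        prev = h
    return rows


intact = build_chain([b"a", b"b", b"c"])
tampered = build_chain([b"a", b"b", b"c"])
tampered[1].canonical = b"B"  # row 2 edited after it was written
```

With this data, first_chain_break(intact, KEY, ANCHOR) returns None and the tampered chain reports row 2. Any non-None result corresponds to the fire condition.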
Threshold + expected FP rate
- Threshold: zero. Any chain break = fire.
- Expected FP rate: zero. The only legitimate way to break the chain is a KMS key rotation that wasn't accompanied by a re-anchoring procedure — and that procedure must update an anchor table that this rule reads, so a properly-executed rotation does not trip the rule. If it does trip on a known rotation, the rotation procedure is buggy and the fix is the procedure, not the rule.
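One way to picture why a properly re-anchored rotation does not trip the rule is the sketch below. The anchor table's real shape is not specified here, so everything in it is an assumption: the first_row_id and key_id fields, the ANCHORS and KEYS structures, and the rule that the latest anchor at or before a row selects the key for that row.

```python
import hashlib
import hmac

# Hypothetical anchor table: "from this row id onward, use this key".
ANCHORS = [
    {"first_row_id": 1, "key_id": "k1"},
    {"first_row_id": 3, "key_id": "k2"},  # written by the rotation procedure
]
KEYS = {"k1": b"old-demo-key", "k2": b"new-demo-key"}  # stand-ins for KMS keys
GENESIS = "0" * 64


def key_for_row(row_id, anchors=ANCHORS):
    """Pick the key named by the latest anchor whose first_row_id <= row_id."""
    eligible = [a for a in anchors if a["first_row_id"] <= row_id]
    return KEYS[max(eligible, key=lambda a: a["first_row_id"])["key_id"]]


def mac(key, prev_hash, canonical):
    return hmac.new(key, prev_hash.encode("ascii") + canonical, hashlib.sha256).hexdigest()


def chain_is_intact(rows, genesis, anchors=ANCHORS):
    """rows: (row_id, canonical, stored_hash) tuples in created_at order."""
    prev = genesis
    for row_id, canonical, stored_hash in rows:
        if mac(key_for_row(row_id, anchors), prev, canonical) != stored_hash:
            return False
        prev = stored_hash
    return True


# Write four rows; the key switches at row 3, as the anchors say.
rows, prev = [], GENESIS
for row_id, payload in enumerate([b"a", b"b", b"c", b"d"], start=1):
    h = mac(key_for_row(row_id), prev, payload)
    rows.append((row_id, payload, h))
    prev = h
```

Verifying with the full anchor list passes, while verifying the same rows with only the first anchor (a rotation that skipped re-anchoring) fails at row 3. That failure points at the procedure, not the rule.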
Alert route
- CRITICAL (any fire): #raxx-ops-alert-sev1 per project_oncall_severity_routing. Page operator immediately. This is the ONE rule in the catalog where a single observation is sufficient justification for SEV1.
Escalation owner
- security-agent primary — adversary-shaped by definition.
- operator — informational page parallel to security-agent dispatch.
Test fixture / synthetic positive
See _fixtures/audit_log_hash_chain_break_positive.py for a synthetic 3-row chain where row 2's event_hash does not match the expected HMAC.
What to do when this fires
- Do not restart, redeploy, or write any new audit row until a forensic snapshot is captured. Each new write changes the chain state and complicates forensics.
- Snapshot the table to S3 (or local file) via heroku pg:backups:capture followed by extracting the audit-events table.
- Identify the first broken row (chain integrity holds for rows before the break, fails for rows at-or-after).
- Cross-reference the break point with: deploy events, KMS key activity in CloudTrail, recent migrations, operator session windows.
- If the break corresponds to a known KMS rotation, fix the rotation procedure (an sre-agent task) and re-anchor.
- If the break does NOT correspond to operational activity: this is unambiguously adversarial tampering. Treat as data-breach posture per the (future) incident response plan.
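The cross-referencing step can be sketched as a simple time-window check. The event shape (kind, start, end), the 15-minute margin, and the function name overlapping_activity are all assumptions for illustration. Real inputs would come from deploy logs, CloudTrail, migration history, and operator session records, and break_time would be the created_at of the first broken row.

```python
from datetime import datetime, timedelta, timezone


def overlapping_activity(break_time, events, margin=timedelta(minutes=15)):
    """Return events whose [start, end] window, widened by margin, contains break_time."""
    return [
        e for e in events
        if e["start"] - margin <= break_time <= e["end"] + margin
    ]


# Illustrative data only; none of these events are real.
t0 = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
events = [
    {
        "kind": "deploy",
        "start": t0 - timedelta(hours=3),
        "end": t0 - timedelta(hours=2, minutes=50),
    },
    {
        "kind": "kms_rotation",
        "start": t0 - timedelta(minutes=5),
        "end": t0 + timedelta(minutes=1),
    },
]

matches = overlapping_activity(t0, events)
kinds = [e["kind"] for e in matches]
```

Here only the kms_rotation event overlaps the break, which would point toward the rotation procedure. An empty result is the case the last step above treats as tampering.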
What NOT to do
- Do not attempt to "repair" the chain by recomputing and rewriting. The mismatch IS the evidence; preserving it is the entire point of the hash chain.
- Do not silence the rule under any circumstance, including during operator-driven testing. If a test scenario requires a chain break, test against a separate database, not the production table.
- Do not extend the verification window beyond 24h on hourly runs (full table audit is weekly) — runtime grows linearly with table size; long-window hourly runs eventually exceed the run budget.