Raxx · internal docs

internal · gated

SC-A9 Audit Back-Fill Runbook

Issue: #1489 | Flag: FLAG_AUDIT_BACKFILL | Job: backend_v2/jobs/audit_backfill.py

Overview

The SC-A9 back-fill job migrates historical audit rows from the legacy audit streams into customer_audit_events before any legacy table can be dropped. Of the five legacy streams, three are migrated and two are excluded (see Sources and exclusions). This is a one-shot, operator-triggered job, not a recurring cron. It is idempotent and resumable.

Option D semantics: Back-filled rows carry schema_version=1 and form a per-customer HMAC sub-chain segregated from the live (schema_version=2) chain. Live rows are never modified.


Prerequisites

Before enabling FLAG_AUDIT_BACKFILL and running the job:

  1. FLAG_UNIFIED_AUDIT_DUAL_WRITE=1 must be active on the target app (SC-A5, #1485).
  2. ADMIN_SERVICE_TOKEN must be set in Heroku config.
  3. For the Velvet stream: AWS_S3_LOG_DRAIN_BUCKET and AWS_S3_LOG_DRAIN_PREFIX must be set.
  4. For the Console stream: CONSOLE_DATABASE_URL must be set.
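The prerequisite list can be turned into a preflight check. The sketch below is hypothetical: the function name missing_prereqs and the stream-to-variable mapping are illustrative, and it assumes the Heroku config vars (including the dual-write flag) are visible as environment variables wherever it runs.

```python
from __future__ import annotations

from typing import Iterable, Mapping

# Hypothetical mapping of stream -> config vars it needs (from the list above).
STREAM_ENV = {
    "raptor": (),  # no stream-specific config is listed for Raptor
    "console": ("CONSOLE_DATABASE_URL",),
    "velvet": ("AWS_S3_LOG_DRAIN_BUCKET", "AWS_S3_LOG_DRAIN_PREFIX"),
}
ALWAYS_REQUIRED = ("ADMIN_SERVICE_TOKEN",)


def missing_prereqs(env: Mapping[str, str], streams: Iterable[str]) -> list[str]:
    """Return human-readable prerequisite problems; an empty list means ready."""
    problems = []
    # Assumption: the dual-write flag is readable from the same environment.
    if env.get("FLAG_UNIFIED_AUDIT_DUAL_WRITE") != "1":
        problems.append("FLAG_UNIFIED_AUDIT_DUAL_WRITE is not 1")
    needed = list(ALWAYS_REQUIRED)
    for stream in streams:
        needed.extend(STREAM_ENV[stream])  # KeyError on an unknown stream name
    for name in needed:
        if not env.get(name):
            problems.append(f"{name} is not set")
    return problems
```

For example, missing_prereqs(os.environ, ["raptor", "console", "velvet"]) returning an empty list means every listed variable is present; it does not confirm that the values are correct.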

Running the job

curl -X POST \
  "https://api.raxx.app/api/internal/jobs/audit-backfill" \
  -H "X-Admin-Service-Token: $ADMIN_SERVICE_TOKEN"

Optional query parameters:

  Param     Values                                  Default   Effect
  dry_run   true / false                            false     Compute HMACs and check idempotency; no INSERTs
  streams   comma-separated: raptor,console,velvet  all       Limit which streams are processed
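For scripted runs, the same POST can be built in Python. This is a minimal standard-library sketch; build_backfill_request and KNOWN_STREAMS are hypothetical names, and it assumes that omitting streams is equivalent to the documented default of all streams.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://api.raxx.app/api/internal/jobs/audit-backfill"
KNOWN_STREAMS = {"raptor", "console", "velvet"}  # the values listed in the table above


def build_backfill_request(token, dry_run=False, streams=None):
    """Build (but do not send) the POST that triggers the back-fill job."""
    params = {}
    if dry_run:
        params["dry_run"] = "true"
    if streams is not None:
        unknown = set(streams) - KNOWN_STREAMS
        if unknown:
            raise ValueError(f"unknown streams: {sorted(unknown)}")
        params["streams"] = ",".join(streams)
    url = ENDPOINT
    if params:
        # safe="," keeps the comma-separated stream list readable in the URL.
        url += "?" + urllib.parse.urlencode(params, safe=",")
    return urllib.request.Request(
        url,
        method="POST",
        headers={"X-Admin-Service-Token": token},
    )
```

Send it with urllib.request.urlopen(req) once the prerequisites are in place; passing dry_run=True first previews the run without INSERTs.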

Sources and exclusions

  Stream              Included   Source
  Raptor              Yes        audit_log WHERE actor_user_id IS NOT NULL
  Console             Yes        audit_log WHERE target_type IN ('raptor_user', 'customer', 'billing_customer')
  Velvet              Yes        S3 log drain archive (velvet_rotation_audit lines with customer_id)
  Reasonator          No         n/a; not built before dual-write, so no historical rows exist
  Per-blueprint logs  No         n/a; not structured audit events
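Velvet is the only included stream read from log files rather than a table. The filter below is a sketch that assumes each drained line contains the velvet_rotation_audit marker and a key=value customer_id field; the real log line format is not documented here, so treat the regex and the helper name velvet_audit_lines as placeholders.

```python
import re

# Assumed line shape: "... velvet_rotation_audit ... customer_id=<id> ..."
CUSTOMER_ID_RE = re.compile(r"\bcustomer_id=([^\s,]+)")


def velvet_audit_lines(lines):
    """Yield (customer_id, line) for velvet_rotation_audit lines that carry a customer_id."""
    for line in lines:
        if "velvet_rotation_audit" not in line:
            continue  # not an audit line
        match = CUSTOMER_ID_RE.search(line)
        if match:
            yield match.group(1), line  # lines without customer_id are excluded
```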

Idempotency window

Before each INSERT the job checks:

SELECT 1 FROM customer_audit_events
WHERE customer_id = :cid
  AND action      = :action
  AND at_utc      >= :at - INTERVAL '1 second'
  AND at_utc      <= :at + INTERVAL '1 second'

If a matching row exists, the source row is skipped. Re-running the job produces zero duplicate rows.
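The SQL check is an inclusive ±1 second window around the source row's timestamp, matched on customer_id and action. An in-memory equivalent, useful for reasoning about which rows would collide, might look like this; is_duplicate is a hypothetical helper, not the job's code.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=1)


def is_duplicate(existing, customer_id, action, at):
    """True if some row has the same customer and action and an at_utc
    within [at - 1s, at + 1s], mirroring the SQL above."""
    return any(
        row["customer_id"] == customer_id
        and row["action"] == action
        and at - WINDOW <= row["at_utc"] <= at + WINDOW
        for row in existing
    )
```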

Known limitation (SC-A9 Finding 2)

The 1-second deduplication window can silently drop legitimately distinct high-frequency legacy events, such as session.refresh and session.issue, when two or more events from the same customer share the same action and occur within 1 second of each other.

This is accepted for pre-launch historical data. The events lost are session-token churn — low forensic value relative to the complexity of a narrower per-row-hash deduplication scheme.

Post-launch mitigation options (not in scope for the initial back-fill):

  - Switch to a per-row hash check (hash of legacy row PK + stream name) stored in a separate audit_backfill_seen table.
  - Narrow the window to 0 seconds with a compound unique index.
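The per-row hash option could look roughly like this. It is a sketch only: the audit_backfill_seen schema (a single row_hash primary key), the ":" separator, and SHA-256 are assumptions, and SQLite stands in for Postgres so the example runs on its own (in Postgres the equivalent of INSERT OR IGNORE is INSERT ... ON CONFLICT DO NOTHING).

```python
import hashlib
import sqlite3


def row_key(stream, legacy_pk):
    """Hash of stream name + legacy primary key (hypothetical key format)."""
    return hashlib.sha256(f"{stream}:{legacy_pk}".encode()).hexdigest()


def mark_seen(conn, stream, legacy_pk):
    """Record the legacy row; return True if new, False if already back-filled."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO audit_backfill_seen (row_hash) VALUES (?)",
        (row_key(stream, legacy_pk),),
    )
    return cur.rowcount == 1  # 0 when the primary key already existed


# Stand-in table for the example (schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_backfill_seen (row_hash TEXT PRIMARY KEY)")
```

Because the key comes from the legacy primary key rather than a timestamp, two session.refresh rows a few milliseconds apart hash differently and are both kept.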

If precise session-event cardinality matters in a post-launch back-fill, use ?dry_run=true to count the rows that would be skipped, then decide whether to accept the loss or implement a finer deduplication key before running.
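To estimate that loss, the source rows can also be replayed against the window locally. dry_run_skips is a hypothetical helper; it assumes rows are processed in the given order and that each kept row becomes visible to later checks, as it would after a real INSERT.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=1)


def dry_run_skips(source_rows, existing=()):
    """Replay source rows against the +/-1 s window without writing anything;
    return the rows a real run would skip."""
    kept = list(existing)
    skipped = []
    for row in source_rows:
        dup = any(
            k["customer_id"] == row["customer_id"]
            and k["action"] == row["action"]
            and abs(k["at_utc"] - row["at_utc"]) <= WINDOW
            for k in kept
        )
        # A kept row is visible to later checks, as after a real INSERT.
        (skipped if dup else kept).append(row)
    return skipped
```

This runbook does not say whether the job's own dry_run tracks would-be inserts from earlier in the same run. If it only queries committed rows, it may under-count collisions within a single batch, which a replay like this one includes.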


HMAC chain (option D)

Back-filled rows form a per-customer sub-chain.

The SC-A11 monthly full-chain checker recognises the expected schema_version 1→2 boundary, where prev_event_hash = NULL, and does not alert on it. See _verify_single_customer_chain in backend_v2/jobs/audit_integrity_check.py.
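The sub-chain and the boundary rule can be illustrated with a toy chain. The hash construction here is entirely an assumption (HMAC-SHA256 over the previous hash and a payload string, joined by "|"); the real construction lives in the job and in _verify_single_customer_chain. The sketch also assumes all schema_version=1 rows for a customer sort before the first schema_version=2 row.

```python
import hashlib
import hmac


def event_hash(key, prev_hash, payload):
    """Assumed construction: HMAC-SHA256(key, prev_hash | payload); empty prev if NULL."""
    msg = (prev_hash or "").encode() + b"|" + payload.encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def build_chain(key, payloads, schema_version):
    """Chain payloads into rows; the first row has prev_event_hash = None (NULL)."""
    rows, prev = [], None
    for payload in payloads:
        h = event_hash(key, prev, payload)
        rows.append({"schema_version": schema_version, "payload": payload,
                     "prev_event_hash": prev, "event_hash": h})
        prev = h
    return rows


def verify_customer_chain(key, rows):
    """Walk one customer's rows in order. A NULL prev_event_hash is accepted
    only on the very first row and on the first v2 row after v1 rows."""
    prev_row = None
    for row in rows:
        starts_segment = prev_row is None or (
            prev_row["schema_version"] == 1 and row["schema_version"] == 2)
        expected_prev = None if starts_segment else prev_row["event_hash"]
        if row["prev_event_hash"] != expected_prev:
            return False  # broken link (or an unexpected NULL mid-chain)
        if row["event_hash"] != event_hash(key, row["prev_event_hash"], row["payload"]):
            return False  # payload or hash was altered
        prev_row = row
    return True
```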


Completion

On completion, a row is written to audit_archival_runs with run_type = 'back_fill'. Verify with:

SELECT * FROM audit_archival_runs
WHERE run_type = 'back_fill'
ORDER BY started_at DESC
LIMIT 5;
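The same verification query can be run from a script. This sketch uses a plain DB-API cursor so it works with sqlite3 (shown here with a hypothetical two-column stand-in table) or with a Postgres driver such as psycopg by passing placeholder="%s"; recent_backfill_runs is a hypothetical helper name.

```python
import sqlite3


def recent_backfill_runs(conn, limit=5, placeholder="?"):
    """Return the newest back_fill rows from audit_archival_runs.

    placeholder is the driver's paramstyle marker ("?" for sqlite3,
    "%s" for psycopg); only the marker is formatted in, never user data.
    """
    sql = (
        "SELECT * FROM audit_archival_runs "
        "WHERE run_type = 'back_fill' "
        f"ORDER BY started_at DESC LIMIT {placeholder}"
    )
    cur = conn.cursor()
    try:
        cur.execute(sql, (limit,))
        return cur.fetchall()
    finally:
        cur.close()


# Stand-in table for the example; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_archival_runs (run_type TEXT, started_at TEXT)")
conn.executemany(
    "INSERT INTO audit_archival_runs VALUES (?, ?)",
    [("back_fill", "2024-01-01T00:00:00"),
     ("archive", "2024-01-02T00:00:00"),
     ("back_fill", "2024-01-03T00:00:00")],
)
```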

Refs: #1489 (SC-A9), #1485 (SC-A5 dual-write), #1487 (SC-A7 Velvet dual-write), #1491 (SC-A11 integrity checker), #1465 (audit v2 design).