Unified Customer Audit + Shadow-Table Architecture — v2
Status: Design v2 — operator + security feedback integrated; sub-cards pending filing
Owner: software-architect
Date: 2026-05-09 UTC
Supersedes: v1 (PR #1451)
Milestone: #6 — raxx.app v1 — first non-operator user
Related ADRs: 0058, 0059, 0060, 0061, 0062, 0063, 0064
Related designs: workflow-uuid-tracing.md, rbac-design.md, velvet/v2-rotation-flows.md, reasonator/design.md, docs/architecture/rbac-v2/design.md (parallel — see §RBAC V2 Dependency)
Security input: docs/security/customer-audit-unified-threat-model.md — all 18 threats, 4 blocking items integrated
Security issues: #1454 (HMAC chain), #1455 (role separation), #1456 (notification SLA), #1457 (RLS)
1. Context
Five audit streams exist today, each siloed with no common schema or join key:
| Stream | Table / mechanism | Writer |
|---|---|---|
| Customer-side viewer (#1050) | Raptor audit_log |
backend_v2 blueprints |
| Console operator audit | Console audit_log (write_audit service) |
Console blueprints, audit_ingest.py |
| Reasonator audit | Designed but unbuilt; will write to its own table | Reasonator service |
| Velvet rotation audit | velvet.audit Python logger (stdout → log drain) |
velvet/auth/audit.py |
| Per-blueprint logs | Python logger.* (no structured table) |
All surfaces |
This design converges all five into a single customer_audit_events table in Raptor's Postgres, introduces shadow tables for every customer-touching primary table, and defines the RBAC matrix, hash chain (HMAC-authenticated), notification paths, capacity tiers, and SOC-2 readiness posture.
v2 vs v1 changes in brief: HMAC-SHA-256 + AWS KMS replaces SHA-256 chain; Phase 0 role separation added; Postgres RLS added; ticket-state-aware two-path notification model; explicit per-action allowlist + deny-list for JSONB state diffs; numeric capacity thresholds with trigger paths; SOC-2 readiness scope added; full legacy back-fill (option D) replaces archive-then-drop; sub-card slate expanded from 13 to 19.
2. Invariants
The following are non-negotiable and take precedence over any design convenience.
| # | Invariant | Source |
|---|---|---|
| I-1 | No stored credentials. The audit schema never stores passkey IDs, OAuth tokens, broker API keys, session token values, or any replay-capable value. | ADR-0002 |
| I-2 | Append-only at the DDL level. REVOKE UPDATE, DELETE on customer_audit_events and all *_history shadow tables from all application roles. Only audit_archiver can DELETE, and only for retention expiry. |
ADR-0022 |
| I-3 | GDPR by default. Every PII field in audit rows is subject to retention limits, DSR erasure (pseudonymization within 30 days), portability, and DPA-ready logging. | ADR-0003 |
| I-4 | Audit trail for every state change that affects money, permissions, or data access — including operator reads of customer data. | System invariant |
| I-5 | Passkeys / WebAuthn only for authentication. RBAC governs authorization. This design does not alter the auth layer. | ADR-0001 |
| I-6 | RBAC governs access to audit dimensions. Feature flags gate UI surfaces; RBAC role grants gate data access. | ADR-0060 |
| I-7 | Operator audit access (Dimension 3) requires ticket linkage for support-role reads. Post-resolution reads (no ticket or closed ticket) are a security incident event, not a standard access. | This design §notification-paths |
| I-8 | Operator reads of customer data notify the customer. The notification path (welcoming vs. security-incident) is determined by the ticket state at the moment of the read. | Operator decision 2026-05-09; ADR-0061 |
| I-9 | Insider tamper resistance via HMAC-SHA-256 + AWS KMS. Application never holds the key material. An actor with DB write access cannot forge a valid HMAC without KMS access. | ADR-0058; #1454 |
| I-10 | All timestamps UTC. | feedback_utc_times.md |
| I-11 | The application runtime (raptor_app) is not the Heroku owner role. DATABASE_URL is the owner credential used only for migrations. Raptor connects as raptor_app, which has no DDL, no UPDATE/DELETE on audit tables. |
#1455; ADR-0058 §Phase 0 |
| I-12 | Customer notification has no opt-out. The notification is a privacy right (GDPR Art. 13/14 transparency obligation), not a preference. | Security-agent recommendation; operator confirmed |
3. Data Model
3.1 customer_audit_events (primary unified table)
CREATE TABLE customer_audit_events (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(), -- v4 (random)
-- Dimension: which actor class produced this event
dimension TEXT NOT NULL
CHECK (dimension IN ('customer_self','system_automated','operator_interaction')),
-- Customer this event concerns
customer_id INTEGER NOT NULL REFERENCES users(id),
-- Who acted
actor_id TEXT NOT NULL,
actor_type TEXT NOT NULL
CHECK (actor_type IN ('customer','system_actor','operator_email')),
-- What happened (namespaced: domain.noun.verb, registered in allowlist)
action TEXT NOT NULL,
-- What was touched (deny-list + per-action allowlist enforced at write time)
target_resource JSONB,
before_state JSONB,
after_state JSONB,
-- Timing
at_utc TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- Ticket linkage and ticket state (required for operator_interaction writes)
ticket_id TEXT,
ticket_state_at_read TEXT
CHECK (ticket_state_at_read IS NULL
OR ticket_state_at_read IN
('open','in_progress','pending','resolved','closed','none')),
-- Workflow correlation — UUID v4 (random, not time-ordered; see [ADR-0062](https://internal-docs.raxx.app/architecture/adr/0062-deny-list-plus-per-action-allowlist-state-diff-pii.html) §UUID)
replay_uuid TEXT,
-- Tamper evidence — HMAC-SHA-256 with AWS KMS key (not plain SHA-256)
event_hash TEXT NOT NULL, -- KMS HMAC-SHA-256 of canonical JSON of content fields
prev_event_hash TEXT, -- event_hash of previous event in per-customer chain
-- Schema evolution
schema_version SMALLINT NOT NULL DEFAULT 2
);
Indexes:
CREATE INDEX cae_customer_at ON customer_audit_events (customer_id, at_utc DESC);
CREATE INDEX cae_dimension ON customer_audit_events (customer_id, dimension, at_utc DESC);
CREATE INDEX cae_replay_uuid ON customer_audit_events (replay_uuid) WHERE replay_uuid IS NOT NULL;
CREATE INDEX cae_ticket ON customer_audit_events (ticket_id) WHERE ticket_id IS NOT NULL;
CREATE INDEX cae_action ON customer_audit_events (action, at_utc DESC);
CREATE INDEX cae_ticket_state ON customer_audit_events (ticket_state_at_read, at_utc DESC)
WHERE dimension = 'operator_interaction';
Role grants (run as migrations owner, after Phase 0 role creation):
-- Phase 0: raptor_app is not the owner; it gets restricted rights only
REVOKE ALL ON customer_audit_events FROM raptor_app;
GRANT INSERT, SELECT ON customer_audit_events TO raptor_app;
-- audit_archiver: DELETE for retention only (no UPDATE)
REVOKE ALL ON customer_audit_events FROM audit_archiver;
GRANT SELECT, DELETE ON customer_audit_events TO audit_archiver;
-- raptor_migrations: DDL only, no INSERT on audit tables (prevents forged historical rows)
REVOKE INSERT ON customer_audit_events FROM raptor_migrations;
ticket_state_at_read column: populated synchronously at write time for operator_interaction events only. The writer service queries the FreeScout ticket-state cache (§ticket-state-tracking) before inserting. The value determines which notification path fires (§notification-paths).
Fields deliberately absent: email addresses in plain text, broker API keys or token prefixes, passkey credential IDs, full IP addresses (prefix /24 only in target_resource if relevant), raw session token values.
3.2 Shadow Tables
Same shape as v1. Each *_history table captures the full row on every INSERT, UPDATE, DELETE via Postgres triggers. See v1 §3.2 for schema; the only change is the role grants:
REVOKE ALL ON <table>_history FROM raptor_app;
GRANT INSERT, SELECT ON <table>_history TO raptor_app;
GRANT SELECT, UPDATE ON <table>_history TO audit_archiver; -- UPDATE for DSR pseudonymization
The audit_archiver role gets UPDATE on shadow tables (not on customer_audit_events) because DSR pseudonymization must overwrite row_data JSONB fields in history rows. This is a controlled exception to the append-only invariant, scoped to the audit_archiver role and audited by pg_audit.
Shadow tables required:
| Primary table | Shadow table |
|---|---|
users |
users_history |
subscriptions |
subscriptions_history |
positions |
positions_history |
trades |
trades_history |
sessions |
sessions_history |
passkeys |
passkeys_history |
4. Deny-List + Per-Action Allowlist for State Diffs
Every before_state and after_state JSONB value written to customer_audit_events passes through two gates at the writer service layer (see ADR-0062):
Gate 1 — Global Deny-List (always applied, cannot be overridden)
Any key matching this list is replaced with "<REDACTED>" regardless of the registered action allowlist:
AUDIT_GLOBAL_DENY_LIST = {
"email", "password", "password_hash", "token", "secret", "api_key",
"api_secret", "credential", "passkey", "passkey_id", "webauthn_credential_id",
"seed", "otp", "mfa_secret", "totp_secret", "nonce", "private_key",
"bank_account", "bank_routing", "account_number", "ssn", "tax_id",
"dob", "date_of_birth", "card_number", "cvv", "event_hash", "prev_event_hash",
}
If any denied key is found in the payload, the writer logs a Sentry WARNING with the key name (not value) before stripping it. Repeated occurrences from the same code path escalate to ERROR to catch registration drift.
Gate 2 — Per-Action Allowlist (required for every registered action namespace)
Each action namespace registers an explicit allowlist of fields permitted in before_state / after_state. If an action namespace has no registered allowlist, the write is rejected with 422.
# Example registrations (non-exhaustive)
AUDIT_ACTION_ALLOWLISTS = {
"trade.submit": {"symbol", "quantity", "side", "order_type", "limit_price", "status"},
"trade.cancel": {"trade_id", "symbol", "reason", "status"},
"system.paper_gate.pass": {"cycles_profitable", "threshold", "result"},
"system.paper_gate.fail": {"cycles_profitable", "threshold", "result"},
"customer.data.read.in_ticket": {"ticket_id", "ticket_state", "data_scope"},
"customer.data.read.post_resolution": {"customer_id_hash", "severity"},
"passkey.add": {"credential_display_name", "aaguid"},
"passkey.remove": {"credential_display_name"},
"session.revoke": {"session_id", "reason"},
"operator.rbac.grant": {"role", "group", "actor_hash"},
"operator.rbac.revoke": {"role", "group", "actor_hash"},
# ... registered per surface, validated in CI
}
CI enforcement: a lint job verifies that every action string used in application code (grepped by audit_action= or "action": in audit writer calls) has a corresponding entry in AUDIT_ACTION_ALLOWLISTS. A missing registration fails the pipeline. See SC-A17.
Any field in before_state/after_state that is NOT on the allowlist (and not on the deny-list) is also replaced with "<REDACTED>". This ensures new fields added to primary tables do not silently flow into audit rows.
5. Tamper Evidence — HMAC-SHA-256 + AWS KMS
SHA-256 chain alone is insufficient: an actor with DB write access can delete rows N–M, insert forged rows, and recompute a valid SHA-256 chain. The integrity checker cannot distinguish the forged chain from the original. This is T-PEN-3 (composite score 9) from the threat model.
v2 mechanism: HMAC-SHA-256 with AWS KMS (#1454):
event_hash[n] = KMS.GenerateMac(canonical_json(content_fields(row[n])), KeyId=AUDIT_KMS_KEY_ARN)
prev_event_hash[n] = event_hash[n-1] (for genesis: KMS.GenerateMac("genesis:" + customer_id))
content_fields: id, dimension, customer_id, actor_id, actor_type, action, target_resource, before_state, after_state, at_utc, ticket_id, ticket_state_at_read, replay_uuid, schema_version. Excludes event_hash, prev_event_hash.
Canonical JSON: keys sorted alphabetically, values as UTF-8 strings, no whitespace.
KMS configuration:
- Key type: HMAC_SHA_256 symmetric key in AWS KMS
- Key ARN stored in Infisical at /raxx/prod/audit/AUDIT_KMS_KEY_ARN
- IAM policy: Raptor's execution role has kms:GenerateMac + kms:VerifyMac only. No kms:Decrypt, no kms:CreateKey.
- Cost: ~$1/month key + ~$1.07/month API calls at 1K customers. Approved as v1 dependency.
- Key rotation: quarterly, via KMS automatic rotation. Old key versions are retained for historical verification.
Verification: nightly integrity checker re-derives HMAC via kms:VerifyMac for all customers with events in the last 24h. First of each calendar month: full-chain verification from genesis. Any failure triggers a severity:critical breach-notification pipeline event.
GDPR pseudonymization + chain: pseudonymization anchor event recorded at break points per v1 design.
6. Postgres Row Level Security (RLS)
Per #1457 and T-PEN-8 (cross-customer audit leak):
-- Enable RLS on all audit tables
ALTER TABLE customer_audit_events ENABLE ROW LEVEL SECURITY;
ALTER TABLE customer_audit_events FORCE ROW LEVEL SECURITY;
-- Policy for raptor_app: can only see rows for the current customer
CREATE POLICY cae_customer_isolation ON customer_audit_events
FOR ALL TO raptor_app
USING (customer_id = current_setting('app.current_customer_id', true)::int);
-- Policy for audit_archiver: unrestricted (runs full-table scans for archival)
CREATE POLICY cae_archiver_all ON customer_audit_events
FOR ALL TO audit_archiver
USING (true);
-- Policy for raptor_audit_compliance role: unrestricted read (SOC-2 auditor)
CREATE POLICY cae_compliance_read ON customer_audit_events
FOR SELECT TO raptor_audit_compliance
USING (true);
Raptor sets SET LOCAL app.current_customer_id = <id> at the start of every customer-scoped request transaction. Admin-scoped requests (support, admin roles) set app.current_customer_id to the explicitly requested customer ID; the RLS policy still applies, and RBAC above it enforces whether that customer ID is permitted.
Defense-in-depth: the application also asserts post-query that all returned rows have customer_id == requested_customer_id. If this assertion fires (which RLS should prevent), a CRITICAL Sentry event is raised and the response returns 500.
Overhead: ~5–15% query overhead on filtered scans. Acceptable at all scale tiers.
7. Notification Paths — Ticket-State-Aware Two-Path Model
This is a fundamental redesign from v1's single nightly-batch notification. See ADR-0061.
The discriminator is ticket_state_at_read, a column on customer_audit_events populated synchronously at write time by querying the FreeScout ticket-state cache.
Path A — In-ticket read (welcoming)
Condition: ticket_state_at_read IN ('open', 'in_progress', 'pending')
Action:
1. Audit row written with action = 'customer.data.read.in_ticket'.
2. Postmark transactional email dispatched within 5 minutes via event-driven job (not nightly batch).
3. Tone: welcoming trust signal. Template: "Heads up — our support team accessed your account data while working your ticket [TICKET-ID]. This is a transparency notice — you're in good hands."
4. No Sentry alert. This is expected behavior.
Rationale: the customer opened the ticket. This read is part of the service they requested. Frame it as "we're working for you, here's the receipt."
Path B — Post-resolution / no-ticket read (security incident)
Condition: ticket_state_at_read IN ('resolved', 'closed', 'none') — or ticket_id IS NULL for admin reads
Action:
1. Audit row written with action = 'customer.data.read.post_resolution', severity = 'incident'.
2. Sentry CRITICAL alert fired immediately with: operator identity (email hash), customer ID, timestamp, ticket_id (if any).
3. Customer notified by email within 5 minutes, but with security incident framing (not welcoming). Template: "We detected that your account data was accessed outside of an active support session. Please review your account and contact ops@raxx.app if you have questions."
4. Architecture supports future manager-routing + HR-packet hooks; v1 emits the audit row + Sentry alert only.
Rationale: post-resolution access has no legitimate purpose. It is a privacy violation regardless of intent.
Ticket-state tracking dependency
The writer service must know the FreeScout ticket state at the moment of the read. Two options were considered:
- (a) Call FreeScout API synchronously at audit-write time — adds FreeScout as a synchronous dependency on every operator read. FreeScout downtime blocks audit writes.
- (b) Cache ticket state in Raptor's DB, refreshed by FreeScout webhook — decouples FreeScout availability from audit write path. Cache staleness risk: up to the webhook delay (typically seconds; worst case FreeScout webhook failure = cache becomes stale).
Decision: option (b), webhook-driven cache. See §ticket-state-tracking.
§ticket-state-tracking
Raptor maintains a freescout_ticket_cache table:
CREATE TABLE freescout_ticket_cache (
ticket_id TEXT PRIMARY KEY,
customer_id INTEGER REFERENCES users(id),
status TEXT NOT NULL, -- open | in_progress | pending | resolved | closed
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
ttl_expires TIMESTAMPTZ NOT NULL -- cache expires 24h after last update
);
FreeScout sends webhook events to POST /api/internal/freescout-webhook on every conversation state change. Raptor upserts the cache row. The writer service reads from this cache (single indexed lookup) at audit-write time.
Fail-safe: if the cache row is missing or expired, treat as ticket_state_at_read = 'none' (Path B, security incident). Fail-closed.
This means: if FreeScout is down during a support read, the notification goes to Path B (security incident), not Path A. This is intentional: a known cost of the fail-closed design. The writer surface (Console) is responsible for refreshing ticket state before performing a read.
8. RBAC Matrix
See ADR-0060 (updated). RBAC V2 is a hard dependency for Phase 3 reader cutover. A parallel architect agent is designing RBAC V2 at docs/architecture/rbac-v2/. SC-A8 (reader endpoint) is marked needs:RBAC-V2.
| Actor | Dim 1 (customer_self) | Dim 2 (system_automated) | Dim 3 (operator_interaction) | Notification path |
|---|---|---|---|---|
| Customer (self) | Own events | Own system events | Own dim-3 events when ticket active | N/A |
| Support agent | Scoped to ticket customer | Scoped to ticket customer | Own dim-3 actions on this customer | Path A/B per ticket state |
| Superadmin | All customers | All customers | All customers, no ticket required | Path B always (no active ticket) |
| Auditor (compliance) | All customers | All customers | All customers | No notification (SOC-2 auditor reads are aggregate, not per-customer) |
New RBAC roles (to be implemented in RBAC V2):
| Role | Composes into group | Notes |
|---|---|---|
antlers-audit-self |
All antlers-user group members |
|
raptor-audit-support |
raxx-support-team |
ticket required for dim-3 |
raptor-audit-admin |
raxx-platform-admins |
always triggers Path B notification |
raptor-audit-compliance |
raxx-auditor-team (new group) |
SOC-2 auditor; no customer notification |
9. Capacity Planning
ADR-0063 documents the detailed tier decision. Summary table here.
Row volume model:
| Source | Events/customer/month |
|---|---|
| Customer self actions (trades, settings, logins) | ~200 |
| System automated (scheduled IC, paper-gate, session expiry) | ~150 |
| Operator interactions | ~5 |
| Total | ~355 |
Capacity table (key metrics by scale tier):
| Metric | v1 (100 customers) | 1K customers | 10K customers |
|---|---|---|---|
| Rows/month | 35,500 | 355,000 | 3,550,000 |
| Sustained writes/sec (avg) | 0.01 | 0.14 | 1.37 |
| Burst writes/sec (market open) | 0.5 | 5 | 50 |
| Trigger overhead per trade INSERT | ~0.5 ms | ~0.5 ms | ~1–2 ms |
| p99 INSERT latency (trades, estimated) | <5 ms | <5 ms | <15 ms |
| p99 INSERT latency budget (alert threshold) | 50 ms | 50 ms | 50 ms |
| Partitioning strategy | Indexes only | Declare hypertable, weekly chunks | Hypertable active + compression |
| Table rows at 1 year | ~425K | ~4.26M | ~42.6M |
| Partition threshold (declare hypertable) | 1M rows | — | — |
| Async promotion threshold | p99 > 50ms sustained 7 days | — | — |
| WAL CDC promotion threshold | 100M rows OR 10K customers | — | — |
Trigger-promotion path (synchronous → async):
- v1: synchronous Postgres triggers on
trades,positions,users,subscriptions,sessions,passkeys. Low overhead at v1 volume. - Metric to watch from Day 1: p99 INSERT latency on
tradesandpositionstables, via Sentry custom metricaudit.trigger.insert_p99_ms. Weekly alert if trending up. - Numeric threshold: when p99 INSERT latency > 50ms sustained for 7 days → convert shadow writes to async via
pg_notify+ background worker. The refactor path is documented inmigration-plan.md §trigger-to-async. Estimated 1–2 dev-days to execute. - WAL CDC promotion: at 100M rows OR 10K customers (first hit). Replace triggers with Postgres logical replication consumer. This removes the synchronous write overhead entirely.
Storage estimates (uncompressed):
| Scale tier | customer_audit_events |
Shadow tables (all 6) | Total hot storage |
|---|---|---|---|
| 100 customers × 1 yr | ~851 MB | ~510 MB | ~1.4 GB |
| 1K customers × 1 yr | ~8.5 GB | ~5.1 GB | ~13.6 GB |
| 10K customers × 1 yr | ~85 GB | ~51 GB | ~136 GB |
Timescale compression on chunks >90 days old: 8–12x on repeated customer_id + dimension + action patterns. Effective hot storage at 1K customers: ~1.1–1.7 GB.
10. SOC-2 Readiness
Operator decision 2026-05-09: SOC-2 is in v1 scope. The following land in this design's implementation slate:
| SOC-2 element | Design mechanism | Sub-card |
|---|---|---|
raptor-audit-compliance role |
RBAC V2 role, read-only, no customer notification | SC-A18 (RBAC V2 dependency) |
| 7-year cold storage SLA | ADR-0058 updated; S3 Glacier Instant Retrieval | SC-A13 |
| Tamper evidence (HMAC + KMS) | §5 above | SC-A11 |
| Quarterly attestation runbook | docs/ops/runbooks/audit-soc2-attestation.md |
SC-A16 |
| Auditor reads logged, no customer notification | RBAC policy (compliance role exempted from Path A/B notification) | SC-A18 |
| Departing employee deprovisioning runbook | docs/ops/runbooks/audit-deprovisioning.md |
SC-A15 |
| pg_audit extension | Independent Postgres-level DML log to external sink | SC-A14 |
Retention SLA (ADR-0058 updated): 7 years cold storage for trade-affecting events, to align with SEC 17a-4 posture pending BLR confirmation. Non-financial events: 2 years cold. Hot window: 30 days for all classes.
11. Phase 0 Role Separation
Status: complete (2026-05-19 UTC). ADR-0099 locked. SC-1 through SC-3 shipped. SC-4 (GRANT/REVOKE migration + REVOKE verification test, #2524) is in progress; SC-5 (this runbook, #2525) filed 2026-05-20.
See ADR-0099 (docs/architecture/adr/0099-raptor-app-least-privilege-role.md), ADR-0058 §Phase 0, and #1455. This is a new phase inserted before Phase 1.
Problem: Heroku's DATABASE_URL connects as the provisioned owner, which has superuser-equivalent access. Every REVOKE in the design is a paper control unless the application uses a restricted raptor_app role at runtime.
What Phase 0 creates:
- raptor_app Postgres role (LOGIN, no superuser, no CREATEDB, no CREATEROLE). Provisioned via heroku pg:credentials:create (not raw SQL — RDS restriction per ADR-0099 D1).
- raptor_app connection string (RAPTOR_APP_DATABASE_URL) stored in Infisical at /MooseQuest/raxx/<env>/RAPTOR_APP_DATABASE_URL (both prod and staging).
- Raptor's application code reads RAPTOR_APP_DATABASE_URL at startup (not DATABASE_URL). Controlled by FLAG_RAPTOR_APP_ROLE_SEPARATION. Fail-fast if flag is on and RAPTOR_APP_DATABASE_URL is absent in Postgres mode (ADR-0099 D4).
- DATABASE_URL is used only by Alembic migrations (owner-level access needed for DDL).
- raptor_migrations role: DDL permissions, no INSERT on audit tables.
Operational SOP: docs/ops/runbooks/raptor-db-credentials.md
Phase 0 is complete when SC-4 (#2524) is deployed to staging and verified. Phase 1 schema creation is blocked on Phase 0 completion.
12. Rollout Plan
| Phase | Gate | Flag | Description |
|---|---|---|---|
| Phase 0 | — | None | Role separation: raptor_app role created, Raptor switches to restricted connection string. Blocking prerequisite for Phase 1. |
| Phase 1 | Phase 0 complete | None | Schema dark: tables, triggers, shadow tables, roles, RLS, pg_audit extension. No writers, no readers. |
| Phase 2 | Phase 1 complete | FLAG_UNIFIED_AUDIT_DUAL_WRITE |
Dual-write: all 5 legacy writers also write to unified table. Legacy = source of truth for reads. Back-fill historical rows from legacy sources. |
| Phase 3 | RBAC V2 live + all 6 prerequisites green | FLAG_UNIFIED_AUDIT_READ |
Reader cutover: customer viewer + console operator viewer read from unified table. |
| Phase 4 | Phase 3 soak 90 days + operator approval | None (one-shot) | Legacy table drop: all 5 legacy streams back-filled; legacy tables dropped. |
Phase 3 prerequisites (all must be green before FLAG_UNIFIED_AUDIT_READ is enabled):
1. Phase 0 role separation (#1455) — complete
2. Postgres RLS (#1457) — complete
3. RBAC V2 — live in production
4. HMAC chain (#1454) — complete
5. UUID v4 for replay_uuid — complete
6. Mandatory date-range guard — complete
7. Per-customer rate limit — complete
FLAG_UNIFIED_AUDIT_READ must have a console_flag_promotions row per B1 enforcement. FLAG_UNIFIED_AUDIT_DUAL_WRITE likewise.
Reader safety: every reader endpoint stays behind FLAG_UNIFIED_AUDIT_READ until all 7 prerequisites are simultaneously green. There is no partial enablement path.
13. Security Considerations
GDPR + Infra Checklist:
- PII collected:
customer_id(PII-adjacent integer),actor_id(customer: user ID; operator: email hash truncated to 16 hex; system: subsystem name).target_resourceand state diff JSONB — protected by deny-list + action allowlist. - Retention: 30-day hot → 7-year cold (trade-affecting) / 2-year cold (non-financial). Pending BLR on SEC 17a-4 WORM requirement.
- DSR erasure: pseudonymize
customer_id+actor_id(customer type) within 30 days. Shadow tablerow_dataJSONB fields also pseudonymized viaaudit_archiverUPDATE. Hash chain records pseudonymization anchor. - Audit of the audit: not self-referential. Hash chain integrity failures write to
audit_integrity_log. - Stored credentials: none. CI grep from ADR-0002 extended to all audit migration files.
- Breach notification: if
customer_audit_eventsis exfiltrated, GDPR Art. 33 notification within 72 hours.customer_id+ action sequence = personal data. - Secret storage:
AUDIT_KMS_KEY_ARNin Infisical.AUDIT_INGEST_TOKENin AWS SSM.audit_archiverSSM credential rotatable without redeploy. All AWS-resident workload secrets in SSM perfeedback_aws_workloads_use_ssm_not_vault.md. - Kill-switch:
FLAG_UNIFIED_AUDIT_DUAL_WRITE=0stops new unified writes. Legacy tables continue.FLAG_UNIFIED_AUDIT_READ=0reverts readers to legacy. - No opt-out from notifications: I-12. Customer notification is a right, not a preference.
Security findings integrated from threat model:
| Threat | Countermeasure | Phase |
|---|---|---|
| T-PEN-3 (chain forgery) | HMAC-SHA-256 + KMS | Phase 1 (before writes) |
| T-PEN-2 (owner-role bypass) | Phase 0 role separation | Phase 0 |
| T-PEN-8 (cross-customer leak) | Postgres RLS | Phase 1 |
| T-PEN-5 (UUID enumeration) | UUID v4 for replay_uuid | Phase 1 |
| T-PEN-7 (write DoS) | Per-customer 100 writes/min rate limit | Phase 2 (writer) |
| T-SCALE-1 (volume DoS on reads) | Mandatory date-range guard (max 90 days) | Phase 3 (reader) |
| T-INS-1 (curiosity browsing) | Ticket-state-aware Path A/B notification | Phase 2 |
| T-INS-3 (row injection) | HMAC + action allowlist CI gate | Phase 1 |
| T-INS-7 (departing employee) | Deprovisioning runbook SC-A15 | SC-A15 |
| T-PEN-6 (TOCTOU ticket window) | Webhook cache + fail-closed on miss | Phase 2 |
| T-INS-6 (agent bot over-read) | RLS per service account | Phase 1 |
| T-INS-4 (bulk exfiltration) | pg_audit → external sink | Phase 1 |
| T-PEN-4 (PII in shadow tables) | audit_archiver pseudonymization on DSR | Phase 3 |
14. Open Questions (require decision before Phase 3)
-
BLR: SEC 17a-4 WORM requirement. Does Raxx's trade-audit cold storage need S3 Object Lock (WORM) to satisfy SEC Rule 17a-4(f)? If yes, the archival strategy changes from standard Glacier to Object Lock Compliance mode, making archives immutable even to the operator. Decide before Phase 4 (legacy drop).
-
BLR: GDPR Art. 4(1) scope of pseudonymized data. After pseudonymizing
customer_idandactor_id, does the remaining behavioral data still constitute personal data? If yes, retention limits apply even to anonymized events, potentially conflicting with the 7-year cold storage target. -
BLR: GDPR data-controller vs. data-processor posture. Raxx is almost certainly the controller for v1. Confirm before first customer onboard.
15. Sequence Diagrams
15.1 Operator Read — Path A (In-Ticket, Welcoming)
sequenceDiagram
participant O as Operator (Console)
participant R as Raptor API
participant FSC as FreeScout Cache
participant CAE as customer_audit_events
participant PM as Postmark
O->>R: GET /api/customer-audit/42 (operator bearer)
R->>FSC: SELECT status FROM freescout_ticket_cache WHERE customer_id=42
FSC-->>R: {ticket_id: "T-88", status: "open"}
R->>CAE: INSERT (dim=operator_interaction, action=customer.data.read.in_ticket,\n ticket_id=T-88, ticket_state_at_read=open,\n event_hash=KMS.HMAC(...))
R-->>O: 200 {events: [...]}
R->>PM: dispatch welcoming notification email (within 5 min SLA)
15.2 Operator Read — Path B (Post-Resolution, Security Incident)
sequenceDiagram
participant O as Operator (Console)
participant R as Raptor API
participant FSC as FreeScout Cache
participant CAE as customer_audit_events
participant S as Sentry
participant PM as Postmark
O->>R: GET /api/customer-audit/42 (operator bearer)
R->>FSC: SELECT status FROM freescout_ticket_cache WHERE customer_id=42
FSC-->>R: {ticket_id: "T-88", status: "resolved"}
R->>CAE: INSERT (dim=operator_interaction, action=customer.data.read.post_resolution,\n ticket_state_at_read=resolved, severity=incident,\n event_hash=KMS.HMAC(...))
R->>S: capture_message(CRITICAL, operator=hash, customer=42)
R-->>O: 200 {events: [...]}
R->>PM: dispatch security incident notification email (within 5 min SLA)
15.3 HMAC Write + Nightly Verify
sequenceDiagram
participant App as Raptor (raptor_app role)
participant KMS as AWS KMS
participant CAE as customer_audit_events
participant Chk as integrity_checker (job)
App->>KMS: GenerateMac(canonical_json, KeyId=ARN)
KMS-->>App: HMAC value
App->>CAE: INSERT (..., event_hash=HMAC, prev_event_hash=prev_HMAC)
Note over Chk: Nightly job (and monthly full-chain)
Chk->>CAE: SELECT events for customer ORDER BY at_utc
loop each event
Chk->>KMS: VerifyMac(canonical_json, MAC=event_hash)
KMS-->>Chk: VALID / INVALID
end
alt any INVALID
Chk->>Sentry: CRITICAL breach-notification event
end
RBAC V2 dependency note: SC-A8 (reader endpoint with RBAC + ticket gate) is explicitly gated on RBAC V2 landing. If RBAC V2 design lands before this PR merges, update SC-A8's body in migration-plan.md to reference the concrete RBAC V2 role/endpoint. If this PR lands first, SC-A8 references RBAC V2 abstractly; the RBAC V2 architect's PR will cross-link.