ADR 0059 — Shadow Tables: Postgres Triggers vs Application Dual-Write

Status: Accepted
Date: 2026-05-09 UTC
Deciders: software-architect
Refs: customer-audit-unified/design.md §3.2, ADR-0022

Context

Shadow tables (<table>_history) capture the full before/after row state on every INSERT, UPDATE, and DELETE to customer-touching primary tables. Two mechanisms can populate them:

Postgres triggers: a BEFORE/AFTER trigger function fires on the primary table, inserts a row into the shadow table, requires no application code change per primary table.
Application dual-write: every application path that mutates a primary table also explicitly calls the shadow writer. Changes must be wired at every callsite; a missed callsite silently produces an audit gap.

Decision

Postgres triggers (approach 1).

Each customer-touching primary table gets an AFTER INSERT OR UPDATE OR DELETE trigger that calls a table-specific fn_<table>_history() function. The function reads current_setting('app.current_actor', true) and current_setting('app.current_workflow_uuid', true) to carry actor identity and workflow UUID into the shadow row.

Application code is required to set these session-local variables at the start of each request transaction via SET LOCAL app.current_actor = '<value>'. This is a single middleware hook, not per-callsite wiring.

Consequences

Positive

Completeness guarantee: no application code path can mutate a primary table without the trigger firing. A missed callsite in the application cannot create an audit gap — the trigger is the invariant, not application discipline.
Separation of concerns: the trigger is the historian; the application is the actor. Adding a new code path that writes to trades does not require remembering to also write to trades_history.
Insider-threat resistance: a developer who adds a new mutation path cannot silently skip the shadow write. The trigger fires regardless of how the row is modified — including direct psql access (though that access is itself a break-glass event).
SET LOCAL app.current_actor is a single middleware change, not 50 callsite changes.

Negative

Trigger overhead: an AFTER trigger adds a synchronous write to every primary table mutation. At pre-launch volume (hundreds of mutations/day) this is negligible. At scale (10k+ mutations/day on positions or trades), this may add 1–3 ms per mutation. Scale-trigger mitigation: replace triggers with Postgres logical replication / WAL CDC at the point volume justifies it. The schema remains compatible.
Credential-value leakage risk: to_jsonb(NEW) captures the full row. If a primary table row contains a credential-adjacent field (e.g., a token column), it lands in the shadow table. Mitigation: primary table schema must never store credentials (enforced by ADR-0002 + CI grep). The CI lint gate from ADR-0002 is extended to cover primary table migrations — any new column with a name matching the deny-list fails the pipeline.
SET LOCAL context may not be set in all execution paths. If a mutation occurs outside a request context (e.g., a Celery task or a migration script that directly calls db.session.commit()), current_setting returns null and the shadow row records changed_by = null. This is acceptable — the mutation is still captured; the actor identity is unknown rather than fabricated. Celery tasks should set SET LOCAL app.current_actor = 'celery:<task_name>' as a best-practice convention.

Alternatives Considered

Application dual-write

Every application code path that mutates a primary table is also required to call a shadow writer function.

Rejected: completeness depends entirely on developer discipline. A single missed callsite (new feature, refactor, emergency hotfix) creates an undetected audit gap. The gap may not be noticed until a compliance review or a dispute that requires the shadow record. The trigger model makes the gap structurally impossible.

Postgres logical replication / WAL CDC (Change Data Capture)

A Postgres logical replication slot emits WAL events for every row mutation. A consumer process writes shadow rows asynchronously.

Rejected for v1: adds a new infrastructure component (replication consumer, replication slot management). Creates a lag window between mutation and shadow row availability (typically 100ms–1s, but could be longer under load). During the lag window, the shadow table is stale. For a forensic audit tool, "stale by up to 1 second" is usually acceptable, but the complexity cost is not justified at pre-launch scale. Flagged as the preferred migration path when trigger overhead becomes measurable.