ADR 0059 — Shadow Tables: Postgres Triggers vs Application Dual-Write
Status: Accepted
Date: 2026-05-09 UTC
Deciders: software-architect
Refs: customer-audit-unified/design.md §3.2, ADR-0022
Context
Shadow tables (<table>_history) capture the full before/after row state on every INSERT, UPDATE, and DELETE to customer-touching primary tables. Two mechanisms can populate them:
- Postgres triggers: a
BEFORE/AFTERtrigger function fires on the primary table, inserts a row into the shadow table, requires no application code change per primary table. - Application dual-write: every application path that mutates a primary table also explicitly calls the shadow writer. Changes must be wired at every callsite; a missed callsite silently produces an audit gap.
Decision
Postgres triggers (approach 1).
Each customer-touching primary table gets an AFTER INSERT OR UPDATE OR DELETE trigger that calls a table-specific fn_<table>_history() function. The function reads current_setting('app.current_actor', true) and current_setting('app.current_workflow_uuid', true) to carry actor identity and workflow UUID into the shadow row.
Application code is required to set these session-local variables at the start of each request transaction via SET LOCAL app.current_actor = '<value>'. This is a single middleware hook, not per-callsite wiring.
Consequences
Positive
- Completeness guarantee: no application code path can mutate a primary table without the trigger firing. A missed callsite in the application cannot create an audit gap — the trigger is the invariant, not application discipline.
- Separation of concerns: the trigger is the historian; the application is the actor. Adding a new code path that writes to
tradesdoes not require remembering to also write totrades_history. - Insider-threat resistance: a developer who adds a new mutation path cannot silently skip the shadow write. The trigger fires regardless of how the row is modified — including direct
psqlaccess (though that access is itself a break-glass event). SET LOCAL app.current_actoris a single middleware change, not 50 callsite changes.
Negative
- Trigger overhead: an
AFTERtrigger adds a synchronous write to every primary table mutation. At pre-launch volume (hundreds of mutations/day) this is negligible. At scale (10k+ mutations/day onpositionsortrades), this may add 1–3 ms per mutation. Scale-trigger mitigation: replace triggers with Postgres logical replication / WAL CDC at the point volume justifies it. The schema remains compatible. - Credential-value leakage risk:
to_jsonb(NEW)captures the full row. If a primary table row contains a credential-adjacent field (e.g., a token column), it lands in the shadow table. Mitigation: primary table schema must never store credentials (enforced by ADR-0002 + CI grep). The CI lint gate from ADR-0002 is extended to cover primary table migrations — any new column with a name matching the deny-list fails the pipeline. SET LOCALcontext may not be set in all execution paths. If a mutation occurs outside a request context (e.g., a Celery task or a migration script that directly callsdb.session.commit()),current_settingreturns null and the shadow row recordschanged_by = null. This is acceptable — the mutation is still captured; the actor identity is unknown rather than fabricated. Celery tasks should setSET LOCAL app.current_actor = 'celery:<task_name>'as a best-practice convention.
Alternatives Considered
Application dual-write
Every application code path that mutates a primary table is also required to call a shadow writer function.
Rejected: completeness depends entirely on developer discipline. A single missed callsite (new feature, refactor, emergency hotfix) creates an undetected audit gap. The gap may not be noticed until a compliance review or a dispute that requires the shadow record. The trigger model makes the gap structurally impossible.
Postgres logical replication / WAL CDC (Change Data Capture)
A Postgres logical replication slot emits WAL events for every row mutation. A consumer process writes shadow rows asynchronously.
Rejected for v1: adds a new infrastructure component (replication consumer, replication slot management). Creates a lag window between mutation and shadow row availability (typically 100ms–1s, but could be longer under load). During the lag window, the shadow table is stale. For a forensic audit tool, "stale by up to 1 second" is usually acceptable, but the complexity cost is not justified at pre-launch scale. Flagged as the preferred migration path when trigger overhead becomes measurable.