Raxx · internal docs

internal · gated

ADR 0063 — Scale Tier Latency Budget + Numeric Upgrade Triggers for Shadow-Table Writes

Status: Accepted
Date: 2026-05-09 UTC
Deciders: operator (Kristerpher), software-architect
Refs: customer-audit-unified/design.md §9, migration-plan.md §trigger-to-async, ADR-0059


Context

ADR-0059 chose synchronous Postgres triggers over application dual-write or WAL CDC for shadow table writes. The trade-off is trigger overhead: each INSERT/UPDATE/DELETE on a customer-touching primary table fires a trigger that synchronously inserts into the corresponding *_history table.

At pre-launch volume (hundreds of mutations per day), this overhead is negligible. At 10K customers with market-open burst write patterns, it becomes a measurable latency contribution. ADR-0059 noted "add WAL CDC as a scale-trigger upgrade path documented in a follow-up ADR." This is that ADR.

The operator decision is: start synchronous, watch one metric, upgrade on a numeric threshold — not on a subjective judgment call. Upgrade criteria must be specific enough that the team does not need to redesign under pressure when the threshold trips.


Decision

Three-tier upgrade path with explicit numeric thresholds:

Tier 1 — Synchronous triggers (v1 through ~1K customers)

Postgres AFTER triggers on customers, subscriptions, positions, trades, sessions, passkeys. Shadow INSERT fires within the same transaction as the primary mutation.

Metric to watch: audit.trigger.insert_p99_ms — Sentry custom measurement on the p99 INSERT latency of the trades table (the highest-volume table). Measured by the application, recorded weekly via sentry.set_measurement.

Capacity at this tier:

Metric v1 (100 customers) 1K customers
Sustained writes/sec 0.01 0.14
Burst writes/sec (market open) 0.5 5
Estimated trigger overhead per trade INSERT ~0.3–0.5 ms ~0.3–0.5 ms
Estimated p99 INSERT latency (trades table) < 5 ms < 8 ms

Tier 2 — Async shadow writes via pg_notify + background worker

Upgrade condition: p99 INSERT latency on trades table exceeds 50 ms sustained for 7 consecutive days.

The "7 consecutive days" window prevents reacting to transient spikes (hot Heroku dyno, network jitter). The 50 ms threshold is chosen such that the user experience of trade submission is not measurably degraded (typical perceived UI latency threshold is 100–200 ms; 50 ms leaves headroom).

What changes: - Triggers are replaced with NOTIFY calls: NOTIFY audit_shadow_channel, '<json_payload>'. - Background worker (jobs/shadow_writer.py) subscribes via LISTEN audit_shadow_channel. Inserts rows into *_history tables asynchronously. - Lag monitoring: queue depth measured as the worker's processed_at - notified_at delta. Alert if depth > 1,000 items (approximately 1 minute of sustained 10K-customer burst). - During cutover: brief dual-path window (both trigger + worker) to ensure no rows are dropped.

Trade-off accepted: shadow tables lag the primary table by up to ~100–500 ms under load. This is documented in runbooks: "shadow tables have eventual-consistency lag under high write load."

Capacity at this tier:

Metric 1K customers 10K customers
Shadow write overhead on primary INSERT ~0.05 ms (notify only) ~0.05 ms
Worker throughput 1K rows/sec (single worker) 5K rows/sec (3 workers)
Max lag under burst ~0.5s ~2s

Tier 3 — WAL Change Data Capture (CDC)

Upgrade condition: customer_audit_events exceeds 100 million rows OR 10,000 active customers (first hit).

At this scale, even async worker overhead from pg_notify may not keep pace with write bursts. Postgres logical replication (WAL CDC) is the appropriate mechanism.

What changes: - Create a Postgres logical replication slot. - CDC consumer process (e.g., Debezium or a custom consumer) subscribes to the WAL stream. - Shadow rows are written by the CDC consumer; triggers are removed entirely. - customer_audit_events remains writer-endpoint-based (no change; CDC is only for shadow tables).

Trade-off accepted: WAL CDC adds a new infrastructure component (replication slot, consumer process, monitoring). This is justified at 10K customers; it is not justified pre-launch.

Decision criteria to promote: when either numeric threshold is met, the team has a documented 1–2 week path (not a redesign) to Tier 3. The schema remains compatible — shadow tables do not change, only the write mechanism changes.


Partitioning Thresholds

Threshold Action
customer_audit_events reaches 1 million rows Declare TimescaleDB hypertable with weekly chunks on at_utc
customer_audit_events reaches 100 million rows Enable hypertable compression on chunks > 90 days. Review chunk interval.
Any single *_history table reaches 1 million rows Add same TimescaleDB hypertable treatment

Partitioning is declarative: the threshold triggers a schema migration (Alembic), not a code change.


Consequences

Positive

Negative


Alternatives Considered

Upgrade on subjective judgment ("when it feels slow")

No numeric threshold; team upgrades when query performance becomes a concern.

Rejected: "when it feels slow" is undefined and creates debate under pressure. Numeric thresholds let the team make the upgrade decision proactively and without friction.

Pre-emptive async from Day 1

Ship Tier 2 (async shadow writes via pg_notify) from the start, bypassing Tier 1.

Rejected: introduces a background worker dependency and eventual-consistency lag before they are needed. At pre-launch scale, synchronous triggers add < 1 ms per mutation. The simpler system is correct until simplicity becomes the wrong tradeoff.

Skip Tier 2, go directly to WAL CDC at 10K customers

No intermediate pg_notify tier; upgrade directly from triggers to WAL CDC.

Rejected: the gap between 50 ms trigger latency and 10K customers / 100M rows may be many months. During that period the team has no upgrade path; the only option would be an emergency CDC deployment. The pg_notify intermediate tier provides a fast, low-risk upgrade path that can be executed in 1–2 days.