Status: Draft — awaiting Kristerpher's open-question answers before sub-cards are filed
Owner: software-architect
Date: 2026-04-29
Related ADRs: 0021, 0022, 0023
Related designs: rbac-design.md, session-engine.md, auth.md
Every user workflow in Raxx — placing a trade, running a backtest, viewing positions, onboarding — is today a black box at replay time. If something goes wrong, support has only request logs and fragmented DB state. Users have no self-service window into their own history. Audit trails exist for individual mutations (per ADR-0003) but there is no way to reconstruct what a user experienced — what they saw, what they clicked, what the system did on their behalf, and what a support agent saw when they joined.
This document designs the identifier model, storage architecture, replay mechanism, support transparency layer, and tamper-resistance posture that together give every actor in the system — user, support agent, admin — a coherent, auditable, and trustworthy timeline of any workflow.
The motivation spans three dimensions Kristerpher named explicitly; each is developed in the sections below.
The following constraints are non-negotiable and take precedence over any design convenience:
- New trace RBAC roles follow the <app>-<resource>-<level> naming from rbac-design.md. New roles compose into existing groups; the trace system does not introduce a parallel permission mechanism.
- Events from the AI surface are tagged surface:ai; events from the deterministic order-firing path are tagged surface:deterministic. Support and admin views must display this distinction.

Five UUID types compose into a complete, reconstructable timeline for any workflow.
Workflow ID (wfl_*): a UUID v7 (time-ordered) minted when a user begins a distinct workflow. A workflow is a discrete user intent: sign-up, place-trade, run-backtest, edit-strategy, view-positions, initiate-withdrawal.
Lifetime: starts at the first user action in the workflow; ends at explicit completion (success or failure terminal state), session expiry, or 4-hour hard ceiling. A new page navigation within the same intent does not start a new workflow — the frontend must carry the Workflow ID across navigations.
Minting: browser mints a UUID v7 on the first user-triggered action and stores it in the session context. It is forwarded in every subsequent request as X-Workflow-ID. Raptor validates the format but treats it as client-asserted; for sensitive flows (live-trade, erasure), Raptor countersigns the ID at first use so the server has a tamper-evident claim on it.
Schema tag: wfl_<uuid_v7>
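For reference, a minimal sketch of minting and format-checking the wfl_ tag, assuming Python on the Raptor side. uuid.uuid7() only ships in very recent Python versions, so a small generator is inlined; the function names here (uuid_v7, mint_workflow_id, is_valid_workflow_id) are illustrative rather than existing Raptor code, and in the real flow the browser mints the ID.

```python
import os
import re
import time
import uuid

def uuid_v7() -> uuid.UUID:
    """Time-ordered UUID v7: 48-bit Unix-ms timestamp, version/variant bits, random tail."""
    ms = int(time.time() * 1000) & ((1 << 48) - 1)
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF            # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)   # 62 random bits
    value = (ms << 80) | (0x7 << 76) | (rand_a << 64) | (0b10 << 62) | rand_b
    return uuid.UUID(int=value)

def mint_workflow_id() -> str:
    return f"wfl_{uuid_v7()}"

# Server-side format check for the client-asserted X-Workflow-ID header
# (countersigning for sensitive flows happens after this check).
WORKFLOW_ID_RE = re.compile(
    r"^wfl_[0-9a-f]{8}-[0-9a-f]{4}-7[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
)

def is_valid_workflow_id(value: str) -> bool:
    return bool(WORKFLOW_ID_RE.match(value.lower()))
```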
User Action ID (act_*): a UUID v7 minted for every individual user-initiated action within a workflow (button click, form submit, API call). Parented to a Workflow ID.
Carries: workflow_id, user_id, session_id, action_type, surface_tag (ai/deterministic), ts_emitted (client UTC), ts_received (server UTC).
Schema tag: act_<uuid_v7>
System Action ID (sys_*): a UUID v7 minted when Raxx acts on the user's behalf: scheduled trade fire, cron-triggered polling, paper-gate evaluation, automatic position roll. These are not user-initiated but are caused by user configuration.
Carries: originating_workflow_id (the workflow that created the configuration that triggered this action, may be null for pure system events), originating_config_id (e.g., strategy ID), subsystem (e.g., mq-a:scheduler, raptor:paper-gate), surface_tag, ts_emitted.
Traceback rule: a system action that cannot trace back to a user configuration or a user workflow must name the subsystem and the trigger condition explicitly. No anonymous system actions.
Schema tag: sys_<uuid_v7>
Render ID (rnd_*): a UUID v7 minted every time a view is rendered and delivered to any actor. This captures the exact data-state the actor saw at a given moment.
Carries: workflow_id (if the render is part of a workflow), actor_id (user, support agent, or admin), actor_role (RBAC role active at render time), view_name, data_mask_state (JSON: which fields were masked for this actor), ts_rendered.
Why Render IDs matter: when a support agent and user are looking at the same session simultaneously, each gets a separate Render ID. If a dispute arises about "what did support see," the Render ID provides a concrete anchor. See §8 for the security and GDPR implications.
Granularity decision: see ADR-0023. The proposal is per-view (one per page load / SSE push), not per-component or per-field. Per-field granularity is post-MVP.
Schema tag: rnd_<uuid_v7>
Support Action ID (sup_*): a UUID v7 minted whenever a support agent takes an action "alongside" or "on behalf of" a user. This includes: opening a user's timeline in the support tool, replaying a session, sending a message in context, adjusting account state.
Carries: support_agent_id, target_user_id, action_type (view/replay/message/adjustment), linked_workflow_id (if the support action is in the context of a specific user workflow), ts_emitted.
Privacy note for users: users can see their own support-action events. They can see the support agent's display name and the action type, but not internal support notes or agent-internal state. See §7.
A complete "place-trade" workflow timeline might look like:
wfl_01HZ... (place-trade, user U, session S)
rnd_01HZ... (trade-entry-view rendered to user U)
act_01HZ... (user typed symbol: "SPY")
act_01HZ... (user submitted trade form)
sys_01HZ... (paper-gate evaluation — PASSED)
sys_01HZ... (order routing to broker — DISPATCHED)
rnd_01HZ... (trade-confirmation-view rendered to user U)
----
sup_01HZ... (support agent A opens this workflow in support tool — 3 days later)
rnd_01HZ... (trade-entry-view rendered to support agent A, data_mask: {broker_account_id: masked})
All of these share workflow_id = wfl_01HZ.... Querying by Workflow ID returns the full timeline in emission order.
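A sketch of that query, assuming a psycopg2 connection to the trace store; fetch_timeline is an illustrative helper, not an existing API.

```python
import psycopg2  # assumed driver; any DB-API client over the Postgres/Timescale store works

def fetch_timeline(conn, workflow_id: str) -> list[dict]:
    """Full timeline for one workflow, in emission order (server receipt as tiebreak)."""
    sql = (
        "SELECT id, event_type, actor_id, action_type, view_name, ts_emitted "
        "FROM trace_events WHERE workflow_id = %s "
        "ORDER BY ts_emitted ASC, ts_received ASC"
    )
    with conn.cursor() as cur:
        cur.execute(sql, (workflow_id,))
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]

# Usage: fetch_timeline(psycopg2.connect(DSN), "wfl_01HZ...")
```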
See ADR-0021 for the full decision rationale. Summary:
Recommendation: Postgres with TimescaleDB extension (Timescale).
Reasoning:
- Raxx is on Heroku today with a Postgres add-on. Timescale Cloud or the Timescale Heroku add-on is a drop-in replacement, not a new infra paradigm.
- Provides time-series hypertables, automatic chunk-based partitioning, tiered storage (hot → cold), and efficient time-range queries without a full Postgres table scan.
- Column compression on cold chunks achieves 10–20× compression for append-only event rows with repeated user_id + workflow_id values.
- EXPLAIN ANALYZE for "reconstruct user's day" queries is a standard Postgres tool; no new query language to learn.
- Pre-launch budget: the Timescale free tier handles tens of millions of rows. Tiered storage to S3 for rows older than 90 days keeps hot-tier costs flat as history grows.
- No vendor lock-in on schema: if we outgrow Timescale, the data is Postgres-compatible.
4-week MVP scope (if we had to ship in 4 weeks): skip Timescale, append events to an event_log table in the existing Raptor SQLite with a ts column and a composite index on (user_id, ts). No partitioning, no compression. This is sufficient for pre-launch low volume and can be migrated to Timescale in a follow-up card without losing data.
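A sketch of that SQLite path; the table and index follow the description above, but the DDL here is illustrative rather than the actual migration.

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS event_log (
    id              TEXT PRIMARY KEY,
    event_type      TEXT NOT NULL,
    workflow_id     TEXT,
    user_id         TEXT NOT NULL,
    action_type     TEXT NOT NULL,
    context_json    TEXT,
    ts_emitted      TEXT NOT NULL,   -- ISO-8601 UTC; SQLite has no TIMESTAMPTZ type
    ts_received     TEXT NOT NULL,
    schema_version  INTEGER NOT NULL DEFAULT 1
);
CREATE INDEX IF NOT EXISTS idx_event_log_user_ts ON event_log(user_id, ts_emitted);
"""

def init_event_log(path: str) -> sqlite3.Connection:
    """Create the append-only MVP event log in the existing Raptor SQLite file."""
    conn = sqlite3.connect(path)
    conn.executescript(DDL)
    return conn
```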
4-month production scope: Timescale Cloud, hypertables, tiered storage, retention policies configured per ADR-0003 retention schedule.
-- All trace tables are Timescale hypertables partitioned on ts_emitted.
-- 'id' is UUID v7 (time-ordered); no separate auto-increment column needed.
CREATE TABLE trace_events (
id TEXT PRIMARY KEY, -- act_*, sys_*, rnd_*, sup_* UUID v7
event_type TEXT NOT NULL, -- 'user_action' | 'system_action' | 'support_action' | 'render'
workflow_id TEXT, -- wfl_* reference (nullable for system events)
user_id TEXT NOT NULL, -- target user (never support agent's user_id)
actor_id TEXT NOT NULL, -- who caused this event
actor_role TEXT, -- RBAC role active at event time
surface_tag TEXT, -- 'ai' | 'deterministic' | 'system' | 'support'
action_type TEXT NOT NULL, -- e.g. 'trade.submit', 'backtest.run', 'support.view'
view_name TEXT, -- for render events
data_mask_json TEXT, -- JSON: masked field names for render events
context_json TEXT, -- sanitized context (no credentials, no raw PII)
subsystem TEXT, -- for sys_* events
ts_emitted TIMESTAMPTZ NOT NULL, -- client-reported UTC
ts_received TIMESTAMPTZ NOT NULL, -- server-stamped UTC
hash_prev TEXT, -- SHA-256 of previous event in this workflow's chain
sig TEXT, -- Ed25519 signature for sys_* events (subsystem key)
schema_version INTEGER NOT NULL DEFAULT 1
);
-- Hypertable: chunk by week. Compress chunks older than 90 days.
-- SELECT create_hypertable('trace_events', 'ts_emitted', chunk_time_interval => INTERVAL '1 week');
CREATE TABLE trace_workflows (
id TEXT PRIMARY KEY, -- wfl_* UUID v7
user_id TEXT NOT NULL,
session_id TEXT NOT NULL,
workflow_type TEXT NOT NULL, -- 'place-trade' | 'run-backtest' | 'sign-up' | etc.
state TEXT NOT NULL, -- 'active' | 'completed' | 'failed' | 'expired'
ts_started TIMESTAMPTZ NOT NULL,
ts_ended TIMESTAMPTZ,
schema_version INTEGER NOT NULL DEFAULT 1
);
-- Indexes (Timescale adds time-range indexes automatically per chunk)
CREATE INDEX idx_te_user_ts ON trace_events(user_id, ts_emitted DESC);
CREATE INDEX idx_te_workflow ON trace_events(workflow_id, ts_emitted DESC);
CREATE INDEX idx_te_actor ON trace_events(actor_id, ts_emitted DESC);
CREATE INDEX idx_tw_user_ts ON trace_workflows(user_id, ts_started DESC);
What is deliberately absent: email addresses, raw passkey credential IDs, broker API keys, session token values, full IP addresses (IP prefix only, per ADR-0003), raw request bodies.
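A sketch of how an event emitter might enforce that absence before a row is written. The deny-list and helper names are assumptions for illustration; the real sanitization rules live in ADR-0003 and the Raptor middleware.

```python
import json
import uuid
from datetime import datetime, timezone

DENYLIST = {"email", "passkey_credential_id", "broker_api_key", "session_token", "ip"}

def sanitize_context(raw: dict) -> str:
    """Drop forbidden fields; keep only a /24 prefix of any IPv4 address, per ADR-0003."""
    clean = {k: v for k, v in raw.items() if k not in DENYLIST}
    ip = raw.get("ip")
    if isinstance(ip, str) and ip.count(".") == 3:
        clean["ip_prefix"] = ".".join(ip.split(".")[:3]) + ".0/24"
    return json.dumps(clean, sort_keys=True)

def build_user_action_event(workflow_id: str, user_id: str, action_type: str, ctx: dict) -> dict:
    return {
        "id": f"act_{uuid.uuid4()}",  # placeholder; production mints UUID v7 per the identifier model
        "event_type": "user_action",
        "workflow_id": workflow_id,
        "user_id": user_id,
        "actor_id": user_id,
        "action_type": action_type,
        "context_json": sanitize_context(ctx),
        "ts_received": datetime.now(timezone.utc).isoformat(),
        "schema_version": 1,
    }
```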
Per Kristerpher's 2026-04-29 direction, the retention model leans toward aggressive cold-tier rollover patterned after Splunk's S2 (SmartStore) architecture: short hot tier, frequent migrations of older chunks to S3-backed cold storage, queryable on demand via Timescale's tiered storage hooks. This minimizes hot-tier storage cost without losing the full audit trail.
| Data | Hot tier (TimescaleDB chunks, fast queries) | Cold tier (S3 + tiered_storage policy, queryable) | Deletion |
|---|---|---|---|
| Render events | 30 days hot → cold (was 90 days; tightened per the SmartStore-style rollover) | 2 years cold | Pseudonymized on DSR; deleted at retention ceiling |
| User-action events | 30 days hot → cold | 7 years cold (Statement of Record for trade-affecting events) | Pseudonymized on DSR |
| System-action events | 30 days hot → cold | 7 years cold | Retained (subsystem, not PII) |
| Support-action events | 30 days hot → cold | 7 years cold | Pseudonymized on DSR |
| Workflow metadata | 30 days hot → cold | 7 years cold | Pseudonymized on DSR |
Trade-affecting events (any action_type prefixed trade.*, order.*, position.*) are retained for 7 years per ADR-0003's financial-adjacent rationale.
Splunk SmartStore reference: S2 separates indexer compute from object storage so old data is fetched on demand rather than kept warm. Translate to Timescale: chunk_time_interval = 1 day; tiered_storage policy moves chunks > 30 days old to S3; queries that span the boundary are transparent (Timescale does the fetch). Cost driver becomes egress on the rare cold-tier query, not hot-tier disk.
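A sketch of that translation as Timescale policy calls, assuming psycopg2. create_hypertable and add_compression_policy are standard TimescaleDB functions; add_tiering_policy is a Timescale Cloud feature and is an assumption if self-hosting.

```python
import psycopg2

POLICY_SQL = [
    # Daily chunks so the 30-day hot window maps to ~30 chunks.
    "SELECT create_hypertable('trace_events', 'ts_emitted', "
    "chunk_time_interval => INTERVAL '1 day', if_not_exists => TRUE);",
    # Compress chunks once they leave the hot window; segment by the repeated columns.
    "ALTER TABLE trace_events SET (timescaledb.compress, "
    "timescaledb.compress_segmentby = 'user_id, workflow_id');",
    "SELECT add_compression_policy('trace_events', INTERVAL '30 days');",
    # Move chunks older than 30 days to S3-backed object storage (Timescale Cloud only).
    "SELECT add_tiering_policy('trace_events', INTERVAL '30 days');",
]

def apply_trace_policies(dsn: str) -> None:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for stmt in POLICY_SQL:
            cur.execute(stmt)
```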
Full event-sourcing (replay every event from the beginning) is correct but impractical for sessions that span years. Pure snapshots (point-in-time DB copy) are fast but storage-intensive and miss the events between snapshots.
The hybrid approach:
- A daily materialized snapshot of relevant user state (positions, strategies, settings, account tier) is stored as a JSON blob keyed to (user_id, snapshot_date). Snapshots are computed nightly by a background job.
- Event delta from the snapshot forward is the trace_events stream. Replay = load the closest daily snapshot before the target timestamp, then apply events in order up to the target timestamp.
For support/admin queries ("what was this user's state at 2026-04-15T14:30Z?"), the worst-case event delta is at most one day's worth of events per user (~20–50 events at MVP volume). This is a cheap scan over the hypertable.
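A sketch of the snapshot-plus-delta replay under those assumptions. The user_snapshots table and the way the delta is folded into state are illustrative; the real fold rules belong to the replay service.

```python
import json
from datetime import datetime
import psycopg2  # assumed driver

def replay_state(conn, user_id: str, at: datetime) -> dict:
    """Load the closest daily snapshot at or before `at`, then apply the event delta."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT snapshot_date, state_json FROM user_snapshots "
            "WHERE user_id = %s AND snapshot_date <= %s "
            "ORDER BY snapshot_date DESC LIMIT 1",
            (user_id, at),
        )
        row = cur.fetchone()
        if row:
            snapshot_date, state = row[0], json.loads(row[1])
        else:
            snapshot_date, state = datetime(1970, 1, 1), {}  # no snapshot yet: replay from genesis

        cur.execute(
            "SELECT id, event_type, action_type, view_name, ts_emitted "
            "FROM trace_events "
            "WHERE user_id = %s AND ts_emitted > %s AND ts_emitted <= %s "
            "ORDER BY ts_emitted ASC",
            (user_id, snapshot_date, at),
        )
        timeline = cur.fetchall()

    # Fold the delta into the snapshot state; only the last rendered view is tracked here.
    for event_id, event_type, action_type, view_name, _ts in timeline:
        if event_type == "render":
            state["last_view"] = {"view_name": view_name, "render_id": event_id}
    return {"snapshot_base": str(snapshot_date), "events_applied": len(timeline), "state": state}
```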
GET /api/admin/trace/replay/<user_id>?at=<iso_timestamp>
Authorization: requires raptor-trace-admin role (see §6.1).
Response:
{
"user_id": "...",
"replayed_at": "2026-04-15T14:30:00Z",
"snapshot_base": "2026-04-15T00:00:00Z",
"events_applied": 23,
"state": {
"active_workflows": [...],
"last_view": { "view_name": "...", "render_id": "rnd_..." },
"last_actions": [...]
},
"timeline": [
{ "id": "act_...", "action_type": "trade.submit", "ts_emitted": "...", ... },
...
]
}
Edge case — stale market data: the timeline accurately represents what the user saw and did at the time. The response includes a market_data_frozen_at timestamp indicating that market prices in the timeline reflect the data state at replayed_at, not current prices. The UI must display this clearly; the replay is a historical record, not a live simulation of "what would happen now."
The replay surface lives in the Console (support tool section). It presents a scrubbing timeline, step-forward/step-backward controls, a split view, and a "what did user see" toggle (the SC-10 feature set).
The replay UI itself generates Render ID events when a support agent or admin opens it, ensuring the user can later see that their session was reviewed.
| Actor | What they see | RBAC role |
|---|---|---|
| User (self) | All events on their own timeline; support/admin presence events (actor display name, action type, timestamp); render events for their own views; system-action summaries (outcome, not internal detail) | antlers-trace-self |
| Support agent | Same as user view, plus internal system-action detail, context_json (with PII masking per RBAC policy). No raw broker credentials. | raptor-trace-support |
| Admin (Kristerpher) | Full event detail, unmasked context_json (minus stored-credential fields which are never stored), all actors, all workflows, cross-user queries. | raptor-trace-admin |
New RBAC roles compose into existing groups per rbac-design.md:
- antlers-trace-self is granted to all antlers-user group members (every authenticated user sees their own trace).
- raptor-trace-support is added to the raxx-support-team group.
- raptor-trace-admin is added to the raxx-platform-admins group.
Users see their own trace at GET /api/me/trace (list of workflows) and GET /api/me/trace/<workflow_id> (full timeline for one workflow). This surface is part of Antlers, not the Console.
The user-facing timeline:
- Shows "a support agent viewed your session" events with the agent's display name and time.
- Shows "an admin reviewed your account" events with the admin's display name and time.
- Does NOT show internal system-action detail, internal context_json, or support-agent private notes.
- Vendor names are not shown to users per invariant §2.8.
This feature is structural to trust. It is not a stretch goal. It ships with the first trace rollout.
When a support agent opens a user's live session in the support tool, a sup_* event is emitted immediately. The user sees this in their timeline in near-real-time (next page load or SSE push). This creates a clear "support was here" marker that is auditable and user-visible.
If the support agent takes any account-state-changing action (e.g., resets a setting, extends a trial), a separate sup_* event with action_type: support.adjustment is emitted. The user sees both the presence and the action.
When an admin queries any user's trace data via the admin API or replay UI, a Render ID event is written with actor_id: <admin_id> and actor_role: raptor-trace-admin. This event appears in the user's own trace timeline: "Raxx admin reviewed your account at 14:32 UTC."
Users can see who (by display name) reviewed their data and when. They cannot see what the admin queried. This is a privacy feature: users have a complete picture of who within Raxx has accessed their data.
Database-level: no UPDATE or DELETE grants on the trace_events table for any application user. The application DB user (raptor_app) has INSERT and SELECT only. DDL for the trace tables is managed by migration scripts run by a separate privileged user (raptor_migrations), never by raptor_app at runtime.
Migration files that attempt to add UPDATE/DELETE access to trace tables must be rejected in PR review. A CI lint step flags any migration that contains GRANT UPDATE or GRANT DELETE against trace_events or trace_workflows.
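A sketch of that lint step, assuming a plain Python script run in CI over the migrations directory; the regex is a line-level heuristic, not a SQL parser.

```python
import pathlib
import re
import sys

FORBIDDEN = re.compile(
    r"GRANT\s+(?:[A-Z,\s]*\b(UPDATE|DELETE)\b)[^;]*\b(trace_events|trace_workflows)\b",
    re.IGNORECASE,
)

def lint_migrations(migrations_dir: str = "migrations") -> int:
    """Return non-zero if any migration grants UPDATE/DELETE on a trace table."""
    failures = []
    for path in sorted(pathlib.Path(migrations_dir).glob("*.sql")):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if FORBIDDEN.search(line):
                failures.append(f"{path}:{lineno}: mutable grant on trace table: {line.strip()}")
    for failure in failures:
        print(failure, file=sys.stderr)
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(lint_migrations())
```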
Each event in a workflow's trace stores hash_prev: the SHA-256 of the serialized previous event (all columns, canonical JSON order) in that workflow's chain.
hash_prev[n] = SHA-256(canonical_json(event[n-1]))
The first event in a workflow stores hash_prev = SHA-256("genesis:" + workflow_id).
Verification: an integrity checker can reconstruct the hash chain for any workflow by fetching events in ts_emitted order and verifying each hash_prev. A gap (missing event) or modification (altered column) breaks the chain at that position.
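A sketch of the chain computation and verification. The exact canonical serialization (and whether the previous event's own hash_prev and sig fields are hashed) is an ADR-0022 detail; this version simply hashes every stored column in sorted-key JSON.

```python
import hashlib
import json

def canonical_json(event: dict) -> bytes:
    """Serialize an event row with sorted keys and no whitespace (canonical form per ADR-0022)."""
    return json.dumps(event, sort_keys=True, separators=(",", ":"), default=str).encode()

def genesis_hash(workflow_id: str) -> str:
    return hashlib.sha256(f"genesis:{workflow_id}".encode()).hexdigest()

def next_hash_prev(previous_event: dict | None, workflow_id: str) -> str:
    """hash_prev for the event about to be inserted into this workflow's chain."""
    if previous_event is None:
        return genesis_hash(workflow_id)
    return hashlib.sha256(canonical_json(previous_event)).hexdigest()

def verify_chain(workflow_id: str, events: list[dict]) -> int | None:
    """events ordered by ts_emitted; return the index of the first break, or None if intact."""
    expected = genesis_hash(workflow_id)
    for i, event in enumerate(events):
        if event["hash_prev"] != expected:
            return i
        expected = hashlib.sha256(canonical_json(event)).hexdigest()
    return None
```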
A nightly background job (jobs/trace_integrity_check.py) runs the chain verification for all workflows with events in the last 24 hours and writes a pass/fail row to audit_log. Any failure is a severity:critical event that triggers the breach-notification pipeline from ADR-0003.
System-action events (sys_*) carry an Ed25519 signature over the canonical JSON of the event payload. Each subsystem (MQ-A scheduler, Raptor paper-gate, Raptor order-router) has its own signing key stored in the secret store (Infisical), rotatable without service redeploy.
sig = Ed25519Sign(subsystem_private_key, canonical_json(event_payload))
A fake sys_* event inserted without a valid signature (or with a revoked key) fails verification in the chain checker. This prevents insertion of fabricated system actions (e.g., "the system fired a trade the user didn't authorize").
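A sketch of the signing and verification step, assuming the Python cryptography package; key loading from Infisical and sig_key_version bookkeeping are out of scope here.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

def sign_system_event(private_key: Ed25519PrivateKey, payload: bytes) -> bytes:
    """payload is canonical_json(event_payload) from the hash-chain sketch above."""
    return private_key.sign(payload)

def verify_system_event(public_key: Ed25519PublicKey, payload: bytes, sig: bytes) -> bool:
    try:
        public_key.verify(sig, payload)
        return True
    except InvalidSignature:
        return False

# Usage sketch (the key would come from Infisical, not be generated inline):
# key = Ed25519PrivateKey.generate()
# sig = sign_system_event(key, payload)
# assert verify_system_event(key.public_key(), payload, sig)
```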
| Control | Pre-launch (4-week MVP) | Post-launch (production) |
|---|---|---|
| Append-only DB grant | Yes (required) | Yes |
| Hash chain | Recommended; ship if < 1 week effort | Required |
| Ed25519 signatures on sys_* | Optional; seed the key infrastructure | Required |
| Nightly integrity checker | Basic row-count audit | Full chain verification |
| Cold storage + tiered retention | No; all hot | Timescale tiered storage |
| GDPR pseudonymization job | Stub; manual for now | Automated (retention job) |
Why tracing matters for security:
Detect bad actors. When a session's action sequence is anomalous — many rapid trade submissions, unusual view patterns, a known-bad IP prefix — the event stream is queryable in near-real-time. Alerting rules over trace_events can fire on statistical deviations without reading application-layer code.
Forensic reconstruction. After a reported incident ("I didn't place those trades"), the event stream provides an ordered reconstruction of every action, every render, and every system action in the user's session. The hash chain makes it impossible to silently remove events.
Tamper-evident audit for regulatory + dispute scenarios. If a user disputes an order, the chain-verified event stream + Ed25519 signatures on system actions constitute a non-repudiation record. Raxx's position in a dispute is backed by cryptographic evidence, not a mutable log.
Insider-threat detection. Admin and support access to trace data is itself traced (Render IDs). If an admin is querying user timelines at unusual hours or at an unusual rate, that access pattern is itself visible in the trace_events table and queryable. The system watches the watchers.
Account compromise detection. A Workflow ID that spans multiple IP prefixes within minutes, or that contains action sequences inconsistent with the user's historical patterns, is an anomaly signal. The trace store provides the historical baseline.
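As one illustration of the near-real-time angle, a sketch of a burst-detection query over trace_events; the threshold, window, and alerting hook are assumptions.

```python
import psycopg2  # assumed driver

BURST_SQL = """
    SELECT user_id, count(*) AS submits
    FROM trace_events
    WHERE action_type = 'trade.submit'
      AND ts_received > now() - INTERVAL '5 minutes'
    GROUP BY user_id
    HAVING count(*) >= %s
"""

def find_trade_bursts(conn, threshold: int = 10) -> list[tuple[str, int]]:
    """Users with an unusual burst of trade submissions in the last five minutes."""
    with conn.cursor() as cur:
        cur.execute(BURST_SQL, (threshold,))
        return cur.fetchall()
```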
Non-repudiation: a user who claims "I never placed that trade" is answered by the trace_events chain: the act_* event with action_type: trade.submit, the user's session ID, and the sys_* event from the order router, all with verified hash chain integrity. The event chain is the Statement of Record.
Compliance posture: SEC Rule 17a-4 and FINRA Rule 4511 require broker-dealers to retain records in non-rewritable, non-erasable ("WORM") format. Raxx is not a broker-dealer, but Raxx's users route orders through a broker-dealer. Aligning the trace architecture with WORM semantics (append-only + hash chain) positions Raxx to demonstrate good recordkeeping hygiene if ever scrutinized. It also makes a future broker-dealer license application less of a retrofit.
Query performance: a "reconstruct user's day" query on the hypertable hits one or two weekly chunks maximum (indexed on user_id, ts_emitted). At 50 events/session × 5 sessions/day × 1,000 users, this is 250,000 rows/day — trivial for Timescale. The daily snapshot reduces the replay-path scan to at most one day's delta (~250 rows at MVP volume per user).
Cost: Timescale compression on 90-day-old chunks achieves ~10× compression on repetitive event rows. A user's 7-year trade-affecting history at 50 events/day compresses to under 1 MB. Cold storage tier (S3) for data beyond 90 days is cents/GB/month.
User self-audit. A user who asks "what did I do yesterday at 14:00 UTC?" gets a timeline view in Antlers: a list of their workflows, the actions within each, and the system actions taken on their behalf. This is not a debugging tool — it is a confidence surface. Users who understand their own history are less likely to dispute well-executed trades and more likely to trust the platform.
Support transparency. When a support agent joins a session, the user sees it in their timeline: "Support agent [name] reviewed your session at 14:32 UTC." This is a forcing function for support quality — agents know their presence is visible. It also eliminates the ambiguity of "did support have access to this data?" — the answer is always verifiable by the user.
Replay for debugging. A user who reports "something went wrong 5 minutes before this trade" can hand support a Workflow ID. Support replays the session to exactly that point, seeing what the user saw (with the "what did user see" toggle). The handoff from "user description" to "support investigation" shrinks from minutes to seconds.
Admin oversight as a privacy feature. Users can see when Raxx staff accessed their data. This converts a liability ("Raxx can read my data without my knowledge") into a feature ("Raxx tells me every time staff looks at my account"). This is structurally important to the trust model.
Cross-actor synchronized view. When a support agent opens a user's live session, both are looking at a state anchored to the same Render IDs. If the support agent says "I see your backtest result as X," the Render ID confirms that the agent actually rendered that view, preventing "I was looking at a different screen" confusion.
Migration file: migrations/0XXX_trace_tables.sql
Additive only. Creates trace_events and trace_workflows. No changes to existing tables.
Rollback: drop both tables. No existing data is affected.
If starting with SQLite (4-week MVP):
1. Create event_log table in SQLite with identical schema minus Timescale-specific commands.
2. Add composite index on (user_id, ts_emitted).
3. When migrating to Timescale: export the SQLite rows and load them into Postgres (e.g., CSV export/import or pgloader; pg_dump cannot read SQLite), then SELECT create_hypertable(..., migrate_data => true) on the populated table. Data is preserved; partitioning is applied retroactively.
If starting with Timescale:
1. Add Timescale extension to Postgres.
2. Run create_hypertable as part of migration.
3. Set compression policy: add_compression_policy('trace_events', INTERVAL '90 days').
4. Set retention/tiering policy per §4.2.
Existing audit_log rows have no hash chain. The new trace_events table starts the chain fresh at launch. There is no backfill of old audit rows into the hash chain; old audit rows remain in audit_log as-is. The chain's integrity guarantee applies only to events from the chain's genesis forward.
| Phase | Gate | Description |
|---|---|---|
| Dark | Internal | trace_events table exists; no events emitted. Migration tested on staging. |
| Flag-on (staging) | TRACE_ENABLED=1 | Raptor emits action + system events. No render events. No user-facing UI. Integrity checker runs nightly. |
| Beta (prod, internal) | Kristerpher explicit OK | Admin replay API live. Support tool shows Workflow IDs in user records. User-facing timeline not yet visible. |
| Beta 2 (user-facing) | Support timeline UX reviewed | User can see GET /api/me/trace. Support co-presence events visible to users. |
| GA | Soak 30 days in Beta 2 with no integrity failures | Hash chain enabled for all new events. Ed25519 signatures on sys_* events. Nightly chain verifier runs in prod. |
| Post-GA | Volume trigger | Timescale tiered storage enabled. Cold-storage policy active. DPIA completed for Statement-of-Record posture. |
Personal data stored in trace events: user_id (internal UUID), actor_id, actor_role, view_name, action_type, context_json (sanitized — no email, no credential material). ts_emitted is metadata, not PII. IP prefix (max /24) is stored per ADR-0003.

Erasure requests: on POST /api/gdpr/erase, the retention job pseudonymizes user_id and actor_id in trace_events within 30 days (replacing them with sha256(user_id || per-user-salt)). Events are not deleted; they are pseudonymized. The hash chain is preserved (the chain over pseudonymized data is still valid). The pseudonymization salt is destroyed at the end of the 2-year audit retention.

Tracing the trace: reads of trace_events themselves are not recursively traced (that would be infinite). Instead, failed integrity checks produce audit_log rows, which are the tamper-evidence anchor.

Breach scenario: if trace_events is exfiltrated, the attacker gains action-type metadata and sanitized context (no credentials, no raw PII per schema). GDPR Art. 33 notification still applies within 72 hours because user_id plus the behavioral sequence constitutes personal data. The breach pipeline per ADR-0003 applies.

Key rotation: Ed25519 subsystem signing keys are rotatable via Infisical; the key version used for each signature is recorded on the event (sig_key_version).

Kill switch: TRACE_ENABLED=0 disables all event emission without data loss. Events stop being written; existing data is unaffected. The support and user-facing trace UI gracefully degrades ("trace data unavailable for this period").

These sub-cards are in dependency order. They are defined here but NOT yet filed as GitHub issues. Kristerpher's approval on the open questions in §13 is required before filing.
| # | Scope | Size | Depends on | MVP-blocking |
|---|---|---|---|---|
| SC-1 | Schema migration: create trace_events and trace_workflows tables; SQLite-compatible (4-week path) with composite index. Include rollback migration. | S | — | Yes |
| SC-2 | Raptor middleware: emit act_* and wfl_* events for all authenticated API calls. Forward X-Workflow-ID header. No UI. | M | SC-1 | Yes |
| SC-3 | System-action emission: MQ-A scheduler and Raptor order-router emit sys_* events. Surface tag deterministic vs ai. | M | SC-1 | Yes |
| SC-4 | Render ID emission: server-side middleware stamps a rnd_* event on every Antlers API response that results in a view render. | S | SC-1 | No (post-MVP) |
| SC-5 | Hash chain implementation: each INSERT into trace_events computes hash_prev from the preceding event in the workflow chain. | M | SC-1 | No (recommended pre-GA) |
| SC-6 | Nightly integrity checker job: verify hash chains for prior 24h; write pass/fail to audit_log; trigger breach pipeline on failure. | S | SC-5 | No (required by GA) |
| SC-7 | Replay API: GET /api/admin/trace/replay/<user_id>?at=<ts> — load snapshot + apply event delta, return timeline JSON. Requires raptor-trace-admin RBAC role. | L | SC-2, SC-3 | No (post-MVP) |
| SC-8 | User-facing trace API: GET /api/me/trace + GET /api/me/trace/<workflow_id>. Filtered view (no internal detail; no broker vendor names). | M | SC-2, SC-3 | No (post-MVP) |
| SC-9 | Support co-presence: emit sup_* event when support agent opens a user's record in the Console support tool. Surface to user in their trace. | M | SC-2, SC-8 | No (post-MVP) |
| SC-10 | Replay UI in Console: scrubbing timeline, step-forward/backward, split view, "what did user see" toggle. | L | SC-7, SC-9 | No (post-launch) |
| SC-11 | Timescale migration: convert SQLite event_log to Timescale hypertable, enable compression + tiered storage policies. | M | SC-1 (SQLite path shipped) | No (triggered by volume) |
| SC-12 | Ed25519 subsystem signing: key provisioning in Infisical for MQ-A + Raptor order-router; sign sys_* events; verify in integrity checker. | M | SC-3, SC-6 | No (required post-launch) |
| SC-13 | Live↔paper trading mode transitions: emit a named sys_* event (mode.live_enabled / mode.paper_returned) on every transition. RBAC enforces support agents can only act in paper-mode by default; live-mode actions require either a supervisory-workflow approval grant (two-person rule) OR a support-live-trained credential token issued by ops. The grant + credential are themselves traceable events. | L | SC-2, SC-3, RBAC-design | No (gating ship of any support-on-live-account feature) |
| SC-14 | DPIA scoping: produce a Data Protection Impact Assessment per GDPR Art. 35 covering the trace architecture as systematic processing of personal data. Inputs: processing categories, legal basis, retention policy, data subjects' rights (Arts. 12–22), risk register, mitigations. Output: DPIA doc at docs/legal/dpia/trace-architecture-2026.md + executive summary for attorney review. Per Kristerpher 2026-04-29: scope this now to be ahead of the curve. | M | — | Yes (required by GA) |
| SC-15 | Admin-access user notification: email the user when an admin views their trace ("A Raxx admin reviewed your account activity at 14:32 UTC."). Bundle into the regular account-change email cadence so it doesn't feel alarming. Include a 2-question micro-survey ("was this expected?" Y/N + free-text). Survey responses log as user-action events. | M | SC-2, email infra | No (required for trust-signal claim before GA) |
Total: 15 sub-cards. Expected implementation timeline: SC-1 through SC-3 + SC-14 in MVP sprint; SC-4 through SC-9 + SC-13 + SC-15 in the following 6–8 weeks; SC-10 through SC-12 post-launch.
Kristerpher accepted the V3 deferral (ADR-0022) but asked for an LOE estimate for migrating later. Rough budget:
| Task | Scope | Effort |
|---|---|---|
| Add Ed25519 keypair material to all subsystem signers (Raptor, MQ-A, support tooling) — provision via Infisical, key rotation procedure | Schema + ops runbook | 3–5 days |
| Backfill signature column on retroactive sys_* events. Either: (a) leave historical events unsigned with an explicit pre_v3 flag, or (b) re-sign with a "migration anchor" signature so the chain is verifiable end-to-end. (b) is more rigorous; (a) is faster | Migration + audit doc | 2–4 days |
| Update integrity checker (SC-6) to verify both the hash chain (already present) AND the per-event signatures | App code + tests | 2 days |
| Update breach pipeline runbooks (the "what to do when a checker fails" SOP) to distinguish hash-chain breach from signature breach | Runbook + drill | 1–2 days |
| Total | — | 8–13 developer-days spread across one sprint |
This is small enough to bundle into a single post-launch sprint. The investment is worth it once Raxx hosts real customer trade data — at that point, dispute-resolution and regulatory inquiries become realistic and a tamper-evident chain materially shifts the recordkeeping posture.
These questions block or shape sub-cards. Sub-cards will not be filed until Kristerpher answers them.
Storage path decision: 4-week SQLite MVP or go straight to Timescale? The design supports both, but the migration adds a sprint. If Raptor is already on Heroku Postgres (not SQLite), the Timescale add-on is a one-line schema change with no migration needed. Which database is Raptor using for its primary store today?
Render ID granularity. ADR-0023 proposes one Render ID per page load / SSE push (not per component). Is that the right level of detail for MVP, or does Kristerpher want component-level or field-level granularity now? Component-level is ~10× more events and significantly increases SC-4's complexity.
User-facing trace UI placement. Where in Antlers does the user see their own timeline? Options: (a) a "My Activity" page under account settings, (b) contextual — each workflow surface has an activity drawer, (c) both. This is a UX call that shapes how SC-8 structures its API response.
Support agent role naming. The RBAC role raptor-trace-support needs to compose into the raxx-support-team group defined in rbac-design.md. Should support agents be able to see all users' traces or only users they are actively supporting (scoped by an active support ticket)? The scoped variant is more private; the unscoped variant is simpler to implement and still fully audited.
Admin trace access: proactive notification or pull-only? Currently designed as pull (user sees the event when they check their timeline). Should Raxx proactively email the user when admin accesses their trace? ("A Raxx admin reviewed your account activity at 14:32 UTC today.") This is a strong trust signal but adds email infrastructure and may feel alarming to users.
Retention for render events. Render events may be relatively high volume (every page load). The design retains them for 2 years. Is 2 years the right balance between support utility and storage cost? An alternative: 90-day hot, then delete (rather than cold-tier).
Paper-first gate decision in the trace. The design requires the paper-gate evaluation result to be a named sys_* event. If a user has an explicit bypass override, that override is also traced. Should the override also trigger a user-visible notification ("live trading was enabled without paper-gate for this workflow")?
DPIA timing. The trace architecture collects behavioral event sequences, which constitutes systematic processing of personal data. A Data Protection Impact Assessment may be required under GDPR Art. 35 before GA if the volume crosses "large scale" thresholds. Should this be scoped as a sub-card now, or treated as a post-GA task?
Kristerpher reviewed the design (PR #498) and resolved every open question in §13 plus the ADRs:
| # | Question | Decision | Source |
|---|---|---|---|
| Q1 | Storage path | Postgres + TimescaleDB (Heroku Postgres + Timescale extension assumed) | comment 02:18 UTC + ADR-0021 |
| Q2 | Render-ID granularity | Per-view at MVP; per-component deferred. ADR-0023 LGTM | ADR-0023 review |
| Q3 | User-facing trace UI placement | Default to (c) — both "My Activity" page + per-workflow drawer | architect interpretation |
| Q4 | Support agent scope | Scoped — only see users actively supported via ticket. Anti-spying rationale | inline 03:01 UTC #4 |
| Q5 | Admin trace access notification | Proactive email. Bundle with regular account-change cadence to feel routine. Add 2-question micro-survey | inline 03:01 UTC #5 |
| Q6 | Retention for render events | Aggressive cold-tier rollover patterned after Splunk S2 (SmartStore). 30-day hot → cold | inline 03:01 UTC #6 |
| Q7 | Paper-first gate trace | Yes — record live↔paper transitions as first-class events. Support paper-only by default. Live-trading by support requires supervisory workflow OR training-grant credential | inline 03:01 UTC #7 |
| Q8 | DPIA timing | Scope now — ahead of the curve | inline 03:01 UTC #8 |
ADR-0022 (cryptographic hash chain): Kristerpher accepted V3 deferral, leaning on Splunk-detection-engineering style flagging at V1/V2. Asked for LOE estimate to migrate to V3 — captured in §12.1 above (8–13 developer-days, post-launch sprint).
Three new sub-cards were added per the decisions:
- SC-13 — live↔paper transitions + supervisory workflow for support live-mode actions (per Q7)
- SC-14 — DPIA scoping (per Q8)
- SC-15 — admin-access notification email + 2-question survey (per Q5)
End of design doc. Sub-cards filed 2026-04-29 per §14 decisions; tracking issues linked from PR #498.