Raxx · internal docs

internal · gated ↑ index

Workflow UUID Tracing — Replay + Support Transparency Design

Status: Draft — awaiting Kristerpher's open-question answers before sub-cards are filed
Owner: software-architect
Date: 2026-04-29
Related ADRs: 0021, 0022, 0023
Related designs: rbac-design.md, session-engine.md, auth.md

1. Context

Every user workflow in Raxx — placing a trade, running a backtest, viewing positions, onboarding — is today a black box at replay time. If something goes wrong, support has only request logs and fragmented DB state. Users have no self-service window into their own history. Audit trails exist for individual mutations (per ADR-0003) but there is no way to reconstruct what a user experienced — what they saw, what they clicked, what the system did on their behalf, and what a support agent saw when they joined.

This document designs the identifier model, storage architecture, replay mechanism, support transparency layer, and tamper-resistance posture that together give every actor in the system — user, support agent, admin — a coherent, auditable, and trustworthy timeline of any workflow.

The motivation spans three dimensions Kristerpher named explicitly:

Security: detect bad actors, support forensic reconstruction, produce tamper-evident logs for regulatory and dispute resolution.
Performance + Statement of Record: non-repudiation for disputed trades, compliance posture for FINRA-adjacent recordkeeping requirements, and time-series query performance that does not degrade as history grows.
Experience: users should see nearly everything we see about their own session. Support agents should be identifiably present in a user's timeline. Admins who look at user data are visible to that user.

2. Invariants

The following constraints are non-negotiable and take precedence over any design convenience:

No stored credentials. The trace pipeline never stores passkey credential IDs, OAuth tokens, broker API keys, or any value that could replay an authentication or order. Context fields must be redacted before storage at the schema level.
Append-only event log. No UPDATE, no DELETE on trace rows. Corrections are new events that supersede prior ones. This is not a policy — migration files that add UPDATE/DELETE paths to trace tables must be rejected at review.
GDPR by default. Every trace event that contains PII (email, display name, IP prefix, session metadata) is subject to the same retention, erasure, and portability rules as the primary data store (ADR-0003). Erasure of a user account initiates pseudonymization of their trace rows within the ADR-0003 SLA.
Audit trail for every state change that touches money, permissions, or data access. Rendering user data to a support agent or admin is itself a state change that must be traced.
RBAC-composable. Support agent and admin access to trace data must resolve through the existing RBAC model (<app>-<resource>-<level> naming from rbac-design.md). New roles compose into existing groups; the trace system does not introduce a parallel permission mechanism.
Paper-first gating is visible in the trace. Any workflow that touches the live-trading code path must emit a trace event naming the gate check (passed, failed, bypassed-with-override). The gate decision is non-repudiable.
AI-augmented vs deterministic surfaces are distinguished. Per the project's deterministic-execution posture, events emitted by the AI-augmented surfaces (strategy parser, caveat engine) are tagged surface:ai. Events from the deterministic order-firing path are tagged surface:deterministic. Support and admin views must display this distinction.
Broker vendor names do not appear in user-facing trace UI. Admin and support views may show broker names; user-facing replay UI uses the Confidence Engine vocabulary (e.g., "your connected account" not "Alpaca").

3. Identifier Model

Five UUID types compose into a complete, reconstructable timeline for any workflow.

3.1 Workflow ID (`wfl_*`)

A UUID v7 (time-ordered) minted when a user begins a distinct workflow. A workflow is a discrete user intent: sign-up, place-trade, run-backtest, edit-strategy, view-positions, initiate-withdrawal.

Lifetime: starts at the first user action in the workflow; ends at explicit completion (success or failure terminal state), session expiry, or 4-hour hard ceiling. A new page navigation within the same intent does not start a new workflow — the frontend must carry the Workflow ID across navigations.

Minting: browser mints a UUID v7 on the first user-triggered action and stores it in the session context. It is forwarded in every subsequent request as X-Workflow-ID. Raptor validates the format but treats it as client-asserted; for sensitive flows (live-trade, erasure), Raptor countersigns the ID at first use so the server has a tamper-evident claim on it.

Schema tag: wfl_<uuid_v7>

3.2 Action ID (`act_*`)

A UUID v7 minted for every individual user-initiated action within a workflow (button click, form submit, API call). Parented to a Workflow ID.

Carries: workflow_id, user_id, session_id, action_type, surface_tag (ai/deterministic), ts_emitted (client UTC), ts_received (server UTC).

Schema tag: act_<uuid_v7>

3.3 System-Action ID (`sys_*`)

A UUID v7 minted when Raxx acts on the user's behalf: scheduled trade fire, cron-triggered polling, paper-gate evaluation, automatic position roll. These are not user-initiated but are caused by user configuration.

Carries: originating_workflow_id (the workflow that created the configuration that triggered this action, may be null for pure system events), originating_config_id (e.g., strategy ID), subsystem (e.g., mq-a:scheduler, raptor:paper-gate), surface_tag, ts_emitted.

Traceback rule: a system action that cannot trace back to a user configuration or a user workflow must name the subsystem and the trigger condition explicitly. No anonymous system actions.

Schema tag: sys_<uuid_v7>

3.4 Render ID (`rnd_*`)

A UUID v7 minted every time a view is rendered and delivered to any actor. This captures the exact data-state the actor saw at a given moment.

Carries: workflow_id (if the render is part of a workflow), actor_id (user, support agent, or admin), actor_role (RBAC role active at render time), view_name, data_mask_state (JSON: which fields were masked for this actor), ts_rendered.

Why Render IDs matter: when a support agent and user are looking at the same session simultaneously, each gets a separate Render ID. If a dispute arises about "what did support see," the Render ID provides a concrete anchor. See §8 for the security and GDPR implications.

Granularity decision: see ADR-0023. The proposal is per-view (one per page load / SSE push), not per-component or per-field. Per-field granularity is post-MVP.

Schema tag: rnd_<uuid_v7>

3.5 Support-Action ID (`sup_*`)

A UUID v7 minted whenever a support agent takes an action "alongside" or "on behalf of" a user. This includes: opening a user's timeline in the support tool, replaying a session, sending a message in context, adjusting account state.

Carries: support_agent_id, target_user_id, action_type (view/replay/message/adjustment), linked_workflow_id (if the support action is in the context of a specific user workflow), ts_emitted.

Privacy note for users: users can see their own support-action events. They can see the support agent's display name and the action type, but not internal support notes or agent-internal state. See §7.

3.6 Composition

A complete "place-trade" workflow timeline might look like:

wfl_01HZ... (place-trade, user U, session S)
  rnd_01HZ... (trade-entry-view rendered to user U)
  act_01HZ... (user typed symbol: "SPY")
  act_01HZ... (user submitted trade form)
  sys_01HZ... (paper-gate evaluation — PASSED)
  sys_01HZ... (order routing to broker — DISPATCHED)
  rnd_01HZ... (trade-confirmation-view rendered to user U)
  ----
  sup_01HZ... (support agent A opens this workflow in support tool — 3 days later)
  rnd_01HZ... (trade-entry-view rendered to support agent A, data_mask: {broker_account_id: masked})

All of these share workflow_id = wfl_01HZ.... Querying by Workflow ID returns the full timeline in emission order.

4. Storage Architecture

See ADR-0021 for the full decision rationale. Summary:

Recommendation: Postgres with TimescaleDB extension (Timescale).

Reasoning: - Raxx is on Heroku today with a Postgres add-on. Timescale Cloud or the Timescale Heroku add-on is a drop-in replacement, not a new infra paradigm. - Provides time-series hypertables, automatic chunk-based partitioning, tiered storage (hot → cold), and efficient time-range queries without a full Postgres table scan. - Column compression on cold chunks achieves 10–20× compression for append-only event rows with repeated user_id + workflow_id values. - EXPLAIN ANALYZE for "reconstruct user's day" queries is a standard Postgres tool; no new query language to learn. - Pre-launch budget: the Timescale free tier handles tens of millions of rows. Tiered storage to S3 for rows older than 90 days keeps hot-tier costs flat as history grows. - No vendor lock-in on schema: if we outgrow Timescale, the data is Postgres-compatible.

4-week MVP scope (if we had to ship in 4 weeks): skip Timescale, append events to an event_log table in the existing Raptor SQLite with a ts column and a composite index on (user_id, ts). No partitioning, no compression. This is sufficient for pre-launch low volume and can be migrated to Timescale in a follow-up card without losing data.

4-month production scope: Timescale Cloud, hypertables, tiered storage, retention policies configured per ADR-0003 retention schedule.

4.1 Core Schema

-- All trace tables are Timescale hypertables partitioned on ts_emitted.
-- 'id' is UUID v7 (time-ordered); no separate auto-increment column needed.

CREATE TABLE trace_events (
    id             TEXT        PRIMARY KEY,     -- act_*, sys_*, sup_* UUID v7
    event_type     TEXT        NOT NULL,        -- 'user_action' | 'system_action' | 'support_action' | 'render'
    workflow_id    TEXT,                        -- wfl_* reference (nullable for system events)
    user_id        TEXT        NOT NULL,        -- target user (never support agent's user_id)
    actor_id       TEXT        NOT NULL,        -- who caused this event
    actor_role     TEXT,                        -- RBAC role active at event time
    surface_tag    TEXT,                        -- 'ai' | 'deterministic' | 'system' | 'support'
    action_type    TEXT        NOT NULL,        -- e.g. 'trade.submit', 'backtest.run', 'support.view'
    view_name      TEXT,                        -- for render events
    data_mask_json TEXT,                        -- JSON: masked field names for render events
    context_json   TEXT,                        -- sanitized context (no credentials, no raw PII)
    subsystem      TEXT,                        -- for sys_* events
    ts_emitted     TIMESTAMPTZ NOT NULL,        -- client-reported UTC
    ts_received    TIMESTAMPTZ NOT NULL,        -- server-stamped UTC
    hash_prev      TEXT,                        -- SHA-256 of previous event in this workflow's chain
    sig            TEXT,                        -- Ed25519 signature for sys_* events (subsystem key)
    schema_version INTEGER     NOT NULL DEFAULT 1
);

-- Hypertable: chunk by week. Compress chunks older than 90 days.
-- SELECT create_hypertable('trace_events', 'ts_emitted', chunk_time_interval => INTERVAL '1 week');

CREATE TABLE trace_workflows (
    id             TEXT        PRIMARY KEY,     -- wfl_* UUID v7
    user_id        TEXT        NOT NULL,
    session_id     TEXT        NOT NULL,
    workflow_type  TEXT        NOT NULL,        -- 'place-trade' | 'run-backtest' | 'sign-up' | etc.
    state          TEXT        NOT NULL,        -- 'active' | 'completed' | 'failed' | 'expired'
    ts_started     TIMESTAMPTZ NOT NULL,
    ts_ended       TIMESTAMPTZ,
    schema_version INTEGER     NOT NULL DEFAULT 1
);

-- Indexes (Timescale adds time-range indexes automatically per chunk)
CREATE INDEX idx_te_user_ts      ON trace_events(user_id, ts_emitted DESC);
CREATE INDEX idx_te_workflow     ON trace_events(workflow_id, ts_emitted DESC);
CREATE INDEX idx_te_actor        ON trace_events(actor_id, ts_emitted DESC);
CREATE INDEX idx_tw_user_ts      ON trace_workflows(user_id, ts_started DESC);

What is deliberately absent: email addresses, raw passkey credential IDs, broker API keys, session token values, full IP addresses (IP prefix only, per ADR-0003), raw request bodies.

4.2 Retention

Per Kristerpher's 2026-04-29 direction, the retention model leans toward aggressive cold-tier rollover patterned after Splunk's S2 (SmartStore) architecture: short hot tier, frequent migrations of older chunks to S3-backed cold storage, queryable on demand via Timescale's tiered storage hooks. This minimizes hot-tier storage cost without losing the full audit trail.

Data	Hot tier (TimescaleDB chunks, fast queries)	Cold tier (S3 + tiered_storage policy, queryable)	Deletion
Render events	30 days hot → cold (was 90; tightened per SmartStore-style)	2 years cold	Pseudonymized on DSR; deleted at retention ceiling
User-action events	30 days hot → cold	7 years cold (Statement of Record for trade-affecting events)	Pseudonymized on DSR
System-action events	30 days hot → cold	7 years cold	Retained (subsystem, not PII)
Support-action events	30 days hot → cold	7 years cold	Pseudonymized on DSR
Workflow metadata	30 days hot → cold	7 years cold	Pseudonymized on DSR

Trade-affecting events (any action_type prefixed trade.*, order.*, position.*) retain 7 years per ADR-0003's financial-adjacent rationale.

Splunk SmartStore reference: S2 separates indexer compute from object storage so old data is fetched on demand rather than kept warm. Translate to Timescale: chunk_time_interval = 1 day; tiered_storage policy moves chunks > 30 days old to S3; queries that span the boundary are transparent (Timescale does the fetch). Cost driver becomes egress on the rare cold-tier query, not hot-tier disk.

5. Replay Mechanism

5.1 Approach: Hybrid Snapshots + Event Delta

Full event-sourcing (replay every event from the beginning) is correct but impractical for sessions that span years. Pure snapshots (point-in-time DB copy) are fast but storage-intensive and miss the events between snapshots.

The hybrid approach: - Daily materialized snapshot of relevant user state (positions, strategies, settings, account tier) is stored as a JSON blob keyed to (user_id, snapshot_date). Snapshots are computed nightly by a background job. - Event delta from the snapshot forward is the trace_events stream. Replay = load the closest daily snapshot before the target timestamp, then apply events in order up to the target timestamp.

For support/admin queries ("what was this user's state at 2026-04-15T14:30Z?"), the worst-case event delta is at most one day's worth of events per user (~20–50 events at MVP volume). This is a cheap scan over the hypertable.

5.2 Replay API

GET /api/admin/trace/replay/<user_id>?at=<iso_timestamp>

Authorization: requires raptor-trace-admin role (see §6.1).

Response:

{
  "user_id": "...",
  "replayed_at": "2026-04-15T14:30:00Z",
  "snapshot_base": "2026-04-15T00:00:00Z",
  "events_applied": 23,
  "state": {
    "active_workflows": [...],
    "last_view": { "view_name": "...", "render_id": "rnd_..." },
    "last_actions": [...]
  },
  "timeline": [
    { "id": "act_...", "action_type": "trade.submit", "ts_emitted": "...", ... },
    ...
  ]
}

Edge case — stale market data: the timeline accurately represents what the user saw and did at the time. The response includes a market_data_frozen_at timestamp indicating that market prices in the timeline reflect the data state at replayed_at, not current prices. The UI must display this clearly; the replay is a historical record, not a live simulation of "what would happen now."

5.3 Replay UI (Support Tool)

The replay surface lives in the Console (support tool section). It presents:

A scrubbing timeline (horizontal, time-ordered) of all events in the selected workflow or time range.
Each event type has a distinct visual: user-action (solid), system-action (outlined), render (hatched), support-action (colored differently to distinguish "we were present").
A "step backward / forward" button pair that advances one event at a time, rendering a text summary of the event and the resulting state delta.
A split view: left panel is the user's state at the selected event; right panel is the event detail (action type, surface tag, context fields visible to the actor's role).
A "what did user see" toggle that masks data fields the user would not have seen (e.g., internal system-action detail) so support can experience the session from the user's perspective.

The replay UI itself generates Render ID events when a support agent or admin opens it, ensuring the user can later see that their session was reviewed.

6. Support Transparency Model

6.1 Access Tiers

Actor	What they see	RBAC role
User (self)	All events on their own timeline; support/admin presence events (actor display name, action type, timestamp); render events for their own views; system-action summaries (outcome, not internal detail)	`antlers-trace-self`
Support agent	Same as user view, plus internal system-action detail, context_json (with PII masking per RBAC policy). No raw broker credentials.	`raptor-trace-support`
Admin (Kristerpher)	Full event detail, unmasked context_json (minus stored-credential fields which are never stored), all actors, all workflows, cross-user queries.	`raptor-trace-admin`

New RBAC roles compose into existing groups per rbac-design.md: - antlers-trace-self is granted to all antlers-user group members (every authenticated user sees their own trace). - raptor-trace-support is added to the raxx-support-team group. - raptor-trace-admin is added to the raxx-platform-admins group.

6.2 User-Visible Timeline

Users see their own trace at GET /api/me/trace (list of workflows) and GET /api/me/trace/<workflow_id> (full timeline for one workflow). This surface is part of Antlers, not the Console.

The user-facing timeline: - Shows "a support agent viewed your session" events with the agent's display name and time. - Shows "an admin reviewed your account" events with the admin's display name and time. - Does NOT show internal system-action detail, internal context_json, or support-agent private notes. - Vendor names are not shown to users per invariant §2.8.

This feature is structural to trust. It is not a stretch goal. It ships with the first trace rollout.

6.3 Support Co-Presence

When a support agent opens a user's live session in the support tool, a sup_* event is emitted immediately. The user sees this in their timeline in near-real-time (next page load or SSE push). This creates a clear "support was here" marker that is auditable and user-visible.

If the support agent takes any account-state-changing action (e.g., resets a setting, extends a trial), a separate sup_* event with action_type: support.adjustment is emitted. The user sees both the presence and the action.

6.4 Admin Visibility to Users

When an admin queries any user's trace data via the admin API or replay UI, a Render ID event is written with actor_id: <admin_id> and actor_role: raptor-trace-admin. This event appears in the user's own trace timeline: "Raxx admin reviewed your account at 14:32 UTC."

Users can see who (by display name) reviewed their data and when. They cannot see what the admin queried. This is a privacy feature: users have a complete picture of who within Raxx has accessed their data.

7. Tamper Resistance + Integrity

7.1 Append-Only Enforcement

Database-level: no UPDATE or DELETE grants on the trace_events table for any application user. The application DB user (raptor_app) has INSERT and SELECT only. DDL for the trace tables is managed by migration scripts run by a separate privileged user (raptor_migrations), never by raptor_app at runtime.

Migration files that attempt to add UPDATE/DELETE access to trace tables must be rejected in PR review. A CI lint step flags any migration that contains GRANT UPDATE or GRANT DELETE against trace_events or trace_workflows.

7.2 Hash Chain

Each event in a workflow's trace stores hash_prev: the SHA-256 of the serialized previous event (all columns, canonical JSON order) in that workflow's chain.

hash_prev[n] = SHA-256(canonical_json(event[n-1]))

The first event in a workflow stores hash_prev = SHA-256("genesis:" + workflow_id).

Verification: an integrity checker can reconstruct the hash chain for any workflow by fetching events in ts_emitted order and verifying each hash_prev. A gap (missing event) or modification (altered column) breaks the chain at that position.

A nightly background job (jobs/trace_integrity_check.py) runs the chain verification for all workflows with events in the last 24 hours and writes a pass/fail row to audit_log. Any failure is a severity:critical event that triggers the breach-notification pipeline from ADR-0003.

7.3 System-Action Signatures

System-action events (sys_*) carry an Ed25519 signature over the canonical JSON of the event payload. Each subsystem (MQ-A scheduler, Raptor paper-gate, Raptor order-router) has its own signing key stored in the secret store (Infisical), rotatable without service redeploy.

sig = Ed25519Sign(subsystem_private_key, canonical_json(event_payload))

A fake sys_* event inserted without a valid signature (or with a revoked key) fails verification in the chain checker. This prevents insertion of fabricated system actions (e.g., "the system fired a trade the user didn't authorize").

7.4 Pre-Launch vs Post-Launch Posture

Control	Pre-launch (4-week MVP)	Post-launch (production)
Append-only DB grant	Yes (required)	Yes
Hash chain	Recommended; ship if < 1 week effort	Required
Ed25519 signatures on sys_*	Optional; seed the key infrastructure	Required
Nightly integrity checker	Basic row-count audit	Full chain verification
Cold storage + tiered retention	No; all hot	Timescale tiered storage
GDPR pseudonymization job	Stub; manual for now	Automated (retention job)

8. Three POVs

A. Security POV

Why tracing matters for security:

Detect bad actors. When a session's action sequence is anomalous — many rapid trade submissions, unusual view patterns, a known-bad IP prefix — the event stream is queryable in near-real-time. Alerting rules over trace_events can fire on statistical deviations without reading application-layer code.
Forensic reconstruction. After a reported incident ("I didn't place those trades"), the event stream provides an ordered reconstruction of every action, every render, and every system action in the user's session. The hash chain makes it impossible to silently remove events.
Tamper-evident audit for regulatory + dispute scenarios. If a user disputes an order, the chain-verified event stream + Ed25519 signatures on system actions constitute a non-repudiation record. Raxx's position in a dispute is backed by cryptographic evidence, not a mutable log.
Insider-threat detection. Admin and support access to trace data is itself traced (Render IDs). If an admin is querying user timelines at unusual hours or at an unusual rate, that access pattern is itself visible in the trace_events table and queryable. The system watches the watchers.
Account compromise detection. A Workflow ID that spans multiple IP prefixes within minutes, or that contains action sequences inconsistent with the user's historical patterns, is an anomaly signal. The trace store provides the historical baseline.

B. Performance + Statement of Record POV

Non-repudiation: a user who claims "I never placed that trade" is answered by the trace_events chain: the act_* event with action_type: trade.submit, the user's session ID, and the sys_* event from the order router, all with verified hash chain integrity. The event chain is the Statement of Record.

Compliance posture: FINRA Rule 17a-4 and Rule 4511 require broker-dealers to retain records in non-rewritable, non-erasable format ("WORM"). Raxx is not a broker-dealer, but Raxx's users route orders through a broker-dealer. Aligning the trace architecture with WORM semantics (append-only + hash chain) positions Raxx to demonstrate good recordkeeping hygiene if ever scrutinized. It also makes a future broker-dealer license application less of a retrofit.

Query performance: a "reconstruct user's day" query on the hypertable hits one or two weekly chunks maximum (indexed on user_id, ts_emitted). At 50 events/session × 5 sessions/day × 1,000 users, this is 250,000 rows/day — trivial for Timescale. The daily snapshot reduces the replay-path scan to at most one day's delta (~250 rows at MVP volume per user).

Cost: Timescale compression on 90-day-old chunks achieves ~10× compression on repetitive event rows. A user's 7-year trade-affecting history at 50 events/day compresses to under 1 MB. Cold storage tier (S3) for data beyond 90 days is cents/GB/month.

C. Experience POV

User self-audit. A user who asks "what did I do yesterday at 14:00 UTC?" gets a timeline view in Antlers: a list of their workflows, the actions within each, and the system actions taken on their behalf. This is not a debugging tool — it is a confidence surface. Users who understand their own history are less likely to dispute well-executed trades and more likely to trust the platform.

Support transparency. When a support agent joins a session, the user sees it in their timeline: "Support agent [name] reviewed your session at 14:32 UTC." This is a forcing function for support quality — agents know their presence is visible. It also eliminates the ambiguity of "did support have access to this data?" — the answer is always verifiable by the user.

Replay for debugging. A user who reports "something went wrong 5 minutes before this trade" can hand support a Workflow ID. Support replays the session to exactly that point, seeing what the user saw (with the "what did user see" toggle). The handoff from "user description" to "support investigation" shrinks from minutes to seconds.

Admin oversight as a privacy feature. Users can see when Raxx staff accessed their data. This converts a liability ("Raxx can read my data without my knowledge") into a feature ("Raxx tells me every time staff looks at my account"). This is structurally important to the trust model.

Cross-actor synchronized view. When a support agent opens a user's live session, both are looking at a state anchored to the same Render IDs. If the support agent says "I see your backtest result as X," the Render ID confirms that the agent actually rendered that view, preventing "I was looking at a different screen" confusion.

9. Migrations

9.1 New Tables

Migration file: migrations/0XXX_trace_tables.sql

Additive only. Creates trace_events and trace_workflows. No changes to existing tables.

Rollback: drop both tables. No existing data is affected.

9.2 Timescale Upgrade Path

If starting with SQLite (4-week MVP): 1. Create event_log table in SQLite with identical schema minus Timescale-specific commands. 2. Add composite index on (user_id, ts_emitted). 3. When migrating to Timescale: pg_dump + restore, then SELECT create_hypertable(...) on the existing table. Data is preserved; partitioning is applied retroactively.

If starting with Timescale: 1. Add Timescale extension to Postgres. 2. Run create_hypertable as part of migration. 3. Set compression policy: add_compression_policy('trace_events', INTERVAL '90 days'). 4. Set retention/tiering policy per §4.2.

9.3 Hash Chain Backfill

Existing audit_log rows have no hash chain. The new trace_events table starts the chain fresh at launch. There is no backfill of old audit rows into the hash chain; old audit rows remain in audit_log as-is. The chain's integrity guarantee applies only to events from the chain's genesis forward.

10. Rollout Plan

Phase	Gate	Description
Dark	Internal	`trace_events` table exists; no events emitted. Migration tested on staging.
Flag-on (staging)	`TRACE_ENABLED=1`	Raptor emits action + system events. No render events. No user-facing UI. Integrity checker runs nightly.
Beta (prod, internal)	Kristerpher explicit OK	Admin replay API live. Support tool shows Workflow IDs in user records. User-facing timeline not yet visible.
Beta 2 (user-facing)	Support timeline UX reviewed	User can see `GET /api/me/trace`. Support co-presence events visible to users.
GA	Soak 30 days in Beta 2 with no integrity failures	Hash chain enabled for all new events. Ed25519 signatures on sys_* events. Nightly chain verifier runs in prod.
Post-GA	Volume trigger	Timescale tiered storage enabled. Cold-storage policy active. DPIA completed for Statement-of-Record posture.

PII collected: user_id (internal UUID), actor_id, actor_role, view_name, action_type, context_json (sanitized — no email, no credential material). ts_emitted is metadata, not PII. IP prefix (max /24) is stored per ADR-0003.
Retention: see §4.2. Trade-affecting events retained 7 years; render events 2 years; operational events 90 days hot + cold-tiered beyond.
DSR erasure: on POST /api/gdpr/erase, the retention job pseudonymizes user_id and actor_id in trace_events within 30 days (replacing with sha256(user_id || per-user-salt)). Events are not deleted; they are pseudonymized. The hash chain is preserved (the chain over pseudonymized data is still valid). The pseudonymization salt is destroyed at the end of the 2-year audit retention.
Audit trail for the audit trail: writes to trace_events themselves are not recursively traced (that would be infinite). Instead, failed integrity checks produce audit_log rows, which are the tamper-evidence anchor.
Stored credentials: no credential fields in the trace schema. CI grep extends to trace migration files.
Breach: if trace_events is exfiltrated, the attacker gains action-type metadata and sanitized context (no credentials, no raw PII per schema). GDPR Art. 33 notification still applies within 72 hours because user_id + behavioral sequence constitutes personal data. Breach pipeline per ADR-0003 applies.
Secrets: subsystem Ed25519 private keys in Infisical. Rotatable without redeploy. Key rotation produces a new key version; old signatures remain verifiable with the old public key (public key is stored alongside the event as sig_key_version).
Kill-switch: TRACE_ENABLED=0 disables all event emission without data loss. Events stop being written; existing data is unaffected. The support and user-facing trace UI gracefully degrades ("trace data unavailable for this period").

12. Sub-Cards for Feature-Developer

These cards are in dependency order. They are defined here but NOT yet filed as GitHub issues. Kristerpher's approval on the open questions in §13 is required before filing.

#	Scope	Size	Depends on	MVP-blocking
SC-1	Schema migration: create `trace_events` and `trace_workflows` tables; SQLite-compatible (4-week path) with composite index. Include rollback migration.	S	—	Yes
SC-2	Raptor middleware: emit `act_` and `wfl_` events for all authenticated API calls. Forward `X-Workflow-ID` header. No UI.	M	SC-1	Yes
SC-3	System-action emission: MQ-A scheduler and Raptor order-router emit `sys_*` events. Surface tag `deterministic` vs `ai`.	M	SC-1	Yes
SC-4	Render ID emission: server-side middleware stamps a `rnd_*` event on every Antlers API response that results in a view render.	S	SC-1	No (post-MVP)
SC-5	Hash chain implementation: each INSERT into `trace_events` computes `hash_prev` from the preceding event in the workflow chain.	M	SC-1	No (recommended pre-GA)
SC-6	Nightly integrity checker job: verify hash chains for prior 24h; write pass/fail to `audit_log`; trigger breach pipeline on failure.	S	SC-5	No (required by GA)
SC-7	Replay API: `GET /api/admin/trace/replay/<user_id>?at=<ts>` — load snapshot + apply event delta, return timeline JSON. Requires `raptor-trace-admin` RBAC role.	L	SC-2, SC-3	No (post-MVP)
SC-8	User-facing trace API: `GET /api/me/trace` + `GET /api/me/trace/<workflow_id>`. Filtered view (no internal detail; no broker vendor names).	M	SC-2, SC-3	No (post-MVP)
SC-9	Support co-presence: emit `sup_*` event when support agent opens a user's record in the Console support tool. Surface to user in their trace.	M	SC-2, SC-8	No (post-MVP)
SC-10	Replay UI in Console: scrubbing timeline, step-forward/backward, split view, "what did user see" toggle.	L	SC-7, SC-9	No (post-launch)
SC-11	Timescale migration: convert SQLite `event_log` to Timescale hypertable, enable compression + tiered storage policies.	M	SC-1 (SQLite path shipped)	No (triggered by volume)
SC-12	Ed25519 subsystem signing: key provisioning in Infisical for MQ-A + Raptor order-router; sign `sys_*` events; verify in integrity checker.	M	SC-3, SC-6	No (required post-launch)
SC-13	Live↔paper trading mode transitions: emit a named `sys_*` event (`mode.live_enabled` / `mode.paper_returned`) on every transition. RBAC enforces support agents can only act in paper-mode by default; live-mode actions require either a supervisory-workflow approval grant (two-person rule) OR a `support-live-trained` credential token issued by ops. The grant + credential are themselves traceable events.	L	SC-2, SC-3, RBAC-design	No (gating ship of any support-on-live-account feature)
SC-14	DPIA scoping: produce a Data Protection Impact Assessment per GDPR Art. 35 covering the trace architecture as systematic processing of personal data. Inputs: processing categories, legal basis, retention policy, data subjects' rights (Arts. 12–22), risk register, mitigations. Output: DPIA doc at `docs/legal/dpia/trace-architecture-2026.md` + executive summary for attorney review. Per Kristerpher 2026-04-29: scope this now to be ahead of the curve.	M	—	Yes (required by GA)
SC-15	Admin-access user notification: email the user when an admin views their trace ("A Raxx admin reviewed your account activity at 14:32 UTC."). Bundle into the regular account-change email cadence so it doesn't feel alarming. Include a 2-question micro-survey ("was this expected?" Y/N + free-text). Survey responses log as user-action events.	M	SC-2, email infra	No (required for trust-signal claim before GA)

Total: 15 sub-cards. Expected implementation timeline: SC-1 through SC-3 + SC-14 in MVP sprint; SC-4 through SC-9 + SC-13 + SC-15 in the following 6–8 weeks; SC-10 through SC-12 post-launch.

12.1 LOE estimate — V1/V2 → V3 cryptographic hash chain migration

Kristerpher accepted the V3 deferral (ADR-0022) but asked for an LOE estimate for migrating later. Rough budget:

Phase	Scope	Effort
Add Ed25519 keypair material to all subsystem signers (Raptor, MQ-A, support tooling) — provision via Infisical, key rotation procedure	Schema + ops runbook	3–5 days
Backfill `signature` column on retroactive `sys_*` events. Either: (a) leave historical events unsigned with an explicit `pre_v3` flag, or (b) re-sign with a "migration anchor" signature so the chain is verifiable end-to-end. (b) is more rigorous; (a) is faster	Migration + audit doc	2–4 days
Update integrity checker (SC-6) to verify both the hash chain (already present) AND the per-event signatures	App code + tests	2 days
Update breach pipeline runbooks (the "what to do when a checker fails" SOP) to distinguish hash-chain breach from signature breach	Runbook + drill	1–2 days
Total	—	8–13 developer-days spread across one sprint

This is small enough to bundle into a single post-launch sprint. The investment is worth it once Raxx hosts real customer trade data — at that point, dispute-resolution and regulatory inquiries become realistic and a tamper-evident chain materially shifts the recordkeeping posture.

13. Open Questions (Require Kristerpher's Decision)

These questions block or shape sub-cards. Sub-cards will not be filed until Kristerpher answers them.

Storage path decision: 4-week SQLite MVP or go straight to Timescale? The design supports both, but the migration adds a sprint. If Raptor is already on Heroku Postgres (not SQLite), the Timescale add-on is a one-line schema change with no migration needed. Which database is Raptor using for its primary store today?
Render ID granularity. ADR-0023 proposes one Render ID per page load / SSE push (not per component). Is that the right level of detail for MVP, or does Kristerpher want component-level or field-level granularity now? Component-level is ~10× more events and adds SC-4 complexity significantly.
User-facing trace UI placement. Where in Antlers does the user see their own timeline? Options: (a) a "My Activity" page under account settings, (b) contextual — each workflow surface has an activity drawer, (c) both. This is a UX call that shapes how SC-8 structures its API response.
Support agent role naming. The RBAC role raptor-trace-support needs to compose into the raxx-support-team group defined in rbac-design.md. Should support agents be able to see all users' traces or only users they are actively supporting (scoped by an active support ticket)? The scoped variant is more private; the unscoped variant is simpler to implement and still fully audited.
Admin trace access: proactive notification or pull-only? Currently designed as pull (user sees the event when they check their timeline). Should Raxx proactively email the user when admin accesses their trace? ("A Raxx admin reviewed your account activity at 14:32 UTC today.") This is a strong trust signal but adds email infrastructure and may feel alarming to users.
Retention for render events. Render events may be relatively high volume (every page load). The design retains them for 2 years. Is 2 years the right balance between support utility and storage cost? An alternative: 90-day hot, then delete (rather than cold-tier).
Paper-first gate decision in the trace. The design requires the paper-gate evaluation result to be a named sys_* event. If a user has an explicit bypass override, that override is also traced. Should the override also trigger a user-visible notification ("live trading was enabled without paper-gate for this workflow")?
DPIA timing. The trace architecture collects behavioral event sequences, which constitutes systematic processing of personal data. A Data Protection Impact Assessment may be required under GDPR Art. 35 before GA if the volume crosses "large scale" thresholds. Should this be scoped as a sub-card now, or treated as a post-GA task?

14. Decision Log — 2026-04-29

Kristerpher reviewed the design (PR #498) and resolved every open question in §13 plus the ADRs:

#	Question	Decision	Source
Q1	Storage path	Postgres + TimescaleDB (Heroku Postgres + Timescale extension assumed)	comment 02:18 UTC + ADR-0021
Q2	Render-ID granularity	Per-view at MVP; per-component deferred. ADR-0023 LGTM	ADR-0023 review
Q3	User-facing trace UI placement	Default to (c) — both "My Activity" page + per-workflow drawer	architect interpretation
Q4	Support agent scope	Scoped — only see users actively supported via ticket. Anti-spying rationale	inline 03:01 UTC #4
Q5	Admin trace access notification	Proactive email. Bundle with regular account-change cadence to feel routine. Add 2-question micro-survey	inline 03:01 UTC #5
Q6	Retention for render events	Aggressive cold-tier rollover patterned after Splunk S2 (SmartStore). 30-day hot → cold	inline 03:01 UTC #6
Q7	Paper-first gate trace	Yes — record live↔paper transitions as first-class events. Support paper-only by default. Live-trading by support requires supervisory workflow OR training-grant credential	inline 03:01 UTC #7
Q8	DPIA timing	Scope now — ahead of the curve	inline 03:01 UTC #8

ADR-0022 (cryptographic hash chain): Kristerpher accepted V3 deferral, leaning on Splunk-detection-engineering style flagging at V1/V2. Asked for LOE estimate to migrate to V3 — captured in §12.1 above (8–13 developer-days, post-launch sprint).

Three new sub-cards added per the decisions: - SC-13 — live↔paper transitions + supervisory workflow for support live-mode actions (per Q7) - SC-14 — DPIA scoping (per Q8) - SC-15 — admin-access notification email + 2-question survey (per Q5)

End of design doc. Sub-cards filed 2026-04-29 per §14 decisions; tracking issues linked from PR #498.

Auto-generated from docs/ in raxx-app/TradeMasterAPI. Gated behind Cloudflare Access. Re-deployed on every push to main.