internal · gated

Customer-Facing Error Code Traceability — Audit and Recommendation

Status: Decision pending
Owner: software-architect
Date: 2026-05-20 UTC
Related ADR: 0104
Related designs: workflow-uuid-tracing.md, support-raxx-app.md
Parent card: #2619 (SC-D12 troubleshooting.md)


1. Context

T-3 days to v1 launch (2026-05-23 UTC). When a customer hits an error — in the app, on the marketing site, or in the demo flow — the current system gives them either a raw HTTP status code, a generic "Something went wrong" string, or occasionally a raw Python exception message that may include internal details (including vendor names). None of these are quotable to support@raxx.app.

This document audits the current state across all customer-facing surfaces, documents the gaps, and recommends a traceability scheme that is ready for v1 and survives into GA.


2. Invariants

The following non-negotiable constraints apply to this design:

  1. No stored credentials. Error codes and trace IDs must never encode or expose credential material, session tokens, or API keys.
  2. No vendor names in customer-facing copy. Error messages shown to users must not name Alpaca, SnapTrade, Alpha Vantage, or any other third-party service. This applies to error toasts, error boundaries, JSON message fields in API responses that bubble to the UI, and any support-quoted code.
  3. Audit trail for every state change that affects money, permissions, or data access. Error events that terminate a trade workflow are state changes; they must be traceable server-side.
  4. GDPR by default. Error context logged for support lookups must not contain raw PII beyond what is already retained per ADR-0003. Error codes themselves are not PII; trace IDs reference records that are.
  5. Error codes are rotatable artifacts. The code-to-description mapping lives in a config file and is not hardcoded in templates. Support tooling reads the same mapping. The mapping must survive support-agent turnover without losing context.

3. Audit Findings by Surface

3.1 raxx.app (Antlers — authenticated app)

Error Boundary (ErrorBoundary.js)

PageStateCard (PageStateCard.js)

TradeForm (TradeForm.js)

TradingModeModal / TradingModeToggle

Header / Dashboard — broker connection status

Settings page (Settings.js, line 550)

Backtesting page (Backtesting.js)

HistoricalData API client (historicalDataAPI.js)

3.2 getraxx.com (marketing — CF-Access-gated)

WaitlistSection (WaitlistSection.js)

3.3 demo.raxx.app (Demo flow)

3.4 Backend API — systematic raw-exception passthrough

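The pattern return jsonify({"error": str(e)}), 500 appears approximately 100 times across route files. Each of these sites returns the raw exception text to the client, where it can expose internal details, including vendor names (see G2 in section 4).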

Files with highest raw-exception count: historical_data.py (13 instances), backtest.py (13 instances), trading.py (7 instances), market_data.py (6 instances).

One confirmed vendor name leak: trading.py line 136 returns the literal string "Live trading mode requires valid trading credentials (ALPACA_API_KEY and ALPACA_API_SECRET)." with specific env var names. This is rendered in the mode-switch modal.

3.5 Sentry integration — current state

ErrorBoundary.js calls Sentry.captureException(error) when the Sentry SDK is initialized. The Sentry event ID is available (Sentry.lastEventId()) but is not surfaced to the user anywhere. The backend error handlers do not tag request_id on Sentry events.

3.6 X-Request-ID — current state

The logging middleware (logging.py) mints a UUID request_id for every request and sets it on g.request_id. The error handlers include "request_id": _request_id() in the JSON response. The response header X-Request-ID is set. The trace middleware also propagates X-Workflow-ID when flags are enabled.

However:

  - No frontend component reads X-Request-ID or request_id from error responses to show the user.
  - The request_id in the error JSON is invisible because frontend error renderers show only error.message or data.message, not data.request_id.
  - X-Workflow-ID is propagated via response header but never displayed.


4. Gap Summary

| Gap | Surface | Severity | Pre-launch-blocking |
| --- | --- | --- | --- |
| G1: ErrorBoundary leaks raw exception + component stack in production HTML | raxx.app | High | Yes |
| G2: str(e) passthrough: ~100 backend error returns expose internal exception text including vendor names | Raptor API | High | Yes |
| G3: No error code or trace ID surfaced to customer on any error | All surfaces | High | Yes |
| G4: alpaca_api_message / alpaca_api_status key name can carry broker exception text to rendered UI | raxx.app (Header, Dashboard, Settings) | Medium | Yes |
| G5: No error code shown in TradeForm (money-affecting flow) | raxx.app | High | Yes |
| G6: Sentry event ID available but not shown to customer | raxx.app | Medium | No (post-launch) |
| G7: WaitlistSection generic error (no code, no email hint) | getraxx.com | Low | No |
| G8: Demo flow errors have no quotable anchor | demo.raxx.app | Low | No |
| G9: request_id in error JSON is never rendered by any frontend component | All surfaces | Medium | Yes (infra exists, just not wired) |

Pre-launch-blocking gaps: G1, G2, G3, G4, G5, G9 (6 gaps).


5. Option Analysis

Option A — Surface existing workflow UUID directly

Surface g.trace_workflow_id (format: wfl_<32-hex>) in error responses and show it in the error UI. Support looks it up in trace_workflows.

Pros: no new infrastructure; existing trace_middleware already mints and stores workflow IDs.

Cons: wfl_a3f91b2c4d5e6f70... is 36 characters — not human-quotable over a support email. Requires trace middleware to be flag-enabled and database-backed, which is controlled by FLAG_WORKFLOW_TRACE_SCHEMA and FLAG_TRACE_MIDDLEWARE. At v1 launch, these flags may or may not be on.

Option B — New RAX-NNN scheme

Define a registry of RAX-001 through RAX-NNN codes mapped to error classes. Store the code on the Sentry tag and in the log line. Show it in the UI.

Pros: extremely human-quotable. Familiar pattern (Stripe, Twilio, GitHub).

Cons: maintenance burden, since every new error type needs a registry entry. The registry becomes a source of truth that drifts. With roughly 100 raw exception passthrough sites, bootstrapping the registry is a sprint-sized task, not a day's work, so it cannot ship before launch.

Option C — Hybrid: short domain prefix + truncated request ID

Format: RAX-<DOMAIN>-<8-hex> where DOMAIN is a 3–4 letter surface/category code and the 8-hex is the last 8 characters of the request_id UUID already minted by logging middleware.

Examples:

  - RAX-TRD-a3f91b2c: trade domain error
  - RAX-BCK-7e4d9f01: backtest domain error
  - RAX-AUTH-c8b21e44: auth domain error
  - RAX-SYS-000000ff: generic system error

Support looks up request_id ending in a3f91b2c in the log drain (Heroku log search or Sentry). The full request_id UUID is already in every log line via logging.py. The error JSON already contains request_id.
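The lookup step can be sketched as follows. This is an illustration under stated assumptions, not the support tooling itself: the log layout (a request_id=<uuid> token on each line), the function names, and the sample log lines are all hypothetical; only the RAX-<DOMAIN>-<8-hex> format comes from this document.

```python
import re

# Format from Option C: RAX-<DOMAIN>-<8-hex>, where DOMAIN is 3-4 letters.
CODE_RE = re.compile(r"^RAX-([A-Z]{3,4})-([0-9a-f]{8})$")

# Assumed log layout: each line carries "request_id=<uuid>" somewhere in it.
REQUEST_ID_RE = re.compile(r"request_id=([0-9a-f-]{36})")


def parse_error_code(code: str) -> tuple[str, str]:
    """Split a quoted code into (domain, 8-hex suffix); raise ValueError if malformed."""
    match = CODE_RE.match(code.strip())
    if match is None:
        raise ValueError(f"not a valid reference code: {code!r}")
    return match.group(1), match.group(2)


def find_log_lines(code: str, log_lines: list[str]) -> list[str]:
    """Return every log line whose request_id ends with the code's suffix."""
    _domain, suffix = parse_error_code(code)
    hits = []
    for line in log_lines:
        found = REQUEST_ID_RE.search(line)
        if found and found.group(1).endswith(suffix):
            hits.append(line)
    return hits


# Hypothetical sample lines, used only to exercise the functions above.
SAMPLE_LOGS = [
    "level=error request_id=5d2c7a10-9b1e-4c3f-8a77-0f12a3f91b2c path=/api/trading/orders",
    "level=info request_id=0b6e4f9a-1c2d-4e5f-9a8b-7c6d7e4d9f01 path=/api/backtest/run",
]

for hit in find_log_lines("RAX-TRD-a3f91b2c", SAMPLE_LOGS):
    print(hit)
```

Validating the code before searching means a mistyped reference fails fast with a clear error instead of silently returning no results.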

Pros:

  - Human-quotable (16 to 17 characters, versus 36 for a workflow ID).
  - Leverages existing request_id infrastructure; no new DB writes.
  - Domain prefix tells support which subsystem to look in immediately.
  - The 8-hex suffix has 16^8 (about 4.3 billion) possible values, so any two given requests share a suffix with a probability of roughly 1 in 4 billion; collisions should be rare at current request volume.
  - Rollout is 1–2 days: wire the code into the error JSON and frontend display.
  - No flag dependency: request_id is always minted.
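The collision figure in the pros above can be checked with a short calculation. The sketch below assumes the last 8 hex characters of request_id are uniformly random (true for a version 4 UUID; the document does not state which UUID version logging.py uses), and the request counts in the loop are illustrative, not measured traffic.

```python
import math

SUFFIX_SPACE = 16 ** 8  # 8 hex characters -> 4,294,967,296 possible suffixes


def pair_collision_probability() -> float:
    """Chance that two specific requests share the same 8-hex suffix."""
    return 1 / SUFFIX_SPACE


def any_collision_probability(n_requests: int) -> float:
    """Birthday-bound approximation: chance that at least two of n requests collide."""
    pairs = n_requests * (n_requests - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-pairs / SUFFIX_SPACE)


# Illustrative request counts only; substitute the real retention-window volume.
for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9} requests -> {any_collision_probability(n):.6f}")
```

Because the bound grows with the square of the request count, a suffix search over a long log retention window may match more than one request. The domain prefix and the timestamp the customer reports are the natural tie-breakers.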

Cons: slightly more complex display than a pure integer code. Domain mapping adds a small maintenance surface.

Recommendation: Option C.

It is the only option implementable before the 2026-05-23 UTC launch. Options A and B require either flag-dependent infrastructure (A) or a new registry sprint (B). Option C ships with two focused sub-cards: one backend (compute and include the error_code field in error responses) and one frontend (surface it in the UI). It is also composable with Option A post-launch: once trace middleware is confirmed on, the full wfl_* ID can be appended alongside the short code.


6. Design: Option C Implementation

6.1 Domain codes

| Domain | Prefix | Covers |
| --- | --- | --- |
| auth | AUTH | WebAuthn, session, RBAC |
| trading | TRD | order placement, mode switches, positions |
| backtest | BCK | backtest run, comparison, export |
| historical data | HST | data fetch, source queries |
| market data | MKT | quote, price feed |
| onboarding | ONB | wizard, account setup |
| billing / DSR | BIL | subscriptions, erasure requests |
| system | SYS | all other / generic |
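Invariant 5 asks for the mapping to live in a config file that support tooling also reads. One way to do that for the domain table above is a small JSON document loaded by both sides. The sketch below is hypothetical: the JSON layout, the validation rules, and the function names are assumptions, and the JSON is inlined as a string only so the example is self-contained.

```python
import json

# Hypothetical config contents; in practice this would be a shared file on disk.
DOMAIN_REGISTRY_JSON = """
{
  "AUTH": "WebAuthn, session, RBAC",
  "TRD": "order placement, mode switches, positions",
  "BCK": "backtest run, comparison, export",
  "HST": "data fetch, source queries",
  "MKT": "quote, price feed",
  "ONB": "wizard, account setup",
  "BIL": "subscriptions, erasure requests",
  "SYS": "all other / generic"
}
"""


def load_domain_registry(raw: str) -> dict[str, str]:
    """Parse the registry and reject prefixes that do not fit the code format."""
    registry = json.loads(raw)
    for prefix in registry:
        if not (prefix.isalpha() and prefix.isupper() and 3 <= len(prefix) <= 4):
            raise ValueError(f"invalid domain prefix: {prefix!r}")
    if "SYS" not in registry:
        raise ValueError("registry must define the SYS fallback")
    return registry


def describe_code(code: str, registry: dict[str, str]) -> str:
    """Return the description for a code's domain prefix, falling back to SYS."""
    parts = code.split("-")
    prefix = parts[1] if len(parts) == 3 else "SYS"
    return registry.get(prefix, registry["SYS"])


registry = load_domain_registry(DOMAIN_REGISTRY_JSON)
print(describe_code("RAX-TRD-a3f91b2c", registry))
```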

6.2 Backend: error_code field in all error responses

The _request_id() helper in error_handlers.py already has the full UUID. Add a _error_code(domain) helper that returns RAX-<DOMAIN>-<last-8-hex>. The domain is derived from the request path prefix using the same prefix map used by trace_middleware._derive_action_type.
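A minimal sketch of such a helper, written as pure functions so it can be tested outside a request context. The path prefixes below are assumptions for illustration, apart from /api/trading, which appears in the sequence diagram in 6.4; the real map is the one shared with trace_middleware._derive_action_type and is not reproduced here. The explicit request_id and path parameters are also a choice made for the example, since the helper described above would read them from the request context.

```python
import uuid

# Assumed path prefixes, for illustration only.
PATH_PREFIX_TO_DOMAIN = (
    ("/api/auth", "AUTH"),
    ("/api/trading", "TRD"),
    ("/api/backtest", "BCK"),
    ("/api/historical", "HST"),
    ("/api/market", "MKT"),
    ("/api/onboarding", "ONB"),
    ("/api/billing", "BIL"),
)


def derive_domain(path: str) -> str:
    """Map a request path to a domain prefix; anything unmatched is SYS."""
    for prefix, domain in PATH_PREFIX_TO_DOMAIN:
        if path.startswith(prefix):
            return domain
    return "SYS"


def error_code(request_id: str, path: str) -> str:
    """Build RAX-<DOMAIN>-<last 8 hex characters of the request_id>."""
    suffix = request_id.replace("-", "")[-8:].lower()
    return f"RAX-{derive_domain(path)}-{suffix}"


# Usage: a fresh UUID stands in for the request_id minted by the logging middleware.
print(error_code(str(uuid.uuid4()), "/api/trading/orders"))
```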

Three groups of code paths must all include error_code: the registered error handlers, the route-level return jsonify({"error": str(e)}), 500 sites, and the Exception catch-all.

For the str(e) passthrough sites: the immediate fix is to replace {"error": str(e)} with {"error": "An unexpected error occurred.", "error_code": <code>}. This fixes both problems at once: raw exception text no longer leaks, and a quotable code appears.
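The replacement at each passthrough site can be sketched as a small helper. The helper name, the logger name, and the log message layout are hypothetical; the response keys (error, error_code, request_id) are the ones named in this document. The exception detail goes to the server-side log only, so the audit trail is kept while the client sees generic text.

```python
import logging

logger = logging.getLogger("raptor.errors")  # hypothetical logger name

GENERIC_MESSAGE = "An unexpected error occurred."


def safe_error_body(exc: Exception, error_code: str, request_id: str) -> dict:
    """Log the real exception server-side; return only generic text to the client."""
    logger.error(
        "unhandled error error_code=%s request_id=%s",
        error_code,
        request_id,
        exc_info=exc,  # attaches the traceback to the log record
    )
    return {
        "error": GENERIC_MESSAGE,
        "error_code": error_code,
        "request_id": request_id,
    }


# Usage: the exception text never reaches the returned body.
try:
    raise RuntimeError("internal detail that must not reach the client")
except RuntimeError as err:
    body = safe_error_body(err, "RAX-TRD-a3f91b2c", "5d2c7a10-9b1e-4c3f-8a77-0f12a3f91b2c")
    print(body)
```

In a Flask route the call site would then read return jsonify(safe_error_body(e, code, rid)), 500 in place of the current str(e) return.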

For the vendor name leak at trading.py:136: replace with "Your trading account could not be connected. Check your credentials in Settings.".

For alpaca_api_status / alpaca_api_message response keys: rename to broker_connection_status / broker_connection_message and ensure the message value is sanitized through the same redaction list used by logging.py (_SENSITIVE_PATTERNS). A separate broker-name redaction pass (strip known vendor names) should run on this field before it is serialized.
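The key rename and the vendor-name pass could look roughly like the sketch below. The vendor list is taken from invariant 2; the replacement word, the regex, and the function names are assumptions. The _SENSITIVE_PATTERNS redaction from logging.py is not reproduced here and would run in addition to this pass.

```python
import re

# Matches the vendors named in invariant 2, case-insensitively, including
# spaced, underscored, and hyphenated spellings of the two-word name.
VENDOR_RE = re.compile(r"alpaca|snaptrade|alpha[\s_-]?vantage", re.IGNORECASE)

KEY_RENAMES = {
    "alpaca_api_status": "broker_connection_status",
    "alpaca_api_message": "broker_connection_message",
}


def redact_vendor_names(text: str) -> str:
    """Replace any known vendor name with a neutral word."""
    return VENDOR_RE.sub("provider", text)


def sanitize_status_payload(payload: dict) -> dict:
    """Rename vendor-specific keys and redact vendor names in the message field."""
    out = {}
    for key, value in payload.items():
        new_key = KEY_RENAMES.get(key, key)
        if new_key == "broker_connection_message" and isinstance(value, str):
            value = redact_vendor_names(value)
        out[new_key] = value
    return out


print(sanitize_status_payload({
    "alpaca_api_status": "error",
    "alpaca_api_message": "Alpaca API returned 403",
}))
```

Substitution keeps the rest of the message readable, but a message that is mostly vendor-specific (for example one that names environment variables) is better replaced wholesale with fixed copy, as is done for trading.py:136.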

6.3 Frontend: error code display

The ErrorBoundary must:

  1. Remove the inline component stack from the production render (expose it only when process.env.NODE_ENV === 'development').
  2. Show a support-reference line: "Reference code: RAX-SYS-XXXXXXXX — quote this when contacting support@raxx.app."
  3. Read the Sentry event ID if available and append it as a secondary reference.

PageStateCard error state must accept an optional errorCode prop and render it below the message.

TradeForm must read error.response?.data?.error_code and show it alongside the order failure message.

All API client wrappers that propagate error.message should be updated to propagate error.response?.data?.error_code as a secondary field so callers can display it.

6.4 Sequence: customer reports error to support

sequenceDiagram
    participant C as Customer
    participant A as Antlers (UI)
    participant R as Raptor (API)
    participant L as Log drain
    participant S as Support agent

    C->>A: action triggers error
    A->>R: POST /api/trading/orders
    R-->>A: 500 {"error":"An unexpected error occurred.","error_code":"RAX-TRD-a3f91b2c","request_id":"...full-uuid..."}
    A-->>C: "Order failed. Reference code: RAX-TRD-a3f91b2c — quote this when contacting support@raxx.app"
    C->>S: email support@raxx.app "I got RAX-TRD-a3f91b2c"
    S->>L: search logs for request_id ending a3f91b2c
    L-->>S: full request log line with user_id, path, traceback, duration_ms
    S->>C: "Found your request — the order was rejected because..."

6.5 Relationship to workflow tracing

When FLAG_TRACE_MIDDLEWARE is on, g.trace_workflow_id is also available. The error response can include workflow_id alongside error_code. The frontend can then offer a two-level reference: the short human-readable code for phone/email, and the full workflow ID for paste-into-ticket scenarios.

This composition does not require any changes to the trace schema.
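The conditional composition can be sketched as below. The function name and signature are hypothetical; workflow_id is passed in explicitly here, whereas the backend would read it from g.trace_workflow_id when FLAG_TRACE_MIDDLEWARE is on.

```python
from typing import Optional


def build_error_payload(
    error_code: str,
    request_id: str,
    workflow_id: Optional[str] = None,
) -> dict:
    """Always include the short code; add workflow_id only when tracing produced one."""
    payload = {
        "error": "An unexpected error occurred.",
        "error_code": error_code,
        "request_id": request_id,
    }
    if workflow_id:
        payload["workflow_id"] = workflow_id
    return payload


# Flag off: no workflow_id key. Flag on: the key is present.
print(build_error_payload("RAX-TRD-a3f91b2c", "rid"))
print(build_error_payload("RAX-TRD-a3f91b2c", "rid", "wfl_" + "0" * 32))
```

Omitting the key entirely when tracing is off, instead of sending null, lets the frontend treat its presence as the signal to offer the second reference.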


7. Migrations

No database migrations required for Option C. The request_id UUID is already minted in-memory per request by logging.py. The error_code field is derived from it at response time and is not stored (it is reconstructible from any log line containing request_id).

Post-launch (ADR-0104 follow-up): when FLAG_TRACE_MIDDLEWARE is confirmed on in production, wire workflow_id into the error response so the longer trace lookup is also possible. This is additive, no migration needed.


8. Rollout Plan

| Phase | Gate | Description |
| --- | --- | --- |
| Pre-launch (v1) | Deploy before 2026-05-23 UTC | Backend: error_code in all error responses; vendor-name sanitization on str(e) passthrough sites; trading.py credential message fixed. Frontend: ErrorBoundary hides stack in prod; shows error code + support email. TradeForm shows error code. |
| Post-launch (v1.1) | After soak | Add workflow_id to error responses when trace middleware is confirmed on. Update support tooling to accept both the short code and the full workflow ID as lookup keys. |
| Post-launch (v1.2) | After SC-D12 ships | Troubleshooting docs (SC-D12) reference the RAX-<DOMAIN>-<8-hex> format with examples. Support runbook documents log search procedure. |

9. Security Considerations


10. Open Questions

None blocking the recommended pre-launch sub-cards. Post-launch questions are noted below for the operator's awareness but do not block v1.

OQ-1: Should workflow_id appear in error responses at v1 launch, or only post-launch when FLAG_TRACE_MIDDLEWARE is confirmed stable? Recommendation: include it conditionally (if g.trace_workflow_id: response["workflow_id"] = ...) so it appears when the flag is on and is absent otherwise. No customer-facing copy should reference it until the troubleshooting docs are updated.

OQ-2: The alpaca_api_status / alpaca_api_message key rename affects the Header.js, Dashboard.js, and Settings.js frontend consumers. This is a breaking change to the /api/system/status response shape. Should this be shipped as part of the error-code sub-cards or deferred to a separate cleanup card? Recommendation: ship with the error-code sub-cards since it resolves a pre-launch-blocking vendor name leak.


11. Sub-Cards

| # | Title | Size | Blocking |
| --- | --- | --- | --- |
| SC-ERR-1 | Raptor: add error_code field to all error responses; sanitize str(e) passthrough; fix trading credential message | S | Yes (pre-launch) |
| SC-ERR-2 | Antlers: ErrorBoundary hides stack in prod, shows error code + support@ hint; TradeForm + PageStateCard surface error_code | S | Yes (pre-launch) |
| SC-ERR-3 | Raptor + Antlers: rename alpaca_api_* to broker_connection_* in /api/system/status + all frontend consumers | XS | Yes (pre-launch) |