Raxx · internal docs

internal · gated

Personal Trade-Context Journal — Shape 1 Strategy Brief

Shape ID: sentiment-journal-shape-1
Status: ready-for-feature-dev
Date: 2026-06-05
Analyst: Data-scientist agent (Raxx)
Scope: v1 launch


0. Vocabulary Pivot — 2026-06-05

The original strategy brief used emotion-labeled vocabulary for the post-trade taxonomy: Disciplined / Patient / Adjusted / Panicked / Surprised. That vocabulary has been replaced with structural/behavioral vocabulary effective 2026-06-05.

Reasons for the change:

  1. Brand voice. Labels like "Panicked" pass a verdict on the user. Raxx's brand position is structural enforcement: Raxx enforces the structure you already decided on, before emotion gets a vote. Labeling the user's close with a feeling-word contradicts that position. Structural language ("Override Rule," "Followed Plan") stays on-brand: it describes what the trade did relative to the structure, not what the user felt.

  2. Legal posture (CCPA §1798.140(ae)). BLR's parallel compliance research (PR #3239) classified emotion labels as potentially within the "psychological trends" Sensitive Personal Information category. Switching to structural-only vocabulary sidesteps the SPI category entirely, lowers the disclaimer burden, and produces a cleaner CCPA posture. See PR #3239 for the full SPI analysis and the specific disclaimer sections that still need redlining under the new vocabulary (flagged in §10 of this document).

What did NOT change:

Memory reference: raxx-shape-service-phases-1-2-3-staged


1. Post-Trade Taxonomy Design

1.1 Design rationale

The pre-trade label set (Bullish / Bearish / Neutral / High-Uncertainty) captures the trader's directional thesis at decision time, when the order is submitted and before the trade has an outcome to anchor on.

The post-trade label captures a different axis: the structural quality of the close, meaning whether the exit honored the pre-committed exit rule or departed from it, and if so how. A profitable trade that the user closed early (ahead of the rule) is still an Override Rule close. A trade that was stopped out as planned is still a Followed Plan close, regardless of the P/L sign.

This separation is the core value: it lets users query "was my P/L driven by structural adherence or by rule departures?" That query is not possible from P/L data alone.

The behavioral finance grounding is unchanged from the original design. Kahneman and Tversky's prospect theory (1979) holds that people weight losses roughly twice as heavily as equivalent gains, which helps explain traders' systematic tendency to cut winners early and hold losers. Tom Howard's "Behavioral Portfolio Management" (2014) frames the antidote as maintaining a pre-committed process log that persists past the outcome. The post-trade label is that log: retrospective, self-asserted, and uncorrectable after 24 hours (see Section 2.4).

1.2 Pre-trade label decision (2026-06-05)

Decision: retain Bullish / Bearish / Neutral / High-Uncertainty.

Rationale: These labels describe the user's read on the market setup, not on themselves. "Bullish" means "I read this setup as directionally long-biased"; it is a market characterization drawn from the same vocabulary traders use every day (bullish/bearish skew, bullish setup, etc.). The four-state set covers direction plus a low-conviction state in a single picker and maps cleanly to how iron condor, credit spread, and directional equity structures are actually constructed. Replacing them with setup-intent terms (Income / Momentum / Hedge / etc.) would overlap structurally with the strategy_type field already on the order record and add a second categorical dimension without additional analytical leverage.

The UI frames this label as "your read on this setup" — not "how you feel." That framing keeps the distinction clean without changing the vocabulary.

1.3 Taxonomy — five post-trade labels

Five labels. Not six, not four. Five is the empirically defensible balance: enough labels to distinguish meaningfully different behavioral states, without so much granularity that users waffle or under-use the feature (decision fatigue causes abandonment; see Howard 2014, ch. 3 on journaling dropout rates).

| Label | Code | Operational definition | When to pick it |
|---|---|---|---|
| FollowedPlan | FP | Trade closed exactly per the pre-committed exit rule | Target hit, stop honored, or DTE roll executed on schedule; no deviation |
| HeldThroughPressure | HP | Pre-committed rule fired; a meaningful pull to exit early existed but was not acted on | There was pressure to exit early, but the user held until the rule fired |
| AdjustedWithReason | AR | User modified the position or exit rule mid-trade with a stated process reason | Rolled a strike, resized, or changed the DTE target with an articulable structural rationale |
| OverrodeRule | OR | User closed before, or held past, the pre-committed exit rule without a process reason | Exited early on discomfort, held past the stop hoping for recovery, or held past the target hoping for more |
| UnexpectedOutcome | UO | Pre-committed rules were followed; the market moved outside the stated pre-trade setup thesis | Structure and execution were sound; the setup thesis was invalidated by an external event not in the pre-trade model |
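The reference taxonomy.py keeps these labels as enums. A minimal sketch of what that could look like, assuming str-valued Enum members whose values are exactly the strings stored in pre_label and post_label. The class names PreLabel and PostLabel, the code property, and parse_post_label are hypothetical, not taken from the reference implementation:

```python
from enum import Enum


class PreLabel(str, Enum):
    """Pre-trade read on the setup (stored in pre_label)."""
    BULLISH = "Bullish"
    BEARISH = "Bearish"
    NEUTRAL = "Neutral"
    HIGH_UNCERTAINTY = "HighUncertainty"


class PostLabel(str, Enum):
    """Structural quality of the close (stored in post_label)."""
    FOLLOWED_PLAN = "FollowedPlan"
    HELD_THROUGH_PRESSURE = "HeldThroughPressure"
    ADJUSTED_WITH_REASON = "AdjustedWithReason"
    OVERRODE_RULE = "OverrodeRule"
    UNEXPECTED_OUTCOME = "UnexpectedOutcome"

    @property
    def code(self) -> str:
        # Two-letter short codes from the taxonomy table.
        return _POST_CODES[self]


_POST_CODES = {
    PostLabel.FOLLOWED_PLAN: "FP",
    PostLabel.HELD_THROUGH_PRESSURE: "HP",
    PostLabel.ADJUSTED_WITH_REASON: "AR",
    PostLabel.OVERRODE_RULE: "OR",
    PostLabel.UNEXPECTED_OUTCOME: "UO",
}


def parse_post_label(raw: str) -> PostLabel:
    """Validate a stored or submitted string; raises ValueError if unknown."""
    return PostLabel(raw)
```

Because the members subclass str, they compare equal to the raw column strings, so a vocabulary-agnostic query function can keep operating on plain column values.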

Mapping from old vocabulary (for reference only — old labels were never shipped to prod):

| Old label | New label | Notes |
|---|---|---|
| Disciplined | FollowedPlan | Exact structural equivalent |
| Patient | HeldThroughPressure | Exact structural equivalent |
| Adjusted | AdjustedWithReason | Renamed for clarity |
| Panicked | OverrodeRule | Structural equivalent; no feeling-word |
| Surprised | UnexpectedOutcome | Structural equivalent; no feeling-word |

1.4 Pitfalls mitigated by this taxonomy

Hindsight relabeling: A user who made $800 on a trade is tempted to retroactively call it "Followed Plan" even if they held past their target on a hunch that paid off. The taxonomy makes the distinction explicit: if the exit rule was violated, it is Override Rule regardless of whether the trade was profitable. The UI description emphasizes structural adherence, not outcome. This is the core legal posture: the user is asserting a structural fact about their exit behavior, not a prediction.

Rationalization of rule departures: Rule departures tend to attract post-hoc process language ("I adjusted my thesis"). The AdjustedWithReason label provides a legitimate home for genuine structural adjustments without becoming a catch-all excuse. The UI copy draws the line: Adjusted With Reason means the user can describe the modification in structural terms (strike selection, credit available, theta decay rate); Override Rule means the user cannot. Users who habitually assign Adjusted With Reason to departures with no journal note get a pattern signal in query results ("n >= 10, AdjustedWithReason rate > 40% with no journal note: consider recording your adjustment rationale").
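A minimal sketch of that pattern-signal check. The brief does not pin down the denominator, so this assumes one reading: among trades that carry a post-label, the share labeled AdjustedWithReason with an empty journal note must exceed 40%. LabeledTrade and adjusted_without_note_nudge are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MIN_N = 10                   # "n >= 10" from the pattern-signal rule
AR_NO_NOTE_THRESHOLD = 0.40  # "> 40%" from the pattern-signal rule


@dataclass
class LabeledTrade:
    post_label: Optional[str]
    journal_note: Optional[str]


def adjusted_without_note_nudge(trades: Sequence[LabeledTrade]) -> Optional[str]:
    """Return the nudge text, or None when the rule does not fire.

    Assumed reading: among trades with a post-label, the share labeled
    AdjustedWithReason with no (or whitespace-only) journal note exceeds 40%.
    """
    labeled = [t for t in trades if t.post_label is not None]
    n = len(labeled)
    if n < MIN_N:
        return None  # too few labels to say anything about a habit
    bare_ar = sum(
        1
        for t in labeled
        if t.post_label == "AdjustedWithReason" and not (t.journal_note or "").strip()
    )
    if bare_ar / n > AR_NO_NOTE_THRESHOLD:
        return "Consider recording your adjustment rationale."
    return None
```

The check only surfaces a suggestion; it never changes a label, consistent with the no-auto-classification constraint.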

Omission: Post-trade labeling is optional but prompted. If a user consistently skips labels, the backtest filter returns "no label data" for those trades. The feature does not infer a label, and the system never auto-classifies (operator constraint, unchanged).

1.5 Behavioral-finance literature support


2. Data Model

2.1 Schema decision: new table, not trades extension

The existing trades / paper_orders tables in Raptor are execution records that belong to the execution audit trail. Trade-context labels are user-authored behavioral annotations. These are distinct concerns with different owners, different cadences, and potentially different retention rules (a user can delete their journal without affecting the execution record). A separate trade_sentiment_labels table with a foreign key to the trade is the correct design. (The internal table name retains "sentiment": renaming a table in a pre-production feature just for vocabulary consistency is not worth the migration risk.)

The join key on the backtest side is trade_id (the internal trade identifier). For paper orders this is paper_orders.id; for live orders this will be the broker order ID cross-referenced to the internal order record. The schema anticipates both with a source_table discriminator.

2.2 Column specification

See docs/data-science/reference-impl/sentiment_journal/schema.sql for the full DDL. Summary:

| Column | Type | Notes |
|---|---|---|
| id | INTEGER | PK AUTOINCREMENT |
| user_id | INTEGER NOT NULL | FK → users; row-level isolation |
| source_table | TEXT NOT NULL | paper_orders or live_orders (v1: paper_orders only) |
| trade_id | INTEGER NOT NULL | FK → the source table's PK |
| pre_label | TEXT NOT NULL | Bullish, Bearish, Neutral, or HighUncertainty |
| pre_label_recorded_at | TEXT NOT NULL | ISO-8601 UTC; must be <= order submit time + 15 min |
| post_label | TEXT | Nullable; FollowedPlan, HeldThroughPressure, AdjustedWithReason, OverrodeRule, or UnexpectedOutcome |
| post_label_recorded_at | TEXT | ISO-8601 UTC; nullable until the label is set |
| post_label_locked_at | TEXT | ISO-8601 UTC; set when the 24h lock fires (see 2.4) |
| journal_note | TEXT | Nullable; max 2000 chars; free-text post-trade note |
| taxonomy_version | INTEGER NOT NULL DEFAULT 1 | Versioning hook (see 2.5) |
| created_at | TEXT NOT NULL | ISO-8601 UTC |
| updated_at | TEXT NOT NULL | ISO-8601 UTC |

Unique constraint: (user_id, source_table, trade_id) — one label row per trade per user.
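As a rough illustration of how the column rules above could be pushed into the database, here is a SQLite sketch with CHECK constraints for the label vocabularies and note length plus the per-trade unique constraint. It is not the reference schema.sql. Foreign keys are left out to keep it self-contained; note also that a single declared FOREIGN KEY can target only one table, so a trade_id whose target depends on source_table is typically validated in application code. create_schema is a hypothetical helper:

```python
import sqlite3

DDL = """
CREATE TABLE trade_sentiment_labels (
    id                     INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id                INTEGER NOT NULL,
    source_table           TEXT    NOT NULL
        CHECK (source_table IN ('paper_orders', 'live_orders')),
    trade_id               INTEGER NOT NULL,
    pre_label              TEXT    NOT NULL
        CHECK (pre_label IN ('Bullish', 'Bearish', 'Neutral', 'HighUncertainty')),
    pre_label_recorded_at  TEXT    NOT NULL,
    post_label             TEXT
        CHECK (post_label IS NULL OR post_label IN (
            'FollowedPlan', 'HeldThroughPressure', 'AdjustedWithReason',
            'OverrodeRule', 'UnexpectedOutcome')),
    post_label_recorded_at TEXT,
    post_label_locked_at   TEXT,
    journal_note           TEXT
        CHECK (journal_note IS NULL OR length(journal_note) <= 2000),
    taxonomy_version       INTEGER NOT NULL DEFAULT 1,
    created_at             TEXT    NOT NULL,
    updated_at             TEXT    NOT NULL,
    UNIQUE (user_id, source_table, trade_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the label table on an open SQLite connection."""
    conn.executescript(DDL)
```

Both a duplicate (user_id, source_table, trade_id) and an out-of-vocabulary label surface as sqlite3.IntegrityError, so the API layer can map one exception type to a validation error.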

2.3 Recording cadence

Pre-trade label: Recorded at order-entry confirmation — specifically, when the user submits the trade through the Trade Window. The UI presents the pre-label picker as a required step in the order ticket before "Submit." The system sets pre_label_recorded_at to the server-side UTC timestamp at the moment of write. It cannot be recorded after order submission.

Rationale: If recorded before order entry, the user can change it before submitting and the record loses meaning. If allowed after order submission, it becomes hindsight. The prompt at order-ticket-submit is the only defensible moment.

Post-trade label: Recorded any time after the position is closed (status = filled and exit recorded), up to 24 hours after close. After 24 hours, post_label_locked_at is set and the label is frozen. The UI shows a banner on the trade card: "Label this close — 18 hours left." If the user never labels, the row stays with post_label = NULL and the backtest query omits it from filtered slices (it still appears in unfiltered results).
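To make the window rule concrete, a small sketch of how the banner's hours-remaining figure could be derived from the close time. It assumes closed_at is a timezone-aware server-side UTC close timestamp; the function names and the round-down-but-never-zero display choice are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

UTC = timezone.utc
POST_LABEL_WINDOW = timedelta(hours=24)


def hours_left_to_label(closed_at: datetime, now: datetime) -> Optional[int]:
    """Whole hours left in the post-label window, or None once it has closed.

    Both arguments should be timezone-aware UTC datetimes.
    """
    remaining = closed_at + POST_LABEL_WINDOW - now
    if remaining <= timedelta(0):
        return None  # window closed: the row is (or is about to be) locked
    # Floor to whole hours, but never show 0 while the window is still open.
    return max(1, int(remaining.total_seconds() // 3600))


def window_open(closed_at: datetime, now: datetime) -> bool:
    return hours_left_to_label(closed_at, now) is not None


if __name__ == "__main__":
    closed = datetime(2026, 6, 5, 8, 0, tzinfo=UTC)
    print(hours_left_to_label(closed, datetime(2026, 6, 5, 14, 0, tzinfo=UTC)))  # 18
```

Computing from the server-side close timestamp keeps the countdown and the lock on the same clock, so the banner cannot promise time the API will refuse.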

Journal note: Optional, same 24-hour write window as post-label. Stored as plaintext, max 2000 chars. No NLP processing in v1. Tagged for future personal-RAG retrieval with taxonomy_version so a future embedding model can filter by schema era.

2.4 24-hour lock and immutability

Once post_label_locked_at is set, the row is read-only. This design choice serves two goals:

  1. It prevents retroactive relabeling (rationalization) after the user has observed longer-term outcomes.
  2. It gives the backtest query a stable, user-asserted dataset: labels are facts about the moment, not current assessments of old trades.

The lock is enforced at the API layer (not just the DB). A PATCH /api/sentiment-labels/:id request checks post_label_locked_at IS NULL before allowing a write.
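One way to enforce the lock at the API layer without a read-then-write race is to fold the ownership and lock checks into the UPDATE itself and inspect the affected row count. A sketch against SQLite, assuming the handler has already resolved the session user; the function name and boolean return convention are hypothetical:

```python
import sqlite3
from typing import Optional


def patch_post_label(
    conn: sqlite3.Connection,
    label_id: int,
    user_id: int,
    post_label: str,
    journal_note: Optional[str],
    now_iso: str,
) -> bool:
    """Write post_label/journal_note only if the row is the caller's and unlocked.

    The ownership and lock checks live in the WHERE clause, so check and write
    happen in one statement instead of a SELECT followed by an UPDATE.
    Returns True if exactly one row changed; False if the row is missing,
    belongs to another user, or is already locked.
    """
    cur = conn.execute(
        """
        UPDATE trade_sentiment_labels
           SET post_label = ?,
               journal_note = ?,
               post_label_recorded_at = ?,
               updated_at = ?
         WHERE id = ?
           AND user_id = ?
           AND post_label_locked_at IS NULL
        """,
        (post_label, journal_note, now_iso, now_iso, label_id, user_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

A False result covers missing, foreign, and locked rows alike, and the route would map it to an error response. If the lock is stamped by a background job, the handler would also need to refuse writes more than 24 hours after close while post_label_locked_at is still NULL; that close timestamp lives on the order record, so it is not shown here.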

2.5 Taxonomy versioning

taxonomy_version = 1 covers the current five-label set. If labels are added, renamed, or removed in a future sprint, the new rows ship with taxonomy_version = 2. The backtest query filters by a caller-specified version range (default: all versions). Version metadata lives in a sentiment_taxonomy_versions config table (see schema.sql) so the UI can render the correct picker for historical trades without a code ship.
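The brief does not spell out the columns of sentiment_taxonomy_versions. Assuming one row per (taxonomy_version, label_kind) holding an ordered JSON array of labels, which matches the two version-1 rows the migration seeds, a historical trade's picker could be rendered from its stored version like this. The table layout and function names are hypothetical:

```python
import json
import sqlite3
from typing import List

CONFIG_DDL = """
CREATE TABLE sentiment_taxonomy_versions (
    taxonomy_version INTEGER NOT NULL,
    label_kind       TEXT    NOT NULL CHECK (label_kind IN ('pre', 'post')),
    labels_json      TEXT    NOT NULL,  -- ordered JSON array of label strings
    PRIMARY KEY (taxonomy_version, label_kind)
);
"""

SEED_V1 = [
    (1, "pre", ["Bullish", "Bearish", "Neutral", "HighUncertainty"]),
    (1, "post", ["FollowedPlan", "HeldThroughPressure", "AdjustedWithReason",
                 "OverrodeRule", "UnexpectedOutcome"]),
]


def seed(conn: sqlite3.Connection) -> None:
    """Create the config table and insert the two version-1 rows."""
    conn.executescript(CONFIG_DDL)
    conn.executemany(
        "INSERT INTO sentiment_taxonomy_versions VALUES (?, ?, ?)",
        [(v, kind, json.dumps(labels)) for v, kind, labels in SEED_V1],
    )


def picker_labels(conn: sqlite3.Connection, taxonomy_version: int, label_kind: str) -> List[str]:
    """Labels to render for a row stored under taxonomy_version."""
    row = conn.execute(
        "SELECT labels_json FROM sentiment_taxonomy_versions "
        "WHERE taxonomy_version = ? AND label_kind = ?",
        (taxonomy_version, label_kind),
    ).fetchone()
    if row is None:
        raise KeyError(f"no {label_kind} labels for taxonomy_version {taxonomy_version}")
    return json.loads(row[0])
```

Shipping a version 2 then means inserting new config rows rather than changing picker code.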


3. Backtest-Query API Design

3.1 Transport choice: REST

The existing Raptor backtest layer is REST (POST /api/backtest/run). The trade-context query follows the same pattern. GraphQL is not warranted — the query shape is narrow and predictable. An internal Python function handles aggregation; the REST endpoint wraps it.

3.2 Endpoint

GET /api/backtest/sentiment-filter

Auth: require_session middleware (same as all backtest routes). Never available on marketing surfaces. Flag-gated: FLAG_SENTIMENT_JOURNAL must be 1.

Query parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| pre_label | string | No | Filter by pre-trade label. One of: Bullish, Bearish, Neutral, HighUncertainty. Omit for all. |
| post_label | string | No | Filter by post-trade label. One of: FollowedPlan, HeldThroughPressure, AdjustedWithReason, OverrodeRule, UnexpectedOutcome. Omit for all. |
| source_table | string | No | paper_orders (default) or live_orders. v1 only supports paper_orders. |
| strategy_type | string | No | Filter to a specific strategy type (e.g., iron_condor). Omit for all. |
| date_from | string | No | ISO-8601 date. Inclusive lower bound on trade close date. |
| date_to | string | No | ISO-8601 date. Inclusive upper bound on trade close date. |
| taxonomy_version | integer | No | Filter to a specific taxonomy version. Default: all. |
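A sketch of how the route could validate these parameters before calling the aggregation function. It assumes raw query-string values arrive as a str-to-str mapping and rejects live_orders because v1 only supports paper_orders; SentimentFilter and parse_filter are hypothetical names:

```python
from dataclasses import dataclass
from datetime import date
from typing import Mapping, Optional

PRE_LABELS = {"Bullish", "Bearish", "Neutral", "HighUncertainty"}
POST_LABELS = {"FollowedPlan", "HeldThroughPressure", "AdjustedWithReason",
               "OverrodeRule", "UnexpectedOutcome"}
V1_SOURCE_TABLES = {"paper_orders"}  # live_orders is in the spec but not in v1


@dataclass(frozen=True)
class SentimentFilter:
    pre_label: Optional[str] = None
    post_label: Optional[str] = None
    source_table: str = "paper_orders"
    strategy_type: Optional[str] = None
    date_from: Optional[date] = None
    date_to: Optional[date] = None
    taxonomy_version: Optional[int] = None


def parse_filter(params: Mapping[str, str]) -> SentimentFilter:
    """Validate raw query-string values; raises ValueError on bad input."""
    pre = params.get("pre_label")
    if pre is not None and pre not in PRE_LABELS:
        raise ValueError(f"unknown pre_label: {pre}")
    post = params.get("post_label")
    if post is not None and post not in POST_LABELS:
        raise ValueError(f"unknown post_label: {post}")
    source = params.get("source_table", "paper_orders")
    if source not in V1_SOURCE_TABLES:
        raise ValueError(f"source_table not supported in v1: {source}")
    # Both bounds are inclusive ISO-8601 dates on the trade close date.
    d_from = date.fromisoformat(params["date_from"]) if "date_from" in params else None
    d_to = date.fromisoformat(params["date_to"]) if "date_to" in params else None
    if d_from and d_to and d_from > d_to:
        raise ValueError("date_from is after date_to")
    version = int(params["taxonomy_version"]) if "taxonomy_version" in params else None
    return SentimentFilter(
        pre_label=pre, post_label=post, source_table=source,
        strategy_type=params.get("strategy_type"),
        date_from=d_from, date_to=d_to, taxonomy_version=version,
    )
```

Rejecting unknown label strings up front means a stale client still sending old vocabulary gets an explicit error instead of an empty slice.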

Minimum-N threshold: n = 10. Slices with fewer than 10 labeled trades return a sample_too_small: true flag in the response alongside whatever data is available. The slice data is still returned; the flag is a quality signal, not a gate. Rationale:

- At n < 5, no win rate is distinguishable from a coin flip at the 5% level: even a perfect 4-for-4 record has a one-sided binomial p of 0.0625.
- At n = 10 per slice, a 70% win rate vs. a 30% win rate is distinguishable with p ≈ 0.09 (Fisher exact, one-sided): borderline, but enough to prompt reflection.
- At n = 20 per slice, the same 70%/30% split gives p ≈ 0.013, below 0.05. The UI should nudge users toward n = 20 before drawing strong conclusions, but 10 is the minimum for displaying results without a warning.
- We do not hide results below n = 10; we warn. Paternalistic hiding reduces feature utility and contradicts the "user asserts, system reflects" posture.
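The p-values in the rationale can be reproduced with exact counting. A stdlib-only sketch, assuming the comparison is between two slices of equal size (10 vs. 10, then 20 vs. 20) and that "a coin flip" means a 50% win probability; the function names are illustrative:

```python
from math import comb


def fisher_one_sided(wins_a: int, n_a: int, wins_b: int, n_b: int) -> float:
    """One-sided Fisher exact p-value for H1: slice A's win rate > slice B's.

    Holds both slice sizes and the total number of wins fixed, and sums the
    hypergeometric probabilities of every table at least as extreme as the
    observed one (A having wins_a or more of the total wins).
    """
    total = n_a + n_b
    total_wins = wins_a + wins_b
    tables = comb(total, n_a)
    upper = min(n_a, total_wins)
    extreme = sum(
        comb(total_wins, k) * comb(total - total_wins, n_a - k)
        for k in range(wins_a, upper + 1)
    )
    return extreme / tables


def binomial_one_sided(wins: int, n: int) -> float:
    """P(X >= wins) for X ~ Binomial(n, 0.5): how often a coin does this well."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n
```

fisher_one_sided(7, 10, 3, 10) returns about 0.089, fisher_one_sided(14, 20, 6, 20) about 0.013, and binomial_one_sided(4, 4) exactly 0.0625, matching the figures above. If SciPy is available, scipy.stats.fisher_exact on the 2x2 table with alternative="greater" gives the same one-sided value.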

3.3 Response shape

{
  "filter": {
    "pre_label": "Bullish",
    "post_label": "OverrodeRule",
    "source_table": "paper_orders",
    "strategy_type": null,
    "date_from": null,
    "date_to": null
  },
  "sample": {
    "n": 7,
    "sample_too_small": true,
    "sample_too_small_threshold": 10
  },
  "aggregates": {
    "win_rate": 0.43,
    "avg_pnl": -42.15,
    "median_pnl": -18.00,
    "total_pnl": -295.05,
    "avg_credit_received": 210.00,
    "max_single_loss": -312.00,
    "max_drawdown_pct": -0.148,
    "avg_dte_at_entry": 21.4
  },
  "trades": [
    {
      "trade_id": 42,
      "close_date": "2026-04-15",
      "strategy_type": "iron_condor",
      "symbol": "SPY",
      "pnl": -312.00,
      "pre_label": "Bullish",
      "post_label": "OverrodeRule",
      "journal_note_present": true
    }
  ],
  "baseline": {
    "description": "All labeled trades (no filter applied)",
    "n": 63,
    "win_rate": 0.67,
    "avg_pnl": 88.50
  },
  "generated_at": "2026-06-05T14:23:00Z"
}

Key design decisions in the response:

- journal_note_present is a boolean, not the note text itself. The full note is available via the separate GET /api/sentiment-labels/:id endpoint. This keeps the list response compact and prevents accidental exposure of free text in logs.
- baseline is always included. It contextualizes the filtered slice against the user's overall labeled-trade history; without a baseline, users cannot tell whether "Override Rule trades lost $42 avg" is meaningful.
- max_drawdown_pct is the per-trade max loss expressed as a fraction of credit received (so -0.148 reads as -14.8%), not a portfolio drawdown. The backtest filter does not imply a portfolio construction model.
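The sample and aggregates blocks could be computed per slice roughly as follows. This is a sketch, not the reference query.py: it assumes a win means pnl > 0, reports max_single_loss and max_drawdown_pct as 0.0 for a slice with no losing trades, and reads max_drawdown_pct as the worst per-trade pnl / credit_received ratio. ClosedTrade and aggregate are hypothetical names:

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Any, Dict, List

SAMPLE_TOO_SMALL_THRESHOLD = 10


@dataclass
class ClosedTrade:
    pnl: float
    credit_received: float
    dte_at_entry: int


def aggregate(trades: List[ClosedTrade]) -> Dict[str, Any]:
    """Return the sample and aggregates blocks for one filtered slice."""
    n = len(trades)
    sample = {
        "n": n,
        # A warning flag only: slice data is returned either way.
        "sample_too_small": n < SAMPLE_TOO_SMALL_THRESHOLD,
        "sample_too_small_threshold": SAMPLE_TOO_SMALL_THRESHOLD,
    }
    if n == 0:
        return {"sample": sample, "aggregates": None}
    pnls = [t.pnl for t in trades]
    loss_ratios = [t.pnl / t.credit_received
                   for t in trades if t.credit_received > 0 and t.pnl < 0]
    aggregates = {
        "win_rate": sum(p > 0 for p in pnls) / n,  # assumed: win means pnl > 0
        "avg_pnl": mean(pnls),
        "median_pnl": median(pnls),
        "total_pnl": sum(pnls),
        "avg_credit_received": mean(t.credit_received for t in trades),
        "max_single_loss": min((p for p in pnls if p < 0), default=0.0),
        "max_drawdown_pct": min(loss_ratios, default=0.0),  # a fraction, not 0-100
        "avg_dte_at_entry": mean(t.dte_at_entry for t in trades),
    }
    return {"sample": sample, "aggregates": aggregates}
```

Running the same function over all labeled trades yields the baseline block, which keeps the filtered slice and its comparison row on identical definitions.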

3.4 Feeding the backtest result display

The response slots into the existing backtest result display via a new "Trade Context Filter" tab alongside the existing aggregate view. The trades array maps directly to the existing trade-row component. The aggregates object maps to the existing metrics panel. The baseline section renders as a comparison row under the metrics. Feature-developer wires this — no changes to the backtest engine itself.


4. Reference Python Implementation

See:

- docs/data-science/reference-impl/sentiment_journal/taxonomy.py: enums + JSON schema
- docs/data-science/reference-impl/sentiment_journal/schema.sql: DDL
- docs/data-science/reference-impl/sentiment_journal/query.py: backtest query function
- docs/data-science/reference-impl/sentiment_journal/demo.py: worked example with toy data

The reference implementation uses stdlib + pandas/numpy only. No exotic dependencies. query.py required zero changes — it is vocabulary-agnostic (operates on column values, not hardcoded enum strings).


5. Cold-Start Strategy

Recommendation: optional retro-labeling with a pre-populated suggestion seeded from the rule engine's exit-reason log.

A new user starts with zero labeled trades. Three options were considered:

Option A — Empty state with synthetic preview: Show what the filter result would look like with fake data. Rejected: this risks misleading users about their own data.

Option B — Block feature until n = 10 real labels: Rejected: delays value, increases abandonment risk before the habit forms. The journaling habit needs to feel useful from the first label, not the tenth.

Option C — Optional retro-labeling + immediate utility: Recommended. When a user enables the feature, the system offers a one-time "Label your past trades" prompt that walks through their last 20 paper trades (most recent first) and asks for a post-trade label on each. The exit_reason field on the existing order record can pre-suggest (not pre-fill) a label category: e.g., exit_reason = "stop_loss_hit" suggests checking "FollowedPlan" or "HeldThroughPressure." The user still asserts, the system suggests. The system never auto-fills a label.

The retro-labeling session uses the same lock mechanism but without the 24-hour window: retro labels on past trades are locked on save, because those trades are already historical. This is disclosed in the UI.
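A sketch of the retro-labeling queue: the last 20 paper trades, most recent first, each carrying label suggestions derived from exit_reason while post_label stays empty until the user chooses. Only the stop_loss_hit mapping comes from this brief; other exit_reason values are unspecified here and yield no suggestion. All names are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Sequence, Tuple

# Only this mapping is given in the brief; unknown exit_reason values
# produce no suggestion rather than a guessed one.
EXIT_REASON_SUGGESTIONS: Dict[str, Tuple[str, ...]] = {
    "stop_loss_hit": ("FollowedPlan", "HeldThroughPressure"),
}
RETRO_LIMIT = 20  # "last 20 paper trades"


@dataclass
class PaperTrade:
    trade_id: int
    closed_at: datetime
    exit_reason: Optional[str]


@dataclass
class RetroPrompt:
    trade_id: int
    suggested: Tuple[str, ...]        # hints rendered next to the picker
    post_label: Optional[str] = None  # never pre-filled; the user must choose


def build_retro_queue(trades: Sequence[PaperTrade]) -> List[RetroPrompt]:
    """Most recent first, capped at RETRO_LIMIT; suggestions only, no labels."""
    recent = sorted(trades, key=lambda t: t.closed_at, reverse=True)[:RETRO_LIMIT]
    return [
        RetroPrompt(t.trade_id, EXIT_REASON_SUGGESTIONS.get(t.exit_reason or "", ()))
        for t in recent
    ]


def save_retro_label(prompt: RetroPrompt, label: str, now: datetime) -> Dict[str, object]:
    """Build the row update for a user-chosen retro label, locked on save."""
    stamp = now.isoformat()
    return {
        "trade_id": prompt.trade_id,
        "post_label": label,
        "post_label_recorded_at": stamp,
        "post_label_locked_at": stamp,  # no 24h window for historical trades
    }
```

Keeping suggestions in a separate field from post_label is what preserves the "suggest, never pre-fill" rule: nothing is written until save_retro_label receives a label the user picked.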

Empty state copy (suggestion for feature-developer):

"Your trade-context filter is ready. Label your next trade close to start building your pattern history. Your results here are retrospective only — based entirely on what you recorded at the time."


6. Handoff Packet for Feature-Developer

Schema migration

Next available migration number is 046. Two migration files are needed: a SQL file for the SQLite dev path and an Alembic revision for the Postgres path.

SQLite (dev/test): backend_v2/db/migrations/046_trade_sentiment_labels.sql
Alembic (Postgres): alembic/versions/0016_trade_sentiment_labels.py (next Alembic rev after 0015; verify against current HEAD before filing)

The migration creates:

  1. trade_sentiment_labels: main label table (see schema.sql)
  2. sentiment_taxonomy_versions: config table (2 rows: version 1 pre-labels, version 1 post-labels)
  3. Indexes on (user_id, source_table, trade_id), (user_id, post_label), (user_id, pre_label), and (user_id, post_label_locked_at) for query performance

No changes to existing paper_orders, strategies, or backtest_runs tables.

API surface

Four new routes, all behind FLAG_SENTIMENT_JOURNAL:

POST   /api/sentiment-labels           Create label at order submission
PATCH  /api/sentiment-labels/:id       Update post_label / journal_note (lock check)
GET    /api/sentiment-labels/:id       Fetch full label including journal_note
GET    /api/backtest/sentiment-filter  Backtest query endpoint (see Section 3)

All routes: require_session, FLAG_SENTIMENT_JOURNAL = 1 guard, user-scoped (user cannot read/write another user's labels).
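The three route-level rules (flag on, session present, owner only) reduce to a small check. A framework-agnostic sketch, assuming the flag is read from the process environment (as heroku config:set implies) and that the route maps the returned status to a response. The status codes, including 404 for another user's row so the API does not confirm that row exists, are assumptions, and the function names are hypothetical:

```python
import os
from typing import Mapping, Optional

FLAG_NAME = "FLAG_SENTIMENT_JOURNAL"


def journal_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Default OFF: the feature is on only when the flag is exactly "1"."""
    return env.get(FLAG_NAME) == "1"


def authorize_label_route(
    session_user_id: Optional[int],
    row_owner_id: Optional[int],
    env: Mapping[str, str] = os.environ,
) -> int:
    """Return an HTTP status for a label route; 200 means the handler may proceed.

    row_owner_id is None for routes that do not target an existing row
    (POST create, or the filter query, which is scoped by the session user).
    """
    if not journal_enabled(env):
        return 404  # assumption: flagged-off routes look absent
    if session_user_id is None:
        return 401  # normally rejected earlier by require_session
    if row_owner_id is not None and row_owner_id != session_user_id:
        return 404  # assumption: do not confirm another user's row exists
    return 200
```

Passing env explicitly keeps the check testable without mutating the real environment.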

UI hooks needed

| Hook | Placement | Notes |
|---|---|---|
| Pre-label picker | Order ticket, before Submit button | Required field; cannot submit without selecting. Bullish / Bearish / Neutral / High-Uncertainty with one-line descriptions. Framed as "your read on this setup." |
| Post-label prompt | Trade card, after close confirmation | Banner with "Label this close" + 5 label buttons + optional note textarea. 24h countdown shown. Labels: Followed Plan / Held Through Pressure / Adjusted With Reason / Override Rule / Unexpected Outcome. |
| Retro-label flow | Triggered once on first feature enable | Walk through last 20 paper trades; skip allowed per trade. |
| Trade-context filter tab | Backtest result view | Sits alongside existing tab set. Loads GET /api/backtest/sentiment-filter with filter UI (dropdowns for pre/post label, date range). |
| Empty state | Trade-context filter tab, zero labels | Short copy + CTA to label next trade. No synthetic data. |

Feature flag

Suggested name: FLAG_SENTIMENT_JOURNAL

Suggested soak: 7 days on paper accounts before any live-account exposure. Risk classification: high (customer-facing, touches order ticket flow). Default OFF. Flip via heroku config:set FLAG_SENTIMENT_JOURNAL=1.

Console migration required in same PR: console/migrations/versions/0145_promote_sentiment_journal.py

Estimated feature-developer effort

The vocabulary pivot adds roughly zero effort to the original estimate. The architecture, routes, data model, and component hooks are all unchanged. The only code differences are (a) enum string values in taxonomy.py and the schema.sql CHECK constraint, and (b) UI label strings in the picker components. Both are string literals, not logic changes.

| Work item | Estimate |
|---|---|
| Schema migration (SQLite + Alembic) | 0.5 days |
| API routes (POST, PATCH, GET label, GET filter) | 2 days |
| Query aggregation function (port from reference impl) | 0.5 days |
| Order ticket UI: pre-label picker | 1 day |
| Trade card UI: post-label prompt + lock countdown | 1.5 days |
| Retro-label flow | 1 day |
| Trade-context filter tab in backtest view | 1.5 days |
| Console migration + flag registration | 0.5 days |
| Tests (route tests, query function tests) | 1 day |
| Total | ~9.5 days |

7. Risk and Failure Modes

Self-report bias: The journal is only as honest as the user. The system cannot detect whether a label is accurate. Mitigation: the 24-hour lock and the label descriptions reduce but cannot eliminate motivated relabeling.

Sparse data: Users who label inconsistently get filtered slices that silently exclude their unlabeled trades, so the denominator shrinks. The sample_too_small flag handles the display; the bigger risk is that users draw conclusions from 3-trade slices. The threshold of 10, with a prominent warning, addresses this.

Taxonomy staleness: If the five-label taxonomy evolves (e.g., AdjustedWithReason is split into "RollAdjustment" and "SizeAdjustment"), the taxonomy_version column allows the old labels to coexist in the DB. The backtest query can filter to a version range, or aggregate across versions when the label codes are equivalent.

Performance: The GET /api/backtest/sentiment-filter query scans the trade_sentiment_labels table filtered by user_id and optional label columns. At 1000 trades per user (generous estimate for v1 paper traders), this is sub-millisecond with the proposed indexes. No caching needed at v1 scale.
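The index claim can be spot-checked with SQLite's EXPLAIN QUERY PLAN. A sketch using a trimmed copy of the table plus the proposed indexes (the unique constraint already supplies the (user_id, source_table, trade_id) index in SQLite; the other index names are hypothetical). It shows which index the planner picks for a user_id + post_label filter, not how long the query takes:

```python
import sqlite3

SETUP = """
CREATE TABLE trade_sentiment_labels (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    source_table TEXT NOT NULL,
    trade_id INTEGER NOT NULL,
    pre_label TEXT NOT NULL,
    post_label TEXT,
    post_label_locked_at TEXT,
    UNIQUE (user_id, source_table, trade_id)
);
CREATE INDEX idx_tsl_user_post   ON trade_sentiment_labels (user_id, post_label);
CREATE INDEX idx_tsl_user_pre    ON trade_sentiment_labels (user_id, pre_label);
CREATE INDEX idx_tsl_user_locked ON trade_sentiment_labels (user_id, post_label_locked_at);
"""


def plan_for_post_label_filter(conn: sqlite3.Connection) -> str:
    """Return SQLite's query plan text for a per-user post_label filter."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM trade_sentiment_labels WHERE user_id = ? AND post_label = ?",
        (7, "OverrodeRule"),
    ).fetchall()
    # Each row is (id, parent, notused, detail); keep the human-readable detail.
    return " | ".join(str(r[-1]) for r in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    print(plan_for_post_label_filter(conn))
```

A SEARCH ... USING INDEX idx_tsl_user_post line means the query seeks directly to one user's rows with that label instead of scanning the table, which is what makes small per-user slices cheap.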


8. Compliance Notes


9. Cited References


10. BLR PR #3239 — Disclaimer Lines Requiring Redline

The following lines in docs/legal/research/shape-1-personal-sentiment-journal-compliance-2026-06-05.md (on branch blr/shape-1-sentiment-compliance-2026-06-05) contain emotion-vocabulary references that need redlining now that the post-trade labels use structural language.

BLR should not apply the data-scientist's redlines directly — this list is a handoff so BLR can revise legal interpretation and disclaimer text with full context of the vocabulary change.

Disclaimer text that references specific old label names:

Disclaimer text that characterizes the labels as emotional-state data (now no longer accurate):

These lines describe the label data as reflecting emotional or psychological state. With structural vocabulary, the legal characterization changes: labels now describe exit behavior, not psychological state. BLR needs to reassess whether the SPI conservative position still applies or whether it can be relaxed.

Lines that are unaffected by the vocabulary change (legal reasoning does not depend on the specific label names or emotional framing):


This document is a research specification. It does not constitute investment advice. Filter results describe what occurred in the user's own historical paper-trading record during the period covered. Past results on paper-trading data do not predict live-trading outcomes.