Raxx · internal docs


Natural-Language AI Strategy Execution — Feedback Investigation

Date: 2026-04-29 UTC
Source: User feedback collected by Kristerpher via iMessage; screenshot at docs/research/user-feedback/2026-04-29-friend-feedback-ai-execute-strategy.png
Refs: Issue #479


1. What Does Raxx Already Have That Supports This?

Order Execution Path (Raptor)

backend_v2/api/services/trading_runtime.py — resolves paper vs. live credentials, constructs an Alpaca REST client (paper: https://paper-api.alpaca.markets, live: https://api.alpaca.markets), fetches account state. The fetch_alpaca_account() function is live-tested against Alpaca.

backend_v2/api/routes/trading.py — POST /api/trading/orders exists and validates symbol, qty, side, order type. However, the execution body currently calls a mock generator (generate_mock_orders), not the Alpaca SDK. Actual order submission to Alpaca via api.submit_order() is stubbed with # TODO comments throughout trading.py. This is the single biggest gap between current state and a schedulable execution loop.

alpaca_integration.py — credential resolution, mode detection (paper default with live fallback), and readiness probes are production-quality. The integration scaffolding is real; only the order-submission wiring is missing.

Scheduler / Background-Task Infra

No application-level scheduler (APScheduler, Celery, or similar) exists in Raptor today. The background-task patterns present are:

Conclusion: there is no recurring scheduler in Raptor. One must be added. The most pragmatic MVP path is either (a) a clock dyno with APScheduler, or (b) a Heroku Scheduler addon calling a protected internal endpoint.
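For option (a), the mapping from the stored strategy schedule to an APScheduler cron job is small. The helper below is a pure-function sketch (names assumed, not existing Raptor code); the APScheduler `add_job` call it feeds is shown as a comment since the dependency is not yet in the project.

```python
# Sketch: translate a stored strategy schedule into APScheduler cron kwargs
# for the clock-dyno option (a). Only the pure mapping is implemented here.

DAY_ABBREV = {"monday": "mon", "tuesday": "tue", "wednesday": "wed",
              "thursday": "thu", "friday": "fri", "saturday": "sat",
              "sunday": "sun"}

def schedule_to_cron_kwargs(schedule: dict) -> dict:
    """Map {"day_of_week": "Friday", "time_utc": "17:00"} to cron kwargs."""
    hour, minute = schedule["time_utc"].split(":")
    kwargs = {"hour": int(hour), "minute": int(minute), "timezone": "UTC"}
    day = schedule.get("day_of_week")
    if day:
        kwargs["day_of_week"] = DAY_ABBREV[day.lower()]
    return kwargs

# Assumed call site with APScheduler on the clock dyno:
#   sched = BlockingScheduler()
#   sched.add_job(evaluate_strategy, "cron",
#                 args=[strategy_id], **schedule_to_cron_kwargs(sched_dict))
#   sched.start()
```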

LLM Integration (Claude API)

The Claude API is used today exclusively for agent dispatch in the development toolchain (.claude/ directory). There is no anthropic SDK import anywhere in backend_v2/ or the frontend. Zero LLM integration exists in the product today.

For MVP, Claude's role would be translation: convert the user's natural-language description into a validated DSL struct (condition + threshold + action + schedule). This is a one-time call at strategy-creation time, not per-execution-tick. That keeps inference cost tractable (see Section 3 — Cost).
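A sketch of that one-time parse step. The anthropic SDK call itself is shown only as a comment (the model name and `messages.create` usage are the standard Anthropic Messages API, but nothing here is wired into Raptor yet); the prompt construction and response handling are plain Python so they can be tested without an API key.

```python
import json

# Sketch of the strategy-creation-time parse call. Prompt wording and
# function names are assumptions for illustration.

PARSE_PROMPT = """Translate the user's strategy description into a JSON \
object with keys "trigger", "action", and "schedule", exactly matching the \
DSL schema. If the description cannot be mapped with high confidence, reply \
with {{"error": "<what the user should rephrase>"}}.

Description: {text}"""

def build_parse_prompt(text: str) -> str:
    return PARSE_PROMPT.format(text=text)

def extract_dsl(model_reply: str) -> dict:
    """Parse the model reply; raise if the model flagged ambiguity."""
    parsed = json.loads(model_reply)
    if "error" in parsed:
        raise ValueError(parsed["error"])
    return parsed

# Assumed call site (anthropic SDK):
#   reply = client.messages.create(
#       model="claude-3-haiku-20240307", max_tokens=512,
#       messages=[{"role": "user", "content": build_parse_prompt(user_text)}])
#   dsl = extract_dsl(reply.content[0].text)
```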

Risk Gates / Safety Rails

Current state in Raptor:

Gaps to close before an automated executor can be trusted with real money: actual PDT check, max-shares-per-order cap, daily notional limit per user, manual kill-switch (pause all strategies for a user), and a human-confirm step before first live execution per strategy.

Backtesting Infra

backend_v2/api/routes/backtest.py (932 lines) is genuinely functional. Four built-in strategies (MA crossover, RSI, Bollinger, MACD) run against real Alpaca historical bars via run_strategy_backtest(). The engine computes equity curves, trade lists, and performance metrics. The comparison harness (run_strategy_comparison) runs N strategies in one call.

The backtest engine is the natural "dry run" surface for a new user-authored strategy before it goes live. The deploy-to-paper bridge described in Epic 2's backlog (#79) is the downstream connection point.


2. Minimum Viable Version

The narrowest loop that proves the concept end-to-end:

Instrument class: ETFs only. Alpaca's /v2/assets returns asset type metadata; the symbols service already has crude ETF detection by name. No options at MVP.

Strategy authoring: A constrained DSL, not free-form English execution. The user types a sentence; Claude parses it into a validated JSON struct:

{
  "trigger": { "metric": "price_vs_nav", "operator": "lt", "threshold_pct": -0.5 },
  "action":  { "side": "buy", "qty": 10 },
  "schedule": { "day_of_week": "Friday", "time_utc": "17:00" }
}

If Claude cannot map the input to this struct with high confidence, it returns an error and asks the user to rephrase — no ambiguous execution. The struct is human-readable and shown to the user for confirmation before saving.
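The "no ambiguous execution" gate can be enforced server-side with a validator over the parsed struct, independent of whatever Claude returns. A minimal sketch; the allowed metric/operator sets are assumptions standing in for the real MVP schema.

```python
# Minimal validator for the parsed DSL struct above. Allowed values are
# illustrative assumptions; the real schema would live with the parse endpoint.

ALLOWED_METRICS = {"price_vs_nav", "price"}
ALLOWED_OPERATORS = {"lt", "lte", "gt", "gte"}
ALLOWED_SIDES = {"buy", "sell"}

def validate_dsl(dsl: dict) -> list[str]:
    """Return a list of problems; an empty list means the struct is executable."""
    errors = []
    trig = dsl.get("trigger", {})
    if trig.get("metric") not in ALLOWED_METRICS:
        errors.append(f"unknown trigger metric: {trig.get('metric')}")
    if trig.get("operator") not in ALLOWED_OPERATORS:
        errors.append(f"unknown operator: {trig.get('operator')}")
    if not isinstance(trig.get("threshold_pct"), (int, float)):
        errors.append("threshold_pct must be numeric")
    action = dsl.get("action", {})
    if action.get("side") not in ALLOWED_SIDES:
        errors.append(f"unknown side: {action.get('side')}")
    if not (isinstance(action.get("qty"), int) and action["qty"] > 0):
        errors.append("qty must be a positive integer")
    return errors
```

Any non-empty result is surfaced to the user as a rephrase request rather than silently corrected.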

Paper trading only at MVP. No live execution ships in MVP at all; the live-mode toggle can exist as a disabled UI element. When live execution is eventually enabled, it must be a per-strategy opt-in that is deliberately buried (confirm dialog + re-authentication).

One scheduler granularity: daily check at a named UTC time, plus optional day-of-week filter. No intraday. No cron expressions exposed to users.

Strategy limit: 3 active strategies per user at MVP. Prevents runaway Claude calls and scheduler sprawl before cost and infra are understood.

Execution path: the strategy executor calls the existing POST /api/trading/orders endpoint, but with the Alpaca SDK wired in (the # TODO completed). The executor runs in a clock Heroku dyno or equivalent, wakes on schedule, evaluates the condition against live Alpaca quote data, and fires the order if the condition is met.
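The per-tick condition check itself is a small pure function over the parsed trigger and a quote snapshot. A sketch, assuming the `price_vs_nav` metric from the DSL example; the `price`/`nav` inputs stand in for whatever the Alpaca quote and NAV lookup actually return.

```python
import operator

# Sketch of the executor's condition check: compare the live ETF
# premium/discount (in percent) to the strategy's threshold.

OPS = {"lt": operator.lt, "lte": operator.le,
       "gt": operator.gt, "gte": operator.ge}

def trigger_met(trigger: dict, price: float, nav: float) -> bool:
    """True if the strategy's trigger fires for this quote snapshot."""
    if trigger["metric"] != "price_vs_nav":
        raise NotImplementedError(trigger["metric"])
    premium_pct = (price - nav) / nav * 100.0
    return OPS[trigger["operator"]](premium_pct, trigger["threshold_pct"])
```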

Human confirm: before any first execution per strategy, a digest notification is pushed to the user summarizing the pending trigger. A "confirm for this week" action is required. Subsequent auto-runs proceed without confirmation until the user pauses the strategy.

Scope estimate: 4-6 weeks for a 1-person sprint covering (a) DSL + Claude parse endpoint, (b) strategy CRUD in DB, (c) clock dyno + APScheduler integration, (d) live order wiring in trading.py, (e) confirm / digest notification flow, (f) basic Antlers UI for strategy creation and status. This assumes no options, no intraday, no live mode.


3. Risk Landscape

Regulatory — Investment Advice

If Raxx accepts a natural-language strategy description, passes it to an LLM, and executes orders on the user's behalf, the question of investment-adviser registration becomes active. The key distinction under the Investment Advisers Act of 1940 is whether the platform "provides investment advice for compensation." The user-authored framing (the LLM is a parser, not a proposer) is the strongest argument for the platform being a tool rather than an adviser. But the moment Raxx's AI suggests modifications, scores strategies, or recommends instruments, that framing weakens.

This needs a dedicated review by an attorney familiar with fintech and RIA registration. Do not scope the MVP to include any AI-generated strategy recommendations. Flag to Kristerpher: engage Matthew Crosby or a securities-law specialist before any live-trading mode ships publicly. The business-legal-researcher agent should scope this question in parallel.

Customer Harm — Hallucinated Execution

Claude could misparse a strategy description. A user who writes "buy 10 shares at a discount" and means $10 below yesterday's close could get an order fired at a $0.001 threshold if the DSL parsing goes wrong. Required mitigations:

Cost — Claude Calls per User

At MVP scope (3 strategies per user, daily named-time schedule), Claude is called exactly once at strategy creation time (the parse call). No per-tick inference.

If the confirm step also re-validates the DSL via Claude (a lightweight classification call), that is 1 call per scheduled execution window. At 3 strategies per user, each firing on one weekly (Friday) window, that is 3 × 52 = ~156 calls/user/year, or ~13 calls/user/month. At Claude 3 Haiku pricing (~$0.25/1M input tokens), a 500-token parse prompt costs roughly $0.000125 per call. At ~13 calls/month that is ~$0.002/user/month. Well under the $5 threshold.

Risk: if the confirm step is upgraded to include a market-context summary (pulling quotes, computing NAV premium/discount, summarizing market conditions), token count could balloon to 2,000-5,000 per call. Even then, Haiku cost stays at a few cents per user per month at most. Not a problem. The concern only becomes real at Sonnet-class pricing with frequent intraday re-evaluation, which MVP scope explicitly avoids.
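The per-call figures above can be sanity-checked with a one-liner (Haiku-class input pricing, ~$0.25 per 1M input tokens; output tokens ignored for simplicity):

```python
# Worked per-call cost check for the figures in this section.

PRICE_PER_INPUT_TOKEN = 0.25 / 1_000_000  # USD, Haiku-class input pricing

def parse_call_cost(input_tokens: int) -> float:
    """Input-token cost of a single Claude parse/confirm call, in USD."""
    return input_tokens * PRICE_PER_INPUT_TOKEN

# 500-token parse prompt -> ~$0.000125 per call.
# 5,000-token market-context confirm -> ~$0.00125 per call.
```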


4. Competitive Scan

Composer.trade

The most direct competitor. Composer offers a no-code strategy composer with a visual drag-and-drop logic tree; strategies execute automatically via an Alpaca broker connection. Strong execution reliability, slick UI. Gap: the authoring surface is still a visual form, not natural language. A user must learn Composer's DSL visually. An LLM-native interface that accepts "if ETF is at a discount, buy at noon Fridays" and surfaces the parsed logic for confirmation is a meaningfully lower-effort entry point.

Tradetron

India-based automated strategy marketplace. Users can buy/sell strategy templates; execution via multiple broker APIs. Strong on multi-broker support and strategy marketplace dynamics. Gap: US ETF/equity focus is secondary, onboarding is technically steep, and there is no LLM authoring layer.

TradeStation EasyLanguage

EasyLanguage is a proprietary scripting language with 30+ years of history. Sophisticated, deeply integrated with TradeStation's execution infrastructure. Gap: it is a programming language, not a natural-language interface. Target user in this feedback would not self-select into EasyLanguage without significant technical motivation.

Build Alpha

Institutional-grade strategy builder with a form-based logic composer. Strong on walk-forward testing and robustness checks. Gap: complexity-to-value ratio is high for a retail user who wants one recurring conditional order. No LLM layer.

QuantConnect

Cloud-based backtesting and live trading with a full Python SDK (LEAN engine). Most powerful platform in this category. Gap: requires Python fluency. A retail trader who cannot code is not the QuantConnect audience. The natural-language-to-strategy gap is exactly the opportunity QuantConnect leaves open.

Common gap across all five: none of them let a user type a plain-English conditional order, receive an explicit confirmation of the parsed intent, and schedule it against a paper account in under 2 minutes. That is the MVP surface.


5. Strategic Question for Kristerpher

Is natural-language AI strategy execution the headline feature of Raxx — the thing that defines what the product is and how it is positioned to users — or is it an adjacent feature that lives alongside the existing options-income-strategy roadmap without displacing it?

The answer shapes everything: pricing tier, marketing copy, investor framing, engineering priority, and which epics get resourced first. It needs a decision before any implementation cards are filed.


6. Proposed Cards (NOT Filed — Pending Headline/Adjacent Decision)

Listed in dependency order. Each is sized for one PR.

Card A: Wire Alpaca order submission in trading.py

Complete the # TODO blocks in POST /api/trading/orders to submit real market and limit orders via the Alpaca SDK. Paper mode only. This unblocks every downstream card. Estimated size: S (2-3 days).

Card B: Strategy DSL schema + Claude parse endpoint

Define the JSON struct for {trigger, action, schedule}. Add POST /api/strategies/parse — accepts a natural-language string, calls Claude Haiku, returns the parsed DSL or a structured error. Include unit tests with fixture responses (no live API calls in CI). Estimated size: M (3-5 days).

Card C: Strategy CRUD + SQLite persistence

strategies table: user_id, name, dsl_json, status (active/paused), created_at, last_executed_at. CRUD endpoints. This is the persistence layer that the scheduler reads. Estimated size: S-M (2-4 days).
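A possible shape for that table, as a sqlite3 sketch. Column names follow the card text; types, constraints, and the `init_db` helper are assumptions.

```python
import sqlite3

# Sketch of the Card C schema. Types and constraints are illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS strategies (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id          INTEGER NOT NULL,
    name             TEXT NOT NULL,
    dsl_json         TEXT NOT NULL,  -- validated DSL struct, serialized
    status           TEXT NOT NULL DEFAULT 'active'
                     CHECK (status IN ('active', 'paused')),
    created_at       TEXT NOT NULL DEFAULT (datetime('now')),
    last_executed_at TEXT
);
"""

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the strategies DB, creating the table if needed."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The CHECK constraint keeps the status column honest at the persistence layer, so the scheduler can simply `SELECT ... WHERE status = 'active'`.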

Card D: Clock dyno + APScheduler integration

Add a clock process to the Procfile. On startup, load all active strategies from DB, register APScheduler jobs keyed by schedule field. On trigger time, evaluate the condition against a live Alpaca quote and fire the order if met. Includes execution logging to DB. Estimated size: M-L (4-6 days).

Card E: Human-confirm flow + kill-switch

Before first execution: push a digest notification (in-app; email later) with the parsed plain-English confirmation. User must "confirm for this week" before the executor fires. POST /api/strategies/:id/pause endpoint. Antlers UI: one-tap pause from notification. Estimated size: M (3-5 days).

Card F: Antlers strategy creation + status UI

New "Strategies" section (or sub-section under Automation). Text input, confirmation screen showing parsed DSL, active strategy list with last-run status, pause/resume controls. No Antlers work can land before Cards B and C merge. Estimated size: M (3-5 days).

Card G: Position-size and notional safety rails

Per-execution notional cap, daily max-fire limit, automatic suspension after 3 consecutive errors, PDT check (real Alpaca account flag, not hardcoded). This is a risk-gate card, not a feature card — it should merge before or with Card D. Estimated size: S-M (2-3 days).
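The Card G checks compose naturally into a single pre-flight gate the executor calls before submitting any order. A sketch; the limit values are illustrative assumptions, not decided policy (the 3-consecutive-errors figure is from the card text).

```python
# Sketch of the Card G pre-flight gate. Limit values are assumptions.

MAX_NOTIONAL_PER_ORDER = 1_000.00   # USD, assumed per-order cap
MAX_FIRES_PER_DAY = 5               # assumed per-user daily limit
MAX_CONSECUTIVE_ERRORS = 3          # per the card text

def preflight(qty: int, price: float, fires_today: int,
              consecutive_errors: int) -> list[str]:
    """Return blocking reasons; an empty list means the order may proceed."""
    blocks = []
    if qty * price > MAX_NOTIONAL_PER_ORDER:
        blocks.append("notional cap exceeded")
    if fires_today >= MAX_FIRES_PER_DAY:
        blocks.append("daily fire limit reached")
    if consecutive_errors >= MAX_CONSECUTIVE_ERRORS:
        blocks.append("strategy auto-suspended after repeated errors")
    return blocks
```

Returning all blocking reasons (rather than the first) makes the execution log and the user-facing pause notification more useful.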

Total MVP estimate assuming serial execution with one engineer: 19-31 dev days (4-6 weeks). Parallelizable to ~3-4 weeks with two engineers owning Cards B/C in parallel with A/G.