Natural-Language AI Strategy Execution — Feedback Investigation

Date: 2026-04-29 UTC
Source: User feedback collected by Kristerpher via iMessage; screenshot at docs/research/user-feedback/2026-04-29-friend-feedback-ai-execute-strategy.png
Refs: Issue #479

1. What Does Raxx Already Have That Supports This?

Order Execution Path (Raptor)

backend_v2/api/services/trading_runtime.py — resolves paper vs. live credentials, constructs an Alpaca REST client (paper: https://paper-api.alpaca.markets, live: https://api.alpaca.markets), fetches account state. The fetch_alpaca_account() function is live-tested against Alpaca.

backend_v2/api/routes/trading.py — POST /api/trading/orders exists and validates symbol, qty, side, order type. The execution body still returns a mock order (placing real orders is out of scope here). GET /api/trading/orders and DELETE /api/trading/orders/<id> are now wired to the live broker when credentials are present (mock fallback only when credentials are absent); the previous generate_mock_orders-only TODO blocks were resolved in the 2026-05-28 orders-wiring fix. The remaining gap for a schedulable execution loop is wiring POST to fetch_alpaca_orders / Alpaca order submission, tracked separately.

alpaca_integration.py — credential resolution, mode detection (paper by default; live only when explicit live credentials are present), and readiness probes are production-quality. The integration scaffolding is real; only the order-submission wiring is missing.
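The mode-resolution rule described here and under Risk Gates (paper unless live is explicitly requested and credentialed) can be sketched as follows. This is an illustration only, not the code in alpaca_integration.py: the function name resolve_trading_mode and the idea of passing the environment in as a mapping are assumptions; the two base URLs and the environment variable names are the ones quoted in this document.

```python
from typing import Mapping, Optional, Tuple

PAPER_URL = "https://paper-api.alpaca.markets"
LIVE_URL = "https://api.alpaca.markets"


def resolve_trading_mode(requested: Optional[str], env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (mode, base_url). Hypothetical helper, not the real module API.

    Live is returned only when it is explicitly requested AND both
    credentials are present; every other case falls back to paper.
    """
    has_live_creds = bool(env.get("ALPACA_API_KEY")) and bool(env.get("ALPACA_API_SECRET"))
    if requested == "live" and has_live_creds:
        return "live", LIVE_URL
    return "paper", PAPER_URL
```

In a running process the second argument would typically be os.environ; passing it in keeps the rule testable without touching real credentials.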

Scheduler / Background-Task Infra

No application-level scheduler (APScheduler, Celery, or similar) exists in Raptor today. The background-task patterns present are:

backend_v2/api/__init__.py: a threading.Thread for one-shot symbol preload at startup — not a recurring scheduler.
backend_v2/api/routes/symbols.py: a single threading.Thread for cache refresh — also one-shot.
GitHub Actions cron at 0 8 7 * * * (nightly security scan) — durable but not usable for per-user strategy scheduling.
Heroku Procfile: only a web dyno; no worker or clock dyno defined. No Heroku Scheduler addon is in app.json.

Conclusion: there is no recurring scheduler in Raptor. One must be added. The most pragmatic MVP path is either (a) a clock dyno with APScheduler, or (b) a Heroku Scheduler addon calling a protected internal endpoint.

LLM Integration (Claude API)

The Claude API is used today exclusively for agent dispatch in the development toolchain (.claude/ directory). There is no anthropic SDK import anywhere in backend_v2/ or the frontend. Zero LLM integration exists in the product today.

For MVP, Claude's role would be translation: convert the user's natural-language description into a validated DSL struct (condition + threshold + action + schedule). This is a one-time call at strategy-creation time, not per-execution-tick. That keeps inference cost tractable (see Section 3 — Cost).

Risk Gates / Safety Rails

Current state in Raptor:

alpaca_integration.py: DEFAULT_TRADING_MODE defaults to paper. Live mode requires explicit ALPACA_API_KEY/ALPACA_API_SECRET env vars; falls back to paper if missing. This is the primary safety gate today.
trading.py: pattern_day_trader: False is hard-coded in the mock account response — PDT detection is not actually implemented against live Alpaca data.
No position-size caps, no per-day order limits, no ticker blacklist, and no kill-switch endpoint in the current codebase.

Gaps to close before an automated executor can be trusted with real money: actual PDT check, max-shares-per-order cap, daily notional limit per user, manual kill-switch (pause all strategies for a user), and a human-confirm step before first live execution per strategy.

Backtesting Infra

backend_v2/api/routes/backtest.py (932 lines) is genuinely functional. Four built-in strategies (MA crossover, RSI, Bollinger, MACD) run against real Alpaca historical bars via run_strategy_backtest(). The engine computes equity curves, trade lists, and performance metrics. The comparison harness (run_strategy_comparison) runs N strategies in one call.

The backtest engine is the natural "dry run" surface for a new user-authored strategy before it goes live. The deploy-to-paper bridge described in Epic 2's backlog (#79) is the downstream connection point.

2. Minimum Viable Version

The narrowest loop that proves the concept end-to-end:

Instrument class: ETFs only. Alpaca's /v2/assets returns asset type metadata; the symbols service already has crude ETF detection by name. No options at MVP.

Strategy authoring: A constrained DSL, not free-form English execution. The user types a sentence; Claude parses it into a validated JSON struct:

{
  "trigger": { "metric": "price_vs_nav", "operator": "lt", "threshold_pct": -0.5 },
  "action":  { "side": "buy", "qty": 10 },
  "schedule": { "day_of_week": "Friday", "time_utc": "17:00" }
}

If Claude cannot map the input to this struct with high confidence, it returns an error and asks the user to rephrase — no ambiguous execution. The struct is human-readable and shown to the user for confirmation before saving.
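A minimal sketch of the server-side validation that would sit between Claude's output and the confirmation screen, assuming the struct shape shown above. The names Strategy, DSLValidationError, and validate_dsl are hypothetical, and the allowed-value sets are assumptions (the source only shows price_vs_nav, lt, and buy); day_of_week is treated as optional because the schedule filter is described as optional.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional

# Assumed vocabularies; only "price_vs_nav", "lt", and "buy" appear in the source.
ALLOWED_METRICS = {"price_vs_nav"}
ALLOWED_OPERATORS = {"lt", "gt"}
ALLOWED_SIDES = {"buy", "sell"}
ALLOWED_DAYS = {"Monday", "Tuesday", "Wednesday", "Thursday", "Friday"}


class DSLValidationError(ValueError):
    """Raised when parsed output cannot be trusted as a strategy."""


@dataclass(frozen=True)
class Strategy:
    metric: str
    operator: str
    threshold_pct: float
    side: str
    qty: int
    time_utc: str
    day_of_week: Optional[str] = None


def validate_dsl(raw: Dict[str, Any]) -> Strategy:
    """Reject anything that is not an unambiguous, well-formed strategy."""
    try:
        trigger, action, schedule = raw["trigger"], raw["action"], raw["schedule"]
        strategy = Strategy(
            metric=trigger["metric"],
            operator=trigger["operator"],
            threshold_pct=float(trigger["threshold_pct"]),
            side=action["side"],
            qty=action["qty"],
            time_utc=schedule["time_utc"],
            day_of_week=schedule.get("day_of_week"),
        )
        datetime.strptime(strategy.time_utc, "%H:%M")  # must be a real clock time
    except (KeyError, TypeError, ValueError, AttributeError) as exc:
        raise DSLValidationError(f"malformed strategy: {exc!r}") from exc

    problems = []
    if strategy.metric not in ALLOWED_METRICS:
        problems.append("metric")
    if strategy.operator not in ALLOWED_OPERATORS:
        problems.append("operator")
    if strategy.side not in ALLOWED_SIDES:
        problems.append("side")
    if isinstance(strategy.qty, bool) or not isinstance(strategy.qty, int) or strategy.qty <= 0:
        problems.append("qty")
    if strategy.day_of_week is not None and strategy.day_of_week not in ALLOWED_DAYS:
        problems.append("day_of_week")
    if problems:
        raise DSLValidationError("invalid fields: " + ", ".join(problems))
    return strategy
```

Validating in application code rather than trusting the model's own confidence means a malformed parse fails closed: the user is asked to rephrase and nothing is saved.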

Paper trading only at MVP; there is no live execution in the MVP at all. After MVP, live execution would require a per-strategy opt-in toggle that is deliberately buried (confirm dialog + re-authentication). At MVP the toggle can exist only as a disabled UI element.

One scheduler granularity: daily check at a named UTC time, plus optional day-of-week filter. No intraday. No cron expressions exposed to users.
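The schedule check this implies is small enough to sketch. The function below is a hypothetical helper (the name is_due and its signature are assumptions) that decides whether a strategy's named UTC time and optional day-of-week filter match the current minute; it assumes the scheduler wakes at least once per minute.

```python
from datetime import datetime, timezone
from typing import Optional

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def is_due(now: datetime, time_utc: str, day_of_week: Optional[str] = None) -> bool:
    """True when `now` falls in the scheduled UTC minute (and day, if filtered)."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    # datetime.weekday() returns 0 for Monday, matching the DAYS ordering.
    if day_of_week is not None and DAYS[now_utc.weekday()] != day_of_week:
        return False
    hour, minute = (int(part) for part in time_utc.split(":"))
    return (now_utc.hour, now_utc.minute) == (hour, minute)
```

Requiring a timezone-aware datetime avoids silently interpreting server-local time as UTC.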

Strategy limit: 3 active strategies per user at MVP. Prevents runaway Claude calls and scheduler sprawl before cost and infra are understood.

Execution path: the strategy executor calls the existing POST /api/trading/orders endpoint, with Alpaca order submission wired in (the outstanding # TODO completed). The executor runs in a Heroku clock dyno or equivalent, wakes on schedule, evaluates the condition against live Alpaca quote data, and fires the order if the condition is met.
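The condition-evaluation step can be sketched as a pure function, assuming price_vs_nav means the percentage premium or discount of the traded price relative to NAV and that threshold_pct is expressed in percent. Both are interpretations of the example struct, not confirmed definitions; the names premium_discount_pct and should_fire are hypothetical, and fetching the quote and NAV is left out.

```python
import operator
from typing import Any, Dict

# Assumed operator vocabulary; only "lt" appears in the source example.
OPERATORS = {"lt": operator.lt, "gt": operator.gt}


def premium_discount_pct(price: float, nav: float) -> float:
    """Percent by which price sits above (+) or below (-) NAV."""
    if nav <= 0:
        raise ValueError("nav must be positive")
    return (price - nav) / nav * 100.0


def should_fire(trigger: Dict[str, Any], price: float, nav: float) -> bool:
    """Evaluate a price_vs_nav trigger against one quote/NAV pair."""
    if trigger["metric"] != "price_vs_nav":
        raise ValueError(f"unsupported metric: {trigger['metric']}")
    compare = OPERATORS[trigger["operator"]]
    return compare(premium_discount_pct(price, nav), trigger["threshold_pct"])
```

Keeping the evaluator free of I/O lets the same function serve the scheduled executor and a backtest dry run.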

Human confirm: before the first execution of each strategy, a digest notification is pushed to the user summarizing the pending trigger. A "confirm for this week" action is required. Subsequent auto-runs proceed without confirmation until the user pauses the strategy.
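One way to produce the plain-English summary for the digest is to render it deterministically from the saved struct instead of asking the model to describe its own parse. The sketch below assumes the struct shape from the example above; describe_strategy is a hypothetical name, and the symbol is passed separately because the example struct does not carry one.

```python
from typing import Any, Dict


def describe_strategy(symbol: str, dsl: Dict[str, Any]) -> str:
    """Render a saved strategy as one sentence for the confirm screen."""
    trigger, action, schedule = dsl["trigger"], dsl["action"], dsl["schedule"]
    pct = trigger["threshold_pct"]
    if trigger["operator"] == "lt" and pct < 0:
        condition = f"more than {abs(pct)}% below"
    elif trigger["operator"] == "gt" and pct > 0:
        condition = f"more than {pct}% above"
    else:
        # Refuse to describe combinations this template could misstate.
        raise ValueError("no plain-English template for this trigger")
    day = schedule.get("day_of_week")
    when = f"Every {day}" if day else "Every day"
    return (
        f"{when} at {schedule['time_utc']} UTC: if {symbol} trades {condition} its NAV, "
        f"{action['side']} {action['qty']} shares at market."
    )
```

Because the sentence is built from the stored values, what the user confirms is exactly what the executor will evaluate.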

Scope estimate: 4-6 weeks for a 1-person sprint covering (a) DSL + Claude parse endpoint, (b) strategy CRUD in DB, (c) clock dyno + APScheduler integration, (d) live order wiring in trading.py, (e) confirm / digest notification flow, (f) basic Antlers UI for strategy creation and status. This assumes no options, no intraday, no live mode.

3. Risk Landscape

Regulatory — Investment Advice

If Raxx accepts a natural-language strategy description, passes it to an LLM, and executes orders on the user's behalf, the question of investment-adviser registration becomes active. The key distinction under the Investment Advisers Act of 1940 is whether the platform "provides investment advice for compensation." The user-authored framing (the LLM is a parser, not a proposer) is the strongest argument for the platform being a tool rather than an adviser. But the moment Raxx's AI suggests modifications, scores strategies, or recommends instruments, that framing weakens.

This needs a dedicated review by an attorney familiar with fintech and RIA registration. Do not scope the MVP to include any AI-generated strategy recommendations. Flag to Kristerpher: engage Matthew Crosby or a securities-law specialist before any live-trading mode ships publicly. The business-legal-researcher agent should scope this question in parallel.

Customer Harm — Hallucinated Execution

Claude could misparse a strategy description. A user who writes "buy 10 shares at a discount" and means $10 below yesterday's close could get an order fired at a $0.001 threshold if the DSL parsing goes wrong. Required mitigations:

Structured output with a mandatory human-confirm screen showing the parsed DSL in plain English before saving ("Every Friday at 17:00 UTC: if SPY trades more than 0.5% below its NAV, buy 10 shares at market. Does this look right?")
Maximum order notional cap per execution (e.g., $5,000 per fire at MVP)
Daily max-fire limit per strategy (1 execution per strategy per scheduled window)
Kill-switch endpoint (POST /api/strategies/:id/pause) accessible from the Antlers UI, reachable in one tap from the daily digest notification
Automatic strategy suspension after 3 consecutive execution errors
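The notional cap, the one-fire-per-window limit, and the three-error suspension from the list above could be enforced by a pre-trade gate along these lines. This is a sketch under assumptions: StrategyState, check_rails, and record_result are hypothetical names, state is held in memory here whereas a real implementation would persist it, and the $5,000 cap is the example figure from the list.

```python
from dataclasses import dataclass
from typing import Optional

MAX_NOTIONAL_USD = 5_000.0      # example cap from the mitigation list
MAX_CONSECUTIVE_ERRORS = 3      # suspension threshold from the mitigation list


@dataclass
class StrategyState:
    paused: bool = False
    fired_this_window: bool = False
    consecutive_errors: int = 0


def check_rails(state: StrategyState, qty: int, price: float) -> Optional[str]:
    """Return a rejection reason, or None if the order may proceed."""
    if state.paused:
        return "strategy is paused"
    if state.fired_this_window:
        return "strategy already fired in this window"
    if qty * price > MAX_NOTIONAL_USD:
        return "order notional exceeds the per-execution cap"
    return None


def record_result(state: StrategyState, ok: bool) -> None:
    """Update counters after an execution attempt; suspend on repeated errors."""
    if ok:
        state.consecutive_errors = 0
        state.fired_this_window = True
        return
    state.consecutive_errors += 1
    if state.consecutive_errors >= MAX_CONSECUTIVE_ERRORS:
        state.paused = True
```

The pause flag doubles as the kill-switch state: the pause endpoint and the automatic suspension would set the same field, so the executor has a single check to honor.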

Cost — Claude Calls per User

At MVP scope (3 strategies per user, daily named-time schedule), Claude is called exactly once at strategy creation time (the parse call). No per-tick inference.

If the confirm step also re-validates the DSL via Claude (a lightweight classification call), that is 1 call per scheduled execution window. At 3 strategies per user with one Friday window per week, that is 3 × 52 = 156 calls/user/year, or 13 calls/user/month. At Claude 3 Haiku pricing (~$0.25/1M input tokens), a 500-token parse prompt costs roughly $0.000125 per call in input tokens, so 13 calls/month is about $0.0016/user/month. Even at four times that volume (52 calls/month) the cost is $0.0065/user/month. Well under the $5 threshold.

Risk: if the confirm step is upgraded to include a market-context summary (pulling quotes, computing NAV premium/discount, summarizing market conditions), token count could balloon to 2,000-5,000 per call. At that level, using Haiku at 52 calls/month, cost is still roughly $0.03-$0.07/user/month. Not a problem. The concern only becomes real at Sonnet-class pricing with frequent intraday re-evaluation, which MVP scope explicitly avoids.
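The cost arithmetic in this subsection can be reproduced with a few lines. Call volume and prompt size are parameters, so different cadences can be compared side by side. The price constant is the approximate Haiku input-token figure quoted above; output tokens are ignored, as they are in the text, and the function names are illustrative.

```python
HAIKU_INPUT_USD_PER_MTOK = 0.25  # approximate figure quoted in the text


def cost_per_call(input_tokens: int, usd_per_mtok: float = HAIKU_INPUT_USD_PER_MTOK) -> float:
    """Input-token cost of one call, in USD."""
    return input_tokens * usd_per_mtok / 1_000_000


def monthly_cost(calls_per_month: float, input_tokens: int) -> float:
    """Input-token cost per user per month, in USD."""
    return calls_per_month * cost_per_call(input_tokens)


# 13 = 3 strategies x one weekly window (3 * 52 / 12)
# 52 = four times the weekly volume
# 90 = 3 strategies x a daily window (~30 days)
for calls in (13, 52, 90):
    for tokens in (500, 2_000, 5_000):
        print(f"{calls:>3} calls/month, {tokens:>5} tokens: ${monthly_cost(calls, tokens):.4f}")
```

Every combination in this grid stays far below one dollar per user per month, which is why the cost risk only appears with larger models or intraday evaluation.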

4. Competitive Scan

Composer.trade

The most direct competitor. Composer offers a no-code strategy composer with a visual drag-and-drop logic tree; strategies execute automatically via an Alpaca broker connection. Strong execution reliability, slick UI. Gap: the authoring surface is still a visual form, not natural language. A user must learn Composer's DSL visually. An LLM-native interface that accepts "if ETF is at a discount, buy at noon Fridays" and surfaces the parsed logic for confirmation is a meaningfully lower-effort entry point.

Tradetron

India-based automated strategy marketplace. Users can buy/sell strategy templates; execution via multiple broker APIs. Strong on multi-broker support and strategy marketplace dynamics. Gap: US ETF/equity focus is secondary, onboarding is technically steep, and there is no LLM authoring layer.

TradeStation EasyLanguage

EasyLanguage is a proprietary scripting language with 30+ years of history. Sophisticated, deeply integrated with TradeStation's execution infrastructure. Gap: it is a programming language, not a natural-language interface. The target user in this feedback would not self-select into EasyLanguage without significant technical motivation.

Build Alpha

Institutional-grade strategy builder with a form-based logic composer. Strong on walk-forward testing and robustness checks. Gap: complexity-to-value ratio is high for a retail user who wants one recurring conditional order. No LLM layer.

QuantConnect

Cloud-based backtesting and live trading with a full Python SDK (LEAN engine). Most powerful platform in this category. Gap: requires Python fluency. A retail trader who cannot code is not the QuantConnect audience. The natural-language-to-strategy gap is exactly the opportunity QuantConnect leaves open.

Common gap across all five: none of them let a user type a plain-English conditional order, receive an explicit confirmation of the parsed intent, and schedule it against a paper account in under 2 minutes. That is the MVP surface.

5. Strategic Question for Kristerpher

Is natural-language AI strategy execution the headline feature of Raxx — the thing that defines what the product is and how it is positioned to users — or is it an adjacent feature that lives alongside the existing options-income-strategy roadmap without displacing it?

The answer shapes everything: pricing tier, marketing copy, investor framing, engineering priority, and which epics get resourced first. It needs a decision before any implementation cards are filed.

6. Proposed Cards (NOT Filed — Pending Headline/Adjacent Decision)

Listed in dependency order. Each is sized for one PR.

Card A: Wire Alpaca order submission in trading.py
GET /api/trading/orders and DELETE /api/trading/orders/<id> are now live (2026-05-28). Remaining work: complete POST /api/trading/orders to submit real market and limit orders via the Alpaca broker API. Paper mode only. This unblocks every downstream card. Estimated size: S (2-3 days).

Card B: Strategy DSL schema + Claude parse endpoint
Define the JSON struct for {trigger, action, schedule}. Add POST /api/strategies/parse, which accepts a natural-language string, calls Claude Haiku, and returns the parsed DSL or a structured error. Include unit tests with fixture responses (no live API calls in CI). Estimated size: M (3-5 days).
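The "fixture responses, no live API calls in CI" requirement suggests injecting the model call as a dependency. A minimal sketch follows, assuming the model is asked to reply with a bare JSON object. parse_strategy, the complete callable, and the error codes are all hypothetical, and no Anthropic SDK call is shown because the request shape is not specified in this document.

```python
import json
from typing import Any, Callable, Dict

REQUIRED_KEYS = ("trigger", "action", "schedule")


def parse_strategy(text: str, complete: Callable[[str], str]) -> Dict[str, Any]:
    """Turn a user sentence into a DSL dict or a structured error.

    `complete` is any str -> str callable: in production it would wrap the
    Claude call; in CI it is a fixture returning a canned response.
    """
    raw = complete(text)
    try:
        dsl = json.loads(raw)
    except json.JSONDecodeError:
        return {"ok": False, "error": "unparseable_model_output"}
    if not isinstance(dsl, dict) or any(key not in dsl for key in REQUIRED_KEYS):
        return {"ok": False, "error": "missing_fields"}
    return {"ok": True, "dsl": dsl}
```

A unit test then passes a lambda that returns a recorded response, which exercises the error paths without network access or API keys.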

Card C: Strategy CRUD + SQLite persistence
strategies table: user_id, name, dsl_json, status (active/paused), created_at, last_executed_at. CRUD endpoints. This is the persistence layer that the scheduler reads. Estimated size: S-M (2-4 days).
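A possible shape for the strategies table, using the columns listed for this card. The id primary key, the column types, the CHECK constraint, and the helper names are assumptions added for the sketch; the 3-active-strategies limit is the MVP limit from Section 2.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS strategies (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,  -- assumed surrogate key
    user_id          TEXT NOT NULL,
    name             TEXT NOT NULL,
    dsl_json         TEXT NOT NULL,
    status           TEXT NOT NULL DEFAULT 'active'
                     CHECK (status IN ('active', 'paused')),
    created_at       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    last_executed_at TEXT
);
"""

MAX_ACTIVE_PER_USER = 3  # MVP strategy limit


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def count_active(conn: sqlite3.Connection, user_id: str) -> int:
    row = conn.execute(
        "SELECT COUNT(*) FROM strategies WHERE user_id = ? AND status = 'active'",
        (user_id,),
    ).fetchone()
    return row[0]


def can_activate(conn: sqlite3.Connection, user_id: str) -> bool:
    """True while the user is under the active-strategy limit."""
    return count_active(conn, user_id) < MAX_ACTIVE_PER_USER
```

The scheduler would read only rows with status 'active', so pausing a strategy is a single-column update.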

Card D: Clock dyno + APScheduler integration
Add a clock process to the Procfile. On startup, load all active strategies from DB, register APScheduler jobs keyed by schedule field. On trigger time, evaluate the condition against a live Alpaca quote and fire the order if met. Includes execution logging to DB. Estimated size: M-L (4-6 days).

Card E: Human-confirm flow + kill-switch
Before first execution: push a digest notification (in-app; email later) with the parsed plain-English confirmation. User must "confirm for this week" before the executor fires. POST /api/strategies/:id/pause endpoint. Antlers UI: one-tap pause from notification. Estimated size: M (3-5 days).

Card F: Antlers strategy creation + status UI
New "Strategies" section (or sub-section under Automation). Text input, confirmation screen showing parsed DSL, active strategy list with last-run status, pause/resume controls. No Antlers work can land before Cards B and C merge. Estimated size: M (3-5 days).

Card G: Position-size and notional safety rails
Per-execution notional cap, daily max-fire limit, automatic suspension after 3 consecutive errors, PDT check (real Alpaca account flag, not hardcoded). This is a risk-gate card, not a feature card; it should merge before or with Card D. Estimated size: S-M (2-3 days).

Total MVP estimate assuming serial execution with one engineer: 19-31 dev days (4-6 weeks). Parallelizable to ~3-4 weeks with two engineers owning Cards B/C in parallel with A/G.
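The serial total is the sum of the per-card day ranges listed above, which a short check confirms. The five-working-days-per-week conversion is an assumption.

```python
# (low, high) dev-day estimates per card, as listed above.
CARD_DAYS = {
    "A": (2, 3), "B": (3, 5), "C": (2, 4), "D": (4, 6),
    "E": (3, 5), "F": (3, 5), "G": (2, 3),
}

low = sum(lo for lo, _ in CARD_DAYS.values())
high = sum(hi for _, hi in CARD_DAYS.values())
WORKING_DAYS_PER_WEEK = 5  # assumption

print(f"serial total: {low}-{high} dev days")
print(f"in weeks: {low / WORKING_DAYS_PER_WEEK:.1f}-{high / WORKING_DAYS_PER_WEEK:.1f}")
```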

7. Data-Science Research Layer

The following data-science artifacts were produced in response to this investigation. They are not implementation cards — they are the scientific substrate that implementation cards would build on.

ETF NAV Discount strategy (the concrete specimen from user feedback):

docs/data-science/strategies/etf-nav-discount/spec.md — full strategy specification including hypothesis, parameters, universe, and backtest design
docs/data-science/strategies/etf-nav-discount/data-schema.md — data models (StrategyDefinition, ConditionEvaluation, ExecutionRecord, HumanConfirmRequest, NAVRecord, ParseAttempt, backtest artifact manifest)
docs/data-science/strategies/etf-nav-discount/backtest-config.json — reproducible backtest configuration (2014–2024, SPY/IVV/GLD/TLT/HYG, in-sample 2014–2020 / OOS 2021–2024)
docs/data-science/strategies/etf-nav-discount/failure-modes.md — data failure modes (stale NAV, source offline, bond ETF iNAV divergence), signal failure modes (low trigger frequency, stress-regime accumulation), parsing failure modes, and historical regime breaks (Aug 2015, Mar 2020, Sep 2022)
docs/data-science/strategies/etf-nav-discount/model-card.md — productization handoff card with input/output schemas, compute budget, monitoring signals, and 5 open questions that must be resolved before feature-dev begins
docs/data-science/reference-impl/etf_nav_discount/ — reference Python implementation: models, NAV loader (CSV fixture + live scrape stubs), condition evaluator, backtest engine, reference entry point

NL → Strategy DSL parsing research:

docs/architecture/research/nl-strategy-dsl-parsing.md — comparison of 4 parsing approaches (LLM structured output, grammar-based, hybrid, fine-tuned); scoring matrix; system prompt design constraints; parse quality metrics; metric expansion roadmap; alignment with Raxx's "AI is opt-in adjacent" position

Data-science inventory updated:

docs/data-science/README.md — etf-nav-discount registered as a new strategy in research status with 5 open questions

Backtest status: the reference implementation is ready but the backtest has not been run on real data. Run time estimate is ~3 minutes on 10 years of daily data. The backtest can be run for research validation once OQ-3 is assigned (feature-developer or Kristerpher with Alpaca sandbox access). Productization is blocked on OQ-1 (regulatory clearance) regardless of backtest results.

8. Recommendation Summary

This section distills the preceding sections into the minimum actionable set for Kristerpher.

The decision Kristerpher must make before anything else:

Is natural-language AI strategy execution the headline feature of Raxx, or is it an adjacent feature alongside the existing options-income roadmap?

This is Section 5, restated for emphasis. Everything else flows from this.

Parallel actions that should not wait for the decision:

Dispatch business-legal-researcher to scope the investment-adviser registration question (does "LLM-as-parser, user-as-author" hold under §202(a)(11)?) and the NAV data commercial licensing question. These have long lead times and should start now.
No implementation cards should be filed until (a) Kristerpher makes the headline/adjacent call, and (b) BLR returns the regulatory scope. Both are prerequisites for knowing what to build.

What does NOT need to wait:

The reference implementation and backtest config are available. If Kristerpher or a developer wants to run the ETF NAV discount backtest for curiosity/research purposes (not productization), it can be run today on paper with Alpaca historical data. Instructions are in docs/data-science/reference-impl/etf_nav_discount/reference.py.

Competitive advantage summary:

None of the five surveyed competitors (Composer, Tradetron, TradeStation EasyLanguage, Build Alpha, QuantConnect) offers a natural-language-to-scheduled-conditional-order surface that requires no technical skills and produces a human-readable confirmation before any execution. That is the gap. Whether it is Raxx's headline or a complement is the only open question.