Natural-Language AI Strategy Execution — Feedback Investigation
Date: 2026-04-29 UTC
Source: User feedback collected by Kristerpher via iMessage; screenshot at
docs/research/user-feedback/2026-04-29-friend-feedback-ai-execute-strategy.png
Refs: Issue #479
1. What Does Raxx Already Have That Supports This?
Order Execution Path (Raptor)
backend_v2/api/services/trading_runtime.py — resolves paper vs. live credentials,
constructs an Alpaca REST client (paper: https://paper-api.alpaca.markets, live:
https://api.alpaca.markets), fetches account state. The fetch_alpaca_account()
function is live-tested against Alpaca.
backend_v2/api/routes/trading.py — POST /api/trading/orders exists and validates
symbol, qty, side, order type. The execution body still returns a mock order (placing
real orders is out of scope here). GET /api/trading/orders and
DELETE /api/trading/orders/<id> are now wired to the live broker when credentials
are present (mock fallback only when credentials are absent); the previous
generate_mock_orders-only TODO blocks were resolved in the 2026-05-28 orders-wiring
fix. The remaining gap for a schedulable execution loop is wiring POST to
fetch_alpaca_orders / Alpaca order submission, tracked separately.
alpaca_integration.py — credential resolution, mode detection (paper default with
live fallback), and readiness probes are production-quality. The integration
scaffolding is real; only the order-submission wiring is missing.
Scheduler / Background-Task Infra
No application-level scheduler (APScheduler, Celery, or similar) exists in Raptor today. The background-task patterns present are:
- backend_v2/api/__init__.py: a threading.Thread for one-shot symbol preload at startup, not a recurring scheduler.
- backend_v2/api/routes/symbols.py: a single threading.Thread for cache refresh, also one-shot.
- GitHub Actions cron at 0 8 7 * * * (nightly security scan): durable but not usable for per-user strategy scheduling.
- Heroku Procfile: only a web dyno; no worker or clock dyno defined. No Heroku Scheduler addon is in app.json.
Conclusion: there is no recurring scheduler in Raptor. One must be added. The most
pragmatic MVP path is either (a) a clock dyno with APScheduler, or (b) a Heroku
Scheduler addon calling a protected internal endpoint.
LLM Integration (Claude API)
The Claude API is used today exclusively for agent dispatch in the development
toolchain (.claude/ directory). There is no anthropic SDK import anywhere in
backend_v2/ or the frontend. Zero LLM integration exists in the product today.
For MVP, Claude's role would be translation: convert the user's natural-language
description into a validated DSL struct (condition + threshold + action + schedule).
This is a one-time call at strategy-creation time, not per-execution-tick. That keeps
inference cost tractable (see Section 3 — Cost).
Risk Gates / Safety Rails
Current state in Raptor:
- alpaca_integration.py: DEFAULT_TRADING_MODE defaults to paper. Live mode requires explicit ALPACA_API_KEY / ALPACA_API_SECRET env vars; falls back to paper if missing. This is the primary safety gate today.
- trading.py: pattern_day_trader: False is hard-coded in the mock account response; PDT detection is not actually implemented against live Alpaca data.
- No position-size caps, no per-day order limits, no ticker blacklist, and no kill-switch endpoint in the current codebase.
Gaps to close before an automated executor can be trusted with real money: actual PDT check, max-shares-per-order cap, daily notional limit per user, manual kill-switch (pause all strategies for a user), and a human-confirm step before first live execution per strategy.
Backtesting Infra
backend_v2/api/routes/backtest.py (932 lines) is genuinely functional. Four
built-in strategies (MA crossover, RSI, Bollinger, MACD) run against real Alpaca
historical bars via run_strategy_backtest(). The engine computes equity curves,
trade lists, and performance metrics. The comparison harness (run_strategy_comparison)
runs N strategies in one call.
The backtest engine is the natural "dry run" surface for a new user-authored strategy
before it goes live. The deploy-to-paper bridge described in Epic 2's backlog
(#79) is the downstream connection point.
2. Minimum Viable Version
The narrowest loop that proves the concept end-to-end:
Instrument class: ETFs only. Alpaca's /v2/assets returns asset type metadata;
the symbols service already has crude ETF detection by name. No options at MVP.
Strategy authoring: A constrained DSL, not free-form English execution. The user types a sentence; Claude parses it into a validated JSON struct:
{
"trigger": { "metric": "price_vs_nav", "operator": "lt", "threshold_pct": -0.5 },
"action": { "side": "buy", "qty": 10 },
"schedule": { "day_of_week": "Friday", "time_utc": "17:00" }
}
If Claude cannot map the input to this struct with high confidence, it returns an error and asks the user to rephrase — no ambiguous execution. The struct is human-readable and shown to the user for confirmation before saving.
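The "no ambiguous execution" rule can be enforced in code on the server side, independent of how confident the model claims to be. The following is a minimal sketch, assuming the struct shape shown above; the names `validate_dsl`, `Strategy`, and `DSLValidationError` are hypothetical, and the allowed-value tuples are assumptions (only `price_vs_nav`, `lt`, `buy`, and `Friday` appear in the example).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Allowed values are assumptions for illustration, not a confirmed schema.
ALLOWED_METRICS = ("price_vs_nav",)
ALLOWED_OPERATORS = ("lt", "gt")
ALLOWED_SIDES = ("buy", "sell")
ALLOWED_DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")


class DSLValidationError(ValueError):
    """The parsed struct does not match the expected shape."""


@dataclass(frozen=True)
class Strategy:
    metric: str
    operator: str
    threshold_pct: float
    side: str
    qty: int
    day_of_week: Optional[str]  # optional day-of-week filter
    time_utc: str


def validate_dsl(raw: dict) -> Strategy:
    """Return a Strategy, or raise DSLValidationError on any doubt."""
    try:
        trigger, action, schedule = raw["trigger"], raw["action"], raw["schedule"]
        strategy = Strategy(
            metric=trigger["metric"],
            operator=trigger["operator"],
            threshold_pct=float(trigger["threshold_pct"]),
            side=action["side"],
            qty=action["qty"],
            day_of_week=schedule.get("day_of_week"),
            time_utc=schedule["time_utc"],
        )
        datetime.strptime(strategy.time_utc, "%H:%M")  # rejects e.g. "25:00"
    except (KeyError, TypeError, ValueError, AttributeError) as exc:
        raise DSLValidationError(f"malformed struct: {exc!r}") from exc

    qty_ok = (
        isinstance(strategy.qty, int)
        and not isinstance(strategy.qty, bool)
        and strategy.qty > 0
    )
    checks = [
        (strategy.metric in ALLOWED_METRICS, "unknown metric"),
        (strategy.operator in ALLOWED_OPERATORS, "unknown operator"),
        (strategy.side in ALLOWED_SIDES, "unknown side"),
        (qty_ok, "qty must be a positive integer"),
        (strategy.day_of_week is None or strategy.day_of_week in ALLOWED_DAYS,
         "unknown day_of_week"),
    ]
    for ok, message in checks:
        if not ok:
            raise DSLValidationError(message)
    return strategy
```

Any `DSLValidationError` would map to the "ask the user to rephrase" response; nothing is saved unless validation passes.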
Paper trading only at MVP, with no live execution at all. Post-MVP, live execution would require a per-strategy opt-in toggle that is deliberately buried (confirm dialog + re-authentication); at MVP the toggle can exist only as a disabled UI element.
One scheduler granularity: daily check at a named UTC time, plus optional day-of-week filter. No intraday. No cron expressions exposed to users.
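A daily named-time schedule with an optional day-of-week filter is simple enough to compute without a cron parser. The sketch below uses only the standard library; `next_run` is a hypothetical helper, and it assumes `now` is a timezone-aware UTC datetime and that `time_utc` is formatted as HH:MM.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def next_run(now: datetime, time_utc: str,
             day_of_week: Optional[str] = None) -> datetime:
    """Return the first scheduled instant strictly after `now` (UTC)."""
    hour, minute = (int(part) for part in time_utc.split(":"))
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed; start from tomorrow.
        candidate += timedelta(days=1)
    if day_of_week is not None:
        # Roll forward to the requested weekday (0 days if already on it).
        target = WEEKDAYS.index(day_of_week)
        candidate += timedelta(days=(target - candidate.weekday()) % 7)
    return candidate


if __name__ == "__main__":
    now = datetime(2026, 4, 29, 12, 0, tzinfo=timezone.utc)
    print(next_run(now, "17:00", "Friday").isoformat())
```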
Strategy limit: 3 active strategies per user at MVP. Prevents runaway Claude calls and scheduler sprawl before cost and infra are understood.
Execution path: the strategy executor calls the existing POST /api/trading/orders
endpoint, but with the Alpaca SDK wired in (the # TODO completed). The executor
runs in a clock Heroku dyno or equivalent, wakes on schedule, evaluates the
condition against live Alpaca quote data, and fires the order if the condition is met.
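The evaluate-then-fire step might look roughly like the sketch below. It assumes `threshold_pct` is expressed in percentage points (so -0.5 means 0.5% below NAV) and that the symbol is supplied alongside the struct, since the example struct carries no symbol field. The quote source, NAV source, and order submitter are injected as plain callables; `get_price`, `get_nav`, and `submit_order` are hypothetical placeholders, not existing Raptor or Alpaca functions.

```python
import operator
from typing import Callable

COMPARATORS = {"lt": operator.lt, "gt": operator.gt}


def premium_discount_pct(price: float, nav: float) -> float:
    """Premium (+) or discount (-) of price relative to NAV, in percent."""
    if nav <= 0:
        raise ValueError("NAV must be positive")
    return (price - nav) / nav * 100.0


def should_fire(trigger: dict, price: float, nav: float) -> bool:
    """Evaluate a price_vs_nav trigger against the current price and NAV."""
    if trigger["metric"] != "price_vs_nav":
        raise ValueError(f"unsupported metric: {trigger['metric']}")
    compare = COMPARATORS[trigger["operator"]]
    return compare(premium_discount_pct(price, nav), trigger["threshold_pct"])


def run_tick(strategy: dict, symbol: str,
             get_price: Callable[[str], float],
             get_nav: Callable[[str], float],
             submit_order: Callable[..., object]) -> bool:
    """One scheduled wake-up: fire the order only if the condition holds."""
    if not should_fire(strategy["trigger"], get_price(symbol), get_nav(symbol)):
        return False
    submit_order(symbol=symbol, **strategy["action"])
    return True
```

Injecting the three callables keeps the executor testable without broker credentials and leaves the choice of NAV data source open.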
Human confirm: before any first execution per strategy, a digest notification is pushed to the user summarizing the pending trigger. A "confirm for this week" action is required. Subsequent auto-runs proceed without confirmation until the user pauses the strategy.
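The digest text can be generated deterministically from the saved struct rather than by the model, so what the user confirms is exactly what the executor will run. A minimal sketch, assuming the struct shape above; `describe` is a hypothetical helper, and the symbol is passed separately because the example struct does not include one.

```python
def describe(dsl: dict, symbol: str) -> str:
    """Render a parsed strategy struct as a plain-English confirmation."""
    trigger, action, schedule = dsl["trigger"], dsl["action"], dsl["schedule"]
    direction = {"lt": "below", "gt": "above"}[trigger["operator"]]
    day = schedule.get("day_of_week")
    when = f"Every {day}" if day else "Every day"
    return (
        f"{when} at {schedule['time_utc']} UTC: if {symbol} trades more than "
        f"{abs(trigger['threshold_pct']):g}% {direction} its NAV, "
        f"{action['side']} {action['qty']} shares at market. "
        "Does this look right?"
    )


if __name__ == "__main__":
    example = {
        "trigger": {"metric": "price_vs_nav", "operator": "lt", "threshold_pct": -0.5},
        "action": {"side": "buy", "qty": 10},
        "schedule": {"day_of_week": "Friday", "time_utc": "17:00"},
    }
    print(describe(example, "SPY"))
```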
Scope estimate: 4-6 weeks for a 1-person sprint covering (a) DSL + Claude parse
endpoint, (b) strategy CRUD in DB, (c) clock dyno + APScheduler integration,
(d) live order wiring in trading.py, (e) confirm / digest notification flow,
(f) basic Antlers UI for strategy creation and status. This assumes no options,
no intraday, no live mode.
3. Risk Landscape
Regulatory — Investment Advice
If Raxx accepts a natural-language strategy description, passes it to an LLM, and executes orders on the user's behalf, the question of investment-adviser registration becomes active. The key distinction under the Investment Advisers Act of 1940 is whether the platform "provides investment advice for compensation." The user-authored framing (the LLM is a parser, not a proposer) is the strongest argument for the platform being a tool rather than an adviser. But the moment Raxx's AI suggests modifications, scores strategies, or recommends instruments, that framing weakens.
This needs a dedicated review by an attorney familiar with fintech and RIA
registration. Do not scope the MVP to include any AI-generated strategy
recommendations. Flag to Kristerpher: engage Matthew Crosby or a securities-law
specialist before any live-trading mode ships publicly. The business-legal-researcher
agent should scope this question in parallel.
Customer Harm — Hallucinated Execution
Claude could misparse a strategy description. A user who writes "buy 10 shares at a discount" and means $10 below yesterday's close could get an order fired at a $0.001 threshold if the DSL parsing goes wrong. Required mitigations:
- Structured output with a mandatory human-confirm screen showing the parsed DSL in plain English before saving ("Every Friday at 17:00 UTC: if SPY trades more than 0.5% below its NAV, buy 10 shares at market. Does this look right?")
- Maximum order notional cap per execution (e.g., $5,000 per fire at MVP)
- Daily max-fire limit per strategy (1 execution per strategy per scheduled window)
- Kill-switch endpoint (POST /api/strategies/:id/pause) accessible from the Antlers UI, reachable in one tap from the daily digest notification
- Automatic strategy suspension after 3 consecutive execution errors
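Three of the mitigations above (notional cap, one fire per window, suspension after 3 consecutive errors) reduce to a small pre-flight gate plus a result recorder. The sketch below is illustrative only: `StrategyState`, `may_fire`, and `record_result` are hypothetical names, state is held in memory rather than in the database, and the $5,000 cap is the example figure from the list.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

MAX_NOTIONAL_USD = 5_000.0   # example cap from the mitigation list
MAX_CONSECUTIVE_ERRORS = 3


@dataclass
class StrategyState:
    paused: bool = False
    consecutive_errors: int = 0
    fired_windows: Set[str] = field(default_factory=set)


def may_fire(state: StrategyState, window_id: str,
             qty: int, price: float) -> Tuple[bool, str]:
    """Pre-flight gate: return (allowed, reason) before submitting an order."""
    if state.paused:
        return False, "paused"
    if window_id in state.fired_windows:
        return False, "already fired in this window"
    if qty * price > MAX_NOTIONAL_USD:
        return False, "notional cap exceeded"
    return True, "ok"


def record_result(state: StrategyState, window_id: str, ok: bool) -> None:
    """Update counters after an attempt; suspend after repeated errors."""
    if ok:
        state.fired_windows.add(window_id)
        state.consecutive_errors = 0
        return
    state.consecutive_errors += 1
    if state.consecutive_errors >= MAX_CONSECUTIVE_ERRORS:
        state.paused = True  # same effect as the manual kill-switch
```

Here `window_id` is any stable label for the scheduled window (for example the ISO date of the run), which is what makes the one-fire-per-window rule idempotent across retries.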
Cost — Claude Calls per User
At MVP scope (3 strategies per user, daily named-time schedule), Claude is called exactly once at strategy creation time (the parse call). No per-tick inference.
If the confirm step also re-validates the DSL via Claude (a lightweight classification call), that adds 1 call per scheduled execution window. With 3 strategies per user on a weekly (Friday) schedule, that is 3 × 52 = 156 calls/user/year, or 13 calls/user/month. At Claude 3 Haiku pricing (~$0.25/1M input tokens), a 500-token parse prompt costs roughly $0.000125 per call in input tokens, so 13 calls/month is about $0.0016/user/month. Even a daily schedule (3 strategies × ~30 windows = ~90 calls/month) is about $0.011/user/month. Well under the $5 threshold.
Risk: if the confirm step is upgraded to include a market-context summary (pulling quotes, computing NAV premium/discount, summarizing market conditions), token count could balloon to 2,000-5,000 per call. At that level Haiku costs $0.0005-$0.00125 per call, which is roughly $0.007-$0.016/user/month at 13 calls and about $0.05-$0.11/user/month at ~90 calls. Not a problem. The concern only becomes real at Sonnet-class pricing with frequent intraday re-evaluation, which MVP scope explicitly avoids.
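The cost arithmetic is easy to re-run under different assumptions. The sketch below leaves call volume and prompt size as parameters; it counts input tokens only at the ~$0.25/1M Haiku rate quoted above (output tokens are ignored, which is an assumption of this sketch), and the function names are hypothetical.

```python
HAIKU_INPUT_USD_PER_MTOK = 0.25  # rate quoted in the text; input tokens only


def cost_per_call(input_tokens: int,
                  usd_per_mtok: float = HAIKU_INPUT_USD_PER_MTOK) -> float:
    """Input-token cost of a single call, in USD."""
    return input_tokens * usd_per_mtok / 1_000_000


def monthly_cost(calls_per_month: float, input_tokens: int) -> float:
    """Input-token cost per user per month, in USD."""
    return calls_per_month * cost_per_call(input_tokens)


if __name__ == "__main__":
    weekly_calls = 3 * 52 / 12  # 3 strategies, one weekly window each
    for calls in (weekly_calls, 4 * weekly_calls):
        for tokens in (500, 2_000, 5_000):
            print(f"{calls:5.1f} calls/month, {tokens:5d} tokens: "
                  f"${monthly_cost(calls, tokens):.4f}/user/month")
```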
4. Competitive Scan
Composer.trade
The most direct competitor. Composer offers a no-code strategy composer with a visual drag-and-drop logic tree; strategies execute automatically via an Alpaca broker connection. Strong execution reliability, slick UI. Gap: the authoring surface is still a visual form, not natural language. A user must learn Composer's DSL visually. An LLM-native interface that accepts "if ETF is at a discount, buy at noon Fridays" and surfaces the parsed logic for confirmation is a meaningfully lower-effort entry point.
Tradetron
India-based automated strategy marketplace. Users can buy/sell strategy templates; execution via multiple broker APIs. Strong on multi-broker support and strategy marketplace dynamics. Gap: US ETF/equity focus is secondary, onboarding is technically steep, and there is no LLM authoring layer.
TradeStation EasyLanguage
EasyLanguage is a proprietary scripting language with 30+ years of history. Sophisticated, deeply integrated with TradeStation's execution infrastructure. Gap: it is a programming language, not a natural-language interface. Target user in this feedback would not self-select into EasyLanguage without significant technical motivation.
Build Alpha
Institutional-grade strategy builder with a form-based logic composer. Strong on walk-forward testing and robustness checks. Gap: complexity-to-value ratio is high for a retail user who wants one recurring conditional order. No LLM layer.
QuantConnect
Cloud-based backtesting and live trading with a full Python SDK (LEAN engine). Most powerful platform in this category. Gap: requires Python fluency. A retail trader who cannot code is not the QuantConnect audience. The natural-language-to-strategy gap is exactly the opportunity QuantConnect leaves open.
Common gap across all five: none of them let a user type a plain-English conditional order, receive an explicit confirmation of the parsed intent, and schedule it against a paper account in under 2 minutes. That is the MVP surface.
5. Strategic Question for Kristerpher
Is natural-language AI strategy execution the headline feature of Raxx — the thing that defines what the product is and how it is positioned to users — or is it an adjacent feature that lives alongside the existing options-income-strategy roadmap without displacing it?
The answer shapes everything: pricing tier, marketing copy, investor framing, engineering priority, and which epics get resourced first. It needs a decision before any implementation cards are filed.
6. Proposed Cards (NOT Filed — Pending Headline/Adjacent Decision)
Listed in dependency order. Each is sized for one PR.
Card A: Wire Alpaca order submission in trading.py
GET /api/trading/orders and DELETE /api/trading/orders/<id> are now live
(2026-05-28). Remaining work: complete POST /api/trading/orders to submit real
market and limit orders via the Alpaca broker API. Paper mode only. This unblocks
every downstream card. Estimated size: S (2-3 days).
Card B: Strategy DSL schema + Claude parse endpoint
Define the JSON struct for {trigger, action, schedule}. Add
POST /api/strategies/parse — accepts a natural-language string, calls Claude
Haiku, returns the parsed DSL or a structured error. Include unit tests with fixture
responses (no live API calls in CI). Estimated size: M (3-5 days).
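One way to keep CI free of live API calls is to inject the model call as a plain function, so tests can pass a canned fixture instead. The sketch below assumes that design; `parse_strategy`, the `complete` parameter, and the error codes are hypothetical, and no Anthropic SDK call is shown.

```python
import json
from typing import Callable

REQUIRED_KEYS = ("trigger", "action", "schedule")


def parse_strategy(text: str, complete: Callable[[str], str]) -> dict:
    """Turn user text into a DSL dict or a structured error.

    `complete` sends the prompt to the model and returns its raw text reply;
    production code would wrap the real client, tests pass a fake.
    """
    raw = complete(text)
    try:
        dsl = json.loads(raw)
    except json.JSONDecodeError:
        return {"ok": False, "error": "unparseable_model_output"}
    if not isinstance(dsl, dict):
        return {"ok": False, "error": "unparseable_model_output"}
    missing = [key for key in REQUIRED_KEYS if key not in dsl]
    if missing:
        return {"ok": False, "error": "missing_fields", "fields": missing}
    return {"ok": True, "dsl": dsl}


# Fixture standing in for a recorded model reply.
FIXTURE_REPLY = json.dumps({
    "trigger": {"metric": "price_vs_nav", "operator": "lt", "threshold_pct": -0.5},
    "action": {"side": "buy", "qty": 10},
    "schedule": {"day_of_week": "Friday", "time_utc": "17:00"},
})


def test_parse_uses_fixture() -> None:
    result = parse_strategy("buy 10 at a discount on Fridays",
                            lambda _prompt: FIXTURE_REPLY)
    assert result["ok"] is True
    assert result["dsl"]["action"]["qty"] == 10
```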
Card C: Strategy CRUD + SQLite persistence
strategies table: user_id, name, dsl_json, status (active/paused), created_at,
last_executed_at. CRUD endpoints. This is the persistence layer that the scheduler
reads. Estimated size: S-M (2-4 days).
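A possible shape for this table, using the standard-library sqlite3 module, is sketched below. The column list follows the card; the `id` primary key, column types, timestamp format, and the helper names are assumptions, and the 3-active-strategy limit from Section 2 is enforced in the insert helper.

```python
import json
import sqlite3

MAX_ACTIVE_PER_USER = 3  # MVP strategy limit from Section 2

SCHEMA = """
CREATE TABLE IF NOT EXISTS strategies (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id          TEXT NOT NULL,
    name             TEXT NOT NULL,
    dsl_json         TEXT NOT NULL,
    status           TEXT NOT NULL DEFAULT 'active'
                     CHECK (status IN ('active', 'paused')),
    created_at       TEXT NOT NULL
                     DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now')),
    last_executed_at TEXT
);
"""


def create_strategy(conn: sqlite3.Connection, user_id: str,
                    name: str, dsl: dict) -> int:
    """Insert an active strategy, refusing to exceed the per-user limit."""
    (active,) = conn.execute(
        "SELECT COUNT(*) FROM strategies WHERE user_id = ? AND status = 'active'",
        (user_id,),
    ).fetchone()
    if active >= MAX_ACTIVE_PER_USER:
        raise ValueError("active strategy limit reached")
    cur = conn.execute(
        "INSERT INTO strategies (user_id, name, dsl_json) VALUES (?, ?, ?)",
        (user_id, name, json.dumps(dsl)),
    )
    conn.commit()
    return cur.lastrowid


def load_active(conn: sqlite3.Connection) -> list:
    """Rows the scheduler would read at startup: (id, user_id, dsl dict)."""
    rows = conn.execute(
        "SELECT id, user_id, dsl_json FROM strategies WHERE status = 'active'"
    ).fetchall()
    return [(row[0], row[1], json.loads(row[2])) for row in rows]
```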
Card D: Clock dyno + APScheduler integration
Add a clock process to the Procfile. On startup, load all active strategies from
DB, register APScheduler jobs keyed by schedule field. On trigger time, evaluate
the condition against a live Alpaca quote and fire the order if met. Includes
execution logging to DB. Estimated size: M-L (4-6 days).
Card E: Human-confirm flow + kill-switch
Before first execution: push a digest notification (in-app; email later) with the
parsed plain-English confirmation. User must "confirm for this week" before the
executor fires. POST /api/strategies/:id/pause endpoint. Antlers UI: one-tap
pause from notification. Estimated size: M (3-5 days).
Card F: Antlers strategy creation + status UI
New "Strategies" section (or sub-section under Automation). Text input, confirmation screen showing parsed DSL, active strategy list with last-run status, pause/resume controls. No Antlers work can land before Cards B and C merge. Estimated size: M (3-5 days).
Card G: Position-size and notional safety rails
Per-execution notional cap, daily max-fire limit, automatic suspension after 3 consecutive errors, PDT check (real Alpaca account flag, not hardcoded). This is a risk-gate card, not a feature card; it should merge before or with Card D. Estimated size: S-M (2-3 days).
Total MVP estimate assuming serial execution with one engineer: 19-31 dev days (4-6 weeks). Parallelizable to ~3-4 weeks with two engineers owning Cards B/C in parallel with A/G.
7. Data-Science Research Layer
The following data-science artifacts were produced in response to this investigation. They are not implementation cards — they are the scientific substrate that implementation cards would build on.
ETF NAV Discount strategy (the concrete specimen from user feedback):
- docs/data-science/strategies/etf-nav-discount/spec.md: full strategy specification including hypothesis, parameters, universe, and backtest design
- docs/data-science/strategies/etf-nav-discount/data-schema.md: data models (StrategyDefinition, ConditionEvaluation, ExecutionRecord, HumanConfirmRequest, NAVRecord, ParseAttempt, backtest artifact manifest)
- docs/data-science/strategies/etf-nav-discount/backtest-config.json: reproducible backtest configuration (2014–2024, SPY/IVV/GLD/TLT/HYG, in-sample 2014–2020 / OOS 2021–2024)
- docs/data-science/strategies/etf-nav-discount/failure-modes.md: data failure modes (stale NAV, source offline, bond ETF iNAV divergence), signal failure modes (low trigger frequency, stress-regime accumulation), parsing failure modes, and historical regime breaks (Aug 2015, Mar 2020, Sep 2022)
- docs/data-science/strategies/etf-nav-discount/model-card.md: productization handoff card with input/output schemas, compute budget, monitoring signals, and 5 open questions that must be resolved before feature-dev begins
- docs/data-science/reference-impl/etf_nav_discount/: reference Python implementation covering models, NAV loader (CSV fixture + live scrape stubs), condition evaluator, backtest engine, and reference entry point
NL → Strategy DSL parsing research:
- docs/architecture/research/nl-strategy-dsl-parsing.md: comparison of 4 parsing approaches (LLM structured output, grammar-based, hybrid, fine-tuned); scoring matrix; system prompt design constraints; parse quality metrics; metric expansion roadmap; alignment with Raxx's "AI is opt-in adjacent" position
Data-science inventory updated:
- docs/data-science/README.md: etf-nav-discount registered as a new strategy in research status with 5 open questions
Backtest status: the reference implementation is ready but the backtest has not been run on real data. Run time estimate is ~3 minutes on 10 years of daily data. The backtest can be run for research validation once OQ-3 is assigned (feature-developer or Kristerpher with Alpaca sandbox access). Productization is blocked on OQ-1 (regulatory clearance) regardless of backtest results.
8. Recommendation Summary
This section distills the preceding sections into the minimum actionable set for Kristerpher.
The decision Kristerpher must make before anything else:
Is natural-language AI strategy execution the headline feature of Raxx, or is it an adjacent feature alongside the existing options-income roadmap?
This is Section 5, restated for emphasis. Everything else flows from this.
Parallel actions that should not wait for the decision:
- Dispatch business-legal-researcher to scope the investment-adviser registration question (does "LLM-as-parser, user-as-author" hold under §202(a)(11)?) and the NAV data commercial licensing question. These have long lead times and should start now.
- No implementation cards should be filed until (a) Kristerpher makes the headline/adjacent call, and (b) BLR returns the regulatory scope. Both are prerequisites for knowing what to build.
What does NOT need to wait:
- The reference implementation and backtest config are available. If Kristerpher
or a developer wants to run the ETF NAV discount backtest for curiosity/research
purposes (not productization), it can be run today on paper with Alpaca historical
data. Instructions are in
docs/data-science/reference-impl/etf_nav_discount/reference.py.
Competitive advantage summary:
None of the five surveyed competitors (Composer, Tradetron, TradeStation EasyLanguage, Build Alpha, QuantConnect) offers a natural-language-to-scheduled-conditional-order surface that requires no technical skills and produces a human-readable confirmation before any execution. That is the gap. Whether it is Raxx's headline or a complement is the only open question.