ETF NAV Discount Strategy — Failure Modes
Strategy ID: etf-nav-discount
Version: 0.1 (research)
1. Data Failure Modes
FM-D1: Stale NAV Data
What happens: the NAV record used at evaluation time is more than nav_lag_days
old (e.g., after a long weekend or a data provider failure). The discount is then
computed against a stale denominator, which can trigger an order on a false discount.
Detection: ConditionEvaluation.data_quality_flags includes stale_nav when
nav_lag_hours > nav_lag_days * 24.
Mitigation: abort the evaluation (do not fire the order) when stale_nav is
flagged. Emit an alert. Do not silently skip — log the missed window.
Severity: HIGH. A stale NAV can be significantly wrong during market stress.
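The detection rule above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function name and its parameters are hypothetical, and only stale_nav, nav_lag_hours, and nav_lag_days come from the text.

```python
from datetime import datetime, timezone


def nav_data_quality_flags(nav_as_of: datetime, evaluated_at: datetime,
                           nav_lag_days: int) -> list[str]:
    """Return data-quality flags for a NAV record (hypothetical helper)."""
    # Age of the NAV record at evaluation time, in hours.
    nav_lag_hours = (evaluated_at - nav_as_of).total_seconds() / 3600
    flags = []
    if nav_lag_hours > nav_lag_days * 24:
        flags.append("stale_nav")
    return flags
```

A caller would abort the evaluation, emit an alert, and log the missed window whenever the returned list contains stale_nav.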
FM-D2: NAV Source Goes Offline
What happens: fund family websites (SPDR, iShares) go offline or change their HTML structure, breaking the NAV scraper.
Detection: NAVRecord has no entry for yesterday's date at evaluation time.
Mitigation: multi-source fallback (try ETF.com if primary fails). If all sources fail, abort evaluation and alert. Track source reliability over time.
Severity: HIGH. Strategy cannot execute safely without NAV data.
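One possible shape for the multi-source fallback is sketched below. The fetcher callables, the source names, and the return tuple are assumptions made for illustration; the real scrapers are not described in this document.

```python
from typing import Callable, Optional

# A fetcher takes a ticker and returns a NAV, or None if it has no value.
NavFetcher = Callable[[str], Optional[float]]


def fetch_nav_with_fallback(ticker: str, sources: list[tuple[str, NavFetcher]]):
    """Try each source in order; return (source_name, nav, failed_sources)."""
    failures = []
    for name, fetch in sources:
        try:
            nav = fetch(ticker)
        except Exception:  # e.g. changed HTML structure or a network error
            nav = None
        if nav is not None:
            return name, nav, failures
        failures.append(name)
    # Every source failed: the caller should abort the evaluation and alert.
    return None, None, failures
```

Returning the list of failed sources on every call gives the caller what it needs to track source reliability over time.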
FM-D3: Bond ETF iNAV Divergence
What happens: for bond ETFs (TLT, HYG), the intraday iNAV lags the true fair value because underlying bond prices update slowly. During market stress, the published NAV may be hours stale at the time of evaluation, making discounts appear larger than they really are.
Detection: compare iNAV vs. prior-day NAV; flag if divergence exceeds 1.5%.
Mitigation: apply a wider nav_lag_days threshold for bond ETFs, or exclude
bond ETFs from intraday evaluation. In MVP, use end-of-day NAV for all tickers
(avoids this entirely).
Severity: MEDIUM for MVP (end-of-day NAV). HIGH if real-time iNAV is added.
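If real-time iNAV is added later, the divergence check could look something like this. The function name is hypothetical, and the divergence is assumed to be measured as an absolute percentage of the prior-day NAV, which the text does not specify.

```python
def inav_divergence_flag(inav: float, prior_day_nav: float,
                         max_divergence_pct: float = 1.5) -> bool:
    """True when iNAV differs from the prior-day NAV by more than the limit."""
    divergence_pct = abs(inav - prior_day_nav) / prior_day_nav * 100.0
    return divergence_pct > max_divergence_pct
```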
FM-D4: Alpaca Quote Stale or Unavailable
What happens: the real-time quote fetch from Alpaca returns a stale quote or fails outright (market closed, network error, API outage).
Detection: quote timestamp more than 30 minutes before evaluation time.
Mitigation: abort evaluation, flag quote_stale, emit alert.
Severity: HIGH. Market price is the other half of the discount computation.
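A minimal sketch of the staleness check, assuming the quote timestamp has already been extracted from the Alpaca response as a timezone-aware datetime (or None when the fetch failed). The function and constant names are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_QUOTE_AGE = timedelta(minutes=30)  # threshold from the detection rule above


def quote_is_stale(quote_ts: Optional[datetime], evaluated_at: datetime) -> bool:
    """Treat a missing quote the same way as a stale one."""
    if quote_ts is None:
        return True
    return evaluated_at - quote_ts > MAX_QUOTE_AGE
```

When this returns True, the evaluation would be aborted with the quote_stale flag set.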
2. Signal Failure Modes
FM-S1: Signal Fires Too Infrequently
What happens: for liquid ETFs (SPY, IVV), the AP arbitrage mechanism is so efficient that the -0.5% threshold is almost never breached in normal markets. The strategy accumulates no positions, and the user sees no activity.
Evidence from literature: SPY discount/premium is typically within ±0.1% on normal trading days. Discounts >0.5% occur primarily during market stress (March 2020, August 2015, etc.).
Impact: user perceives the strategy as broken when it is actually correct. UI must communicate "no condition met this week" clearly to distinguish from errors.
Mitigation: expose signal frequency in the strategy status UI ("0 of 52 evaluations triggered this year"). Recommend lower thresholds (e.g., -0.25%) for higher-liquidity ETFs, or switching to bond ETFs where discounts are wider.
Severity: LOW for correctness; HIGH for user experience / retention.
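The status line suggested in the mitigation could be produced along these lines. The sketch assumes the discount is defined as (price - NAV) / NAV, expressed in percent, with negative values meaning a discount; both function names are hypothetical.

```python
def discount_pct(price: float, nav: float) -> float:
    """Premium/discount to NAV in percent; negative means a discount."""
    return (price - nav) / nav * 100.0


def trigger_summary(observations: list[tuple[float, float]],
                    threshold_pct: float = -0.5) -> str:
    """Summarize how many (price, nav) evaluations met the discount threshold."""
    fired = sum(1 for price, nav in observations
                if discount_pct(price, nav) <= threshold_pct)
    return f"{fired} of {len(observations)} evaluations triggered"
```

Running the same observations through a looser threshold (for example -0.25) shows how much more often the condition would have fired, which is useful when recommending a lower threshold for liquid ETFs.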
FM-S2: Signal Fires Exclusively During Stress
What happens: the strategy only triggers during high-volatility, market-stress periods (when discounts exceed 0.5%). These are exactly the periods when buying is most uncomfortable for users and when deploying additional capital carries the most risk.
Impact: the strategy executes correctly per its rules, but the timing feels counterintuitive and may coincide with the user's maximum stress.
Mitigation: this is by design — mean-reversion strategies are contrarian. The user confirmation flow and plain-English notification ("SPY is trading 0.7% below NAV right now — your strategy will buy 10 shares at market") helps the user understand the rationale at execution time.
Severity: MEDIUM. This is a user-expectation issue, not a technical fault.
FM-S3: Regime Dependency — Sustained Discounts
What happens: in extreme stress regimes (March 2020, 2008 credit crisis), ETF discounts can persist for days or weeks as AP desks halt creations. The mean-reversion hypothesis breaks down; the strategy may buy repeatedly into a falling position.
Evidence: Bond ETFs showed discounts of 3-5% during March 2020 liquidity crisis (Ben-David, Franzoni, Moussawi, 2020 NBER Working Paper 27573).
Mitigation:
- Position accumulation cap: e.g., max 50 shares per ticker per strategy (5 weekly buys at 10 shares each)
- Optional VIX gate: pause strategy when VIX > 35 (extreme fear regime)
- In backtest, document performance breakdown by VIX regime bucket
Severity: HIGH. The strategy's worst drawdown will coincide with extreme market stress — this must be documented prominently.
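The accumulation cap and the optional VIX gate could be combined into a single pre-trade check, sketched here. The function name is hypothetical, and trimming the order to the remaining headroom (rather than rejecting it outright) is a design assumption.

```python
def allowed_buy_qty(requested_qty: int, held_qty: int, vix: float,
                    max_shares: int = 50, vix_pause_level: float = 35.0) -> int:
    """Shares that may be bought after applying the VIX gate and position cap."""
    if vix > vix_pause_level:
        return 0  # strategy paused in an extreme fear regime
    # Never buy more than the headroom left under the per-ticker cap.
    return max(0, min(requested_qty, max_shares - held_qty))
```

With the example cap of 50 shares, the sixth weekly buy of 10 shares resolves to 0.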
FM-S4: Over-Accumulation Without Exit
What happens: the MVP does not define an exit. If the condition triggers every week for 3 months, the user accumulates 130 shares (10/week × 13 weeks) costing ~$53,000 at SPY ~$408. Most users will not have intended this.
Mitigation:
- Position accumulation cap (see FM-S3)
- Hard notional cap per strategy (default $5,000 total open notional)
- Weekly notification showing current accumulated position and unrealized P&L
- V2 exit condition (sell when the price trades at a premium to NAV of more than 0.5%)
Severity: HIGH from user-harm perspective. Highest priority safety rail.
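To make the numbers concrete: 13 weekly buys of 10 shares at $408 come to 130 × $408 = $53,040, more than ten times the default cap. A hard notional check might be sketched like this; the function name and signature are illustrative.

```python
def within_notional_cap(open_notional: float, qty: int, price: float,
                        cap: float = 5_000.0) -> bool:
    """True if buying qty shares at price keeps total open notional within cap."""
    return open_notional + qty * price <= cap


# Scenario from the text: 10 shares per week for 13 weeks at about $408.
uncapped_cost = 10 * 13 * 408  # 53040
```

At $408 per share, the first 10-share buy ($4,080) fits under the $5,000 default, and the second would be blocked.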
3. NL Parsing Failure Modes
FM-P1: Ambiguous Quantity
What happens: the user writes "buy some shares" or "buy a few ETFs." Claude cannot resolve the quantity to a specific integer.
Detection: ParseAttempt.error_code = "ambiguous_action",
parsed_dsl.action.qty is null.
Mitigation: refuse to save the strategy; return a clarification prompt ("How many shares would you like to buy?").
Severity: LOW (blocked before execution).
FM-P2: Hallucinated Ticker
What happens: the user writes "buy that tech ETF." Claude infers "QQQ" or "XLK" with high confidence, but the user meant a different ETF.
Detection: ticker not in the approved ETF universe list; or confidence < 0.85.
Mitigation: the confirmation screen shows the resolved ticker explicitly. User must confirm before the strategy is saved.
Severity: MEDIUM. Caught at confirmation if UI is clear.
FM-P3: Conflated Condition and Action
What happens: the user writes "buy SPY when it dips." Claude might interpret "dips" as price-vs-NAV, price-vs-SMA, or RSI-based. The wrong metric is mapped.
Detection: confidence < 0.70; multiple valid interpretations exist.
Mitigation: when confidence is below threshold, Claude returns a disambiguation prompt ("Do you mean: (a) SPY price below its 20-day average, or (b) SPY price below its NAV?") rather than picking one.
Severity: HIGH if executed incorrectly; MEDIUM if disambiguation UI is present.
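The detection rules in FM-P1 through FM-P3 could be gathered into one validation pass over the parse result, as in this sketch. The field names, the split into separate ticker and condition confidences, the issue codes other than ambiguous_action, and the contents of the ETF universe are all assumptions.

```python
from typing import Optional

APPROVED_ETFS = {"SPY", "IVV", "TLT", "HYG"}  # illustrative, not the real list


def parse_issues(qty: Optional[int], ticker: str, ticker_confidence: float,
                 condition_confidence: float) -> list[str]:
    """Return the issues that must be resolved before a strategy is saved."""
    issues = []
    if qty is None:
        issues.append("ambiguous_action")        # FM-P1: ask for a quantity
    if ticker not in APPROVED_ETFS or ticker_confidence < 0.85:
        issues.append("confirm_ticker")          # FM-P2: hypothetical code
    if condition_confidence < 0.70:
        issues.append("disambiguate_condition")  # FM-P3: hypothetical code
    return issues
```

An empty list means the parse can proceed to the normal confirmation screen; anything else maps to a clarification or disambiguation prompt.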
FM-P4: Schedule Outside Market Hours
What happens: user writes "buy at midnight" or a non-market time. Claude parses the time as requested.
Detection: schedule.time_utc outside NYSE regular hours (09:30–16:00 Eastern), i.e. 14:30–21:00 UTC during standard time and 13:30–20:00 UTC during daylight saving time.
Mitigation: DSL schema validation rejects times outside market hours with a clear error message.
Severity: LOW (caught by schema validation before saving).
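A schema-level check for the schedule time might look like the following sketch. The function name is hypothetical, and the window bounds are parameters because the UTC equivalent of NYSE hours shifts by an hour with daylight saving time.

```python
from datetime import time


def validate_schedule_time(time_utc: time,
                           open_utc: time = time(14, 30),
                           close_utc: time = time(21, 0)) -> None:
    """Raise ValueError if time_utc falls outside the market-hours window."""
    # Half-open window: an order scheduled exactly at the close is rejected.
    if not (open_utc <= time_utc < close_utc):
        raise ValueError(
            f"schedule.time_utc {time_utc:%H:%M} is outside market hours "
            f"({open_utc:%H:%M}-{close_utc:%H:%M} UTC)"
        )
```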
4. Regulatory Failure Modes
FM-R1: Investment Advice Framing
What happens: if Claude's confirmation summary or error messages include forward-looking language ("this strategy typically generates positive returns" or "historically, ETF discounts revert within 5 days"), it crosses from tool-behavior into advice-giving.
Mitigation: all Claude-generated text in the user-facing confirmation is constrained to factual description of the parsed intent, not interpretation or prediction. System prompt explicitly prohibits performance language.
Severity: HIGH from regulatory risk standpoint. Must be reviewed by Matthew Crosby or securities counsel before any AI-generated UI copy is shipped.
FM-R2: Pattern Day Trading Rule Violation
What happens: if the user's brokerage account has less than $25,000 in equity, the strategy must not execute 4 or more day trades (a position opened and closed on the same day) within a rolling window of 5 business days.
Detection: PDT flag must be read from live Alpaca account status, not hardcoded.
(Current trading.py has pattern_day_trader: False hardcoded — this is a known
gap, tracked as Card G in the product brief.)
Mitigation: before each execution, check Alpaca account's
pattern_day_trader flag. If PDT-flagged and account equity < $25,000, block
order and alert user.
Severity: HIGH. PDT violations result in margin calls and trading restrictions.
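The pre-execution guard described in the mitigation reduces to a small predicate. This sketch assumes the pattern_day_trader flag and the account equity have already been read from the live Alpaca account and converted to a bool and a float; the function name is illustrative.

```python
PDT_MIN_EQUITY = 25_000.0  # equity threshold from the rule above


def pdt_blocks_order(pattern_day_trader: bool, equity: float) -> bool:
    """True when the order must be blocked and the user alerted."""
    return pattern_day_trader and equity < PDT_MIN_EQUITY
```

Passing the values in, rather than hardcoding them, keeps the check testable and avoids repeating the gap noted in trading.py.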
5. Known Regime Breaks in Historical Data
| Period | What Happened | Strategy Impact |
|---|---|---|
| Aug 2015 (Flash Crash) | ETF prices decoupled from NAV for ~30 min at open | Strategy fires at extreme discount; mean-reverts within days. Favorable outcome but frightening in real-time |
| Mar 2020 (COVID) | Bond ETFs (HYG, TLT) showed 3-5% discounts lasting 1-2 weeks as APs paused | Accumulation cap reached; significant unrealized loss at trough; full recovery over ~6 weeks |
| Sep 2022 (Rate Shock) | TLT discount persisted 3+ days as rates rose sharply | Repeated triggers; accumulation risk; strategy correct directionally but painful |
Conclusion for backtest: all three periods must be covered by the backtest. Of these, only Sep 2022 falls inside the 2021–2024 out-of-sample window; March 2020 sits at the in-sample boundary. The walk-forward split should be set so that at least one stress episode appears in each half.
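A quick sanity check on a candidate split date could be sketched as below. The episode dates are month-start placeholders rather than precise event dates, and the split date used in the usage note is an assumption based on the 2021–2024 out-of-sample window.

```python
from datetime import date

# Placeholder dates (first of the month) for the episodes in the table above.
STRESS_EPISODES = {
    "Aug 2015": date(2015, 8, 1),
    "Mar 2020": date(2020, 3, 1),
    "Sep 2022": date(2022, 9, 1),
}


def episodes_by_half(split: date) -> dict[str, list[str]]:
    """Group stress episodes into in-sample and out-of-sample halves."""
    halves: dict[str, list[str]] = {"in_sample": [], "out_of_sample": []}
    for name, day in STRESS_EPISODES.items():
        halves["in_sample" if day < split else "out_of_sample"].append(name)
    return halves
```

Calling episodes_by_half(date(2021, 1, 1)) places Aug 2015 and Mar 2020 in-sample and Sep 2022 out-of-sample, so each half contains at least one stress episode.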
6. References
- Ben-David, I., et al. (2020). "Implications of Tail Risks from ETFs." NBER Working Paper 27573. Documents bond ETF discount persistence in March 2020.
- Madhavan, A., & Sobczyk, A. (2016). "Price Dynamics and Liquidity of Exchange-Traded Funds." Journal of Investment Management 14(2).
- ICI Factbook 2024: ETF creation/redemption activity and AP behavior data.