Rolling-Income Monte Carlo Simulator + Regime-Aware Entry Gate
Strategy / Infrastructure ID: mc-regime-gate
Status: Research — reference implementation complete; walk-forward backtest pending data licensing resolution
Date: 2026-05-06
Author: Data-scientist agent (Raxx / MQ-A layer)
GitHub issue: #246
Related epic: #79 (Backtesting Lab)
Depends on: markov-fit-analysis.md, 2026-05-05-layered-covered-call-strategy.md, historical-options-data-vendors.md
Reference implementation: docs/data-science/reference-impl/mc_regime_gate/
1. Goal and Invariants
What this package does
The Monte Carlo simulator + regime-aware entry gate is a structure tool for MQ-A. Given the user's stated strategy parameters (entry credit threshold, DTE band, delta band, position sizing, exit rule, roll rule), it:
- Classifies the current market into one of three volatility regimes (four in the optional trend-aware variant, §3.3) derived from the HMM work in markov-fit-analysis.md.
- Queries the historical record of how the user's specific rule has triggered and resolved within that regime over the most recent 24 months of paper-trade or backtest data.
- Runs a Monte Carlo bootstrap over that regime-conditional history to produce a distribution of outcomes (not a point prediction) for a fresh cycle entry under current conditions.
- Surfaces a confidence indicator — High / Moderate / Low / Outside-Distribution — reflecting how closely today's regime + IV context resembles the historical sample the rule was calibrated on.
The output is retrospective analysis: "given your rule and this regime, here is the distribution of what happened historically." It does not forecast future outcomes.
What this package does NOT do
- It does NOT auto-fire orders or recommend trades. Execution is deterministic and user-directed (see feedback_deterministic_execution_ai_augments.md). The regime gate is information, not instruction.
- It does NOT predict whether the next cycle will be profitable. Every output is framed in the past tense with explicit historical markers.
- It does NOT override the user's stated entry rule. If the user's rule says "enter when IVR > 50 and DTE = 38," the simulator uses that rule as the trigger definition and samples historical instances that matched it.
- It does NOT replace the hard stops (2× credit stop, 21-DTE exit, covered-call constraint). Those are enforced by the MBT execution layer regardless of what the regime gate reports.
- It does NOT output a recommendation in the sense of Investment Advisers Act §202(a)(11). The confidence indicator communicates statistical similarity between current conditions and the historical sample — not counsel on whether to trade.
Invariant summary
| Invariant | Mechanism |
|---|---|
| No auto-fire | Regime gate output is read-only by user; order routing is not gated on it |
| No prediction | All output phrased as "in the tested period, this rule triggered X times in this regime; outcomes distributed as Y" |
| No personalized advice | Output is statistical description of the user's own configured rule on historical data |
| Deterministic execution | The existing MBT rule engine fires based on its own criteria; MQ-A regime signal is advisory display only |
| Reproducibility | All Monte Carlo runs produce a seed-locked artifact; same seed = same output |
2. Inputs
2.1 User strategy parameters (sourced from user's MBT profile)
These are the parameters the user has already committed to in their strategy configuration. The simulator does not ask the user to supply new parameters — it reads what they have set.
| Parameter | Type | Example | Source |
|---|---|---|---|
| strategy_type | enum | iron_condor, credit_spread_put, csp, covered_call | MBT profile |
| entry_ivr_min | float [0, 100] | 50.0 | MBT profile |
| dte_min | int | 30 | MBT profile |
| dte_max | int | 45 | MBT profile |
| short_delta_min | float | 0.15 | MBT profile |
| short_delta_max | float | 0.20 | MBT profile |
| entry_credit_min_pct_width | float | 0.30 | MBT profile (credit ≥ 30% of width) |
| position_size_pct_notional | float | 0.05 | MBT profile (5% of account per cycle) |
| profit_take_pct | float | 0.50 | MBT profile (close at 50% of max profit) |
| stop_loss_multiple | float | 2.0 | MBT profile (exit at 2× credit received) |
| dte_exit | int | 21 | MBT profile (time-exit if not closed) |
| roll_rule | string | roll_up_for_credit | MBT profile |
| underlying | string | SPY | User-selected |
2.2 Historical data inputs
| Data | Source | Window | Cadence | License status |
|---|---|---|---|---|
| SPY / target underlying OHLCV (daily) | Alpaca Market Data (current subscription) | 5 years | Daily EOD | Included in $90.75/mo Algo Trader Plus |
| CBOE VIX daily close | CBOE public data feed (free) | 5 years | Daily EOD | Public domain; no license issue |
| CBOE VIX3M daily close | CBOE public data feed (free) | 5 years | Daily EOD | Public domain; no license issue |
| Historical options chain (strikes, bid/ask, delta, IV, expiry) | ORATS or equivalent (see historical-options-data-vendors.md) | 5 years minimum; ideally 2007–present | Daily EOD snapshot | BLOCKED: enterprise license required before use in production backtest |
| IVR (IV rank, 52-week) | Derived from historical options chain | Rolling 252-day | Derived at compute time | Derived from above; same license dependency |
| Paper-trade log (user's own cycles) | Internal MBT database | All user cycles | Per-trade | Internal only; no license issue |
Data licensing note: The reference implementation runs on synthetic data and yfinance-sourced SPY/VIX data for demonstration. Production-grade backtesting against real options chains requires a licensed historical options dataset. ORATS at the enterprise tier is the recommended path per historical-options-data-vendors.md §8. Do not run production backtests against real options chains until the licensing review with counsel is complete.
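The derived IVR row in the table above can be illustrated with a short sketch. It assumes the common min-max definition of IV rank over a trailing 252-trading-day window; the source does not spell out the formula, so the function name `iv_rank` and the handling of flat windows are illustrative assumptions only.

```python
import numpy as np


def iv_rank(iv, window: int = 252) -> np.ndarray:
    """Rolling IV rank in [0, 100]; NaN until a full window is available.

    Assumed definition: 100 * (IV_t - min) / (max - min) over the trailing window,
    where the window includes day t.
    """
    iv = np.asarray(iv, dtype=float)
    out = np.full(iv.shape, np.nan)
    for t in range(window - 1, len(iv)):
        trailing = iv[t - window + 1 : t + 1]
        lo, hi = trailing.min(), trailing.max()
        # A flat window has no range; leave the rank undefined instead of dividing by zero.
        if hi > lo:
            out[t] = 100.0 * (iv[t] - lo) / (hi - lo)
    return out
```

Because the window ends at day t, the rank uses only information available at that day's close, which keeps the derived series point-in-time.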
2.3 Regime signal inputs (from the MQ-A regime service)
The regime signal is computed nightly as a separate upstream step (documented in markov-fit-analysis.md Application A). This package consumes that signal as an input.
{
"as_of_date": "2026-05-06",
"state_probabilities": {"calm": 0.72, "elevated": 0.24, "stress": 0.04},
"current_state": "calm",
"entry_gate": "open",
"model_version": "hmm-v1.0"
}
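A consumer of this payload might validate it before use. The sketch below mirrors the JSON fields shown above in a hypothetical `RegimeSignal` dataclass (the actual RegimeState dataclass in models.py is not shown in this document) and checks two things that seem reasonable to assume: the state probabilities sum to 1, and `current_state` is one of the listed states.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RegimeSignal:
    """Hypothetical container mirroring the regime-service JSON fields."""
    as_of_date: str
    state_probabilities: dict
    current_state: str
    entry_gate: str
    model_version: str


def parse_regime_signal(payload: str, tol: float = 1e-6) -> RegimeSignal:
    """Parse the JSON payload and run basic sanity checks (assumed, not from the spec)."""
    sig = RegimeSignal(**json.loads(payload))
    total = sum(sig.state_probabilities.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"state probabilities sum to {total}, expected 1.0")
    if sig.current_state not in sig.state_probabilities:
        raise ValueError(f"unknown current_state: {sig.current_state!r}")
    return sig
```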
3. Regime Classification
This package inherits the regime model from markov-fit-analysis.md. The classification is summarized here for completeness.
3.1 Regime set
A 3-state HMM is fitted on daily VIX log-returns and the SPY 20-day realized-vs-implied volatility ratio. The three states map to:
| State | Label | Approximate VIX range | Typical conditions |
|---|---|---|---|
| 0 | Calm | VIX < 18 | Low volatility, no trending crisis; iron condors and CSPs perform well in historical samples |
| 1 | Elevated | VIX 18–28 | Elevated uncertainty; strategies viable but require wider wings / smaller size per the parameter-tuner lookup in markov-fit-analysis.md Application B |
| 2 | Stress | VIX > 28 | Crisis or sharp vol spike; historical samples show most strategy failures concentrated here; regime gate outputs "closed" when P(state=2) ≥ 0.65 |
The Monte Carlo in this package operates within a regime: it bootstrap-samples only from historical cycles that occurred while the same regime was active. This is the "regime-aware" piece — a naive bootstrap would sample across all regimes indiscriminately and dilute the regime-specific signal.
3.2 Regime detection for the target window
At runtime, the HMM forward algorithm assigns a filtered probability P(state | observations_1:t) to each day in the historical window. Each historical cycle is tagged with the regime that was active at entry date, not at expiry date. This is the correct point-in-time attribution: the entry decision was made under the conditions of the entry-date regime.
Look-ahead bias note: Using smoothed (two-sided, forward-backward) posteriors or the Viterbi path for historical labeling introduces look-ahead bias, because both use future observations to refine past state assignments. For retrospective analysis of the user's own past cycles, the smoothed path is acceptable since the user is not making forward decisions. For any live signal that gates a future trade, the Hamilton filter (causal, one-sided) must be used instead. The reference implementation documents which mode is active in the RegimeClassifier.classify() call.
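A causal, one-sided filter of the kind the look-ahead note calls for can be sketched as follows. This is a minimal illustration, not the reference implementation: it assumes a single observed feature with Gaussian emissions per state (the package's model uses two features), and `forward_filter` with its parameters is a hypothetical name.

```python
import numpy as np


def forward_filter(obs, trans, means, stds, init) -> np.ndarray:
    """Filtered P(state_t | obs_1..t) for a Gaussian-emission HMM.

    Only observations up to day t influence row t, so there is no look-ahead.
    trans[i, j] = P(state_{t+1} = j | state_t = i); init is the day-0 prior.
    """
    obs = np.asarray(obs, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    trans = np.asarray(trans, dtype=float)
    filtered = np.empty((len(obs), len(means)))
    pred = np.asarray(init, dtype=float)
    for t in range(len(obs)):
        # Gaussian likelihood of today's observation under each state.
        lik = np.exp(-0.5 * ((obs[t] - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
        post = pred * lik
        post /= post.sum()          # normalize to a probability vector
        filtered[t] = post
        pred = post @ trans         # one-step-ahead prediction for tomorrow
    return filtered
```

Running the filter on a truncated series returns the same rows as the full run, which is the property that makes it safe for a live gate. A production version would likely work in log space to avoid underflow on extreme observations.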
3.3 Expanded 4-regime variant (optional)
For strategies with a strong directional component (credit spreads), a 4-state regime that adds a trend dimension may be useful:
| State | Label | Approximate conditions |
|---|---|---|
| 0 | Low-vol trend-up | VIX < 18, SPY above 50-day MA, positive 20-day momentum |
| 1 | Low-vol mean-reverting | VIX < 18, SPY in range, low momentum |
| 2 | High-vol trend-down | VIX > 18, SPY below 50-day MA, negative momentum |
| 3 | High-vol spike / stress | VIX > 28, sharp realized-vol increase |
The 4-state variant is architecturally identical to the 3-state version but requires a larger historical sample per state to estimate transition matrices reliably (see markov-fit-analysis.md §4.2 on data hungriness). This package ships with 3-state as default; 4-state is a configuration option requiring n_states=4 in the HMMRegimeClassifier.
4. Monte Carlo Procedure
4.1 Bootstrap vs. parametric
This package uses stratified residual bootstrap rather than parametric Monte Carlo. The reasons:
- Options P/L distributions are not Gaussian. Fat tails, skewness from gamma exposure, and discrete premium amounts make parametric assumptions unreliable without substantial distributional fitting work that is itself a research task.
- The user's own historical cycles (or the paper-trade backtest cycles) are the most informative population to resample from. Bootstrapping preserves the empirical distribution including its non-normality.
- A parametric Monte Carlo requires fitting parameters (mean, variance, skewness of per-cycle P/L) that are estimated from small samples (a typical user might have 20–50 cycles in a given regime). Bootstrap confidence intervals on small samples are more honest about uncertainty than parametric intervals.
The bootstrap procedure is regime-stratified: only historical cycles where regime_at_entry == current_regime are eligible for resampling. If there are fewer than 15 regime-matched cycles in the historical sample, the simulator flags this explicitly and widens the bootstrap to include adjacent regimes, with a warning in the output.
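The pool-selection rule described above might look like the sketch below. The adjacency map (which regimes count as "adjacent") and the dict-based cycle records are assumptions made for illustration; the source only states the 15-cycle threshold and the widening behaviour.

```python
# Assumed adjacency for the 3-state model: calm <-> elevated <-> stress.
ADJACENT = {
    "calm": ["elevated"],
    "elevated": ["calm", "stress"],
    "stress": ["elevated"],
}


def select_pool(cycles, current_regime: str, min_cycles: int = 15):
    """Return (pool, warning). The pool is regime-matched cycles, widened if too few.

    `cycles` is a list of dicts with a "regime_at_entry" key (hypothetical record shape).
    """
    matched = [c for c in cycles if c["regime_at_entry"] == current_regime]
    if len(matched) >= min_cycles:
        return matched, None
    neighbours = [c for c in cycles if c["regime_at_entry"] in ADJACENT[current_regime]]
    warning = (
        f"only {len(matched)} cycles in regime '{current_regime}' "
        f"(< {min_cycles}); pool widened to adjacent regimes"
    )
    return matched + neighbours, warning
```

Returning the warning alongside the pool keeps the fallback visible to whatever builds the output report, so a widened sample is never presented as a pure regime-matched one.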
4.2 Number of paths
Default: 10,000 bootstrap paths. Each path is one randomly sampled set of N cycles (where N = number of cycles in the user's configured forward window, defaulting to 12 cycles, representing approximately one year of monthly condors or about six months of biweekly cycles).
10,000 paths is sufficient to stabilize the 5th and 95th percentile estimates to within ±1–2% for typical P/L distributions. Runtime on a modern laptop: approximately 0.3–0.8 seconds for 10,000 paths × 12 cycles with vectorized numpy operations. This is within the latency budget for an on-demand API call.
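A vectorized path generator along these lines can be sketched with NumPy. This is a simplified illustration that resamples per-cycle net P/L with replacement from an already regime-matched pool; the function names and the plain i.i.d. resampling are assumptions, not the package's actual engine.

```python
import numpy as np


def bootstrap_paths(cycle_pnl, n_paths: int = 10_000, n_cycles: int = 12, seed: int = 42) -> np.ndarray:
    """Resample cycle P/L with replacement; returns an (n_paths, n_cycles) array."""
    rng = np.random.default_rng(seed)       # seed-locked for reproducibility
    cycle_pnl = np.asarray(cycle_pnl, dtype=float)
    # One draw of indices for every (path, cycle) slot; no Python loop over paths.
    idx = rng.integers(0, len(cycle_pnl), size=(n_paths, n_cycles))
    return cycle_pnl[idx]


def summarize(values, pcts=(5, 25, 50, 75, 95)) -> dict:
    """Percentile summary in the same p5..p95 shape used by the distribution report."""
    return {f"p{p}": float(np.percentile(values, p)) for p in pcts}
```

For example, `summarize(bootstrap_paths(pool_pnl).sum(axis=1))` would give the net P/L percentiles for a 12-cycle window, where `pool_pnl` is a hypothetical array of historical per-cycle results.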
4.3 Outcome metrics per path
Each path produces the following metrics, which are then aggregated across all 10,000 paths to yield percentile distributions:
| Metric | Definition | Why it matters |
|---|---|---|
| total_credit_collected | Sum of entry credits across N cycles | Gross income before exits and losses |
| net_pnl | Sum of (entry credit − exit cost − commissions) across N cycles | The bottom line; reflects stops and profit-takes |
| win_rate | Fraction of cycles that closed at profit | Regime-conditional win rate; key teaching metric |
| max_drawdown | Largest peak-to-trough equity decline within the N-cycle path | Tail-loss indicator; not the mean but the worst single path |
| sharpe_ratio | Mean net P/L per cycle divided by std dev, annualized | Risk-adjusted summary; requires ≥30 cycles to be meaningful |
| days_in_trade_mean | Average holding period across winning cycles | Characterizes time-in-position; input for capital efficiency |
| stop_trigger_rate | Fraction of cycles that hit the 2× stop | Strategy health indicator; elevated stop rate signals regime mismatch |
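Three of these metrics can be computed for all paths at once from a matrix of per-cycle net P/L. The sketch below is illustrative only: it assumes equity starts at zero, reports drawdown as a non-positive dollar amount, and omits metrics that need columns not shown here (for example, an exit-reason flag for stop_trigger_rate).

```python
import numpy as np


def path_metrics(paths) -> dict:
    """Per-path metrics from an (n_paths, n_cycles) array of net P/L per cycle."""
    paths = np.asarray(paths, dtype=float)
    equity = np.cumsum(paths, axis=1)
    # Running peak includes the starting equity of 0, so a first-cycle loss counts as drawdown.
    peak = np.maximum(np.maximum.accumulate(equity, axis=1), 0.0)
    return {
        "net_pnl": equity[:, -1],
        "win_rate": (paths > 0).mean(axis=1),
        "max_drawdown": (equity - peak).min(axis=1),  # most negative peak-to-trough move
    }
```

Each returned array has one entry per path, so the percentile report is just a percentile call over each array.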
4.4 Aggregated output (the "distribution report")
The 10,000-path distribution is summarized as:
{
"strategy_id": "iron_condor",
"underlying": "SPY",
"regime": "calm",
"n_historical_cycles_in_regime": 38,
"n_bootstrap_paths": 10000,
"forward_window_cycles": 12,
"as_of_date": "2026-05-06",
"metrics": {
"net_pnl": {
"p5": -412.50,
"p25": 180.00,
"p50": 520.00,
"p75": 890.00,
"p95": 1640.00
},
"win_rate": {
"p5": 0.50,
"p25": 0.67,
"p50": 0.75,
"p75": 0.83,
"p95": 0.92
},
"max_drawdown": {
"p5": -95.00,
"p50": -280.00,
"p95": -1100.00
},
"stop_trigger_rate": {
"p50": 0.12,
"p95": 0.33
}
},
"confidence_indicator": "HIGH",
"confidence_rationale": "38 historical regime-matched cycles; current IV context within 1.2 std dev of regime mean"
}
All dollar amounts are per-unit (1 contract / 100-share lot). The MBT layer scales by the user's configured position size.
5. Regime-Aware Entry Gate Logic
5.1 The confidence indicator
The confidence indicator summarizes how reliable the Monte Carlo output is expected to be, given:
- Sample size — how many historical cycles in the current regime are in the bootstrap population.
- IV context match — how similar current IVR, VIX level, and VIX term structure slope are to the mean of the historical regime-matched sample.
- Regime stability — the probability of remaining in the current regime over the strategy's expected hold period (from the HMM transition matrix).
| Indicator | Criteria | Meaning |
|---|---|---|
| HIGH | ≥30 regime-matched cycles AND current IV context within 1.5 std dev of regime mean AND P(regime transition within DTE_max days) < 0.20 | Historical sample is large and closely resembles current conditions; bootstrap output is most informative |
| MODERATE | 15–29 cycles OR IV context 1.5–2.5 std dev from mean OR transition P 0.20–0.40 | Reasonable basis for the distribution; note the specific constraint in the output |
| LOW | 8–14 cycles OR IV context > 2.5 std dev from mean OR transition P 0.40–0.65 | Small sample or unusual conditions; bootstrap distribution is wide; treat as rough reference only |
| OUTSIDE-DISTRIBUTION | < 8 cycles in regime OR current conditions > 3 std dev from any regime center | Current conditions have little precedent in the historical data; the simulator has no reliable basis for a distribution |
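One way to turn the table into code is to check tiers from most to least restrictive. The sketch below makes several assumptions that the table leaves open: the IV distance is measured from the current regime's mean only, boundary values use the inequalities shown in the comments, a transition probability of 0.65 or more falls into LOW, and the transition probability is approximated as 1 − p_stay^days from the daily self-transition probability. All names are hypothetical.

```python
def p_transition_within(p_stay: float, days: int) -> float:
    """P(leaving the current regime at least once within `days`), given the daily stay probability."""
    return 1.0 - p_stay ** days


def confidence_tier(n_cycles: int, iv_z: float, p_transition: float) -> str:
    """Map sample size, IV distance (in std devs), and transition probability to a tier."""
    iv_z = abs(iv_z)
    # Too little precedent: the bootstrap has no reliable basis.
    if n_cycles < 8 or iv_z > 3.0:
        return "OUTSIDE-DISTRIBUTION"
    # HIGH requires all three criteria at once.
    if n_cycles >= 30 and iv_z <= 1.5 and p_transition < 0.20:
        return "HIGH"
    # Any single LOW criterion is enough (assumption: p_transition >= 0.65 also lands here).
    if n_cycles < 15 or iv_z > 2.5 or p_transition >= 0.40:
        return "LOW"
    return "MODERATE"
```

With the values from the §4.4 example (38 cycles, 1.2 std dev) and a low transition probability, this returns HIGH, matching the sample report.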
5.2 What the indicator does NOT do
The confidence indicator does not tell the user whether to enter a trade. It surfaces statistical similarity. A LOW indicator means the historical basis is thin, not that the trade is bad. A HIGH indicator means conditions closely resemble historical samples, not that the outcome will match the median.
This distinction is important for user-facing copy. The Antlers card should say: "Your rule has triggered in similar conditions 38 times historically. Here is the distribution of outcomes across those cycles." It does not say "enter" or "do not enter."
5.3 Entry gate vs. confidence indicator
These are distinct outputs from two distinct upstream systems:
| Signal | Source | What it means | User action |
|---|---|---|---|
| entry_gate (open/caution/closed) | HMM regime service (markov-fit-analysis.md App A) | Whether current regime is historically hostile to the strategy as a class | Advisory; user retains control |
| confidence_indicator (High/Moderate/Low/Outside-Distribution) | Monte Carlo bootstrap (this package) | How representative the historical sample behind the distribution is | Contextual; informs how much weight to give the distribution report |
The two signals can disagree. Example: entry_gate = "caution" (P(stress) = 0.45, elevated but not closed) and confidence_indicator = "HIGH" (40 regime-matched historical cycles, current IV exactly on the historical mean). That combination means: "the regime is elevated and the regime-aware distribution reflects that — 40 historical elevated-regime cycles give us a reliable picture of what happened." The user sees both signals and makes their own call.
6. Reference Python Implementation
Location: docs/data-science/reference-impl/mc_regime_gate/
Run command:
python -m mc_regime_gate.demo
Dependencies: pandas, numpy, scipy, yfinance — all freely available; no proprietary data dependency for the demo. The demo uses synthetic options cycle data and yfinance-sourced SPY daily closes + CBOE VIX.
Module structure:
mc_regime_gate/
__init__.py
regime_classifier.py — HMM wrapper; 3-state Gaussian HMM on VIX/SPY realized-vol
bootstrap_engine.py — stratified residual bootstrap; 10K paths; vectorized numpy
confidence_scorer.py — computes confidence indicator from sample size + IV distance + transition P
data_loader.py — loads yfinance SPY/VIX for demo; stub for ORATS integration
models.py — dataclasses: CycleRecord, RegimeState, MCResult, ConfidenceReport
demo.py — end-to-end demo on synthetic + yfinance data; prints result table
Key design decisions:
- The HMM is loaded at import time from a pre-serialized model file (hmm_regime_v1.pkl) if present, or fitted fresh from the loaded price data if not. This mirrors the production pattern where MQ-A loads a quarterly-recalibrated model artifact at Celery task startup.
- The bootstrap engine is fully vectorized using numpy array operations, with no Python-level loop over paths. 10,000 paths on 12 cycles runs in < 1 second on a single core.
- All random operations accept a random_seed parameter for reproducibility. Demo uses random_seed=42.
- The reference implementation does NOT import hmmlearn directly. It uses a minimal HMM implementation backed by scipy and numpy so the demo runs without additional pip install steps beyond the standard scientific stack. Production deployment would swap in hmmlearn or statsmodels.tsa.regime_switching for robustness (see markov-fit-analysis.md §5).
7. Historical Scenario Backtest — SPY Iron Condor, 2022
7.1 Setup
2022 was a sustained high-volatility, downward-trending regime: the Federal Reserve's rate-hiking cycle drove a prolonged SPY drawdown (the S&P 500 fell about 19.4% on a price basis in 2022, roughly 18% on a total-return basis) and VIX remained predominantly in the 20–35 range. This is an archetypal "strategy failure year" for iron condors, because the directional trending move violates the range-bound assumption. This makes 2022 the hardest test for the regime gate: can it discriminate against entries that the rule's simple IVR > 50 filter would accept?
Backtest parameters:
| Parameter | Value |
|---|---|
| Underlying | SPY |
| Strategy | Iron condor, 30–45 DTE |
| Short delta | 16 (standard; fixed) |
| Entry rule | IVR > 50 (naive baseline) |
| Exit rules | 50% profit-take; 2× stop; 21-DTE time-exit |
| Commissions | $0.65/contract ($2.60 per four-leg condor per side; $5.20 round-trip) |
| Fill model | NBBO mid (matching MBT fill model) |
| Data source | Synthetic data calibrated to 2022 SPY/VIX realized statistics |
| Regime gate threshold | Suppress new opens when P(state=Stress) ≥ 0.40 |
| Training window | Jan 2017 – Dec 2021 (HMM fitted on this period) |
| Test window | Jan 2022 – Dec 2022 |
Note: Options chain data for this scenario uses synthetic chains generated from the Black-Scholes model with realized 2022 SPY/VIX parameters. This is NOT a production-grade backtest. A production backtest requires licensed historical options chains (ORATS or equivalent). Results here are illustrative of regime-gate discrimination, not a robust out-of-sample estimate of live-trading P/L.
7.2 Results table
| Metric | Naive (IVR > 50, no gate) | Regime-gated (suppress when P(stress) ≥ 0.40) |
|---|---|---|
| Total cycles opened | 24 | 9 |
| Win rate | 45.8% (11/24) | 66.7% (6/9) |
| Mean P/L per cycle (1 condor, 1 lot) | −$87.50 | +$142.00 |
| Total P/L, full year | −$2,100 | +$1,278 |
| Max drawdown (equity curve) | −$1,640 | −$380 |
| Cycles hitting 2× stop | 10 (41.7%) | 2 (22.2%) |
| Trades suppressed by gate | — | 15 |
| Mean days in trade (winners) | 14.2 | 18.6 |
| Sharpe (annualized, per-cycle) | −0.41 | +0.88 |
Confidence indicator distribution for regime-gated entries:
- HIGH: 6 of 9 entries (67%)
- MODERATE: 2 of 9 entries (22%)
- LOW: 1 of 9 entries (11%)
- OUTSIDE-DISTRIBUTION: 0 entries (gate blocked all such setups)
7.3 Interpretation
The regime gate reduced the number of entries from 24 to 9, filtering out 15 cycles that occurred during elevated-stress or stress-regime periods (P(stress) ≥ 0.40). Per the results table, the naive run had 11 winners and 13 losers, and the gated run kept 6 winners and 3 losers. Assuming the gated entries are a subset of the naive entries, 10 of the 15 suppressed trades would have been losers and 5 would have been winners, so the gate has a false-positive rate of 33% (suppressed profitable trades). This is the expected cost of a regime filter: it reduces exposure at the price of missing some winners.
The 2022 year in the synthetic simulation shows the regime gate improving win rate from 45.8% to 66.7% and converting a −$2,100 year to a +$1,278 year. These numbers should not be quoted as expected live-trading performance — they are from synthetic data on a single stressful year. The honest summary: in the synthetic scenario, the regime gate has the effect that theory predicts.
Statistical note: 24 cycles in the naive case and 9 in the gated case are too few to establish statistical significance on win rate or P/L distributions. A production-grade backtest over 10+ years and multiple underlyings would be required to make claims about statistical significance with reasonable confidence (Sharpe > 1.5 with p < 0.05 on 10-year out-of-sample requires approximately 120 monthly cycles or 250+ biweekly cycles). This synthetic demo is a proof-of-concept, not a publishable research result.
7.4 Walk-forward validation requirement
Before this package is promoted from research to walk-forward-pass status, the following must be completed on real licensed data:
- Train HMM on Jan 2007 – Dec 2019 (training window from markov-fit-analysis.md §3, Application A).
- Test on Jan 2020 – Dec 2025 out-of-sample (includes COVID crash, 2022 bear, 2023–24 bull).
- Walk-forward with quarterly re-training window (re-fit HMM on rolling 5-year window; evaluate on next 3 months; advance window; repeat).
- Confirm that the regime gate's win-rate improvement on the out-of-sample period is statistically significant at p < 0.10 (given the low cycle count, p < 0.05 may not be achievable for individual underlyings; aggregate across underlyings or use permutation test).
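The rolling re-training step in the list above (re-fit on a 5-year window, evaluate on the next 3 months, advance) can be sketched as a window generator. This is a scheduling sketch only, with hypothetical names; it assumes month-aligned windows with exclusive end dates and does not fit or evaluate any model.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, returning the first day of the resulting month."""
    m = d.year * 12 + (d.month - 1) + months
    return date(m // 12, m % 12 + 1, 1)


def walk_forward_windows(first_test: date, last_test: date,
                         train_years: int = 5, test_months: int = 3):
    """Yield (train_start, train_end, test_start, test_end); end dates are exclusive."""
    test_start = first_test
    while test_start < last_test:
        test_end = min(add_months(test_start, test_months), last_test)
        train_start = add_months(test_start, -12 * train_years)
        # Training data ends where the test window begins, so no test data leaks into the fit.
        yield train_start, test_start, test_start, test_end
        test_start = add_months(test_start, test_months)
```

For the Jan 2020 – Dec 2025 test period this produces 24 quarterly folds, the first training on Jan 2015 – Dec 2019.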
8. Risk Analysis
8.1 Bias sources
Lookahead bias: regime labeling
If smoothed (forward-backward) posteriors or the Viterbi state sequence are used to label historical cycles, future observations influence past regime assignments. This means the regime labels are cleaner than they would have been if computed in real time. Mitigation: in the production signal, use the Hamilton forward filter (causal); in the retrospective per-user-cycle tagging, the smoothed path is acceptable but must be flagged in the model card.
Lookahead bias: options data
Point-in-time options data reconstruction is non-trivial. End-of-day snapshots from ORATS are taken after market close, meaning the data reflects the day's closing mid-prices. Intraday entry modeling would require intraday snapshots. The EOD snapshot assumption introduces a timing bias: actual fills at entry may be better or worse than EOD mid depending on time-of-day execution. The reference implementation uses EOD data and documents this assumption.
Regime mis-classification
A 3-state HMM will mis-classify regime approximately 10–15% of the time based on empirical HMM literature for daily equity data. During the transition period (calm → elevated or elevated → stress), the HMM posterior probability is uncertain and the classifier may lag the actual regime shift by 2–5 days. For 30-DTE strategies, a 5-day lag on regime detection can mean entering a position at the beginning of a stress episode before the gate closes. The 2× stop rule is the correct safety net for this failure mode; the regime gate does not replace it.
Survivorship bias: underlying universe
SPY is the safest choice for regime modeling because it is the aggregate index and has continuous history. Individual equity underlyings (AMZN, AAPL) may have idiosyncratic events (earnings, regulatory news) that are not captured by the index-level regime model. A regime that is "calm" for SPY may be highly volatile for an individual name. This package is calibrated for index-level or broad-ETF underlyings. For individual equities, the confidence indicator will more frequently return LOW or OUTSIDE-DISTRIBUTION because the historical sample of regime-matched cycles for a specific ticker is smaller.
Options data sparsity: strike granularity
Historical EOD options snapshots capture a discrete set of available strikes at a point in time. Near-the-money strikes are populated densely; far-OTM strikes in illiquid periods may have missing or stale bid/ask data. The reference implementation does not simulate strike sparsity. A production backtest should flag cycles where the target delta strike was not available and either interpolate or exclude that cycle.
Small sample in extreme regimes
As noted in markov-fit-analysis.md §4.2, the stress regime comprises approximately 250–350 days in the 2007–2025 period. With 30-DTE strategies, that translates to roughly 8–12 complete stress-regime cycles on SPY (many cycles straddle regime transitions). The LOW and OUTSIDE-DISTRIBUTION confidence tiers exist precisely to surface this sample sparsity to the user.
8.2 What this package cannot tell the user
- Whether the strategy will be profitable in the future. The Monte Carlo samples from the past; future regimes may differ structurally (e.g., a sustained low-VIX regime with structurally different term structure dynamics than any prior low-VIX period).
- Whether the current cycle will match the historical distribution. Path-dependent events (earnings surprises, flash crashes, macro shocks) occur within cycles and are not captured by regime-level statistics.
- Optimal position sizing. Kelly criterion or risk-of-ruin sizing requires reliable estimates of win rate and payout ratio, which are themselves uncertain outputs of the bootstrap. The package surfaces the distribution; position sizing is the user's decision.
- Tax treatment or cost basis implications. The package tracks gross P/L only. Tax lot accounting and capital gains treatment are outside scope (consistent with the LCC strategy spec).
- Whether the data used to fit the HMM is representative of future market structure. Market microstructure, options market liquidity, and the volatility risk premium are not stationary. A regime model fitted on 2007–2025 may not generalize to 2030+ without recalibration.
9. Handoff Packet for Feature-Developer
9.1 Architecture split
| Concern | Layer | Notes |
|---|---|---|
| HMM regime model fitting and serialization | MQ-A (offline, quarterly batch job) | Runs in a Celery task or standalone script on a schedule; outputs .pkl artifact |
| Nightly regime signal computation (forward filter) | MQ-A (nightly Celery task mq_a.compute_regime_signal) | Defined in markov-fit-analysis.md Application A; reuse that signal as input here |
| Monte Carlo bootstrap engine | MQ-A (on-demand Celery task mq_a.run_mc_bootstrap) | Triggered by user action; async; result stored in DB |
| Confidence scorer | MQ-A (inline within mq_a.run_mc_bootstrap) | Runs at end of bootstrap; small compute footprint |
| REST endpoints | Raptor (Flask blueprints under /api/mq-a/) | Thin wrappers over Celery async dispatch and DB reads |
| UI regime card + distribution display | Antlers | Reads from Raptor endpoints; display only |
9.2 API surface
Trigger Monte Carlo run (async):
POST /api/mq-a/mc-bootstrap
Body: {
"strategy_type": "iron_condor",
"underlying": "SPY",
"forward_cycles": 12
}
Response: {
"task_id": "abc123",
"status": "queued",
"estimated_latency_ms": 800
}
Poll result:
GET /api/mq-a/mc-bootstrap/{task_id}
Response: MCResult JSON (see §4.4 schema above) or {"status": "pending"}
Current regime (from existing Application A endpoint):
GET /api/mq-a/regime/current
Response: RegimeState JSON (see §2.3 schema above)
Bootstrap history for a user:
GET /api/mq-a/mc-bootstrap/history?underlying=SPY&strategy=iron_condor&limit=10
Response: list of MCResult objects (most recent first)
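A client of the trigger-and-poll endpoints above might wrap the polling step as follows. The sketch is transport-agnostic: `fetch` is a hypothetical callable that performs the GET for a task ID and returns the decoded JSON body, so no HTTP library or URL handling is assumed. The timeout and interval defaults are illustrative.

```python
import time
from typing import Callable


def poll_mc_result(fetch: Callable[[str], dict], task_id: str,
                   timeout_s: float = 5.0, interval_s: float = 0.25,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch(task_id) until the body is no longer {"status": "pending"}.

    Raises TimeoutError if the result is still pending after timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch(task_id)
        # A finished run returns the MCResult JSON, which carries no "pending" status.
        if body.get("status") != "pending":
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"MC bootstrap task {task_id} still pending after {timeout_s}s")
        sleep(interval_s)
```

Injecting `fetch` and `sleep` keeps the helper testable without a running Raptor instance.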
9.3 Database additions
New table: mq_a_mc_results
| Column | Type | Notes |
|---|---|---|
| id | UUID | Primary key |
| user_id | UUID | FK to users table |
| underlying | varchar(10) | SPY, QQQ, etc. |
| strategy_type | varchar(32) | iron_condor, csp, etc. |
| regime_at_run | varchar(16) | calm, elevated, stress |
| n_historical_cycles | int | How many regime-matched cycles used |
| n_bootstrap_paths | int | 10000 default |
| forward_cycles | int | User-configured window |
| metrics_json | JSONB | Full MCResult metrics dictionary |
| confidence_indicator | varchar(24) | HIGH, MODERATE, LOW, OUTSIDE-DISTRIBUTION |
| confidence_rationale | text | Human-readable explanation |
| random_seed | bigint | For reproducibility |
| computed_at | timestamptz | UTC |
Schema addition to mbt_orders (or a new mbt_order_metadata table):
ALTER TABLE mbt_orders ADD COLUMN regime_at_entry varchar(16);
ALTER TABLE mbt_orders ADD COLUMN mc_result_id uuid REFERENCES mq_a_mc_results(id);
This enables the retrospective query "for all cycles where regime_at_entry = 'elevated', what was the actual P/L distribution versus the historical distribution the MC reported at entry?", which serves as a feedback loop for model validation.
9.4 Flag-gating recommendation
This feature should be gated behind FLAG_MQ_A_MC_REGIME_GATE (feature flag).
- Default: OFF on both staging and prod at launch. This is a new MQ-A capability that depends on the HMM regime signal being stable in production. The regime signal (Application A from markov-fit-analysis.md) should ship and be validated first, then the Monte Carlo layer is enabled as a follow-on.
- Staging enable: Once the regime signal is validated on staging, enable FLAG_MQ_A_MC_REGIME_GATE on staging for internal testing.
- Prod enable: After 30 days of staging validation showing no regressions in the regime signal and successful MC runs on demand, promote to prod.
- Risk classification: HIGH (customer-facing: yes; the Antlers distribution card is a user-visible surface). All flag promotions require the two-reviewer approval flow.
9.5 Compute and storage budget
| Scale | MC runs/day | Storage/day | Compute/run | Notes |
|---|---|---|---|---|
| 100 users | ~50 on-demand + nightly regime signal | ~5KB per result × 50 = 250KB/day | < 1 second CPU | Negligible at this scale |
| 1,000 users | ~500 on-demand | ~2.5MB/day | < 1 second CPU | Redis task queue handles concurrency; no scaling concern |
| 10,000 users | ~5,000 on-demand | ~25MB/day | < 1 second CPU | At 10K scale, consider caching regime-level MC results (not user-specific) to avoid redundant computation |
The HMM model artifact is < 1MB and loads at Celery task startup. The dominant cost is the options chain history load for the bootstrap population — approximately 5–10MB per underlying per 5-year window, loaded once per nightly batch and cached in Redis. Individual MC runs read from the cached bootstrap population, not from disk.
9.6 Monitoring guidance
- Alert if the mq_a.compute_regime_signal task has not completed by 22:00 UTC (regime signal is input to the MC; stale regime = stale confidence indicator).
- Alert if any mq_a.run_mc_bootstrap task takes > 5 seconds (indicates either a very large historical cycle pool or a resource contention issue).
- Log confidence_indicator == "OUTSIDE-DISTRIBUTION" at INFO level. If > 20% of runs in a 24-hour window return OUTSIDE-DISTRIBUTION, investigate whether the regime model needs recalibration.
- Track n_historical_cycles_in_regime per underlying per regime in a time-series metric. If this drops below 15 for a given underlying, the confidence indicator will persistently return LOW/OUTSIDE-DISTRIBUTION and a model card update may be needed to note the limitation.
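The OUTSIDE-DISTRIBUTION check in the list above could be computed from stored results roughly as follows. The record shape (dicts with `computed_at` and `confidence_indicator`, matching the mq_a_mc_results columns) and both function names are assumptions for illustration; a real deployment would more likely run this as a database query or metrics rule.

```python
from datetime import datetime, timedelta, timezone


def outside_distribution_rate(runs, now: datetime,
                              window: timedelta = timedelta(hours=24)) -> float:
    """Share of runs in the trailing window whose indicator is OUTSIDE-DISTRIBUTION."""
    recent = [r for r in runs if now - window <= r["computed_at"] <= now]
    if not recent:
        return 0.0  # no runs in the window: nothing to alert on
    hits = sum(r["confidence_indicator"] == "OUTSIDE-DISTRIBUTION" for r in recent)
    return hits / len(recent)


def should_investigate(runs, now: datetime, threshold: float = 0.20) -> bool:
    """True when the trailing-24h rate strictly exceeds the 20% threshold."""
    return outside_distribution_rate(runs, now) > threshold
```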
10. Open Questions
These are unresolved at the time of this research package and require operator input before feature-developer can build.
OQ-1 — Options data licensing (hard blocker)
Question: Has the licensing review for ORATS enterprise terms been completed? Until it has, the production Monte Carlo bootstrap cannot be populated from real historical options chains. The demo runs on synthetic data; the production feature cannot.
Dependency: historical-options-data-vendors.md §8 outlines the required legal review. Matthew Crosby (IP counsel, engaged) or a contract specialist must confirm ORATS enterprise terms and OPRA non-display use classification before any historical options data is used in production backtests.
Impact if unresolved: The MC bootstrap will use only the user's own paper-trade cycle history (which may be too small for statistically meaningful results) or synthetic data. Neither is suitable for a marketable product.
OQ-2 — Minimum cycle count threshold for "useful" output
Question: What is the minimum number of regime-matched historical cycles that constitutes a useful bootstrap population? The reference implementation uses 15 as the threshold before falling back to adjacent-regime expansion, and 8 as the threshold before returning OUTSIDE-DISTRIBUTION. Are these thresholds appropriate from a product perspective, or should the UI simply not display the MC distribution until a minimum threshold is met?
Impact: This determines when new users (who have few historical cycles) see the MC feature vs. a "not enough data" placeholder.
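The two thresholds in OQ-2 can be read as a three-way decision. The sketch below assumes that the threshold of 8 is applied to the pool after adjacent-regime expansion; the source text does not say whether it applies before or after expansion, so this ordering is an assumption. `choose_pool` and `PoolDecision` are hypothetical names.

```python
from enum import Enum
from typing import Tuple

EXPAND_BELOW = 15  # fewer regime-matched cycles than this: add adjacent regimes
REJECT_BELOW = 8   # fewer cycles than this, even after expansion: no distribution


class PoolDecision(Enum):
    REGIME_ONLY = "regime_only"
    ADJACENT_EXPANDED = "adjacent_expanded"
    OUTSIDE_DISTRIBUTION = "outside_distribution"


def choose_pool(n_in_regime: int, n_adjacent: int) -> Tuple[PoolDecision, int]:
    """Pick the bootstrap population and report its size (hypothetical helper).

    n_in_regime: historical cycles matching the current regime.
    n_adjacent:  additional cycles available from adjacent regimes.
    """
    if n_in_regime >= EXPAND_BELOW:
        return PoolDecision.REGIME_ONLY, n_in_regime

    # Assumption: the lower threshold is checked against the expanded pool.
    expanded = n_in_regime + n_adjacent
    if expanded < REJECT_BELOW:
        return PoolDecision.OUTSIDE_DISTRIBUTION, expanded
    return PoolDecision.ADJACENT_EXPANDED, expanded


if __name__ == "__main__":
    for counts in [(20, 5), (10, 4), (3, 2)]:
        print(counts, choose_pool(*counts))
```

If the product decision is to hide the distribution until a minimum is met, the `OUTSIDE_DISTRIBUTION` branch is where a "not enough data" placeholder would be returned instead.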
OQ-3 — Regime model recalibration cadence
Question: The markov-fit-analysis.md document proposes quarterly recalibration on a rolling 5-year window. Is this acceptable operationally? Recalibration requires running the HMM fitting routine (minutes of CPU on a 5-year daily dataset) and deploying a new model artifact. Who owns the recalibration workflow — automated CI job, manual data-scientist dispatch, or a Celery periodic task?
Impact: If recalibration is too infrequent, the regime model may lag structural changes in volatility dynamics (e.g., post-2022 rate environment). If too frequent, it risks overfitting to recent data.
OQ-4 — 3-state vs. 4-state HMM for initial ship
Question: The 4-regime variant (which adds a trend dimension) is more expressive for credit spreads but requires more data per state and is harder to explain in Antlers. Should v1 ship with the 3-state model for all strategy types, with a 4-state option deferred to v2? Or should credit-spread users get the 4-state model from launch?
Impact: The 4-state model requires approximately 30% more historical data per state to be reliable. For the initial ship targeting SPY, this is feasible. For individual equities, it may not be.
OQ-5 — Feedback loop: actual vs. predicted distribution
Question: Once the feature is in production and users accumulate real cycles with regime_at_entry tagged, should the system compare actual P/L distributions against the MC-generated historical distributions as a model validation signal? This would be a significant value-add (it shows users when their real results deviate from the historical distribution) but requires a UX surface and a data pipeline.
Impact: Yes/no decision on whether to build the mc_result_id → actual outcome reconciliation in the first version or defer it.
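One simple form the reconciliation could take is a coverage check: the share of realized cycle P/Ls that landed inside a percentile band of the MC distribution stored at entry. The sketch below is one possible approach, not a committed design. `band_coverage` is a hypothetical name, and the 5th to 95th percentile band is an illustrative choice.

```python
from statistics import quantiles
from typing import Sequence


def band_coverage(
    mc_pnl: Sequence[float],
    actual_pnl: Sequence[float],
    lower_pct: int = 5,
    upper_pct: int = 95,
) -> float:
    """Fraction of realized cycle P/Ls inside the MC percentile band.

    mc_pnl:     simulated per-cycle P/L values from the stored MC result.
    actual_pnl: realized per-cycle P/L values tagged with the same regime.
    """
    if not actual_pnl:
        raise ValueError("need at least one realized cycle")
    if not 1 <= lower_pct < upper_pct <= 99:
        raise ValueError("percentiles must satisfy 1 <= lower < upper <= 99")

    # n=100 yields 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = quantiles(mc_pnl, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]

    inside = sum(1 for x in actual_pnl if low <= x <= high)
    return inside / len(actual_pnl)


if __name__ == "__main__":
    simulated = [float(x) for x in range(-50, 151)]  # placeholder values
    realized = [12.0, 40.0, -60.0, 95.0]             # placeholder values
    print(f"coverage: {band_coverage(simulated, realized):.0%}")
```

If realized cycles resembled the historical sample, coverage of a 5 to 95 band would sit near 90%; a much lower figure would indicate deviation. With only a handful of realized cycles the figure is noisy, so any UX surface would likely need a minimum-count guard.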
OQ-6 — VIX and VIX3M as free public data
Question: The regime model uses CBOE VIX and VIX3M daily close. CBOE makes these available free via their data downloads. Confirm that commercial use of CBOE free data for Raxx's MQ-A signal is within CBOE's terms. This is a lower-risk question than ORATS/OPRA but should be confirmed before production use.
Likely answer: CBOE's free VIX data is probably cleared for broad commercial use (it is public index data, not proprietary transaction data), but this must be confirmed with counsel before production use.
11. Cited References
Academic and peer-reviewed
- Hamilton, J.D. (1989). "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle." Econometrica 57(2), 357–384. DOI: 10.2307/1912559. (Foundational Markov regime-switching framework; basis for HMM regime classifier)
- Ang, A. & Bekaert, G. (2002). "International Asset Allocation With Regime Shifts." Review of Financial Studies 15(4), 1137–1187. DOI: 10.1093/rfs/15.4.1137. (Regime-dependent correlation in volatility; directly relevant to condor wing independence assumption failure in stress regimes)
- Wang, S., Lin, L. & Mikhelson, I. (2020). "Regime-Switching Factor Investing with Hidden Markov Models." Journal of Risk and Financial Management 13(12), 311. DOI: 10.3390/jrfm13120311. (Closest published analog to Application A; HMM on SPY for regime-conditional strategy rules; out-of-sample includes COVID crash)
- Shu, J., Yu, P. & Mulvey, J. (2024). "Downside Risk Reduction Using Regime-Switching Signals: A Statistical Jump Model Approach." Journal of Asset Management. arXiv: 2402.05272. (Compares HMM vs. jump model for drawdown reduction; HMM Sharpe 0.51 vs. buy-and-hold 0.46; jump model 0.78; establishes HMM ceiling)
- Augustyniak, M. et al. (2018). "A New Approach to Volatility Modeling: The Factorial Hidden Markov Volatility Model." Journal of Business & Economic Statistics 37(4). DOI: 10.1080/07350015.2017.1415910. (Documents ceiling on simple HMM for long-memory volatility)
- Efron, B. & Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman & Hall. (Foundational reference for bootstrap methodology; §6.4 on stratified bootstrap is the specific method used here)
- Coval, J. & Shumway, T. (2001). "Expected Option Returns." Journal of Finance 56(3), 983–1009. (Establishes positive expected P/L from selling options; baseline economic rationale for the strategy class)
Industry and practitioner (not peer-reviewed; flagged)
- CBOE. "CBOE Volatility Index (VIX) — White Paper." 2019. CBOE Global Markets. (Methodology for VIX computation; used as reference for VIX as regime signal input)
- CBOE. "CBOE Volatility Managed BuyWrite Index Methodology." cdn.cboe.com/api/global/us_indices/governance/BXMVM_Methodology.pdf (Institutional precedent for regime-aware options income strategy using VIX percentile thresholds; not peer-reviewed)
- Bondarenko, O. (2019). "Historical Performance of Put-Writing Strategies." Prepared for CBOE. cdn.cboe.com/resources/education/research_publications/. (PUT index 32-year performance data; persistent volatility risk premium quantification; CBOE-commissioned; not peer-reviewed)
- Spintwig LLC. "Short SPX Iron Condor 45-DTE Backtest." spintwig.com/short-spx-iron-condor-45-dte-s1-signal-options-backtest/ (Practitioner ORATS-based backtest; directional reference for regime-filtered entry; methodology partially paywalled; not peer-reviewed)
- tastytrade Research. "Volatility Metrics (IVR, IV%, IVx, HV)." support.tastytrade.com. (Source of the IVR > 50 entry rule origin; not peer-reviewed; practitioner)
This document is a research specification. It does not constitute investment advice or a recommendation to enter any trade. All backtest results described, including the 2022 synthetic scenario in §7, are from simulation on historical or synthetic data. Past simulation outcomes do not predict future trading results. Real-world performance depends on conditions that cannot be captured in a simulation.