The existing calculate_metrics() function computes annualized_return in decimal form internally ((1 + total_return) ** (1/years) - 1) and then multiplies by 100 at return time.
If the developer passes the already-scaled percentage form to calculate_risk_metrics(), the Calmar ratio will be 100x too large (e.g., Calmar = 1500 instead of 15).
Mitigation: the reference implementation comments this clearly; the test suite must include a sanity-check assertion (calmar <= 100 for any realistic strategy).
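The unit-mismatch hazard and the sanity assertion can be sketched as follows. `calmar_ratio` is a hypothetical helper, not the actual implementation; the point is that it normalizes both inputs to decimal form internally, so passing percent-scaled values consistently cannot produce the 100x blow-up:

```python
def calmar_ratio(annualized_return_pct, max_drawdown_pct):
    """Hypothetical sketch: both inputs are expected in percent form.

    Normalizing both to decimal internally means a consistent pair of
    percent inputs cannot produce a 100x-scaled result; the bug arises
    only when one input is percent and the other is decimal.
    """
    if max_drawdown_pct == 0:
        return None  # zero-drawdown case: Calmar is undefined
    ann_ret = annualized_return_pct / 100.0   # percent -> decimal
    max_dd = abs(max_drawdown_pct) / 100.0    # percent -> decimal
    return ann_ret / max_dd

# The sanity-check assertion the test suite should carry:
calmar = calmar_ratio(15.0, 10.0)  # 15% annual return, 10% max drawdown
assert calmar is not None and calmar <= 100
```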
If the guard is implemented but the condition check has an off-by-one (< 20 vs <= 20), a 20-day run returns metrics computed on exactly 19 return observations (since daily_returns has one fewer element than equity_curve). The boundary test (test_insufficient_data_boundary) catches this.
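A minimal sketch of the guard, assuming a hypothetical `compute_risk_metrics` entry point and a 20-observation threshold. The key detail is that the guard must count return observations, not equity-curve points, because `np.diff` drops one element:

```python
import numpy as np

MIN_OBSERVATIONS = 20  # assumed threshold from the guard described above

def compute_risk_metrics(equity_curve):
    """Hypothetical sketch of the insufficient-data guard.

    An N-point equity curve yields N-1 daily returns, which is where
    the off-by-one (< vs <=, curve length vs return count) slips in.
    """
    equity = np.asarray(equity_curve, dtype=float)
    daily_returns = np.diff(equity) / equity[:-1]
    if len(daily_returns) < MIN_OBSERVATIONS:
        return None  # not enough observations for stable percentiles
    return {"var_95": float(np.percentile(daily_returns, 5))}

# A 20-point curve has only 19 returns, so it must be rejected:
assert compute_risk_metrics(np.linspace(100, 110, 20)) is None
# A 21-point curve has exactly 20 returns and passes:
assert compute_risk_metrics(np.linspace(100, 110, 21)) is not None
```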
On a very benign synthetic dataset where no return falls at or below the 5th percentile (can happen with tiny arrays), tail_95 is empty and CVaR returns None while VaR returned a value. This is correct behavior — the frontend must handle cvar_95 = null while var_95 is non-null without breaking.
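A sketch of how the empty tail arises, assuming the implementation uses a strict comparison when collecting tail returns (with `<=` the minimum would always be included). `var_cvar` is a hypothetical helper illustrating the None-vs-value split the frontend must tolerate:

```python
import numpy as np

def var_cvar(daily_returns, confidence=0.95):
    """Hypothetical sketch: VaR via percentile, CVaR as the mean of
    returns strictly below the VaR cutoff. On tiny arrays with tied
    minima the tail can be empty, so CVaR is None while VaR is a number.
    """
    r = np.asarray(daily_returns, dtype=float)
    q = 100 * (1 - confidence)            # 5th percentile for 95% VaR
    var = float(np.percentile(r, q))
    tail = r[r < var]
    cvar = float(tail.mean()) if tail.size else None
    return var, cvar

# Duplicated minimum on a tiny array: percentile lands exactly on the
# minimum, nothing is strictly below it, and CVaR comes back as None.
var, cvar = var_cvar([0.01, 0.01, 0.05])
assert cvar is None and var == 0.01
```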
When the curve never declines, running_peak == equity[i] for all i, so dd_pct is all zeros and Ulcer = 0.0. This is correct. The zero-drawdown case also needs separate handling for Calmar (which would otherwise divide by zero; it returns null). Make sure Ulcer = 0.0 is not incorrectly treated as null in the frontend.
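The zero-drawdown edge case can be demonstrated with a short sketch. `ulcer_index` here is a hypothetical stand-in following the standard root-mean-square-drawdown definition, not the project's actual function:

```python
import numpy as np

def ulcer_index(equity_curve):
    """Hypothetical sketch of the flat/rising-curve edge case.

    On a monotonically rising curve every point is its own running
    peak, dd_pct is all zeros, and the result is exactly 0.0: a valid
    float that must not be confused with null/None downstream.
    """
    equity = np.asarray(equity_curve, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    dd_pct = (equity - running_peak) / running_peak * 100  # all <= 0
    return float(np.sqrt(np.mean(dd_pct ** 2)))

assert ulcer_index([100, 105, 110, 120]) == 0.0  # never declines
```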
run_strategy_comparison() builds per-strategy strategy_metrics dicts and a metrics_list summary. The metrics_list is a hardcoded field selection (lines 949–958). The new fields will be in strategy_metrics (because it's the full dict from calculate_metrics()) but they will NOT appear in metrics_list unless the developer explicitly adds them.
Whether metrics_list needs updating depends on how the comparison summary table is rendered. Check BacktestingResults.js for which fields it iterates.
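The shape of the hardcoded selection can be sketched like this. The field names below are illustrative (the actual list lives at lines 949–958); the point is that `strategy_metrics` carries the full `calculate_metrics()` dict while the summary only copies named keys, so new fields must be added explicitly:

```python
# Hypothetical sketch of the summary selection in run_strategy_comparison().
SUMMARY_FIELDS = [
    "total_return", "sharpe_ratio", "max_drawdown", "win_rate",
    # New risk fields do NOT appear unless added here explicitly:
    "var_95", "cvar_95", "calmar_ratio", "ulcer_index",
]

def build_summary_row(strategy_metrics):
    # .get() keeps the row well-formed even if a field is missing/null.
    return {k: strategy_metrics.get(k) for k in SUMMARY_FIELDS}
```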
np.percentile with 20 observations and linear interpolation will give a precise result, but the 1st percentile (for VaR_99) sits between the 1st and 2nd smallest return on a 20-point dataset. The result is technically valid but the statistical confidence interval is wide. The "insufficient data" copy in the UI should use language like "calculated from N trading days" so users understand the sample-size limitation — not "N/A" which implies a bug.
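The interpolation behavior on a 20-point sample can be checked directly. This is a standalone illustration of NumPy's default linear method, not project code:

```python
import numpy as np

# With n = 20 observations, the 1st-percentile index under linear
# interpolation is 0.01 * (n - 1) = 0.19, so the value lies 19% of the
# way from the smallest toward the second-smallest return.
returns = np.sort(np.random.default_rng(0).normal(0, 0.01, 20))
p1 = np.percentile(returns, 1)
assert returns[0] <= p1 <= returns[1]  # sits between the two smallest
```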
Existing metrics (total_return, sharpe_ratio, max_drawdown, win_rate, profit_factor, etc.) are untouched; the new fields arrive via the risk_metrics spread, so no existing key is overwritten. The export endpoint (/api/backtest/export) uses the equity curve, not the metrics dict, so no change is needed there, and _normalize_equity_curve_rows() is unaffected.
These metrics are computed from the equity curve, which is itself derived from whatever price data the user's backtest covers. If the backtest period includes a market crash (2020 COVID, 2022 rate shock), the VaR/CVaR will reflect those events; if it only covers a bull market, VaR will look low. This is correct and expected, since all metrics are labeled as historical/retrospective.
No special regime-detection logic is needed in v1.