Raxx · internal docs


Risk Analysis — Issue #84 Metrics Implementation

What can go wrong silently

1. Annualized-return decimal vs percentage confusion (HIGH probability of bug)

The existing calculate_metrics() function computes annualized_return in decimal form internally ((1 + total_return) ** (1/years) - 1) and then multiplies by 100 at return time.

If the developer passes the already-scaled percentage form to calculate_risk_metrics(), the Calmar ratio will be 100x too large (e.g., Calmar = 1500 instead of 15).

Mitigation: the reference implementation comments this clearly; the test suite must include a sanity-check assertion (calmar <= 100 for any realistic strategy).
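The unit contract can be pinned down in a small sketch. The helper name and signature below are assumptions for illustration, not the real calculate_risk_metrics(); the point is that both inputs must be in decimal form:

```python
def calmar_ratio(annualized_return, max_drawdown):
    """Calmar = annualized return / |max drawdown|, BOTH in decimal form.

    Hypothetical helper: pass 0.15 (decimal), not 15.0 (percentage).
    """
    if max_drawdown == 0:
        return None  # zero-drawdown case is handled separately (see item 4)
    return annualized_return / abs(max_drawdown)

# Correct: decimal inputs -> Calmar ~= 1.5
assert abs(calmar_ratio(0.15, -0.10) - 1.5) < 1e-9

# The bug: percentage-form return against decimal drawdown -> 100x too large
assert abs(calmar_ratio(15.0, -0.10) - 150.0) < 1e-9

# The sanity check the test suite should carry
calmar = calmar_ratio(0.15, -0.10)
assert calmar is None or calmar <= 100
```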

2. Short backtests (< 20 days) silently returning bad numbers without the guard

If the guard is implemented but the condition check has an off-by-one (< 20 vs <= 20), a 20-day run returns metrics computed on exactly 19 return observations (since daily_returns has one fewer element than equity_curve). The boundary test (test_insufficient_data_boundary) catches this.
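A minimal sketch of the guard, under the assumption that it is applied to the length of the equity curve (function and constant names are illustrative):

```python
import numpy as np

MIN_TRADING_DAYS = 20

def risk_metrics_guard(equity_curve):
    """Hypothetical guard sketch.

    daily_returns always has len(equity_curve) - 1 observations, so a
    strict `< MIN_TRADING_DAYS` check on the equity curve admits a
    20-day run that has only 19 return observations.
    """
    if len(equity_curve) <= MIN_TRADING_DAYS:   # note <=, not <
        return None
    daily_returns = np.diff(equity_curve) / equity_curve[:-1]
    return daily_returns

# Boundary: exactly 20 equity points -> rejected
assert risk_metrics_guard(np.linspace(100, 110, 20)) is None
# 21 equity points -> 20 return observations, accepted
assert len(risk_metrics_guard(np.linspace(100, 110, 21))) == 20
```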

3. CVaR denominator empty when all returns are above the VaR threshold

On a very benign synthetic dataset where no return falls strictly below the VaR threshold (possible with tiny arrays, where the 5th percentile can coincide with the minimum return), tail_95 is empty and CVaR is None while VaR is a number. This is correct behavior: the frontend must handle cvar_95 = null alongside a non-null var_95 without breaking.
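A sketch of the shape of this behavior, assuming the tail uses a strict comparison against the threshold (which is what makes an empty tail possible); function name and return shape are assumptions:

```python
import numpy as np

def var_cvar(daily_returns, confidence=0.95):
    """Hypothetical VaR/CVaR sketch at the 95% level.

    CVaR averages the returns strictly below the VaR threshold; on tiny
    arrays that tail can be empty, in which case CVaR is None while VaR
    is still a number.
    """
    var_95 = np.percentile(daily_returns, 100 * (1 - confidence))
    tail_95 = daily_returns[daily_returns < var_95]
    cvar_95 = float(tail_95.mean()) if tail_95.size > 0 else None
    return float(var_95), cvar_95

# Degenerate tiny array: the 5th percentile IS the only observation,
# nothing falls strictly below it -> cvar_95 is None, var_95 is not
v, c = var_cvar(np.array([0.01]))
assert c is None and isinstance(v, float)
```

The frontend contract follows directly: render var_95 normally and treat cvar_95 = null as "tail too small to estimate", not as an error.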

4. Ulcer Index on a flat or all-positive equity curve

running_peak == equity[i] for all i when the curve never declines — dd_pct is all zeros, and Ulcer = 0.0. This is correct. The zero-drawdown case for Calmar also needs separate handling (returns null). Make sure Ulcer = 0.0 is not incorrectly treated as null in the frontend.
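The flat-curve case can be verified with a few lines. This is a standard Ulcer Index computation (RMS of percentage drawdowns from the running peak); the function name is illustrative:

```python
import numpy as np

def ulcer_index(equity):
    """RMS of percentage drawdowns from the running peak.

    On a curve that never declines, every drawdown is 0 and the index
    is exactly 0.0 -- a valid value, distinct from None/null.
    """
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    dd_pct = (equity - running_peak) / running_peak * 100.0
    return float(np.sqrt(np.mean(dd_pct ** 2)))

# Never-declining curve -> 0.0, not None
assert ulcer_index([100, 101, 103, 107]) == 0.0
```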

5. compare_strategies response propagation

run_strategy_comparison() builds per-strategy strategy_metrics dicts and a metrics_list summary. The metrics_list is a hardcoded field selection (lines 949–958). The new fields will be in strategy_metrics (because it's the full dict from calculate_metrics()) but they will NOT appear in metrics_list unless the developer explicitly adds them.

Whether metrics_list needs updating depends on how the comparison summary table is rendered. Check BacktestingResults.js for which fields it iterates.
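The shape of the silent omission can be shown with a toy payload (every field name here is hypothetical; the real selection lives in run_strategy_comparison() around lines 949–958):

```python
# Full dict as returned by calculate_metrics() -- new fields ride along for free
full_metrics = {"sharpe": 1.2, "calmar": 0.9, "ulcer_index": 2.1, "var_95": -0.018}

# strategy_metrics carries the full dict per strategy
strategy_metrics = {"strat_a": full_metrics}

# metrics_list is built from a hardcoded field selection, so new fields
# are dropped silently unless the list is updated
SUMMARY_FIELDS = ["sharpe"]  # stand-in for the real hardcoded selection
metrics_list = [{f: full_metrics[f] for f in SUMMARY_FIELDS}]

assert "ulcer_index" in strategy_metrics["strat_a"]
assert "ulcer_index" not in metrics_list[0]   # the omission described above
```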

6. Floating-point precision on VaR/CVaR for very small datasets near the minimum boundary

np.percentile with 20 observations and linear interpolation will give a precise result, but the 1st percentile (for VaR_99) sits between the 1st and 2nd smallest return on a 20-point dataset. The result is technically valid but the statistical confidence interval is wide. The "insufficient data" copy in the UI should use language like "calculated from N trading days" so users understand the sample-size limitation — not "N/A" which implies a bug.
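The interpolation position is easy to confirm: with 20 points and linear interpolation, the 1st percentile sits at index position 0.01 * (20 - 1) = 0.19, i.e. between the two smallest observations:

```python
import numpy as np

# 20 evenly spaced synthetic daily returns, sorted ascending
returns = np.sort(np.linspace(-0.05, 0.04, 20))

# Default method is linear interpolation between order statistics
p1 = np.percentile(returns, 1)

# The 1st percentile falls strictly between the 1st and 2nd smallest values
assert returns[0] < p1 < returns[1]
```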

What does NOT break existing behavior

Regime / stress cases

These metrics are computed from the equity curve, which is itself derived from whatever price data the user's backtest covers. If the backtest period includes a market crash (2020 COVID, 2022 rate shock), the VaR/CVaR will reflect those events. If it only covers a bull market, VaR will look low. This is correct and expected — all metrics are labeled as historical/retrospective.

No special regime-detection logic is needed in v1.