Issue #84 — Risk Metrics Expansion: Sortino, Calmar, VaR, CVaR, Ulcer Index, Beta

Status: Research complete — ready for feature-developer pickup
Dispatched: 2026-05-14 UTC
Target: backend_v2/api/routes/backtest.py → calculate_metrics() function

Problem Statement

The existing calculate_metrics() function returns Sharpe ratio, max drawdown, win rate, and profit factor. For options/conservative strategies (covered calls, cash-secured puts, iron condors), these four metrics are insufficient:

Sharpe penalizes upside volatility equally with downside — options strategies with positive skew get incorrectly penalized.
Max drawdown alone tells you the depth of the worst trough but not how long the account stayed underwater or how frequently drawdowns occurred.
No tail-loss quantification — a strategy can show a modest max drawdown but have fat tails that destroy capital in crisis conditions.

The six metrics requested in #84 address these gaps.

Metric Definitions and Formulas

All metrics operate on the daily equity curve returned by _build_equity_curve(). This is already computed before calculate_metrics() is called, so no new data fetches are needed.

1. Sortino Ratio

Penalizes only downside deviation (negative returns), ignoring positive volatility.

downside_returns = [r for r in daily_returns if r < 0]
downside_deviation = sqrt( mean(r**2 for r in downside_returns) )   # semi-deviation, target = 0
sortino = (mean_daily_return / downside_deviation) * sqrt(252)

Edge case: if downside_returns is empty (no losing days), Sortino = None — display as "N/A", not 0 or infinity.
Minimum data: require >= 20 daily return observations; otherwise return None.
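The rules above can be sketched as a single function. This is a minimal illustration, not the existing calculate_metrics() code: the function name sortino_ratio and the constant names are assumptions. It follows the formula as written here, which averages squared returns over losing days only; some definitions of downside deviation divide by the total number of observations instead, which gives a smaller deviation and a larger ratio.

```python
import math
from typing import Optional, Sequence

MIN_OBSERVATIONS = 20   # minimum-data rule stated above
TRADING_DAYS = 252      # annualization factor from the formula above


def sortino_ratio(daily_returns: Sequence[float]) -> Optional[float]:
    """Annualized Sortino ratio with target return 0, or None when undefined."""
    if len(daily_returns) < MIN_OBSERVATIONS:
        return None
    downside = [r for r in daily_returns if r < 0]
    if not downside:
        return None  # no losing days: report N/A rather than 0 or infinity
    # Root-mean-square of the losing days only, as in the formula above.
    downside_deviation = math.sqrt(sum(r * r for r in downside) / len(downside))
    mean_return = sum(daily_returns) / len(daily_returns)
    return mean_return / downside_deviation * math.sqrt(TRADING_DAYS)
```

Returning None for both the short-history case and the no-losing-days case lets the caller pass the value straight through to the null-safe display path.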

2. Calmar Ratio

Annualized return per unit of max drawdown.

calmar = annualized_return_decimal / max_drawdown_decimal

Where both are in decimal form (not percentage). If max_drawdown == 0, return None.
A Calmar of 1.0 means the annualized return equals the worst peak-to-trough loss — often considered the minimum acceptable for a strategy with significant drawdown risk.
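A minimal sketch of the Calmar calculation follows. The function name and the use of abs() are assumptions: abs() is there only in case the existing code stores max drawdown as a negative number. The ratio itself is unit-free, so what matters is that both inputs share the same unit; mixing a percentage with a decimal would be off by a factor of 100.

```python
from typing import Optional


def calmar_ratio(annualized_return: float, max_drawdown: float) -> Optional[float]:
    """Calmar ratio from decimal inputs (0.12 means 12%), or None when undefined."""
    if max_drawdown == 0:
        return None  # no drawdown observed: the ratio is undefined
    return annualized_return / abs(max_drawdown)


# Hypothetical usage when the stored values are percentages: convert both first.
example = calmar_ratio(12.0 / 100, 8.0 / 100)
```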

3. VaR — Value at Risk (Historical, 95% and 99%)

The loss threshold that daily losses stayed at or below on X% of days; only the worst (100 - X)% of days lost more. Historical method only for v1.

sorted_returns = sorted(daily_returns)      # ascending (most negative first)
var_95 = abs(percentile(sorted_returns, 5))  # 5th percentile (worst 5%)
var_99 = abs(percentile(sorted_returns, 1))  # 1st percentile (worst 1%)

VaR is returned as a positive percentage of portfolio value (e.g., var_95 = 2.1 means "on the worst 5% of days, the portfolio lost at least 2.1% of its value"). Because daily_returns are decimal fractions, multiply the percentile by 100 to reach this scale.

NumPy: np.percentile(daily_returns, 5) and np.percentile(daily_returns, 1).

Minimum data: 20 observations; otherwise None.
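The historical VaR rule can be sketched without NumPy as below. The helper names percentile and historical_var are assumptions; the helper uses linear interpolation between order statistics, which is intended to mirror np.percentile's default method. The multiplication by 100 assumes daily_returns holds decimal fractions.

```python
import math
from typing import Optional, Sequence

MIN_OBSERVATIONS = 20


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def historical_var(daily_returns: Sequence[float], confidence: float = 95.0) -> Optional[float]:
    """Historical VaR as a positive percentage of portfolio value, or None."""
    if len(daily_returns) < MIN_OBSERVATIONS:
        return None
    threshold = percentile(daily_returns, 100.0 - confidence)
    return abs(threshold) * 100.0  # decimal fraction -> percent
```

With 21 evenly spaced returns from -1.0% to +1.0%, the 5th percentile falls exactly on the second-worst day, so historical_var gives roughly 0.9 at 95% and 0.98 at 99%.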

4. CVaR / Expected Shortfall (95%, 99%)

The average loss in the worst X% tail — more useful than VaR for options strategies where tail events cluster.

threshold_95 = percentile(daily_returns, 5)   # compute each threshold once
threshold_99 = percentile(daily_returns, 1)
cvar_95 = abs(mean(r for r in daily_returns if r <= threshold_95))
cvar_99 = abs(mean(r for r in daily_returns if r <= threshold_99))

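CVaR >= VaR holds at the same confidence level whenever the tail returns are losses, because CVaR averages the returns at or beyond the VaR threshold. If a strategy has CVaR_99 much larger than VaR_99, the losses beyond the threshold are far worse than the threshold itself, which is a red flag for levered positions. CVaR is reported on the same percentage scale as VaR.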

Minimum data: 20 observations; each CVaR also requires at least 1 observation at or below its percentile threshold.
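A pure-Python sketch of the tail average is shown below. The names percentile and historical_cvar are assumptions, the percentile helper uses linear interpolation (intended to mirror np.percentile's default), and the result is scaled to percent on the assumption that daily_returns holds decimal fractions.

```python
import math
from typing import Optional, Sequence

MIN_OBSERVATIONS = 20


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def historical_cvar(daily_returns: Sequence[float], confidence: float = 95.0) -> Optional[float]:
    """Mean return at or below the VaR threshold, as a positive percentage, or None."""
    if len(daily_returns) < MIN_OBSERVATIONS:
        return None
    threshold = percentile(daily_returns, 100.0 - confidence)
    tail = [r for r in daily_returns if r <= threshold]
    if not tail:
        return None  # defensive: no observation at or below the threshold
    return abs(sum(tail) / len(tail)) * 100.0  # decimal fraction -> percent
```

The threshold is computed once and reused for the filter, which keeps the tail selection consistent with the VaR value reported for the same confidence level.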

5. Ulcer Index

Measures the depth and duration of drawdowns — not just the peak trough. Useful for strategies that slowly grind lower rather than crash suddenly.

# For each bar i in the equity curve:
drawdown_pct[i] = max(0, (peak_equity_up_to_i - equity[i]) / peak_equity_up_to_i) * 100

ulcer_index = sqrt( mean(d**2 for d in drawdown_pct) )

Lower is better. A value of 5 means the root-mean-square drawdown across all bars was 5% below peak; squaring each drawdown before averaging gives prolonged deep drawdowns more weight than shallow ones.

Reference: Peter G. Martin & Byron B. McCann, "The Investor's Guide to Fidelity Funds" (1989). Implementation is deterministic given the equity curve.
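The running-peak calculation above can be sketched in one pass over the curve. This assumes the equity curve is a plain sequence of positive floats, which may not match the actual structure returned by _build_equity_curve(); the function name is also an assumption, and the minimum-observation guard is left to the caller.

```python
import math
from typing import Optional, Sequence


def ulcer_index(equity: Sequence[float]) -> Optional[float]:
    """Root-mean-square percentage drawdown from the running peak, or None if empty."""
    if not equity:
        return None
    peak = equity[0]
    squared_sum = 0.0
    for value in equity:
        peak = max(peak, value)                      # running high-water mark
        drawdown_pct = (peak - value) / peak * 100.0  # 0 when at a new peak
        squared_sum += drawdown_pct ** 2
    return math.sqrt(squared_sum / len(equity))
```

For the curve [100, 90, 100, 80] the drawdowns are 0, 10, 0, and 20 percent, so the index is sqrt(125), about 11.18. A curve that only rises gives 0.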

6. Beta to SPY (optional, when benchmark provided)

Sensitivity of strategy daily returns to SPY daily returns.

cov  = covariance(strategy_returns, spy_returns)
var  = variance(spy_returns)
beta = cov / var

This is only computed when a benchmark_returns array is passed alongside the equity curve. For v1, this is pass-through null — the caller can optionally supply benchmark_curve in the backtest result; if absent, beta = null. Feature-developer does not need to wire up SPY data fetching for this card.
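For when benchmark data is eventually supplied, a minimal sketch of the beta calculation is shown below. The function name and the None-handling are assumptions. Covariance and variance use the same n - 1 divisor here; that matters when porting to NumPy, where np.cov defaults to ddof=1 but np.var defaults to ddof=0, and mixing the two biases beta by a factor of n / (n - 1).

```python
from typing import Optional, Sequence


def beta_to_benchmark(
    strategy_returns: Sequence[float],
    benchmark_returns: Optional[Sequence[float]],
) -> Optional[float]:
    """Beta of strategy returns to benchmark returns, or None when not computable."""
    if benchmark_returns is None:
        return None  # v1 behavior: no benchmark supplied
    n = len(strategy_returns)
    if n < 2 or n != len(benchmark_returns):
        return None  # series must be aligned day for day
    mean_s = sum(strategy_returns) / n
    mean_b = sum(benchmark_returns) / n
    # Same (n - 1) divisor for both terms so it cancels in the ratio.
    cov = sum((s - mean_s) * (b - mean_b)
              for s, b in zip(strategy_returns, benchmark_returns)) / (n - 1)
    var = sum((b - mean_b) ** 2 for b in benchmark_returns) / (n - 1)
    if var == 0:
        return None  # flat benchmark: beta is undefined
    return cov / var
```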

Minimum-Data Guard

All six new metrics share a common guard:

MIN_OBSERVATIONS = 20

if len(daily_returns) < MIN_OBSERVATIONS:
    return {
        "sortino_ratio": None,
        "calmar_ratio": None,
        "var_95": None,
        "var_99": None,
        "cvar_95": None,
        "cvar_99": None,
        "ulcer_index": None,
        "beta": None,
        "insufficient_data": True,
        "observations": len(daily_returns),
    }

The existing metrics (Sharpe, win rate, etc.) already handle the edge case of less than one day of data through their own guards and return 0. Keep that behavior unchanged; this guard applies only to the new metrics block.

Data Sources and Assumptions

Input: equity_curve list from _build_equity_curve() — already in the call chain, no new fetch needed.
Daily returns: derived from consecutive equity values as (e[i] - e[i-1]) / e[i-1] — same as current Sharpe calculation.
No external data feed required for Sortino, Calmar, VaR, CVaR, Ulcer Index.
Beta to SPY: requires SPY bars for the same date range — not wired in v1. Return null.
Annualized return for Calmar: reuse the already-computed annualized_return value — do not recompute.
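The daily-return derivation described above can be sketched as follows, assuming the equity curve is a plain sequence of nonzero floats (the real structure from _build_equity_curve() may differ, and the function name is hypothetical). Note that N equity values yield N - 1 returns, so 20 return observations need 21 equity points.

```python
from typing import List, Sequence


def daily_returns_from_equity(equity: Sequence[float]) -> List[float]:
    """Simple returns (e[i] - e[i-1]) / e[i-1] for consecutive equity values."""
    return [(curr - prev) / prev for prev, curr in zip(equity, equity[1:])]


returns = daily_returns_from_equity([100.0, 110.0, 99.0])
# One fewer return than equity points: two returns from three values.
```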

Display / Null Safety

The issue specifies: "show 'insufficient data' not a bad number" when fewer than 20 data points.

Backend: return None (JSON null) for each metric.
Frontend: safeToFixed(value, 2) already renders -- for null/undefined — this works for all numeric fields.
Add a distinct row or badge in the Summary tab when insufficient_data: true — e.g., an info alert: "Some risk metrics require at least 20 trading days of data."
VaR and CVaR are percentages of portfolio value — display as X.XX%, not as dollar amounts.
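One way to keep "bad numbers" out of the response is to map non-finite floats to None before serialization. The sketch below is an assumption about how that could look, not the existing backend code; both function names are hypothetical. By default json.dumps emits NaN and Infinity, which are not valid JSON, so allow_nan=False is used to fail loudly if one slips through.

```python
import json
import math
from typing import Any, Dict, Optional


def null_if_not_finite(value: Optional[float]) -> Optional[float]:
    """Map None, NaN, and +/-inf to None so the field serializes as JSON null."""
    if value is None or not math.isfinite(value):
        return None
    return value


def serialize_risk_metrics(metrics: Dict[str, Any]) -> str:
    """Serialize a metrics dict, replacing non-finite floats with null."""
    cleaned = {
        key: null_if_not_finite(val) if isinstance(val, float) else val
        for key, val in metrics.items()
    }
    # allow_nan=False raises ValueError instead of emitting NaN/Infinity.
    return json.dumps(cleaned, allow_nan=False)
```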

Notes on Options Strategy Context

These metrics were specifically requested for options/conservative strategies. Brief context on why each matters:

Sortino is standard for credit-spread and covered-call strategies, which are designed to have limited upside but frequent small wins. Sharpe's symmetric volatility measure does not reflect that asymmetry, while Sortino isolates the losing days.
Calmar is the go-to ratio for managed futures and income strategies; a Calmar < 0.5 on an income strategy suggests the drawdowns are larger than warranted by the return.
VaR/CVaR on an equity-curve basis (not position-level) describe how bad the equity path got on the worst days of the user's own historical run. This is retrospective description, not prediction.
Ulcer Index is particularly diagnostic for "death by a thousand cuts" — a covered-call strategy that slowly erodes in a strong trending market shows a high Ulcer Index even if the final loss is modest.

All metrics are computed solely on the user's own historical backtest equity curve. They describe what happened on the user's run, not what will happen in the future.