Raxx · internal docs

Issue #84 — Risk Metrics Expansion: Sortino, Calmar, VaR, CVaR, Ulcer Index, Beta

Status: Research complete — ready for feature-developer pickup
Dispatched: 2026-05-14 UTC
Target: backend_v2/api/routes/backtest.py, calculate_metrics() function


Problem Statement

The existing calculate_metrics() function returns Sharpe ratio, max drawdown, win rate, and profit factor. For options/conservative strategies (covered calls, cash-secured puts, iron condors), these four metrics are insufficient:

The six metrics requested in #84 fix each of these gaps.


Metric Definitions and Formulas

All metrics operate on the daily equity curve returned by _build_equity_curve(). This is already computed before calculate_metrics() is called, so no new data fetches are needed.
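The formulas below refer to daily_returns without showing how they come out of the equity curve. A minimal sketch, assuming the curve is a plain sequence of positive floats (the actual return type of _build_equity_curve() is not stated here) and simple rather than log returns. The helper name daily_returns_from_equity is hypothetical.

```python
from typing import List, Sequence


def daily_returns_from_equity(equity: Sequence[float]) -> List[float]:
    """Simple day-over-day returns in decimal form (0.01 == 1%).

    Assumption: `equity` holds one value per trading day, oldest first.
    """
    return [
        (curr - prev) / prev
        for prev, curr in zip(equity, equity[1:])
        if prev != 0  # skip bars where the prior equity is zero
    ]
```

A curve of N bars yields N - 1 returns, so the 20-observation minimum used below corresponds to at least 21 bars under this assumption.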

1. Sortino Ratio

Penalizes only downside deviation (negative returns), ignoring positive volatility.

downside_returns = [r for r in daily_returns if r < 0]
# RMS of losing-day returns (target = 0); the mean is taken over losing days only, not all days
downside_deviation = sqrt(mean(r**2 for r in downside_returns))
sortino = (mean(daily_returns) / downside_deviation) * sqrt(252)

Edge case: if downside_returns is empty (no losing days), Sortino = None — display as "N/A", not 0 or infinity.
Minimum data: require >= 20 daily return observations; otherwise return None.
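A minimal sketch of the Sortino calculation described above, assuming daily_returns is a sequence of decimal returns. The function name sortino_ratio and the two constants are illustrative, not existing code in backtest.py. It follows this document's definition, where the downside deviation averages over losing days only; some references average the squared shortfall over all days instead, which gives a smaller deviation.

```python
import math
from typing import Optional, Sequence

MIN_OBSERVATIONS = 20  # minimum-data rule from this document
TRADING_DAYS = 252     # annualization factor used in the formula above


def sortino_ratio(daily_returns: Sequence[float]) -> Optional[float]:
    """Annualized Sortino ratio with a target return of 0, or None if undefined."""
    if len(daily_returns) < MIN_OBSERVATIONS:
        return None
    downside = [r for r in daily_returns if r < 0]
    if not downside:
        return None  # no losing days: display as "N/A", not 0 or infinity
    # Root-mean-square of the losing-day returns
    downside_dev = math.sqrt(sum(r * r for r in downside) / len(downside))
    mean_return = sum(daily_returns) / len(daily_returns)
    return mean_return / downside_dev * math.sqrt(TRADING_DAYS)
```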

2. Calmar Ratio

Annualized return per unit of max drawdown.

calmar = annualized_return_decimal / max_drawdown_decimal

Where both are in decimal form (not percentage). If max_drawdown == 0, return None.
A Calmar of 1.0 means the annualized return equals the worst peak-to-trough loss — often considered the minimum acceptable for a strategy with significant drawdown risk.
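The ratio itself is a single division, but the inputs are easy to get wrong (percent vs. decimal, sign of the drawdown). A hedged sketch that derives both from the equity curve, assuming geometric annualization over 252 trading days and one bar per day. How calculate_metrics() currently annualizes returns is not stated here, so treat that part as an assumption; calmar_ratio is a hypothetical name.

```python
from typing import Optional, Sequence

MIN_OBSERVATIONS = 20
TRADING_DAYS = 252


def calmar_ratio(equity: Sequence[float]) -> Optional[float]:
    """Annualized return divided by max drawdown, both as decimals."""
    periods = len(equity) - 1  # number of daily returns in the curve
    if periods < MIN_OBSERVATIONS or equity[0] <= 0 or equity[-1] < 0:
        return None
    # Max drawdown as a positive decimal (0.2 == 20% below the running peak)
    peak = equity[0]
    max_drawdown = 0.0
    for value in equity:
        peak = max(peak, value)
        max_drawdown = max(max_drawdown, (peak - value) / peak)
    if max_drawdown == 0:
        return None  # no drawdown at all: the ratio is undefined
    # Assumption: geometric annualization of the total growth factor
    annualized = (equity[-1] / equity[0]) ** (TRADING_DAYS / periods) - 1
    return annualized / max_drawdown
```

For example, a one-year curve that ends 10% higher after a 20% worst drawdown gives a Calmar of 0.5.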

3. VaR — Value at Risk (Historical, 95% and 99%)

The loss threshold such that X% of days had losses at or below this level. Historical method only for v1.

sorted_returns = sorted(daily_returns)      # ascending (most negative first)
var_95 = abs(percentile(sorted_returns, 5))  # 5th percentile (worst 5%)
var_99 = abs(percentile(sorted_returns, 1))  # 1st percentile (worst 1%)

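VaR is returned as a positive percentage of portfolio value (e.g., var_95 = 2.1 means that on the worst 5% of days the portfolio lost at least 2.1% of its value, and on the other 95% of days it lost less than that or gained). If daily_returns are stored as decimals, the percentile must be multiplied by 100 to match this convention.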

NumPy: np.percentile(daily_returns, 5) and np.percentile(daily_returns, 1).

Minimum data: 20 observations; otherwise None.
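A dependency-free sketch of the historical VaR calculation, assuming daily_returns are decimals and the result should be a positive percentage as described above. It uses statistics.quantiles with method="inclusive", which interpolates linearly between order statistics in the same way as np.percentile's default method. The function name historical_var is illustrative.

```python
import statistics
from typing import Dict, Optional, Sequence

MIN_OBSERVATIONS = 20


def historical_var(daily_returns: Sequence[float]) -> Dict[str, Optional[float]]:
    """Historical 95% and 99% VaR as positive percentages, or None values."""
    if len(daily_returns) < MIN_OBSERVATIONS:
        return {"var_95": None, "var_99": None}
    # 99 cut points: cuts[0] is the 1st percentile, cuts[4] the 5th
    cuts = statistics.quantiles(daily_returns, n=100, method="inclusive")
    p5, p1 = cuts[4], cuts[0]
    # Assumption: returns are decimals, so scale by 100 to report percent
    return {"var_95": abs(p5) * 100, "var_99": abs(p1) * 100}
```

Note that abs() follows the formula in this document; if the 5th percentile is a gain (a run with almost no losing days), it would report that gain as a loss, so a caller may prefer to clamp at 0 in that case.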

4. CVaR / Expected Shortfall (95%, 99%)

The average loss in the worst X% tail — more useful than VaR for options strategies where tail events cluster.

cvar_95 = abs(mean(r for r in daily_returns if r <= percentile(daily_returns, 5)))
cvar_99 = abs(mean(r for r in daily_returns if r <= percentile(daily_returns, 1)))

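CVaR >= VaR holds whenever the percentile threshold is itself a loss (a negative return), which is the normal case. If a strategy has CVaR_99 much larger than VaR_99, the losses beyond the threshold are severe (a fat left tail), which is a red flag for levered positions.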

Minimum data: 20 observations; cvar_95 requires at least 1 observation at or below the 5th-percentile threshold (the filter uses <=).
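A sketch of the tail-average calculation, under the same assumptions as the VaR formulas: decimal daily returns, linear-interpolated percentiles, and a positive percentage as output so the value is directly comparable with VaR. The name historical_cvar and the tail_pct parameter are hypothetical.

```python
import statistics
from typing import Optional, Sequence

MIN_OBSERVATIONS = 20


def historical_cvar(daily_returns: Sequence[float], tail_pct: int = 5) -> Optional[float]:
    """Mean loss in the worst `tail_pct` percent of days, as a positive percentage."""
    if len(daily_returns) < MIN_OBSERVATIONS:
        return None
    # cuts[k - 1] is the k-th percentile (same interpolation as np.percentile)
    cuts = statistics.quantiles(daily_returns, n=100, method="inclusive")
    threshold = cuts[tail_pct - 1]
    tail = [r for r in daily_returns if r <= threshold]
    if not tail:
        return None  # nothing at or below the threshold (float rounding edge case)
    return abs(sum(tail) / len(tail)) * 100
```

With only 20 observations the 5% and 1% tails both contain a single day, so cvar_95 and cvar_99 coincide; the two only separate on longer runs.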

5. Ulcer Index

Measures both the depth and the duration of drawdowns, not just the single worst peak-to-trough loss. Useful for strategies that slowly grind lower rather than crash suddenly.

# For each bar i in the equity curve:
drawdown_pct[i] = max(0, (peak_equity_up_to_i - equity[i]) / peak_equity_up_to_i) * 100

ulcer_index = sqrt(mean(d**2 for d in drawdown_pct))

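Lower is better. The index is the root-mean-square drawdown over every bar in the curve, including bars at a new peak (which contribute 0). A value of 5 therefore means the RMS drawdown over the whole run was 5%; squaring gives deep, prolonged drawdowns more weight than shallow ones.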

Reference: Peter G. Martin & Byron B. McCann, "The Investor's Guide to Fidelity Funds" (1989). Implementation is deterministic given the equity curve.
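The per-bar recurrence above can be sketched in a single pass over the equity curve. This assumes the curve is a sequence of positive floats and that the shared minimum-data guard is applied by the caller; ulcer_index is an illustrative name.

```python
import math
from typing import Optional, Sequence


def ulcer_index(equity: Sequence[float]) -> Optional[float]:
    """Root-mean-square percentage drawdown from the running peak."""
    if not equity:
        return None
    peak = equity[0]
    squared_drawdowns = []
    for value in equity:
        peak = max(peak, value)  # running peak up to and including this bar
        drawdown_pct = (peak - value) / peak * 100 if peak > 0 else 0.0
        squared_drawdowns.append(drawdown_pct ** 2)
    return math.sqrt(sum(squared_drawdowns) / len(squared_drawdowns))
```

For the curve [100, 90, 100, 80] the per-bar drawdowns are 0%, 10%, 0%, and 20%, so the index is sqrt((0 + 100 + 0 + 400) / 4) = sqrt(125), roughly 11.18.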

6. Beta to SPY (optional, when benchmark provided)

Sensitivity of strategy daily returns to SPY daily returns.

cov  = covariance(strategy_returns, spy_returns)
var  = variance(spy_returns)
beta = cov / var

Beta is only computed when a benchmark_returns array is passed alongside the equity curve. For v1 it is a pass-through: the caller can optionally supply benchmark_curve in the backtest result, and if it is absent, beta = null. Feature-developer does not need to wire up SPY data fetching for this card.
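A sketch of the pass-through behavior and the beta formula, assuming strategy and benchmark returns are already aligned day by day and have equal length (alignment is not covered by this card). The name beta_to_benchmark is hypothetical. Both covariance and variance use the same n - 1 divisor here; with NumPy, mixing np.cov (ddof=1 by default) with np.var (ddof=0 by default) would inflate beta by a factor of n / (n - 1).

```python
from typing import Optional, Sequence


def beta_to_benchmark(
    strategy_returns: Sequence[float],
    benchmark_returns: Optional[Sequence[float]],
) -> Optional[float]:
    """Beta = cov(strategy, benchmark) / var(benchmark), or None if unavailable."""
    if benchmark_returns is None:
        return None  # v1 pass-through: no benchmark supplied
    n = len(strategy_returns)
    if n < 2 or n != len(benchmark_returns):
        return None  # assumption: misaligned inputs are treated as missing
    mean_s = sum(strategy_returns) / n
    mean_b = sum(benchmark_returns) / n
    cov = sum(
        (s - mean_s) * (b - mean_b)
        for s, b in zip(strategy_returns, benchmark_returns)
    ) / (n - 1)
    var = sum((b - mean_b) ** 2 for b in benchmark_returns) / (n - 1)
    if var == 0:
        return None  # flat benchmark: beta is undefined
    return cov / var
```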


Minimum-Data Guard

All six new metrics share a common guard:

MIN_OBSERVATIONS = 20

if len(daily_returns) < MIN_OBSERVATIONS:
    return {
        "sortino_ratio": None,
        "calmar_ratio": None,
        "var_95": None,
        "var_99": None,
        "cvar_95": None,
        "cvar_99": None,
        "ulcer_index": None,
        "beta": None,
        "insufficient_data": True,
        "observations": len(daily_returns),
    }

The existing metrics (Sharpe, win rate, etc.) already handle < 1 day edge cases through their own guards and return 0. Keep that behavior unchanged — this guard only applies to the new metrics block.


Data Sources and Assumptions


Display / Null Safety

The issue specifies: "show 'insufficient data' not a bad number" when fewer than 20 data points.
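One way to keep the two null cases apart at display time is sketched below: "insufficient data" when the guard fired, and "N/A" when a metric is undefined for another reason (for example, Sortino with no losing days). The function format_metric and its signature are hypothetical, and the real rendering layer may live in the frontend instead.

```python
import math
from typing import Optional


def format_metric(value: Optional[float], insufficient_data: bool, digits: int = 2) -> str:
    """Render a metric for display without ever showing a misleading number."""
    if insufficient_data:
        return "insufficient data"
    if value is None or not math.isfinite(value):
        return "N/A"  # undefined metric, NaN, or infinity
    return f"{value:.{digits}f}"
```

Usage: format_metric(result["sortino_ratio"], result.get("insufficient_data", False)), where result is the metrics dictionary and the insufficient_data key matches the guard shown earlier.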


Notes on Options Strategy Context

These metrics were specifically requested for options/conservative strategies. Brief context on why each matters:

All metrics are computed solely on the user's own historical backtest equity curve. They describe what happened on the user's run, not what will happen in the future.