Handoff Packet — Issue #84

Feature-developer: read this first

What to build

Add the new risk-metric fields (listed in the payload table below) to the dict returned by calculate_metrics() in backend_v2/api/routes/backtest.py and surface them in the Summary tab of BacktestResults.js.

No new data fetches. No new DB tables. No new API endpoints. This is a pure metrics extension.

Backend — exact file to touch

File: backend_v2/api/routes/backtest.py
Function: calculate_metrics(initial_capital, equity_curve, trades) — lines 789–874

Step 1: Add `calculate_risk_metrics()` as a standalone helper

Copy the full calculate_risk_metrics() function from docs/research/issue-84/reference-implementation.py into backtest.py. Place it directly above calculate_metrics().

Import nothing new — numpy and math are already imported.

Step 2: Wire it into `calculate_metrics()`

Inside calculate_metrics(), the existing code already computes:

daily_returns (list of floats): lines 829–833
annualized_return (float, decimal form before the * 100): line 816
max_drawdown (float, decimal form before the * 100): lines 819–824
equity_values (list of floats): line 809

Add this block before the final return statement (after line 853, before line 855):

    # issue-84: extended risk metrics
    risk_metrics = calculate_risk_metrics(
        daily_returns=daily_returns,
        annualized_return=annualized_return,   # decimal, pre-percentage
        max_drawdown=max_drawdown,             # decimal, pre-percentage
        equity_values=equity_values,
        benchmark_returns=None,
    )

Then merge into the return dict:

    return {
        "total_return": round(total_return * 100, 2),
        "annualized_return": round(annualized_return * 100, 2),
        "max_drawdown": round(max_drawdown * 100, 2),
        "sharpe_ratio": round(sharpe_ratio, 2),
        # ...existing fields...
        # issue-84 additions:
        **risk_metrics,
    }

Critical: annualized_return and max_drawdown are in decimal form inside calculate_metrics() before being multiplied by 100 at return time. Pass the decimal values to calculate_risk_metrics(). The helper uses them in decimal form internally.
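To illustrate why the units matter, here is a minimal sketch. It assumes a common Sortino definition (annualized return divided by annualized downside deviation, with 252 trading days per year); the reference implementation may define it differently, and downside_deviation is a hypothetical helper used only for this illustration.

```python
import math


def downside_deviation(daily_returns, periods_per_year=252):
    """Annualized downside deviation (hypothetical helper, assumed definition)."""
    squared = [min(r, 0.0) ** 2 for r in daily_returns]  # only losing days contribute
    return math.sqrt(sum(squared) / len(squared)) * math.sqrt(periods_per_year)


daily_returns = [0.01, -0.02, 0.015, -0.005] * 6  # 24 synthetic daily returns (decimal)
annualized_decimal = 0.12                         # 12% expressed as a decimal

dd = downside_deviation(daily_returns)
sortino_decimal_input = annualized_decimal / dd          # intended usage
sortino_percent_input = (annualized_decimal * 100) / dd  # percentage passed by mistake

print(round(sortino_decimal_input, 2), round(sortino_percent_input, 2))
```

Under that assumed definition, passing the percentage form inflates the ratio by exactly a factor of 100, which is easy to miss because the result is still a plausible-looking float.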

New fields added to the response payload

Field	Type	Notes
`sortino_ratio`	float or null	null when < 20 obs or no losing days
`calmar_ratio`	float or null	null when max_drawdown == 0 or < 20 obs
`var_95`	float or null	positive %, e.g. 2.1
`var_99`	float or null	positive %
`cvar_95`	float or null	positive %, always >= var_95
`cvar_99`	float or null	positive %, always >= var_99
`ulcer_index`	float or null	lower is better
`beta`	null	always null in v1
`insufficient_data`	bool	true when obs < 20
`observations`	int	count of daily returns used

These fields should also reach run_strategy_comparison() without further changes: the per-strategy result inside the loop (line 939) already spreads {**strategy_metrics}, so the new keys come along automatically. Verify this after your change.
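For orientation, the following is a rough sketch of a helper that produces a dict with the shape described in the table. It is not the reference implementation (copy that from docs/research/issue-84/reference-implementation.py). The name sketch_risk_metrics, the 252-day annualization, the nearest-rank quantile method, the 4-decimal rounding, and the use of the standard library instead of numpy are all assumptions made for illustration.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

MIN_OBS = 20        # threshold from the field table
TRADING_DAYS = 252  # assumption: annualization factor


def _var_cvar(returns: Sequence[float], confidence: float) -> Tuple[float, float]:
    """Historical VaR and CVaR as percentages, with losses reported as positive."""
    losses = sorted(-r for r in returns)                 # ascending; larger = worse day
    k = max(0, math.ceil(confidence * len(losses)) - 1)  # nearest-rank quantile index
    var = losses[k]
    tail = losses[k:]                                    # days at least as bad as VaR
    cvar = max(var, sum(tail) / len(tail))               # max() guards float rounding
    return round(var * 100, 4), round(cvar * 100, 4)


def _ulcer_index(equity_values: Sequence[float]) -> float:
    """Root-mean-square of percentage drawdowns from the running peak."""
    peak = equity_values[0]
    squared = []
    for value in equity_values:
        peak = max(peak, value)
        drawdown_pct = 100.0 * (value - peak) / peak if peak > 0 else 0.0
        squared.append(drawdown_pct ** 2)
    return math.sqrt(sum(squared) / len(squared))


def sketch_risk_metrics(
    daily_returns: Sequence[float],
    annualized_return: float,            # decimal form, e.g. 0.12
    max_drawdown: float,                 # decimal form, either sign accepted
    equity_values: Sequence[float],
    benchmark_returns: Optional[Sequence[float]] = None,  # unused in v1
) -> Dict[str, object]:
    n = len(daily_returns)
    result: Dict[str, object] = {
        "sortino_ratio": None, "calmar_ratio": None,
        "var_95": None, "var_99": None, "cvar_95": None, "cvar_99": None,
        "ulcer_index": None,
        "beta": None,                    # always null in v1
        "insufficient_data": n < MIN_OBS,
        "observations": n,
    }
    if n < MIN_OBS:
        return result                    # every metric stays None

    # Daily downside deviation; zero means there were no losing days.
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in daily_returns) / n)
    if downside > 0:
        result["sortino_ratio"] = round(
            annualized_return / (downside * math.sqrt(TRADING_DAYS)), 4
        )
    if max_drawdown != 0:
        result["calmar_ratio"] = round(annualized_return / abs(max_drawdown), 4)
    result["var_95"], result["cvar_95"] = _var_cvar(daily_returns, 0.95)
    result["var_99"], result["cvar_99"] = _var_cvar(daily_returns, 0.99)
    if equity_values:
        result["ulcer_index"] = round(_ulcer_index(equity_values), 4)
    return result
```

Because CVaR here is the mean of the losses at or beyond the VaR cutoff, cvar >= var holds by construction, which matches the invariant noted in the table.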

Frontend — exact file to touch

File: frontend/trademaster_ui/src/components/BacktestResults.js
Tab: Summary → "Performance Metrics" card (lines 354–391)

Changes needed

Add a "Risk Metrics" card as a third card below "Performance Metrics" in the Summary tab. Suggested placement: after the existing <Col md={6}> Performance Metrics block, add a new <Row> with a full-width card.
Fields to display:

Label	Key	Format
Sortino Ratio	`results.sortino_ratio`	`safeToFixed(value, 2)`
Calmar Ratio	`results.calmar_ratio`	`safeToFixed(value, 2)`
VaR (95%)	`results.var_95`	`safeToFixed(value, 2) + '%'`
VaR (99%)	`results.var_99`	`safeToFixed(value, 2) + '%'`
CVaR / Exp. Shortfall (95%)	`results.cvar_95`	`safeToFixed(value, 2) + '%'`
CVaR / Exp. Shortfall (99%)	`results.cvar_99`	`safeToFixed(value, 2) + '%'`
Ulcer Index	`results.ulcer_index`	`safeToFixed(value, 2)`

Insufficient-data notice: When results.insufficient_data === true, render a Bootstrap <Alert variant="info"> above the Risk Metrics card:

{results.insufficient_data && (
  <Alert variant="info">
    Risk metrics require at least 20 trading days of data
    ({results.observations ?? 0} days in this run).
  </Alert>
)}

Obfuscate mode: VaR and CVaR are percentage losses, not dollar amounts, so they must not be passed through formatMoney(). Render them as plain percentage strings regardless of obfuscate mode.
safeToFixed already handles null/undefined by returning '--', so no additional null guard is needed on the display side.
The comparison view (BacktestingResults.js) receives the same fields through the metrics array in the compare_strategies response. It will display them automatically if it renders strategy_metrics straight from the API; check whether it has its own hardcoded field list and, if so, update that list.

What MUST stay retrospective

All six metrics are computed from the user's own historical equity curve. The UI copy must not frame them predictively. Suggested labels:

"VaR (95%)" — NOT "projected loss"
"Worst-day loss at 95% (historical)" — acceptable
The tooltip or footnote on the Risk Metrics card should read: "Calculated from your historical backtest run. Reflects past performance of this strategy on this data."

Feature flag

No feature flag needed. These are additive metrics on an existing endpoint. They do not change any existing field values.
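The "additive only" claim depends on the new keys not colliding with existing ones, because with dict unpacking a later key silently replaces an earlier one. A minimal sketch of a guard that makes this explicit follows; merge_without_overwrite is a hypothetical name and is not part of the existing codebase.

```python
from typing import Any, Dict


def merge_without_overwrite(
    existing: Dict[str, Any], additions: Dict[str, Any]
) -> Dict[str, Any]:
    """Merge two metric dicts, refusing to replace any existing field."""
    overlap = sorted(existing.keys() & additions.keys())
    if overlap:
        raise ValueError(f"new metrics would overwrite existing fields: {overlap}")
    return {**existing, **additions}


base = {"total_return": 12.5, "sharpe_ratio": 1.1}
extra = {"sortino_ratio": 1.4, "var_95": 2.1}
merged = merge_without_overwrite(base, extra)
```

A check like this could live in a unit test rather than in production code; either way it catches a future rename that would make a risk metric shadow an existing field.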

Tests to write

Backend (`backend_v2/tests/`)

Unit test for calculate_risk_metrics(): create tests/test_backtest_risk_metrics.py with these cases:
- test_sortino_happy_path: 30 synthetic returns with some negatives and a positive annualized_return; assert sortino_ratio is a float and > 0.
- test_sortino_no_downside_returns: all-positive returns; assert sortino_ratio is None.
- test_calmar_zero_drawdown: flat equity curve; assert calmar_ratio is None.
- test_var_cvar_invariant: assert cvar_95 >= var_95 and cvar_99 >= var_99 on 50 random samples.
- test_ulcer_index_flat_curve: constant equity; assert ulcer_index == 0.0.
- test_ulcer_index_monotone_decline: steadily declining equity; assert ulcer_index > 0.
- test_insufficient_data_guard: pass 5 returns; assert insufficient_data is True and all metric fields are None.
- test_insufficient_data_boundary: pass exactly 20 returns; assert insufficient_data is False.
Integration test on the /api/backtest response: extend backtest_export_api_tests.py or add a new integration test:
- Mock get_market_data_service() to return 30+ synthetic bars.
- POST to /api/backtest and assert the response contains the sortino_ratio, calmar_ratio, var_95, var_99, cvar_95, cvar_99, and ulcer_index keys.
- Assert all values are either float or null (not missing entirely).
- Assert beta is always null in v1.
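The payload assertions in the integration test can be factored into a small checker. This is a sketch under stated assumptions: risk_payload_problems is a hypothetical helper, it treats whole-number JSON values decoded as int as acceptable numbers, and it requires the beta key to be present because the field table lists it.

```python
from numbers import Real
from typing import Any, Dict, List

NULLABLE_FLOAT_KEYS = (
    "sortino_ratio", "calmar_ratio",
    "var_95", "var_99", "cvar_95", "cvar_99",
    "ulcer_index",
)


def risk_payload_problems(payload: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the payload passes."""
    problems = []
    for key in NULLABLE_FLOAT_KEYS:
        if key not in payload:
            problems.append(f"{key}: missing")
            continue
        value = payload[key]
        # bool is a subclass of int, so exclude it explicitly.
        is_number = isinstance(value, Real) and not isinstance(value, bool)
        if value is not None and not is_number:
            problems.append(f"{key}: expected float or null, got {type(value).__name__}")
    if "beta" not in payload:
        problems.append("beta: missing")
    elif payload["beta"] is not None:
        problems.append("beta: expected null in v1")
    return problems
```

In a test, the decoded JSON body of the /api/backtest response would be passed in and the result compared against an empty list, so a failure message names every offending key at once.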

Frontend (`frontend/trademaster_ui/src/tests/`)

Component test: extend backtestResultsViewMode.test.js or add backtestRiskMetrics.test.js:
- Render BacktestResults with a mock result containing every new displayed field.
- Assert each field label appears in the document.
- Render with insufficient_data: true and assert the info alert is visible.
- Render with every displayed field set to null and assert -- appears in each cell (safeToFixed behavior).
- Assert VaR/CVaR values render with a % suffix.
- Assert VaR/CVaR do NOT go through formatMoney when isObfuscated=true; verify the raw % value is still shown.

Estimated scope

Backend: ~60 lines (new helper + wiring). Frontend: ~80 lines (new card + alert). Tests: ~100 lines.
No migrations. No new dependencies. numpy is already imported.

Open questions for operator

None. The card is fully specified. The only deferred item is Beta against SPY: the card says "Beta (when benchmark provided)" and v1 always returns null. A follow-on card can wire in SPY bars if Kristerpher wants it.