DET-COST-001 — dyno-hour spike

Rule ID: DET-COST-001
Title: Heroku dyno-hour spend rate above the 99.9th percentile of the prior 7 days
Category: cost
Last validated: 2026-06-04 (initial catalog, dormant)
State: dormant; requires Heroku Platform API scraper (campaign prerequisite §P6)

Telemetry source

Heroku Platform API: GET /apps/{app}/dynos (current dyno state) and GET /apps/{app}/formation (configured formation). Heroku does not surface dyno-hour invoices live, so spend per sampling interval is approximated as dyno_count × dyno_type_cost_per_hour × elapsed_hours_since_last_sample.
Sampling cadence: every 10 minutes via the shared scraper (see §P6 in campaign doc).
Per-sample fields: app, dyno type, count, computed cost rate per hour (dyno_count × dyno_type_cost_per_hour).
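A minimal sketch of how one sample and its spend proxy could be computed from the fields above. The Sample class, the function names, and the price table are hypothetical; the prices are placeholders for illustration, not Heroku's published rates, and the real scraper (§P6) may structure this differently.

```python
from dataclasses import dataclass

# Hypothetical USD-per-hour prices per dyno. Placeholder values only;
# substitute the rates from the actual Heroku price list.
DYNO_COST_PER_HOUR = {
    "standard-1x": 0.035,
    "standard-2x": 0.07,
    "performance-m": 0.35,
}


@dataclass(frozen=True)
class Sample:
    """One 10-minute scraper sample for a single app and dyno type."""
    app: str
    dyno_type: str
    count: int
    cost_rate_per_hour: float


def build_sample(app: str, dyno_type: str, count: int) -> Sample:
    # Cost rate per hour = dyno_count x dyno_type_cost_per_hour.
    rate = count * DYNO_COST_PER_HOUR[dyno_type]
    return Sample(app, dyno_type, count, rate)


def interval_spend(sample: Sample, elapsed_hours: float) -> float:
    # Spend proxy for the interval = rate x elapsed_hours_since_last_sample.
    return sample.cost_rate_per_hour * elapsed_hours


if __name__ == "__main__":
    s = build_sample("demo-app", "standard-2x", 4)
    print(s.cost_rate_per_hour, interval_spend(s, 10 / 60))
```

Storing the hourly rate rather than the per-interval spend keeps samples comparable if the scraper misses a cycle and the elapsed time varies.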

Statistical method + baseline window

Method: percentile threshold on per-app dyno-hour-spend rate.
Baseline window: rolling 7 days. The baseline is the 99.9th percentile of all 10-minute samples in the window (about 1,008 samples per app at full cadence).
Fire condition: observed spend rate for a 10-min window > baseline 99.9th percentile AND observed spend rate > $0.50/hr absolute floor (filters near-zero noise).
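The fire condition above can be sketched as follows. The percentile definition is an assumption: this rule does not say which interpolation to use, so the sketch uses nearest-rank, which for a full 1,008-sample window returns the second-largest sample. Function and constant names are hypothetical.

```python
import math

ABSOLUTE_FLOOR_PER_HOUR = 0.50   # $/hr floor from the fire condition
BASELINE_PERCENTILE = 99.9


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (assumed method; the rule does not specify one)."""
    if not values:
        raise ValueError("baseline window is empty")
    ordered = sorted(values)
    # round() guards against float noise pushing ceil() one rank too high.
    rank = max(1, math.ceil(round(pct * len(ordered) / 100.0, 9)))
    return ordered[rank - 1]


def should_fire(observed_rate, baseline_rates):
    """True when the observed $/hr exceeds both the 7-day P99.9 and the floor."""
    threshold = nearest_rank_percentile(baseline_rates, BASELINE_PERCENTILE)
    return observed_rate > threshold and observed_rate > ABSOLUTE_FLOOR_PER_HOUR


if __name__ == "__main__":
    # Synthetic 7-day window: 1,008 samples, mostly $0.30/hr with two peaks.
    baseline = [0.30] * 1006 + [0.85, 0.90]
    print(should_fire(3.40, baseline))
```

Both comparisons are strict, so a sample exactly equal to the baseline percentile does not fire.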

Threshold + expected FP rate

Pre-launch placeholder: spend rate > $2.00/hr per app. This is an absolute threshold; it is replaced by the dynamic percentile threshold once a 7-day baseline exists.
Expected FP rate (post-launch): ~1 per month, dominated by operator-driven heroku ps:scale events for migrations.

Alert route

MEDIUM (sustained > 99.9th percentile for 30+ min): ops@ daily digest.
HIGH (spend rate > 5× baseline median, any duration): #raxx-ops-alert-sev2-5 / #raxx-ops-alert-sev2.
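A minimal sketch of the severity split above, assuming "30+ min" means three consecutive 10-minute samples above the baseline percentile and that HIGH is checked first so it takes precedence. The function name and argument layout are hypothetical, and the caller is assumed to have already applied the fire condition (including the $0.50/hr floor).

```python
from statistics import median

SUSTAINED_SAMPLES = 3     # assumed: 30 minutes at the 10-minute cadence
HIGH_MULTIPLIER = 5.0     # HIGH when rate > 5x baseline median


def classify(recent_rates, baseline_p999, baseline_rates):
    """Return "HIGH", "MEDIUM", or None for the newest sample.

    recent_rates: $/hr samples for one app, oldest first, newest last.
    baseline_p999: the 7-day 99.9th-percentile rate for that app.
    baseline_rates: all samples in the 7-day baseline window.
    """
    latest = recent_rates[-1]
    # HIGH needs no duration: a single sample above 5x the median is enough.
    if latest > HIGH_MULTIPLIER * median(baseline_rates):
        return "HIGH"
    # MEDIUM needs every sample in the last 30 minutes above the percentile.
    window = recent_rates[-SUSTAINED_SAMPLES:]
    if len(window) == SUSTAINED_SAMPLES and all(r > baseline_p999 for r in window):
        return "MEDIUM"
    return None


if __name__ == "__main__":
    baseline = [0.30] * 1006 + [0.85, 0.90]
    print(classify([0.95, 1.00, 1.10], 0.85, baseline))
```

A single sample above the percentile that is neither sustained nor above 5x the median yields None here; whether such a sample should still be recorded is left to the router.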

Escalation owner

sre-agent primary — runaway autoscale, leaked worker, infinite-loop dyno burn.
operator for posture decisions about scaling caps.

Test fixture / synthetic positive

See _fixtures/dyno_hour_spike_positive.json for a synthetic 10-min sample showing $3.40/hr against a 7d baseline P99.9 of $0.85/hr.

What to do when this fires

Identify the app and dyno type whose spend spiked. Recent deploy? Manual scale? Autoscale event?
Check Heroku dashboard for the app's dyno graph; correlate with deploy timeline.
If runaway autoscale: cap formation via heroku ps:scale web=N (operator-only); dispatch sre-agent for root-cause.
If leaked worker (a worker dyno that should have exited but is still running): inspect worker log for infinite-loop signature.

What NOT to do

Do not auto-scale-down from this rule. Scale decisions go to sre-agent.
Do not exclude apps from this rule because they're "expected to be expensive" — set a higher per-app baseline if needed, but every app stays monitored.
Do not extend baseline window beyond 7 days; cost regressions hide in long baselines.