DET-COST-001 — dyno-hour spike
Rule ID: DET-COST-001
Title: Heroku dyno-hour spend rate above the 99.9th percentile of the prior 7 days
Category: cost
Last validated: 2026-06-04 (initial catalog, dormant)
State: dormant — requires Heroku Platform API scraper (campaign prerequisite §P6)
Telemetry source
- Heroku Platform API: GET /apps/{app}/dynos (current dyno state) and GET /apps/{app}/formation (configured formation). Heroku does not surface dyno-hour invoices live; the proxy is (dyno_count × dyno_type_cost_per_hour × elapsed_hours_since_last_sample), i.e. spend accrued over the sampling interval. Dropping the elapsed-hours factor gives the per-hour spend rate used below.
- Sampling cadence: every 10 minutes via the shared scraper (see §P6 in campaign doc).
- Per-sample fields: app, dyno type, count, computed cost-rate per hour.
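The cost proxy above can be sketched as follows. This is a minimal illustration, not the scraper's implementation: the price table, the FormationSample dataclass, and the function names are hypothetical, the prices are placeholders rather than Heroku's published rates, and parsing of the /dynos and /formation responses is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List

# HYPOTHETICAL per-dyno-type hourly prices (USD). Placeholder values for
# illustration only; they are NOT Heroku's published rates.
ASSUMED_COST_PER_HOUR = {
    "standard-1x": 0.035,
    "standard-2x": 0.07,
}


@dataclass
class FormationSample:
    """One scraped (app, process type) row; field names are hypothetical."""
    app: str
    process_type: str  # e.g. "web", "worker"
    dyno_type: str     # key into ASSUMED_COST_PER_HOUR
    count: int


def cost_rate_per_hour(s: FormationSample) -> float:
    # dyno_count x dyno_type_cost_per_hour
    return s.count * ASSUMED_COST_PER_HOUR[s.dyno_type]


def spend_since_last_sample(s: FormationSample, elapsed_hours: float) -> float:
    # The rule's proxy: dyno_count x dyno_type_cost_per_hour x elapsed_hours_since_last_sample
    return cost_rate_per_hour(s) * elapsed_hours


def app_cost_rates(samples: List[FormationSample]) -> Dict[str, float]:
    # Per-app spend rate ($/hr): sum the rate over every process type of that app.
    totals: Dict[str, float] = {}
    for s in samples:
        totals[s.app] = totals.get(s.app, 0.0) + cost_rate_per_hour(s)
    return totals
```

Summing per app matches the rule's per-app threshold while the per-row rate keeps the dyno-type breakdown needed during triage.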
Statistical method + baseline window
- Method: percentile threshold on per-app dyno-hour-spend rate.
- Baseline window: rolling 7 days, computed as the 99.9th percentile of all 10-minute samples over the window.
- Fire condition: observed spend rate for a 10-min window > baseline 99.9th percentile AND observed spend rate > $0.50/hr absolute floor (filters near-zero noise).
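A minimal sketch of the rolling baseline and fire condition, assuming per-app 10-minute samples are fed in order and that the percentile uses linear interpolation (the same convention as numpy.percentile's default); the rule does not name an interpolation method. RollingBaseline and percentile_linear are hypothetical names. With 7 × 24 × 6 = 1008 samples per window, the interpolated P99.9 falls between the third- and second-largest samples, so a single spike sample in the prior week barely moves the baseline.

```python
import math
from collections import deque

SAMPLES_PER_7D = 7 * 24 * 6     # 10-minute samples in a rolling 7-day window = 1008
ABSOLUTE_FLOOR_PER_HOUR = 0.50  # $/hr noise floor from the fire condition


def percentile_linear(values, q):
    """q-th percentile (0-100), linearly interpolated between closest ranks."""
    if not values:
        raise ValueError("empty baseline")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


class RollingBaseline:
    """Per-app rolling window of spend-rate samples (hypothetical helper)."""

    def __init__(self, maxlen=SAMPLES_PER_7D):
        # Oldest samples fall off automatically once the window is full.
        self.samples = deque(maxlen=maxlen)

    def add(self, rate):
        self.samples.append(rate)

    def p999(self):
        return percentile_linear(list(self.samples), 99.9)

    def should_fire(self, observed_rate):
        # Compare against the prior window; the caller adds the observed sample afterwards.
        return observed_rate > self.p999() and observed_rate > ABSOLUTE_FLOOR_PER_HOUR
```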
Threshold + expected FP rate
- Pre-launch placeholder: spend rate > $2.00/hr per app. This absolute threshold is replaced by the dynamic percentile threshold once the baseline is established.
- Expected FP rate (post-launch): ~1 per month, dominated by operator-driven heroku ps:scale events for migrations.
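The switch from the pre-launch placeholder to the dynamic threshold might look like the sketch below. The rule does not say exactly when the switch happens; this assumes it happens once a full 7-day window of 10-minute samples (1008) exists. The function name is hypothetical. It uses the standard library's statistics.quantiles with method="inclusive", which interpolates the same way as numpy.percentile's default.

```python
from statistics import quantiles
from typing import Sequence

PLACEHOLDER_THRESHOLD_PER_HOUR = 2.00  # pre-launch absolute threshold from the rule
ABSOLUTE_FLOOR_PER_HOUR = 0.50         # post-baseline noise floor from the rule
FULL_BASELINE_SAMPLES = 7 * 24 * 6     # ASSUMPTION: dynamic mode needs 7 days of 10-min samples


def fires(observed_rate: float, history: Sequence[float]) -> bool:
    """history: prior per-app spend-rate samples ($/hr), oldest first (list or tuple)."""
    if len(history) < FULL_BASELINE_SAMPLES:
        # Baseline not ready yet: fall back to the absolute placeholder.
        return observed_rate > PLACEHOLDER_THRESHOLD_PER_HOUR
    window = history[-FULL_BASELINE_SAMPLES:]  # most recent 7 days only
    # n=1000 yields 999 cut points; the last one is the 99.9th percentile.
    p999 = quantiles(window, n=1000, method="inclusive")[-1]
    return observed_rate > p999 and observed_rate > ABSOLUTE_FLOOR_PER_HOUR
```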
Alert route
- MEDIUM (sustained > 99.9th percentile for 30+ min): ops@ daily digest.
- HIGH (spend rate > 5× baseline median, any duration):
#raxx-ops-alert-sev2-5/#raxx-ops-alert-sev2.
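The two alert tiers can be sketched as a classifier over the most recent samples. Several details are assumptions, not stated in the rule: "sustained for 30+ min" is read as the last three consecutive 10-minute samples all exceeding P99.9, both tiers also require the $0.50/hr floor, and HIGH takes precedence over MEDIUM. The function name and signature are hypothetical; the caller passes the baseline P99.9 and median (for example, statistics.median over the 7-day window).

```python
from typing import Optional, Sequence

ABSOLUTE_FLOOR_PER_HOUR = 0.50
SAMPLES_FOR_30_MIN = 3   # ASSUMPTION: 30+ min sustained = 3 consecutive 10-minute samples
HIGH_MEDIAN_MULTIPLE = 5  # HIGH when the rate exceeds 5x the baseline median


def classify(recent_rates: Sequence[float], p999: float, baseline_median: float) -> Optional[str]:
    """recent_rates: non-empty per-app spend rates ($/hr), most recent last."""
    latest = recent_rates[-1]
    if latest <= ABSOLUTE_FLOOR_PER_HOUR:
        return None  # below the noise floor: no alert at any tier
    if latest > HIGH_MEDIAN_MULTIPLE * baseline_median:
        return "HIGH"  # any duration, so one sample is enough
    tail = recent_rates[-SAMPLES_FOR_30_MIN:]
    if len(tail) == SAMPLES_FOR_30_MIN and all(
        r > p999 and r > ABSOLUTE_FLOOR_PER_HOUR for r in tail
    ):
        return "MEDIUM"
    return None
```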
Escalation owner
- sre-agent primary — runaway autoscale, leaked worker, infinite-loop dyno burn.
- operator for posture decisions about scaling caps.
Test fixture / synthetic positive
See _fixtures/dyno_hour_spike_positive.json for a synthetic 10-min sample showing $3.40/hr against a 7d baseline P99.9 of $0.85/hr.
What to do when this fires
- Identify the app and dyno type whose spend spiked. Recent deploy? Manual scale? Autoscale event?
- Check the Heroku dashboard for the app's dyno graph; correlate it with the deploy timeline.
- If runaway autoscale: cap the formation via heroku ps:scale web=N (operator-only); dispatch sre-agent for root cause.
- If leaked worker (a worker dyno that should have exited but is still running): inspect the worker log for an infinite-loop signature.
What NOT to do
- Do not auto-scale-down from this rule. Scale decisions go to sre-agent.
- Do not exclude apps from this rule because they're "expected to be expensive" — set a higher per-app baseline if needed, but every app stays monitored.
- Do not extend baseline window beyond 7 days; cost regressions hide in long baselines.