Raxx · internal docs


SOP — HEROKU_API_KEY Drift Recovery

Owner: Operator (Kristerpher) + agent
Last updated: 2026-05-03
First incident: 2026-05-03 03:48 UTC (during MBT v1 polish-sprint deploys)
Related issues: #925, #891, #943

If the rotation handler raised OldTokenInvalidError ("Heroku rejected the rolling token before mint"), that is the typed signal added in #943. The dyno's HEROKU_API_KEY is the drifted copy — re-sync from vault per Path A or B below.


What "drift" means here

There are three copies of HEROKU_API_KEY that must agree:

  1. Vault: /MooseQuest/heroku/HEROKU_API_KEY in Infisical. Source of truth.
  2. GitHub Actions secret: HEROKU_API_KEY at https://github.com/raxx-app/TradeMasterAPI/settings/secrets/actions. Read by every Heroku-deploy workflow.
  3. Heroku config var: HEROKU_API_KEY on each of the four Heroku apps (raxx-console-prod, raxx-console-staging, raxx-api-prod, raxx-api-staging). Read by the running app for vendor calls.

Drift = one of those values stops matching the others. The most painful failure mode is a stale GH Actions secret: every Heroku deploy then fails with "Error: The token provided to HEROKU_API_KEY is invalid."
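The "must agree" check can be sketched as a fingerprint comparison, so no raw token is ever printed. This is a minimal illustration only: fingerprint and find_drift are hypothetical helper names, not part of the Raxx codebase, and the sketch assumes you have already fetched each value yourself (for example with heroku config:get). GitHub does not return secret values once they are stored, so the GH copy cannot be compared this way; its state has to be inferred from deploy failures.

```python
import hashlib


def fingerprint(token: str) -> str:
    """Return a short SHA-256 prefix so tokens can be compared without being shown."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:12]


def find_drift(source_of_truth: str, copies: dict[str, str]) -> list[str]:
    """Return the names of copies whose value differs from the source of truth."""
    expected = fingerprint(source_of_truth)
    return sorted(
        name for name, value in copies.items() if fingerprint(value) != expected
    )


if __name__ == "__main__":
    # Placeholder values only; real values would come from vault and Heroku.
    vault_value = "token-v2"
    heroku_copies = {
        "raxx-console-prod": "token-v2",
        "raxx-console-staging": "token-v2",
        "raxx-api-prod": "token-v2",
        "raxx-api-staging": "token-v1",  # stale copy
    }
    print(find_drift(vault_value, heroku_copies))  # ['raxx-api-staging']
```

An empty list means every readable copy matches vault; any name in the list is a candidate for the "Heroku config state" line of the postmortem.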


How to detect drift

Symptom 1 — git push heroku fails in CI

Deploy to Heroku via heroku CLI + credential helper (subtree split):
  Error: The token provided to HEROKU_API_KEY is invalid. Please
  double-check that you have the correct token, or run `heroku login`
  without HEROKU_API_KEY set.

Symptom 2 — heroku run fails locally (different problem; usually means your local CLI auth is stale, not vault drift)

Confirm the source of truth is valid

heroku run --app raxx-console-prod --no-tty 'python -c "
import requests
from app.services import vault
t = vault.get_secret_value(\"HEROKU_API_KEY\")
r = requests.get(\"https://api.heroku.com/account\",
                 headers={\"Authorization\": f\"Bearer {t}\",
                          \"Accept\": \"application/vnd.heroku+json; version=3\"},
                 timeout=10)
print(\"vault token validates:\", r.status_code, \"OK\" if r.ok else r.json())
"'

If this returns 200 OK → vault is valid; the drift is in GH or Heroku config. Continue below.

If this returns 401 → vault itself is stale; you need to mint a new token via the rotate-from-console UI (#885). Stop here; that's a different runbook.
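The two outcomes above can be written as a small decision helper. This is a sketch, not existing tooling: next_step is a hypothetical name, and the "inconclusive" branch for any other status code is an assumption added here, since the runbook only defines the 200 and 401 cases.

```python
def next_step(status_code: int) -> str:
    """Map the /account status code from the vault check to a runbook branch."""
    if status_code == 200:
        return "vault valid: drift is in GH or Heroku config; continue with recovery"
    if status_code == 401:
        return "vault stale: mint a new token via the rotate-from-console UI (#885)"
    # Assumption: other codes (rate limits, server errors) say nothing about
    # the token itself, so the check should be repeated before acting.
    return "inconclusive: retry the check before assuming drift"


if __name__ == "__main__":
    for code in (200, 401, 503):
        print(code, "->", next_step(code))
```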


Recovery — if vault is valid, GH is stale

Path A — Manual paste (60 seconds)

# Read vault value (safe; runs in dyno, value goes only to your terminal)
heroku run --app raxx-console-prod --no-tty 'python -c "
from app.services import vault
print(vault.get_secret_value(\"HEROKU_API_KEY\"))
"' | tail -1

Copy the value, then:

  1. Open https://github.com/raxx-app/TradeMasterAPI/settings/secrets/actions
  2. Click HEROKU_API_KEY → "Update secret"
  3. Paste the value
  4. Click "Update secret"

Re-run the failed deploy:

gh workflow run deploy-console.yml -f environment=production -f ref=main

Path B — Automated, requires GITHUB_API_SECRETS_TOKEN (#925)

Once GITHUB_API_SECRETS_TOKEN is in vault with secrets:write scope:

heroku run --app raxx-console-prod --no-tty 'python /dev/stdin' <<'PY'
import base64, sys, requests
from nacl.public import PublicKey, SealedBox
from app.services import vault

token = vault.get_secret_value("HEROKU_API_KEY")
gh = vault.get_secret_value("GITHUB_API_SECRETS_TOKEN")
H = {"Authorization": f"Bearer {gh}",
     "Accept": "application/vnd.github+json",
     "X-GitHub-Api-Version": "2022-11-28"}

r = requests.get("https://api.github.com/repos/raxx-app/TradeMasterAPI/actions/secrets/public-key", headers=H, timeout=15)
r.raise_for_status()
pk = r.json()
sealed = SealedBox(PublicKey(base64.b64decode(pk["key"])))
encrypted = base64.b64encode(sealed.encrypt(token.encode("utf-8"))).decode("ascii")
r = requests.put("https://api.github.com/repos/raxx-app/TradeMasterAPI/actions/secrets/HEROKU_API_KEY",
                 headers=H, json={"encrypted_value": encrypted, "key_id": pk["key_id"]}, timeout=15)
print(f"GH secret PUT: HTTP {r.status_code}")
sys.exit(0 if r.ok else 1)
PY
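To confirm the Path B write landed, the PUT status and the secret's metadata can be checked. The sketch below assumes the documented GitHub REST behaviour (201 when a secret is created, 204 when it is updated) and that the metadata returned by GET .../actions/secrets/HEROKU_API_KEY includes an updated_at timestamp. The helper names describe_put_status and updated_recently are hypothetical, and the sample metadata is made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def describe_put_status(status_code: int) -> str:
    """Interpret the status code returned by the GitHub secrets PUT."""
    if status_code == 201:
        return "secret created"
    if status_code == 204:
        return "secret updated"
    return f"unexpected status {status_code}; treat the re-sync as failed"


def updated_recently(secret_meta: dict, now: datetime, max_age: timedelta) -> bool:
    """Check the secret's updated_at field against a freshness window."""
    # GitHub timestamps end in "Z"; normalise so fromisoformat accepts them
    # on Python versions older than 3.11.
    raw = secret_meta["updated_at"].replace("Z", "+00:00")
    updated_at = datetime.fromisoformat(raw)
    return timedelta(0) <= now - updated_at <= max_age


if __name__ == "__main__":
    # Illustrative metadata, shaped like the GitHub secret metadata response.
    meta = {"name": "HEROKU_API_KEY", "updated_at": "2026-05-03T03:55:00Z"}
    now = datetime(2026, 5, 3, 4, 0, tzinfo=timezone.utc)
    print(describe_put_status(204))
    print(updated_recently(meta, now, timedelta(minutes=10)))
```

A fresh updated_at only shows that the secret was written, not that the value is correct; re-running the failed deploy remains the real test.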

Recovery — if vault is stale, Heroku/GH have valid tokens

This is the inverse drift: vault rotted but the live apps still work. Less common.

  1. Read the live token from one of the Heroku apps:

     heroku config:get HEROKU_API_KEY -a raxx-console-prod

  2. Validate it (same requests.get /account test as above).
  3. Write back to vault:

     heroku run --app raxx-console-prod --no-tty 'python -c "
     from app.services import vault
     vault.store_secret_version(\"HEROKU_API_KEY\", \"<paste here>\")
     "'
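The ordering of these steps matters: an unvalidated token written to vault would spread the drift rather than fix it. A minimal sketch of that guard follows, with the validator and the store passed in as callables so the example stays self-contained. write_back_if_valid is a hypothetical helper; in practice the validator would be the /account check and the store would be vault.store_secret_version.

```python
from typing import Callable


def write_back_if_valid(
    token: str,
    validate: Callable[[str], bool],
    store: Callable[[str], None],
) -> bool:
    """Store the token only when the validator accepts it; report whether it was stored."""
    if not token or not validate(token):
        return False
    store(token)
    return True


if __name__ == "__main__":
    stored: list[str] = []

    # Stand-ins for the real /account check and the vault write.
    def fake_validate(t: str) -> bool:
        return t == "live-token"

    print(write_back_if_valid("live-token", fake_validate, stored.append))
    print(write_back_if_valid("bad-token", fake_validate, stored.append))
    print(len(stored))
```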

Prevention


Postmortem template (use after every drift incident)

### HEROKU_API_KEY drift — <UTC timestamp>

- Detected: <how — log line / failed deploy>
- Vault state: <valid/stale>
- GH secret state: <valid/stale>
- Heroku config state: <valid/stale on each of the 4 apps>
- Root cause: <manual rotation? failed Mode A handler? UI dashboard rotation?>
- Recovery: <Path A / Path B / inverse>
- Time to recover: <minutes>
- Followup: <issue number>

Save postmortems at docs/ops/postmortems/heroku-key-drift-<YYYY-MM-DD>.md.
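The file naming convention can be generated rather than typed by hand. A small sketch, where postmortem_path is a hypothetical helper name and the directory is taken from the line above:

```python
from datetime import date


def postmortem_path(incident_date: date) -> str:
    """Build the postmortem file path for a drift incident on the given date."""
    return f"docs/ops/postmortems/heroku-key-drift-{incident_date.isoformat()}.md"


if __name__ == "__main__":
    print(postmortem_path(date(2026, 5, 3)))
    # docs/ops/postmortems/heroku-key-drift-2026-05-03.md
```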


Refs