HEROKU_API_KEY Drift Recovery

Owner: Operator (Kristerpher) + agent
Last updated: 2026-05-03
First incident: 2026-05-03 03:48 UTC (during MBT v1 polish-sprint deploys)
Related issues: #925, #891, #943
If the rotation handler raised OldTokenInvalidError("Heroku rejected the rolling token before mint"), that is the typed signal added in #943. The dyno's HEROKU_API_KEY is the drifted copy; re-sync from vault per Path A or B below.
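A minimal sketch of how calling code might react to that typed signal. The `OldTokenInvalidError` class below is a local stand-in (the real one from #943 lives in the rotation handler, whose import path is not shown in this runbook), and `rotate` and `resync_from_vault` are hypothetical callables supplied by the caller.

```python
class OldTokenInvalidError(Exception):
    """Local stand-in for the typed error added in #943 (real import path not shown here)."""


def run_rotation(rotate, resync_from_vault):
    """Run a rotation; on the typed drift signal, fall back to a vault re-sync.

    Both arguments are hypothetical zero-argument callables.
    Returns a short string naming what happened.
    """
    try:
        rotate()
    except OldTokenInvalidError:
        # Heroku rejected the dyno's copy before mint, so treat that copy as
        # drifted and re-sync from vault instead of retrying the rotation.
        resync_from_vault()
        return "resynced"
    return "rotated"
```

Catching only the typed error keeps unrelated failures (network errors, vault outages) from being misread as drift.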
There are three copies of HEROKU_API_KEY that must agree:
1. /MooseQuest/heroku/HEROKU_API_KEY in Infisical. Source of truth.
2. HEROKU_API_KEY at https://github.com/raxx-app/TradeMasterAPI/settings/secrets/actions. Read by every Heroku-deploy workflow.
3. HEROKU_API_KEY on each of the four Heroku apps (raxx-console-prod, raxx-console-staging, raxx-api-prod, raxx-api-staging). Read by the running app for vendor calls.

Drift means one of those values stops matching the others. The most painful failure mode: the GH Actions secret is stale, and every Heroku deploy fails with Error: The token provided to HEROKU_API_KEY is invalid.
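As an illustration of what "must agree" means in practice, the sketch below compares readable copies against the vault value using short fingerprints, so nothing sensitive is printed. The function names are hypothetical, and the caller is assumed to have fetched the values already. Note that GitHub does not return Actions secret values through its API, so the GH copy can only be checked indirectly (for example, by a deploy failing).

```python
import hashlib


def fingerprint(token: str) -> str:
    """Return a short SHA-256 fingerprint so tokens can be compared without printing them."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:12]


def find_drift(vault_token: str, readable_copies: dict[str, str]) -> list[str]:
    """Return the names of copies whose value differs from the vault value."""
    expected = fingerprint(vault_token)
    return sorted(
        name for name, value in readable_copies.items()
        if fingerprint(value) != expected
    )


# Example with placeholder values (not real tokens).
copies = {"raxx-console-prod": "tok-new", "raxx-api-staging": "tok-old"}
print(find_drift("tok-new", copies))  # ['raxx-api-staging']
```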
git push heroku fails in CI

Deploy to Heroku via heroku CLI + credential helper (subtree split):
Error: The token provided to HEROKU_API_KEY is invalid. Please
double-check that you have the correct token, or run `heroku login`
without HEROKU_API_KEY set.
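The error text above is stable enough to match on. A small sketch, assuming you have the CI log as a string (how the log is fetched is left out, and the function name is hypothetical):

```python
INVALID_TOKEN_MARKER = "The token provided to HEROKU_API_KEY is invalid"


def looks_like_key_drift(log_text: str) -> bool:
    """True if CI log text contains the Heroku CLI invalid-token error quoted above."""
    # Collapse all whitespace so the check survives line wrapping in CI logs.
    return INVALID_TOKEN_MARKER in " ".join(log_text.split())
```

A match only says the token CI used was rejected; the vault check still decides which copy is the stale one.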
heroku run fails locally (different problem; usually means your local CLI auth is stale, not vault drift)

heroku run --app raxx-console-prod --no-tty 'python -c "
import requests
from app.services import vault
t = vault.get_secret_value(\"HEROKU_API_KEY\")
r = requests.get(\"https://api.heroku.com/account\",
headers={\"Authorization\": f\"Bearer {t}\",
\"Accept\": \"application/vnd.heroku+json; version=3\"},
timeout=10)
print(\"vault token validates:\", r.status_code, \"OK\" if r.ok else r.json())
"'
If this returns 200 OK → vault is valid; the drift is in GH or Heroku config. Continue below.
If this returns 401 → vault itself is stale; you need to mint a new token via the rotate-from-console UI (#885). Stop here; that's a different runbook.
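The two outcomes above can be written as a tiny decision helper. This is a sketch only: the function name is hypothetical, and the fallback branch for any other status code is an assumption, since the runbook covers only 200 and 401.

```python
def next_step(status_code: int) -> str:
    """Map the /account status code from the vault check to the runbook's next step."""
    if status_code == 200:
        return "vault valid: drift is in GH or Heroku config, continue with recovery"
    if status_code == 401:
        return "vault stale: mint a new token via the rotate-from-console UI (#885)"
    # Assumption: anything else (rate limits, 5xx) is treated as inconclusive.
    return f"unexpected status {status_code}: retry or investigate before changing any secret"


print(next_step(200))
print(next_step(401))
```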
# Read vault value (safe; runs in dyno, value goes only to your terminal)
heroku run --app raxx-console-prod --no-tty 'python -c "
from app.services import vault
print(vault.get_secret_value(\"HEROKU_API_KEY\"))
"' | tail -1
Copy the value, then:
1. https://github.com/raxx-app/TradeMasterAPI/settings/secrets/actions
2. HEROKU_API_KEY → "Update secret"
3. Re-run the failed deploy:
gh workflow run deploy-console.yml -f environment=production -f ref=main
GITHUB_API_SECRETS_TOKEN (#925)

Once GITHUB_API_SECRETS_TOKEN is in vault with secrets:write scope:
heroku run --app raxx-console-prod --no-tty 'python /dev/stdin' <<'PY'
import base64, sys, requests
from nacl.public import PublicKey, SealedBox
from app.services import vault

# Both values come from vault; the script prints only the HTTP status, never a token.
token = vault.get_secret_value("HEROKU_API_KEY")
gh = vault.get_secret_value("GITHUB_API_SECRETS_TOKEN")
H = {"Authorization": f"Bearer {gh}",
     "Accept": "application/vnd.github+json",
     "X-GitHub-Api-Version": "2022-11-28"}
REPO = "https://api.github.com/repos/raxx-app/TradeMasterAPI"

# GitHub only accepts secrets encrypted with the repo's public key (libsodium sealed box).
r = requests.get(f"{REPO}/actions/secrets/public-key", headers=H, timeout=15)
r.raise_for_status()
pk = r.json()
sealed = SealedBox(PublicKey(base64.b64decode(pk["key"])))
encrypted = base64.b64encode(sealed.encrypt(token.encode("utf-8"))).decode("ascii")

# Overwrite the Actions secret; exit non-zero on failure so the caller can detect it.
r = requests.put(f"{REPO}/actions/secrets/HEROKU_API_KEY",
                 headers=H, json={"encrypted_value": encrypted, "key_id": pk["key_id"]}, timeout=15)
print(f"GH secret PUT: HTTP {r.status_code}")
sys.exit(0 if r.ok else 1)
PY
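The script reports only `GH secret PUT: HTTP <code>`. A short sketch for reading that code: GitHub's "create or update a repository secret" endpoint answers 201 when the secret is created and 204 when an existing one is updated. The helper name and the wording for the failure cases are assumptions for illustration.

```python
def describe_secret_put(status_code: int) -> str:
    """Translate the status code printed by the Path B script into plain language."""
    if status_code == 201:
        return "secret created"
    if status_code == 204:
        return "secret updated"
    if status_code in (401, 403, 404):
        # Assumption: these usually point at GITHUB_API_SECRETS_TOKEN itself
        # (expired, wrong scope, or no access to the repository).
        return "GitHub rejected the request: check GITHUB_API_SECRETS_TOKEN and its scope"
    return f"unexpected status {status_code}: secret state unknown, verify manually"


print(describe_secret_put(204))  # secret updated
```

For an existing HEROKU_API_KEY secret, 204 is the expected success value; after it, re-run the failed deploy.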
This is the inverse drift: vault rotted but the live apps still work. Less common.
1. Read the value from a live app's config:

heroku config:get HEROKU_API_KEY -a raxx-console-prod

2. Validate it (same requests.get /account test as above).

3. Store it in vault:

heroku run --app raxx-console-prod --no-tty 'python -c "
from app.services import vault
vault.store_secret_version(\"HEROKU_API_KEY\", \"<paste here>\")
"'

- […] GITHUB_API_SECRETS_TOKEN), the handler can fully self-heal. Until then, the GH-secret destination is operator-only on each rotation.
- […] console/app/services/handler_validator.py).

### HEROKU_API_KEY drift - <UTC timestamp>
- Detected: <how — log line / failed deploy>
- Vault state: <valid/stale>
- GH secret state: <valid/stale>
- Heroku config state: <valid/stale on each of the 4 apps>
- Root cause: <manual rotation? failed Mode A handler? UI dashboard rotation?>
- Recovery: <Path A / Path B / inverse>
- Time to recover: <minutes>
- Followup: <issue number>
Save postmortems at docs/ops/postmortems/heroku-key-drift-<YYYY-MM-DD>.md.
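A small sketch of two conveniences around the template: building the file path from the incident time, and spotting placeholders that were never filled in. Both helper names are hypothetical, and `when` is assumed to be a timezone-aware datetime.

```python
import re
from datetime import datetime, timezone

# Matches template placeholders such as <minutes> or <valid/stale>.
PLACEHOLDER = re.compile(r"<[^<>\n]+>")


def postmortem_path(when: datetime) -> str:
    """Return the postmortem path for an incident, dated in UTC."""
    utc = when.astimezone(timezone.utc)
    return f"docs/ops/postmortems/heroku-key-drift-{utc:%Y-%m-%d}.md"


def unfilled_placeholders(text: str) -> list[str]:
    """List template placeholders still present in a draft postmortem."""
    return PLACEHOLDER.findall(text)


incident = datetime(2026, 5, 3, 3, 48, tzinfo=timezone.utc)
print(postmortem_path(incident))
# docs/ops/postmortems/heroku-key-drift-2026-05-03.md
```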