- Incident ID: 2026-05-09-bot-token-mint-404
- Date: 2026-05-09
- Severity: SEV-3
- Duration: ~1 session (detection → root cause identified; full resolution pending operator vault update)
- Blast radius: All dispatched agents (sre-agent, feature-developer, product-manager, security-agent, card-groomer, raxx-blr-bot); all fell back to operator PAT for the entire session
- Author: sre-agent

After the GitHub organization was renamed from MooseQuest to raxx-app, all three GitHub App bot tokens (raxx-dev-bot, raxx-ops-bot, raxx-pm-bot) stopped minting. Every agent dispatch fell back to the operator PAT. Two root causes were found: (1) the GitHub App installation IDs stored in Infisical still reference the pre-migration MooseQuest org installations, which no longer exist on raxx-app and return HTTP 404 from the GitHub token-exchange endpoint; (2) a latent code defect in mint_github_token.py caused the default Infisical path prefix to resolve to /MooseQuest/github/<bot> instead of the documented and correct /MooseQuest/<bot>. The code fix ships in this commit; the vault update (new installation IDs) requires operator action in the GitHub UI.

- DEFAULT_PATH_PREFIX corrected); provisioning + arch docs updated
- INFISICAL_PATH_PREFIX env var, so the operator could have worked around the code bug at runtime
- DEFAULT_PATH_PREFIX (/MooseQuest/github/) diverged from the docs (/MooseQuest/), meaning the test at test_mint_github_token.py:216 was asserting the right expected path but the code was producing a wrong one, a signal that was never caught
- github-app-provisioning.md) used lowercase secret key names (app-id, installation-id, private-key-pem) instead of the uppercase names the mint script expects (APP_ID, INSTALLATION_ID, PRIVATE_KEY_PEM); this would cause exit 4 failures on fresh provisioning from the runbook

Contributing factor 1: GitHub App installation IDs are org-scoped and do not migrate. When MooseQuest was renamed to raxx-app, the App installations on MooseQuest became orphaned. GitHub creates new installations, with new installation IDs, when an App is re-installed on the new org. The vault still held the old IDs, so every call to POST /app/installations/{old_id}/access_tokens returned HTTP 404. The mint script exits 5 on this response and the wrapper falls back to PAT.
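
The two failure modes described so far (exit 4 when the expected secret keys are missing, exit 5 when the token exchange returns 404) can be sketched as below. This is a minimal illustration, not the code in mint_github_token.py: the names `REQUIRED_KEYS`, `missing_keys`, and `exit_code_for` are hypothetical, and treating every status other than 404 as success is a simplification made only for this sketch.

```python
from __future__ import annotations

# Key names the mint script expects, per the incident text.
REQUIRED_KEYS = ("APP_ID", "INSTALLATION_ID", "PRIVATE_KEY_PEM")

# Exit codes named in the incident text.
EXIT_MISSING_SECRETS = 4
EXIT_TOKEN_EXCHANGE_FAILED = 5


def missing_keys(secrets: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty (case-sensitive)."""
    return [key for key in REQUIRED_KEYS if not secrets.get(key)]


def exit_code_for(secrets: dict[str, str], exchange_status: int) -> int:
    """Hypothetical mapping of the two documented failures to exit codes."""
    if missing_keys(secrets):
        return EXIT_MISSING_SECRETS
    if exchange_status == 404:
        # A stale installation ID makes the token exchange return 404.
        return EXIT_TOKEN_EXCHANGE_FAILED
    # Other statuses are out of scope for this sketch.
    return 0


if __name__ == "__main__":
    runbook_style = {"app-id": "1", "installation-id": "2", "private-key-pem": "x"}
    print(missing_keys(runbook_style))
    print(exit_code_for(runbook_style, 404))
```

Because the lookup is case-sensitive, secrets stored under the lowercase names from the old runbook count as missing, which is the exit 4 path rather than the exit 5 path.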
Contributing factor 2: DEFAULT_PATH_PREFIX hardcoded as /MooseQuest/github/ instead of /MooseQuest/. Line 137 of mint_github_token.py set DEFAULT_PATH_PREFIX = "/MooseQuest/github/". Combined with the path construction f"{path_prefix.rstrip('/')}/{bot}", the effective secret path was /MooseQuest/github/raxx-dev-bot instead of /MooseQuest/raxx-dev-bot. If the operator stored secrets at /MooseQuest/raxx-dev-bot/ (per the docs), the Infisical fetch would return empty secrets and the script would exit 4. If the operator happened to store them under /MooseQuest/github/raxx-dev-bot/ (matching the buggy default), the Infisical fetch succeeded — but then the GitHub token exchange would still fail with 404 because the installation ID was stale. Either way, no token was minted.
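
The effect of the prefix defect can be reproduced with the path construction quoted above. The sketch below reuses that expression verbatim; the function name `secret_path` and the two constant names are hypothetical and exist only to show the old and new defaults side by side.

```python
# Default prefix before and after the fix, as stated in the incident text.
OLD_DEFAULT_PATH_PREFIX = "/MooseQuest/github/"
NEW_DEFAULT_PATH_PREFIX = "/MooseQuest/"


def secret_path(path_prefix: str, bot: str) -> str:
    """Join the prefix and bot name using the expression quoted in the text."""
    return f"{path_prefix.rstrip('/')}/{bot}"


if __name__ == "__main__":
    # Old default: resolves under /MooseQuest/github/, not the documented layout.
    print(secret_path(OLD_DEFAULT_PATH_PREFIX, "raxx-dev-bot"))
    # New default: resolves to the documented /MooseQuest/<bot> layout.
    print(secret_path(NEW_DEFAULT_PATH_PREFIX, "raxx-dev-bot"))
```

The trailing slash on the prefix is harmless in both cases because `rstrip('/')` removes it before the join; only the extra `github` segment changes the result.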
Contributing factor 3: No post-migration checklist existed. The system had no documented procedure for "GitHub org migration." The architecture doc, provisioning runbook, and agent-identity doc all assumed the org was static.
scripts/agents/with_bot_token.sh raxx-ops-bot gh api /user and alerts on non-ghs_ token format or fallback warning in output. Track in action item #1 below.

scripts/agents/mint_github_token.py line 137: changed DEFAULT_PATH_PREFIX from "/MooseQuest/github/" to "/MooseQuest/". This aligns the default with the documented vault layout and with the INFISICAL_PATH_PREFIX default shown in agent-bot-tokens-setup.md.
Validation: The existing test at test_mint_github_token.py:216 asserts "/MooseQuest/raxx-dev-bot" in captured.err. That assertion was already correct for the intended path. Before the fix, the default prefix produced /MooseQuest/github/raxx-dev-bot, which does not contain the expected string; after the fix, the default case produces /MooseQuest/raxx-dev-bot and the assertion passes.
See action items 2 and 3.
| # | Action | Owner | Due | Issue |
|---|---|---|---|---|
| 1 | Add a scheduled CI smoke-test that mints a bot token (raxx-ops-bot), checks that the output starts with ghs_, and pages/alerts if it falls back to PAT | ops | 2026-05-16 | file new |
| 2 | Re-install the three GitHub Apps on the raxx-app org and capture new installation IDs | Kristerpher (operator) | 2026-05-09 | n/a |
| 3 | Update INSTALLATION_ID in Infisical at /MooseQuest/raxx-dev-bot/, /MooseQuest/raxx-ops-bot/, /MooseQuest/raxx-pm-bot/ with the new IDs from action item 2 | Kristerpher (operator) | 2026-05-09 | n/a |
| 4 | Verify token mint after vault update: run scripts/agents/with_bot_token.sh raxx-ops-bot gh api /user and confirm "login": "raxx-ops-bot[bot]" in the response | Kristerpher (operator) | 2026-05-09 | n/a |
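
The checks in action items 1 and 4 reduce to two small predicates: does the minted token carry the ghs_ prefix, and does the /user response report the bot login. The sketch below shows only that logic. The function names are hypothetical, and it assumes the token and the JSON response have already been captured as strings by whatever CI job runs the check.

```python
import json

# Prefix the incident text expects on a successfully minted bot token.
INSTALLATION_TOKEN_PREFIX = "ghs_"


def is_installation_token(token: str) -> bool:
    """True when the token has the expected ghs_ prefix (whitespace ignored)."""
    return token.strip().startswith(INSTALLATION_TOKEN_PREFIX)


def login_is_bot(user_json: str, bot: str) -> bool:
    """True when the JSON response reports the login '<bot>[bot]'."""
    return json.loads(user_json).get("login") == f"{bot}[bot]"


if __name__ == "__main__":
    # Placeholder values for illustration only; not real tokens or responses.
    print(is_installation_token("ghs_example"))
    print(login_is_bot('{"login": "raxx-ops-bot[bot]"}', "raxx-ops-bot"))
```

A scheduled job could alert whenever either predicate returns False, which covers both the PAT fallback and a token minted for the wrong identity.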

- docs/ops/runbooks/github-app-provisioning.md (updated this incident)
- docs/architecture/agent-github-identity.md (updated this incident; migration checklist added)
- docs/ops/runbooks/agent-bot-tokens-setup.md
- scripts/agents/mint_github_token.py
- scripts/agents/with_bot_token.sh