GitHub App credentials runbook
System: GitHub App installation tokens (raxx-dev-bot, raxx-ops-bot, raxx-pm-bot)
Owner: ops
Last incident: 2026-05-18 (SEV-1 #2392); prior incident write-up: docs/incidents/2026-05-09-bot-token-mint-404.md
Last reviewed: 2026-05-18
Canonical vault path schema
Each bot has exactly three required secrets in Infisical, plus zero or more aux keys:
/MooseQuest/<bot-name>/
- APP_ID: numeric App ID from GitHub (e.g., "3501328")
- INSTALLATION_ID: numeric installation ID, assigned when the App is installed on the raxx-app org
- PRIVATE_KEY_PEM: full RSA private key (BEGIN/END RSA PRIVATE KEY block, newlines intact)
Bot → vault path mapping:
| Bot | Vault path | Used by |
|---|---|---|
| raxx-dev-bot | /MooseQuest/raxx-dev-bot/ | feature-developer, software-architect, ux-polisher |
| raxx-ops-bot | /MooseQuest/raxx-ops-bot/ | sre-agent, security-agent, card-groomer, qa-agent |
| raxx-pm-bot | /MooseQuest/raxx-pm-bot/ | product-manager, marketing-strategist, business-legal-researcher |
Full bot permissions are documented in docs/architecture/agent-github-identity.md.
Key name conventions:
- Names must be uppercase with underscores. The mint script (scripts/agents/mint_github_token.py) fetches them by exact name: APP_ID, INSTALLATION_ID, PRIVATE_KEY_PEM. Lowercase or hyphenated names cause exit 4 and a silent fallback to the operator PAT.
- The PRIVATE_KEY_PEM value must include the full PEM header and footer (-----BEGIN RSA PRIVATE KEY----- ... -----END RSA PRIVATE KEY-----) with embedded newlines. A base64-only blob without the headers causes JWT signing to fail.
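The two conventions above can be checked before the mint script ever runs. A minimal Python sketch, assuming you already hold a bot's values in a dict (for example, from the same Infisical response that diagnosis step 2 reads); check_bot_secrets is a hypothetical helper, not the validation code inside mint_github_token.py.

```python
REQUIRED_KEYS = ("APP_ID", "INSTALLATION_ID", "PRIVATE_KEY_PEM")
PEM_HEADER = "-----BEGIN RSA PRIVATE KEY-----"
PEM_FOOTER = "-----END RSA PRIVATE KEY-----"


def check_bot_secrets(secrets: dict) -> list:
    """Return a list of problems with a bot's vault values; empty means they look usable.

    Hypothetical helper mirroring this runbook's conventions, not the mint script's code.
    """
    problems = []
    for key in REQUIRED_KEYS:
        # Exact-name lookup, like the mint script: "private_key_pem" does not count.
        if not str(secrets.get(key, "")).strip():
            problems.append(f"missing or empty: {key}")
    for key in ("APP_ID", "INSTALLATION_ID"):
        value = str(secrets.get(key, "")).strip()
        if value and not value.isdigit():
            problems.append(f"{key} is not numeric: {value!r}")
    pem = str(secrets.get("PRIVATE_KEY_PEM", "")).strip()
    if pem:
        if "\\n" in pem and "\n" not in pem:
            # Newlines were stored as literal backslash-n escape sequences.
            problems.append("PRIVATE_KEY_PEM contains escaped \\n instead of real newlines")
        elif not (pem.startswith(PEM_HEADER) and pem.endswith(PEM_FOOTER)):
            # Typically a bare base64 blob pasted without the header and footer.
            problems.append("PRIVATE_KEY_PEM lacks the BEGIN/END RSA PRIVATE KEY lines")
        elif "\n" not in pem:
            problems.append("PRIVATE_KEY_PEM has no embedded newlines")
    return problems
```

The helper never prints secret values, only key names and the (non-secret) numeric IDs.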
Aux keys (present but not required by the mint script):
raxx-ops-bot currently also holds GH_RAXX_OPS_BOT_CLIENT_SECRET at that path. This key is not read by the mint script and does not affect token minting.
Current audit state — 2026-05-18
| Bot | APP_ID present | INSTALLATION_ID valid | PRIVATE_KEY_PEM present | Smoke result |
|---|---|---|---|---|
| raxx-dev-bot | yes | yes | yes | ghs_ token minted; repo access confirmed |
| raxx-ops-bot | yes | yes (updated 2026-05-18 per #2338) | yes (rotated 2026-05-18 per #2392) | ghs_ token minted; issues/repo access confirmed |
| raxx-pm-bot | yes | yes | yes | ghs_ token minted; repo access confirmed |
All three bots are complete and healthy as of this audit.
How to tell it's broken
- Agent logs "warning: token mint failed for <bot>; falling back to operator PAT". This is the fallback path in with_bot_token.sh.
- PRs and issues created by agents are attributed to MooseQuest (operator) instead of raxx-*-bot[bot].
- CI workflow fails at the "Mint installation token" step with exit code 3 (Infisical fetch failed) or exit code 5 (GitHub API rejected the JWT exchange).
- scripts/agents/mint_github_token.py --bot <bot> exits non-zero and prints an error.
How to diagnose (in order)
1. Run the mint script directly without --quiet
```bash
python3 scripts/agents/mint_github_token.py --bot raxx-ops-bot
```
Expected output on success:
```text
# raxx-ops-bot token expires_at=<ISO8601>
ghs_AAAA...
```
Exit codes:
- exit 2 — INFISICAL_CLIENT_ID / INFISICAL_CLIENT_SECRET / INFISICAL_PROJECT_ID not set in shell
- exit 3 — Infisical login or fetch failed (wrong credentials, Infisical down, CF Access blocking)
- exit 4 — Vault path exists but one or more of APP_ID, INSTALLATION_ID, PRIVATE_KEY_PEM is missing or empty
- exit 5 — GitHub API rejected the JWT exchange (stale installation ID, revoked App, wrong App ID)
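To run the mint script and land directly on the right failure mode, a small wrapper can translate the exit code. A sketch only: explain_exit and diagnose are hypothetical names, the mapping is taken from the list above, and it assumes you run it from the repo root.

```python
import subprocess
import sys

# Exit codes as documented in this runbook, mapped to the section to read next.
# The mapping is a convenience; the mint script itself prints only its own error line.
EXIT_CODE_HINTS = {
    0: "ok: installation token minted",
    2: "Infisical env vars not set in this shell: export INFISICAL_CLIENT_ID/_SECRET/_PROJECT_ID",
    3: "Infisical login or fetch failed: see failure mode D",
    4: "required vault keys missing or empty: see failure modes A and E",
    5: "GitHub rejected the JWT exchange: HTTP 404 -> failure mode B, HTTP 401 -> failure mode C",
}


def explain_exit(code: int) -> str:
    """Map a mint-script exit code to the runbook section that covers it."""
    return EXIT_CODE_HINTS.get(code, f"unexpected exit code {code}: read the script's stderr")


def diagnose(bot: str) -> int:
    """Run the mint script for one bot and explain the result."""
    result = subprocess.run(
        [sys.executable, "scripts/agents/mint_github_token.py", "--bot", bot],
        stdout=subprocess.DEVNULL,  # discard the token so it never reaches the terminal
        check=False,                # a non-zero exit is data here, not an exception
    )
    # stderr is inherited, so the script's own error message still prints.
    print(f"{bot}: exit {result.returncode}: {explain_exit(result.returncode)}")
    return result.returncode
```

Calling diagnose for each of raxx-dev-bot, raxx-ops-bot and raxx-pm-bot gives a one-line verdict per bot.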
2. Check which keys are present in vault (without printing values)
```bash
python3 - <<'EOF'
# Lists key NAMES only (never values) at each bot's vault path.
import os, json, urllib.parse, urllib.request

HOST = os.environ.get("INFISICAL_HOST", "https://app.infisical.com")
CLIENT_ID = os.environ["INFISICAL_CLIENT_ID"]
CLIENT_SECRET = os.environ["INFISICAL_CLIENT_SECRET"]
PROJECT_ID = os.environ["INFISICAL_PROJECT_ID"]
CF_ID = os.environ.get("CF_ACCESS_CLIENT_ID", "")
CF_SEC = os.environ.get("CF_ACCESS_CLIENT_SECRET", "")
# Explicit UA: the default Python-urllib UA trips Cloudflare Bot Fight Mode.
UA = "raxx-agent-token-mint/1.0"

def cf_hdrs():
    # CF Access needs both service-token headers; send neither if one is missing.
    return {"CF-Access-Client-Id": CF_ID, "CF-Access-Client-Secret": CF_SEC} if CF_ID and CF_SEC else {}

def post_json(url, body):
    data = json.dumps(body).encode()
    req = urllib.request.Request(url, data=data, method="POST",
                                 headers={"Content-Type": "application/json", "User-Agent": UA, **cf_hdrs()})
    with urllib.request.urlopen(req, timeout=10) as r:
        return json.loads(r.read())

def get_json(url, tok):
    req = urllib.request.Request(url, method="GET",
                                 headers={"Authorization": f"Bearer {tok}", "User-Agent": UA, **cf_hdrs()})
    with urllib.request.urlopen(req, timeout=10) as r:
        return json.loads(r.read())

# Universal Auth login returns a short-lived Infisical access token.
tok = post_json(f"{HOST}/api/v1/auth/universal-auth/login",
                {"clientId": CLIENT_ID, "clientSecret": CLIENT_SECRET})["accessToken"]

for bot in ["raxx-dev-bot", "raxx-ops-bot", "raxx-pm-bot"]:
    path = f"/MooseQuest/{bot}"
    qs = urllib.parse.urlencode({"workspaceId": PROJECT_ID, "environment": "prod", "secretPath": path})
    data = get_json(f"{HOST}/api/v3/secrets/raw?{qs}", tok)
    # Print key names only; values (including the PEM) are never printed.
    keys = sorted(s["secretKey"] for s in data.get("secrets", []))
    print(f"{bot}: {keys}")
EOF
```
Expected output:
```text
raxx-dev-bot: ['APP_ID', 'INSTALLATION_ID', 'PRIVATE_KEY_PEM']
raxx-ops-bot: ['APP_ID', 'GH_RAXX_OPS_BOT_CLIENT_SECRET', 'INSTALLATION_ID', 'PRIVATE_KEY_PEM']
raxx-pm-bot: ['APP_ID', 'INSTALLATION_ID', 'PRIVATE_KEY_PEM']
```
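To turn the printed lists into a verdict instead of eyeballing them, the loop in step 2 could pass each list to a comparison like this. A sketch only: compare_keys is hypothetical, and KNOWN_AUX hard-codes the one aux key this runbook documents.

```python
REQUIRED = {"APP_ID", "INSTALLATION_ID", "PRIVATE_KEY_PEM"}
# Aux keys this runbook documents as present but unused by the mint script.
KNOWN_AUX = {"raxx-ops-bot": {"GH_RAXX_OPS_BOT_CLIENT_SECRET"}}


def compare_keys(bot: str, keys: list) -> dict:
    """Classify the key names listed at /MooseQuest/<bot>/ (names only, never values)."""
    present = set(keys)
    extra = present - REQUIRED
    known = KNOWN_AUX.get(bot, set())
    return {
        # Anything here makes the mint script exit 4 (failure mode A).
        "missing": sorted(REQUIRED - present),
        # Documented aux keys: harmless for minting.
        "known_aux": sorted(extra & known),
        # Undocumented keys: review, e.g. a required key saved under a wrong name.
        "unexpected": sorted(extra - known),
    }
```

In the step 2 loop, replacing the final print with print(f"{bot}: {compare_keys(bot, keys)}") reports missing, aux and unexpected names per bot.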
3. Verify the installation ID is live on GitHub
The installation ID can go stale after a GitHub org migration or App reinstall. Check it in one of two ways:
- Programmatically, which requires the bot's App ID and PEM from vault (the same inputs the mint script uses).
- Via the GitHub App settings page: open https://github.com/organizations/raxx-app/settings/apps, then click the App → Install App → the installation entry. The URL contains the installation ID.
If the ID in vault differs from the one in the GitHub UI, update vault (see failure mode B below).
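For the programmatic route, GitHub's App-authenticated endpoint GET /app/installations lists every installation of the App along with its account login. A minimal Python sketch, assuming you already have a signed App JWT (app_jwt is an input you supply; this runbook does not show how the mint script builds one). The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def list_installations(app_jwt: str) -> list:
    """GET /app/installations, authenticated as the App (JWT), not with an installation token.

    Only the first page (up to 100 installations) is fetched, which covers a single-org setup.
    """
    req = urllib.request.Request(
        f"{GITHUB_API}/app/installations?per_page=100",
        headers={
            "Authorization": f"Bearer {app_jwt}",
            "Accept": "application/vnd.github+json",
            "X-GitHub-Api-Version": "2022-11-28",
            "User-Agent": "raxx-install-id-check",  # hypothetical UA; GitHub requires one
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())


def installation_id_for(installations: list, org: str):
    """Return the installation ID whose account login matches org, else None."""
    for inst in installations:
        account = inst.get("account") or {}
        if account.get("login", "").lower() == org.lower():
            return inst["id"]
    return None


def check_vault_id(installations: list, org: str, vault_id: str) -> str:
    """Compare the INSTALLATION_ID stored in vault against the live installation."""
    live = installation_id_for(installations, org)
    if live is None:
        return f"not installed on {org}: reinstall the App, then record the new ID"
    if str(live) != str(vault_id).strip():
        return f"stale: vault has {vault_id}, GitHub has {live}; update INSTALLATION_ID (failure mode B)"
    return f"ok: installation {live} is live on {org}"
```

Usage: check_vault_id(list_installations(app_jwt), "raxx-app", vault_installation_id).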
4. Verify token format
A token starting with ghs_ is a GitHub App installation token (correct). A token starting with ghp_ is a personal access token, i.e. the operator PAT fallback, which means the mint failed.
```bash
TOKEN=$(python3 scripts/agents/mint_github_token.py --bot raxx-ops-bot --quiet 2>/dev/null)
echo "${TOKEN:0:4}" # Expect: ghs_
```
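The same prefix check as a small Python helper, extended to the other token prefixes GitHub documents. This runbook does not say whether the operator PAT is classic (ghp_) or fine-grained (github_pat_), so both are treated as the fallback; classify_token is a hypothetical name.

```python
# Token prefixes GitHub documents, with what each means for this runbook.
TOKEN_PREFIXES = {
    "ghs_": "GitHub App installation token (expected from the mint script)",
    "ghp_": "classic personal access token (operator PAT fallback)",
    "github_pat_": "fine-grained personal access token (also a PAT fallback)",
    "gho_": "OAuth app token (unexpected here)",
    "ghu_": "GitHub App user-to-server token (unexpected here)",
}


def classify_token(token: str) -> str:
    """Classify a token by prefix only; the token itself is never printed."""
    token = token.strip()
    for prefix, meaning in TOKEN_PREFIXES.items():
        if token.startswith(prefix):
            return meaning
    return "empty or unrecognized token: the mint probably failed"
```

An empty result is the common case when calling the mint script directly with --quiet, because the script exits non-zero instead of falling back.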
5. Verify repo access with the minted token
```bash
scripts/agents/with_bot_token.sh raxx-ops-bot gh api /repos/raxx-app/TradeMasterAPI 2>/dev/null | grep '"full_name"'
# Expect: "full_name":"raxx-app/TradeMasterAPI"
```
Note: gh api /user returns HTTP 403 for installation tokens — this is expected. Installation tokens are repo-scoped and cannot call the user endpoint. Use /repos/raxx-app/TradeMasterAPI to verify access instead.
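Step 5 checks one bot, and its 2>/dev/null also hides the wrapper's fallback warning. Because the operator PAT can read the repo too, a full_name match alone does not prove the bot token was used. A sketch that checks all three bots and treats a fallback as a failure, assuming the wrapper writes the "falling back to operator PAT" warning to stderr; smoke_command, passed and smoke_test are hypothetical names. It uses gh api's --jq flag so the comparison does not depend on JSON spacing.

```python
import subprocess

BOTS = ("raxx-dev-bot", "raxx-ops-bot", "raxx-pm-bot")
# Warning text the wrapper prints when it falls back; assumed to go to stderr.
FALLBACK_MARKER = "falling back to operator PAT"


def smoke_command(bot: str, org: str, repo: str) -> list:
    # --jq prints just the field value, e.g. raxx-app/TradeMasterAPI.
    return ["scripts/agents/with_bot_token.sh", bot,
            "gh", "api", f"/repos/{org}/{repo}", "--jq", ".full_name"]


def passed(returncode: int, stdout: str, stderr: str, expected: str) -> bool:
    """True only if the repo was readable AND the bot token (not the PAT) was used."""
    return returncode == 0 and stdout.strip() == expected and FALLBACK_MARKER not in stderr


def smoke_test(org: str = "raxx-app", repo: str = "TradeMasterAPI") -> dict:
    """Run from the repo root; returns {bot: True/False}."""
    results = {}
    for bot in BOTS:
        proc = subprocess.run(smoke_command(bot, org, repo),
                              capture_output=True, text=True, check=False)
        results[bot] = passed(proc.returncode, proc.stdout, proc.stderr, f"{org}/{repo}")
    return results
```

After an org migration, the same check runs against the new org with smoke_test(org="<new-org>").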
Known failure modes
Failure mode A: Missing or empty vault secrets (exit 4)
Symptom: error: bot secrets at /MooseQuest/<bot>/ missing keys: PRIVATE_KEY_PEM (or APP_ID, or INSTALLATION_ID)
Cause: Vault path exists but one or more required secrets were never written, were written under the wrong key name (e.g., lowercase private_key_pem), or were accidentally deleted.
Fix:
1. Open Infisical at the bot's path (/MooseQuest/<bot-name>/).
2. Confirm the three keys exist with their exact uppercase names.
3. If a key is missing, add it per docs/ops/runbooks/github-app-provisioning.md step 5.
4. If a key exists with the wrong name, rename it in Infisical (create new with correct name, delete old).
Verification:
```bash
python3 scripts/agents/mint_github_token.py --bot <bot-name>
# Expect: ghs_... token on stdout
```
Failure mode B: Stale installation ID (exit 5, HTTP 404)
Symptom: error: GitHub API returned 404 when exchanging JWT for installation token
Cause: The INSTALLATION_ID in vault no longer matches an active installation on the raxx-app org. This happens after a GitHub org migration or after the App is uninstalled and reinstalled, since each new installation gets a new ID.
Fix:
1. Open https://github.com/organizations/raxx-app/settings/apps.
2. Click the App (e.g., raxx-ops-bot) → Install App → click the installation for raxx-app.
3. The URL will end with the installation ID: https://github.com/organizations/raxx-app/settings/installations/<ID>.
4. Update Infisical at /MooseQuest/<bot-name>/INSTALLATION_ID with the new value.
5. Update the corresponding GitHub Actions repo secret if any workflows read it directly:
```bash
# Example for raxx-ops-bot; substitute the secret name your workflow reads.
gh secret set RAXX_OPS_BOT_INSTALL_ID --body "<new_id>" >/dev/null 2>&1
```
Verification:
```bash
scripts/agents/with_bot_token.sh <bot-name> gh api /repos/raxx-app/TradeMasterAPI 2>/dev/null | grep '"full_name"'
# Expect: "full_name":"raxx-app/TradeMasterAPI"
```
History: This failure occurred twice — 2026-05-09 (org migration from MooseQuest to raxx-app) and 2026-05-18 (#2338). The 2026-05-09 incident also had a code bug in the path prefix; that was fixed in the same session.
Failure mode C: Revoked or expired private key (exit 5, HTTP 401)
Symptom: error: GitHub API returned 401 when exchanging JWT for installation token
Cause: The PRIVATE_KEY_PEM in vault is stale — either the key was revoked in the GitHub App settings (e.g., during a leak response), or the PEM in vault wasn't updated after a key rotation.
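Why a key problem surfaces as a 401 at the exchange step: the mint script authenticates to GitHub with a short-lived JWT signed with PRIVATE_KEY_PEM, and GitHub rejects the exchange when that signature does not verify against a key still registered on the App. A minimal Python sketch of the unsigned part of such a JWT, following GitHub's documented claims (iat, exp, iss); the helper names are hypothetical, and it deliberately stops before RS256 signing, which needs a third-party crypto library. The same 401 can also come from a wrong APP_ID in iss, or from a host clock skewed far enough that iat/exp fall outside GitHub's window.

```python
import base64
import json
import time


def b64url(data: bytes) -> str:
    """Base64url without padding, as JWS/JWT encoding requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _compact(obj: dict) -> bytes:
    return json.dumps(obj, separators=(",", ":")).encode()


def app_jwt_claims(app_id: str, now: int = None) -> dict:
    """Claims GitHub documents for App JWTs."""
    now = int(time.time()) if now is None else now
    return {
        "iat": now - 60,     # backdated 60 s to absorb clock drift
        "exp": now + 540,    # GitHub rejects an exp more than 10 minutes ahead
        "iss": str(app_id),  # the APP_ID from vault
    }


def jwt_signing_input(app_id: str, now: int = None) -> str:
    """Return header.payload; the RS256 signature over this string is not computed here."""
    header = {"alg": "RS256", "typ": "JWT"}
    return f"{b64url(_compact(header))}.{b64url(_compact(app_jwt_claims(app_id, now)))}"
```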
Fix:
1. Open GitHub App settings for the bot.
2. Under Private keys, check whether the key stored in vault is still listed: compare its SHA-256 fingerprint with the ones shown (a deleted key no longer appears).
3. If revoked: generate a new private key (GitHub App settings → Private keys → Generate), write the new PEM to Infisical at /MooseQuest/<bot-name>/PRIVATE_KEY_PEM.
4. Update GitHub Actions repo secrets if any workflows read the PEM directly (see propagation gap note below).
Verification:
```bash
python3 scripts/agents/mint_github_token.py --bot <bot-name>
# Expect: ghs_... token
```
Propagation gap: If workflows read the PEM from a GH Actions repo secret (e.g., RAXX_OPS_BOT_PRIVATE_KEY) rather than reading from vault at runtime, rotating the PEM in vault does not auto-propagate. Each affected secret must also be updated:
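```bash
# Read the PEM from vault, base64-encode it, and update the GH Actions secret.
# The python3 body is a placeholder: fill in the fetch from the step 2 audit
# snippet, then print base64.b64encode(pem.encode()).decode().
PEM_B64=$(python3 -c "
import os, json, urllib.parse, urllib.request, base64
# [fetch the PEM using the audit snippet from step 2 above]
# print(base64.b64encode(pem.encode()).decode())
")
# Guard: an empty PEM_B64 (placeholder not filled in, or fetch failed) must not overwrite the secret.
if [ -n "${PEM_B64}" ]; then gh secret set RAXX_OPS_BOT_PRIVATE_KEY --body "${PEM_B64}" >/dev/null 2>&1; else echo "error: PEM_B64 is empty; secret not updated" >&2; fi
```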
Long-term fix: refactor workflows to read the PEM from vault via .github/actions/load-vault-secrets at runtime rather than from a GH Actions repo secret. This eliminates the propagation step entirely.
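Until that refactor lands, the placeholder fetch in the propagation-gap snippet can be filled in along these lines. A minimal Python sketch reusing the same Infisical login and list endpoint as the step 2 audit snippet; it assumes each entry in the response's secrets list carries secretValue alongside the secretKey field step 2 reads. fetch_pem, pick_secret and to_b64 are hypothetical names.

```python
import base64
import json
import os
import urllib.parse
import urllib.request

UA = "raxx-agent-token-mint/1.0"  # explicit UA, as in the step 2 snippet


def _headers(extra: dict) -> dict:
    cf_id = os.environ.get("CF_ACCESS_CLIENT_ID", "")
    cf_sec = os.environ.get("CF_ACCESS_CLIENT_SECRET", "")
    # Both CF Access headers or neither.
    cf = {"CF-Access-Client-Id": cf_id, "CF-Access-Client-Secret": cf_sec} if cf_id and cf_sec else {}
    return {"User-Agent": UA, **cf, **extra}


def pick_secret(secrets: list, key: str) -> str:
    """Return the value for key; raise rather than hand an empty value to gh secret set."""
    for s in secrets:
        if s.get("secretKey") == key and s.get("secretValue"):
            return s["secretValue"]
    raise KeyError(f"{key} missing or empty")


def to_b64(pem: str) -> str:
    return base64.b64encode(pem.encode()).decode()


def fetch_pem(bot: str, env: str = "prod") -> str:
    host = os.environ.get("INFISICAL_HOST", "https://app.infisical.com")
    body = json.dumps({"clientId": os.environ["INFISICAL_CLIENT_ID"],
                       "clientSecret": os.environ["INFISICAL_CLIENT_SECRET"]}).encode()
    login = urllib.request.Request(f"{host}/api/v1/auth/universal-auth/login", data=body,
                                   method="POST", headers=_headers({"Content-Type": "application/json"}))
    with urllib.request.urlopen(login, timeout=10) as r:
        tok = json.loads(r.read())["accessToken"]
    qs = urllib.parse.urlencode({"workspaceId": os.environ["INFISICAL_PROJECT_ID"],
                                 "environment": env, "secretPath": f"/MooseQuest/{bot}"})
    req = urllib.request.Request(f"{host}/api/v3/secrets/raw?{qs}",
                                 headers=_headers({"Authorization": f"Bearer {tok}"}))
    with urllib.request.urlopen(req, timeout=10) as r:
        return pick_secret(json.loads(r.read()).get("secrets", []), "PRIVATE_KEY_PEM")
```

Printing to_b64(fetch_pem("raxx-ops-bot")) as the script's only output fills PEM_B64; if the fetch raises, the output is empty and the guard in that snippet skips gh secret set.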
Failure mode D: Infisical unreachable or CF Access blocking (exit 3)
Symptom: error: Infisical login failed: HTTP 302 or HTTP 401 or timeout
Cause options:
- HTTP 302 — Cloudflare Access is in front of the Infisical host but CF Access service-token headers are not being sent
- HTTP 401 — Wrong INFISICAL_CLIENT_ID or INFISICAL_CLIENT_SECRET
- Timeout — Infisical host unreachable (check https://status.infisical.com)
Fix for CF Access 302: Set both CF_ACCESS_CLIENT_ID and CF_ACCESS_CLIENT_SECRET in the shell environment. Both must be present — if only one is set, the mint script ignores both (CF Access requires both headers to be present simultaneously).
Fix for 401: Verify the Machine Identity credentials in your shell config. Rotate the client secret if stale per docs/ops/runbooks/agent-bot-tokens-setup.md.
Note on User-Agent: Infisical API calls must include an explicit User-Agent header. The default Python-urllib/3.x UA triggers Cloudflare Bot Fight Mode filtering even when CF Access service tokens are valid. The mint script sends raxx-agent-token-mint/1.0 explicitly.
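To tell these three causes apart by hand, note that Python's urllib follows redirects by default, so a CF Access 302 would otherwise surface as an HTML login page rather than as a 302. A minimal probe sketch that disables redirects and classifies the login response; probe_login and classify are hypothetical names, and the success body (which contains an access token) is discarded, never printed.

```python
import json
import os
import socket
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None makes urllib raise HTTPError(302) instead of following CF Access.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def classify(status: int) -> str:
    if status == 200:
        return "ok: Machine Identity login succeeded"
    if status in (301, 302, 303, 307, 308):
        return "redirected: CF Access is intercepting; set both CF_ACCESS_CLIENT_ID and CF_ACCESS_CLIENT_SECRET"
    if status == 401:
        return "rejected: wrong INFISICAL_CLIENT_ID or INFISICAL_CLIENT_SECRET"
    return f"unexpected HTTP {status}"


def probe_login(timeout: float = 10) -> str:
    host = os.environ.get("INFISICAL_HOST", "https://app.infisical.com")
    headers = {"Content-Type": "application/json", "User-Agent": "raxx-agent-token-mint/1.0"}
    cf_id = os.environ.get("CF_ACCESS_CLIENT_ID", "")
    cf_sec = os.environ.get("CF_ACCESS_CLIENT_SECRET", "")
    if cf_id and cf_sec:  # both or neither, like the mint script
        headers.update({"CF-Access-Client-Id": cf_id, "CF-Access-Client-Secret": cf_sec})
    body = json.dumps({"clientId": os.environ["INFISICAL_CLIENT_ID"],
                       "clientSecret": os.environ["INFISICAL_CLIENT_SECRET"]}).encode()
    req = urllib.request.Request(f"{host}/api/v1/auth/universal-auth/login",
                                 data=body, method="POST", headers=headers)
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(req, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:
        return classify(err.code)
    except (urllib.error.URLError, socket.timeout, TimeoutError):
        return "unreachable or timed out: check https://status.infisical.com"
```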
Failure mode E: Path prefix mismatch (exit 4, empty secrets)
Symptom: error: bot secrets at /MooseQuest/github/raxx-dev-bot/ missing keys: APP_ID, INSTALLATION_ID, PRIVATE_KEY_PEM (note the spurious /github/ segment)
Cause: Caller set INFISICAL_PATH_PREFIX to a non-default value (e.g., /MooseQuest/github/) that doesn't match where secrets are actually stored.
Fix: Unset or correct INFISICAL_PATH_PREFIX. The documented and correct default is /MooseQuest/. The effective vault path is {prefix}/{bot}, e.g., /MooseQuest/raxx-dev-bot.
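The {prefix}/{bot} rule can be checked before running the mint script. A minimal Python sketch; the slash normalization is an assumption about how a prefix with or without a trailing slash should combine, not a description of the mint script's code, and effective_path and check_prefix are hypothetical names.

```python
import os

DEFAULT_PATH_PREFIX = "/MooseQuest/"


def effective_path(bot: str, prefix: str = DEFAULT_PATH_PREFIX) -> str:
    """Join prefix and bot as {prefix}/{bot}, collapsing stray slashes."""
    parts = [p for p in (prefix.strip("/"), bot.strip("/")) if p]
    return "/" + "/".join(parts)


def check_prefix(environ=None):
    """Return a warning if INFISICAL_PATH_PREFIX overrides the default, else None."""
    environ = os.environ if environ is None else environ
    prefix = environ.get("INFISICAL_PATH_PREFIX")
    if prefix is None or prefix.strip("/") == DEFAULT_PATH_PREFIX.strip("/"):
        return None
    return (f"INFISICAL_PATH_PREFIX={prefix!r} overrides the default; "
            f"secrets would be read from {effective_path('<bot>', prefix)}")
```

With the pre-fix default of /MooseQuest/github/, check_prefix reports /MooseQuest/github/<bot>, the spurious segment shown in the symptom above.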
History: This was the latent code defect found in the 2026-05-09 incident — DEFAULT_PATH_PREFIX was /MooseQuest/github/ in the code before the fix.
GitHub org migration checklist
When the GitHub org is renamed or the repo is transferred to a new org, follow this order:
- GitHub App installations are org-scoped and do not migrate automatically. Re-install each App on the new org (https://github.com/organizations/<new-org>/settings/apps → Install App).
- For each re-install, capture the new installation ID from the URL: https://github.com/organizations/<new-org>/settings/installations/<ID>.
- Update INSTALLATION_ID in Infisical for all three bots.
- Update any GH Actions repo secrets that hold an installation ID.
- Update the org URL in docs/ops/runbooks/github-app-provisioning.md and this file.
- Smoke-test all three bots: scripts/agents/with_bot_token.sh <bot> gh api /repos/<new-org>/TradeMasterAPI 2>/dev/null | grep '"full_name"'.
Note: The Infisical path prefix /MooseQuest/ is independent of the GitHub org name. It does not change when the org is renamed.
Key rotation SOP (abbreviated)
Full procedure: docs/ops/runbooks/rotation/github-app-installation-token.md.
Short version:
- GitHub App settings → Private keys → Generate a private key (do not delete the old key yet).
- Write the new PEM to Infisical at /MooseQuest/<bot-name>/PRIVATE_KEY_PEM.
- Verify a fresh token mints successfully: python3 scripts/agents/mint_github_token.py --bot <bot-name>.
- Update any GH Actions repo secrets that hold the PEM directly (see failure mode C above).
- Delete the old key from GitHub App settings.
Rotation cadence: 365 days per bot. Track via Velvet rotation pipeline (#300).
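The cadence arithmetic is simple enough to script. For example, raxx-ops-bot's key was rotated on 2026-05-18 (#2392), so under a 365-day cadence it falls due on 2027-05-18. A minimal sketch; rotation_due and days_until_due are hypothetical helpers, not part of the Velvet pipeline.

```python
from datetime import date, timedelta

ROTATION_CADENCE = timedelta(days=365)


def rotation_due(last_rotated: date) -> date:
    """Next due date under a fixed 365-day cadence."""
    return last_rotated + ROTATION_CADENCE


def days_until_due(last_rotated: date, today: date) -> int:
    """Positive: days remaining; zero: due today; negative: overdue."""
    return (rotation_due(last_rotated) - today).days
```

A fixed 365-day cadence drifts one day earlier across a leap day: a key rotated on 2027-05-18 is due on 2028-05-17, not 2028-05-18.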
Emergency stop
To force all agents to fall back to the operator PAT immediately (without breaking vault):
```bash
# Option A: clear the shell env vars so the mint script exits 2
unset INFISICAL_CLIENT_ID INFISICAL_CLIENT_SECRET INFISICAL_PROJECT_ID
```
This makes the fallback path in with_bot_token.sh engage for all subsequent agent dispatches in the current shell session. Agents still function; activity attributes to the operator PAT.
To disable a single bot's minting without affecting others, the fastest path is to set INSTALLATION_ID to an invalid value in vault (e.g., "0") — the GitHub token exchange will fail (exit 5) and the wrapper will fall back to PAT with a warning.
Escalation
Escalate to the operator when:
- All three bots fail simultaneously and Infisical is reachable (suggests a Machine Identity issue or project-level access revocation)
- A private key leak is suspected (see SEV-1 #2392 for precedent — rotate the key immediately, then assess blast radius)
- An App is not found in https://github.com/organizations/raxx-app/settings/apps (App was deleted; requires reprovisioning from scratch per docs/ops/runbooks/github-app-provisioning.md)
References
- Architecture: docs/architecture/agent-github-identity.md
- Provisioning: docs/ops/runbooks/github-app-provisioning.md
- Setup: docs/ops/runbooks/agent-bot-tokens-setup.md
- Rotation: docs/ops/runbooks/rotation/github-app-installation-token.md
- Mint script: scripts/agents/mint_github_token.py
- Wrapper: scripts/agents/with_bot_token.sh
- Bot map: scripts/agents/agent_bot_map.yaml
- Prior incidents: docs/incidents/2026-05-09-bot-token-mint-404.md
- Issue #335 (implementation tracking), #2338 (stale install ID 2026-05-18), #2392 (key leak SEV-1 2026-05-18)
- Issue #2278 (this audit, SC-IDENT-1)