Raxx · internal docs

GitHub App credentials runbook

System: GitHub App installation tokens (raxx-dev-bot, raxx-ops-bot, raxx-pm-bot)
Owner: ops
Last incident: 2026-05-18 (see docs/incidents/2026-05-09-bot-token-mint-404.md, SEV-1 #2392)
Last reviewed: 2026-05-18


Canonical vault path schema

Each bot has exactly three required secrets in Infisical, plus zero or more aux keys:

/MooseQuest/<bot-name>/
  APP_ID            — numeric App ID from GitHub (e.g., "3501328")
  INSTALLATION_ID   — numeric installation ID after the App is installed on raxx-app org
  PRIVATE_KEY_PEM   — full RSA private key (BEGIN/END RSA PRIVATE KEY block, newlines intact)
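
The schema above can be checked mechanically before a mint attempt. The following is a minimal sketch, assuming the three secrets have already been fetched into a plain dict. The function name validate_bot_secrets and the specific shape checks (digits-only IDs, PEM markers, newline presence) are illustrative assumptions and are not taken from the mint script.

```python
REQUIRED_KEYS = ("APP_ID", "INSTALLATION_ID", "PRIVATE_KEY_PEM")

def validate_bot_secrets(secrets):
    """Return a list of problems; an empty list means the shape looks right.

    Hypothetical helper: `secrets` maps key name to string value.
    """
    problems = []
    for key in REQUIRED_KEYS:
        if not secrets.get(key, "").strip():
            problems.append(f"{key} is missing or empty")
    # Both IDs are documented as numeric.
    for key in ("APP_ID", "INSTALLATION_ID"):
        value = secrets.get(key, "").strip()
        if value and not value.isdigit():
            problems.append(f"{key} is not numeric")
    # The PEM must be a full RSA block with its newlines intact.
    pem = secrets.get("PRIVATE_KEY_PEM", "").strip()
    if pem:
        if not pem.startswith("-----BEGIN RSA PRIVATE KEY-----"):
            problems.append("PRIVATE_KEY_PEM does not start with the BEGIN RSA PRIVATE KEY marker")
        if not pem.endswith("-----END RSA PRIVATE KEY-----"):
            problems.append("PRIVATE_KEY_PEM does not end with the END RSA PRIVATE KEY marker")
        if "\n" not in pem:
            problems.append("PRIVATE_KEY_PEM has no newlines (was it flattened?)")
    return problems
```

The check only looks at shape. It cannot tell whether the key is still active on GitHub or whether the installation ID is current.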

Bot → vault path mapping:

Bot            Vault path                  Used by
raxx-dev-bot   /MooseQuest/raxx-dev-bot/   feature-developer, software-architect, ux-polisher
raxx-ops-bot   /MooseQuest/raxx-ops-bot/   sre-agent, security-agent, card-groomer, qa-agent
raxx-pm-bot    /MooseQuest/raxx-pm-bot/    product-manager, marketing-strategist, business-legal-researcher

Full bot permissions are documented in docs/architecture/agent-github-identity.md.

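Key name conventions: the three required keys must use the exact uppercase names shown above. A key stored under any other spelling (e.g., lowercase private_key_pem) is reported as missing by the mint script (see failure mode A).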

Aux keys (present but not required by mint script):

raxx-ops-bot currently also holds GH_RAXX_OPS_BOT_CLIENT_SECRET at that path. This key is not read by the mint script and does not affect token minting.


Current audit state — 2026-05-18

Bot            APP_ID present   INSTALLATION_ID valid                PRIVATE_KEY_PEM present              Smoke result
raxx-dev-bot   yes              yes                                  yes                                  ghs_ token minted; repo access confirmed
raxx-ops-bot   yes              yes (rotated 2026-05-18 per #2338)   yes (rotated 2026-05-18 per #2392)   ghs_ token minted; issues/repo access confirmed
raxx-pm-bot    yes              yes                                  yes                                  ghs_ token minted; repo access confirmed

All three bots are complete and healthy as of this audit.


How to tell it's broken


How to diagnose (in order)

1. Run the mint script directly without --quiet

python3 scripts/agents/mint_github_token.py --bot raxx-ops-bot

Expected output on success:

# raxx-ops-bot token expires_at=<ISO8601>
ghs_AAAA...

Exit codes:

  - exit 2: INFISICAL_CLIENT_ID / INFISICAL_CLIENT_SECRET / INFISICAL_PROJECT_ID not set in shell
  - exit 3: Infisical login or fetch failed (wrong credentials, Infisical down, CF Access blocking)
  - exit 4: Vault path exists but one or more of APP_ID, INSTALLATION_ID, PRIVATE_KEY_PEM is missing or empty
  - exit 5: GitHub API rejected the JWT exchange (stale installation ID, revoked App, wrong App ID)
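
When the mint script is called from other tooling, the exit codes above can be turned into a pointer at the relevant failure mode. This is a sketch only: the helper names explain_exit and run_mint are hypothetical, and treating exit 0 as success is an assumption, since this runbook lists only the failure codes.

```python
import subprocess
import sys

# Meanings copied from the exit-code list in this runbook; the letters refer
# to its "Known failure modes" section. Exit 0 as success is an assumption.
EXIT_HINTS = {
    0: "token minted",
    2: "Infisical env vars not set in shell",
    3: "Infisical login or fetch failed (failure mode D)",
    4: "required vault keys missing or empty (failure mode A or E)",
    5: "GitHub rejected the JWT exchange (failure mode B or C)",
}

def explain_exit(code):
    """Map a mint-script exit code to a short hint."""
    return EXIT_HINTS.get(code, f"unrecognized exit code {code}")

def run_mint(bot):
    """Run the mint script for one bot and return (exit_code, hint).

    Output is captured and discarded so the token never reaches a log.
    """
    result = subprocess.run(
        [sys.executable, "scripts/agents/mint_github_token.py", "--bot", bot],
        capture_output=True,
        text=True,
    )
    return result.returncode, explain_exit(result.returncode)
```

Calling run_mint("raxx-ops-bot") from the repo root returns a pair such as (5, "GitHub rejected the JWT exchange (failure mode B or C)").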

2. Check which keys are present in vault (without printing values)

python3 - <<'EOF'
import os, json, urllib.parse, urllib.request

HOST = os.environ.get("INFISICAL_HOST", "https://app.infisical.com")
CLIENT_ID = os.environ["INFISICAL_CLIENT_ID"]
CLIENT_SECRET = os.environ["INFISICAL_CLIENT_SECRET"]
PROJECT_ID = os.environ["INFISICAL_PROJECT_ID"]
CF_ID = os.environ.get("CF_ACCESS_CLIENT_ID", "")
CF_SEC = os.environ.get("CF_ACCESS_CLIENT_SECRET", "")
UA = "raxx-agent-token-mint/1.0"

def cf_hdrs():
    return {"CF-Access-Client-Id": CF_ID, "CF-Access-Client-Secret": CF_SEC} if CF_ID and CF_SEC else {}

def post_json(url, body):
    data = json.dumps(body).encode()
    req = urllib.request.Request(url, data=data, method="POST",
          headers={"Content-Type": "application/json", "User-Agent": UA, **cf_hdrs()})
    with urllib.request.urlopen(req, timeout=10) as r: return json.loads(r.read())

def get_json(url, tok):
    req = urllib.request.Request(url, method="GET",
          headers={"Authorization": f"Bearer {tok}", "User-Agent": UA, **cf_hdrs()})
    with urllib.request.urlopen(req, timeout=10) as r: return json.loads(r.read())

tok = post_json(f"{HOST}/api/v1/auth/universal-auth/login",
                {"clientId": CLIENT_ID, "clientSecret": CLIENT_SECRET})["accessToken"]
for bot in ["raxx-dev-bot", "raxx-ops-bot", "raxx-pm-bot"]:
    path = f"/MooseQuest/{bot}"
    qs = urllib.parse.urlencode({"workspaceId": PROJECT_ID, "environment": "prod", "secretPath": path})
    data = get_json(f"{HOST}/api/v3/secrets/raw?{qs}", tok)
    keys = sorted(s["secretKey"] for s in data.get("secrets", []))
    print(f"{bot}: {keys}")
EOF

Expected output:

raxx-dev-bot: ['APP_ID', 'INSTALLATION_ID', 'PRIVATE_KEY_PEM']
raxx-ops-bot: ['APP_ID', 'GH_RAXX_OPS_BOT_CLIENT_SECRET', 'INSTALLATION_ID', 'PRIVATE_KEY_PEM']
raxx-pm-bot: ['APP_ID', 'INSTALLATION_ID', 'PRIVATE_KEY_PEM']
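
The key lists printed above can be compared against the required set automatically. The sketch below assumes the same key-name lists as input; audit_key_names is a hypothetical helper, and treating a case-insensitive match as "misnamed" is an assumption based on the lowercase private_key_pem example in failure mode A.

```python
REQUIRED = {"APP_ID", "INSTALLATION_ID", "PRIVATE_KEY_PEM"}

def audit_key_names(keys):
    """Split a bot's key names into missing, misnamed (wrong case) and aux."""
    present = set(keys)
    extra = present - REQUIRED
    return {
        # Required names that are absent under their exact spelling.
        "missing": sorted(REQUIRED - present),
        # Names that would match a required key if uppercased.
        "misnamed": sorted(k for k in extra if k.upper() in REQUIRED),
        # Everything else is an aux key the mint script does not read.
        "aux": sorted(k for k in extra if k.upper() not in REQUIRED),
    }
```

For raxx-ops-bot in its current state this reports no missing or misnamed keys and one aux key, GH_RAXX_OPS_BOT_CLIENT_SECRET.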

3. Verify the installation ID is live on GitHub

The installation ID can go stale after a GitHub org migration or App reinstall. Check:

# Requires the bot's App ID and PEM from vault — use the mint script's output
# Or via the GitHub App settings page:
# https://github.com/organizations/raxx-app/settings/apps
# Click App → Install App → installation entry → URL contains the installation ID

If the ID in vault differs from the one in the GitHub UI, update vault (see failure mode B below).
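
The comparison between the vault value and the GitHub UI can be scripted once the installation URL has been copied from the browser. This is a minimal sketch assuming the URL shape given in this runbook (…/settings/installations/<ID>); both function names are hypothetical.

```python
import re

# URL shape as documented in this runbook; the org segment is left open so
# the same check works after an org migration.
_INSTALLATION_URL = re.compile(
    r"^https://github\.com/organizations/[^/]+/settings/installations/(\d+)/?$"
)

def installation_id_from_url(url):
    """Extract the numeric installation ID from an installation settings URL."""
    match = _INSTALLATION_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not an installation settings URL: {url!r}")
    return match.group(1)

def vault_id_is_current(vault_value, settings_url):
    """True when the ID stored in vault equals the one shown in the GitHub UI."""
    return vault_value.strip() == installation_id_from_url(settings_url)
```

A False result corresponds to failure mode B: the vault value needs to be replaced with the ID from the URL.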

4. Verify token format

A token starting with ghs_ is a GitHub App installation token (correct). A token starting with ghp_ is a personal access token (the operator PAT fallback — means the mint failed silently).

TOKEN=$(python3 scripts/agents/mint_github_token.py --bot raxx-ops-bot --quiet 2>/dev/null)
echo "${TOKEN:0:4}"  # Expect: ghs_
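
The same prefix check can be done in Python when the token is already held in a variable. This is a sketch; classify_token and its return labels are hypothetical names, and only the two prefixes discussed in this step are recognized.

```python
def classify_token(token):
    """Classify a GitHub token by its prefix without printing the token."""
    if token.startswith("ghs_"):
        return "installation"   # GitHub App installation token: mint succeeded
    if token.startswith("ghp_"):
        return "pat-fallback"   # personal access token: the mint failed silently
    if not token:
        return "empty"          # nothing was produced at all
    return "unknown"            # some other credential type
```

Anything other than "installation" means the bot identity is not in use and the diagnosis steps above should be repeated without --quiet.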

5. Verify repo access with the minted token

scripts/agents/with_bot_token.sh raxx-ops-bot gh api /repos/raxx-app/TradeMasterAPI 2>/dev/null | grep '"full_name"'
# Expect: "full_name":"raxx-app/TradeMasterAPI"

Note: gh api /user returns HTTP 403 for installation tokens — this is expected. Installation tokens are repo-scoped and cannot call the user endpoint. Use /repos/raxx-app/TradeMasterAPI to verify access instead.
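
Where the gh CLI is not available, the same repo check can be made directly against the GitHub REST API. The sketch below is an assumption-laden equivalent, not what with_bot_token.sh does: the helper names and the User-Agent string are hypothetical, and the token is expected to come from the mint script.

```python
import json
import urllib.request

API = "https://api.github.com"

def build_repo_request(token, repo="raxx-app/TradeMasterAPI"):
    """Build a GET request for the repo metadata endpoint."""
    return urllib.request.Request(
        f"{API}/repos/{repo}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "User-Agent": "raxx-runbook-check",  # hypothetical UA string
        },
    )

def repo_full_name(token, repo="raxx-app/TradeMasterAPI"):
    """Return the repo's full_name; raises urllib.error.HTTPError on a rejected token."""
    with urllib.request.urlopen(build_repo_request(token, repo), timeout=10) as resp:
        return json.load(resp)["full_name"]
```

A healthy installation token yields "raxx-app/TradeMasterAPI"; an HTTPError indicates the token lacks access to the repo or was rejected.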


Known failure modes

Failure mode A: Missing or empty vault secrets (exit 4)

Symptom: error: bot secrets at /MooseQuest/<bot>/ missing keys: PRIVATE_KEY_PEM (or APP_ID, or INSTALLATION_ID)

Cause: Vault path exists but one or more required secrets were never written, were written under the wrong key name (e.g., lowercase private_key_pem), or were accidentally deleted.

Fix:

  1. Open Infisical at the bot's path (/MooseQuest/<bot-name>/).
  2. Confirm the three keys exist with their exact uppercase names.
  3. If a key is missing, add it per docs/ops/runbooks/github-app-provisioning.md step 5.
  4. If a key exists with the wrong name, rename it in Infisical (create new with correct name, delete old).

Verification:

python3 scripts/agents/mint_github_token.py --bot <bot-name>
# Expect: ghs_... token on stdout

Failure mode B: Stale installation ID (exit 5, HTTP 404)

Symptom: error: GitHub API returned 404 when exchanging JWT for installation token

Cause: The INSTALLATION_ID in vault no longer matches an active installation on the raxx-app org. This happens after a GitHub org migration or whenever the App is uninstalled and reinstalled, because each new installation receives a new ID.

Fix:

  1. Open https://github.com/organizations/raxx-app/settings/apps.
  2. Click the App (e.g., raxx-ops-bot) → Install App → click the installation for raxx-app.
  3. The URL will end with the installation ID: https://github.com/organizations/raxx-app/settings/installations/<ID>.
  4. Update Infisical at /MooseQuest/<bot-name>/INSTALLATION_ID with the new value.
  5. Update the corresponding GitHub Actions repo secret if any workflows read it directly:

gh secret set RAXX_OPS_BOT_INSTALL_ID --body "<new_id>" >/dev/null 2>&1

Verification:

scripts/agents/with_bot_token.sh <bot-name> gh api /repos/raxx-app/TradeMasterAPI 2>/dev/null | grep '"full_name"'
# Expect: "full_name":"raxx-app/TradeMasterAPI"

History: This failure occurred twice — 2026-05-09 (org migration from MooseQuest to raxx-app) and 2026-05-18 (#2338). The 2026-05-09 incident also had a code bug in the path prefix; that was fixed in the same session.

Failure mode C: Revoked or expired private key (exit 5, HTTP 401)

Symptom: error: GitHub API returned 401 when exchanging JWT for installation token

Cause: The PRIVATE_KEY_PEM in vault is stale — either the key was revoked in the GitHub App settings (e.g., during a leak response), or the PEM in vault wasn't updated after a key rotation.

Fix:

  1. Open GitHub App settings for the bot.
  2. Under Private keys, check if the stored key is still listed as active.
  3. If revoked: generate a new private key (GitHub App settings → Private keys → Generate), then write the new PEM to Infisical at /MooseQuest/<bot-name>/PRIVATE_KEY_PEM.
  4. Update GitHub Actions repo secrets if any workflows read the PEM directly (see propagation gap note below).

Verification:

python3 scripts/agents/mint_github_token.py --bot <bot-name>
# Expect: ghs_... token

Propagation gap: If workflows read the PEM from a GH Actions repo secret (e.g., RAXX_OPS_BOT_PRIVATE_KEY) rather than reading from vault at runtime, rotating the PEM in vault does not auto-propagate. Each affected secret must also be updated:

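# Read the PEM from vault, base64-encode it, and update the GH Actions secret.
# The fetch is a placeholder: reuse the login and get_json helpers from the
# audit snippet in step 2 to read PRIVATE_KEY_PEM into a variable named pem.
PEM_B64=$(python3 -c "
import os, json, base64, urllib.parse, urllib.request
# [fetch the PEM into pem, then: print(base64.b64encode(pem.encode()).decode())]
")
# Do not overwrite the secret with an empty value if the fetch printed nothing.
[ -n "${PEM_B64}" ] && gh secret set RAXX_OPS_BOT_PRIVATE_KEY --body "${PEM_B64}" >/dev/null 2>&1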
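
The encoding step on its own is small enough to show in full. This is a sketch assuming, as the snippet above suggests, that the workflow expects the PEM base64-encoded; the function names are hypothetical, and the decode helper is included only to show that the round trip preserves newlines.

```python
import base64

def encode_pem_for_secret(pem):
    """Base64-encode a PEM so its newlines survive storage as a one-line secret."""
    return base64.b64encode(pem.encode("utf-8")).decode("ascii")

def decode_pem_from_secret(encoded):
    """Inverse of encode_pem_for_secret, as a consuming workflow would apply it."""
    return base64.b64decode(encoded).decode("utf-8")
```

If a workflow instead expects the raw PEM, skip the encoding and pass the PEM text unchanged; which form each workflow expects has to be confirmed in the workflow file itself.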

Long-term fix: refactor workflows to read the PEM from vault via .github/actions/load-vault-secrets at runtime rather than from a GH Actions repo secret. This eliminates the propagation step entirely.

Failure mode D: Infisical unreachable or CF Access blocking (exit 3)

Symptom: error: Infisical login failed: HTTP 302 or HTTP 401 or timeout

Cause options:

  - HTTP 302: Cloudflare Access is in front of the Infisical host but CF Access service-token headers are not being sent
  - HTTP 401: Wrong INFISICAL_CLIENT_ID or INFISICAL_CLIENT_SECRET
  - Timeout: Infisical host unreachable (check https://status.infisical.com)

Fix for CF Access 302: Set both CF_ACCESS_CLIENT_ID and CF_ACCESS_CLIENT_SECRET in the shell environment. Both must be present — if only one is set, the mint script ignores both (CF Access requires both headers to be present simultaneously).

Fix for 401: Verify the Machine Identity credentials in your shell config. Rotate the client secret if stale per docs/ops/runbooks/agent-bot-tokens-setup.md.

Note on User-Agent: Infisical API calls must include an explicit User-Agent header. The default Python-urllib/3.x UA triggers Cloudflare Bot Fight Mode filtering even when CF Access service tokens are valid. The mint script sends raxx-agent-token-mint/1.0 explicitly.
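
Two of the conditions described in this runbook (unset Infisical variables, and a half-set CF Access pair) can be caught before the mint script runs. The following preflight is a sketch; the function name and message wording are hypothetical, and only the variable names are taken from this runbook.

```python
import os

INFISICAL_VARS = ("INFISICAL_CLIENT_ID", "INFISICAL_CLIENT_SECRET", "INFISICAL_PROJECT_ID")
CF_VARS = ("CF_ACCESS_CLIENT_ID", "CF_ACCESS_CLIENT_SECRET")

def preflight(env=None):
    """Return environment problems that would lead to exit 2 or a CF Access 302."""
    env = os.environ if env is None else env
    # Any unset Infisical variable makes the mint script exit 2.
    problems = [f"{name} is not set" for name in INFISICAL_VARS if not env.get(name)]
    # CF Access headers are sent only when both values are present.
    cf_set = [name for name in CF_VARS if env.get(name)]
    if len(cf_set) == 1:
        problems.append(
            f"only {cf_set[0]} is set; CF Access headers are sent only when both are present"
        )
    return problems
```

An empty list does not prove the credentials are valid. It only rules out the two configuration mistakes that can be detected locally.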

Failure mode E: Path prefix mismatch (exit 4, empty secrets)

Symptom: error: bot secrets at /MooseQuest/github/raxx-dev-bot/ missing keys: APP_ID, INSTALLATION_ID, PRIVATE_KEY_PEM (note the spurious /github/ segment)

Cause: Caller set INFISICAL_PATH_PREFIX to a non-default value (e.g., /MooseQuest/github/) that doesn't match where secrets are actually stored.

Fix: Unset or correct INFISICAL_PATH_PREFIX. The documented and correct default is /MooseQuest/. The effective vault path is the prefix followed by the bot name, e.g., /MooseQuest/raxx-dev-bot.

History: This was the latent code defect found in the 2026-05-09 incident — DEFAULT_PATH_PREFIX was /MooseQuest/github/ in the code before the fix.
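
How the prefix and bot name combine into a vault path can be illustrated with a small helper. This is a sketch, not the mint script's code: the name DEFAULT_PATH_PREFIX comes from this runbook, while the vault_path function and its slash normalization are assumptions.

```python
DEFAULT_PATH_PREFIX = "/MooseQuest/"

def vault_path(bot, prefix=DEFAULT_PATH_PREFIX):
    """Join prefix and bot name with exactly one slash between segments."""
    parts = (prefix.strip("/"), bot.strip("/"))
    return "/" + "/".join(part for part in parts if part)
```

With the default prefix, vault_path("raxx-dev-bot") gives /MooseQuest/raxx-dev-bot. With the pre-fix prefix /MooseQuest/github/ it gives /MooseQuest/github/raxx-dev-bot, which is the path that appears in the symptom message above.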


GitHub org migration checklist

When the GitHub org is renamed or the repo is transferred to a new org, follow this order:

  1. GitHub App installations are org-scoped and do not migrate automatically. Re-install each App on the new org (https://github.com/organizations/<new-org>/settings/apps → Install App).
  2. For each re-install, capture the new installation ID from the URL: https://github.com/organizations/<new-org>/settings/installations/<ID>.
  3. Update INSTALLATION_ID in Infisical for all three bots.
  4. Update any GH Actions repo secrets that hold an installation ID.
  5. Update the org URL in docs/ops/runbooks/github-app-provisioning.md and this file.
  6. Smoke-test all three bots: scripts/agents/with_bot_token.sh <bot> gh api /repos/<new-org>/TradeMasterAPI 2>/dev/null | grep '"full_name"'.

Note: The Infisical path prefix /MooseQuest/ is independent of the GitHub org name. It does not change when the org is renamed.
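
The smoke test in the last checklist step can be run for all three bots in one pass. The sketch below assumes with_bot_token.sh writes only the gh api JSON response to stdout (warnings going to stderr) and that it is run from the repo root; the helper names are hypothetical.

```python
import json
import subprocess

BOTS = ("raxx-dev-bot", "raxx-ops-bot", "raxx-pm-bot")

def smoke_command(bot, org, repo="TradeMasterAPI"):
    """Build the argv for one bot's repo-access smoke test."""
    return ["scripts/agents/with_bot_token.sh", bot, "gh", "api", f"/repos/{org}/{repo}"]

def smoke_test_all(org, repo="TradeMasterAPI"):
    """Run the smoke test for every bot and return {bot: passed}."""
    results = {}
    for bot in BOTS:
        proc = subprocess.run(smoke_command(bot, org, repo), capture_output=True, text=True)
        try:
            full_name = json.loads(proc.stdout).get("full_name")
        except (json.JSONDecodeError, AttributeError):
            full_name = None  # stdout was empty or not a JSON object
        results[bot] = proc.returncode == 0 and full_name == f"{org}/{repo}"
    return results
```

A passing result here does not distinguish a bot token from the PAT fallback, so it is worth pairing with the token-prefix check from the diagnosis steps.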


Key rotation SOP (abbreviated)

Full procedure: docs/ops/runbooks/rotation/github-app-installation-token.md.

Short version:

  1. GitHub App settings → Private keys → Generate a private key (do not delete the old key yet).
  2. Write new PEM to Infisical at /MooseQuest/<bot-name>/PRIVATE_KEY_PEM.
  3. Verify a fresh token mints successfully: python3 scripts/agents/mint_github_token.py --bot <bot-name>.
  4. Update any GH Actions repo secrets that hold the PEM directly (see failure mode C above).
  5. Delete the old key from GitHub App settings.

Rotation cadence: 365 days per bot. Track via Velvet rotation pipeline (#300).
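
The cadence translates into a due date with simple date arithmetic. This is a sketch with hypothetical helper names; it does not reflect how the Velvet rotation pipeline tracks dates.

```python
from datetime import date, timedelta

ROTATION_DAYS = 365  # cadence stated in this runbook

def rotation_due(last_rotated):
    """Date by which the bot's private key should next be rotated."""
    return last_rotated + timedelta(days=ROTATION_DAYS)

def is_overdue(last_rotated, today):
    """True once the due date has passed."""
    return today > rotation_due(last_rotated)
```

For the raxx-ops-bot key rotated on 2026-05-18, rotation_due(date(2026, 5, 18)) returns date(2027, 5, 18).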


Emergency stop

To force all agents to fall back to the operator PAT immediately (without breaking vault):

# Option A: clear the shell env vars so the mint script exits 2
unset INFISICAL_CLIENT_ID INFISICAL_CLIENT_SECRET INFISICAL_PROJECT_ID

This makes the fallback path in with_bot_token.sh engage for all subsequent agent dispatches in the current shell session. Agents still function; activity attributes to the operator PAT.

To disable a single bot's minting without affecting others, the fastest path is to set INSTALLATION_ID to an invalid value in vault (e.g., "0") — the GitHub token exchange will fail (exit 5) and the wrapper will fall back to PAT with a warning.


Escalation

Escalate to the operator when:

  - All three bots fail simultaneously and Infisical is reachable (suggests a Machine Identity issue or project-level access revocation)
  - A private key leak is suspected (see SEV-1 #2392 for precedent: rotate the key immediately, then assess blast radius)
  - An App is not found in https://github.com/organizations/raxx-app/settings/apps (App was deleted; requires reprovisioning from scratch per docs/ops/runbooks/github-app-provisioning.md)


References