Raxx · internal docs

internal · gated

Handoff Agent Provisioning Runbook

System: handoff-agent-identity / SC-HANDOFF-GITHUB-01
Owner: ops
ADR: ADR-0131 (docs/architecture/adr/0131-handoff-agent-least-privilege-identity.md)
Design doc: docs/architecture/handoff-agent-least-privilege-identity.md
Last reviewed: 2026-06-21


What this runbook covers

Every agent handoff dispatch must begin with Step 0: provision a scoped, time-bound, independently revocable GitHub identity for the specific task. This runbook documents how to use provision_handoff_identity.py and what to do when it fails.

"Handoff agent" means any Claude Cloud agent, scheduled CCR agent, new-repo agent, or unattended autonomous task. It does not include live, operator-present sessions; those use the shared bot App tokens from mint_github_token.py.


Prerequisite

The three Infisical env vars must be set in your shell:

INFISICAL_CLIENT_ID
INFISICAL_CLIENT_SECRET
INFISICAL_PROJECT_ID

These are the same credentials used by mint_github_token.py. See docs/ops/runbooks/agent-bot-tokens-setup.md for one-time setup.

Optionally, if your vault sits behind Cloudflare Access:

CF_ACCESS_CLIENT_ID
CF_ACCESS_CLIENT_SECRET
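
A quick preflight check can catch a missing variable before the provision script runs. The following is a minimal sketch, not part of provision_handoff_identity.py: the helper names missing_vars and cf_access_half_configured are hypothetical, and the rule that the two CF_ACCESS_* variables should be set together is an assumption based on how Cloudflare Access service tokens work.

```python
# Hypothetical preflight helpers; not part of the provisioning script.
REQUIRED_VARS = (
    "INFISICAL_CLIENT_ID",
    "INFISICAL_CLIENT_SECRET",
    "INFISICAL_PROJECT_ID",
)
OPTIONAL_CF_VARS = ("CF_ACCESS_CLIENT_ID", "CF_ACCESS_CLIENT_SECRET")


def missing_vars(env, names=REQUIRED_VARS):
    """Return the names that are unset or empty in the given mapping."""
    return [name for name in names if not env.get(name)]


def cf_access_half_configured(env):
    """Return True when exactly one of the two CF_ACCESS_* variables is set.

    Assumption: a Cloudflare Access service token needs both values.
    """
    present = [bool(env.get(name)) for name in OPTIONAL_CF_VARS]
    return any(present) and not all(present)
```

Call missing_vars(os.environ) from a wrapper script and stop if the returned list is non-empty. The helpers only report variable names and never print values.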

How to tell it's broken


How to diagnose (in order)

  1. Check that the Infisical env vars are set:

       echo "${INFISICAL_CLIENT_ID:0:4}..."

     Expected: the first 4 characters of your client ID followed by "...". If only "..." is printed, the variable is empty; source your shell config.

  2. Run provision with verbose output:

       python3 scripts/agents/provision_handoff_identity.py \
         --task-slug test-$(date +%Y%m%d) \
         --repo TradeMasterAPI \
         --owner raxx-app

     The script emits step-by-step progress to stderr. All five steps (1/5 through 5/5) must complete. On failure, the message pinpoints whether the problem is Infisical, GitHub, or vault.

  3. If GitHub returns 403/422 on PAT creation, confirm raxx-dev-bot App has the Fine-grained personal access tokens organization permission in GitHub Settings → Developer Settings → GitHub Apps → raxx-dev-bot → Permissions.

  4. If Infisical returns 404 on secret write, the folder creation step (step 3/5) failed silently. Run manually:

       # List folders at the handoff path to confirm existence.
       curl -s -H "Authorization: Bearer <infisical-token>" \
         "https://app.infisical.com/api/v1/folders?workspaceId=<id>&environment=prod&path=/MooseQuest/handoffs/"
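
The folder check in step 4 can also be done from Python when curl is not available. This is a minimal sketch using only the standard library. The endpoint and query parameters are copied from the curl command above; the function names are hypothetical, and the shape of the JSON response is not specified in this runbook, so the sketch returns it unparsed beyond json.load.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the curl command in step 4.
INFISICAL_FOLDERS_ENDPOINT = "https://app.infisical.com/api/v1/folders"


def folder_list_url(workspace_id, environment="prod", path="/MooseQuest/handoffs/"):
    """Build the folder-listing URL with a properly encoded query string."""
    query = urlencode(
        {"workspaceId": workspace_id, "environment": environment, "path": path}
    )
    return f"{INFISICAL_FOLDERS_ENDPOINT}?{query}"


def build_request(token, workspace_id):
    """Build (but do not send) the authenticated GET request."""
    return Request(
        folder_list_url(workspace_id),
        headers={"Authorization": f"Bearer {token}"},
    )


def list_handoff_folders(token, workspace_id, timeout=10):
    """Send the request and return the decoded JSON body as-is."""
    with urlopen(build_request(token, workspace_id), timeout=timeout) as response:
        return json.load(response)
```

urlencode percent-encodes the slashes in the path parameter, which is equivalent to the raw form used in the curl command.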


Known failure modes

Failure mode A: GitHub 422 on fine-grained PAT creation

Symptom: step 2/5 fails with HTTP 422 from api.github.com/user/personal-access-tokens.

Cause: The GitHub App (raxx-dev-bot) does not have the Fine-grained personal access tokens organization-level permission, which is required for a GitHub App to create PATs on behalf of a bot account.

Fix: Operator action required.

  1. Go to GitHub.com → Organization Settings → Developer Settings → GitHub Apps → raxx-dev-bot.
  2. Under Permissions, enable Organization permissions → Personal access tokens → Read and write.
  3. Re-run provision.

Verification: Re-run provision_handoff_identity.py; step 2/5 succeeds.

Note: This is the most common Phase 1 blocker. Phase 2 (GitHub App install scoping) avoids this limitation entirely.


Failure mode B: Infisical 404 on handoff folder

Symptom: step 4/5 fails with vault write returning HTTP 404.

Cause: The /MooseQuest/handoffs/ folder doesn't exist in Infisical (per feedback_vault_folder_must_exist). The script's step 3/5 creates it, but if step 3 itself fails (e.g., Infisical auth timeout), step 4 will 404.

Fix:

  1. Re-run the provision script; step 3/5 will retry folder creation.
  2. If Infisical is degraded, wait for vault health to recover before re-provisioning (check https://status.infisical.com).

Verification: Run --verify after re-running provision.


Failure mode C: PAT already exists with same name

Symptom: GitHub returns 422 "token name already taken".

Cause: A previous provision attempt created a PAT that was not revoked. The PAT name includes a date suffix to avoid collisions, but if two provisions run on the same day for the same slug, the second will collide.

Fix:

# Revoke the existing one first.
python3 scripts/agents/provision_handoff_identity.py \
  --revoke --task-slug <slug>

# Then re-provision.
python3 scripts/agents/provision_handoff_identity.py \
  --task-slug <slug> --repo TradeMasterAPI --owner raxx-app

Verification: --verify --task-slug <slug> returns a clean manifest.
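
The collision in failure mode C follows directly from how the PAT name is built. The sketch below is illustrative only: the gh-pat-handoff prefix and the slug-plus-date layout are inferred from the sample output in this runbook, and pat_name is a hypothetical helper, not a function exported by the script.

```python
from datetime import date

# Prefix inferred from the sample output ("gh-pat-handoff-<slug>-<YYYYMMDD>").
PAT_NAME_PREFIX = "gh-pat-handoff"


def pat_name(task_slug, on):
    """Compose the PAT name from the task slug and the provisioning date."""
    return f"{PAT_NAME_PREFIX}-{task_slug}-{on:%Y%m%d}"


slug = "iap-notif-handler-20260621"
first = pat_name(slug, date(2026, 6, 21))
second = pat_name(slug, date(2026, 6, 21))  # same slug, same day
next_day = pat_name(slug, date(2026, 6, 22))

# Two provisions on the same day produce the same name, hence the 422.
same_day_collides = first == second
# A provision on a later day gets a different suffix and does not collide.
later_day_collides = first == next_day
```

Under these assumptions same_day_collides is True and later_day_collides is False, which is why revoking the earlier PAT is required before re-provisioning on the same day.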


Failure mode D: Vault PROVISION_MANIFEST missing at dispatch time

Symptom: Orchestrator blocks dispatch — cannot read PROVISION_MANIFEST for task slug.

Cause: Either provisioning was skipped (Step 0 violated) or provision failed and was not retried.

Fix: Run the full provision sequence:

python3 scripts/agents/provision_handoff_identity.py \
  --task-slug <slug> --repo TradeMasterAPI --owner raxx-app

Then verify:

python3 scripts/agents/provision_handoff_identity.py --verify --task-slug <slug>

Provision a handoff identity

python3 scripts/agents/provision_handoff_identity.py \
  --task-slug <issue-slug>-<YYYYMMDD> \
  --repo TradeMasterAPI \
  --owner raxx-app \
  --ttl-hours 24

Successful output (all to stderr — nothing on stdout):

Provisioning handoff GitHub identity:
  task-slug  : iap-notif-handler-20260621
  PAT name   : gh-pat-handoff-iap-notif-handler-20260621-20260621
  repo       : raxx-app/TradeMasterAPI
  bot        : raxx-dev-bot
  TTL        : 24h (max 24h)
  vault path : /MooseQuest/handoffs/iap-notif-handler-20260621/
  permissions: {'contents': 'write', 'pull_requests': 'write', 'issues': 'write'}

step 1/5: minting bot App token...
  App token minted.
step 2/5: creating fine-grained PAT on GitHub...
  PAT created: id=12345678 name='...' expires=2026-06-22T00:00:00Z
step 3/5: connecting to vault and ensuring folder exists...
  Vault folder confirmed: /MooseQuest/handoffs/iap-notif-handler-20260621/
step 4/5: writing GH_HANDOFF_TOKEN to vault (value not logged)...
  GH_HANDOFF_TOKEN written to /MooseQuest/handoffs/iap-notif-handler-20260621/
step 5/5: writing PROVISION_MANIFEST to vault...
  PROVISION_MANIFEST written to /MooseQuest/handoffs/iap-notif-handler-20260621/

Provisioning complete.
  PAT id     : 12345678
  PAT name   : gh-pat-handoff-iap-notif-handler-20260621-20260621
  expires    : 2026-06-22T00:00:00Z
  vault path : /MooseQuest/handoffs/iap-notif-handler-20260621/GH_HANDOFF_TOKEN
  manifest   : /MooseQuest/handoffs/iap-notif-handler-20260621/PROVISION_MANIFEST
  revoke via : python3 scripts/agents/provision_handoff_identity.py --revoke --task-slug iap-notif-handler-20260621
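
Every path in the output above is derived from the task slug. The sketch below reproduces that derivation so a dispatcher can predict where the secrets will land. The helper names are hypothetical; the folder root and secret names are the ones shown in the sample output.

```python
from datetime import date

# Root folder and secret names as they appear in the sample output.
HANDOFF_ROOT = "/MooseQuest/handoffs"
TOKEN_SECRET = "GH_HANDOFF_TOKEN"
MANIFEST_SECRET = "PROVISION_MANIFEST"


def task_slug(issue_slug, on):
    """Build a slug of the form <issue-slug>-<YYYYMMDD>."""
    return f"{issue_slug}-{on:%Y%m%d}"


def vault_folder(slug):
    """Return the vault folder for a task slug, with a trailing slash."""
    return f"{HANDOFF_ROOT}/{slug}/"


def secret_paths(slug):
    """Return the full vault paths of the token and the manifest."""
    folder = vault_folder(slug)
    return {
        TOKEN_SECRET: folder + TOKEN_SECRET,
        MANIFEST_SECRET: folder + MANIFEST_SECRET,
    }


example_slug = task_slug("iap-notif-handler", date(2026, 6, 21))
example_paths = secret_paths(example_slug)
```

For the example slug, example_paths matches the "vault path" and "manifest" lines in the sample output.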

Verify a provisioned identity

python3 scripts/agents/provision_handoff_identity.py \
  --verify --task-slug <slug>

This reads the PROVISION_MANIFEST from vault, asserts permissions are within the allowed set, then queries GitHub for live PAT metadata. It never prints the token value.
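
The permission assertion can be pictured as a comparison against a fixed ceiling. This is a minimal sketch, not the script's actual code: the manifest is assumed to carry a permissions mapping like the one printed during provisioning, scope_violations is a hypothetical helper, and treating read as acceptable where write is allowed is an assumption.

```python
# The three permissions listed in this runbook as the allowed set.
ALLOWED_PERMISSIONS = {
    "contents": "write",
    "pull_requests": "write",
    "issues": "write",
}
# Assumption: a lower access level than the ceiling is acceptable.
ACCESS_RANK = {"read": 1, "write": 2}


def scope_violations(granted):
    """Return one human-readable message per permission outside the allowed set."""
    violations = []
    for name, level in sorted(granted.items()):
        ceiling = ALLOWED_PERMISSIONS.get(name)
        if ceiling is None:
            violations.append(f"{name}:{level} is outside the allowed set")
        elif ACCESS_RANK.get(level, 99) > ACCESS_RANK[ceiling]:
            # Unknown levels (for example "admin") rank above every ceiling.
            violations.append(f"{name}:{level} exceeds {name}:{ceiling}")
    return violations
```

An empty list means the manifest is within scope. Any non-empty result is a scope violation and falls under the escalation criteria in this runbook.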


Revoke a handoff identity

On task completion, or immediately if the handoff is compromised:

python3 scripts/agents/provision_handoff_identity.py \
  --revoke --task-slug <slug>

This:

  1. Reads the PAT id from the PROVISION_MANIFEST in vault.
  2. Deletes the PAT from GitHub (immediate effect).
  3. Removes GH_HANDOFF_TOKEN from vault.
  4. Removes PROVISION_MANIFEST from vault.

If the PAT was already expired or manually deleted from GitHub, the script logs a warning and continues to clean up vault. If nothing at all was found (no PAT and no vault entries), the script exits with code 8.
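
An orchestrator that calls --revoke automatically may want to distinguish "nothing to revoke" from a real failure. The sketch below is one way to do that under stated assumptions: exit code 8 is the only code documented here, exit code 0 on success is assumed from convention, and the helper names are hypothetical.

```python
import subprocess

SCRIPT = "scripts/agents/provision_handoff_identity.py"
EXIT_NOTHING_FOUND = 8  # documented in this runbook


def revoke_argv(task_slug):
    """Build the argument list for the documented --revoke invocation."""
    return ["python3", SCRIPT, "--revoke", "--task-slug", task_slug]


def describe_exit(code):
    """Map an exit code to a coarse outcome label."""
    if code == 0:  # assumption: 0 means the revoke completed
        return "revoked"
    if code == EXIT_NOTHING_FOUND:
        return "nothing-found"
    return "failed"


def revoke(task_slug):
    """Run the revoke command and return the outcome label."""
    result = subprocess.run(revoke_argv(task_slug), check=False)
    return describe_exit(result.returncode)
```

Passing the arguments as a list avoids shell quoting problems if a slug ever contains unexpected characters.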


Emergency revocation (PAT compromised)

If a handoff token may be compromised, revoke immediately:

# Revoke via script (preferred — cleans vault too):
python3 scripts/agents/provision_handoff_identity.py --revoke --task-slug <slug>

# Fallback if script is unavailable — revoke directly on GitHub:
# GitHub → Settings → Developer Settings → Personal access tokens →
# Fine-grained tokens → find the token by name → Delete.

# Then clean vault manually:
# Infisical dashboard → MooseQuest project → handoffs/<slug>/ → delete GH_HANDOFF_TOKEN

Blast radius of a compromised handoff PAT (ADR-0131 §6.1):

- Scoped to ONE repo (TradeMasterAPI).
- Permissions: contents:write, pull_requests:write, issues:write only.
- No admin, no secrets, no workflows, no branch-protection bypass.
- Cannot merge to main without a PR review (branch protection enforced server-side).
- Expires within 24h regardless.


Handoff provisioning gate (checklist)

Before dispatching a handoff agent:

HANDOFF PROVISIONING GATE (GitHub surface)

[ ] task-slug defined and matches GitHub issue slug
[ ] provision_handoff_identity.py --task-slug <slug> completed successfully
[ ] PROVISION_MANIFEST confirmed in vault (run --verify to check)
[ ] PAT expiry < 24h from now
[ ] Revocation plan documented (--revoke command ready)
[ ] After task completion: --revoke will be run
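
The expiry item in the checklist can be checked mechanically from the "expires" timestamp printed by the provision script. This is a minimal sketch: the function names are hypothetical, and the timestamp format is assumed to be the ISO 8601 form with a trailing Z shown in the sample output.

```python
from datetime import datetime, timedelta, timezone

MAX_TTL = timedelta(hours=24)  # the TTL ceiling stated in this runbook


def parse_expiry(value):
    """Parse a timestamp such as 2026-06-22T00:00:00Z into an aware datetime."""
    # Replacing the trailing Z keeps this working on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def expiry_within_gate(expires, now):
    """Return True when the PAT is still valid and expires in under 24 hours."""
    remaining = parse_expiry(expires) - now
    return timedelta(0) < remaining < MAX_TTL


# Example: checked at noon UTC on the provisioning day from the sample output.
checked_at = datetime(2026, 6, 21, 12, 0, tzinfo=timezone.utc)
gate_ok = expiry_within_gate("2026-06-22T00:00:00Z", checked_at)
```

In the example, 12 hours remain, so gate_ok is True. An already expired token or one with more than 24 hours remaining fails the check.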

Phase 2 (target state): GitHub App install scoping

Phase 1 (this runbook) uses fine-grained PATs under raxx-dev-bot. Phase 2 (SC-HANDOFF-GITHUB-01 target) moves to per-handoff GitHub App install scoping — the App is installed on only the target repo, with only the three required permissions, and an installation token (1-hour TTL) is generated at dispatch time.

Phase 2 upgrade path:

  1. Create a dedicated GitHub App for handoff identity.
  2. Install it on TradeMasterAPI with contents:write, pull_requests:write, issues:write.
  3. Generate installation tokens at dispatch time instead of fine-grained PATs.
  4. Update provision_handoff_identity.py to use the App install flow.
  5. Retire fine-grained PAT path.

The security invariants (HANDOFF-INVARIANT-1) are identical in both phases.


Escalation

Wake the operator when:

- GitHub API returns 5xx for > 5 minutes during provision/revoke.
- A PROVISION_MANIFEST shows permissions outside the allowed set (scope violation).
- A compromised handoff PAT may have had write access to the repo for > 1 hour before detection (potential git history audit required).
- The Infisical vault is unreachable and a task is awaiting dispatch.

Contact: ops@raxx.app