Handoff Agent Provisioning Runbook
System: handoff-agent-identity / SC-HANDOFF-GITHUB-01
Owner: ops
ADR: ADR-0131 (docs/architecture/adr/0131-handoff-agent-least-privilege-identity.md)
Design doc: docs/architecture/handoff-agent-least-privilege-identity.md
Last reviewed: 2026-06-21
What this runbook covers
Every agent handoff dispatch must begin with Step 0: provision a scoped,
time-bound, independently revocable GitHub identity for the specific task.
This runbook documents how to use provision_handoff_identity.py and what
to do when it fails.
"Handoff agent" means any Claude Cloud, scheduled CCR agent, new-repo agent,
or unattended autonomous task. It does NOT include live, operator-present
sessions; those use the shared bot App tokens from mint_github_token.py.
Prerequisite
The three Infisical env vars must be in your shell:
INFISICAL_CLIENT_ID
INFISICAL_CLIENT_SECRET
INFISICAL_PROJECT_ID
These are the same credentials used by mint_github_token.py. See
docs/ops/runbooks/agent-bot-tokens-setup.md for one-time setup.
Optionally, if your vault sits behind Cloudflare Access:
CF_ACCESS_CLIENT_ID
CF_ACCESS_CLIENT_SECRET
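A minimal pre-flight sketch of the prerequisite check, assuming you want to confirm the variables before invoking the script. The helper names missing_env_vars and masked are hypothetical and are not part of provision_handoff_identity.py.

```python
import os

# Variable names are taken from this runbook.
REQUIRED_VARS = ("INFISICAL_CLIENT_ID", "INFISICAL_CLIENT_SECRET", "INFISICAL_PROJECT_ID")
OPTIONAL_VARS = ("CF_ACCESS_CLIENT_ID", "CF_ACCESS_CLIENT_SECRET")


def missing_env_vars(env, names=REQUIRED_VARS):
    """Return the names that are unset or empty in the given mapping."""
    return [name for name in names if not env.get(name)]


def masked(value, visible=4):
    """Show only the first few characters so the secret is never printed whole."""
    return f"{value[:visible]}..." if value else "(empty)"
```

Calling missing_env_vars(os.environ) returns an empty list when the shell is ready. Pass OPTIONAL_VARS as names only if your vault sits behind Cloudflare Access.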
How to tell it's broken
- provision_handoff_identity.py --task-slug ... exits non-zero.
- Infisical returns 404 on the /MooseQuest/handoffs/ path (folder not yet created; the script creates it, but if vault is unreachable this will surface here).
- GitHub API returns 422 on fine-grained PAT creation (App does not have Personal access tokens admin permission on the org; operator action required).
- The PROVISION_MANIFEST is missing from vault when the orchestrator checks.
How to diagnose (in order)
1. Check Infisical env vars are set:
echo "${INFISICAL_CLIENT_ID:0:4}..."
Expected: shows the first 4 chars of your client ID. If empty, source your shell config.
2. Run provision with verbose output:
python3 scripts/agents/provision_handoff_identity.py \
--task-slug test-$(date +%Y%m%d) \
--repo TradeMasterAPI \
--owner raxx-app
The script emits step-by-step progress to stderr. All steps numbered 1–5 must complete. Failure message pinpoints whether it's Infisical, GitHub, or vault.
3. If GitHub returns 403/422 on PAT creation, confirm raxx-dev-bot App has the Fine-grained personal access tokens organization permission in GitHub Settings → Developer Settings → GitHub Apps → raxx-dev-bot → Permissions.
4. If Infisical returns 404 on secret write, the folder creation step (step 3/5) failed silently. Run manually:
# List folders at the handoff path to confirm existence.
curl -s -H "Authorization: Bearer <infisical-token>" \
"https://app.infisical.com/api/v1/folders?workspaceId=<id>&environment=prod&path=/MooseQuest/handoffs/"
Known failure modes
Failure mode A: GitHub 422 on fine-grained PAT creation
Symptom: step 2/5 fails with HTTP 422 from api.github.com/user/personal-access-tokens.
Cause: The GitHub App (raxx-dev-bot) does not have the
Fine-grained personal access tokens organization-level permission, which
is required for a GitHub App to create PATs on behalf of a bot account.
Fix: Operator action required.
1. Go to GitHub.com → Organization Settings → Developer Settings →
GitHub Apps → raxx-dev-bot.
2. Under Permissions, enable Organization permissions → Personal access tokens → Read and write.
3. Re-run provision.
Verification: Re-run provision_handoff_identity.py; step 2/5 succeeds.
Note: This is the most common Phase 1 blocker. Phase 2 (GitHub App install scoping) avoids this limitation entirely.
Failure mode B: Infisical 404 on handoff folder
Symptom: step 4/5 fails with vault write returning HTTP 404.
Cause: The /MooseQuest/handoffs/ folder doesn't exist in Infisical
(per feedback_vault_folder_must_exist). The script's step 3/5 creates it,
but if step 3 itself fails (e.g., Infisical auth timeout), step 4 will 404.
Fix:
1. Re-run the provision script — step 3/5 will retry folder creation.
2. If Infisical is degraded, wait for vault health to recover before
re-provisioning (check https://status.infisical.com).
Verification: Run --verify after re-running provision.
Failure mode C: PAT already exists with same name
Symptom: GitHub returns 422 "token name already taken".
Cause: A previous provision attempt created a PAT that was not revoked. The PAT name includes a date suffix to avoid collisions, but two provisions run on the same day for the same slug produce the same name, so the second one collides.
Fix:
# Revoke the existing one first.
python3 scripts/agents/provision_handoff_identity.py \
--revoke --task-slug <slug>
# Then re-provision.
python3 scripts/agents/provision_handoff_identity.py \
--task-slug <slug> --repo TradeMasterAPI --owner raxx-app
Verification: --verify --task-slug <slug> returns a clean manifest.
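The same-day collision in failure mode C can be sketched as below. The name format is inferred from the sample output in this runbook (gh-pat-handoff-<task-slug>-<YYYYMMDD>) and is an assumption about the script's behavior. pat_name and would_collide are hypothetical helpers.

```python
from datetime import date

# Prefix inferred from the sample PAT name shown in this runbook.
PAT_NAME_PREFIX = "gh-pat-handoff"


def pat_name(task_slug, provisioned_on):
    """Build '<prefix>-<task-slug>-<YYYYMMDD>' (assumed format)."""
    return f"{PAT_NAME_PREFIX}-{task_slug}-{provisioned_on:%Y%m%d}"


def would_collide(task_slug, provisioned_on, existing_names):
    """True if a PAT with the same derived name already exists."""
    return pat_name(task_slug, provisioned_on) in existing_names
```

Because the only varying parts are the slug and the calendar day, a second provision on the same day for the same slug yields an identical name. That is why the fix is to revoke first.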
Failure mode D: Vault PROVISION_MANIFEST missing at dispatch time
Symptom: Orchestrator blocks dispatch — cannot read PROVISION_MANIFEST for task slug.
Cause: Either provisioning was skipped (Step 0 violated) or provision failed and was not retried.
Fix: Run the full provision sequence:
python3 scripts/agents/provision_handoff_identity.py \
--task-slug <slug> --repo TradeMasterAPI --owner raxx-app
Then verify:
python3 scripts/agents/provision_handoff_identity.py --verify --task-slug <slug>
Provision a handoff identity
python3 scripts/agents/provision_handoff_identity.py \
--task-slug <issue-slug>-<YYYYMMDD> \
--repo TradeMasterAPI \
--owner raxx-app \
--ttl-hours 24
Successful output (all to stderr — nothing on stdout):
Provisioning handoff GitHub identity:
task-slug : iap-notif-handler-20260621
PAT name : gh-pat-handoff-iap-notif-handler-20260621-20260621
repo : raxx-app/TradeMasterAPI
bot : raxx-dev-bot
TTL : 24h (max 24h)
vault path : /MooseQuest/handoffs/iap-notif-handler-20260621/
permissions: {'contents': 'write', 'pull_requests': 'write', 'issues': 'write'}
step 1/5: minting bot App token...
App token minted.
step 2/5: creating fine-grained PAT on GitHub...
PAT created: id=12345678 name='...' expires=2026-06-22T00:00:00Z
step 3/5: connecting to vault and ensuring folder exists...
Vault folder confirmed: /MooseQuest/handoffs/iap-notif-handler-20260621/
step 4/5: writing GH_HANDOFF_TOKEN to vault (value not logged)...
GH_HANDOFF_TOKEN written to /MooseQuest/handoffs/iap-notif-handler-20260621/
step 5/5: writing PROVISION_MANIFEST to vault...
PROVISION_MANIFEST written to /MooseQuest/handoffs/iap-notif-handler-20260621/
Provisioning complete.
PAT id : 12345678
PAT name : gh-pat-handoff-iap-notif-handler-20260621-20260621
expires : 2026-06-22T00:00:00Z
vault path : /MooseQuest/handoffs/iap-notif-handler-20260621/GH_HANDOFF_TOKEN
manifest : /MooseQuest/handoffs/iap-notif-handler-20260621/PROVISION_MANIFEST
revoke via : python3 scripts/agents/provision_handoff_identity.py --revoke --task-slug iap-notif-handler-20260621
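The vault locations in the output above follow one pattern per task slug. The sketch below derives them, which can help when inspecting vault by hand. vault_paths is a hypothetical helper and the slash check is an added assumption, not documented script behavior.

```python
# Root folder and secret names are taken from the sample output in this runbook.
HANDOFF_ROOT = "/MooseQuest/handoffs"


def vault_paths(task_slug):
    """Return the folder, token, and manifest paths for one task slug."""
    if not task_slug or "/" in task_slug:
        # Assumption: a slug is a single path segment.
        raise ValueError(f"invalid task slug: {task_slug!r}")
    folder = f"{HANDOFF_ROOT}/{task_slug}/"
    return {
        "folder": folder,
        "token": folder + "GH_HANDOFF_TOKEN",
        "manifest": folder + "PROVISION_MANIFEST",
    }
```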
Verify a provisioned identity
python3 scripts/agents/provision_handoff_identity.py \
--verify --task-slug <slug>
This reads the PROVISION_MANIFEST from vault, asserts permissions are within the allowed set, then queries GitHub for live PAT metadata. It never prints the token value.
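The "within the allowed set" assertion might look like the sketch below. The allowed permissions come from this runbook. The manifest shape (a flat name-to-level mapping), the rule that a lower level such as read is acceptable, and the scope_violations helper are all assumptions for illustration.

```python
# Allowed set as listed in this runbook.
ALLOWED_PERMISSIONS = {"contents": "write", "pull_requests": "write", "issues": "write"}

# Assumed ordering: read is narrower than write.
LEVELS = {"read": 1, "write": 2}


def scope_violations(granted, allowed=ALLOWED_PERMISSIONS):
    """Return the granted permissions that exceed the allowed set."""
    violations = {}
    for name, level in granted.items():
        ceiling = allowed.get(name)
        # Unknown permission names and unknown levels both count as violations.
        if ceiling is None or LEVELS.get(level, 99) > LEVELS[ceiling]:
            violations[name] = level
    return violations
```

An empty result means the manifest is within scope. Any non-empty result is a scope violation and falls under the escalation rules in this runbook.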
Revoke a handoff identity
On task completion, or immediately if the handoff is compromised:
python3 scripts/agents/provision_handoff_identity.py \
--revoke --task-slug <slug>
This:
1. Reads the PAT id from the PROVISION_MANIFEST in vault.
2. Deletes the PAT from GitHub (immediate effect).
3. Removes GH_HANDOFF_TOKEN from vault.
4. Removes PROVISION_MANIFEST from vault.
If the PAT was already expired or manually deleted from GitHub, the script logs a warning and continues to clean up vault. It exits with code 8 if nothing at all was found.
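The warn-and-continue behavior can be sketched as follows. The GitHub and vault calls are injected as plain callables so the control flow is visible. The revoke function, its parameters, the pat_id manifest key, and exit code 0 on success are assumptions; only exit code 8 is taken from this runbook.

```python
def revoke(manifest, delete_pat, delete_secret, warn=print):
    """Return an exit code: 0 after cleanup, 8 if nothing at all was found.

    manifest: dict with a 'pat_id' key, or None if no manifest is in vault.
    delete_pat(pat_id) -> bool: False if GitHub no longer has the PAT.
    delete_secret(name) -> bool: False if the secret was not in vault.
    """
    found = False
    if manifest is not None:
        found = True
        if not delete_pat(manifest["pat_id"]):
            # A missing PAT is not fatal; vault cleanup still has to happen.
            warn("PAT already expired or deleted on GitHub; continuing with vault cleanup")
    for name in ("GH_HANDOFF_TOKEN", "PROVISION_MANIFEST"):
        if delete_secret(name):
            found = True
    return 0 if found else 8
```

The manifest is removed last so that the PAT id stays recoverable if an earlier step fails and the revoke has to be retried.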
Emergency revocation (PAT compromised)
If a handoff token may be compromised, revoke immediately:
# Revoke via script (preferred — cleans vault too):
python3 scripts/agents/provision_handoff_identity.py --revoke --task-slug <slug>
# Fallback if script is unavailable — revoke directly on GitHub:
# GitHub → Settings → Developer Settings → Personal access tokens →
# Fine-grained tokens → find the token by name → Delete.
# Then clean vault manually:
# Infisical dashboard → MooseQuest project → handoffs/<slug>/ → delete GH_HANDOFF_TOKEN
Blast radius of a compromised handoff PAT (ADR-0131 §6.1):
- Scoped to ONE repo (TradeMasterAPI).
- Permissions: contents:write, pull_requests:write, issues:write only.
- No admin, no secrets, no workflows, no branch-protection bypass.
- Cannot merge to main without a PR review (branch protection enforced server-side).
- Expires within 24h regardless.
Handoff provisioning gate (checklist)
Before dispatching a handoff agent:
HANDOFF PROVISIONING GATE (GitHub surface)
[ ] task-slug defined and matches GitHub issue slug
[ ] provision_handoff_identity.py --task-slug <slug> completed successfully
[ ] PROVISION_MANIFEST confirmed in vault (run --verify to check)
[ ] PAT expiry < 24h from now
[ ] Revocation plan documented (--revoke command ready)
[ ] After task completion: --revoke will be run
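The expiry item in the gate can be checked mechanically. This sketch assumes the expiry is an ISO 8601 UTC timestamp like the one in the sample output (2026-06-22T00:00:00Z). expiry_ok is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Maximum TTL as stated in this runbook.
MAX_TTL = timedelta(hours=24)


def expiry_ok(expires_at, now=None):
    """True if the PAT has not expired and expires less than 24h from now."""
    now = now or datetime.now(timezone.utc)
    # fromisoformat on older Python versions does not accept a trailing 'Z'.
    expires = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return now < expires < now + MAX_TTL
```

Pass a timezone-aware now when testing. Comparing an aware expiry against a naive datetime raises TypeError.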
Phase 2 (target state): GitHub App install scoping
Phase 1 (this runbook) uses fine-grained PATs under raxx-dev-bot.
Phase 2 (SC-HANDOFF-GITHUB-01 target) moves to per-handoff GitHub App install
scoping — the App is installed on only the target repo, with only the three
required permissions, and an installation token (1-hour TTL) is generated at
dispatch time.
Phase 2 upgrade path:
1. Create a dedicated GitHub App for handoff identity.
2. Install it on TradeMasterAPI with contents:write, pull_requests:write, issues:write.
3. Generate installation tokens at dispatch time instead of fine-grained PATs.
4. Update provision_handoff_identity.py to use the App install flow.
5. Retire fine-grained PAT path.
The security invariants (HANDOFF-INVARIANT-1) are identical in both phases.
Escalation
Wake the operator when:
- GitHub API returns 5xx for > 5 minutes during provision/revoke.
- A PROVISION_MANIFEST shows permissions outside the allowed set (scope violation).
- A compromised handoff PAT may have had write access to the repo for > 1 hour before detection (potential git history audit required).
- The Infisical vault is unreachable and a task is awaiting dispatch.
Contact: ops@raxx.app