Raxx · internal docs

Per-PR Context Swap — Agent Identity Routing

Status: Design complete; sub-cards filed
Date: 2026-05-16 UTC
Tracking: #2070
ADR: 0096
Owner: software-architect


1. Context

Every agent dispatched from the main orchestrator thread must be able to open GitHub PRs, file issues, and push commits. Today all three GitHub Apps (raxx-dev-bot, raxx-ops-bot, raxx-pm-bot) are defined and mapped to agent classes in scripts/agents/agent_bot_map.yaml. The mapping is canonical but not enforced at spawn time: agents that skip the with_bot_token.sh wrapper default to the operator's PAT, which attributes the PR to MooseQuest (Kristerpher).

This has two concrete consequences:

  1. Self-approval block. GitHub prevents an author from approving their own PRs. If every agent-dispatched PR authors as MooseQuest, Kristerpher cannot use the standard review flow — he must admin-merge, bypassing the approval gate.

  2. Audit trail degradation. When eight PRs from a single session all author as MooseQuest, there is no signal distinguishing operator-authored work from agent-dispatched work. The workflow UUID trace (see workflow-uuid-tracing.md) relies on PR author as one classification signal.

The operator picked Option C, per-PR context swap (see #2070): the bot identity that opens a PR is determined by the agent type that produced it, not by a shared default. This gives the most precise attribution and the best audit trail, at the cost of higher implementation complexity.


2. Invariants

These constraints are non-negotiable and take precedence over all design choices:


3. Current State Audit

3.1 Agent-to-bot mapping (canonical, from agent_bot_map.yaml)

Agent class | Bot identity | Vault path
----------- | ------------ | ----------
feature-developer, ux-polisher, ux-designer | raxx-dev-bot | /MooseQuest/raxx-dev-bot/
sre-agent, security-agent, card-groomer | raxx-ops-bot | /MooseQuest/raxx-ops-bot/
product-manager, software-architect, marketing-strategist, business-legal-researcher, data-scientist | raxx-pm-bot | /MooseQuest/raxx-pm-bot/

The mapping is complete. No new GitHub Apps are required. The "conductor" role (main orchestrator thread) does not open PRs — it dispatches agents. The orchestrator's only GitHub actions are issue comments and agent spawn calls.

3.2 Identity verification

None of the three Apps map to the operator's GitHub account (MooseQuest). Operator self-approval is unblocked the moment all agent-dispatched PRs route through these identities.

3.3 Current gap

agent_bot_map.yaml exists and is correct, but:


4. Token Routing Design

4.1 Vault paths (existing, no changes required)

/MooseQuest/raxx-dev-bot/APP_ID
/MooseQuest/raxx-dev-bot/INSTALLATION_ID
/MooseQuest/raxx-dev-bot/PRIVATE_KEY_PEM

/MooseQuest/raxx-ops-bot/APP_ID
/MooseQuest/raxx-ops-bot/INSTALLATION_ID
/MooseQuest/raxx-ops-bot/PRIVATE_KEY_PEM

/MooseQuest/raxx-pm-bot/APP_ID
/MooseQuest/raxx-pm-bot/INSTALLATION_ID
/MooseQuest/raxx-pm-bot/PRIVATE_KEY_PEM

The vault paths are already defined and populated per the existing agent-github-identity.md design. No new secrets or vault folders are needed.

4.2 Resolver: agent type → bot name

The resolver is scripts/agents/agent_bot_map.yaml. No code changes; this file is already the canonical source. Any new agent class added to the codebase MUST have an entry in this map before its first PR can open.
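
The lookup and the "must have an entry" rule can be sketched as follows. This is a minimal illustration, not the actual resolver: the dictionary mirrors the table in section 3.1 rather than parsing agent_bot_map.yaml (whose schema is not shown here), and the names AGENT_BOT_MAP, resolve_bot, UnmappedAgentError, and vault_path are hypothetical.

```python
# Hypothetical in-memory mirror of the table in section 3.1. The real source
# of truth is scripts/agents/agent_bot_map.yaml; its schema is not shown here.
AGENT_BOT_MAP = {
    "raxx-dev-bot": ["feature-developer", "ux-polisher", "ux-designer"],
    "raxx-ops-bot": ["sre-agent", "security-agent", "card-groomer"],
    "raxx-pm-bot": [
        "product-manager", "software-architect", "marketing-strategist",
        "business-legal-researcher", "data-scientist",
    ],
}

# Invert to agent class -> bot name for direct lookup.
BOT_FOR_AGENT = {
    agent: bot for bot, agents in AGENT_BOT_MAP.items() for agent in agents
}


class UnmappedAgentError(KeyError):
    """Raised when an agent class has no entry in the map."""


def resolve_bot(agent_class: str) -> str:
    """Return the bot name for an agent class, refusing unmapped classes."""
    try:
        return BOT_FOR_AGENT[agent_class]
    except KeyError:
        raise UnmappedAgentError(
            f"{agent_class!r} has no entry in agent_bot_map.yaml; "
            "add one before its first PR"
        ) from None


def vault_path(bot_name: str, key: str) -> str:
    """Build a vault path following the /MooseQuest/<bot-name>/<KEY> layout."""
    return f"/MooseQuest/{bot_name}/{key}"
```

Failing loudly on an unmapped class, rather than defaulting to some shared identity, is one way to keep a new agent class from silently authoring as the operator.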

4.3 Token injection — orchestrator responsibility

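When the orchestrator dispatches an agent, it reads agent_bot_map.yaml to determine the agent's bot, then either injects a minted GH_TOKEN for the whole agent session (the session pattern) or leaves the agent to wrap individual gh calls with with_bot_token.sh (the per-call pattern).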

The session pattern is preferred for agents that make ≥3 gh calls (most feature-developer and software-architect runs). The per-call pattern is acceptable for lightweight agents (card-groomer filing a single issue).
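
A minimal sketch of the session pattern, assuming it means minting one token and passing it to the agent through the child process environment as GH_TOKEN (which the gh CLI reads), as the spawn step in the sequence diagram suggests. The function names spawn_with_token and choose_pattern are hypothetical, and the token is assumed to have been minted already.

```python
import os
import subprocess


def spawn_with_token(cmd: list[str], gh_token: str) -> subprocess.CompletedProcess:
    """Run an agent command with GH_TOKEN set only in the child's environment."""
    # Copy the parent environment instead of mutating os.environ, so the
    # orchestrator's own identity is unaffected by the spawn.
    child_env = {**os.environ, "GH_TOKEN": gh_token}
    return subprocess.run(
        cmd, env=child_env, capture_output=True, text=True, check=True
    )


def choose_pattern(expected_gh_calls: int) -> str:
    """Apply the >=3-call guideline: 'session' for heavier agents, else 'per-call'."""
    return "session" if expected_gh_calls >= 3 else "per-call"
```

Scoping the token to the child environment means two agents mapped to different bots can run side by side without sharing a token.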

4.4 Failure mode and fallback

If token minting fails, with_bot_token.sh logs a warning to stderr and falls back to the operator PAT. This fallback must be surfaced in the agent's final output so the operator notices. A PR opened via the fallback path is still functional but will show MooseQuest as author — the self-approval block re-appears. The fallback is acceptable for agent tasks that don't open PRs.
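
The fallback behaviour can be illustrated in Python as below. This is a sketch of the described logic only, not the contents of with_bot_token.sh: the mint callable, TokenResult, token_or_fallback, and final_output_note are all hypothetical names, and the warning text is an assumption.

```python
import sys
from dataclasses import dataclass
from typing import Callable


@dataclass
class TokenResult:
    token: str
    identity: str        # bot name, or "operator-pat" on fallback
    used_fallback: bool  # True means PRs would author as the operator


def token_or_fallback(
    bot_name: str,
    mint: Callable[[str], str],  # hypothetical minting callable; raises on failure
    operator_pat: str,
) -> TokenResult:
    """Try to mint a bot token; on failure warn on stderr and use the operator PAT."""
    try:
        return TokenResult(mint(bot_name), bot_name, used_fallback=False)
    except Exception as exc:  # broad on purpose: any mint failure triggers fallback
        print(
            f"WARNING: token mint failed for {bot_name}: {exc}; "
            "falling back to operator PAT",
            file=sys.stderr,
        )
        return TokenResult(operator_pat, "operator-pat", used_fallback=True)


def final_output_note(result: TokenResult) -> str:
    """Line an agent could append to its final output so the fallback is visible."""
    if result.used_fallback:
        return "WARNING: GitHub actions used the operator PAT (bot token mint failed)."
    return f"GitHub identity: {result.identity}"
```

Carrying used_fallback alongside the token lets the caller decide whether to proceed (tasks that do not open PRs) or stop (tasks that do).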


5. Audit Trail Extension

Each agent-dispatched PR body must include a standard header block:

Agent: <agent-class>
Bot identity: <raxx-dev-bot | raxx-ops-bot | raxx-pm-bot>
Workflow UUID: <wfl_* if in scope; "n/a" for doc-only changes>
Orchestrator session: <session timestamp UTC>
Refs #<parent-issue>

This is a convention enforced through agent prompt templates, not code. The software-architect and product-manager agents already include Refs #NNN in PR bodies per existing conventions; the Agent: and Bot identity: fields are the new additions.

Traceability chain: GitHub PR author (raxx-dev-bot[bot]) → PR body Agent: feature-developer → Workflow UUID: wfl_* → full trace in trace_events table (see workflow-uuid-tracing.md).
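
Although the header is a prompt-level convention rather than code, a small sketch shows what rendering and reading it back could look like. The function names and the ISO-style timestamp format are assumptions; the field names come from the header block above.

```python
from datetime import datetime, timezone
from typing import Optional

HEADER_FIELDS = {"Agent", "Bot identity", "Workflow UUID", "Orchestrator session"}


def render_audit_header(
    agent_class: str,
    bot_identity: str,
    parent_issue: int,
    workflow_uuid: Optional[str] = None,
    session_started: Optional[datetime] = None,  # must be timezone-aware if given
) -> str:
    """Render the header block; a missing workflow UUID becomes 'n/a'."""
    started = session_started or datetime.now(timezone.utc)
    # Timestamp format is an assumption; the convention only says "UTC".
    ts = started.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return "\n".join([
        f"Agent: {agent_class}",
        f"Bot identity: {bot_identity}",
        f"Workflow UUID: {workflow_uuid or 'n/a'}",
        f"Orchestrator session: {ts}",
        f"Refs #{parent_issue}",
    ])


def parse_audit_header(pr_body: str) -> dict:
    """Extract the known 'Key: value' header fields from a PR body."""
    fields = {}
    for line in pr_body.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key in HEADER_FIELDS:
            fields[key] = value.strip()
    return fields
```

A parser like this is what would let the trace store join on Agent and Workflow UUID without relying on the PR author alone.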


6. Agent Prompt Template Additions

Each agent definition at .claude/agents/<agent>.md requires a "GitHub identity" preamble (5–8 lines) that:

  1. Names the bot: "You open PRs and file issues as raxx-dev-bot."
  2. References the session pattern: "For ≥3 gh calls, use the session pattern."
  3. States the PR body convention: "Include the Agent: / Bot identity: / Workflow UUID: header block."
  4. States the fallback warning requirement: "If token mint fails, surface the warning in your final output."

The software-architect and sre-agent definitions already include partial GitHub identity preambles (referencing with_bot_token.sh). The update standardizes the format across all agents and adds the PR body convention.


7. Sequence Diagram

sequenceDiagram
    participant Op as Operator (Kristerpher)
    participant Orch as Orchestrator (main thread)
    participant Map as agent_bot_map.yaml
    participant Vault as Infisical Vault
    participant GH as GitHub API
    participant Agent as Dispatched Agent

    Op->>Orch: dispatch feature-developer on #2150
    Orch->>Map: lookup bot for feature-developer
    Map-->>Orch: raxx-dev-bot
    Orch->>Vault: GET /MooseQuest/raxx-dev-bot/{APP_ID,INSTALLATION_ID,PRIVATE_KEY_PEM}
    Vault-->>Orch: credentials
    Orch->>GH: POST /app/installations/{id}/access_tokens (JWT signed with PEM)
    GH-->>Orch: GH_TOKEN=ghs_...  (1-hour validity)
    Orch->>Agent: spawn with GH_TOKEN injected
    Agent->>Agent: implements feature
    Agent->>GH: gh pr create (GH_TOKEN = raxx-dev-bot token)
    GH-->>Agent: PR #NNNN authored by raxx-dev-bot[bot]
    Agent-->>Orch: PR URL + sub-card links
    Orch-->>Op: "PR #NNNN ready for review (authored by raxx-dev-bot)"
    Op->>GH: approve PR #NNNN  ← self-approval block GONE
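
The token-minting step in the diagram can be sketched with the standard library as below. This is an illustration of GitHub's documented App flow, not the contents of mint_github_token.py; the function names are hypothetical. RS256 signing of the claims with the PEM is left out because it needs a third-party library (for example PyJWT), so the sketch only builds the claims and the HTTP request.

```python
import time
import urllib.request
from typing import Optional


def build_app_jwt_claims(app_id: str, now: Optional[int] = None) -> dict:
    """Claims for a GitHub App JWT: issued slightly in the past, short-lived."""
    now = int(time.time()) if now is None else now
    return {
        "iat": now - 60,      # backdate to tolerate clock drift
        "exp": now + 9 * 60,  # GitHub caps App JWT lifetime at 10 minutes
        "iss": app_id,        # the APP_ID value from the vault
    }


def build_token_request(installation_id: str, signed_jwt: str) -> urllib.request.Request:
    """Build (but do not send) the installation access token request."""
    url = f"https://api.github.com/app/installations/{installation_id}/access_tokens"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Authorization": f"Bearer {signed_jwt}",
            "Accept": "application/vnd.github+json",
        },
    )
```

Sending the request with urllib.request.urlopen returns a JSON body whose token field is the ghs_* value that gets injected as GH_TOKEN.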

8. Migrations

No schema migrations. No application code changes.

Changes are confined to:

  - scripts/agents/agent_bot_map.yaml: already correct; no changes needed
  - .claude/agents/*.md: prompt template additions (6 files)
  - scripts/agents/mint_github_token.py: potential minor update if the orchestrator inject path needs a --agent-class flag (feature-developer evaluates this in SC-IDENT-2)

Rollback: revert .claude/agents/*.md prompt changes. All agents fall back to per-call with_bot_token.sh usage or operator PAT. No data migration needed.


9. Rollout Plan

Phase | Gate | Description
----- | ---- | -----------
Dark | SC-IDENT-5 complete | Verify all 3 Apps are operational; tokens mint clean from vault
SC-IDENT-1 | vault entries confirmed | Confirm vault path schema; document any gaps
SC-IDENT-2 | SC-IDENT-1 done | Orchestrator injects per-agent GH_TOKEN at spawn time
SC-IDENT-3 | SC-IDENT-2 done | Agent prompt templates updated; PR body convention added
SC-IDENT-4 | SC-IDENT-3 done | Audit header block (Agent/Bot/Workflow UUID) in all agent PR bodies
SC-IDENT-6 | SC-IDENT-3 done | Smoke test: open 1 PR per agent type; verify author is the expected bot
GA | SC-IDENT-6 passes | Zero agent-dispatched PRs authored as MooseQuest

All phases can run sequentially within one sprint. No feature flags needed — this is infrastructure-layer plumbing, not a user-visible feature.
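
The SC-IDENT-6 smoke test and the GA criterion both reduce to comparing each PR's author with the bot expected for its agent class. A minimal sketch of that check follows; the record shape (number, agent_class, author) and the function names are hypothetical, and fetching the PR list from GitHub is out of scope here.

```python
def expected_author(bot_name: str) -> str:
    """App-authored PRs show the bot login with a [bot] suffix."""
    return f"{bot_name}[bot]"


def author_mismatches(prs: list[dict], bot_for_agent: dict[str, str]) -> list[dict]:
    """Return PR records whose author is not the bot expected for their agent class."""
    bad = []
    for pr in prs:
        want = expected_author(bot_for_agent[pr["agent_class"]])
        if pr["author"] != want:
            # Keep the original record and note what the author should have been.
            bad.append({**pr, "expected": want})
    return bad
```

An empty result for one PR per agent type would correspond to the smoke test passing; any record authored as MooseQuest would appear in the list.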


10. Security Considerations

PII collected: None. Bot identities and workflow UUIDs are operational metadata, not personal data.

Retention: PR body audit headers are retained for the lifetime of the PR. Vault credentials rotate on a 365-day cadence per docs/ops/runbooks/rotation/github-app-installation-token.md.

DSR: Not applicable — no user PII flows through bot identity routing.

Credential replay: Bot PEM keys are stored only in Infisical; the mint script reads them at mint time and keeps no local copy. The minted token (ghs_*) has 1-hour validity. The orchestrator's only durable artifact is the Infisical Machine Identity (INFISICAL_CLIENT_ID + INFISICAL_CLIENT_SECRET), which does not grant GitHub access directly.

Audit trail: Every PR opened by an agent bot is auditable by bot identity on GitHub (filter by author). The PR body convention adds agent class and workflow UUID for cross-reference with the trace store.

Secrets: PRIVATE_KEY_PEM for all three bots lives at /MooseQuest/<bot-name>/PRIVATE_KEY_PEM in Infisical. Rotatable without redeploy. The mint script reads them live on every invocation — no local cache.

Kill-switch: Revoking a bot's GitHub App installation immediately prevents any further token minting for that bot. Agents fall back to operator PAT (with warning). Full revocation of all three Apps is the nuclear kill-switch for agent-dispatched GitHub actions.

Breach: If a minted ghs_* token is leaked, it expires in ≤1 hour and is scoped to the single installation. Impact: an attacker could open PRs as the bot identity for up to 1 hour. Mitigation: revoke the App installation immediately upon detection. No credential has replay value beyond the 1-hour window.


11. Open Questions

None blocking sub-card implementation. Operator has locked Option C. All vault paths are already populated.

One deferred question for post-rollout: should the orchestrator's own issue comments (not PR opens) also use a bot identity, or is operator-attributed orchestrator commentary acceptable? This is aesthetic, not structural — the self-approval block only applies to PR authorship.