ADR 0130 — Vault access pattern for multi-agent environments: per-agent machine identities + header proxy

Status: Accepted (operator confirmed Option A by dispatching prod adoption, 2026-06-20)
Date: 2026-06-20 UTC
Deciders: software-architect (proposed); operator (confirmed via dispatch of prod adoption task)
Scope: Infisical vault access for all agents, services, and CI environments; Infisical MCP server integration

Context

Raxx's Infisical vault (vault.raxx.app) is fronted by Cloudflare Access (CF Access) using the non_identity service-token policy for machine callers. Every machine caller must inject two headers on every request: CF-Access-Client-Id and CF-Access-Client-Secret. Most tool integrations (the Infisical MCP server, the CLI in CI pipelines, SDK calls) have no native support for custom request headers, so they cannot reach the vault without a workaround.

The workaround in production today: the main loop dispatches sre-agent for every vault read it needs directly. Each one-shot secret read costs 30 seconds this way, and any secret-adjacent agent flow becomes awkward. The problem compounds as more agents and repos onboard.

Two separate problems need solving:

Tool compatibility: The Infisical MCP server (and any future tool) cannot send CF Access headers natively. Either the tool must be wrapped or the access model must change.
Identity sprawl: All machine callers currently share an identity. One compromise exposes all accessible paths. Revocation is an all-or-nothing disruption.

This ADR records the decision on the access pattern that resolves both problems with the least security friction and no regression in posture.

Decision

Adopt per-agent scoped Infisical machine identities as the primary security boundary, combined with a header-injecting local proxy (Option A) to retain CF Access as the outer gate.

Specifically:

Each agent class and each persistent service receives its own Infisical machine identity, scoped by project, environment, path, and permission (read-only by default). A shared cross-agent identity is not acceptable for steady-state operation.
A thin header-injecting reverse proxy runs alongside every tool or execution environment that cannot natively send CF Access headers. All vault-bound traffic routes through this proxy (localhost:2019 by default), which injects the CF-Access-Client-Id and CF-Access-Client-Secret headers from environment variables. CF Access service tokens remain, one per execution environment class (not one per agent).
The Infisical MCP server (@infisical/mcp-server) is wired with a dedicated read-only machine identity (infisical-identity-mcp-main-loop-trademaster) scoped to the paths the main loop needs. The write tools exposed by the MCP server (create-secret, update-secret, delete-secret, invite-member) are rendered inert by the identity's RBAC: Infisical itself rejects those calls with a 403, so the protection does not depend on a tool-level restriction. The MCP server's INFISICAL_HOST_URL points at the proxy endpoint.
Every new machine identity is enrolled in the Velvet subscription manifest for automated rotation. No identity uses a static long-lived secret without a rotation schedule.
The feedback_main_loop_vault_limit operational constraint (dispatch sre-agent for every vault read) is retired once the MCP server is confirmed working via a successful spike. Spike PASSED 2026-06-20 (issue #3737). Prod identity provisioned 2026-06-20 (PR #3748). Full retirement pending operator completion of vault-write step.
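
A minimal sketch of the header-injecting proxy described above, assuming a standalone Python process built only on the standard library. The environment variable names (CF_ACCESS_CLIENT_ID, CF_ACCESS_CLIENT_SECRET, VAULT_UPSTREAM, VAULT_PROXY_PORT) and the function names are illustrative assumptions, not names taken from the design doc. It is a sketch under those assumptions, not the deployed implementation.

```python
import os
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Assumed variable names; the ADR only says the values come from the environment.
UPSTREAM = os.environ.get("VAULT_UPSTREAM", "https://vault.raxx.app")
LISTEN_PORT = int(os.environ.get("VAULT_PROXY_PORT", "2019"))

# Headers that must not be copied verbatim between the two connections.
DROPPED_HEADERS = {
    "host", "connection", "keep-alive", "proxy-authenticate",
    "proxy-authorization", "te", "trailers", "transfer-encoding",
    "upgrade", "content-length", "date", "server",
}


def inject_cf_headers(headers, env):
    """Copy caller headers, drop unsafe ones, and add the CF Access pair from env."""
    out = {
        name: value
        for name, value in headers.items()
        if name.lower() not in DROPPED_HEADERS
        and not name.lower().startswith("cf-access-client-")  # never trust caller-supplied values
    }
    out["CF-Access-Client-Id"] = env["CF_ACCESS_CLIENT_ID"]
    out["CF-Access-Client-Secret"] = env["CF_ACCESS_CLIENT_SECRET"]
    return out


class ProxyHandler(BaseHTTPRequestHandler):
    def _forward(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length) if length else None
        request = urllib.request.Request(
            UPSTREAM + self.path,
            data=body,
            headers=inject_cf_headers(dict(self.headers.items()), os.environ),
            method=self.command,
        )
        try:
            with urllib.request.urlopen(request, timeout=30) as resp:
                status, payload, resp_headers = resp.status, resp.read(), resp.headers
        except urllib.error.HTTPError as err:
            # Pass upstream errors (for example an Infisical 403) through unchanged.
            status, payload, resp_headers = err.code, err.read(), err.headers
        self.send_response(status)
        for name, value in resp_headers.items():
            if name.lower() not in DROPPED_HEADERS:
                self.send_header(name, value)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    do_GET = do_POST = do_PUT = do_PATCH = do_DELETE = _forward


def serve():
    """Listen on loopback only so the injected credentials are not exposed off-host."""
    ThreadingHTTPServer(("127.0.0.1", LISTEN_PORT), ProxyHandler).serve_forever()
```

Calling serve() starts the listener. A tool such as the MCP server would then be pointed at http://localhost:2019 instead of the vault hostname. Binding to 127.0.0.1 rather than all interfaces is a deliberate choice in this sketch: anything that can reach the port inherits the CF Access token.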

The network option (Option A vs Option B vs Option C) is recorded separately in the design doc (vault-multi-agent-access-pattern.md) and required operator confirmation (OQ1). The operator confirmed Option A by dispatching the production adoption task (2026-06-20), so this ADR records Option A as the Accepted direction.

Language choice rationale

No new service introduced. The header-injecting proxy is a thin tool component, not a new independently deployable service. If it is implemented as a standalone process rather than embedded tooling, it is Tier 2 (Python) — it is not on the auth hot path, has no p99 latency budget below 5ms, and has no memory-safety-critical properties. Vault reads are not order-execution paths. A Go or Caddy implementation is also acceptable; the operator may choose.

Consequences

Positive

Main loop can call get-secret directly via the MCP server. No sre-agent dispatch for secret reads. 30-second tax eliminated.
Blast radius of a compromised identity is bounded to that identity's scoped paths. Independent revocation per agent is possible.
CF Access remains the outer gate (Option A), so the two-factor model (CF token + Infisical token) is preserved. No regression in perimeter defense.
Onboarding a new agent takes one provisioning step, not a code change. The proxy and MCP server are tool-layer; the security layer is Infisical RBAC.
Per-identity Infisical audit logs make access attribution unambiguous per agent class.
Rotation is automated via existing Velvet v2 infrastructure.

Negative / risks

The proxy is an additional moving part. If the proxy is not running, vault access fails for tool integrations that depend on it. Requires monitoring.
The MCP server is a young npm package. Its security posture should be reviewed at each version update. Version pinning is required for production use.
The MCP server exposes write tools (create-secret, update-secret, delete-secret, invite-member) that are inert only because of identity RBAC. If the identity's permissions are ever broadened inadvertently, those tools become live. Permission changes require a deliberate step and Velvet audit trail.
IP allowlisting for GH Actions runners requires maintenance when GitHub changes runner IP ranges (known issue with IP-based controls for ephemeral compute).
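
Two of the risks above (a stopped proxy and inadvertently broadened permissions) lend themselves to small automated checks. The sketch below is illustrative only: the grant structure (a list of dicts with "path" and "actions") is a hypothetical shape chosen for the example, not the Infisical API's response format, and both function names are assumptions.

```python
import socket

READ_ONLY_ACTIONS = frozenset({"read"})


def proxy_is_listening(host="127.0.0.1", port=2019, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_write_grants(grants):
    """Return every grant that allows an action beyond read.

    `grants` is a hypothetical list of {"path": str, "actions": [str, ...]}
    entries describing what one machine identity may do.
    """
    return [g for g in grants if not set(g["actions"]) <= READ_ONLY_ACTIONS]
```

A monitoring job could alert when proxy_is_listening() returns False, and a periodic audit could fail when find_write_grants() returns a non-empty list for an identity that is meant to be read-only.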

Neutral

The CF Access service token for the proxy is one token per execution environment class, not one per agent. This is the same granularity as today, but now the Infisical identity layer provides per-agent attribution.
The sre-agent remains available for vault write operations; only read dispatch is retired.

Alternatives considered

Option B — Drop CF Access for machine callers; Infisical native auth + WAF

Lowest friction. MCP server works natively with no proxy. Infisical API is internet-reachable (behind CF WAF) rather than CF Access gated.

Rejected as the primary path because: it reduces the perimeter from two independent auth gates (CF Access + Infisical) to one (Infisical). Any Infisical CVE on the API surface is now directly exploitable. The WAF provides L7 filtering for known patterns but not for novel application vulnerabilities. The friction reduction is real but the security regression is not recoverable if an Infisical 0-day is published.

Reconsidered if: Infisical publishes a formal SOC 2 or equivalent assessment, the WAF managed ruleset gains specific Infisical API rule coverage, or Infisical moves to a verified published security model. This is Option B in the design doc and can be adopted via an amendment to this ADR.

Option C — Private mesh (Cloudflare Tunnel or Tailscale)

Strongest perimeter. Vault not reachable from the public internet.

Rejected as the immediate path because: ephemeral environments (GH Actions runners, ad-hoc Claude Code sessions) require mesh client installation and auth at spawn time, adding 15–30 seconds to every CI job that needs vault access. The friction trade is unfavorable for the spike-stage goal of "reduce friction." Option C is the natural Phase 2 upgrade once Option A is stable.

Option D — Hybrid (mesh for persistent services, proxy for ephemeral)

Architecturally sound. Too operationally complex for v1. Two runbooks, two access patterns, two sets of CF service tokens. Revisit after Option A is stable.

PII collected: Infisical audit logs record machine identity, secret path, timestamp. No secret values. No end-user PII in standard audit log entries unless agent-to-user session correlation is implemented (out of scope for this ADR).
Retention period: Infisical audit log: 90 days (matching WAF log retention, ADR-0077 OQ2). Configurable on self-hosted instance.
Deletion on DSR: Agent-access audit logs do not contain end-user PII by default. If session correlation is added in a future ADR, DSR deletion must cover audit entries linked to the requesting user's sessions.
Audit trail: Every secret read/write/delete generates an Infisical audit event tagged with the machine identity. These events satisfy the system-level audit requirement for secret access (invariant I3). Export to the append-only chain (ADR-0022) is deferred to a follow-on card (OQ2 in the design doc).
Stored credentials: Machine identity client_secret stored in Infisical at /MooseQuest/identities/<name>/ (bootstrapped by Velvet identity) and in execution environment variables. Not committed to git. Not in application code. Satisfies I1 and I4.
Breach notification path: Compromised identity → revoke in Infisical (<60 seconds) → audit log shows reads from unexpected IP/time → operator notified via ops@raxx.app. If compromised identity could read customer-correlated secrets: GDPR 72-hour notification clock starts on confirmation of compromise.
Secrets location + rotation: client_id + client_secret per identity in Infisical. Rotatable without redeploy (Velvet distributes new secret to execution env via the subscription manifest). CF Access service token for the proxy: stored in Infisical and in execution environment; rotatable via Velvet.
Kill-switch: MCP server is session-scoped; stop the session. Persistent service identities revocable in Infisical in <60 seconds. No redeploy required for revocation.
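
The breach notification path above depends on spotting reads from unexpected IP addresses in the audit log. A minimal sketch of that check follows, assuming audit events have already been exported as dicts with an "ip" field. The event shape, the function name, and the CIDR allowlist are all hypothetical and do not reflect Infisical's export format.

```python
import ipaddress


def flag_unexpected_reads(events, allowed_cidrs):
    """Return the audit events whose source IP is outside every allowed network."""
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    flagged = []
    for event in events:
        address = ipaddress.ip_address(event["ip"])
        if not any(address in network for network in networks):
            flagged.append(event)
    return flagged


# Example with made-up events: the second read comes from outside 10.0.0.0/8.
sample_events = [
    {"identity": "example-identity", "path": "/example", "ip": "10.0.0.5"},
    {"identity": "example-identity", "path": "/example", "ip": "203.0.113.9"},
]
suspicious = flag_unexpected_reads(sample_events, ["10.0.0.0/8"])
```

Any event in suspicious would be a candidate for the revoke-and-notify sequence. A time-of-day check could be added in the same way.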

Revisit when

Infisical is moved behind a private mesh (Option C maturation). At that point, the proxy component may be retired and this ADR amended.
The operator selects Option B. This ADR is superseded by an updated version reflecting the CF Access removal.
A new Infisical CVE is published. Review whether the Option B trade becomes more or less acceptable.
The MCP server adds native read-only mode or CF Access header support. The proxy may be simplifiable.
The Velvet subscription manifest covers all machine identities (current state: only external tokens are enrolled). When machine identities are enrolled, the rotation section of this ADR should be verified against the new manifest entries.