System: Infisical vault — per-secret env coverage (prod / staging / dev)
Owner: sre-agent / operator
Script: scripts/vault/audit_coverage.py
Last incident: — (no incident; proactive audit)
Last reviewed: 2026-05-12 UTC
Related issues: #596 (Phase 1 — audit script + snapshot)
Related docs: docs/ops/vault-token-taxonomy.md, docs/ops/2026-05-12-vault-coverage-snapshot.md
The Infisical vault holds secrets across three environments: prod, staging,
and dev. Drift occurs when a secret is provisioned in one environment but
not another — a common source of "works in prod, breaks in staging" incidents.
This audit is read-only. It lists secrets, compares presence and version across environments, and reports drift. It does not modify vault contents.
SSM out of scope. AWS-resident workload secrets live in AWS Parameter Store
(SSM), not Infisical. Per feedback_aws_workloads_use_ssm_not_vault.md, those
secrets are managed separately via Terraform + SSM and are never written to
Infisical. Do not attempt to audit SSM via this script.
- … KeyError, os.environ raise, or getenv() returning None where a value is required).
- … not_found in staging but healthy in prod (Velvet rotation logs or console /secrets page).
- … 401 / 403 because a vendor key is absent from staging vault.

1. Run the audit script. It produces the coverage matrix with all presence gaps in one view. This is always step one.
2. Cross-reference the taxonomy: docs/ops/vault-token-taxonomy.md
   Section 3 documents which secrets are intentionally prod-only vs.
   expected in both prod and staging. Presence gaps for account-wide tokens
   (Cloudflare, GitHub, Anthropic) are by design, not defects.
3. Check Infisical directly: if a secret shows as missing in the
   audit but you believe it exists, confirm via the Infisical UI or the
   Infisical CLI:
```bash
infisical secrets get <SECRET_NAME> --path /MooseQuest/<vendor>/ \
  --env staging --plain
```
4. Check the secret path: Infisical returns 404 for a valid secret name
   if the folder path does not exist in that environment. See
   feedback_vault_folder_must_exist.md: folders must be created via
   POST /api/v1/folders before secrets can be written to a new path.
| Env var | Source |
|---|---|
| INFISICAL_CLIENT_ID | Infisical Universal Auth machine identity, read from vault at /MooseQuest/infisical/ |
| INFISICAL_CLIENT_SECRET | Paired with INFISICAL_CLIENT_ID |
| INFISICAL_PROJECT_ID | Infisical project / workspace ID, visible in project settings |
| INFISICAL_HOST | Optional. Default: https://app.infisical.com |
| INFISICAL_PATH_PREFIX | Optional. Default: /MooseQuest/. Set to a sub-path to limit scope. |
| CF_ACCESS_CLIENT_ID | Required only if the vault host is behind Cloudflare Access (e.g., self-hosted vault.raxx.app) |
| CF_ACCESS_CLIENT_SECRET | Paired with CF_ACCESS_CLIENT_ID |
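The table above amounts to a validation step that runs before any API call. The sketch below is hypothetical: `load_config`, `REQUIRED`, `DEFAULTS` and `CF_PAIR` are illustrative names, not code taken from audit_coverage.py. It assumes a missing required variable should end the run with exit code 1, and it treats a lone CF Access variable as an error, which is an assumption layered on the table's "Paired with" wording.

```python
from typing import Dict, List, Mapping, Tuple

# Hypothetical restatement of the env-var table; not the script's own code.
REQUIRED = ("INFISICAL_CLIENT_ID", "INFISICAL_CLIENT_SECRET", "INFISICAL_PROJECT_ID")
DEFAULTS = {
    "INFISICAL_HOST": "https://app.infisical.com",
    "INFISICAL_PATH_PREFIX": "/MooseQuest/",
}
CF_PAIR = ("CF_ACCESS_CLIENT_ID", "CF_ACCESS_CLIENT_SECRET")


def load_config(env: Mapping[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (config, problems); a non-empty problems list means exit code 1."""
    problems = [name for name in REQUIRED if not env.get(name)]
    config = {name: env.get(name, "") for name in REQUIRED}
    # Optional settings fall back to the documented defaults.
    for name, default in DEFAULTS.items():
        config[name] = env.get(name) or default
    # CF Access credentials are optional, but one half without the other is unusable.
    cf_present = [name for name in CF_PAIR if env.get(name)]
    if len(cf_present) == 1:
        problems.append(f"{cf_present[0]} is set without its pair")
    for name in cf_present:
        config[name] = env[name]
    return config, problems
```

Call it as `config, problems = load_config(os.environ)` and exit non-zero when `problems` is non-empty. Collecting every problem before exiting means one run reports all missing variables instead of only the first.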
```bash
# Print the coverage matrix to stdout
python3 scripts/vault/audit_coverage.py
```
```bash
# Write a dated snapshot file
python3 scripts/vault/audit_coverage.py \
  --output docs/ops/2026-05-12-vault-coverage-snapshot.md
```
Replace the date in the filename with today's date for a new snapshot. Commit the updated file so the coverage history is version-controlled.
```bash
# Export the matrix as CSV
python3 scripts/vault/audit_coverage.py --format csv > /tmp/vault-coverage.csv

# Limit the audit to a single vendor folder
INFISICAL_PATH_PREFIX=/MooseQuest/heroku/ \
  python3 scripts/vault/audit_coverage.py
```
| Code | Meaning |
|---|---|
| 0 | Audit completed. Drift may exist — check the report. |
| 1 | Missing required env vars or vault unreachable. |
| Cell value | Meaning |
|---|---|
| vN (e.g., v3) | Secret present; Infisical version number N |
| — | Secret absent from this environment |
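The cell encoding above, and the presence comparison behind the drift report, can be sketched as follows. This assumes the audit first collects a `{secret_name: {env: version}}` mapping; that data model, the function names and the exact row layout are illustrative assumptions, not the internals of audit_coverage.py.

```python
from typing import List, Mapping, Optional

ENVS = ("prod", "staging", "dev")
ABSENT = "\u2014"  # dash character used in the matrix for "absent"


def render_cell(version: Optional[int]) -> str:
    """'vN' when the secret exists in an env, the absent marker otherwise."""
    return ABSENT if version is None else f"v{version}"


def render_row(name: str, versions: Mapping[str, int]) -> str:
    """One markdown table row: secret name, then one cell per environment."""
    cells = [render_cell(versions.get(env)) for env in ENVS]
    return "| " + " | ".join([name, *cells]) + " |"


def find_drift(matrix: Mapping[str, Mapping[str, int]]) -> List[str]:
    """Secrets present in at least one env but absent from at least one other."""
    drift = []
    for name in sorted(matrix):
        present = [env for env in ENVS if env in matrix[name]]
        if 0 < len(present) < len(ENVS):
            drift.append(name)
    return drift
```

This only flags rows. A prod-only account-wide token is flagged exactly like a genuinely missing staging key; telling them apart requires the token class.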
"Drift" means the secret is present in at least one environment but absent from another. Drift is not always wrong — account-wide tokens (Cloudflare, AWS, GitHub) are intentionally prod-only. Always cross-reference the taxonomy before treating a drift entry as a defect.
Account-wide token (one vendor account regardless of Raxx environment):
prod only is correct. Examples: CF_PAGES_DEPLOY, ANTHROPIC_API_KEY,
AWS_ACCESS_KEY_ID, GITHUB_API_READONLY_TOKEN. No action.
Env-specific token (vendor has separate accounts/servers per env):
must be present in every env where the Raxx service runs. Examples:
HEROKU_API_KEY, ALPACA_PAPER_API_KEY_ID, POSTMARK_SERVER_TOKEN,
CF_ACCESS_SVC_CONSOLE. If staging is missing → provision.
Dev-only token (vendor test-mode):
dev env only. Example: STRIPE_RESTRICTED_KEY (Stripe test-mode key).
If missing from dev → provision the test-mode key.
CF Access service tokens: each Raxx environment has its own CF Access
application (different app_id). Separate service tokens must exist for
prod and staging. Both envs should be populated.
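The classification above can be applied to each drift row mechanically. In this sketch, `TOKEN_CLASS` is a hypothetical lookup built only from the example names listed above (the authoritative list is docs/ops/vault-token-taxonomy.md Section 3). Reading "every env where the Raxx service runs" as prod plus staging is also an assumption.

```python
from typing import Dict, FrozenSet, Iterable

# Expected environments per token class. Treating env-specific as
# prod + staging is an assumption taken from the examples above.
EXPECTED_ENVS: Dict[str, FrozenSet[str]] = {
    "account_wide": frozenset({"prod"}),
    "env_specific": frozenset({"prod", "staging"}),
    "dev_only": frozenset({"dev"}),
}

# Hypothetical lookup containing only the examples named in this runbook.
TOKEN_CLASS: Dict[str, str] = {
    "CF_PAGES_DEPLOY": "account_wide",
    "ANTHROPIC_API_KEY": "account_wide",
    "AWS_ACCESS_KEY_ID": "account_wide",
    "GITHUB_API_READONLY_TOKEN": "account_wide",
    "HEROKU_API_KEY": "env_specific",
    "ALPACA_PAPER_API_KEY_ID": "env_specific",
    "POSTMARK_SERVER_TOKEN": "env_specific",
    "CF_ACCESS_SVC_CONSOLE": "env_specific",
    "STRIPE_RESTRICTED_KEY": "dev_only",
}


def triage(name: str, present_envs: Iterable[str]) -> str:
    """Suggest an action for one drift row."""
    token_class = TOKEN_CLASS.get(name)
    if token_class is None:
        return "unclassified: check the taxonomy before acting"
    missing = EXPECTED_ENVS[token_class] - set(present_envs)
    if missing:
        return "provision in " + ", ".join(sorted(missing))
    return "no action"
```

Unknown names deliberately fall through to "check the taxonomy" rather than defaulting to a class. A wrong default would either hide a real gap or trigger an unnecessary provisioning request.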
Failure mode A
Symptom: Script prints [error] Missing required env vars: INFISICAL_CLIENT_ID ...
Cause: The Universal Auth credentials are not in the shell environment.
Fix:
```bash
# Option 1: set directly for one-off use. `read -rs` keeps the values
# out of shell history and off the screen.
read -rs -p "INFISICAL_CLIENT_ID: " INFISICAL_CLIENT_ID; echo
read -rs -p "INFISICAL_CLIENT_SECRET: " INFISICAL_CLIENT_SECRET; echo
read -rs -p "INFISICAL_PROJECT_ID: " INFISICAL_PROJECT_ID; echo
export INFISICAL_CLIENT_ID INFISICAL_CLIENT_SECRET INFISICAL_PROJECT_ID

# Option 2: read from vault via Infisical CLI (bootstrap token required).
# `set -a` exports the eval'd assignments so python3 inherits them.
set -a
eval "$(infisical export --env prod --path /MooseQuest/infisical/ \
  | grep -E '^(INFISICAL_CLIENT_ID|INFISICAL_CLIENT_SECRET|INFISICAL_PROJECT_ID)=')"
set +a
```
Verification: Re-run the script. If the [info] Vault host: line appears, the credentials were accepted.
Failure mode B
Symptom: Script prints [error] Failed to obtain Infisical auth token.
Cause: Either the credentials are set but invalid (wrong client ID / secret), or the Infisical host is unreachable.
Fix:
1. Verify the vault host is reachable:
```bash
curl -sS -o /dev/null -w "%{http_code}\n" "${INFISICAL_HOST:-https://app.infisical.com}/api/status"
```
   Expect 200. 000 means no connection at all (DNS, TLS, or network failure).
   A 302 or 403 from a host behind Cloudflare Access typically means the request
   was stopped at Access, not that the vault is down; continue with step 2.
2. If the host is behind Cloudflare Access (vault.raxx.app), verify the CF Access
   service-token credentials are set, without printing their values:
```bash
for v in CF_ACCESS_CLIENT_ID CF_ACCESS_CLIENT_SECRET; do
  if [ -n "${!v:-}" ]; then echo "$v: set"; else echo "$v: (not set)"; fi
done
```
   If not set, retrieve them from vault at /MooseQuest/cloudflare/ and export them.
   See docs/ops/runbooks/cf-access-service-token-provisioning.md.
3. Check Infisical status: https://status.infisical.com
Failure mode C
Symptom: Script reports 0 secrets found in 'staging' (or dev) but prod
has secrets.
Cause (most likely): The path prefix does not exist as a folder in that environment. Infisical returns an empty list (not 404) for a valid env with no secrets at the given path.
Diagnosis:
```bash
# Confirm the /MooseQuest/ folder exists in staging.
# INFISICAL_TOKEN must hold a Universal Auth access token for the same identity.
curl -s \
  -H "Authorization: Bearer $INFISICAL_TOKEN" \
  -H "CF-Access-Client-Id: $CF_ACCESS_CLIENT_ID" \
  -H "CF-Access-Client-Secret: $CF_ACCESS_CLIENT_SECRET" \
  "${INFISICAL_HOST:-https://app.infisical.com}/api/v1/folders?workspaceId=$INFISICAL_PROJECT_ID&environment=staging&path=/"
```
If /MooseQuest/ is absent from the staging folder list, no secrets have ever
been provisioned to staging under this prefix. This is a genuine coverage gap —
all expected staging secrets are missing.
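The diagnosis can also be run from Python, including obtaining the access token the curl call needs. This is a sketch under assumptions. The Universal Auth login endpoint (`POST /api/v1/auth/universal-auth/login`) and the response fields `accessToken` and `folders[].name` follow Infisical's public API reference as recalled here; verify them against your Infisical version. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, List, Mapping, Optional


def login_request(host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Universal Auth login; assumed to answer with JSON containing "accessToken"."""
    body = json.dumps({"clientId": client_id, "clientSecret": client_secret}).encode()
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/v1/auth/universal-auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def folders_url(host: str, project_id: str, environment: str, path: str = "/") -> str:
    """Same query as the curl diagnosis above, with the values URL-encoded."""
    query = urllib.parse.urlencode(
        {"workspaceId": project_id, "environment": environment, "path": path}
    )
    return f"{host.rstrip('/')}/api/v1/folders?{query}"


def folder_names(payload: Mapping[str, Any]) -> List[str]:
    """Names from a response assumed to look like {"folders": [{"name": ...}]}."""
    return sorted(folder["name"] for folder in payload.get("folders", []))


def list_folder_names(
    host: str,
    client_id: str,
    client_secret: str,
    project_id: str,
    environment: str,
    path: str = "/",
    extra_headers: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Log in, then list folder names at `path` in `environment` (network calls)."""
    extra = dict(extra_headers or {})  # e.g. CF Access service-token headers
    login = login_request(host, client_id, client_secret)
    for key, value in extra.items():
        login.add_header(key, value)
    with urllib.request.urlopen(login, timeout=10) as response:
        token = json.load(response)["accessToken"]
    headers = {"Authorization": f"Bearer {token}", **extra}
    request = urllib.request.Request(folders_url(host, project_id, environment, path), headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return folder_names(json.load(response))
```

If "MooseQuest" is absent from `list_folder_names(..., "staging")` but present for "prod", the gap is the one described above.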
Fix: Provision the folder and required secrets per the classification table
in docs/ops/vault-token-taxonomy.md Section 3. Create the folder first:
```bash
curl -s -X POST \
  -H "Authorization: Bearer $INFISICAL_TOKEN" \
  -H "CF-Access-Client-Id: $CF_ACCESS_CLIENT_ID" \
  -H "CF-Access-Client-Secret: $CF_ACCESS_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"workspaceId":"'"$INFISICAL_PROJECT_ID"'","environment":"staging","name":"MooseQuest","path":"/"}' \
  "${INFISICAL_HOST:-https://app.infisical.com}/api/v1/folders"
```
Then create sub-folders per vendor as needed, then provision the secrets.
Warning: Per feedback_vault_folder_must_exist.md, Infisical returns 404 if a secret is written to a path whose folder does not exist. Always create the folder before writing the secret. The POST /api/v1/folders call is idempotent: re-running it on an existing folder is safe.
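Creating "sub-folders per vendor" means one POST per level, parent first. The sketch below only computes the request bodies for a nested path, using the same body shape as the curl example above. The function name is hypothetical, and passing the parent as "/MooseQuest" (no trailing slash) for nested folders is an assumption. Re-posting levels that already exist relies on the idempotency stated in the warning.

```python
from typing import Dict, List


def folder_creation_bodies(project_id: str, environment: str, full_path: str) -> List[Dict[str, str]]:
    """POST /api/v1/folders bodies that create every level of full_path, parents first."""
    bodies = []
    parent = "/"
    for name in (part for part in full_path.split("/") if part):
        bodies.append(
            {"workspaceId": project_id, "environment": environment, "name": name, "path": parent}
        )
        # The next folder is created inside the one just added.
        parent = parent.rstrip("/") + "/" + name
    return bodies
```

For `/MooseQuest/heroku/` this yields two bodies: `MooseQuest` under `/`, then `heroku` under `/MooseQuest`. Send them in order before writing any secret to the vendor path.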
Failure mode D
Symptom: A secret is present in both prod and staging but the version
numbers differ significantly (e.g., prod is v8, staging is v2).
Cause: The secret was rotated in prod but the staging equivalent was never updated. Staging is running a stale credential.
Fix: This is a rotation gap, not a coverage gap. Route to Velvet rotation
pipeline or manual rotation per the vendor-specific SOP in
docs/ops/runbooks/rotation/.
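A rough filter for candidate rotation gaps, reusing the `{secret_name: {env: version}}` shape assumed earlier, could look like the sketch below. Infisical version numbers are counted per environment, so a gap is only a heuristic signal. The threshold of 3 is an arbitrary assumption to tune, not a documented rule.

```python
from typing import List, Mapping, Tuple


def rotation_gaps(
    matrix: Mapping[str, Mapping[str, int]], reference: str = "prod", min_gap: int = 3
) -> List[Tuple[str, str, int, int]]:
    """(secret, env, reference_version, env_version) where env lags by min_gap or more."""
    gaps = []
    for name in sorted(matrix):
        versions = matrix[name]
        ref_version = versions.get(reference)
        if ref_version is None:
            continue  # absent from the reference env: a coverage gap, not a rotation gap
        for env in sorted(versions):
            if env != reference and ref_version - versions[env] >= min_gap:
                gaps.append((name, env, ref_version, versions[env]))
    return gaps
```

Each returned tuple is a candidate for the rotation SOP, not a confirmed stale credential.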
For each drift row that is not intentionally prod-only:
1. Confirm the secret is missing (not just at a different path):
```bash
infisical secrets get <SECRET_NAME> --path /MooseQuest/<vendor>/ \
  --env staging --plain
# Expect: value printed. If error: secret is genuinely absent.
```
2. Check the vendor-specific folder exists in staging (see Failure mode C above).
3. Provision the missing secret in the Infisical UI or via API. Use the staging credential for that vendor (not the prod value; staging must have its own isolated credential where the vendor supports it).
4. Add the __EXPIRES_AT companion secret for the new entry.
5. Re-run the audit to confirm the gap is resolved:
```bash
python3 scripts/vault/audit_coverage.py | grep "<SECRET_NAME>"
```
6. Write a snapshot dated today (UTC) and commit it:
```bash
SNAPSHOT="docs/ops/$(date -u +%F)-vault-coverage-snapshot.md"
python3 scripts/vault/audit_coverage.py --output "$SNAPSHOT"
git add "$SNAPSHOT"
git commit -m "ops(vault): update coverage snapshot after remediation"
```
The audit should be run monthly and after any vault provisioning change. No automated scheduler is wired for this yet — tracked in #596 action items.
To add it to the nightly GH Actions digest (future):
1. Add a vault-coverage-audit step to .github/workflows/nightly-ops.yml
that runs audit_coverage.py --format md and posts the drift section to
Slack if drift_count > 0.
2. Wire INFISICAL_CLIENT_ID, INFISICAL_CLIENT_SECRET, and
INFISICAL_PROJECT_ID as GitHub Actions secrets sourced from vault.
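The "post if drift_count > 0" decision in step 1 might look like the sketch below. Everything here is hypothetical and not wired anywhere yet. How drift rows are extracted from audit_coverage.py output is left open. The webhook URL (for example, a hypothetical SLACK_WEBHOOK_URL secret) is an assumption. The only external contract relied on is that Slack incoming webhooks accept a JSON body with a "text" field.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def drift_digest(drift_rows: List[str], run_url: Optional[str] = None) -> Optional[Dict[str, str]]:
    """Slack incoming-webhook payload, or None when drift_count is 0."""
    if not drift_rows:
        return None
    lines = [f"Vault coverage audit: {len(drift_rows)} drift row(s)"]
    lines += [f"- {row}" for row in drift_rows]
    if run_url:
        lines.append(run_url)  # link back to the Actions run, if known
    return {"text": "\n".join(lines)}


def post_digest(webhook_url: str, payload: Dict[str, str]) -> int:
    """POST the payload as JSON to a Slack incoming webhook; returns the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Returning None instead of an empty message keeps the workflow step trivial: post only when a payload comes back.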
This script is read-only. There is no emergency stop — it cannot modify vault
contents. If the script is running and you want to stop it, Ctrl-C is
sufficient.
Escalate to operator (Kristerpher) when:
- … HEROKU_API_KEY missing from prod): this is a SEV-2, because a prod service may be running without rotation coverage.
- … 15 minutes (see docs/ops/runbooks/infisical-cloud-config.md).
- … sensitivity:critical (live trading keys, Heroku platform key) appears in an unexpected environment.

Related files:
- scripts/vault/audit_coverage.py
- docs/ops/2026-05-12-vault-coverage-snapshot.md
- docs/ops/vault-token-taxonomy.md
- docs/ops/runbooks/infisical-cloud-config.md
- docs/ops/runbooks/cf-access-service-token-provisioning.md
- feedback_vault_folder_must_exist.md
- feedback_aws_workloads_use_ssm_not_vault.md