Last verified against: Velvet v2 (2026-05-04 UTC)
Parent epic: #907
Design doc: docs/architecture/velvet/v2-rotation-flows.md
Handler-author guide: docs/architecture/velvet-handler-author-guide.md
| Situation | Use Velvet | Use manual procedure |
|---|---|---|
| Scheduled rotation of a credential with registered subscribers | Yes | No |
| Emergency revocation after a suspected leak | Yes (revocation flow) | No |
| Credential with active: false in the subscription manifest | No (fix the manifest first) | Proceed manually per vendor SOP |
| Velvet itself is down or unreachable | No (use vendor SOP directly) | Yes |
| Velvet's own bootstrap credentials (INFISICAL_CLIENT_SECRET, HK_VELVET_BOOTSTRAP) | No (circular dependency) | Yes (section 8 below) |
| Vendor does not support programmatic token creation (e.g. CF User API tokens) | Operator-assisted Velvet (OPERATOR_MANUAL flow) | Parallel manual path |
| Feature flag velvet_v2_rotation is off | No (Velvet returns 503) | Yes (use vendor SOP) |
| Credential has no subscribers registered in the manifest | No | Yes (use per-credential SOP in docs/ops/runbooks/rotation/) |
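The decision table can be read as an ordered set of checks. The following is a minimal sketch of that reading, not Velvet code: the `Situation` fields, the returned labels, and in particular the order in which the checks run are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Hypothetical summary of the facts the table asks about."""
    velvet_reachable: bool = True
    flag_on: bool = True                  # velvet_v2_rotation
    is_bootstrap_credential: bool = False
    has_subscribers: bool = True
    manifest_active: bool = True          # active: true in the manifest
    vendor_supports_minting: bool = True


def choose_procedure(s: Situation) -> str:
    """Return the path the table points to. The check order is an assumption."""
    if not s.velvet_reachable:
        return "manual: vendor SOP"
    if not s.flag_on:
        return "manual: vendor SOP"
    if s.is_bootstrap_credential:
        return "manual: bootstrap procedure"
    if not s.has_subscribers:
        return "manual: per-credential SOP"
    if not s.manifest_active:
        return "manual: vendor SOP (fix the manifest first)"
    if not s.vendor_supports_minting:
        return "velvet: OPERATOR_MANUAL flow"
    return "velvet: standard flow"
```

Putting the availability checks first reflects that none of the later rows matter if Velvet cannot serve the request at all.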
Complete every item before triggering a rotation. A stalled pre-flight is cheaper than a stalled distribute.
[ ] Confirm both Velvet apps are healthy. Each call should return {"status": "ok"}:
curl -sf https://raxx-velvet-prod.herokuapp.com/healthz
curl -sf https://raxx-velvet-staging.herokuapp.com/healthz
If either returns non-200 or times out, stop. Do not rotate against a degraded Velvet.
[ ] Open a FreeScout ticket for this rotation. You will need the ticket ID at the revoke confirmation gate. Format: ROT-YYYY-MM-CRED_NAME (example: ROT-2026-05-HK_PLATFORM_FULL).
[ ] Confirm the feature flag is on:
curl -sf https://raxx-velvet-prod.herokuapp.com/flags | python3 -m json.tool | grep velvet_v2_rotation
Expected: "velvet_v2_rotation": true. If false, rotation endpoints return 503 and you must use the manual vendor SOP.
[ ] Confirm you have the correct environment — look at the console environment banner. Red = prod, purple = staging. Do not rotate prod credentials against the staging Velvet app.
[ ] Confirm the credential is listed in the manifest — inspect docs/architecture/velvet/subscription-manifest.yml or call:
curl -sf https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/subscribers
You should see the expected consumer list. If the list is empty or the endpoint returns 404, the credential is not registered.
[ ] Confirm no earlier rotation for this credential is still unresolved:
curl -sf "https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/rotations?limit=5"
If the most recent job is in distribute_partial or revoke_failed, resolve it before starting a new job. Two overlapping rotation jobs for the same credential are not supported.
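The three machine-checkable items in this checklist (health, feature flag, prior job state) can be combined into one go/no-go check. This is a sketch under stated assumptions: `/flags` is assumed to return a flat JSON object, the rotations endpoint is assumed to return a newest-first list of objects with a `status` field, and all function names are hypothetical.

```python
import json

# Statuses the checklist says must be resolved before starting a new job.
BLOCKING_STATUSES = {"distribute_partial", "revoke_failed"}


def rotation_enabled(flags_body: str) -> bool:
    """True only if the flag is present and exactly JSON true."""
    flags = json.loads(flags_body)
    return flags.get("velvet_v2_rotation") is True


def latest_job_blocks(rotations: list) -> bool:
    """Assumes rotations[0] is the most recent job."""
    return bool(rotations) and rotations[0].get("status") in BLOCKING_STATUSES


def preflight_ok(health_bodies: list, flags_body: str, rotations: list) -> bool:
    """health_bodies holds the raw /healthz response bodies for each app."""
    healthy = all(json.loads(b).get("status") == "ok" for b in health_bodies)
    return healthy and rotation_enabled(flags_body) and not latest_job_blocks(rotations)
```

Parsing the JSON rather than grepping it avoids a false pass on output such as `"velvet_v2_rotation": false`, which still matches the grep pattern.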
https://raxx-console-prod.herokuapp.com/security/secrets

All endpoints require a rotate-scoped service token in the Authorization: Bearer <token> header.
Step 1 — Create the job:
POST https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/rotate
Content-Type: application/json
{
"flow_type": "operational",
"idempotency_key": "<uuid-v4>",
"force_revoke": false
}
Response (202):
{ "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "status": "init" }
Step 2 — Run Stage 1 (Verify):
POST https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/rotations/<job_id>/stage
Content-Type: application/json
{ "action": "verify" }
Step 3 — Proceed to Mint + Distribute:
POST .../rotations/<job_id>/stage
{ "action": "proceed_mint" }
Step 4 — Proceed to Revoke (after validating all consumers):
POST .../rotations/<job_id>/stage
{ "action": "proceed_revoke" }
Polling for status:
GET https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/rotations/<job_id>
SSE stream (real-time status):
GET https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/rotations/<job_id>/stream
Accept: text/event-stream
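For scripted use, the create and stage calls above could be assembled as follows. This is a sketch, not an official client: the helper names are hypothetical, the requests are only built (not sent), and the token is assumed to be supplied by the caller.

```python
import json
import urllib.request
import uuid

BASE = "https://raxx-velvet-prod.herokuapp.com"
STAGE_ACTIONS = {"verify", "proceed_mint", "proceed_revoke"}


def build_request(method, path, token, body=None):
    """Build (but do not send) an authenticated request to Velvet."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(BASE + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def create_job_request(credential, token):
    """Step 1: create an operational rotation job with a fresh UUIDv4 key."""
    body = {
        "flow_type": "operational",
        "idempotency_key": str(uuid.uuid4()),
        "force_revoke": False,
    }
    return build_request("POST", f"/tokens/{credential}/rotate", token, body)


def stage_request(credential, job_id, action, token):
    """Steps 2 to 4: advance the job by one stage action."""
    if action not in STAGE_ACTIONS:
        raise ValueError(f"unknown stage action: {action}")
    path = f"/tokens/{credential}/rotations/{job_id}/stage"
    return build_request("POST", path, token, {"action": action})
```

A request built this way can be sent with `urllib.request.urlopen(req)`. If a create call is retried, reusing the same request object (and therefore the same idempotency key) is presumably what the key is for, though the design doc should be checked for the exact semantics.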
The Stage Wizard shows live status via SSE. Each consumer row updates in real time. Row indicators: spinner or amber = in progress; green check = succeeded; red X = failed.
Poll GET /tokens/{name}/rotations/{job_id} every 5 seconds. The response includes:
{
"job_id": "...",
"status": "distributing",
"consumers": [
{ "consumer_id": "heroku-config-console-prod", "distribute_status": "succeeded" },
{ "consumer_id": "heroku-config-api-prod", "distribute_status": "in_progress" },
{ "consumer_id": "github-actions-heroku-api-key", "distribute_status": "pending" }
],
"created_at": "2026-05-04T06:00:00Z",
"updated_at": "2026-05-04T06:00:42Z"
}
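A polling loop over this response might look like the sketch below. The `fetch` callable is a hypothetical stand-in for the authenticated GET; `summarize` and `poll` are illustrative names, and the 60-poll cap is an arbitrary assumption.

```python
import time
from collections import Counter


def summarize(job: dict) -> Counter:
    """Count consumers per distribute_status in a job status response."""
    return Counter(c["distribute_status"] for c in job.get("consumers", []))


def poll(fetch, done, interval=5.0, max_polls=60, sleep=time.sleep):
    """Call fetch() every `interval` seconds until done(job) is true.

    fetch: zero-argument callable returning the parsed job status dict.
    done:  predicate deciding when to stop (for example, status changed).
    """
    for attempt in range(max_polls):
        job = fetch()
        if done(job):
            return job
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"job did not settle after {max_polls} polls")
```

For the sample response above, `summarize` reports one consumer each in `succeeded`, `in_progress`, and `pending`.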
To follow a single job in the Velvet logs:
heroku logs --tail --app raxx-velvet-prod | grep "job_id=<your-job-id>"
Every rotation_jobs row progresses through this state machine. The status column tells you exactly where a job is.
| Status | Meaning | Can advance | Operator action required |
|---|---|---|---|
| init | Job created; nothing touched | Yes | Click "Verify" |
| verifying | Velvet probing the vendor with the current token | Automatic | Wait |
| verify_failed | Auth probe failed; current token may be invalid | Retry or abort | See section 6 |
| verified | Probe confirmed; operator gate before mint | Yes | Click "Proceed to mint" |
| minting | Velvet calling vendor to mint new token | Automatic | Wait |
| mint_failed | Vendor mint API returned error | Abort only | Old token still valid |
| minted | New token in hand; not yet distributed | Yes (automatic fan-out) | Wait |
| distributing | Fan-out to registered consumers in progress | Automatic | Wait |
| distribute_partial | Some consumers failed; others succeeded | Retry or abort | Retry failed rows or section 6 |
| distribute_failed | All consumers failed | Abort | New token minted but not distributed; see abort table |
| distributed | All consumers received new token | Automatic validation | Wait |
| validating | Healthchecks running on all consumers | Automatic | Manual-confirm rows if needed |
| validate_partial | Some healthchecks failed | Retry or abort | Retry failed rows |
| validate_failed | All healthchecks failed | Abort | Investigate consumer reachability |
| validated | All consumers healthy with new token | Yes | Type-to-confirm + FreeScout ID, then click Revoke |
| revoking | Velvet calling vendor to revoke old token | Automatic | Wait |
| revoke_failed | Vendor revoke API returned error | Retry or mark manual | See section 6 |
| done | Rotation complete; old token revoked | Terminal | None |
| aborted | Operator or system aborted | Terminal | Check residual state (section 6) |
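For tooling that reacts to job status, the operational-flow table can be collapsed into four groups. The grouping below is one reading of the table, not Velvet's internal representation, and the names are hypothetical.

```python
# Statuses where Velvet advances on its own; the operator waits.
AUTOMATIC = {
    "verifying", "minting", "minted", "distributing",
    "distributed", "validating", "revoking",
}

# Statuses that wait for an explicit operator click, with the action to take.
OPERATOR_GATES = {
    "init": 'Click "Verify"',
    "verified": 'Click "Proceed to mint"',
    "validated": "Type-to-confirm + FreeScout ID, then click Revoke",
}

# Statuses that need retry, abort, or manual recovery.
NEEDS_RECOVERY = {
    "verify_failed", "mint_failed", "distribute_partial", "distribute_failed",
    "validate_partial", "validate_failed", "revoke_failed",
}

TERMINAL = {"done", "aborted"}


def classify(status: str) -> str:
    """Map an operational-flow status to wait / operator_gate / recover / terminal."""
    if status in AUTOMATIC:
        return "wait"
    if status in OPERATOR_GATES:
        return "operator_gate"
    if status in NEEDS_RECOVERY:
        return "recover"
    if status in TERMINAL:
        return "terminal"
    raise ValueError(f"unknown operational status: {status}")
```

Raising on unknown values keeps revocation-flow statuses (the `rev_` family) from being silently treated as operational ones.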
Revocation flow statuses:
| Status | Meaning |
|---|---|
| rev_init | Revocation job created |
| rev_revoking | Vendor revoke call in flight |
| rev_revoke_failed | Vendor rejected the revoke call |
| rev_revoked | Vendor confirmed revocation; validating consumers |
| rev_validating | Healthchecks running (expecting 401 from each consumer) |
| rev_leaked | One or more consumers returned non-401 after revocation (SEV1) |
| rev_done | All consumers confirmed locked out |
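The rule separating `rev_done` from `rev_leaked` is simple enough to state in code. This sketch assumes the healthcheck results are available as a mapping from consumer ID to HTTP status code; the function name and input shape are hypothetical.

```python
def classify_revocation(probe_codes: dict) -> tuple:
    """Return (status, leaking_consumer_ids) from post-revocation healthchecks.

    Every consumer is expected to answer 401 once the old token is revoked.
    Any other code means that consumer may still accept the revoked token.
    """
    leaking = sorted(cid for cid, code in probe_codes.items() if code != 401)
    return ("rev_leaked" if leaking else "rev_done", leaking)
```

Note that a 5xx or timeout-style code also counts as non-401 here, which errs on the side of raising the SEV1 rather than missing a leak.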
Definition of stuck: A job has been in the same status for more than 5 minutes without a state change, OR a job has halted in a state (distribute_partial, revoke_failed, aborted) that requires operator action.
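The definition can be checked mechanically against a job's `status` and `updated_at`. In this sketch the exclusion of finished jobs (`done`, `rev_done`) is an added assumption, since a completed job would otherwise count as stuck after five minutes; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

STUCK_AFTER = timedelta(minutes=5)
NEEDS_OPERATOR = {"distribute_partial", "revoke_failed", "aborted"}
FINISHED = {"done", "rev_done"}  # assumption: completed jobs are never "stuck"


def parse_ts(value: str) -> datetime:
    """Parse timestamps such as 2026-05-04T06:00:42Z into aware datetimes."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_stuck(status: str, last_change: datetime, now: datetime = None) -> bool:
    """Apply the two-part definition: operator-needed state, or no change in 5 min."""
    if status in FINISHED:
        return False
    if status in NEEDS_OPERATOR:
        return True
    now = now or datetime.now(timezone.utc)
    return (now - last_change) > STUCK_AFTER
```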
verifying

The auth probe is timing out or being rate-limited.

- Check the logs: heroku logs --app raxx-velvet-prod | grep job_id=<id>
- Look for ConnectionError, Timeout, or an HTTP status code.

minting

- Check the logs for "mint failed" entries.

distribute_partial

Some consumers received the new token; others did not. The old token is still valid.
Recovery options:
Option A (preferred): Retry failed rows in the console. Click "Retry failed" for each red row. Velvet will re-attempt the PATCH/INFISICAL_WRITE for those consumers only.
Option B (if retries keep failing): Identify the failing consumer(s) by their consumer_id in the job status. Manually push the new token to that consumer using the vendor's own interface. Once done, click "Manual confirm" in the console to mark that row as succeeded. After all rows are green, advance to Stage 3.
Option C (if you need to abort): Click Abort. The new token is now minted but distributed only to some consumers. You must manually delete the new token from vault and re-sync the affected consumers to the old token. The console shows the residual consumer list. File a FreeScout ticket and follow per-vendor SOP in docs/ops/runbooks/rotation/.
validate_partial or validate_failed

Healthchecks failed on one or more consumers after the new token was distributed.

- Find the failing consumers: look for validate_status: failed in the job status response.
- Check the consumer app with heroku ps --app <app-name>. If the dyno is crashed, that's your answer.
- For consumers with healthcheck_endpoint: null in the manifest, a "Manual confirm" button appears. Verify the consumer manually, then click confirm to mark it as validated.

revoke_failed

The new token is distributed and validated. The old token has not been revoked.

- Note the old_auth_id or equivalent from the job metadata (visible in the console audit summary and in Velvet logs).
- Revoke the old token manually at the vendor, then click "Mark manually revoked" in the console. The job advances to done.

Aborted from minted or distribute_partial

The new token exists in vault but the old token is still valid. Both tokens are now live simultaneously.
Cleanup required:
| Aborted from | Action |
|---|---|
| minted (new token in vault, not distributed) | Delete the new token from the vault path. The old token remains the active credential. File a FreeScout ticket documenting the orphaned token. |
| distribute_partial | Document which consumers have the new token and which have the old (check the rotation_job_consumers rows). Manually sync all consumers back to the old token. Then delete the new token from vault. |
| validated | Distribution and validation succeeded; only revocation is pending. You may manually revoke the old token via the vendor dashboard, then use "Mark manually revoked" in the console. |
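For an abort from `distribute_partial`, the first cleanup step is working out which consumers hold which token. The sketch below assumes the `rotation_job_consumers` rows expose the same `consumer_id` and `distribute_status` fields as the job status response, and that a non-succeeded write left the old token in place; both are assumptions to verify per consumer.

```python
def rollback_plan(consumers: list) -> dict:
    """Partition consumers by which token they are presumed to hold."""
    plan = {"sync_back_to_old": [], "verify_manually": [], "still_on_old": []}
    for row in consumers:
        status = row.get("distribute_status")
        if status == "succeeded":
            # The new token was written; this consumer needs the old one restored.
            plan["sync_back_to_old"].append(row["consumer_id"])
        elif status == "in_progress":
            # The write may or may not have landed; check by hand.
            plan["verify_manually"].append(row["consumer_id"])
        else:
            # pending or failed: assumed to still hold the old token.
            plan["still_on_old"].append(row["consumer_id"])
    return plan
```

Only after every entry in the first two lists has been handled is it safe to delete the new token from vault.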
Velvet does not support one-click rollback. Once the new token has been distributed and the old token revoked, there is no automated path back.
What is reversible:
- Before proceed_revoke: The old token is still valid. Abort the job. Manually roll back any consumers that received the new token to the old token. Delete the new token from vault.
- After done: The old token is revoked. Re-rotation is required: start a new operational rotation job to mint a fresh token.

What is NOT reversible:

- Revocation of the old token. Once the vendor has revoked it, the only way forward is a new rotation.
Velvet's own credentials (INFISICAL_CLIENT_SECRET, INFISICAL_CLIENT_ID, HK_VELVET_BOOTSTRAP) are stored as Heroku config vars, not in vault, to break the bootstrap circularity. Velvet cannot rotate them itself.
INFISICAL_CLIENT_SECRET

Set the new value on both apps:
heroku config:set INFISICAL_CLIENT_SECRET=<new_value> --app raxx-velvet-prod >/dev/null 2>&1
heroku config:set INFISICAL_CLIENT_SECRET=<new_value> --app raxx-velvet-staging >/dev/null 2>&1
Note: always redirect to /dev/null 2>&1 — heroku config:set echoes config vars to stdout by default (feedback: heroku_config_set_echoes_secrets).
Confirm /healthz returns 200 on both apps.

HK_VELVET_BOOTSTRAP

This token is used by Velvet to authenticate its PATCH calls to Heroku config vars on behalf of consumer updates.
heroku config:set HK_VELVET_BOOTSTRAP=<new_token> --app raxx-velvet-prod >/dev/null 2>&1
heroku config:set HK_VELVET_BOOTSTRAP=<new_token> --app raxx-velvet-staging >/dev/null 2>&1
Confirm /healthz on both apps.
Revoke the old authorization:
heroku authorizations:revoke <old-auth-id>
Update HK_VELVET_BOOTSTRAP__AUTH_ID in Infisical with the new authorization UUID, so the next rotation can find it.

Postmark (PM_SERVER_MAIL)

| Failure | Meaning | Action |
|---|---|---|
| verify_failed with HTTP 401 | Postmark token already invalid | Rotate manually: generate new token in Postmark dashboard, enter new value via Velvet OPERATOR_MANUAL or directly in vault |
| distribute_partial on infisical-postmark-prod | Infisical write failed | Check INFISICAL_CLIENT_ID/INFISICAL_CLIENT_SECRET Heroku config vars; retry |
| validate_failed HTTP 401 | New token not yet active at Postmark (rare propagation delay) | Wait 30 seconds; retry validation |
| Revoke not automated | Postmark does not expose a token-delete API | Operator must manually delete the old server token in the Postmark dashboard; click "Mark manually revoked" |
Heroku (HK_PLATFORM_FULL)

| Failure | Meaning | Action |
|---|---|---|
| verify_failed with "old token invalid" | HEROKU_PLATFORM_API_TOKEN in Velvet config vars has drifted | Follow docs/ops/runbooks/heroku-api-key-drift-recovery.md |
| distribute_partial: one Heroku app | PATCH to that app returned non-200 | Check if the app exists: heroku apps:info --app <app-name>. If the app was deleted, remove it from the manifest; mark consumer row as skipped |
| revoke_failed with "revoke_pending" | Old auth DELETE failed after distribute succeeded | Note old_auth_id from logs; manually revoke via heroku authorizations:revoke <id>; mark manually revoked |
| distribute_partial: github-actions-heroku-api-key | GitHub Actions secret PUT failed | Check GH_APP_OPS_BOT token in vault; verify repo name is correct in manifest |
Cloudflare (CF_DNS_EDIT_RAXX_APP, others)

| Failure | Meaning | Action |
|---|---|---|
| Consumer active: false in manifest | CF adapter pending OQ7 resolution | Rotate manually per docs/ops/runbooks/rotation/cloudflare-user-api-token.md |
| OPERATOR_MANUAL flow: operator entered wrong value | New token does not validate at CF | Re-enter the correct token value; Velvet will re-attempt vault write |
Note: scripts/ops/probe_cf_token_perms.py reads Cloudflare token permissions directly from Infisical. It does not go through Velvet. This is intentional — it is a read-only diagnostic tool and has not been migrated to the Velvet bus.
AWS (AWS_ACCESS_KEY_ID, password-class credentials)

| Failure | Meaning | Action |
|---|---|---|
| distribute_partial: SSM write 403 | Velvet's IAM role lacks ssm:PutParameter on the target path | Verify the IAM policy attached to the Velvet Heroku dyno's assumed role covers /raxx/{env}/{vendor}/{name} |
| SSM path not found (404 on read) | SSM path does not exist yet | First rotation creates the path; if the path was deleted externally, it will be re-created by the adapter |
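When checking an IAM policy against the `/raxx/{env}/{vendor}/{name}` template, it helps to expand the template for the credential in question. A minimal sketch: the function name is hypothetical, the `aws` vendor slug in the usage note is illustrative only, and the character check is a conservative assumption about what the segments may contain.

```python
import re

VALID_ENVS = {"prod", "staging"}
_SEGMENT = re.compile(r"^[A-Za-z0-9_.-]+$")  # no slashes inside a segment


def ssm_path(env: str, vendor: str, name: str) -> str:
    """Expand /raxx/{env}/{vendor}/{name} after validating each part."""
    if env not in VALID_ENVS:
        raise ValueError(f"unknown env: {env}")
    for segment in (vendor, name):
        if not _SEGMENT.match(segment):
            raise ValueError(f"invalid path segment: {segment!r}")
    return f"/raxx/{env}/{vendor}/{name}"
```

For example, `ssm_path("prod", "aws", "AWS_ACCESS_KEY_ID")` yields `/raxx/prod/aws/AWS_ACCESS_KEY_ID`, which is the path the `ssm:PutParameter` grant would need to cover.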
rev_leaked response

If a revocation job reaches rev_leaked, one or more consumers returned a non-401 response after the old token was confirmed revoked at the vendor. This means at least one consumer still has a copy of the revoked token and may be accepting it.

Immediate steps:

- An alert goes to SL_BOT_NOTIFY within 30 seconds of the flag being set. The message includes the job_id, credential_name, and the list of leaking consumer_ids.
- After the leaking consumers are locked out, the job moves from rev_leaked to rev_done.

Root causes of rev_leaked:
APP_ENV on each Heroku dyno controls which Infisical environment and SSM path prefix is used.
| App | APP_ENV | Infisical env slug | SSM path prefix |
|---|---|---|---|
| raxx-velvet-prod | prod | prod | /raxx/prod/ |
| raxx-velvet-staging | staging | staging | /raxx/staging/ |
The subscription manifest uses env: prod and env: staging per consumer row. A rotation job on raxx-velvet-prod only fans out to consumers with env: prod.
The Heroku app consumer rows for staging config vars (raxx-console-staging, raxx-api-staging) are registered with env: prod in the manifest — this is intentional. The staging apps' config vars hold the same credential (the Heroku platform key), which is a single credential shared across environments.
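The fan-out rule can be illustrated with a small filter over manifest rows. The row shape used here (`consumer_id`, `env`, `active`) is an assumption about the manifest, and `heroku-config-console-staging` is a hypothetical ID for one of the staging app rows described above.

```python
def fanout_targets(consumers: list, app_env: str) -> list:
    """Return consumer IDs a job on an app with this APP_ENV would fan out to.

    Matching is on the manifest's env field, not on the consumer's name,
    so a staging app row registered with env: prod is included for prod.
    Rows with active: false are skipped (assumed default: active).
    """
    return [
        row["consumer_id"]
        for row in consumers
        if row.get("env") == app_env and row.get("active", True)
    ]
```

With a staging app row registered as `env: prod`, `fanout_targets(rows, "prod")` includes it and `fanout_targets(rows, "staging")` does not, which matches the intentional registration described in the paragraph above.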
| Mistake | Symptom | Fix |
|---|---|---|
| Starting a prod rotation against raxx-velvet-staging | Job fans out to staging consumers only; prod consumers never receive the new token | Abort the job. Re-run against raxx-velvet-prod. |
| Forgetting to open a FreeScout ticket before rotating | Cannot enter ticket ID at Stage 3 revoke gate | Open the ticket now. The gate enforces non-empty input but does not validate the ticket exists. |
| Clicking "Abort" from validated thinking it rolls everything back | New token stays in vault and distributed; old token stays live | See section 6f: abort from validated requires manual revocation of the old token only. |
| Retrying a revoke_failed job with a different auth token | Second revoke attempt uses stale auth | Ensure HK_VELVET_BOOTSTRAP or the relevant auth token in vault is current before retrying. |
| Two operators starting rotations for the same credential simultaneously | Second job's verify step returns "active rotation already in progress" | Only one operational rotation per credential can be in flight at a time. The first job must reach done or aborted before the second can start. |
| Running heroku config:set without redirecting stdout | Secret value printed to terminal and shell history | Always use heroku config:set VAR=value >/dev/null 2>&1 |
| Checking vault for the new token value after rotation completes | Token value is not available via Velvet after the job reaches done | Read from Infisical directly using the machine identity; Velvet does not store the token value after rotation. |
Terminal events (job done, aborted, rev_leaked) trigger a Slack DM to the operator's channel.
Bot channel for automated alerts: SL_BOT_NOTIFY (configured in Velvet Heroku config vars).
Operator DM for walk-away pings: D0AJ7K184TV (Kristerpher's DM channel).
rev_leaked alerts are additionally sent within 30 seconds to the ops alert inbox (ops@raxx.app).