Velvet Operator Runbook
Last verified against: Velvet v2 (2026-05-29 UTC)
Last incident: 2026-05-29 — Velvet prod DEGRADED 463h; three root causes: (1) /health alias missing from stale slug (fixed by deploying v11 via deploy-velvet.yml), (2) velvet.raxx.app had no DNS CNAME in Cloudflare (NXDOMAIN, fixed 2026-05-29 15:00 UTC), (3) Heroku ACM was disabled (enabled 2026-05-29 15:10 UTC). Console probe will return OPERATIONAL once ACM cert provisions (~5 min). See section 15.
Parent epic: #907
Design doc: docs/architecture/velvet/v2-rotation-flows.md
Handler-author guide: docs/architecture/velvet-handler-author-guide.md
Reading time: ~15 min
1. When to use Velvet vs. manual rotation
| Situation | Use Velvet | Use manual procedure |
|---|---|---|
| Scheduled rotation of a credential with registered subscribers | Yes | No |
| Emergency revocation after a suspected leak | Yes — revocation flow | No |
| Credential with active: false in the subscription manifest | No (fix the manifest first) | Proceed manually per vendor SOP |
| Velvet itself is down or unreachable | No — use vendor SOP directly | Yes |
| Velvet's own bootstrap credentials (INFISICAL_CLIENT_SECRET, HK_VELVET_BOOTSTRAP) | No (circular dependency) | Yes (section 8 below) |
| Vendor does not support programmatic token creation (e.g. CF User API tokens) | Operator-assisted Velvet (OPERATOR_MANUAL flow) | Parallel manual path |
| Feature flag velvet_v2_rotation is off | No (Velvet returns 503) | Yes (use vendor SOP) |
| Credential has no subscribers registered in the manifest | No | Yes — use per-credential SOP in docs/ops/runbooks/rotation/ |
2. Pre-flight checklist
Complete every item before triggering a rotation. A stalled pre-flight is cheaper than a stalled distribute.
- [ ] Check Velvet health. Both environments should return HTTP 200 with {"status": "ok"}:
curl -sf https://raxx-velvet-prod.herokuapp.com/healthz
curl -sf https://raxx-velvet-staging.herokuapp.com/healthz
If either returns non-200 or times out, stop. Do not rotate against a degraded Velvet.
- [ ] Open a FreeScout ticket for this rotation. You will need the ticket ID at the revoke confirmation gate. Format: ROT-YYYY-MM-CRED_NAME (example: ROT-2026-05-HK_PLATFORM_FULL).
- [ ] Confirm the feature flag is on:
curl -sf https://raxx-velvet-prod.herokuapp.com/flags | python3 -m json.tool | grep velvet_v2_rotation
Expected: "velvet_v2_rotation": true. If false, rotation endpoints return 503 and you must use the manual vendor SOP.
- [ ] Confirm you have the correct environment: look at the console environment banner. Red = prod, purple = staging. Do not rotate prod credentials against the staging Velvet app.
- [ ] Confirm the credential is listed in the manifest. Inspect docs/architecture/velvet/subscription-manifest.yml or call:
curl -sf https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/subscribers
You should see the expected consumer list. If the list is empty or the endpoint returns 404, the credential is not registered.
- [ ] Check the current job history for recent failures on this credential:
curl -sf "https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/rotations?limit=5"
If the most recent job is in distribute_partial or revoke_failed, resolve it before starting a new job. Two overlapping rotation jobs for the same credential are not supported.
- [ ] Announce in #ops-internal (Slack) that a rotation is starting: credential name, job type (operational / revocation / testing), ticket ID.
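The automatable items in the checklist above can be folded into a single go/no-go decision. The sketch below is illustrative only: it takes results that have already been fetched and returns a list of blockers. The response shapes are assumptions (a flat JSON object from /flags, a list from /subscribers, and a newest-first list of jobs with a status field from /rotations), and preflight_blockers is a hypothetical helper, not part of Velvet.

```python
from typing import Any

# Job states that section 2 says must be resolved before a new rotation starts.
BLOCKING_STATES = {"distribute_partial", "revoke_failed"}


def preflight_blockers(
    health: dict[str, int],
    flags: dict[str, Any],
    subscribers: list[Any],
    recent_jobs: list[dict[str, Any]],
) -> list[str]:
    """Return reasons not to rotate; an empty list means pre-flight passed."""
    blockers: list[str] = []
    for app, status_code in health.items():
        if status_code != 200:
            blockers.append(f"{app}: /healthz returned {status_code}")
    if flags.get("velvet_v2_rotation") is not True:
        blockers.append("feature flag velvet_v2_rotation is not on")
    if not subscribers:
        blockers.append("credential has no registered subscribers")
    # Assumption: the rotations endpoint lists the newest job first.
    if recent_jobs and recent_jobs[0].get("status") in BLOCKING_STATES:
        blockers.append(f"most recent job is in {recent_jobs[0]['status']}")
    return blockers
```

The FreeScout ticket, the environment banner check, and the Slack announcement remain manual steps; this only covers the items that have a machine-readable answer.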
3. Triggering a rotation
3a. Console UI (preferred)
- Navigate to the console: https://raxx-console-prod.herokuapp.com/security/secrets
- Locate the credential row in the Secrets table.
- Click Rotate — this opens the Stage Wizard modal.
- Follow the three-panel flow: Stage 1 (Verify) → Stage 2 (Mint + Distribute) → Stage 3 (Validate + Revoke).
- Each stage requires explicit operator action before advancing. You can abort at any stage.
3b. API (for scripted or emergency use)
All endpoints require a rotate-scoped service token in the Authorization: Bearer <token> header.
Step 1 — Create the job:
POST https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/rotate
Content-Type: application/json
{
"flow_type": "operational",
"idempotency_key": "<uuid-v4>",
"force_revoke": false
}
Response (202):
{ "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "status": "init" }
Step 2 — Run Stage 1 (Verify):
POST https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/rotations/<job_id>/stage
Content-Type: application/json
{ "action": "verify" }
Step 3 — Proceed to Mint + Distribute:
POST .../rotations/<job_id>/stage
{ "action": "proceed_mint" }
Step 4 — Proceed to Revoke (after validating all consumers):
POST .../rotations/<job_id>/stage
{ "action": "proceed_revoke" }
Polling for status:
GET https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/rotations/<job_id>
SSE stream (real-time status):
GET https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/rotations/<job_id>/stream
Accept: text/event-stream
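For scripted use, the requests above can be assembled with the Python standard library. This is a minimal sketch under stated assumptions: it only builds the request objects and does not send them, the helper names (build_create_job, build_stage_action) are hypothetical, and the base URL is the one used in the examples above.

```python
import json
import urllib.request
import uuid

BASE_URL = "https://raxx-velvet-prod.herokuapp.com"
STAGE_ACTIONS = ("verify", "proceed_mint", "proceed_revoke")


def _post(url: str, body: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_create_job(credential: str, token: str, flow_type: str = "operational") -> urllib.request.Request:
    """Step 1: create the rotation job with a fresh idempotency key."""
    body = {
        "flow_type": flow_type,
        "idempotency_key": str(uuid.uuid4()),
        "force_revoke": False,
    }
    return _post(f"{BASE_URL}/tokens/{credential}/rotate", body, token)


def build_stage_action(credential: str, job_id: str, action: str, token: str) -> urllib.request.Request:
    """Steps 2-4: advance the job through one of the documented stage actions."""
    if action not in STAGE_ACTIONS:
        raise ValueError(f"unknown stage action: {action}")
    url = f"{BASE_URL}/tokens/{credential}/rotations/{job_id}/stage"
    return _post(url, {"action": action}, token)
```

Each request can then be sent with urllib.request.urlopen(req). Keeping construction separate from sending makes it easy to review the exact URL and body before anything touches prod.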
4. Monitoring in-flight rotations
Console UI
The Stage Wizard shows live status via SSE. Each consumer row updates in real time: a spinner or amber marker means in progress, a green check means succeeded, and a red X means failed.
API polling
Poll GET /tokens/{name}/rotations/{job_id} every 5 seconds. The response includes:
{
"job_id": "...",
"status": "distributing",
"consumers": [
{ "consumer_id": "heroku-config-console-prod", "distribute_status": "succeeded" },
{ "consumer_id": "heroku-config-api-prod", "distribute_status": "in_progress" },
{ "consumer_id": "github-actions-heroku-api-key", "distribute_status": "pending" }
],
"created_at": "2026-05-04T06:00:00Z",
"updated_at": "2026-05-04T06:00:42Z"
}
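A polling loop over this response might look like the sketch below. It is an illustration, not Velvet client code: fetch is a caller-supplied function that returns the parsed JSON shown above, and the set of states treated as "still running" is an assumption taken from the states table in section 5.

```python
import time
from typing import Any, Callable

# States the table in section 5 marks as advancing on their own.
AUTOMATIC_STATES = {
    "verifying", "minting", "minted", "distributing",
    "distributed", "validating", "revoking",
}


def poll_job(
    fetch: Callable[[], dict[str, Any]],
    interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job leaves the automatic states, then return it."""
    for _ in range(max_polls):
        job = fetch()
        if job["status"] not in AUTOMATIC_STATES:
            return job
        sleep(interval)
    raise TimeoutError(f"job still in an automatic state after {max_polls} polls")


def unfinished_consumers(job: dict[str, Any]) -> list[str]:
    """List consumer_ids whose distribute_status is not yet 'succeeded'."""
    return [
        c["consumer_id"]
        for c in job.get("consumers", [])
        if c.get("distribute_status") != "succeeded"
    ]
```

With the defaults, the loop gives up after 10 minutes (120 polls at 5 seconds). A job in validating can wait on manual-confirm rows, so a timeout here is a prompt to look at the console rather than proof of failure.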
Heroku logs
heroku logs --tail --app raxx-velvet-prod | grep job_id=<your-job-id>
5. Job states reference
Every rotation_jobs row progresses through this state machine. The status column tells you exactly where a job is.
| Status | Meaning | Can advance | Operator action required |
|---|---|---|---|
| init | Job created; nothing touched | Yes | Click "Verify" |
| verifying | Velvet probing the vendor with the current token | Automatic | Wait |
| verify_failed | Auth probe failed; current token may be invalid | Retry or abort | See section 6 |
| verified | Probe confirmed; operator gate before mint | Yes | Click "Proceed to mint" |
| minting | Velvet calling vendor to mint new token | Automatic | Wait |
| mint_failed | Vendor mint API returned error | Abort only | Old token still valid |
| minted | New token in hand; not yet distributed | Yes (automatic fan-out) | Wait |
| distributing | Fan-out to registered consumers in progress | Automatic | Wait |
| distribute_partial | Some consumers failed; others succeeded | Retry or abort | Retry failed rows or section 6 |
| distribute_failed | All consumers failed | Abort | New token minted but not distributed; see abort table |
| distributed | All consumers received new token | Automatic validation | Wait |
| validating | Healthchecks running on all consumers | Automatic | Manual-confirm rows if needed |
| validate_partial | Some healthchecks failed | Retry or abort | Retry failed rows |
| validate_failed | All healthchecks failed | Abort | Investigate consumer reachability |
| validated | All consumers healthy with new token | Yes | Type-to-confirm + FreeScout ID, then click Revoke |
| revoking | Velvet calling vendor to revoke old token | Automatic | Wait |
| revoke_failed | Vendor revoke API returned error | Retry or mark manual | See section 6 |
| done | Rotation complete; old token revoked | Terminal | None |
| aborted | Operator or system aborted | Terminal | Check residual state (section 6) |
Revocation flow statuses:
| Status | Meaning |
|---|---|
| rev_init | Revocation job created |
| rev_revoking | Vendor revoke call in flight |
| rev_revoke_failed | Vendor rejected the revoke call |
| rev_revoked | Vendor confirmed revocation; validating consumers |
| rev_validating | Healthchecks running (expecting 401 from each consumer) |
| rev_leaked | One or more consumers returned non-401 after revocation (SEV1) |
| rev_done | All consumers confirmed locked out |
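For scripting against the operational flow, the happy path and its three operator gates can be written down as data. The sketch below is one possible encoding. It assumes each console button corresponds to the stage action of the same name from section 3b, and both helper names are hypothetical.

```python
from typing import Optional

# Happy-path order of an operational rotation, per the table above.
HAPPY_PATH = [
    "init", "verifying", "verified", "minting", "minted", "distributing",
    "distributed", "validating", "validated", "revoking", "done",
]

# Operator gates mapped to the stage actions from section 3b.
# Assumption: each console button corresponds to the API action listed here.
GATE_ACTIONS = {
    "init": "verify",
    "verified": "proceed_mint",
    "validated": "proceed_revoke",
}


def next_api_action(status: str) -> Optional[str]:
    """Return the stage action an operator would send, or None if not at a gate."""
    return GATE_ACTIONS.get(status)


def steps_remaining(status: str) -> int:
    """Count happy-path states left before 'done'; raises ValueError off-path."""
    return len(HAPPY_PATH) - 1 - HAPPY_PATH.index(status)
```

Failure states such as distribute_partial are deliberately absent from HAPPY_PATH, so steps_remaining raises for them and the caller is forced to handle recovery explicitly.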
6. Stuck job diagnosis and recovery
Definition of stuck: A job has been in the same status for more than 5 minutes without a state change, OR a job is in a state that requires operator action (distribute_partial, revoke_failed, aborted).
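The definition above translates into a small check. This is a sketch under two assumptions that the runbook does not state: that updated_at in the job status response reflects the last state change, and that successfully finished jobs (done, rev_done) should never be reported as stuck.

```python
from datetime import datetime, timedelta, timezone

STUCK_AFTER = timedelta(minutes=5)
# States the definition above lists as needing operator action.
NEEDS_OPERATOR = {"distribute_partial", "revoke_failed", "aborted"}
# Assumption: successfully finished jobs are never reported as stuck.
FINISHED = {"done", "rev_done"}


def parse_utc(stamp: str) -> datetime:
    """Parse timestamps such as '2026-05-04T06:00:42Z' into aware datetimes."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def is_stuck(status: str, updated_at: str, now: datetime) -> bool:
    """Apply the stuck definition to one job's status and last-update time."""
    if status in FINISHED:
        return False
    if status in NEEDS_OPERATOR:
        return True
    return now - parse_utc(updated_at) > STUCK_AFTER
```

Pass now as datetime.now(timezone.utc) so that both sides of the subtraction are timezone-aware.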
6a. Job stuck in verifying
The auth probe is timing out or being rate-limited.
- Check Velvet logs: heroku logs --app raxx-velvet-prod | grep job_id=<id>
- Look for ConnectionError, Timeout, or an HTTP status code.
- If the vendor API is rate-limiting: wait 2 minutes and click "Retry" in the console.
- If the vendor API is returning 401: the current token is already invalid. Stop and use the vendor's manual revocation + re-issue process. File a FreeScout incident ticket.
6b. Job stuck in minting
- Check Velvet logs for mint failed entries.
- If the vendor returned 401 on the mint call, the old token drifted between Verify and Mint. This is rare but possible if two rotation jobs ran simultaneously. Abort this job; the old token is still valid.
- If the vendor returned 5xx: retry once. If it fails again, wait 10 minutes (vendor-side transient issue) and retry.
6c. Job in distribute_partial
Some consumers received the new token; others did not. The old token is still valid.
Recovery options:
Option A (preferred): Retry failed rows in the console. Click "Retry failed" for each red row. Velvet will re-attempt the PATCH/INFISICAL_WRITE for those consumers only.
Option B (if retries keep failing): Identify the failing consumer(s) by their consumer_id in the job status. Manually push the new token to that consumer using the vendor's own interface. Once done, click "Manual confirm" in the console to mark that row as succeeded. After all rows are green, advance to Stage 3.
Option C (if you need to abort): Click Abort. The new token is now minted but distributed only to some consumers. You must manually delete the new token from vault and re-sync the affected consumers to the old token. The console shows the residual consumer list. File a FreeScout ticket and follow per-vendor SOP in docs/ops/runbooks/rotation/.
6d. Job in validate_partial or validate_failed
Healthchecks failed on one or more consumers after the new token was distributed.
- Check which consumers are showing validate_status: failed in the job status response.
- Confirm the consumer application has restarted and loaded the new config var. For Heroku apps, run heroku ps --app <app-name>; a crashed dyno explains the failure.
- Wait 60 seconds for dyno restart to complete, then click "Retry validation" in the console.
- If a consumer has healthcheck_endpoint: null in the manifest, a "Manual confirm" button appears. Verify the consumer manually, then click confirm to mark it as validated.
- If validation keeps failing: check whether the distribute step actually wrote the new token. Use the per-vendor SOP to verify the config var value was updated.
6e. Job in revoke_failed
The new token is distributed and validated. The old token has not been revoked.
- Note the old_auth_id or equivalent from the job metadata (visible in the console audit summary and in Velvet logs).
- Retry the revoke in the console ("Retry revoke" button).
- If the vendor returns 404 on the revoke call, the old token was already deleted outside Velvet. Click "Mark manually revoked" and enter the FreeScout ticket ID. Velvet will advance to done.
- If the vendor keeps returning errors: revoke the old token manually via the vendor dashboard or CLI. Then click "Mark manually revoked" with the FreeScout ticket ID.
6f. Job aborted from minted or distribute_partial
The new token exists in vault but the old token is still valid. Both tokens are now live simultaneously.
Cleanup required:
| Aborted from | Action |
|---|---|
| minted (new token in vault, not distributed) | Delete the new token from the vault path. The old token remains the active credential. File a FreeScout ticket documenting the orphaned token. |
| distribute_partial | Document which consumers have the new token and which have the old (check the rotation_job_consumers rows). Manually sync all consumers back to the old token. Then delete the new token from vault. |
| validated | Distribution and validation succeeded; only revocation is pending. You may manually revoke the old token via the vendor dashboard, then use "Mark manually revoked" in the console. |
7. Rollback
Velvet does not support one-click rollback. Once the new token has been distributed and the old token revoked, there is no automated path back.
What is reversible:
- Before proceed_revoke: The old token is still valid. Abort the job. Manually roll back any consumers that received the new token to the old token. Delete the new token from vault.
- After done: The old token is revoked. Re-rotation is required: start a new operational rotation job to mint a fresh token.
What is NOT reversible:
- Revocation of a Cloudflare User API token: CF does not support re-activating a revoked token. You must create a new token in the CF dashboard.
- Revocation of a Heroku OAuth authorization: The authorization ID is gone. A new OAuth authorization must be minted.
- Any token that the vendor marks as single-use after deletion.
8. Rotating Velvet's own bootstrap credentials (Invariant I7)
Velvet's own credentials (INFISICAL_CLIENT_SECRET, INFISICAL_CLIENT_ID, HK_VELVET_BOOTSTRAP) are stored as Heroku config vars, not in vault, to break the bootstrap circularity. Velvet cannot rotate them itself.
Rotating INFISICAL_CLIENT_SECRET
- In the Infisical dashboard, generate a new client secret for the Velvet machine identity.
- Set the new value on both Heroku apps:
heroku config:set INFISICAL_CLIENT_SECRET=<new_value> --app raxx-velvet-prod >/dev/null 2>&1
heroku config:set INFISICAL_CLIENT_SECRET=<new_value> --app raxx-velvet-staging >/dev/null 2>&1
Note: always redirect to /dev/null 2>&1 — heroku config:set echoes config vars to stdout by default (feedback: heroku_config_set_echoes_secrets).
- Verify Velvet restarts and /healthz returns 200 on both apps.
- Revoke the old client secret in the Infisical dashboard.
- Record the rotation in a FreeScout ticket.
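If the config:set step is scripted rather than typed, the same output suppression can be done from Python. The sketch below is a hedged illustration with hypothetical helper names: it builds the same heroku config:set command shown above and runs it with stdout and stderr discarded, which is the equivalent of the >/dev/null 2>&1 redirect.

```python
import subprocess


def config_set_argv(app: str, var: str, value: str) -> list[str]:
    """Build the argv for: heroku config:set VAR=value --app APP."""
    return ["heroku", "config:set", f"{var}={value}", "--app", app]


def run_quiet(argv: list[str]) -> int:
    """Run a command with stdout and stderr discarded; return its exit code."""
    result = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode
```

A non-zero return code means the set failed and the output that would explain why was discarded, so re-check with /healthz rather than re-running with output enabled. Reading the value with getpass.getpass() keeps it out of shell history, although it is still visible in the process argument list while the command runs, as it is with the shell form.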
Rotating HK_VELVET_BOOTSTRAP
This token is used by Velvet to authenticate its PATCH calls to Heroku config vars on behalf of consumer updates.
- Use the Heroku Platform API or dashboard to create a new OAuth authorization for the Velvet machine user.
- Set the new token:
heroku config:set HK_VELVET_BOOTSTRAP=<new_token> --app raxx-velvet-prod >/dev/null 2>&1
heroku config:set HK_VELVET_BOOTSTRAP=<new_token> --app raxx-velvet-staging >/dev/null 2>&1
- Verify /healthz on both apps.
- Revoke the old authorization via the Heroku dashboard or:
heroku authorizations:revoke <old-auth-id>
- Update the companion secret HK_VELVET_BOOTSTRAP__AUTH_ID in Infisical with the new authorization UUID, so the next rotation can find it.
9. Failure modes by adapter
Postmark (PM_SERVER_MAIL)
| Failure | Meaning | Action |
|---|---|---|
| verify_failed with HTTP 401 | Postmark token already invalid | Rotate manually: generate new token in Postmark dashboard, enter new value via Velvet OPERATOR_MANUAL or directly in vault |
| distribute_partial on infisical-postmark-prod | Infisical write failed | Check INFISICAL_CLIENT_ID/INFISICAL_CLIENT_SECRET Heroku config vars; retry |
| validate_failed HTTP 401 | New token not yet active at Postmark (rare propagation delay) | Wait 30 seconds; retry validation |
| Revoke not automated | Postmark does not expose a token-delete API | Operator must manually delete the old server token in the Postmark dashboard; click "Mark manually revoked" |
Heroku (HK_PLATFORM_FULL)
| Failure | Meaning | Action |
|---|---|---|
| verify_failed with "old token invalid" | HEROKU_PLATFORM_API_TOKEN in Velvet config vars is drifted | Follow docs/ops/runbooks/heroku-api-key-drift-recovery.md |
| distribute_partial (one Heroku app) | PATCH to that app returned non-200 | Check if the app exists: heroku apps:info --app <app-name>. If the app was deleted, remove it from the manifest; mark consumer row as skipped |
| revoke_failed with "revoke_pending" | Old auth DELETE failed after distribute succeeded | Note old_auth_id from logs; manually revoke via heroku authorizations:revoke <id>; mark manually revoked |
| distribute_partial (github-actions-heroku-api-key) | GitHub Actions secret PUT failed | Check GH_APP_OPS_BOT token in vault; verify repo name is correct in manifest |
Cloudflare (CF_DNS_EDIT_RAXX_APP, others)
| Failure | Meaning | Action |
|---|---|---|
| Consumer active: false in manifest | CF adapter pending OQ7 resolution | Rotate manually per docs/ops/runbooks/rotation/cloudflare-user-api-token.md |
| OPERATOR_MANUAL flow — operator entered wrong value | New token does not validate at CF | Re-enter the correct token value; Velvet will re-attempt vault write |
Note: scripts/ops/probe_cf_token_perms.py reads Cloudflare token permissions directly from Infisical. It does not go through Velvet. This is intentional — it is a read-only diagnostic tool and has not been migrated to the Velvet bus.
AWS SSM (AWS_ACCESS_KEY_ID, password-class credentials)
| Failure | Meaning | Action |
|---|---|---|
| distribute_partial (SSM write 403) | Velvet's IAM role lacks ssm:PutParameter on the target path | Verify the IAM policy attached to the Velvet Heroku dyno's assumed role covers /raxx/{env}/{vendor}/{name} |
| SSM path not found (404 on read) | SSM path does not exist yet | First rotation creates the path; if the path was deleted externally, it will be re-created by the adapter |
10. SEV1 — rev_leaked response
If a revocation job reaches rev_leaked, one or more consumers returned a non-401 response after the old token was confirmed revoked at the vendor. This means at least one consumer still has a copy of the revoked token and may be accepting it.
Immediate steps:
- You will have received a Slack DM on channel SL_BOT_NOTIFY within 30 seconds of the flag being set. The message includes the job_id, credential_name, and the list of leaking consumer_ids.
- Open the Velvet console: the leaked consumers are highlighted in red with an "Investigate" button.
- Click "Investigate"; this auto-creates a FreeScout ticket pre-filled with the consumer list.
- For each leaking consumer:
  a. Determine whether the consumer is still actively serving traffic.
  b. If yes: immediately disable or restart the consumer to force it to reload config.
  c. Verify the consumer is no longer accepting the revoked token by re-running the healthcheck manually.
- Once all consumers return 401, click "Confirm leak resolved" in the Velvet console. The job advances from rev_leaked to rev_done.
- If any consumer cannot be forced to reject the token (e.g., a caching layer with a long TTL), escalate to a security incident per the security response runbook.
Root causes of rev_leaked:
- Consumer cached the token in memory and has not been restarted since rotation.
- Consumer received the new token via distribute but reverted to an old value from a local config file.
- Consumer's healthcheck endpoint is cached or proxied and is not reflecting the real auth state.
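The leak criterion used in this section (any response other than 401 after revocation) is simple enough to script when re-running healthchecks by hand. The sketch below assumes the probe results have already been collected into a mapping of consumer_id to HTTP status code; both function names are hypothetical.

```python
def leaking_consumers(probe_codes: dict[str, int]) -> list[str]:
    """Return consumer_ids whose post-revocation probe was not HTTP 401."""
    return sorted(cid for cid, code in probe_codes.items() if code != 401)


def leak_resolved(probe_codes: dict[str, int]) -> bool:
    """True only when every probed consumer rejects the revoked token with 401."""
    return bool(probe_codes) and not leaking_consumers(probe_codes)
```

An empty result set is treated as unresolved on purpose: having no probe data is not evidence that consumers are locked out. A 5xx or timeout also counts as leaking here, which errs on the side of keeping the incident open.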
11. Staging vs. production
APP_ENV on each Heroku dyno controls which Infisical environment and SSM path prefix is used.
| App | APP_ENV | Infisical env slug | SSM path prefix |
|---|---|---|---|
| raxx-velvet-prod | prod | prod | /raxx/prod/ |
| raxx-velvet-staging | staging | staging | /raxx/staging/ |
The subscription manifest uses env: prod and env: staging per consumer row. A rotation job on raxx-velvet-prod only fans out to consumers with env: prod.
The Heroku app consumer rows for staging config vars (raxx-console-staging, raxx-api-staging) are registered with env: prod in the manifest — this is intentional. The staging apps' config vars hold the same credential (the Heroku platform key), which is a single credential shared across environments.
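The fan-out rule can be illustrated with a small filter. This is a sketch, not the manifest loader: the row layout (consumer_id, credential, env, active keys) is an assumption about how subscription-manifest.yml parses, the consumer IDs in the usage example are made up, and skipping rows with active: false is inferred from section 1 rather than confirmed.

```python
from typing import Any


def fan_out_targets(rows: list[dict[str, Any]], credential: str, velvet_env: str) -> list[str]:
    """Select consumer_ids that a job on the given Velvet app would update."""
    return [
        row["consumer_id"]
        for row in rows
        if row["credential"] == credential
        and row["env"] == velvet_env
        # Assumption: rows marked active: false are skipped by the fan-out.
        and row.get("active", True)
    ]
```

Note that the filter keys on the env field of the row, not on the name of the consumer app. That is how a staging app's config var can still be updated by a prod rotation when its row is registered with env: prod.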
12. Common operator mistakes and fixes
| Mistake | Symptom | Fix |
|---|---|---|
| Starting a prod rotation against raxx-velvet-staging | Job fans out to staging consumers only; prod consumers never receive the new token | Abort the job. Re-run against raxx-velvet-prod. |
| Forgetting to open a FreeScout ticket before rotating | Cannot enter ticket ID at Stage 3 revoke gate | Open the ticket now. The gate enforces non-empty input but does not validate the ticket exists. |
| Clicking "Abort" from validated thinking it rolls everything back | New token stays in vault and distributed; old token stays live | See section 6f: abort from validated requires manual revocation of the old token only. |
| Retrying a revoke_failed job with a different auth token | Second revoke attempt uses stale auth | Ensure HK_VELVET_BOOTSTRAP or the relevant auth token in vault is current before retrying. |
| Two operators starting rotations for the same credential simultaneously | Second job's verify step returns "active rotation already in progress" | Only one operational rotation per credential can be in flight at a time. The first job must reach done or aborted before the second can start. |
| Running heroku config:set without redirecting stdout | Secret value printed to terminal and shell history | Always use heroku config:set VAR=value >/dev/null 2>&1 |
| Checking vault for the new token value after rotation completes | Token value is not available via Velvet after the job reaches done | Read from Infisical directly using the machine identity; Velvet does not store the token value after rotation. |
13. Health endpoint returns 404 or NXDOMAIN (DEGRADED on console)
Symptom: Console shows velvet-prod DEGRADED.
Note: Heroku assigned this app a randomized .herokuapp.com hostname when it was created.
The old-style raxx-velvet-prod.herokuapp.com returns "No such app", so always use the randomized
hostname from heroku domains --app raxx-velvet-prod as the fallback probe URL:
raxx-velvet-prod-b0cea70d1b98.herokuapp.com
13a. Stale slug (endpoint missing)
Cause: A code change added /health or /healthz to velvet/app.py but prod was not redeployed.
Diagnosis:
# Check deployed slug SHA vs. current main
heroku releases --app raxx-velvet-prod --num 3
# Probe via the correct Heroku hostname (get it from: heroku domains --app raxx-velvet-prod)
curl -sf https://raxx-velvet-prod-b0cea70d1b98.herokuapp.com/healthz
curl -I https://raxx-velvet-prod-b0cea70d1b98.herokuapp.com/health
Fix: Trigger a production deploy of the current main branch via the GH Actions workflow:
- Navigate to: Actions → "Deploy Velvet" → "Run workflow"
- Set environment = production, ref = main
- The workflow runs a subtree-split of velvet/ and pushes to Heroku.
- The smoke check in the workflow polls /healthz.
# Verify after deploy (use hostname from: heroku domains --app raxx-velvet-prod)
curl -sf https://raxx-velvet-prod-b0cea70d1b98.herokuapp.com/health # expect 200
curl -sf https://raxx-velvet-prod-b0cea70d1b98.herokuapp.com/healthz # expect 200
Resolved 2026-05-29 14:41 UTC: v11 (slug efc8f6e1) deployed; 463h DEGRADED ended.
velvet.raxx.app/healthz probe will return OPERATIONAL once DNS CNAME and ACM cert are in place (section 15).
13b. Custom domain not in DNS (NXDOMAIN)
Cause: velvet.raxx.app had no Cloudflare CNAME record. The console probe targets
https://velvet.raxx.app/healthz per config/status-surfaces.yaml, which returned NXDOMAIN
regardless of dyno health.
Diagnosis:
dig velvet.raxx.app
# Expected: CNAME → closed-woodpecker-mn8ijxw0n3ek8lbi0t0jjfkr.herokudns.com
# Symptom: NXDOMAIN
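The same resolution check can be scripted when dig is not available. The sketch below uses the standard library resolver and is only a rough stand-in: it reports whether the name resolves at all and cannot tell NXDOMAIN apart from other lookup failures. The resolver parameter exists so the function can be exercised without network access.

```python
import socket
from typing import Callable


def resolves(hostname: str, resolver: Callable[..., list] = socket.getaddrinfo) -> bool:
    """Return True if the hostname resolves, False on a lookup failure."""
    try:
        # Port 443 because the console probe targets HTTPS.
        return len(resolver(hostname, 443)) > 0
    except socket.gaierror:
        return False
```

For example, resolves("velvet.raxx.app") returning False while the dyno answers on its herokuapp.com hostname points at DNS rather than at the application.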
Fix: Three steps required (all completed 2026-05-29):
- Add the custom domain to Heroku:
heroku domains:add velvet.raxx.app --app raxx-velvet-prod
heroku domains:wait velvet.raxx.app --app raxx-velvet-prod
# DNS target: closed-woodpecker-mn8ijxw0n3ek8lbi0t0jjfkr.herokudns.com
- Enable Heroku ACM (required for TLS; disabled by default on new apps):
heroku certs:auto:enable --app raxx-velvet-prod
heroku certs:auto --app raxx-velvet-prod
# Wait for status: OK (~5 min after CNAME propagates)
Without this step, HTTPS returns CF error 525 (TLS handshake failure) even after CNAME is live.
- Add CNAME record in Cloudflare (done 2026-05-29 15:00 UTC, record ID 1db2a5e38fa5705cb61fe9c8682320e5):
# Using CLOUDFLARE_EDIT_DNS token from vault:
curl -sS -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_EDIT_DNS" \
  -H "Content-Type: application/json" \
  "https://api.cloudflare.com/client/v4/zones/f12dbb5cac57d5591a5058874498a6d1/dns_records" \
  -d '{"type":"CNAME","name":"velvet","content":"closed-woodpecker-mn8ijxw0n3ek8lbi0t0jjfkr.herokudns.com","proxied":true}'
After DNS propagates (~1–5 min):
curl -sf https://velvet.raxx.app/healthz
# Expect: {"status": "ok", "service": "velvet"}
14. deploy-velvet.yml smoke check false failures
The workflow's STAGING_URL and PROD_URL env vars reference the old <appname>.herokuapp.com
hostname format (e.g., raxx-velvet-prod.herokuapp.com). Heroku now provisions randomized hostnames
(e.g., raxx-velvet-prod-b0cea70d1b98.herokuapp.com). The old hostnames return "No such app" (404)
from the Heroku router, which causes the post-deploy /healthz smoke check to fail even when the
deploy and dyno are healthy.
Fixed 2026-05-29: PR #3088 updated both URLs:
- STAGING_URL → https://raxx-velvet-staging-609f3019292a.herokuapp.com
- PROD_URL → https://velvet.raxx.app (custom domain — stable across Heroku rebuilds)
15. velvet.raxx.app DNS — completed 2026-05-29
Status (2026-05-29 15:00 UTC): CNAME created in Cloudflare; ACM enabled on Heroku; cert provisioning (~5 min).
Heroku DNS target:
closed-woodpecker-mn8ijxw0n3ek8lbi0t0jjfkr.herokudns.com
Completed actions:
- CF CNAME record created (record ID: 1db2a5e38fa5705cb61fe9c8682320e5): velvet.raxx.app CNAME → closed-woodpecker-mn8ijxw0n3ek8lbi0t0jjfkr.herokudns.com (proxied)
- Heroku ACM enabled: heroku certs:auto:enable --app raxx-velvet-prod
- deploy-velvet.yml PROD_URL updated to https://velvet.raxx.app (PR #3088, merged)
Remaining (terraform hardening — SEV-4, non-blocking):
Create terraform/velvet/dns.tf using terraform/queue/dns.tf as the pattern for durability.
Add velvet-staging.raxx.app similarly once the staging custom domain is registered.
Verify:
curl -sf https://velvet.raxx.app/healthz
# Expect: {"status": "ok", "service": "velvet"}
The console will show velvet OPERATIONAL within one ~3-min poll cycle after ACM cert is active.
16. Slack DM notifications
Terminal events (job done, aborted, rev_leaked) trigger a Slack DM to the operator's channel.
Bot channel for automated alerts: SL_BOT_NOTIFY (configured in Velvet Heroku config vars).
Operator DM for walk-away pings: D0AJ7K184TV (Kristerpher's DM channel).
rev_leaked alerts are additionally sent within 30 seconds to the ops alert inbox (ops@raxx.app).