Velvet Operator Runbook

Last verified against: Velvet v2 (2026-05-29 UTC)
Last incident: 2026-05-29. Velvet prod DEGRADED 463h; three root causes: (1) /health alias missing from stale slug (fixed by deploying v11 via deploy-velvet.yml), (2) velvet.raxx.app had no DNS CNAME in Cloudflare (NXDOMAIN, fixed 2026-05-29 15:00 UTC), (3) Heroku ACM was disabled (enabled 2026-05-29 15:10 UTC). Console probe will return OPERATIONAL once ACM cert provisions (~5 min). See section 15.
Parent epic: #907
Design doc: docs/architecture/velvet/v2-rotation-flows.md
Handler-author guide: docs/architecture/velvet-handler-author-guide.md

Reading time: ~15 min

1. When to use Velvet vs. manual rotation

Situation	Use Velvet	Use manual procedure
Scheduled rotation of a credential with registered subscribers	Yes	No
Emergency revocation after a suspected leak	Yes — revocation flow	No
Credential with `active: false` in the subscription manifest	No — fix the manifest first	Proceed manually per vendor SOP
Velvet itself is down or unreachable	No — use vendor SOP directly	Yes
Velvet's own bootstrap credentials (`INFISICAL_CLIENT_SECRET`, `HK_VELVET_BOOTSTRAP`)	No — circular dependency	Yes — section 8 below
Vendor does not support programmatic token creation (e.g. CF User API tokens)	Operator-assisted Velvet (OPERATOR_MANUAL flow)	Parallel manual path
Feature flag `velvet_v2_rotation` is `off`	No — Velvet returns 503	Yes — use vendor SOP
Credential has no subscribers registered in the manifest	No	Yes — use per-credential SOP in `docs/ops/runbooks/rotation/`

2. Pre-flight checklist

Complete every item before triggering a rotation. A stalled pre-flight is cheaper than a stalled distribute.

[ ] Check Velvet health — both environments should return HTTP 200 with {"status": "ok"}:

curl -sf https://raxx-velvet-prod.herokuapp.com/healthz
curl -sf https://raxx-velvet-staging.herokuapp.com/healthz

If either returns non-200 or times out, stop. Do not rotate against a degraded Velvet.

[ ] Open a FreeScout ticket for this rotation. You will need the ticket ID at the revoke confirmation gate. Format: ROT-YYYY-MM-CRED_NAME (example: ROT-2026-05-HK_PLATFORM_FULL).
[ ] Confirm the feature flag is on:

curl -sf https://raxx-velvet-prod.herokuapp.com/flags | python3 -m json.tool | grep velvet_v2_rotation

Expected: "velvet_v2_rotation": true. If false, rotation endpoints return 503 and you must use the manual vendor SOP.

[ ] Confirm you have the correct environment — look at the console environment banner. Red = prod, purple = staging. Do not rotate prod credentials against the staging Velvet app.
[ ] Confirm the credential is listed in the manifest — inspect docs/architecture/velvet/subscription-manifest.yml or call:

curl -sf https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/subscribers

You should see the expected consumer list. If the list is empty or the endpoint returns 404, the credential is not registered.

[ ] Check the current job history for recent failures on this credential:

curl -sf "https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/rotations?limit=5"

If the most recent job is in distribute_partial or revoke_failed, resolve it before starting a new job. Two overlapping rotation jobs for the same credential are not supported.

[ ] Announce in #ops-internal (Slack) that a rotation is starting: credential name, job type (operational / revocation / testing), ticket ID.
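
The automated parts of the checklist above can be sketched as a small Python helper that evaluates responses the operator has already fetched. This is an illustrative sketch, not part of Velvet: the function name `preflight_problems`, the shape of the subscribers list, the `status` field on history entries, and the most-recent-first ordering are all assumptions. Only the `{"status": "ok"}` health body, the `velvet_v2_rotation` flag, and the two blocking statuses come from this runbook.

```python
from typing import Any

# Statuses this runbook says must be resolved before a new rotation starts.
BLOCKING_STATUSES = {"distribute_partial", "revoke_failed"}


def preflight_problems(
    health: dict[str, Any],
    flags: dict[str, Any],
    subscribers: list[Any],
    recent_jobs: list[dict[str, Any]],
) -> list[str]:
    """Return reasons not to rotate; an empty list means the checks passed.

    Every argument is a parsed JSON body fetched by the caller. The shapes of
    `subscribers` and `recent_jobs` are assumed for illustration.
    """
    problems = []
    if health.get("status") != "ok":
        problems.append("Velvet health is not ok")
    if flags.get("velvet_v2_rotation") is not True:
        problems.append("feature flag velvet_v2_rotation is not on")
    if not subscribers:
        problems.append("credential has no registered subscribers")
    # Assumption: the history endpoint returns the most recent job first.
    if recent_jobs and recent_jobs[0].get("status") in BLOCKING_STATUSES:
        problems.append(f"most recent job is in {recent_jobs[0]['status']}")
    return problems
```

The ticket, environment banner, and Slack announcement items stay manual; the helper only covers the checks that have an endpoint.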

3. Triggering a rotation

3a. Console UI (preferred)

Navigate to the console: https://raxx-console-prod.herokuapp.com/security/secrets
Locate the credential row in the Secrets table.
Click Rotate — this opens the Stage Wizard modal.
Follow the three-panel flow: Stage 1 (Verify) → Stage 2 (Mint + Distribute) → Stage 3 (Validate + Revoke).
Each stage requires explicit operator action before advancing. You can abort at any stage.

3b. API (for scripted or emergency use)

All endpoints require a rotate-scoped service token in the Authorization: Bearer <token> header.

Step 1 — Create the job:

POST https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/rotate
Content-Type: application/json

{
  "flow_type": "operational",
  "idempotency_key": "<uuid-v4>",
  "force_revoke": false
}

Response (202):

{ "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "status": "init" }

Step 2 — Run Stage 1 (Verify):

POST https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/rotations/<job_id>/stage
Content-Type: application/json

{ "action": "verify" }

Step 3 — Proceed to Mint + Distribute:

POST .../rotations/<job_id>/stage
{ "action": "proceed_mint" }

Step 4 — Proceed to Revoke (after validating all consumers):

POST .../rotations/<job_id>/stage
{ "action": "proceed_revoke" }

Polling for status:

GET https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/rotations/<job_id>

SSE stream (real-time status):

GET https://raxx-velvet-prod.herokuapp.com/tokens/HK_PLATFORM_FULL/rotations/<job_id>/stream
Accept: text/event-stream
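
For scripted use, the request sequence in this section can be sketched in Python as plain request descriptions that a caller then sends with an HTTP client of their choice. The helper names `create_job_request` and `stage_request` are hypothetical, and the sketch assumes the abbreviated `.../rotations/<job_id>/stage` paths in steps 3 and 4 expand to the same URL as step 2. Nothing is sent over the network here.

```python
import json
import uuid

BASE_URL = "https://raxx-velvet-prod.herokuapp.com"  # prod URL used in this section
STAGE_ACTIONS = ("verify", "proceed_mint", "proceed_revoke")


def create_job_request(credential: str, flow_type: str = "operational") -> dict:
    """Describe the POST that creates a rotation job."""
    return {
        "method": "POST",
        "url": f"{BASE_URL}/tokens/{credential}/rotate",
        "body": {
            "flow_type": flow_type,
            # A fresh UUIDv4 per job, matching the <uuid-v4> placeholder above.
            "idempotency_key": str(uuid.uuid4()),
            "force_revoke": False,
        },
    }


def stage_request(credential: str, job_id: str, action: str) -> dict:
    """Describe the POST that advances a job by one operator-gated stage."""
    if action not in STAGE_ACTIONS:
        raise ValueError(f"unknown stage action: {action}")
    return {
        "method": "POST",
        "url": f"{BASE_URL}/tokens/{credential}/rotations/{job_id}/stage",
        "body": {"action": action},
    }


if __name__ == "__main__":
    print(json.dumps(create_job_request("HK_PLATFORM_FULL"), indent=2))
```

Each request still needs the `Authorization: Bearer <token>` and `Content-Type: application/json` headers described at the top of 3b.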

4. Monitoring in-flight rotations

Console UI

The Stage Wizard shows live status via SSE. Each consumer row updates in real time: an amber spinner indicates in-progress, a green check indicates succeeded, and a red X indicates failed.

API polling

Poll GET /tokens/{name}/rotations/{job_id} every 5 seconds. The response includes:

{
  "job_id": "...",
  "status": "distributing",
  "consumers": [
    { "consumer_id": "heroku-config-console-prod", "distribute_status": "succeeded" },
    { "consumer_id": "heroku-config-api-prod",     "distribute_status": "in_progress" },
    { "consumer_id": "github-actions-heroku-api-key", "distribute_status": "pending" }
  ],
  "created_at": "2026-05-04T06:00:00Z",
  "updated_at": "2026-05-04T06:00:42Z"
}
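
A polling loop along these lines could look like the following Python sketch. It is an assumption-laden illustration: `poll_until_settled` is a hypothetical name, `fetch_job` is a caller-supplied function that performs the GET and returns the parsed JSON, and the set of "keep waiting" statuses is read off the job states table in section 5 rather than taken from Velvet's code.

```python
import time
from typing import Callable

# Statuses whose row in the job states table says the job advances on its own.
WAIT_STATUSES = {
    "verifying", "minting", "minted", "distributing",
    "distributed", "validating", "revoking",
}


def poll_until_settled(
    fetch_job: Callable[[], dict],
    interval_s: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job() until the job reaches a status that needs a decision.

    Returns the last job body. Raises TimeoutError if the job is still in an
    automatic status after max_polls attempts.
    """
    for _ in range(max_polls):
        job = fetch_job()
        if job["status"] not in WAIT_STATUSES:
            return job
        sleep(interval_s)
    raise TimeoutError("job still in an automatic status after max_polls polls")
```

Passing `sleep` as a parameter keeps the loop testable without real delays; with the defaults it waits 5 seconds between polls, as recommended above.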

Heroku logs

heroku logs --tail --app raxx-velvet-prod | grep job_id=<your-job-id>

5. Job states reference

Every rotation_jobs row progresses through this state machine. The status column tells you exactly where a job is.

Status	Meaning	Can advance	Operator action required
`init`	Job created; nothing touched	Yes	Click "Verify"
`verifying`	Velvet probing the vendor with the current token	Automatic	Wait
`verify_failed`	Auth probe failed; current token may be invalid	Retry or abort	See section 6
`verified`	Probe confirmed; operator gate before mint	Yes	Click "Proceed to mint"
`minting`	Velvet calling vendor to mint new token	Automatic	Wait
`mint_failed`	Vendor mint API returned error	Abort only	Old token still valid
`minted`	New token in hand; not yet distributed	Yes (automatic fan-out)	Wait
`distributing`	Fan-out to registered consumers in progress	Automatic	Wait
`distribute_partial`	Some consumers failed; others succeeded	Retry or abort	Retry failed rows or section 6
`distribute_failed`	All consumers failed	Abort	New token minted but not distributed — see abort table
`distributed`	All consumers received new token	Automatic validation	Wait
`validating`	Healthchecks running on all consumers	Automatic	Manual-confirm rows if needed
`validate_partial`	Some healthchecks failed	Retry or abort	Retry failed rows
`validate_failed`	All healthchecks failed	Abort	Investigate consumer reachability
`validated`	All consumers healthy with new token	Yes	Type-to-confirm + FreeScout ID, then click Revoke
`revoking`	Velvet calling vendor to revoke old token	Automatic	Wait
`revoke_failed`	Vendor revoke API returned error	Retry or mark manual	See section 6
`done`	Rotation complete; old token revoked	Terminal	None
`aborted`	Operator or system aborted	Terminal	Check residual state (section 6)
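
For scripts that react to job status, the table above can be condensed into a lookup. The Python sketch below is one possible reading of the "Can advance" column, not Velvet's internal representation; the `Gate` categories and the `gate_for` helper are hypothetical names introduced here.

```python
from enum import Enum


class Gate(Enum):
    OPERATOR = "operator"    # waits for an explicit operator action
    AUTOMATIC = "automatic"  # Velvet advances the job on its own
    RECOVERY = "recovery"    # failed or partial; see section 6
    TERMINAL = "terminal"    # no further transitions


STATUS_GATES = {
    "init": Gate.OPERATOR,
    "verifying": Gate.AUTOMATIC,
    "verify_failed": Gate.RECOVERY,
    "verified": Gate.OPERATOR,
    "minting": Gate.AUTOMATIC,
    "mint_failed": Gate.RECOVERY,
    "minted": Gate.AUTOMATIC,
    "distributing": Gate.AUTOMATIC,
    "distribute_partial": Gate.RECOVERY,
    "distribute_failed": Gate.RECOVERY,
    "distributed": Gate.AUTOMATIC,
    "validating": Gate.AUTOMATIC,
    "validate_partial": Gate.RECOVERY,
    "validate_failed": Gate.RECOVERY,
    "validated": Gate.OPERATOR,
    "revoking": Gate.AUTOMATIC,
    "revoke_failed": Gate.RECOVERY,
    "done": Gate.TERMINAL,
    "aborted": Gate.TERMINAL,
}


def gate_for(status: str) -> Gate:
    """Return the category for an operational-flow status from the table."""
    try:
        return STATUS_GATES[status]
    except KeyError:
        raise ValueError(f"unknown job status: {status}") from None
```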

Revocation flow statuses:

Status	Meaning
`rev_init`	Revocation job created
`rev_revoking`	Vendor revoke call in flight
`rev_revoke_failed`	Vendor rejected the revoke call
`rev_revoked`	Vendor confirmed revocation; validating consumers
`rev_validating`	Healthchecks running (expecting 401 from each consumer)
`rev_leaked`	One or more consumers returned non-401 after revocation — SEV1
`rev_done`	All consumers confirmed locked out

6. Stuck job diagnosis and recovery

Definition of stuck: A job has been in the same status for more than 5 minutes without a state change, OR a job is halted in a state that requires operator action (distribute_partial, revoke_failed, aborted).
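
The definition above can be written as a small predicate. In this Python sketch, treating `updated_at` from the job status response as the time of the last state change is an assumption, as is excluding `done` and `rev_done` (finished jobs never change state again, so the 5-minute rule would otherwise flag them). The name `is_stuck` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

STUCK_AFTER = timedelta(minutes=5)
NEEDS_OPERATOR = {"distribute_partial", "revoke_failed", "aborted"}
FINISHED = {"done", "rev_done"}  # assumption: completed jobs are never stuck


def is_stuck(status: str, updated_at: str, now: datetime) -> bool:
    """Apply the stuck definition to one job.

    updated_at is an ISO 8601 UTC timestamp such as "2026-05-04T06:00:42Z";
    now must be a timezone-aware datetime.
    """
    if status in FINISHED:
        return False
    if status in NEEDS_OPERATOR:
        return True
    last_change = datetime.fromisoformat(updated_at.replace("Z", "+00:00"))
    return now - last_change > STUCK_AFTER
```

For example, a job last updated at 06:00:42Z and still in `distributing` at 06:10:00Z counts as stuck, while the same job checked at 06:03:00Z does not.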

6a. Job stuck in `verifying`

The auth probe is timing out or being rate-limited.

Check Velvet logs: heroku logs --app raxx-velvet-prod | grep job_id=<id>
Look for ConnectionError, Timeout, or HTTP status code.
If the vendor API is rate-limiting: wait 2 minutes and click "Retry" in the console.
If the vendor API is returning 401: the current token is already invalid. Stop and use the vendor's manual revocation + re-issue process. File a FreeScout incident ticket.

6b. Job stuck in `minting`

Check Velvet logs for mint failed entries.
If the vendor returned 401 on the mint call, the old token drifted between Verify and Mint. This is rare but possible if two rotation jobs ran simultaneously. Abort this job; the old token is still valid.
If the vendor returned 5xx: retry once. If it fails again, wait 10 minutes (vendor-side transient issue) and retry.
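
The mint-failure guidance above reduces to a short decision function. This Python sketch only restates the two cases described in this subsection; the function name, the string labels it returns, and the fallback for any other status code are assumptions.

```python
def mint_failure_action(http_status: int, retries_so_far: int) -> str:
    """Suggest the next step after a failed vendor mint call.

    http_status is the vendor's response code found in the Velvet logs;
    retries_so_far counts retries already attempted for this job.
    """
    if http_status == 401:
        # Old token drifted between Verify and Mint; it is still valid.
        return "abort"
    if 500 <= http_status <= 599:
        # Retry once right away, then back off for 10 minutes.
        return "retry_now" if retries_so_far == 0 else "wait_10_min_then_retry"
    # Not covered by this runbook section; fall back to reading the logs.
    return "check_logs"
```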

6c. Job in `distribute_partial`

Some consumers received the new token; others did not. The old token is still valid.

Recovery options:

Option A (preferred): Retry failed rows in the console. Click "Retry failed" for each red row. Velvet will re-attempt the PATCH/INFISICAL_WRITE for those consumers only.

Option B (if retries keep failing): Identify the failing consumer(s) by their consumer_id in the job status. Manually push the new token to that consumer using the vendor's own interface. Once done, click "Manual confirm" in the console to mark that row as succeeded. After all rows are green, advance to Stage 3.

Option C (if you need to abort): Click Abort. The new token is now minted but distributed only to some consumers. You must manually delete the new token from vault and re-sync the affected consumers to the old token. The console shows the residual consumer list. File a FreeScout ticket and follow per-vendor SOP in docs/ops/runbooks/rotation/.
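
When working through Option B or Option C, it helps to split the consumer list by which token each consumer currently holds. The Python sketch below does this from the job status response shown in section 4. It assumes that any `distribute_status` other than `succeeded` means the consumer is still on the old token, which may not hold for a row caught mid-write, so verify `in_progress` rows by hand. `split_by_token` is a hypothetical helper name.

```python
def split_by_token(job: dict) -> tuple[list[str], list[str]]:
    """Return (consumers on the new token, consumers still on the old token)."""
    on_new, on_old = [], []
    for consumer in job["consumers"]:
        if consumer["distribute_status"] == "succeeded":
            on_new.append(consumer["consumer_id"])
        else:
            on_old.append(consumer["consumer_id"])
    return on_new, on_old


if __name__ == "__main__":
    example = {
        "consumers": [
            {"consumer_id": "heroku-config-console-prod", "distribute_status": "succeeded"},
            {"consumer_id": "heroku-config-api-prod", "distribute_status": "in_progress"},
        ]
    }
    print(split_by_token(example))
```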

6d. Job in `validate_partial` or `validate_failed`

Healthchecks failed on one or more consumers after the new token was distributed.

Check which consumers are showing validate_status: failed in the job status response.
Confirm the consumer application has restarted and loaded the new config var. For Heroku apps: heroku ps --app <app-name> — if dyno is crashed, that's your answer.
Wait 60 seconds for dyno restart to complete, then click "Retry validation" in the console.
If a consumer has healthcheck_endpoint: null in the manifest, a "Manual confirm" button appears. Verify the consumer manually, then click confirm to mark it as validated.
If validation keeps failing: check whether the distribute step actually wrote the new token. Use the per-vendor SOP to verify the config var value was updated.

6e. Job in `revoke_failed`

The new token is distributed and validated. The old token has not been revoked.

Note the old_auth_id or equivalent from the job metadata (visible in the console audit summary and in Velvet logs).
Retry the revoke in the console ("Retry revoke" button).
If the vendor returns 404 on the revoke call, the old token was already deleted outside Velvet. Click "Mark manually revoked" and enter the FreeScout ticket ID. Velvet will advance to done.
If the vendor keeps returning errors: revoke the old token manually via the vendor dashboard or CLI. Then click "Mark manually revoked" with the FreeScout ticket ID.

6f. Job aborted from `minted`, `distribute_partial`, or `validated`

The new token exists in vault but the old token is still valid. Both tokens are now live simultaneously.

Cleanup required:

Aborted from	Action
`minted` (new token in vault, not distributed)	Delete the new token from the vault path. The old token remains the active credential. File a FreeScout ticket documenting the orphaned token.
`distribute_partial`	Document which consumers have the new token and which have the old (check the `rotation_job_consumers` rows). Manually sync all consumers back to the old token. Then delete the new token from vault.
`validated`	Distribution and validation succeeded; only revocation is pending. You may manually revoke the old token via the vendor dashboard, then use "Mark manually revoked" in the console.

7. Rollback

Velvet does not support one-click rollback. Once the new token has been distributed and the old token revoked, there is no automated path back.

What is reversible:

Before proceed_revoke: The old token is still valid. Abort the job. Manually roll back any consumers that received the new token to the old token. Delete the new token from vault.
After done: The old token is revoked. Re-rotation is required: start a new operational rotation job to mint a fresh token.

What is NOT reversible:

Revocation of a Cloudflare User API token: CF does not support re-activating a revoked token. You must create a new token in the CF dashboard.
Revocation of a Heroku OAuth authorization: The authorization ID is gone. A new OAuth authorization must be minted.
Any token that the vendor marks as single-use after deletion.

8. Rotating Velvet's own bootstrap credentials (Invariant I7)

Velvet's own credentials (INFISICAL_CLIENT_SECRET, INFISICAL_CLIENT_ID, HK_VELVET_BOOTSTRAP) are stored as Heroku config vars, not in vault, to break the bootstrap circularity. Velvet cannot rotate them itself.

Rotating `INFISICAL_CLIENT_SECRET`

In the Infisical dashboard, generate a new client secret for the Velvet machine identity.
Set the new value on both Heroku apps:

heroku config:set INFISICAL_CLIENT_SECRET=<new_value> --app raxx-velvet-prod >/dev/null 2>&1
heroku config:set INFISICAL_CLIENT_SECRET=<new_value> --app raxx-velvet-staging >/dev/null 2>&1

Note: always append >/dev/null 2>&1 to the command. heroku config:set echoes config vars to stdout by default (feedback: heroku_config_set_echoes_secrets).

Verify Velvet restarts and /healthz returns 200 on both apps.
Revoke the old client secret in the Infisical dashboard.
Record the rotation in a FreeScout ticket.
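
If the config:set step above is ever scripted, the same output suppression can be done from Python. This is a minimal sketch assuming the `heroku` CLI is installed and authenticated; the helper names are hypothetical. Note that the value is still passed as a process argument, so this only addresses the echoed output, not other exposure paths.

```python
import subprocess


def config_set_argv(app: str, name: str, value: str) -> list[str]:
    """Build the argv for `heroku config:set NAME=value --app APP`."""
    return ["heroku", "config:set", f"{name}={value}", "--app", app]


def set_config_var_quietly(app: str, name: str, value: str) -> None:
    """Run config:set with stdout and stderr discarded.

    Equivalent to appending >/dev/null 2>&1 in a shell. Raises
    subprocess.CalledProcessError if the CLI exits non-zero.
    """
    subprocess.run(
        config_set_argv(app, name, value),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=True,
    )
```

Because the CLI's own error text is discarded, a failure surfaces only as the exception, so follow up with the /healthz check as in the manual steps.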

Rotating `HK_VELVET_BOOTSTRAP`

This token is used by Velvet to authenticate its PATCH calls to Heroku config vars on behalf of consumer updates.

Use the Heroku Platform API or dashboard to create a new OAuth authorization for the Velvet machine user.
Set the new token:

heroku config:set HK_VELVET_BOOTSTRAP=<new_token> --app raxx-velvet-prod >/dev/null 2>&1
heroku config:set HK_VELVET_BOOTSTRAP=<new_token> --app raxx-velvet-staging >/dev/null 2>&1

Verify /healthz on both apps.
Revoke the old authorization via the Heroku dashboard or:

heroku authorizations:revoke <old-auth-id>

Update the companion secret HK_VELVET_BOOTSTRAP__AUTH_ID in Infisical with the new authorization UUID, so the next rotation can find it.

9. Failure modes by adapter

Postmark (`PM_SERVER_MAIL`)

Failure	Meaning	Action
`verify_failed` with HTTP 401	Postmark token already invalid	Rotate manually: generate new token in Postmark dashboard, enter new value via Velvet OPERATOR_MANUAL or directly in vault
`distribute_partial` on `infisical-postmark-prod`	Infisical write failed	Check `INFISICAL_CLIENT_ID`/`INFISICAL_CLIENT_SECRET` Heroku config vars; retry
`validate_failed` HTTP 401	New token not yet active at Postmark (rare propagation delay)	Wait 30 seconds; retry validation
Revoke not automated	Postmark does not expose a token-delete API	Operator must manually delete the old server token in the Postmark dashboard; click "Mark manually revoked"

Heroku (`HK_PLATFORM_FULL`)

Failure	Meaning	Action
`verify_failed` with "old token invalid"	`HEROKU_PLATFORM_API_TOKEN` in Velvet config vars is drifted	Follow `docs/ops/runbooks/heroku-api-key-drift-recovery.md`
`distribute_partial` (one Heroku app)	PATCH to that app returned non-200	Check if the app exists: `heroku apps:info --app <app-name>`. If the app was deleted, remove it from the manifest; mark consumer row as skipped
`revoke_failed` with "revoke_pending"	Old auth DELETE failed after distribute succeeded	Note `old_auth_id` from logs; manually revoke via `heroku authorizations:revoke <id>`; mark manually revoked
`distribute_partial` — github-actions-heroku-api-key	GitHub Actions secret PUT failed	Check `GH_APP_OPS_BOT` token in vault; verify repo name is correct in manifest

Cloudflare (`CF_DNS_EDIT_RAXX_APP`, others)

Failure	Meaning	Action
Consumer `active: false` in manifest	CF adapter pending OQ7 resolution	Rotate manually per `docs/ops/runbooks/rotation/cloudflare-user-api-token.md`
OPERATOR_MANUAL flow — operator entered wrong value	New token does not validate at CF	Re-enter the correct token value; Velvet will re-attempt vault write

Note: scripts/ops/probe_cf_token_perms.py reads Cloudflare token permissions directly from Infisical. It does not go through Velvet. This is intentional — it is a read-only diagnostic tool and has not been migrated to the Velvet bus.

AWS SSM (`AWS_ACCESS_KEY_ID`, password-class credentials)

Failure	Meaning	Action
`distribute_partial` — SSM write 403	Velvet's IAM role lacks `ssm:PutParameter` on the target path	Verify the IAM policy attached to the Velvet Heroku dyno's assumed role covers `/raxx/{env}/{vendor}/{name}`
SSM path not found (404 on read)	SSM path does not exist yet	First rotation creates the path; if the path was deleted externally, it will be re-created by the adapter

10. SEV1 — `rev_leaked` response

If a revocation job reaches rev_leaked, one or more consumers returned a non-401 response after the old token was confirmed revoked at the vendor. This means at least one consumer still has a copy of the revoked token and may be accepting it.

Immediate steps:

You will have received a Slack DM on channel SL_BOT_NOTIFY within 30 seconds of the flag being set. The message includes the job_id, credential_name, and the list of leaking consumer_ids.
Open the Velvet console: the leaked consumers are highlighted in red with an "Investigate" button.
Click "Investigate" — this auto-creates a FreeScout ticket pre-filled with the consumer list.
For each leaking consumer:
a. Determine whether the consumer is still actively serving traffic.
b. If yes: immediately disable or restart the consumer to force it to reload config.
c. Verify the consumer is no longer accepting the revoked token by re-running the healthcheck manually.
Once all consumers return 401, click "Confirm leak resolved" in the Velvet console. The job advances from rev_leaked to rev_done.
If any consumer cannot be forced to reject the token (e.g., a caching layer with a long TTL), escalate to a security incident per the security response runbook.
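
The re-check in the steps above comes down to one rule: after revocation, every consumer must answer 401. A Python sketch of that rule follows, under the assumption that the operator has already collected one HTTP status code per consumer by re-running each healthcheck; the function names are hypothetical.

```python
def leaking_consumers(probe_results: dict[str, int]) -> list[str]:
    """Return consumer_ids that did not answer 401 after revocation.

    probe_results maps consumer_id to the HTTP status code observed when the
    healthcheck was re-run manually.
    """
    return sorted(cid for cid, status in probe_results.items() if status != 401)


def leak_resolved(probe_results: dict[str, int]) -> bool:
    """True only when every probed consumer returned 401."""
    return not leaking_consumers(probe_results)
```

Only click "Confirm leak resolved" once a check like `leak_resolved` holds for the full consumer list in the Slack alert, not just the consumers that were restarted.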

Root causes of rev_leaked:

Consumer cached the token in memory and has not been restarted since rotation.
Consumer received the new token via distribute but reverted to an old value from a local config file.
Consumer's healthcheck endpoint is cached or proxied and is not reflecting the real auth state.

11. Staging vs. production

APP_ENV on each Heroku dyno controls which Infisical environment and SSM path prefix is used.

App	`APP_ENV`	Infisical env slug	SSM path prefix
`raxx-velvet-prod`	`prod`	`prod`	`/raxx/prod/`
`raxx-velvet-staging`	`staging`	`staging`	`/raxx/staging/`

The subscription manifest uses env: prod and env: staging per consumer row. A rotation job on raxx-velvet-prod only fans out to consumers with env: prod.

The Heroku app consumer rows for staging config vars (raxx-console-staging, raxx-api-staging) are registered with env: prod in the manifest — this is intentional. The staging apps' config vars hold the same credential (the Heroku platform key), which is a single credential shared across environments.
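
The fan-out rule can be illustrated with a short filter. In this Python sketch the manifest rows are simplified to dictionaries with `consumer_id` and `env` keys; the real manifest schema, the consumer IDs used in the example, and the helper name `fan_out_targets` are assumptions.

```python
def fan_out_targets(consumers: list[dict], app_env: str) -> list[str]:
    """Return consumer_ids whose manifest env matches the Velvet app's APP_ENV."""
    return [c["consumer_id"] for c in consumers if c["env"] == app_env]


if __name__ == "__main__":
    # Hypothetical manifest rows. The staging Heroku app is registered with
    # env: prod because it holds the shared Heroku platform key.
    manifest = [
        {"consumer_id": "heroku-config-console-prod", "env": "prod"},
        {"consumer_id": "heroku-config-console-staging", "env": "prod"},
        {"consumer_id": "infisical-postmark-staging", "env": "staging"},
    ]
    print(fan_out_targets(manifest, "prod"))
```

Under these assumptions a job on raxx-velvet-prod reaches both Heroku rows, including the staging app, while the row with env: staging is left to raxx-velvet-staging.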

12. Common operator mistakes and fixes

Mistake	Symptom	Fix
Starting a prod rotation against `raxx-velvet-staging`	Job fans out to staging consumers only; prod consumers never receive the new token	Abort the job. Re-run against `raxx-velvet-prod`.
Forgetting to open a FreeScout ticket before rotating	Cannot enter ticket ID at Stage 3 revoke gate	Open the ticket now. The gate enforces non-empty input but does not validate the ticket exists.
Clicking "Abort" from `validated` thinking it rolls everything back	New token stays in vault and distributed; old token stays live	See section 6f — abort from `validated` requires manual revocation of the old token only.
Retrying a `revoke_failed` job with a different auth token	Second revoke attempt uses stale auth	Ensure `HK_VELVET_BOOTSTRAP` or the relevant auth token in vault is current before retrying.
Two operators starting rotations for the same credential simultaneously	Second job's verify step returns "active rotation already in progress"	Only one operational rotation per credential can be in flight at a time. The first job must reach `done` or `aborted` before the second can start.
Running `heroku config:set` without redirecting stdout	Secret value printed to terminal and shell history	Always use `heroku config:set VAR=value >/dev/null 2>&1`
Checking Velvet for the new token value after rotation completes	Token value is not available via Velvet after the job reaches `done`	Read from Infisical directly using the machine identity; Velvet does not store the token value after rotation.

13. Health endpoint returns 404 or NXDOMAIN (DEGRADED on console)

Symptom: Console shows velvet-prod DEGRADED.

Note: Heroku assigned this app a randomized .herokuapp.com hostname when it was created. The old-style raxx-velvet-prod.herokuapp.com returns "No such app". Always use the randomized hostname reported by heroku domains --app raxx-velvet-prod as the fallback probe URL: raxx-velvet-prod-b0cea70d1b98.herokuapp.com

13a. Stale slug (endpoint missing)

Cause: A code change added /health or /healthz to velvet/app.py but prod was not redeployed.

Diagnosis:

# Check deployed slug SHA vs. current main
heroku releases --app raxx-velvet-prod --num 3

# Probe via the correct Heroku hostname (get it from: heroku domains --app raxx-velvet-prod)
curl -sf https://raxx-velvet-prod-b0cea70d1b98.herokuapp.com/healthz
curl -I  https://raxx-velvet-prod-b0cea70d1b98.herokuapp.com/health

Fix: Trigger a production deploy of the current main branch via the GH Actions workflow:

Navigate to: Actions → "Deploy Velvet" → "Run workflow"
Set environment = production, ref = main
The workflow runs a subtree-split of velvet/ and pushes to Heroku.
The smoke check in the workflow polls /healthz.

# Verify after deploy (use hostname from: heroku domains --app raxx-velvet-prod)
curl -sf https://raxx-velvet-prod-b0cea70d1b98.herokuapp.com/health  # expect 200
curl -sf https://raxx-velvet-prod-b0cea70d1b98.herokuapp.com/healthz # expect 200

Resolved 2026-05-29 14:41 UTC: v11 (slug efc8f6e1) deployed; 463h DEGRADED ended. velvet.raxx.app/healthz probe will return OPERATIONAL once DNS CNAME and ACM cert are in place (section 15).

13b. Custom domain not in DNS (NXDOMAIN)

Cause: velvet.raxx.app had no Cloudflare CNAME record. The console probe targets https://velvet.raxx.app/healthz per config/status-surfaces.yaml, which returned NXDOMAIN regardless of dyno health.

Diagnosis:

dig velvet.raxx.app
# Expected: CNAME → closed-woodpecker-mn8ijxw0n3ek8lbi0t0jjfkr.herokudns.com
# Symptom: NXDOMAIN
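
Where dig is not available, a rough equivalent of the resolution check can be done with the Python standard library. This sketch only tells you whether the name resolves at all; it does not show the CNAME target, and the helper name `resolves` and its injectable `resolver` parameter are introduced here for illustration.

```python
import socket
from typing import Callable


def resolves(hostname: str, resolver: Callable = socket.getaddrinfo) -> bool:
    """Return True if the hostname resolves to at least one address.

    A name with no DNS record (the NXDOMAIN case) makes getaddrinfo raise
    socket.gaierror, which is reported here as False.
    """
    try:
        resolver(hostname, 443)
    except socket.gaierror:
        return False
    return True


if __name__ == "__main__":
    print("velvet.raxx.app resolves:", resolves("velvet.raxx.app"))
```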

Fix: Three steps required (all completed 2026-05-29):

Add the custom domain to Heroku:

heroku domains:add velvet.raxx.app --app raxx-velvet-prod
heroku domains:wait velvet.raxx.app --app raxx-velvet-prod
# DNS target: closed-woodpecker-mn8ijxw0n3ek8lbi0t0jjfkr.herokudns.com

Enable Heroku ACM (required for TLS; disabled by default on new apps):

heroku certs:auto:enable --app raxx-velvet-prod
heroku certs:auto --app raxx-velvet-prod
# Wait for status: OK (~5 min after CNAME propagates)

Without this step, HTTPS returns CF error 525 (TLS handshake failure) even after CNAME is live.

Add CNAME record in Cloudflare (done 2026-05-29 15:00 UTC, record ID 1db2a5e38fa5705cb61fe9c8682320e5):

# Using CLOUDFLARE_EDIT_DNS token from vault:
curl -sS -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_EDIT_DNS" \
  -H "Content-Type: application/json" \
  "https://api.cloudflare.com/client/v4/zones/f12dbb5cac57d5591a5058874498a6d1/dns_records" \
  -d '{"type":"CNAME","name":"velvet","content":"closed-woodpecker-mn8ijxw0n3ek8lbi0t0jjfkr.herokudns.com","proxied":true}'

After DNS propagates (~1–5 min):

curl -sf https://velvet.raxx.app/healthz
# Expect: {"status": "ok", "service": "velvet"}

14. deploy-velvet.yml smoke check false failures

The workflow's STAGING_URL and PROD_URL env vars previously referenced the old <appname>.herokuapp.com hostname format (e.g., raxx-velvet-prod.herokuapp.com). Heroku now provisions randomized hostnames (e.g., raxx-velvet-prod-b0cea70d1b98.herokuapp.com). The old hostnames return "No such app" (404) from the Heroku router, which caused the post-deploy /healthz smoke check to fail even when the deploy and dyno were healthy.

Fixed 2026-05-29: PR #3088 updated both URLs:

STAGING_URL → https://raxx-velvet-staging-609f3019292a.herokuapp.com
PROD_URL → https://velvet.raxx.app (custom domain, stable across Heroku rebuilds)

15. velvet.raxx.app DNS — completed 2026-05-29

Status (2026-05-29 15:00 UTC): CNAME created in Cloudflare; ACM enabled on Heroku; cert provisioning (~5 min).

Heroku DNS target:

closed-woodpecker-mn8ijxw0n3ek8lbi0t0jjfkr.herokudns.com

Completed actions:

CF CNAME record created (record ID: 1db2a5e38fa5705cb61fe9c8682320e5): velvet.raxx.app CNAME → closed-woodpecker-mn8ijxw0n3ek8lbi0t0jjfkr.herokudns.com (proxied)
Heroku ACM enabled: heroku certs:auto:enable --app raxx-velvet-prod
deploy-velvet.yml PROD_URL updated to https://velvet.raxx.app (PR #3088, merged)

Remaining (terraform hardening — SEV-4, non-blocking): Create terraform/velvet/dns.tf using terraform/queue/dns.tf as the pattern for durability. Add velvet-staging.raxx.app similarly once the staging custom domain is registered.

Verify:

curl -sf https://velvet.raxx.app/healthz
# Expect: {"status": "ok", "service": "velvet"}

The console will show velvet OPERATIONAL within one ~3-min poll cycle after ACM cert is active.

16. Slack DM notifications

Terminal events (job done, aborted, rev_leaked) trigger a Slack DM to the operator's channel.

Bot channel for automated alerts: SL_BOT_NOTIFY (configured in Velvet Heroku config vars).
Operator DM for walk-away pings: D0AJ7K184TV (Kristerpher's DM channel).

rev_leaked alerts are additionally sent within 30 seconds to the ops alert inbox (ops@raxx.app).