ADR-0037: Velvet — Service-Bus Subscription Model

Date: 2026-05-03
Status: Accepted
Deciders: Kristerpher (operator), architect-agent
Refs: #907 (Velvet epic), v2-rotation-flows design doc, incident rot_2b1e 2026-05-02

Context

The v1 Velvet handler design had each vendor handler hard-code its distribution destinations: a list of Heroku apps, a GitHub Actions secret name, and an Infisical path. When a new consumer needed a token (e.g., a new Heroku app was provisioned), the handler file required a code change and a deploy. More critically, the distribution list lived inside the rotation logic — so when the first step of the handler failed (invalid old token), the error message enumerated all five destinations as if it had attempted them, creating a false picture of partial distribution.

Kristerpher's v2 directive (2026-05-03 06:00 UTC) explicitly called for a service-bus model: "every system registers the tokens it needs so that when a roll happens, each and every service can be updated."

The design question is: how does a consumer register?

Decision

Consumers register via a static versioned manifest checked into the repository at docs/architecture/velvet/subscription-manifest.yml. Velvet loads this manifest at startup and re-reads it on SIGHUP. No runtime registration API is provided.
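
The load-at-startup and re-read-on-SIGHUP behaviour above could be sketched as follows. This is a minimal Python illustration, not Velvet's actual code: ManifestStore, make_reload_handler, install_sighup_reload, and last_error are hypothetical names, and this ADR does not specify the implementation language. The parser is injected so the sketch runs on the standard library alone. For the real YAML manifest it could be PyYAML's yaml.safe_load (an assumption), and json.loads works as a stand-in for JSON-formatted test data. Keeping the previous list when a reload fails is also an assumption.

```python
import signal
from pathlib import Path
from typing import Callable, Optional


class ManifestStore:
    """Holds the current subscriber list and replaces it on reload."""

    def __init__(self, path: Path, parse: Callable[[str], list]):
        self._path = path
        self._parse = parse
        self._entries: tuple = ()
        self.last_error: Optional[Exception] = None
        self.reload()  # load at startup; a bad manifest fails fast here

    def reload(self) -> None:
        # Parse completely before swapping, so a broken file never
        # replaces a good subscriber list.
        entries = self._parse(self._path.read_text())
        self._entries = tuple(entries)  # single attribute rebind
        self.last_error = None

    def entries(self) -> tuple:
        return self._entries


def make_reload_handler(store: ManifestStore):
    """Build a signal handler that re-reads the manifest."""

    def handler(signum, frame):
        try:
            store.reload()
        except Exception as exc:
            # Assumption: keep serving the last good manifest and
            # record the error for the operator to inspect.
            store.last_error = exc

    return handler


def install_sighup_reload(store: ManifestStore) -> None:
    # Must be called from the main thread (a CPython requirement).
    signal.signal(signal.SIGHUP, make_reload_handler(store))
```

With this shape, a process would build the store once at startup, call install_sighup_reload(store), and an operator could trigger a re-read with kill -HUP on the process ID.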

Each manifest entry declares:
- token_name: which credential this consumer holds
- consumer_id: stable unique ID
- env: prod or staging
- update_endpoint + update_method: how Velvet delivers the new value
- healthcheck_endpoint: how Velvet verifies the consumer is using the new value
- capabilities: what this consumer supports (update, healthcheck, update_no_verify)
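
To make the entry shape concrete, here is a hedged Python sketch of one entry as a typed record with load-time validation. The field names and the allowed env and capabilities values come from the list above. Everything else is an assumption made for illustration: the Subscription and validate_manifest names, treating healthcheck_endpoint as optional, requiring an endpoint when the healthcheck capability is declared, and enforcing consumer_id uniqueness across the whole manifest.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

KNOWN_ENVS = {"prod", "staging"}
KNOWN_CAPABILITIES = {"update", "healthcheck", "update_no_verify"}
REQUIRED_FIELDS = {
    "token_name", "consumer_id", "env",
    "update_endpoint", "update_method", "capabilities",
}


@dataclass(frozen=True)
class Subscription:
    token_name: str
    consumer_id: str
    env: str
    update_endpoint: str
    update_method: str
    healthcheck_endpoint: Optional[str]
    capabilities: FrozenSet[str]


def parse_entry(raw: dict) -> Subscription:
    """Validate one raw manifest entry and convert it to a Subscription."""
    missing = REQUIRED_FIELDS - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if raw["env"] not in KNOWN_ENVS:
        raise ValueError(f"unknown env: {raw['env']!r}")
    caps = frozenset(raw["capabilities"])
    unknown = caps - KNOWN_CAPABILITIES
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    # Assumed rule: a consumer claiming healthcheck must say where.
    if "healthcheck" in caps and not raw.get("healthcheck_endpoint"):
        raise ValueError("healthcheck capability needs healthcheck_endpoint")
    return Subscription(
        token_name=raw["token_name"],
        consumer_id=raw["consumer_id"],
        env=raw["env"],
        update_endpoint=raw["update_endpoint"],
        update_method=raw["update_method"],
        healthcheck_endpoint=raw.get("healthcheck_endpoint"),
        capabilities=caps,
    )


def validate_manifest(raw_entries: List[dict]) -> List[Subscription]:
    """Validate every entry and reject duplicate consumer IDs."""
    seen = set()
    subs = []
    for raw in raw_entries:
        sub = parse_entry(raw)
        if sub.consumer_id in seen:  # assumed manifest-wide uniqueness
            raise ValueError(f"duplicate consumer_id: {sub.consumer_id}")
        seen.add(sub.consumer_id)
        subs.append(sub)
    return subs
```

A check of this kind would fit naturally into the CI lint step that manifest PRs already pass through, so a malformed entry is rejected at review time rather than at rotation time.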

Consequences

Positive:
- The manifest is diffable, reviewable, and auditable. Adding a new consumer requires a PR, which goes through code review, CI lint, and the normal deploy pipeline.
- No registration API attack surface. An attacker who compromises a consumer service cannot register a new subscription to receive future token values.
- The manifest is the single source of truth for "which systems hold a copy of this token", which is critical for GDPR credential-access audit and breach notification scope.
- Restart-safe: Velvet knows its subscriber list immediately on startup, before any rotation job is created.
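
As an illustration of the audit point, answering "which systems hold a copy of this token" reduces to a filter over the parsed manifest. The sketch below is in Python and assumes entries are plain dicts keyed by the manifest field names. The holders function name and the sample values in the usage note are hypothetical.

```python
from typing import Iterable, List, Optional


def holders(entries: Iterable[dict], token_name: str,
            env: Optional[str] = None) -> List[str]:
    """Return the sorted consumer IDs that hold the given token.

    If env is given, only consumers in that environment are listed.
    """
    return sorted(
        e["consumer_id"]
        for e in entries
        if e["token_name"] == token_name
        and (env is None or e["env"] == env)
    )
```

For example, holders(entries, "example_token", env="prod") would give the breach-notification scope for the production copy of that token, with no dependency on which consumers happen to be running.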

Negative:
- Adding a new consumer requires a code change and a deploy to Velvet. This is a 5-minute operation for an operator with console access, but it is not zero-friction.
- If a consumer is decommissioned but not removed from the manifest, Velvet will attempt to distribute to it and fail. Distribution failures are surfaced in the UI; the operator must remove the decommissioned entry and retry.
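
The decommissioned-consumer case can be sketched as a fan-out that records one outcome per consumer actually attempted, so a stale entry shows up as a single named failure without masking the others. This is a hedged Python illustration: Outcome, distribute, failed_consumers, and the injected deliver callable are hypothetical names, and continuing past a failed consumer is an assumption, since this ADR does not state whether Velvet stops at the first failure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Outcome:
    consumer_id: str
    delivered: bool
    error: str = ""


def distribute(consumers: Iterable[dict], new_value: str,
               deliver: Callable[[dict, str], None]) -> List[Outcome]:
    """Deliver new_value to each consumer, recording every attempt.

    Only consumers that were actually attempted appear in the result,
    so the report never lists destinations that were not tried.
    """
    outcomes = []
    for consumer in consumers:
        cid = consumer["consumer_id"]
        try:
            deliver(consumer, new_value)
            outcomes.append(Outcome(cid, True))
        except Exception as exc:
            # Assumption: one failing consumer does not abort the rest.
            outcomes.append(Outcome(cid, False, str(exc)))
    return outcomes


def failed_consumers(outcomes: Iterable[Outcome]) -> List[str]:
    """Consumer IDs an operator needs to fix or remove before retrying."""
    return [o.consumer_id for o in outcomes if not o.delivered]
```

The list returned by failed_consumers is what a UI could surface to the operator, who then removes the decommissioned entry from the manifest and retries.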

Alternatives Considered

Option B — Runtime registration API: Consumers send POST /subscriptions at boot with their subscription record, authenticated with a rotate-scoped service token. Velvet persists the record in the rotation_job_consumers table (or a separate subscriptions table).

Rejected because:
- It creates a write endpoint whose compromise gives an attacker the ability to subscribe to future token values. The blast radius of a leaked service token expands dramatically.
- Transient registration means Velvet's subscriber list is only accurate when all consumers are running. A consumer that is stopped (e.g., a staging app scaled to zero) does not appear in the list and silently misses distribution.
- It is harder to audit. "What systems currently hold this token?" becomes a database query with an uptime dependency, not a grep on a manifest file.

Option C — Hard-coded per-handler: Each vendor handler contains its own distribution list (the v1 approach).

Rejected because this was the root cause of the rot_2b1e incident. Distribution destinations are not a vendor concern — they are a deployment topology concern.