Date: 2026-05-03
Status: Superseded by ADR-0041 (operator answered OQ1 with runtime-API preference 2026-05-03 ~07:00 UTC). Retained in tree for history.
Deciders: Kristerpher (operator), architect-agent
Refs: ADR-0037, #907, v2-rotation-flows design doc
ADR-0037 established that consumers register via a static manifest rather than a runtime API. This ADR documents the manifest format decisions and the explicit rejection of a runtime registration endpoint. It covers the security and operational tradeoffs in more detail because the choice has downstream implications for every consumer onboarding workflow.
The subscription manifest lives at docs/architecture/velvet/subscription-manifest.yml. It is a YAML file versioned in the main repository. Velvet loads it at startup (app.py → bus.load_manifest(path)). A reload is triggered by SIGHUP. On Heroku, updating the manifest in practice means a config-var change that triggers a dyno restart rather than a true zero-downtime reload, which is acceptable.
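The load-at-startup and reload-on-SIGHUP behaviour described above could look roughly like the sketch below. It is illustrative only: ManifestHolder and install_sighup_reload are hypothetical names, not Velvet's actual bus.load_manifest implementation, and the parser is injected so that a YAML loader can be supplied by the caller. Keeping the previous manifest when a reload fails is an assumption; this ADR only specifies that a failed load at startup is fatal.

```python
import logging
import signal
from pathlib import Path
from typing import Any, Callable

log = logging.getLogger(__name__)


class ManifestHolder:
    """Hypothetical holder for the most recently loaded manifest."""

    def __init__(self, path: Path, parse: Callable[[str], Any]) -> None:
        self.path = path
        self.parse = parse  # a YAML loader in a real deployment
        # Startup load: any read or parse error propagates and aborts startup.
        self.manifest = self._read()

    def _read(self) -> Any:
        return self.parse(self.path.read_text(encoding="utf-8"))

    def reload(self, signum: int = 0, frame: Any = None) -> None:
        """Reload the manifest; the signature also fits a signal handler."""
        try:
            self.manifest = self._read()
        except Exception as exc:
            # Assumption: a bad reload keeps the previous manifest in place.
            log.warning("manifest reload failed, keeping previous copy: %s", exc)


def install_sighup_reload(holder: ManifestHolder) -> None:
    """Register the reload as the SIGHUP handler (call from the main thread)."""
    if hasattr(signal, "SIGHUP"):  # SIGHUP does not exist on Windows
        signal.signal(signal.SIGHUP, holder.reload)
```

With this shape, sending SIGHUP to the process (kill -HUP followed by the process ID) would swap in the new manifest without a restart, while a dyno restart simply goes through the startup path again.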
The manifest parser validates on load:
- All update_endpoint URIs are HTTPS (http:// is rejected; velvet:// pass-throughs are whitelisted).
- All consumer_id values are unique within (token_name, env).
- Referenced update_auth_token_name values exist in vault (probe at startup; startup fails if any referenced token is missing — fail-fast).
- capabilities contains at least [update].
A failed manifest load blocks Velvet from starting. This is intentional: a broken manifest means Velvet cannot safely perform rotations, and it is better to fail loudly at deploy time than to silently skip consumers at rotation time.
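As a minimal sketch of the four load-time checks listed above, assuming the manifest has already been parsed into a flat list of entries that each carry token_name and env alongside the fields named in the list. The flat layout, the ManifestError type, and the token_exists callback (standing in for the vault probe) are assumptions made for illustration, not the actual parser.

```python
from typing import Any, Callable, Iterable
from urllib.parse import urlparse


class ManifestError(ValueError):
    """Raised when validation fails; the caller should abort startup."""


# http:// is deliberately absent; velvet:// pass-throughs are permitted.
ALLOWED_SCHEMES = {"https", "velvet"}


def validate_entries(
    entries: Iterable[dict[str, Any]],
    token_exists: Callable[[str], bool],
) -> None:
    """Raise ManifestError on the first entry that breaks a rule."""
    seen: set[tuple[str, str, str]] = set()
    for entry in entries:
        cid = entry["consumer_id"]

        # Rule 1: endpoint scheme must be HTTPS or the velvet:// pass-through.
        scheme = urlparse(entry["update_endpoint"]).scheme
        if scheme not in ALLOWED_SCHEMES:
            raise ManifestError(f"{cid}: update_endpoint must be https:// or velvet://")

        # Rule 2: consumer_id is unique within (token_name, env).
        key = (entry["token_name"], entry["env"], cid)
        if key in seen:
            raise ManifestError(f"{cid}: duplicate consumer_id within {key[:2]}")
        seen.add(key)

        # Rule 3: the referenced auth token must exist in the vault (fail-fast).
        auth_token = entry["update_auth_token_name"]
        if not token_exists(auth_token):
            raise ManifestError(f"{cid}: vault has no token named {auth_token!r}")

        # Rule 4: capabilities must include at least "update".
        if "update" not in entry.get("capabilities", []):
            raise ManifestError(f"{cid}: capabilities must include 'update'")
```

Raising on the first violation matches the fail-fast intent: the exception propagates out of startup, so the deploy fails visibly instead of a consumer being skipped later.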
Positive:
- Every new consumer addition is a PR. It is reviewed, CI-linted, and merged — the same workflow as any code change. The audit trail for "when was this consumer added and who approved it" is the git log.
- The manifest is a complete picture of credential exposure: at any time, grep "HK_PLATFORM_FULL" subscription-manifest.yml lists every system that holds a copy.
- Startup validation catches manifest errors before they reach a rotation job. A misconfigured consumer is discovered at deploy time, not mid-rotation.
Negative:
- Adding a new consumer requires a repository commit, PR, review, merge, and Velvet deploy. This is approximately 10-30 minutes if the pipeline is healthy, longer if there is review latency.
- Removing a decommissioned consumer from the manifest requires the same cycle. Until the PR merges, Velvet will attempt to distribute to the decommissioned consumer on every rotation and fail — generating noisy distribution errors in the UI.
- Mitigation for the decommission case: each manifest entry supports active: false, which causes Velvet to skip the consumer during distribution (no attempt, no failure) while keeping the entry for audit purposes.
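The active: false mitigation can be pictured as a filter applied when Velvet selects distribution targets. The sketch below is hypothetical: distribution_targets is an invented helper, entries are assumed to be flat dictionaries with token_name and env keys, and treating a missing active key as true is an assumption.

```python
from typing import Any


def distribution_targets(
    entries: list[dict[str, Any]], token_name: str, env: str
) -> list[dict[str, Any]]:
    """Return the consumers that should receive a rotated token."""
    return [
        e
        for e in entries
        if e["token_name"] == token_name
        and e["env"] == env
        and e.get("active", True)  # assumption: a missing flag means active
    ]
```

An entry marked active: false stays in the manifest and in its git history, but it never reaches the distribution step, so no attempt is made and no error is raised.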
The manifest carries a format_version field (currently 2). Breaking changes to the manifest schema increment this version; Velvet checks format_version on load and refuses to start if it does not match the expected version. This prevents a stale Velvet binary from misinterpreting a newer manifest.
Changes to an existing entry (e.g., updating an update_endpoint URL because an app was migrated) require a PR but do not require a format version bump.
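A minimal sketch of the format_version gate, assuming the parsed manifest is a dictionary with a top-level format_version key. The names EXPECTED_FORMAT_VERSION, ManifestVersionError, and check_format_version are hypothetical.

```python
from typing import Any

EXPECTED_FORMAT_VERSION = 2  # the version this build understands


class ManifestVersionError(RuntimeError):
    """Raised when the manifest schema version is not the expected one."""


def check_format_version(manifest: dict[str, Any]) -> None:
    """Refuse to continue unless format_version matches exactly."""
    found = manifest.get("format_version")
    if found != EXPECTED_FORMAT_VERSION:
        raise ManifestVersionError(
            f"manifest format_version is {found!r}, "
            f"expected {EXPECTED_FORMAT_VERSION}; refusing to start"
        )
```

The comparison is an exact match rather than a minimum, so an older binary rejects a newer manifest and a newer binary rejects an older one. Editing a field inside an existing entry leaves format_version untouched and passes this check unchanged.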
Runtime registration API (explicit rejection):
A POST /subscriptions endpoint would allow consumers to register at boot without a manifest PR. This was evaluated and rejected for the following reasons:
Credential exposure surface: A service token with rotate scope + a valid consumer_id would be sufficient to register a subscription. If such a token is leaked, an attacker could register a subscription that receives future token values for any token the attacker names. This transforms a leaked service token into a long-term credential-harvesting mechanism.
Inconsistent subscriber state: Consumers that are not running (scaled-to-zero Heroku dynos, paused GitHub Actions workers) would not appear in the runtime registry. Velvet would skip them during distribution — silently, if the registration was ephemeral. The static manifest guarantees that every consumer is always registered, running or not.
Audit gap: A runtime registration can be created and destroyed between two audit reviews, so a review may never see it. Every manifest change, by contrast, stays in the git history of the manifest file, which cannot be retroactively altered.
Complexity for no gain: The majority of Raxx's consumers are long-lived infrastructure systems (Heroku config vars, GitHub Actions secrets, SSM parameters). These do not change deployment topology frequently. The overhead of a manifest PR is proportionate to the infrequency of topology changes.