Raxx · internal docs

ADR-0040: Velvet — Static Manifest for Consumer Registration (No Runtime API)

Date: 2026-05-03
Status: Superseded by ADR-0041 (operator answered OQ1 with runtime-API preference 2026-05-03 ~07:00 UTC). Retained in tree for history.
Deciders: Kristerpher (operator), architect-agent
Refs: ADR-0037, #907, v2-rotation-flows design doc


Context

ADR-0037 established that consumers register via a static manifest rather than a runtime API. This ADR documents the manifest format decisions and the explicit rejection of a runtime registration endpoint — specifically addressing the security and operational tradeoffs in more detail, since this choice has downstream implications for every consumer onboarding workflow.


Decision

The subscription manifest lives at docs/architecture/velvet/subscription-manifest.yml. It is a YAML file versioned in the main repository. Velvet loads it at startup (app.pybus.load_manifest(path)). A reload is triggered by SIGHUP. On Heroku, a zero-downtime manifest update means a config-var change that triggers a dyno restart, which is acceptable.
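The load-at-startup and reload-on-SIGHUP behaviour could be sketched in Python roughly as follows. This is an illustration only: ManifestHolder, install_sighup_reload, and the injected loader callable are hypothetical names, not Velvet's actual code. Keeping the previous manifest when a reload fails is also an assumption, since this ADR only specifies fail-fast behaviour at startup.

```python
import signal
from typing import Any, Callable


class ManifestHolder:
    """Holds the currently active manifest (hypothetical helper, not Velvet's API)."""

    def __init__(self, loader: Callable[[str], Any], path: str) -> None:
        self._loader = loader
        self._path = path
        # Startup load: any exception propagates, so the process fails fast.
        self.manifest = loader(path)

    def reload(self, signum: int = 0, frame: Any = None) -> bool:
        """Re-read the manifest; the signature also fits a signal handler."""
        try:
            fresh = self._loader(self._path)
        except Exception:
            # Assumption: keep serving the last good manifest on a bad reload.
            return False
        self.manifest = fresh
        return True


def install_sighup_reload(holder: ManifestHolder) -> None:
    """Register holder.reload for SIGHUP (POSIX only; call from the main thread)."""
    signal.signal(signal.SIGHUP, holder.reload)
```

Injecting the loader keeps the sketch independent of any particular YAML library and makes the reload path easy to exercise without sending a real signal.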

The manifest parser validates on load:

  - All update_endpoint URIs are HTTPS: http:// is rejected, and velvet:// pass-throughs are whitelisted.

  - All consumer_id values are unique within (token_name, env).

  - Referenced update_auth_token_name values exist in vault. This is probed at startup, and startup fails fast if any referenced token is missing.

  - capabilities contains at least [update].

A failed manifest load blocks Velvet from starting. This is intentional: a broken manifest means Velvet cannot safely perform rotations, and it is better to fail loudly at deploy time than to silently skip consumers at rotation time.
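A minimal Python sketch of the four load-time checks, assuming the YAML has already been parsed into a list of dicts. The field names come from this ADR, but validate_manifest, ManifestError, and the token_exists probe (a stand-in for the vault lookup) are hypothetical, and the real parser may differ.

```python
from typing import Any, Callable, Dict, List
from urllib.parse import urlparse


class ManifestError(Exception):
    """Raised when validation fails; the caller is expected to abort startup."""


def validate_manifest(
    entries: List[Dict[str, Any]],
    token_exists: Callable[[str], bool],  # hypothetical vault probe
) -> None:
    seen = set()
    for entry in entries:
        cid = entry["consumer_id"]

        # Check 1: HTTPS only, with velvet:// pass-throughs allowed.
        scheme = urlparse(entry["update_endpoint"]).scheme
        if scheme not in ("https", "velvet"):
            raise ManifestError(f"{cid}: update_endpoint must be https:// or velvet://")

        # Check 2: consumer_id unique within (token_name, env).
        key = (entry["token_name"], entry["env"], cid)
        if key in seen:
            raise ManifestError(f"{cid}: duplicate within {key[:2]}")
        seen.add(key)

        # Check 3: the referenced auth token must exist in vault.
        if not token_exists(entry["update_auth_token_name"]):
            raise ManifestError(f"{cid}: missing token {entry['update_auth_token_name']}")

        # Check 4: capabilities must include "update".
        if "update" not in entry.get("capabilities", []):
            raise ManifestError(f"{cid}: capabilities must include 'update'")
```

Raising on the first failure matches the fail-fast intent: the exception propagates out of startup and the deploy fails visibly.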


Consequences

Positive:

  - Every new consumer addition is a PR. It is reviewed, CI-linted, and merged, the same workflow as any code change. The audit trail for "when was this consumer added and who approved it" is the git log.

  - The manifest is a complete picture of credential exposure: at any time, grep -r "HK_PLATFORM_FULL" subscription-manifest.yml tells you every system that holds a copy.

  - Startup validation catches manifest errors before they reach a rotation job. A misconfigured consumer is discovered at deploy time, not mid-rotation.

Negative:

  - Adding a new consumer requires a repository commit, PR, review, merge, and Velvet deploy. This is approximately 10-30 minutes if the pipeline is healthy, longer if there is review latency.

  - Removing a decommissioned consumer from the manifest requires the same cycle. Until the PR merges, Velvet will attempt to distribute to the decommissioned consumer on every rotation and fail, generating noisy distribution errors in the UI.

  - Mitigation for the decommission case: each manifest entry supports active: false, which causes Velvet to skip the consumer during distribution (no attempt, no failure) while keeping the entry for audit purposes.
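The active: false mitigation can be pictured as a filter applied before distribution. The Python sketch below is illustrative only: consumers_to_distribute is a hypothetical function name, and treating a missing active key as "active" is an assumption not stated in this ADR.

```python
from typing import Any, Dict, Iterable, List


def consumers_to_distribute(
    entries: Iterable[Dict[str, Any]], token_name: str, env: str
) -> List[str]:
    """Return the consumer_ids that should receive a rotated token."""
    return [
        e["consumer_id"]
        for e in entries
        if e["token_name"] == token_name
        and e["env"] == env
        # Assumption: an entry without an `active` key counts as active.
        and e.get("active", True)
    ]
```

Inactive entries stay in the manifest, so the audit trail is preserved, but they never reach the distribution step and therefore cannot generate errors.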


Manifest versioning policy

The manifest carries a format_version field (currently 2). Breaking changes to the manifest schema increment this version; Velvet checks format_version on load and refuses to start if it does not match the expected version. This prevents a stale Velvet binary from misinterpreting a newer manifest.
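The version gate might look like the following Python sketch. EXPECTED_FORMAT_VERSION, ManifestVersionError, and check_format_version are hypothetical names chosen for illustration, and the manifest is assumed to be already parsed into a dict.

```python
from typing import Any, Dict

EXPECTED_FORMAT_VERSION = 2  # "currently 2" per this ADR


class ManifestVersionError(Exception):
    """Raised when the manifest's format_version is not the one this build expects."""


def check_format_version(manifest: Dict[str, Any]) -> None:
    found = manifest.get("format_version")
    # A missing field is treated the same as a mismatch: refuse to start.
    if found != EXPECTED_FORMAT_VERSION:
        raise ManifestVersionError(
            f"format_version {found!r} does not match expected {EXPECTED_FORMAT_VERSION}"
        )
```

Using an exact match rather than "at least" means an older build refuses a newer manifest instead of guessing at fields it does not understand.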

Changes to an existing entry (e.g., updating an update_endpoint URL because an app was migrated) require a PR but do not require a format version bump.


Alternatives Considered

Runtime registration API (explicit rejection):

A POST /subscriptions endpoint would allow consumers to register at boot without a manifest PR. This was evaluated and rejected for the following reasons:

  1. Credential exposure surface: A service token with rotate scope + a valid consumer_id would be sufficient to register a subscription. If such a token is leaked, an attacker could register a subscription that receives future token values for any token the attacker names. This transforms a leaked service token into a long-term credential-harvesting mechanism.

  2. Inconsistent subscriber state: Consumers that are not running (scaled-to-zero Heroku dynos, paused GitHub Actions workers) would not appear in the runtime registry. Velvet would skip them during distribution — silently, if the registration was ephemeral. The static manifest guarantees that every consumer is always registered, running or not.

  3. Audit gap: A runtime registration can be created and destroyed between two audit reviews. The git history of the manifest file cannot be retroactively altered.

  4. Complexity for no gain: The majority of Raxx's consumers are long-lived infrastructure systems (Heroku config vars, GitHub Actions secrets, SSM parameters). These do not change deployment topology frequently. The overhead of a manifest PR is proportionate to the infrequency of topology changes.