Raxx · internal docs

ADR-0041 — Velvet consumer registration: runtime API + manifest bootstrap (supersedes ADR-0040)

Status: Accepted (2026-05-03)
Supersedes: ADR-0040
Related: ADR-0037, ADR-0038, ADR-0039

Context

ADR-0040 (merged 2026-05-03 ~06:15 UTC in PR #944) chose static-manifest-only registration for Velvet consumers. The reasoning was security: a runtime registration endpoint widens attack surface.

In conversation 2026-05-03 ~07:00 UTC, the operator answered OQ1 explicitly:

"Runtime API — I'm looking for the flow that makes this a bit more dynamic. I feel like holding values is catastrophic at some point. If you disagree, state your reasons."

Operator's concern: a static manifest drifts from code-reality over time. Consumers added to the codebase without manifest updates are silently excluded from rotations. Consumers removed from the codebase but still in the manifest get repeatedly poked with credentials they no longer use. Drift is a footgun, especially as the consumer count grows.

This ADR supersedes ADR-0040 with a hybrid model that captures both arguments.

Decision

Hybrid registration: manifest is the bootstrap seed; runtime API is the durable source of truth.

Security mitigations for the runtime endpoint

The OQ1 security concern from ADR-0040 stands; the following mitigations bound the attack surface of the runtime endpoint:

  1. Caller authentication — every registration request carries a per-caller scoped Velvet service token (D3, locked 2026-05-03). The middleware (V5/#912) verifies the token's identity against the authz table and rejects unknown callers.

  2. mTLS on /api/v1/subscribers — all registration traffic flows over a mutually-authenticated TLS channel. Internal-only — not reachable from the public internet. Cloudflare Access service-token gate at the edge, plus Velvet's own per-caller token at the app layer (defense in depth).

  3. Registration TTL with periodic re-register — every subscriber row carries expires_at = registered_at + 24h. Consumers re-register hourly via a side-effect of normal startup; rows past their TTL are flagged stale and excluded from rotation distribute. Stale rows persist for 7 days for audit visibility, then auto-prune.

  4. Allowlist of known consumer identities — the velvet_caller_authz table enumerates which callers may register subscribers for which token names. A console caller cannot register a subscriber for STRIPE_RESTRICTED_KEY, only for credentials it actually uses. The authz table is hand-managed (no runtime registration of callers themselves; that ladder ends).

  5. Audit on every registration: each change emits a velvet.subscriber.registered, velvet.subscriber.updated, velvet.subscriber.removed, or velvet.subscriber.expired event. Tailing the audit log surfaces anomalies.
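
As a rough illustration of mitigations 3 and 4, the allowlist check and the TTL lifecycle can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Velvet implementation: the in-memory CALLER_AUTHZ dict stands in for the velvet_caller_authz table, the CONSOLE_SESSION_KEY token name and the "billing" caller are hypothetical, and the 7-day retention is assumed to start at expires_at.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

TTL = timedelta(hours=24)            # expires_at = registered_at + 24h
STALE_RETENTION = timedelta(days=7)  # assumption: counted from expires_at

# Hypothetical stand-in for the hand-managed velvet_caller_authz table:
# caller identity -> token names it may register subscribers for.
CALLER_AUTHZ = {
    "console": {"CONSOLE_SESSION_KEY"},    # hypothetical token name
    "billing": {"STRIPE_RESTRICTED_KEY"},  # hypothetical caller name
}


@dataclass
class Subscriber:
    caller: str
    token_name: str
    registered_at: datetime

    @property
    def expires_at(self) -> datetime:
        return self.registered_at + TTL


def may_register(caller: str, token_name: str) -> bool:
    """Allowlist check: unknown callers and unlisted token names are rejected."""
    return token_name in CALLER_AUTHZ.get(caller, set())


def row_state(row: Subscriber, now: datetime) -> str:
    """Classify a row as 'active', 'stale' (excluded, kept for audit) or 'prune'."""
    if now <= row.expires_at:
        return "active"
    if now <= row.expires_at + STALE_RETENTION:
        return "stale"
    return "prune"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    row = Subscriber("billing", "STRIPE_RESTRICTED_KEY", registered_at=now)
    print(may_register("console", "STRIPE_RESTRICTED_KEY"))
    print(row_state(row, now + timedelta(hours=25)))
```

With hourly re-registration, a healthy consumer refreshes registered_at well before the 24h TTL, so only consumers that have stopped running drift into the stale state.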

Why hybrid (not pure-runtime)

A pure-runtime model has a chicken-and-egg failure mode: when Velvet itself first boots, the registry is empty. No consumer can register because no consumer knows Velvet exists yet. The first rotation against a cold registry would silently distribute to nobody. The bootstrap manifest avoids this by seeding the initial set; the runtime API takes over from there.
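
The cold-start problem can be made concrete with a small sketch. It assumes a simplified registry modeled as a set of (caller, token_name) pairs; the function names and the sample manifest entries are hypothetical and only illustrate why an unseeded registry distributes to nobody.

```python
def seed_registry(registry: set, manifest: list) -> int:
    """Add manifest entries to the registry; return how many were new."""
    before = len(registry)
    registry.update(manifest)
    return len(registry) - before


def rotation_targets(registry: set, token_name: str) -> list:
    """Callers that would receive a rotated credential for token_name."""
    return sorted(caller for (caller, name) in registry if name == token_name)


if __name__ == "__main__":
    registry = set()  # cold boot: nothing has registered yet
    print(rotation_targets(registry, "STRIPE_RESTRICTED_KEY"))  # empty list

    # Hypothetical manifest contents, used only for illustration.
    manifest = [("billing", "STRIPE_RESTRICTED_KEY")]
    seed_registry(registry, manifest)
    print(rotation_targets(registry, "STRIPE_RESTRICTED_KEY"))
```

Seeding is idempotent in this sketch, so replaying the manifest on a later boot or in a DR scenario adds only the entries that are missing.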

Consequences

Positive:
  - Drift between code and config is eliminated: consumers register from inside their own startup paths.
  - New consumers onboard with code, not config; no separate manifest PR is needed.
  - Removed consumers self-deregister or expire via TTL.
  - Manifest stays useful for first-deploy and DR scenarios.

Negative:
  - M1 implementation cost grows: NV1 (#945) now needs both a manifest loader AND a runtime registration handler with mTLS termination. Estimated +1-2 days.
  - The drift-reporter is an additional moving piece that itself needs monitoring (drift between drift-reporter expectations and reality is a real meta-problem).
  - Per-caller authz table is hand-managed, which adds operator overhead when onboarding new caller classes.

Migration path:
  - M1 ships the manifest loader first (NV1 baseline).
  - M1.5 ships the runtime API + drift reporter (additive; no removal of manifest support).
  - Existing subscribers from the manifest are auto-imported on first M1.5 boot, marked source: manifest_seed.
  - Subscribers re-registering via runtime API update their row to source: runtime_api.
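
The source transition in the last two migration steps might look like the following sketch. The dict-based registry and the function names are hypothetical; it also assumes that re-importing the manifest never downgrades a row that has already re-registered through the runtime API, which this ADR does not state explicitly.

```python
def import_manifest(registry: dict, manifest: list) -> None:
    """Auto-import manifest entries, leaving existing rows untouched."""
    for caller, token_name in manifest:
        # setdefault keeps rows that already exist (e.g. source: runtime_api).
        registry.setdefault((caller, token_name), {"source": "manifest_seed"})


def register_runtime(registry: dict, caller: str, token_name: str) -> dict:
    """Runtime (re-)registration: create the row or flip its source."""
    row = registry.setdefault((caller, token_name), {})
    row["source"] = "runtime_api"
    return row


if __name__ == "__main__":
    registry = {}
    # Hypothetical manifest entry, used only for illustration.
    import_manifest(registry, [("billing", "STRIPE_RESTRICTED_KEY")])
    register_runtime(registry, "billing", "STRIPE_RESTRICTED_KEY")
    print(registry)
```

Rows still marked manifest_seed after a few re-registration cycles are candidates for a closer look, since they name consumers that have never confirmed themselves at runtime.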

Open work

Decision matrix considered

Option | Pros | Cons
Pure static manifest (ADR-0040) | Smallest attack surface; everything reviewable in PRs | Drifts catastrophically; cold-start consumers absent from rotation
Pure runtime API | Self-healing, no drift | Cold-start with empty registry; no DR seed; no source-control history
Hybrid (this ADR) | Bootstrap from manifest, durable via API, drift visible via reporter | More code to ship in M1; operator overhead on authz table