Velvet v2 — PM Card Rework

Date: 2026-05-03
Author: PM agent (raxx-pm-bot)
Parent epic: #907
Trigger: Kristerpher's 2026-05-03 ~06:00 UTC pivot to three-flow rotation architecture
Architect doc (in-flight): docs/architecture/velvet/v2-rotation-flows.md — not yet landed; sections referenced below will be cross-linked once the file is committed
Scope doc (v1, for annotation): docs/architecture/velvet/scope.md — supersession notes at end of this document


1. What Changed in v2

Kristerpher's directive collapses the V1 single-handler-pattern model into three named flows with an explicit service-bus registration layer:

| Flow | Description | Terminal signal |
| --- | --- | --- |
| Testing | Validates auth, permissions, and token visibility before any mutation | N/A (read-only pass/fail) |
| Operational | Stage 1 → Stage 2 → Stage 3 orchestrated rotation (verify, mint + distribute, validate + revoke) | completed job status |
| Revocation | Terminate token immediately; no re-mint; wait for 401 to confirm | 401 from consumer |

The service-bus model requires that every system that consumes a token registers against that token so that a rotation event fans out to all registered consumers automatically. This replaces the v1 pattern where the distribute step held a hardcoded destination list per handler.

The modal UI per stage (Stage 1 / 2 / 3 progress visibility + type-to-confirm gates) is now a first-class deliverable, not a post-M3 polish item.


2. V1 Card-by-Card Disposition (V1–V11 = #908–#918)

V1 — #908: Scaffold Heroku app pair + Postgres add-on

Disposition: KEEP — no body edits needed

The app pair scaffold, Postgres add-on, CF Access pattern, and bootstrap config var strategy are identical in v2. The only addition is that rotation_jobs will eventually grow a flow column (testing / operational / revocation) — that is a schema concern in V3, not V1.


V2 — #909: GET /tokens/{name} read-through proxy (Infisical)

Disposition: KEEP — minor body note needed

The read-through proxy surface is unchanged. However, the audit log event emitted by /value should be enriched to include flow_context when the read is part of a Testing-flow probe. Propose a comment on #909 (not a body edit):

"v2 note: The /value endpoint's audit log line should include an optional flow_context field (testing / operational / revocation) injected by the job runner when the read is part of a named flow. The endpoint itself does not change — the job runner sets context. Flag for implementation PR."


V3 — #910: rotation_jobs Postgres schema + migration

Disposition: REVISE — schema needs two new columns

The v2 three-flow model requires tracking which flow a job belongs to and the current stage within that flow.

Proposed body edits (to propose to card-groomer for the next grooming pass):

  1. Add flow column: TEXT NOT NULL CHECK (flow IN ('testing', 'operational', 'revocation')) DEFAULT 'operational'
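  2. Add stage column: TEXT CHECK (stage IN ('stage_1_verify', 'stage_2_mint_distribute', 'stage_3_validate_revoke')), nullable and populated as the job runner advances. NULL does not belong in the IN list: a CHECK whose expression evaluates to NULL already passes, so a NULL stage is accepted without it
  3. Extend status CHECK constraint to include 'revoked' as a terminal state distinct from 'completed' (revocation flow completes differently: old token is gone; new token was never minted). The 'tested' status (V4) and the 'completed_partial' status (NV2) proposed later in this document need to be added to the same constraint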
  4. Add subscriber_snapshot JSONB column to capture the registered subscriber list at the time of job creation — immutable after job starts; used for audit and partial-failure replay

No other columns change. Indexes are unaffected.
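The constraint behavior of the proposed columns can be checked quickly in isolation. The following is a minimal sketch that uses Python's sqlite3 as a stand-in for Postgres; the base columns (id, credential_name), the set of pre-existing status values, and the helper names are assumptions for illustration, and subscriber_snapshot would be JSONB in the real migration.

```python
import sqlite3

# Stand-in DDL. Base columns and the 'pending'/'completed'/'failed' statuses
# are assumed; only flow, stage, 'revoked', and subscriber_snapshot come from
# the proposed V3 revision.
DDL = """
CREATE TABLE rotation_jobs (
    id INTEGER PRIMARY KEY,
    credential_name TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'completed', 'failed', 'revoked')),
    flow TEXT NOT NULL DEFAULT 'operational'
        CHECK (flow IN ('testing', 'operational', 'revocation')),
    stage TEXT
        CHECK (stage IN ('stage_1_verify', 'stage_2_mint_distribute',
                         'stage_3_validate_revoke')),
    subscriber_snapshot TEXT
)
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the sketched table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    return conn


def insert_is_rejected(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True when a CHECK (or other integrity) constraint blocks the insert."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False
```

A row inserted without flow or stage lands as flow = 'operational' with a NULL stage, which is the state a job would be in before the runner advances it.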


V4 — #911: POST /tokens/{name}/rotate kickoff + poll endpoint

Disposition: REVISE — three flows, not one kickoff shape

The kickoff endpoint must accept a flow parameter:

POST /tokens/{name}/rotate
{
  "flow": "operational" | "testing" | "revocation",
  "idempotency_key": "..."
}

For the revocation flow, the endpoint name should arguably be POST /tokens/{name}/revoke (separate endpoint) rather than /rotate with flow=revocation. This is an open decision; see Section 5, decision OD-1.

The poll endpoint GET /tokens/{name}/rotations/{job_id} should return stage in addition to status so the console modal can reflect which stage is in progress.

Proposed body edits:

  - Scope section: add flow field to request body; add stage field to response body
  - Acceptance criteria: add tests for each flow type kickoff; add test for stage field in poll response
  - Explicitly document that flow=testing does NOT advance to mint/distribute; job completes at stage_1 with a tested pseudo-status
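The request and response shapes above can be sketched as a small validation layer. This is a minimal sketch, not the endpoint implementation: the function and class names are hypothetical, and treating flow as a required field (rather than defaulting it to operational) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

FLOWS = ("testing", "operational", "revocation")


@dataclass(frozen=True)
class RotateRequest:
    flow: str
    idempotency_key: str


def parse_rotate_request(body: dict) -> RotateRequest:
    """Validate a POST /tokens/{name}/rotate body (hypothetical helper)."""
    flow = body.get("flow")
    if flow not in FLOWS:
        raise ValueError(f"flow must be one of {FLOWS}")
    key = body.get("idempotency_key")
    if not isinstance(key, str) or not key:
        raise ValueError("idempotency_key must be a non-empty string")
    return RotateRequest(flow=flow, idempotency_key=key)


def poll_response(job_id: str, status: str, stage: Optional[str]) -> dict:
    """Shape of the poll payload: stage travels alongside status."""
    return {"job_id": job_id, "status": status, "stage": stage}
```

If OD-1 resolves to a separate /revoke endpoint, "revocation" would drop out of FLOWS for this route and the same validation would be reused there.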


V5 — #912: Service-token auth middleware + rotation-authz matrix

Disposition: KEEP — no body edits needed

The auth model is flow-agnostic. The scope matrix (read, rotate) covers all three flows. The revocation flow uses rotate scope per D7 resolution. No changes needed here.


V6 — #913: AWS SSM Parameter Store integration (password class)

Disposition: KEEP — D2 gating still applies

SSM integration is flow-agnostic (it is a backing store, not a handler). Still blocked on D2 confirmation. No body edits needed.


V7 — #914: Postmark rotation handler (mint/validate/distribute/revoke skeleton)

Disposition: REPLACE

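The v1 handler pattern (handler.mint / validate / distribute / revoke as a monolithic four-function object) is superseded by the bus-adapter pattern. Instead of a handler that contains its own distribute logic, v2 has a flow runner that orchestrates the stages, vendor modules that own mint/validate/revoke, and bus adapters that own distribution to registered subscribers.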

Proposed replacement card (new issue, not yet filed):

V7-v2: "Build flow runner: three-stage orchestrator for operational + testing + revocation flows"

  - Orchestrates Stage 1 (verify auth + permissions), Stage 2 (mint via vendor module + fan-out to registered subscribers), Stage 3 (validate on new token + execute revoke via vendor module)
  - Reads subscriber registry for the credential at job creation time; snaps list to subscriber_snapshot JSONB
  - Advances stage column in rotation_jobs as it progresses
  - Emits per-stage audit events
  - Testing flow: executes Stage 1 only; no mutations; job status = tested
  - Revocation flow: skips Stage 2 mint; executes Stage 3 revoke only; job status = revoked

See Section 3 (New Card Slate) for the full V7-v2 and related new cards.


V8 — #915: Heroku rotation handler — port from #891 fix

Disposition: REPLACE

This card's entire "four-function handler" shape is v1. In v2, the Heroku-specific logic splits into:

  1. Heroku vendor module (token_service/vendors/heroku.py) — mint (OAuth authorization creation) + validate (GET /account) + revoke (delete OAuth authorization). No distribute logic.
  2. Heroku config-var bus adapter (token_service/adapters/heroku_config_var.py) — one adapter instance per Heroku app; each app in the distribution list is a separate registered subscriber with a named adapter

Proposed replacement card:

V8-v2: "Heroku vendor module (mint/validate/revoke) + config-var bus adapter"

  - Vendor module: mint, validate, revoke via Heroku Platform API (HTTP-only, no CLI)
  - Bus adapter: HerokuConfigVarAdapter(app_name), registered once per Heroku app in the subscriber registry; pushes new token via PATCH /apps/{app}/config-vars
  - GH Actions secret becomes a separate bus adapter, GithubActionsSecretAdapter (see new card slate)
  - Explicit dependency on #925 landing before the GH Actions secret adapter can work
  - Deprecation marker on console/app/services/rotation_handlers/heroku.py (see Section 7)


V9 — #916: Cloudflare User API token handler

Disposition: REPLACE

Same pattern as V8. The Cloudflare-specific token resolution and verification logic moves to a vendor module; the Infisical write (currently the only distribute destination) becomes a bus adapter entry.

Proposed replacement card:

V9-v2: "Cloudflare vendor module (mint/validate/revoke) + token-store bus adapter"

  - Vendor module: mint (POST /user/tokens), validate (GET /user/tokens/verify), revoke (DELETE /user/tokens/{id})
  - Companion-secret pattern for __CF_TOKEN_ID storage is retained
  - token_service/adapters/infisical_write.py is the generic bus adapter that handles the Infisical write destination; it is reused across all credentials, not Cloudflare-specific

Note: V7, V8, V9 replacements together constitute the bus-adapter system's first three concrete implementations. They should be filed as a set with clear dependencies.


V10 — #917: Migrate first console callsite to Velvet

Disposition: KEEP — add one note

Still valid. Add a comment note: in v2, the migrated callsite reads from GET /tokens/{name}/value on the Velvet API exactly as specified; the bus architecture does not change how consumers read tokens, only how rotations distribute to them. No body edits needed; the card stands.


V11 — #918: Operator runbook + handler-author guide

Disposition: REVISE — scope expands significantly

The handler-author guide content must be replaced with bus-adapter-author guide content. The four-function interface contract is no longer the extension point. The extension points are now:

  1. Writing a vendor module (thin: mint/validate/revoke only)
  2. Writing a bus adapter (how to push a new token value to a specific destination)
  3. Registering a subscriber (mapping credential name + consumer system to an adapter instance)

Additionally, the runbook must cover:

  - The three-flow modal UX: what each stage shows, what the operator does if a stage stalls
  - How to perform a revocation from the UI vs. API
  - How to inspect the subscriber registry for a given credential
  - The deprecation marker convention for v1 handlers

Proposed body edit for card-groomer:

"Scope section rewrite: replace 'four-function interface contract' sections with vendor-module + bus-adapter-author content per v2 architecture. Runbook section additions: three-flow modal UX ops guide, revocation flow SOP, subscriber registry inspection, v1 handler deprecation pattern. Both docs still gate on v2 handlers landing (now V7-v2, V8-v2, V9-v2)."


3. New Card Slate Proposal

The v2 architecture requires approximately 20 cards. Nine carry over from v1 (some revised); eleven are new. The three-flow split plus service-bus infrastructure drives the new count.

Milestone assignment key


Infrastructure — Carry-Over (5 cards, revisions noted)

| # | Title | Disposition | Milestone |
| --- | --- | --- | --- |
| #908 | Scaffold Heroku app pair + Postgres add-on | KEEP | M1 |
| #909 | GET /tokens/{name} read-through proxy | KEEP | M1 |
| #910 | rotation_jobs schema + migration | REVISE (flow, stage, subscriber_snapshot columns) | M1 |
| #911 | POST /tokens/{name}/rotate kickoff + poll | REVISE (flow param, stage in response, separate /revoke endpoint TBD per OD-1) | M1 |
| #912 | Service-token auth middleware | KEEP | M1 |

Infrastructure — New Cards (4 cards)

NV1: "Build subscriber registry — per-credential consumer registration + snapshot on job start"

Parent: #907 | Milestone: M2.5 | Depends on: #910 (rotation_jobs schema)

User story: As the Velvet flow runner, I want to query a registry of which systems have subscribed to a given credential so that a rotation job fans out to all of them without per-handler hardcoded lists.

Scope:

  - subscriber_registry Postgres table: (id, credential_name, consumer_name, adapter_class, adapter_config JSONB, enabled BOOL, env)
  - GET /tokens/{name}/subscribers: list registered subscribers for a credential
  - POST /tokens/{name}/subscribers: register a new subscriber (operator action; not automated)
  - At job creation, the flow runner snapshots the enabled subscriber list into rotation_jobs.subscriber_snapshot JSONB
  - Infisical write adapter is auto-registered for all Infisical-backed credentials on first rotation

Acceptance criteria:

  - Subscriber list for HEROKU_API_KEY contains at minimum: Infisical write, raxx-console-prod config-var, raxx-console-staging config-var, raxx-api-prod config-var, raxx-api-staging config-var, GH Actions secret
  - subscriber_snapshot on a new job matches the enabled subscriber list at job creation time
  - Disabling a subscriber prevents it from receiving updates on the next rotation (does not revoke the old token from that destination)

Risks:

  - Empty subscriber list: if a credential has zero enabled subscribers, the rotate job would complete but nothing gets updated. Mitigation: fail the job at Stage 2 with error: no subscribers registered if snapshot is empty.
  - Registry bootstrapping: the registry needs to be seeded before the first rotation. Mitigation: provide a migration seed script that pre-registers known subscribers from the v1 hardcoded lists.
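The snapshot-at-creation rule and the empty-list mitigation can be sketched in a few lines. This is an illustrative sketch only: the Subscriber dataclass mirrors the proposed table columns, while the function names, the exception class, and the in-memory list standing in for the Postgres table are assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass(frozen=True)
class Subscriber:
    credential_name: str
    consumer_name: str
    adapter_class: str
    adapter_config: dict = field(default_factory=dict)
    enabled: bool = True
    env: str = "staging"


class NoSubscribersError(RuntimeError):
    """Raised at Stage 2 when the job's snapshot is empty."""


def snapshot_subscribers(registry: List[Subscriber], credential_name: str) -> List[dict]:
    """Copy the enabled subscribers for one credential at job creation time.

    asdict() deep-copies each row, so later registry edits cannot change the snapshot.
    """
    return [
        asdict(s)
        for s in registry
        if s.credential_name == credential_name and s.enabled
    ]


def require_subscribers(snapshot: List[dict]) -> None:
    """Stage 2 guard: fail the job rather than rotate with nowhere to distribute."""
    if not snapshot:
        raise NoSubscribersError("no subscribers registered")
```

Keeping the guard separate from the snapshot step matches the proposal that the job is created with whatever the registry holds and fails only when Stage 2 is reached.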


NV2: "Implement bus adapter base class + Infisical-write adapter (first concrete adapter)"

Parent: #907 | Milestone: M2.5 | Depends on: NV1

User story: As a handler author, I want a base class and working reference implementation for a bus adapter so that new adapters follow a consistent interface and the Infisical write case works out of the box.

Scope:

  - token_service/adapters/base.py: BusAdapter abstract class with push(credential_name, new_value, context) -> AdapterResult
  - token_service/adapters/infisical_write.py: writes new value to Infisical via authorized client; returns AdapterResult(destination, ok, error_message)
  - The flow runner calls adapter.push(...) for each subscriber in subscriber_snapshot; collects results; partial failure does not abort remaining adapters
  - Per-adapter result is stored back to rotation_jobs.subscriber_snapshot (update in-place with outcome)

Acceptance criteria:

  - push() on InfisicalWriteAdapter updates the Infisical secret and returns ok=True
  - A failed push() on one adapter does not raise; it returns ok=False with error_message populated
  - The flow runner collects all results; if any adapter fails, job status becomes completed_partial (new status value; add to V3 schema revision)
  - Tests: happy path, single adapter failure, all adapters fail (job → failed)
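The adapter contract and the fan-out rule can be sketched as follows. BusAdapter, push, and AdapterResult follow the names in the scope above; fan_out, distribution_status, and InMemoryAdapter are hypothetical helpers, and the defensive try/except around push is an assumption layered on top of the "does not raise" contract.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AdapterResult:
    destination: str
    ok: bool
    error_message: Optional[str] = None


class BusAdapter(ABC):
    @abstractmethod
    def push(self, credential_name: str, new_value: str, context: dict) -> AdapterResult:
        """Deliver new_value to one destination; report failure via ok=False."""


class InMemoryAdapter(BusAdapter):
    """Hypothetical test adapter; stores values in a dict instead of a real destination."""

    def __init__(self, destination: str, fail: bool = False):
        self.destination, self.fail, self.store = destination, fail, {}

    def push(self, credential_name, new_value, context):
        if self.fail:
            return AdapterResult(self.destination, False, "simulated failure")
        self.store[credential_name] = new_value
        return AdapterResult(self.destination, True)


def fan_out(adapters: List[BusAdapter], credential_name: str, new_value: str,
            context: dict) -> List[AdapterResult]:
    """Call every adapter; one failure never aborts the remaining adapters."""
    results = []
    for adapter in adapters:
        try:
            results.append(adapter.push(credential_name, new_value, context))
        except Exception as exc:  # defensive: a misbehaving adapter still yields a result
            results.append(AdapterResult(type(adapter).__name__, False, str(exc)))
    return results


def distribution_status(results: List[AdapterResult]) -> str:
    """Map per-adapter outcomes to the job status named in the acceptance criteria."""
    if results and all(r.ok for r in results):
        return "completed"
    if not any(r.ok for r in results):
        return "failed"
    return "completed_partial"
```

In the operational flow this status would describe only the Stage 2 outcome; the final job status also depends on Stage 3.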


NV3: "Implement Heroku config-var bus adapter + GH Actions secret bus adapter"

Parent: #907 | Milestone: M2.5 | Depends on: NV2, and #925 for GH Actions adapter

User story: As the flow runner distributing a HEROKU_API_KEY rotation, I want dedicated adapters for Heroku config-var writes and GH Actions secrets so that each destination is independently testable and re-usable across any credential that needs those destinations.

Scope:

  - token_service/adapters/heroku_config_var.py: HerokuConfigVarAdapter(app_name); PATCH /apps/{app}/config-vars via Heroku Platform API; no CLI
  - token_service/adapters/github_actions_secret.py: GithubActionsSecretAdapter(secret_name); PyNaCl-encrypted PUT to GH Secrets API; no CLI; depends on GITHUB_API_SECRETS_TOKEN (#925)
  - Both adapters: log <REDACTED> for any token value in structured logs

Acceptance criteria:

  - HerokuConfigVarAdapter("raxx-console-prod").push("HEROKU_API_KEY", new_val, ctx) sets the config var on that app (verified by GET /apps/raxx-console-prod/config-vars)
  - GithubActionsSecretAdapter("HEROKU_API_KEY").push(...) updates the GH Actions secret (verified by GH REST API); fails gracefully if GITHUB_API_SECRETS_TOKEN is missing (returns ok=False, does not raise)
  - No subprocess.run, os.system, or CLI invocations in either adapter
  - Tests: mock Heroku API success, mock Heroku API 4xx, GH secret encrypted correctly, GH secret missing token returns ok=False

Risks:

  - #925 not yet landed: GH Actions adapter cannot be fully exercised in prod until GITHUB_API_SECRETS_TOKEN is provisioned. Mitigation: adapter degrades gracefully; the subscriber can be registered but disabled until the token lands.
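A CLI-free Heroku config-var push can be sketched with the standard library alone. This is a sketch under stated assumptions, not the adapter as it will ship: the context key heroku_api_token (which credential authenticates the PATCH is not specified above), the injectable send callable used for testing, and the destination label format are all hypothetical. The PATCH /apps/{app}/config-vars route and the version=3 Accept header are the Heroku Platform API's.

```python
import json
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AdapterResult:
    destination: str
    ok: bool
    error_message: Optional[str] = None


def _default_send(request: urllib.request.Request) -> int:
    """Perform the HTTP call and return the status code."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


class HerokuConfigVarAdapter:
    def __init__(self, app_name: str, *, send: Callable = _default_send):
        self.app_name = app_name
        self._send = send  # injectable so tests never touch the network

    def push(self, credential_name: str, new_value: str, context: dict) -> AdapterResult:
        destination = f"heroku-config-var:{self.app_name}"
        api_token = context.get("heroku_api_token")  # hypothetical context key
        if not api_token:
            return AdapterResult(destination, False, "missing heroku_api_token in context")
        request = urllib.request.Request(
            url=f"https://api.heroku.com/apps/{self.app_name}/config-vars",
            data=json.dumps({credential_name: new_value}).encode("utf-8"),
            method="PATCH",
            headers={
                "Accept": "application/vnd.heroku+json; version=3",
                "Authorization": f"Bearer {api_token}",
                "Content-Type": "application/json",
            },
        )
        try:
            status = self._send(request)
        except urllib.error.HTTPError as exc:  # must precede URLError (its parent class)
            return AdapterResult(destination, False, f"Heroku API returned HTTP {exc.code}")
        except urllib.error.URLError as exc:
            return AdapterResult(destination, False, f"Heroku API unreachable: {exc.reason}")
        if status != 200:
            return AdapterResult(destination, False, f"unexpected HTTP status {status}")
        # The token value is deliberately absent from every result message.
        return AdapterResult(destination, True)
```

Error messages carry only the status code or transport reason, which keeps the new token value out of anything that might later be written to structured logs.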


NV4: "Implement Cloudflare user-API-token vendor module + token-store adapter"

Parent: #907 | Milestone: M3 | Depends on: NV2 (adapter base)

This is the replacement for V9 (#916). Vendor module only (mint/validate/revoke); the Infisical write destination reuses InfisicalWriteAdapter from NV2 — no custom distribute logic needed.

Scope:

  - token_service/vendors/cloudflare_user_api_token.py: mint, validate, revoke
  - Companion-secret pattern retained for __CF_TOKEN_ID
  - CF auth error 10000 surfaced with remediation hint in error_message
  - Register CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN (and other CF tokens per vault taxonomy) as using this vendor module


Flow Runner — New Card

NV5: "Implement three-stage flow runner (testing / operational / revocation)"

Parent: #907 | Milestone: M2.5 | Depends on: NV1, NV2, #910 (schema with flow + stage columns), #911 (kickoff endpoint)

This is the replacement for V7 (#914) at the orchestration layer.

User story: As an operator triggering any rotation, I want the flow runner to execute the correct stage sequence for the requested flow so that I never need to manually coordinate mint, distribute, and revoke steps.

Scope:

Testing flow (Stage 1 only, no mutations):

  - Stage 1: authenticate against the vendor API using the current token; verify the token is visible and the caller has permission to manipulate it
  - Job status → tested; no job rows other than this job are modified
  - Terminal signal: pass/fail per credential

Operational flow (all three stages):

  - Stage 1: same as testing flow; if Stage 1 fails, abort before any mutations
  - Stage 2: call vendor.mint(); on success, fan out to all registered subscribers via adapters; update subscriber_snapshot with per-adapter results; if any adapter fails → completed_partial but continue to Stage 3
  - Stage 3: call vendor.validate(new_token) for each subscriber that succeeded; on all-pass, call vendor.revoke(old_token); job → completed; if validate fails → do NOT revoke; job → failed (old token still valid)
  - Terminal signal: completed (all subscribers updated + revoke executed) or completed_partial (some subscribers failed but revoke still executed) or failed (validate failed, old token preserved)

Revocation flow (revoke only, no mint):

  - No Stage 1 or Stage 2; skip directly to Stage 3
  - Stage 3: call vendor.revoke(current_token) only; wait for 401 from a validation probe against the old token to confirm revocation
  - Job → revoked on confirmation
  - Terminal signal: 401 from probe = confirmed revoked
  - Note: this is Kristerpher's insight: revocation is architecturally the same as Stage 3 of the operational flow, extracted as a standalone trigger

Acceptance criteria:

  - Testing flow for POSTMARK_SERVER_TOKEN: job lands in tested status; no Infisical mutation; Stage 1 result logged
  - Operational flow for POSTMARK_SERVER_TOKEN: job progresses pending → stage_1_verify → stage_2_mint_distribute → stage_3_validate_revoke → completed; all subscribers updated
  - Revocation flow for any credential: job progresses to revoked; probe confirms 401 from old token
  - If Stage 1 fails in operational flow, job → failed immediately; no mutations
  - If Stage 3 validate fails in operational flow, old token NOT revoked; job → failed; audit log captures old_token_preserved: true
  - rotation_jobs.stage column advances correctly at each transition
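The stage sequencing described in the scope and acceptance criteria can be sketched as a single function. This is a simplified illustration, not the runner's design: Stage 1 and the revocation probe both reuse vendor.validate (a False return stands in for a 401), Stage 3 validates the new token once rather than per subscriber, and Job, run_flow, FakeVendor, and the distribute callable are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

S1, S2, S3 = "stage_1_verify", "stage_2_mint_distribute", "stage_3_validate_revoke"


@dataclass
class Job:
    flow: str
    status: str = "pending"
    stage: Optional[str] = None
    stage_history: List[str] = field(default_factory=list)


def _enter(job: Job, stage: str) -> None:
    job.stage = stage
    job.stage_history.append(stage)


def _finish(job: Job, status: str) -> Job:
    job.status = status
    return job


def run_flow(job: Job, vendor, current_token: str,
             distribute: Callable[[str], List[bool]]) -> Job:
    """distribute(new_token) returns one ok flag per subscriber in the snapshot."""
    if job.flow not in ("testing", "operational", "revocation"):
        raise ValueError(f"unknown flow: {job.flow}")
    if job.flow == "revocation":
        _enter(job, S3)  # Stages 1 and 2 are skipped entirely
        vendor.revoke(current_token)
        still_valid = vendor.validate(current_token)  # probe; False models the 401
        return _finish(job, "failed" if still_valid else "revoked")
    _enter(job, S1)
    if not vendor.validate(current_token):
        return _finish(job, "failed")  # abort before any mutation
    if job.flow == "testing":
        return _finish(job, "tested")
    _enter(job, S2)
    new_token = vendor.mint()
    outcomes = distribute(new_token)
    if not any(outcomes):
        return _finish(job, "failed")  # every adapter failed (see NV2)
    _enter(job, S3)
    if not vendor.validate(new_token):
        return _finish(job, "failed")  # old token is NOT revoked
    vendor.revoke(current_token)
    return _finish(job, "completed" if all(outcomes) else "completed_partial")


class FakeVendor:
    """In-memory stand-in for a vendor module (mint/validate/revoke)."""

    def __init__(self, valid_tokens):
        self.valid = set(valid_tokens)

    def mint(self) -> str:
        self.valid.add("new")
        return "new"

    def validate(self, token: str) -> bool:
        return token in self.valid

    def revoke(self, token: str) -> None:
        self.valid.discard(token)
```

For example, run_flow(Job("testing"), FakeVendor({"old"}), "old", lambda t: []) leaves the vendor untouched and returns a job whose stage_history holds only stage_1_verify.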

Risks:

  - Stage 3 validate/revoke atomicity: validate passes but revoke fails (vendor API error). Mitigation: revoke is retried up to 3 times; if revoke fails after retries, job → completed_partial; old token remains valid alongside new token; alert operator.
  - Subscriber partial failure at Stage 2: one destination fails but revoke still executes. Mitigation: completed_partial status makes partial failure visible; runbook documents the recovery procedure.
  - Testing flow false negative: Stage 1 probe fails not because the token is invalid but because the vendor API is temporarily down. Mitigation: surface the error message from the vendor API in the testing flow result; operator can distinguish network errors from auth errors.
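The revoke-retry mitigation can be sketched as a small wrapper. It assumes "retried up to 3 times" means one initial attempt plus at most three retries; the function name, the alert callable, and the absence of any backoff between attempts are simplifications for illustration.

```python
from typing import Callable


def revoke_with_retry(revoke: Callable[[str], None], old_token: str,
                      alert: Callable[[str], None], max_retries: int = 3) -> str:
    """Return the job status after attempting to revoke the old token.

    'completed' when any attempt succeeds; 'completed_partial' when the initial
    attempt and every retry fail, in which case the operator is alerted because
    the old token remains valid alongside the new one.
    """
    last_error = None
    for _attempt in range(1 + max_retries):
        try:
            revoke(old_token)
            return "completed"
        except Exception as exc:  # vendor API error; try again
            last_error = exc
    alert(f"revoke failed after {max_retries} retries: {last_error}")
    return "completed_partial"
```

A production version would likely add a delay between attempts and record each attempt in the audit log.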


Vendor Modules — New Cards

NV6: "Postmark vendor module (validate-only; mint is pre-staged)"

Parent: #907 | Milestone: M3 | Depends on: NV5

Thin card. The Postmark-specific logic from V7 (#914) that is not superseded by the flow runner: the validate() call and the pre-staged mint pattern. No distribute logic (that is InfisicalWriteAdapter).


NV7: "Heroku vendor module (mint/validate/revoke) — extracted from V8-v2"

Parent: #907 | Milestone: M3 | Depends on: NV5

The Heroku OAuth mint, GET /account validate, and authorization delete revoke — extracted as a vendor module. The config-var distribution is NV3's adapters. This is the replacement for V8 (#915) at the vendor logic layer.


UI Cards — New Cards (4 cards)

NV8: "Build rotation modal — three-stage progress UI with per-stage status indicators"

Parent: #907 | Milestone: M3 | Depends on: NV5, #911 (poll endpoint returns stage)

User story: As a console operator, I want the rotation modal to show which stage is currently executing so that I understand what is happening and can react if something stalls.

Scope:

  - Three visual stages in the modal: Stage 1 (Verify), Stage 2 (Mint + Distribute), Stage 3 (Validate + Revoke)
  - Each stage: pending (gray), in-progress (animated), success (green check), failed (red x with error message)
  - Modal polls GET /tokens/{name}/rotations/{job_id} for stage and status updates
  - Stage 2 shows a subscriber table: each subscriber row updates in real-time as adapter results come in from subscriber_snapshot
  - Banner color: red for prod rotation, purple for staging (per console env-switcher memory)
  - Flow label displayed in modal header: "Testing" / "Operational" / "Revocation"

Acceptance criteria:

  - Stage indicators advance correctly as the job progresses through stages
  - Subscriber table renders all entries from subscriber_snapshot; failed rows show error_message
  - Revocation flow modal shows only Stage 3 section (Stages 1 and 2 not rendered)
  - Testing flow modal shows only Stage 1 section with pass/fail indicators
  - Modal is accessible from the Security > Token Management view
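The mapping from a poll response to per-stage indicators can be sketched as a pure function, written here in Python for consistency with the rest of the service even though the modal itself lives in the console UI. The function name is hypothetical, and two choices are assumptions: completed_partial renders the final stage as success, and any status outside the known terminal values is treated as in progress.

```python
from typing import Dict, Optional

VISIBLE_STAGES = {
    "testing": ["stage_1_verify"],
    "operational": ["stage_1_verify", "stage_2_mint_distribute", "stage_3_validate_revoke"],
    "revocation": ["stage_3_validate_revoke"],
}
SUCCESS_STATUSES = {"tested", "completed", "completed_partial", "revoked"}


def stage_indicators(flow: str, stage: Optional[str], status: str) -> Dict[str, str]:
    """Return {stage_name: 'pending' | 'in_progress' | 'success' | 'failed'}.

    Only the stages rendered for the given flow appear in the result.
    """
    visible = VISIBLE_STAGES[flow]
    if stage is None:  # job created but the runner has not entered a stage yet
        return {name: "pending" for name in visible}
    current = visible.index(stage)
    indicators = {}
    for position, name in enumerate(visible):
        if position < current:
            indicators[name] = "success"
        elif position > current:
            indicators[name] = "pending"
        elif status == "failed":
            indicators[name] = "failed"
        elif status in SUCCESS_STATUSES:
            indicators[name] = "success"
        else:
            indicators[name] = "in_progress"
    return indicators
```

Because the function depends only on flow, stage, and status, it can be unit-tested against recorded poll payloads without rendering the modal.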


NV9: "Subscriber-table view in Security console — per-credential registry inspector"

Parent: #907 | Milestone: M3 | Depends on: NV1 (subscriber registry)

User story: As an operator, I want to see which systems are registered as subscribers for a given credential so that I know all the destinations that will receive an update on the next rotation.

Scope:

  - New panel in Security > Token Management: "Subscribers" tab per credential
  - Lists consumer_name, adapter_class, enabled toggle (operator can disable without deleting)
  - Shows last push result from subscriber_snapshot of the most recent completed job
  - Operator can add a subscriber via UI (pre-populates adapter_config form based on adapter_class selection)


NV10: "Type-to-confirm gate for Operational and Revocation flows"

Parent: #907 | Milestone: M3 | Depends on: NV8

User story: As a console operator, I want a type-to-confirm dialog before a mutation-bearing rotation (or revocation) fires so that accidental clicks on a prod credential do not trigger irreversible actions.

Scope:

  - Before Stage 2 begins in Operational flow: modal prompts Type the credential name to confirm; entry must match exactly; then Stage 2 proceeds
  - Before Revocation flow starts: same gate; prompt text includes: "This will immediately terminate the token. This action cannot be undone."
  - Testing flow: no confirmation gate (read-only; no mutations)
  - Gate is in the UI only; the API layer does not enforce it (the console operator role implies trust; the UI gate is a UX safeguard, not a security boundary)
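The gate logic itself is small enough to sketch directly. It is shown in Python for consistency with the service code, though the real gate would live in the console UI; the function names are hypothetical, and "match exactly" is taken to mean a case-sensitive comparison with no whitespace trimming.

```python
from typing import Optional

REVOCATION_WARNING = "This will immediately terminate the token. This action cannot be undone."
BASE_PROMPT = "Type the credential name to confirm"


def requires_confirmation(flow: str) -> bool:
    """Only mutation-bearing flows are gated; the testing flow is read-only."""
    return flow in ("operational", "revocation")


def confirmation_prompt(flow: str) -> Optional[str]:
    """Return the prompt text for the flow, or None when no gate applies."""
    if not requires_confirmation(flow):
        return None
    if flow == "revocation":
        return f"{REVOCATION_WARNING} {BASE_PROMPT}"
    return BASE_PROMPT


def is_confirmed(credential_name: str, typed: str) -> bool:
    """Exact match only: no trimming, no case folding."""
    return typed == credential_name
```

Since the API layer does not enforce the gate, these checks protect against mis-clicks only and should not be relied on as an authorization control.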


NV11: "Update operator runbook + bus-adapter-author guide for v2 architecture"

Parent: #907 | Milestone: M3 | Depends on: NV5, NV6, NV7, NV8

This replaces V11 (#918) scope. Same deliverable location (docs/architecture/velvet/), rewritten for v2 concepts.

Scope changes vs. V11:

  - "Handler-author guide" → "Bus-adapter-author guide": vendor module interface, adapter interface, subscriber registration steps
  - Runbook additions: three-flow SOP (testing, operational, revocation), subscriber registry inspection, completed_partial recovery, revocation confirmation verification
  - Reference docs/architecture/velvet/v2-rotation-flows.md throughout once that doc lands


Card Count Summary

| Category | v1 cards | v2 cards | Net change |
| --- | --- | --- | --- |
| Infrastructure (keep/revise) | 5 | 5 | 0 |
| Infrastructure (new: service bus) | 0 | 4 (NV1-NV4) | +4 |
| Flow runner | 0 | 1 (NV5) | +1 |
| Vendor modules | 3 (V7-V9) | 3 (NV6, NV7, NV4) | 0 (replaced) |
| UI | 0 | 4 (NV8-NV11) | +4 |
| SSM (M2) | 1 | 1 | 0 |
| Migration + docs | 2 (V10, V11) | 2 | 0 |
| Total | 11 | 20 | +9 |

4. Three-Milestone Re-Plan

M1 — Was: read-through proxy. Remains: read-through proxy.

M1 scope does not change under v2. The Heroku app pair, Postgres schema, read proxy, rotate kickoff stub, and auth middleware are all v2-compatible as filed. The schema additions for v2 (flow, stage, subscriber_snapshot columns) are small enough to land in V3 revision before M1 is cut.

M1 success gate (unchanged): GET /tokens/POSTMARK_SERVER_TOKEN with a valid service token returns correct metadata from Velvet staging.

M1 v2 additions:

  - rotation_jobs schema includes flow, stage, subscriber_snapshot columns (V3 revision)
  - Kickoff endpoint accepts flow param (V4 revision); revoke endpoint question (OD-1) resolved before M1 is cut


M2 — Was: SSM integration. Remains: SSM integration.

M2 scope does not change. It is a backing-store concern that is orthogonal to the bus architecture.

M2 success gate (unchanged): SSM-backed credential readable through GET /tokens/{name}/value.

Hard gate still in effect: D2 (SSM path convention) must be confirmed before M2 starts.


M2.5 — New milestone: service-bus foundation

Scope: NV1 (subscriber registry), NV2 (adapter base + Infisical write adapter), NV3 (Heroku + GH Actions adapters), NV5 (flow runner, testing + operational flows), and the subscriber pre-registration seed migration proposed under NV1.

M2.5 success gate: Operational flow for HEROKU_API_KEY in staging completes with all four Heroku config-var adapters updated and GH Actions secret updated; rotation_jobs row shows completed with correct stage history in subscriber_snapshot.

This milestone de-risks the entire bus model before any UI is built. The UI cards (NV8-NV10) can be developed against a working API.


M3 — Was: first rotation end-to-end (Postmark). Becomes: all three flows live, Postmark end-to-end in UI.

v2 M3 scope: NV6 (Postmark vendor module), NV8 (three-stage modal UI), NV9 (subscriber-table view), NV10 (type-to-confirm), NV11 (runbook v2), #917 (first console callsite migration)

M3 success gate: Console operator can trigger a Testing flow, Operational flow, and Revocation flow for POSTMARK_SERVER_TOKEN from the UI. All three flows complete correctly, modal reflects per-stage progress, subscriber table shows Infisical write result. Revocation flow terminates the old token and the probe confirms 401.


M4 is a new milestone: revocation flow hardening. The revocation flow is architecturally simpler than the operational flow (Stage 3 only) and is UI-complete at the end of M3, but it deserves a standalone hardening milestone before it becomes a first-class operator tool.

M4 scope:

  - Revocation flow hardened to production: type-to-confirm gate live, audit event vault.rotation.revoked fires, probe confirmation logged
  - Revocation flow available for all registered credentials (not just Postmark)
  - completed_partial recovery runbook finalized
  - Load test: concurrent revocation + rotation on different credentials does not deadlock

M4 gate: M3 complete; no M4 card is started before M3 success gate is passed.


5. Risks and Open Decisions

Each of these needs a binary answer from Kristerpher before the corresponding card is dispatched. The same questions are flagged in docs/architecture/velvet/v2-rotation-flows.md, so a single answer resolves both docs.

| ID | Question | Cards gated | Options |
| --- | --- | --- | --- |
| OD-1 | Is revocation a separate endpoint (POST /tokens/{name}/revoke) or the rotate endpoint with flow=revocation? | V4 revision, NV5, NV8, NV10 | A) Separate endpoint (cleaner semantics; revocation is never a rotation); B) flow param on /rotate (one fewer route; matches the v2 model where revocation re-uses Stage 3 logic) |
| OD-2 | What is the completed_partial policy? If some subscribers fail in Stage 2 but validate passes, does Stage 3 revoke still execute? | NV5 | A) Yes: revoke always executes if validate passes, regardless of subscriber partial failure; B) No: all subscribers must succeed before revoke; partial failure holds old token valid |
| OD-3 | Does the subscriber registry live in Postgres (on the Velvet app) or in Infisical metadata? | NV1 | A) Postgres (consistent with rotation_jobs; queryable via SQL; auditable); B) Infisical metadata (no new table; but couples the bus registry to the secret store being rotated, which is a circular dependency risk) |
| OD-4 | What is the Testing flow output surface? Job row only (operator reads from GET /rotations/{id}) or does it push a structured result to the console Status page? | NV5, NV8 | A) Job row only (operator pulls); B) Testing flow result is a lightweight "health probe" event surfaced on the Console Status page |
| D2 | SSM path convention: confirm /raxx/{env}/{vendor}/{name} before M2 starts | #913 (V6) | Pre-existing open decision; needs explicit Y/N |
| D3 | Auth model: confirm per-caller scoped tokens vs. single global key before V5 (#912) is finalized | #912 (V5) | Pre-existing open decision |

6. Coordination with Architect

The architect agent is producing docs/architecture/velvet/v2-rotation-flows.md in parallel. This PM doc should be cross-referenced from that doc, and vice versa. Once the architect's doc lands:

  1. The stage definitions in NV5 (flow runner acceptance criteria) should align exactly with the flow diagrams in v2-rotation-flows.md sections covering the testing, operational, and revocation flows.
  2. The schema additions proposed for V3 (Section 2, V3 disposition) should be reconciled against whatever data model the architect's doc specifies for job state tracking.
  3. Open decisions OD-1, OD-2, OD-3, and OD-4 above should appear as the same questions in v2-rotation-flows.md. If the architect's doc resolves any of them structurally, update the OD table above and un-gate the corresponding cards immediately.
  4. Section 5 of this doc (Risks and Open Decisions) should be read alongside whatever "open decisions" section the architect's doc carries. Kristerpher should answer once; both docs cite the resolution.

7. Migration Timing

V1 Heroku handler (#906 + #934) — deprecation path

The Heroku Mode A handler that shipped in PR #906 and PR #934 is operational and correct (CLI-free, HTTP-only). It should remain active until the Velvet v2 bus is live in production with the Heroku config-var adapter and GH Actions secret adapter both confirmed working end-to-end (M2.5 success gate).

At that point:

  - Add a file-level deprecation comment to console/app/services/rotation_handlers/heroku.py:

      # DEPRECATED: Superseded by token_service/vendors/heroku.py + HerokuConfigVarAdapter (Velvet v2).
      # Do not extend. Remove after Velvet M2.5 is verified in production.
      # Tracked: https://github.com/raxx-app/TradeMasterAPI/issues/907

  - The console rotation UI stays on this handler until V10 (#917) console callsite migration lands
  - Do NOT remove the handler before V10 is merged and smoke-tested in production

CF Access service token handler — no deprecation yet

The CF Access service token provisioning SOP (docs/ops/runbooks/cf-access-service-token-provisioning.md) is a manual runbook, not a handler. It will eventually be superseded by a Velvet Cloudflare-vendor + Infisical-write adapter workflow (NV4), but that is M3 scope. No deprecation action now.

v1 scope doc annotation

Once docs/architecture/velvet/scope.md is created, annotate the following sections:

| v1 scope section | v2 status |
| --- | --- |
| "Rotation Handler Abstraction" (four-function interface) | Superseded by v2 vendor module + bus adapter pattern; refer to v2-rotation-flows.md |
| "M3 — First rotation end-to-end" success gate | Updated by v2 M3 re-plan; see Section 4 of this doc |
| Handler registry HANDLER_REGISTRY = {name: handler} | Superseded by subscriber registry + adapter class mapping (NV1); refer to v2-rotation-flows.md |
| Distribute step inside handler | Superseded by bus adapter fan-out in flow runner Stage 2; refer to NV2/NV3 |

Sections not affected: M1, M2, app pair scaffold, Postgres state machine basics, auth model, audit log shape.


8. Summary for Kristerpher

Top-3 blast-radius changes:

  1. V7/V8/V9 (#914/#915/#916) — DROP and replace with bus-adapter split. Three vendor modules + four bus adapter cards + one flow runner card replace three monolithic handler cards. This is the largest structural change and cannot be undone once the subscriber registry is seeded.

  2. V3/V4 (#910/#911) — schema and kickoff endpoint must be revised before M1 is cut. The flow, stage, and subscriber_snapshot columns + the flow param on the kickoff endpoint are foundational. Every card that comes after M1 assumes these fields exist. A rework after M1 deploys would require a Postgres migration on a live app.

  3. Revocation flow endpoint shape (OD-1) — decision gates the entire UI layer. If revocation is a separate endpoint (/revoke), the modal, type-to-confirm, and audit log events are named differently than if it is a flow param on /rotate. This decision should be made before NV8 (modal UI) is dispatched.

Six decisions need answers before dispatch begins: OD-1 through OD-4 plus the pre-existing D2 and D3. None are hard to answer, and most are binary. Recommend that Kristerpher review Section 5 and mark each one in a comment on #907.

No new issues filed yet per instructions. This doc is the review artifact. File-and-dispatch on Kristerpher's go.