ADR 0129 — RBAC V2 Blueprint Cutover Rollout Strategy
Status: Accepted
Date: 2026-06-18 UTC
Deciders: Kristerpher (operator), software-architect
Scope: Console service — all 17 blueprint files in console/app/blueprints/
Context
The Console has 141 legacy @require_role(...) call sites that resolve against the flat
four-level admin_roles table (superadmin / ops / support / readonly). RBAC V2 tables
and fine-grained decorators exist in console/app/middleware/rbac.py. Issue #1473
(operator-authorized for pre-launch, 2026-06-18) asks: how do we cut over 141 sites
safely without a single all-or-nothing mega-PR that cannot be rolled back at the route
level?
Two questions drove this ADR:
- Should the cutover use a runtime flag-check inside each decorator (live-flip) or a deployment-time switch (redeploy)?
- Should the cutover happen in a single PR or a phased cluster-by-cluster sequence?
Decision
The cutover uses direct decorator replacement in phased blueprint clusters, with
FLAG_RBAC_V2 as a deployment gate (not a runtime gate). Shadow dual-mode code is
removed in the first sub-card. Each cluster ships independently to staging and soaks
before promotion to prod. Rollback is via tagged-SHA redeploy, not flag flip.
The per-route permission mapping in docs/architecture/rbac-blueprint-cutover.md §3 is
the authoritative correctness artifact. Every sub-card must produce integration tests
proving 200/403 behaviour before merging.
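A sub-card's 200/403 proof could take roughly the following shape. This is a minimal sketch without Flask. The names require_permission, PERMISSIONS, AUDIT_LOG, and the permission string secrets.read are hypothetical stand-ins for the V2 decorator in console/app/middleware/rbac.py, the rbac_* tables, and _write_access_denied, whose real signatures are not given in this ADR. The session is passed as an explicit argument only to keep the example self-contained.

```python
from functools import wraps

# Hypothetical stand-ins: the real V2 decorator reads the rbac_* tables per
# request and calls _write_access_denied(); neither signature appears in this ADR.
PERMISSIONS = {"admin-1": {"secrets.read"}, "admin-2": set()}
AUDIT_LOG = []


def require_permission(permission):
    """Allow the call only if the session's admin holds `permission`."""
    def decorator(view):
        @wraps(view)
        def wrapper(session, *args, **kwargs):
            admin_id = session.get("admin_id")
            if permission not in PERMISSIONS.get(admin_id, set()):
                # Every denial writes an access_denied audit row.
                AUDIT_LOG.append({
                    "event": "access_denied",
                    "admin_id": admin_id,
                    "permission": permission,
                })
                return "forbidden", 403
            return view(session, *args, **kwargs)
        return wrapper
    return decorator


@require_permission("secrets.read")
def list_secrets(session):
    return "ok", 200


def test_allowed_admin_gets_200():
    assert list_secrets({"admin_id": "admin-1"}) == ("ok", 200)


def test_denied_admin_gets_403_and_audit_row():
    AUDIT_LOG.clear()
    body, status = list_secrets({"admin_id": "admin-2"})
    assert status == 403
    assert AUDIT_LOG == [{
        "event": "access_denied",
        "admin_id": "admin-2",
        "permission": "secrets.read",
    }]
```

Pairing one allowed and one denied caller per route keeps each test traceable to a single row of the per-route mapping, and asserting on the audit row checks the audit-trail guarantee in the same pass.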
Language choice rationale
Skipped. This ADR governs an operational/authz rollout decision, not a new service.
Consequences
Positive
- Each sub-card PR is independently reviewable and has a small diff.
- Rollback is unambiguous: tagged SHA, not a flag state that may be stale across dynos.
- No runtime overhead of a per-request flag check inside every decorator invocation.
- TOTP elevation chains are unchanged — the new decorator replaces only the session/role gate, not the TOTP layer.
- Audit trail is preserved: _write_access_denied() fires in V2 decorators identically to legacy.
Negative / risks
- FLAG_RBAC_V2 cannot live-flip between old and new decorator behaviour on a running dyno. This is a known limitation: decorators are applied once, when the blueprint modules are imported, not on each request.
- The per-route mapping table is a new artifact that can drift from code if not maintained. The S10 CI lint gate (grep '@require_role' finding zero matches) is the enforcement mechanism.
- 4 AMBIGUOUS mappings require operator decisions before 4 of the 10 sub-cards can be claimed (S1, S3, S4, S7). These are explicit blocking dependencies, not implicit ones.
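The S10 lint gate could be approximated by a small script like the one below. This is a sketch, not the actual CI job: the function names and the default root path are assumptions, and unlike a plain grep it matches only lines where @require_role is used as a decorator, so a comment that mentions the name would not fail the build.

```python
import re
from pathlib import Path

# Matches "@require_role" or "@require_role(...)" at the start of a line,
# but not longer names such as "@require_role_v2".
LEGACY_DECORATOR = re.compile(r"^\s*@require_role\b")


def find_legacy_call_sites(root):
    """Return (path, line_number, line) for each legacy decorator under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8")
        for number, line in enumerate(text.splitlines(), start=1):
            if LEGACY_DECORATOR.match(line):
                hits.append((str(path), number, line.strip()))
    return hits


def main(root="console/app/blueprints"):
    """Print every remaining call site; return 1 if any exist, else 0."""
    hits = find_legacy_call_sites(root)
    for path, number, line in hits:
        print(f"{path}:{number}: {line}")
    return 1 if hits else 0
```

A CI step would pass the return value of main() to sys.exit so that any remaining legacy call site fails the job.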
Neutral
- The dual-mode shadow report code (rbac_dual_mode.py import chain) is deleted in S1. This removes observability data about legacy/V2 divergence. The shadow data served its purpose during the parallel-run phase; removing it is correct per #1473 AC.
- The console-ops role (migration 0012) has no group assignment. It is not used by any mapping in rbac-blueprint-cutover.md. This is an open question (#10/OQ-5) but does not block the cutover.
Alternatives considered
Alternative A: Runtime flag-check inside each decorator
Each route registers a wrapper that checks FLAG_RBAC_V2 at request time and branches
to the legacy or V2 check. This enables live flip without redeploy.
Rejected because: decorator stacks are applied once, at import time, so a per-request flag check would require wrapping every route in an additional callable. That significantly increases code complexity and introduces a new surface for decorator-ordering bugs. The existing shadow-check mechanism already demonstrated the fragility of this pattern (it had to be lazy-imported to avoid circular imports). The operational benefit of a live flip is also low: changing a config var on Heroku triggers a dyno restart anyway, so the latency difference between flag-flip-restart and SHA-redeploy is seconds.
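The difference between import-time and request-time evaluation can be illustrated with a small sketch. Both gate decorators below are hypothetical and exist only to show the timing. The ADR's chosen approach replaces decorators directly and does not specify how FLAG_RBAC_V2 is read at deploy time, and the "true"/"false" value format is an assumption.

```python
import os
from functools import wraps


def flag_enabled():
    # Assumed encoding of the flag; the real format is not stated in this ADR.
    return os.environ.get("FLAG_RBAC_V2", "false").lower() == "true"


def import_time_gate(view):
    """Choose the mode once, when the decorator is applied (module import)."""
    mode = "v2" if flag_enabled() else "legacy"

    @wraps(view)
    def wrapper(*args, **kwargs):
        return mode, view(*args, **kwargs)
    return wrapper


def runtime_gate(view):
    """Rejected pattern: re-read the flag and branch on every call."""
    @wraps(view)
    def wrapper(*args, **kwargs):
        mode = "v2" if flag_enabled() else "legacy"
        return mode, view(*args, **kwargs)
    return wrapper


os.environ["FLAG_RBAC_V2"] = "false"


@import_time_gate
def frozen_route():
    return "ok"


@runtime_gate
def live_route():
    return "ok"


os.environ["FLAG_RBAC_V2"] = "true"  # simulate a flip after import

print(frozen_route())  # ('legacy', 'ok'): the choice was fixed at decoration time
print(live_route())    # ('v2', 'ok'): the extra wrapper sees the new value
```

The runtime variant only works because every route carries an extra wrapper that consults the flag per request, which is the added layer this alternative was rejected for.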
Alternative B: Single mega-PR, all 141 sites at once
All blueprints are ported in one PR. Reviewed once, merged once.
Rejected because: A 141-site change with no per-blueprint granularity is unreviewable, cannot be rolled back at the route level, and creates a single point of failure. A wrong mapping in secrets.py would require reverting all blueprints. The phased approach allows secrets (the highest-privilege blueprint) to ship last, after all other clusters have soaked.
Security / GDPR checklist
- PII collected: None. Decorator reads admin_id (UUID) from session, not email or PII.
- Retention period: Audit rows written by _write_access_denied follow existing audit_log retention policy (unchanged).
- Deletion on DSR: No new tables. Existing admin_id foreign key in audit_log is already covered by the DSR deletion path.
- Audit trail: Every 403 writes an access_denied audit row. Both legacy and V2 decorators call _write_access_denied. No reduction in audit coverage.
- Stored credentials: None. V2 reads live from rbac_* tables per request; no caching of role assertions.
- Breach notification path: Unchanged from existing console breach path.
- Secrets location + rotation: No secrets introduced. The FLAG_RBAC_V2 env var is a boolean toggle in Heroku config, not a secret.
- Kill-switch: Tagged SHA redeploy (rbac-legacy-baseline tag) restores legacy decorator behaviour within one Heroku deploy cycle.
Revisit when
- A second operator is added to the team, triggering the first real group-membership change and surfacing any gaps in the group→role assignment seed data.
- The admin_roles table is dropped (post-30d soak card following this cutover), at which point the legacy path ceases to exist and this ADR is fully executed.
- Queue Phase 1 ships and identity ownership moves to Queue service, potentially requiring the RBAC resolution path to cross service boundaries.