Raxx · internal docs


Console Environment Switcher — Design

Status: Draft
Owner: software-architect
Created: 2026-04-28 UTC
Parent epic: #353
Related ADRs: 0024, 0025
Related docs: console.md, console-dashboard.md, rbac-design.md


1. Context

console.raxx.app is a single deployment that operates against two infrastructure environments: prod and staging. An operator can switch context via a dropdown; a colored banner (red = prod, blue-purple = staging) confirms the active environment on every authenticated page.

This design answers the five architectural questions raised in #353 and produces implementation-sized sub-cards for feature-developer.

Environments mental model:

  - prod: mirrors customer-facing infrastructure; mutations here affect real users and real resources.
  - staging: playground; operators experiment without risk to production.

The console deployment itself does not change. Only the target of its API calls (Raptor base URL, Infisical environment path, Heroku app identifier) changes when the operator switches.


2. Invariants

All platform invariants apply. Env-switcher-specific:

  1. Fail-closed on environment mismatch. If a mutating route requires env=prod and the session holds env=staging, the request returns 403. The operator is not redirected silently or warned-and-continued. Hard stop, explicit error.
  2. No stored credentials. This design introduces no mechanism to store secrets, API keys, or any replayable values. Env-selection is a routing key, not a credential.
  3. Every mutation records selected_env in the audit log. The console_audit_log.context JSONB already exists; selected_env is a new key inside it. No new column on console_audit_log is required.
  4. Prod is the default. CONSOLE_DEFAULT_ENV env var controls the default; its value defaults to prod when unset. Rationale: operators who forget to switch still see prod (most critical env) rather than silently operating against staging believing it to be prod.
  5. Superadmin-only for prod-targeting mutations. Any route that mutates prod infrastructure requires the operator to hold the console-env-admin role (per rbac-design.md §4.1, that role is assigned only to raxx-platform-admins group, which today maps to the superadmin flat role). Switching context to prod is unrestricted by role — any authenticated operator can see the prod banner. Only executing a prod-targeted mutation is gated.
  6. Audit trail for every state change. Switching env is a low-severity state change and is logged to console_audit_log with action: console.env.switch.

3. Answering the Five Architectural Questions

Q1 — Session vs DB column for selected_env

Decision: session-resident, with CONSOLE_DEFAULT_ENV env var for the per-deployment default.

Rationale in ADR-0024. Summary: a DB column that persists across sessions adds synchronization surface (what happens when an admin is active in two tabs, or on two devices, at once?), a schema migration, and a DSR-erasure consideration. The benefit, remembering which env a returning admin prefers, does not outweigh that cost. Session-resident means the selection is scoped to a single session (shared by every tab on that session, see §4.1) and resets on session expiry (8h fixed window). That is safer: an operator who returns after expiry must consciously choose an env context rather than silently resuming where they left off.

The CONSOLE_DEFAULT_ENV env var (default: prod) allows the deployment to enforce a consistent starting point without per-user persistence. If a future stakeholder decision requires per-user default persistence, a console_admins.default_env column can be added as a non-breaking migration and read at session creation time; this design does not preclude it.
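
The default-resolution rule can be sketched as follows. This is a minimal illustration, not the implementation: the function name resolve_default_env and the choice to reject unknown values (rather than fall back to prod) are assumptions of this sketch.

```python
import os

VALID_ENVS = ("prod", "staging")


def resolve_default_env(environ=None):
    """Return the env a new session starts in.

    An unset or empty CONSOLE_DEFAULT_ENV resolves to "prod" (invariant 4).
    Raising on an unknown value is an assumption of this sketch.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("CONSOLE_DEFAULT_ENV") or "prod"
    if value not in VALID_ENVS:
        raise ValueError(
            f"CONSOLE_DEFAULT_ENV must be one of {VALID_ENVS}, got {value!r}"
        )
    return value
```

Passing the mapping explicitly (instead of always reading os.environ) keeps the rule easy to unit-test without touching process state.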

Q2 — RBAC model for env selection

Decision: switch itself is unrestricted; prod-targeted mutations require console:env:mutate_prod permission.

Two separate concerns:

  1. Switching the visible env (reading prod data, seeing the prod banner): any authenticated admin may switch to either env. Seeing prod status data is not privileged.
  2. Executing a mutation against prod: requires the console:env:mutate_prod permission, which is held by the console-env-admin role (current mapping: superadmin). All other roles may execute mutations only when selected_env = staging.

The @require_env_match(env) decorator enforces this. When env = "prod", the decorator checks both session.selected_env == "prod" AND current_admin.has_permission("console:env:mutate_prod"). When env = "staging", only session.selected_env == "staging" is checked — no extra role required.

This model is additive over existing RBAC: existing @require_role("superadmin") decorators on rotation endpoints remain. The env check is a second gate, not a replacement.
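
The gate logic described above can be sketched framework-free, so it can be tested without Flask. The Admin stand-in, the env_gate name, and the "missing_permission" error code are hypothetical; the design only specifies the env_mismatch body.

```python
from dataclasses import dataclass, field
from typing import Optional

MUTATE_PROD = "console:env:mutate_prod"


@dataclass
class Admin:
    """Hypothetical stand-in for current_admin."""
    permissions: set = field(default_factory=set)

    def has_permission(self, name: str) -> bool:
        return name in self.permissions


def env_gate(required_env: str, selected_env: str, admin: Admin) -> Optional[dict]:
    """Return None when the request may proceed, else a 403 error body."""
    if selected_env != required_env:
        return {
            "error": "env_mismatch",
            "required_env": required_env,
            "current_env": selected_env,
        }
    # Only prod-targeted routes need the extra permission.
    if required_env == "prod" and not admin.has_permission(MUTATE_PROD):
        # Error code is an assumption; the design does not name one.
        return {"error": "missing_permission", "permission": MUTATE_PROD}
    return None
```

A real @require_env_match decorator would presumably call something like this with session.selected_env and current_admin, then translate a non-None result into a 403 response.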

Q3 — Cross-env mutations

Decision: disable the action in the UI when the mutation would affect both environments; expose a clear reason.

A "cross-env mutation" is one where the same credential or resource exists in both environments and changing it in one changes it in both. The canonical example is a shared Infisical secret that propagates to both prod and staging consumers.

Design rule: the UI inspects SecretMeta.affected_sites (from the console-dashboard design). If a secret affects sites in both prod and staging environments, the "Rotate now" button is replaced with a disabled button labeled "Affects both environments — use Infisical directly." An explanatory tooltip is shown on hover. No modal, no confirmation flow — the action is simply unavailable through this UI path.

The deliberate choice is to make cross-env mutations a conscious manual operation (operator goes to Infisical vault, performs the rotation there) rather than embedding a complex "affect both envs" flow in the console. This eliminates a whole class of bugs and confusion.
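
A minimal sketch of the cross-env check follows. The shape of SecretMeta.affected_sites is not defined in this document, so the Site dataclass with an env attribute, the rotate_button_state name, and the "cross_env" reason key are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Site:
    """Hypothetical element of SecretMeta.affected_sites."""
    name: str
    env: str  # "prod" or "staging"


def rotate_button_state(affected_sites: List[Site]) -> dict:
    """Decide whether the "Rotate now" button is offered for a secret."""
    envs = {site.env for site in affected_sites}
    if {"prod", "staging"} <= envs:
        # Secret reaches both environments: action is unavailable in the console.
        return {"enabled": False, "reason": "cross_env"}
    return {"enabled": True, "reason": None}
```

The template would then map reason == "cross_env" to the disabled button label and tooltip described above.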

Q4 — In-flight action on mid-flow switch

Decision: switch is stateless; in-flight actions complete against the env they were started in.

The selected_env on the session is read at request dispatch time by the @require_env_match decorator and stamped into the audit log. If an operator submits a rotation form targeting prod and then — in a different tab — switches to staging, the in-flight rotation request carries its env context in the request itself, not by re-reading the session mid-flight.

Implementation: the mutating form (rotation confirm modal, deploy trigger, etc.) embeds a hidden _target_env field equal to session.selected_env at the time the form was rendered. The @require_env_match decorator validates that _target_env == session.selected_env at the time of the POST. If they differ (tab 1 shows prod form, tab 2 switched to staging before tab 1 submitted), the POST returns 409 Conflict with body {"error": "env_switched_mid_flow", "expected": "prod", "current": "staging"}. The UI handles 409 by showing "Environment was changed mid-flow. Please review and resubmit."
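
The _target_env comparison can be sketched as a pure function. The name check_target_env and the (status, body) return shape are assumptions; the 409 body mirrors the one given above.

```python
from typing import Optional, Tuple


def check_target_env(form_target_env: str, session_env: str) -> Optional[Tuple[int, dict]]:
    """Compare the env the form was rendered for with the session's env at POST time.

    Returns None when they agree, else the 409 status and error body.
    """
    if form_target_env == session_env:
        return None
    return 409, {
        "error": "env_switched_mid_flow",
        "expected": form_target_env,  # env when the form was rendered
        "current": session_env,       # env on the session now
    }
```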

HTMX-specific: the HTMX fragment for the banner is swapped on every navigation via hx-swap-oob (out-of-band swap), keeping the banner in sync without a full page reload. This ensures the banner always reflects the session's current env without race conditions.

Q5 — Test fixture pattern

The testing pattern must let a test switch env mid-test without monkey-patching the session directly. Below is the intended fixture interface (stub types only — implementation is feature-developer's work):

# console/tests/conftest.py (sketch, not implementation)
import pytest

@pytest.fixture
def client_as_superadmin(app):
    """Returns a test client with an active superadmin session, env=prod."""
    ...

@pytest.fixture
def env_switcher(client_as_superadmin):
    """
    Returns a callable that POSTs to /console/env/<env> and asserts 204.
    Uses the same client as the test, so the switch applies to that session.
    Usage: env_switcher("staging")
    """
    def _switch(target_env: str):
        resp = client_as_superadmin.post(f"/console/env/{target_env}")
        assert resp.status_code == 204
    return _switch

# Usage in a test:
def test_rotation_blocked_in_wrong_env(client_as_superadmin, env_switcher):
    env_switcher("staging")
    resp = client_as_superadmin.post("/api/secrets/HEROKU_API_KEY/rotate", json={...})
    assert resp.status_code == 403
    assert resp.json["error"] == "env_mismatch"

def test_rotation_allowed_in_prod(client_as_superadmin):
    # default env is prod, so no switch is needed
    resp = client_as_superadmin.post("/api/secrets/HEROKU_API_KEY/rotate", json={...})
    assert resp.status_code in (202, 400)  # 400 = validation, not env rejection

All existing mutating endpoint tests must add a companion _wrong_env variant asserting 403 (see §7 migration path).


4. Data Model

4.1 Session schema addition

console_sessions gains one column in migration 0004_session_env.sql:

ALTER TABLE console_sessions
    ADD COLUMN selected_env TEXT NOT NULL DEFAULT 'prod'
    CHECK (selected_env IN ('prod', 'staging'));

This is additive and safe to roll back (drop the column). Existing sessions get selected_env = 'prod' via the DEFAULT.

Why on the session table, not just the Flask session cookie?
The session record in the DB is the authoritative source; the Flask session cookie is a pointer to this record (session_id). Storing selected_env on the DB row means:

  - The env selection survives cookie re-issues (token rotation) without user impact.
  - The audit writer reads selected_env from the same row it authenticates against. The value it stamps into console_audit_log.context (§4.2) is what preserves historical correctness after the session changes env; a later join against console_sessions.selected_env would only return the current value.
  - Multi-tab behavior: two tabs with the same session_id share one selected_env. Tab isolation is not a goal for v1 (see §10, Open Questions).

4.2 Audit log extension (no migration)

console_audit_log.context JSONB already exists. Mutations after this feature lands must include "selected_env": "<prod|staging>" in the context object. The audit writer middleware enforces this.

The env-switch action itself is logged:

action: "console.env.switch"
context: { "from_env": "prod", "to_env": "staging" }
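
The two audit shapes above can be sketched as small helpers. The function names are hypothetical, and rejecting an unknown env with ValueError is an assumption about how the audit writer middleware might enforce the rule.

```python
VALID_ENVS = ("prod", "staging")


def with_selected_env(context: dict, selected_env: str) -> dict:
    """Return a copy of a mutation's audit context with selected_env stamped in."""
    if selected_env not in VALID_ENVS:
        raise ValueError(f"unknown env: {selected_env!r}")
    return {**context, "selected_env": selected_env}


def switch_audit_entry(from_env: str, to_env: str) -> dict:
    """Build the audit entry for an env switch."""
    return {
        "action": "console.env.switch",
        "context": {"from_env": from_env, "to_env": to_env},
    }
```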

4.3 New permission seed data

The console:env:mutate_prod permission is added by the permissions seed script and assigned to the console-env-admin role (rbac-design.md §4.1). The seed inserts one row into the permissions table and one into the role_permissions join table (see §7). No DDL change is needed.


5. APIs / Contracts

5.1 POST /console/env/<env>

Switch the session's active environment.

Auth: Any authenticated admin.
Path param: env must be prod or staging. Returns 400 for other values.
Body: none.
Response: 204 No Content on success.
Side effects: Updates console_sessions.selected_env. Writes console_audit_log row with action: console.env.switch.
No RBAC gate on the switch itself. Any authenticated operator may switch to either env. The gate fires at mutation time, not switch time (see §3, Q2).
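
The contract above can be sketched against in-memory stand-ins (a dict for the console_sessions row, a list for console_audit_log). The switch_env name is hypothetical, and writing an audit row even when the target equals the current env is an assumption; the design does not say whether no-op switches are logged.

```python
VALID_ENVS = ("prod", "staging")


def switch_env(session: dict, target_env: str, audit_log: list) -> int:
    """Apply POST /console/env/<env> and return the HTTP status code."""
    if target_env not in VALID_ENVS:
        return 400  # unknown env: reject without touching the session
    previous = session["selected_env"]
    session["selected_env"] = target_env
    audit_log.append({
        "action": "console.env.switch",
        "context": {"from_env": previous, "to_env": target_env},
    })
    return 204
```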

5.2 GET /console/env (optional, for HTMX polling)

Returns the current session's env. Used by the banner fragment to stay in sync across tab navigations.

Response: {"selected_env": "prod"} or {"selected_env": "staging"}.

5.3 @require_env_match(env) decorator

Applied to mutating routes. Behavior:

@require_role("superadmin")
@require_env_match("prod")
def rotate_secret(name):
    ...

Execution order: Python applies decorators bottom-up, so the topmost decorator becomes the outermost wrapper and its check runs first at request time:

  1. @require_role fires first: the admin must hold the required role.
  2. @require_env_match fires second: the session must match the required env, and the admin must hold console:env:mutate_prod for prod-targeted routes.
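
Because stacked decorators are applied bottom-up, the topmost one ends up as the outermost wrapper, and its check runs first when the route is called. The sketch below demonstrates that with a hypothetical label-recording decorator; check, calls, and handler are illustrative names, not part of the console codebase.

```python
import functools

calls = []


def check(label):
    """Decorator factory that records when its wrapper runs."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            calls.append(label)
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@check("outer")   # listed first, runs first at call time
@check("inner")   # applied first, runs second at call time
def handler():
    calls.append("handler")
    return "ok"


handler()
print(calls)  # ['outer', 'inner', 'handler']
```

For the role check to run before the env check, @require_role therefore needs to be the top decorator on the route.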

On mismatch, returns:

HTTP 403
{
  "error": "env_mismatch",
  "required_env": "prod",
  "current_env": "staging"
}

The UI maps this error code to a toast: "You must switch to [prod] to perform this action."

5.4 Banner template fragment (_env_banner.html)

Injected at the top of base.html inside the <body>, before the <nav> block. Banner is rendered server-side on every page load (not HTMX-fetched), so it is always correct on initial render.

For HTMX-driven page transitions (in-page fragment swaps), the banner carries hx-swap-oob="true" id="env-banner" so it refreshes whenever any HTMX response includes it.

Tailwind classes:

  - Prod: bg-red-600 text-white
  - Staging: bg-purple-600 text-white

Banner text: Operating against PROD / Operating against STAGING. Button on right edge: Switch to [other env] — HTMX hx-post="/console/env/<other_env>" hx-swap="none" followed by a full page reload triggered by hx-on::after-request="window.location.reload()". The reload is intentional: after switching env, all HTMX-loaded fragments on the page may contain stale env-specific data, so a full reload is safer than trying to re-fetch all fragments.
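
The values the banner template needs can be sketched as a small lookup. The banner_context function and the dictionary keys are hypothetical; the classes, text, and switch URL are taken from the description above.

```python
BANNER = {
    "prod": {"classes": "bg-red-600 text-white", "text": "Operating against PROD"},
    "staging": {"classes": "bg-purple-600 text-white", "text": "Operating against STAGING"},
}


def banner_context(selected_env: str) -> dict:
    """Build the template context for _env_banner.html.

    Raises KeyError for an unknown env rather than rendering a neutral banner.
    """
    other = "staging" if selected_env == "prod" else "prod"
    return {
        **BANNER[selected_env],
        "switch_label": f"Switch to {other}",
        "switch_url": f"/console/env/{other}",
    }
```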


6. State Machines + Sequences

6.1 Login → default env assignment

sequenceDiagram
    participant Admin as Operator
    participant Console as raxx-console
    participant DB as Console Postgres

    Admin->>Console: POST /auth/totp/verify {code}
    Console->>Console: Verify TOTP
    Console->>DB: INSERT console_sessions (selected_env = CONSOLE_DEFAULT_ENV)
    Console-->>Admin: 200, set session cookie
    Console-->>Admin: Redirect to /dashboard
    Note over Admin,Console: Banner renders with default env color

6.2 Env switch action

sequenceDiagram
    participant Admin as Operator
    participant Banner as Banner UI (HTMX)
    participant Console as raxx-console
    participant DB as Console Postgres

    Admin->>Banner: Click "Switch to staging"
    Banner->>Console: POST /console/env/staging (hx-post)
    Console->>Console: Validate session, validate 'staging' is valid env
    Console->>DB: UPDATE console_sessions SET selected_env='staging' WHERE id=...
    Console->>DB: INSERT console_audit_log (action: console.env.switch, context: {from_env: prod, to_env: staging})
    Console-->>Banner: 204 No Content
    Banner->>Banner: hx-on::after-request triggers window.location.reload()
    Admin->>Console: GET /dashboard (full page reload)
    Console-->>Admin: Render page with purple banner "Operating against STAGING"

6.3 Mutating route rejection on env mismatch

sequenceDiagram
    participant Admin as Operator (staging context)
    participant Console as raxx-console
    participant DB as Console Postgres

    Note over Admin,Console: session.selected_env = 'staging'
    Admin->>Console: POST /api/secrets/HEROKU_API_KEY/rotate
    Console->>Console: @require_role("superadmin") — passes
    Console->>Console: @require_env_match("prod") — FAILS
    Console->>Console: selected_env='staging', required='prod'
    Console-->>Admin: 403 { "error": "env_mismatch", "required_env": "prod", "current_env": "staging" }
    Admin->>Admin: UI toast: "Switch to prod to perform this action"
    Note over Admin,Console: No audit row for the rejected action (403 before handler)

7. Migrations

Migration 0004_session_env.sql

-- up
ALTER TABLE console_sessions
    ADD COLUMN selected_env TEXT NOT NULL DEFAULT 'prod'
    CHECK (selected_env IN ('prod', 'staging'));

-- down
ALTER TABLE console_sessions DROP COLUMN selected_env;

Zero downtime: on Postgres 11 and later, adding a column with a constant DEFAULT is a catalog-only change that does not rewrite existing rows. All existing sessions default to prod.

Permission seed addition

In scripts/db/seed_rbac_roles.py (added as part of sub-card #3):

INSERT INTO permissions (id, name, description) VALUES (gen_random_uuid(), 'console:env:mutate_prod', 'Execute mutations targeting the prod environment');
INSERT INTO role_permissions (role_id, permission_id) VALUES (<console-env-admin role id>, <new permission id>);

No DDL migration; the permissions and role_permissions tables already exist (rbac-design.md §7).


8. Rollout Plan

| Phase | What lands | Gate |
|---|---|---|
| Dark | Migration 0004 applied; selected_env column exists but no UI reads it | Additive migration only, no behavior change |
| Flag-on (staging) | Banner renders; switch endpoint live; no mutating routes gated yet | CONSOLE_ENV_SWITCHER=1 env var |
| Beta | @require_env_match applied to rotation endpoints; all mutating routes adopt decorator | Staging smoke tests pass |
| GA | Full env-switcher active on prod console; raxx-console-staging Heroku app documented for teardown | Prod sign-off |

The raxx-console-staging Heroku app teardown is a follow-up ops action documented in the GA sub-card, not a blocker for GA.


9. Security Considerations

PII collected: none by this feature. selected_env is an operational value, not personal data.

Retention: console_sessions.selected_env follows session retention (sessions are purged at expires_at + 30d). The audit log entry for console.env.switch follows the 2-year audit retention policy.
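
A small worked example of the session retention arithmetic, assuming expires_at is the session start plus the 8h fixed window from §3 Q1. The session_deadlines name and constants are illustrative.

```python
from datetime import datetime, timedelta, timezone

SESSION_LIFETIME = timedelta(hours=8)   # fixed session window
PURGE_GRACE = timedelta(days=30)        # rows are purged at expires_at + 30d


def session_deadlines(created_at: datetime):
    """Return (expires_at, purge_at) for a session created at created_at."""
    expires_at = created_at + SESSION_LIFETIME
    return expires_at, expires_at + PURGE_GRACE


created = datetime(2026, 4, 28, 0, 0, tzinfo=timezone.utc)
expires_at, purge_at = session_deadlines(created)
print(expires_at.isoformat())  # 2026-04-28T08:00:00+00:00
print(purge_at.isoformat())    # 2026-05-28T08:00:00+00:00
```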

DSR erasure: no new PII. Existing session and audit log erasure paths are unchanged.

Audit trail: every mutation records selected_env in context JSONB. The switch action itself is logged. This creates a fully auditable trace: "at 14:33 UTC, operator A switched to staging; at 14:35 UTC, operator A triggered a rotation in staging context."

No credential storage: selected_env is a string 'prod'|'staging'. It is not a secret.

Kill-switch: CONSOLE_ENV_SWITCHER=0 disables the banner and the switch endpoint. The migration is safe to leave in place — the column defaults to prod if the feature is off.
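
The flag check can be sketched as below. Treating an unset variable as off, and accepting only the literal "1" as on, are assumptions of this sketch; the design names only the values 1 (rollout) and 0 (kill-switch).

```python
import os


def env_switcher_enabled(environ=None) -> bool:
    """Return True only when CONSOLE_ENV_SWITCHER is exactly "1"."""
    environ = os.environ if environ is None else environ
    return environ.get("CONSOLE_ENV_SWITCHER", "0") == "1"
```

With the flag off, the banner and the switch endpoint would be skipped and every session would keep the column default of prod.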

Env-mismatch fail-closed: 403, not 200 with a warning. The action does not proceed in any partial form.

Mid-flow switch race condition: the _target_env hidden field in mutating forms, compared against session.selected_env at POST time, prevents an operator from submitting a prod-targeted form after having switched to staging. Returns 409, not a silent success in the wrong env.

Breach: if console_sessions is exfiltrated, selected_env reveals which env an operator last targeted — operational sensitivity only, no secrets.

Secrets location: CONSOLE_DEFAULT_ENV and CONSOLE_ENV_SWITCHER are Heroku config vars (Infisical-sourced); neither holds a secret. Both can be changed without a redeploy (a config var change restarts the dyno on Heroku, which is acceptable for an internal ops tool).


10. Open Questions

These require a decision before the labeled sub-cards can be claimed for GA:

  1. Tab isolation for multi-tab workflows. The current design shares selected_env across all tabs for the same session (same session_id). If an operator has a prod tab and a staging tab open simultaneously, both tabs reflect whichever env was set last. Is this acceptable, or should env be per-tab (requiring a client-side state model)? Per-tab isolation requires a different approach (e.g., env embedded in the URL path /prod/dashboard vs /staging/dashboard). This is the highest-impact open question.
  2. raxx-console-staging teardown timing. The issue mentions tearing down the staging Heroku app once this lands. Confirm: is there any automation (CI/CD, cron, callback URL) that targets raxx-console-staging directly that would break on teardown? The teardown must be a separate ops ticket, not bundled with GA.
  3. Ops role and prod mutations. Current design: only console-env-admin role (superadmin-equivalent) may run prod mutations. If an ops role operator needs to trigger a prod rotation in an emergency, they cannot. Is the right escalation path "break-glass" (use the break-glass group) or should ops have a configurable prod-mutation gate? This affects the RBAC seed in sub-card #3.
  4. Cross-env credential UX. The design disables the "Rotate now" button for cross-env secrets and instructs operators to use Infisical directly. Is there a set of cross-env secrets for which we want to provide a console-guided flow eventually (future scope)? If yes, a comment on #353 linking a future sub-epic would keep the backlog clean.