Raxx · internal docs

raxx-console — Operator Dashboard (Milestones 7–13)

Status: Live (M7-M8 shipped, M9-M13 in progress)
Owner: software-architect
Last updated: 2026-04-24 UTC (recovered from reflog 2026-04-25)
Parent epic: #146 (raxx-console operator admin console)
Related issues: #254 (secret rotation UI), #253 (rotation pipeline epic), #331 (rotation UI redesign; see note below)
Related ADRs: 0004, 0012

Status note (2026-04-25): This doc was authored on 2026-04-24 reflecting the original Mode B (operator-assisted) rotation design. M11 (token rotation UI) is superseded by #331 — Kristerpher's directive shifted the rotation UI to Mode A (programmatic) primary, with Mode B as a fallback only for vendors that lack rotation APIs. The vault primitives (RBAC + TOTP elevation + console_audit_log schema) remain valid; the rotation handler + UI states described in §6.2-§6.4 are replaced by #331's design. ADRs 0020 and 0021 from the original commit were not recovered and should be considered historical context only — the dataclasses in console/app/services/site_probes.py are the live source of truth for status data shapes.


1. Context

console.raxx.app shipped milestones 1–6: passkey + TOTP auth, RBAC middleware, admin bootstrap + invite, session management, and a stub /dashboard that renders "Welcome." The stub route exists at console/app/blueprints/dashboard.py and a placeholder template.

Milestones 7–13 fill the dashboard with real content.

This document covers design for all seven milestones. Implementation sub-cards follow in §10.


2. Invariants

All platform invariants apply. Dashboard-specific constraints:

  1. No secret values ever in API responses. GET /api/status/secrets returns metadata only: name, last_rotated_at, expires_at, cadence, status. The secret value is never returned by any endpoint, even to superadmin.
  2. Rotation is superadmin-only and requires fresh second factor. Any POST /api/secrets/<name>/rotate must verify a TOTP code submitted with the request, even if the caller has an active session. This is re-elevation, not a new login. See ADR 0021.
  3. Audit trail for every rotation trigger. console_audit_log must receive a row for the trigger event (before the pipeline runs) and a completion/failure event (from the pipeline callback). These are separate rows; the trigger row is written synchronously before the pipeline call goes out.
  4. Vendor API failures must not blank the dashboard. Each external call (Heroku, Cloudflare, Infisical, GitHub) is isolated; a failure produces a degraded badge, not a 500.
  5. No credentials in code. HEROKU_API_KEY, CLOUDFLARE_*, GITHUB_API_READONLY_TOKEN, and all vendor tokens come from Infisical/Heroku config vars. Feature-developer must not hard-code any of them.
  6. Kill-switch on rotation paths. If ROTATION_PIPELINE_DISABLED=1 is set in console's env, all rotation endpoints return 503 with a clear message. Operator can disable the pipeline without a deploy.

3. Surfaces in Scope

| ID | Hostname | Provider | Health check | Healthy when |
|---|---|---|---|---|
| api-prod | api.raxx.app | Heroku (raxx-api-prod) | GET /health | dyno up, /health 200, last deploy < 7d |
| api-staging | api-staging.raxx.app | Heroku (raxx-api-staging) | GET /health | same |
| console-prod | console.raxx.app | Heroku (raxx-console-prod) | GET /health via CF Access service token | /health 200 |
| console-staging | console-staging.raxx.app | Heroku (raxx-console-staging) | GET /health | same |
| vault | vault.raxx.app | AWS Lightsail | GET /api/status | 200, containers healthy |
| getraxx | getraxx.com | Cloudflare Pages | CF Pages API latest deploy | deploy status = success |
| raxx-mockups | internal-docs.raxx.app (also raxx-mockups.pages.dev) | Cloudflare Pages | CF Pages API latest deploy | deploy status = success |
| raxx-app-previews | raxx-app.pages.dev | Cloudflare Pages (PR previews) | CF Pages API latest deploy | latest preview deploy success |

Latent surfaces (registered in the surface registry but not rendered until they exist):
- support.raxx.app: FreeScout, when shipped
- Sentry org: when billing allows org-level API access


4. Status Data Model

4.1 Polling Strategy (see ADR 0020)

Decision: hybrid in-memory cache with 60-second TTL, background thread refresh, with request-time fallback on cold start.

Each site is a named key in a module-level dict (_cache: dict[str, SiteStatus]). A background daemon thread runs every 30 seconds, iterates all surfaces, and refreshes their entries using concurrent.futures.ThreadPoolExecutor (5 workers, 5-second per-call timeout). On a cold start (first request before the background thread has run), the dashboard endpoint triggers a synchronous refresh of all surfaces and blocks up to 5 seconds total — then renders with whatever came back.

This approach requires no additional database tables, no Celery/Redis worker, and fits the single-dyno Eco/Basic Heroku formation. It adds ~5 MB resident memory for cached status objects. Persistence across dyno restarts is not required: after a dyno wake-up the first request pays for the synchronous refresh (up to 5 seconds) and the background thread repopulates the cache within its 30-second cycle, which is acceptable for an internal operator tool.
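The cache-plus-poller arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the live implementation in console/app/services/site_probes.py: the probe registry, the dict-shaped entries, and the function names `refresh_all`, `start_background_refresh`, and `get_statuses` are all assumptions made for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List

# Hypothetical registry type: surface ID -> zero-argument probe returning a dict.
Probe = Callable[[], dict]

_cache: Dict[str, dict] = {}
_lock = threading.Lock()


def refresh_all(probes: Dict[str, Probe], budget_s: float = 5.0) -> None:
    """Probe every surface in parallel; a failure degrades only its own entry."""
    pool = ThreadPoolExecutor(max_workers=5)
    futures = {pool.submit(probe): sid for sid, probe in probes.items()}
    done, not_done = wait(futures, timeout=budget_s)
    for fut in done:
        sid = futures[fut]
        try:
            entry = {"id": sid, "error": None, **fut.result()}
        except Exception as exc:  # isolate vendor failures (invariant 4)
            entry = {"id": sid, "ok": False, "error": str(exc)}
        entry["checked_at"] = time.time()
        with _lock:
            _cache[sid] = entry
    for fut in not_done:  # probes that exceeded the overall budget
        sid = futures[fut]
        with _lock:
            _cache[sid] = {"id": sid, "ok": False,
                           "error": "probe timed out", "checked_at": time.time()}
    pool.shutdown(wait=False, cancel_futures=True)


def start_background_refresh(probes: Dict[str, Probe],
                             interval_s: float = 30.0) -> threading.Thread:
    """Start the daemon thread that refreshes the cache every interval_s seconds."""
    def loop() -> None:
        while True:
            refresh_all(probes)
            time.sleep(interval_s)

    thread = threading.Thread(target=loop, daemon=True, name="status-poller")
    thread.start()
    return thread


def get_statuses(probes: Dict[str, Probe]) -> List[dict]:
    """Serve from cache; on a cold start, do one synchronous refresh first."""
    with _lock:
        cold = not _cache
    if cold:
        refresh_all(probes)
    with _lock:
        return list(_cache.values())
```

Because each probe result is caught individually, one vendor outage produces a single entry with `error` set while the other surfaces keep their normal status.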

4.2 SiteStatus Shape

SiteStatus:
  id:               str          # matches surface ID from §3
  hostname:         str
  provider:         str          # 'heroku' | 'cloudflare_pages' | 'lightsail'
  liveness:         ProbeResult
  last_deploy:      DeployInfo | None
  build_status:     BuildInfo | None
  sentry_errors_24h: int | None
  checked_at:       datetime UTC
  error:            str | None   # set when this surface's probe failed

ProbeResult:
  ok:               bool
  latency_ms:       int
  status_code:      int | None
  checked_at:       datetime UTC

DeployInfo:
  deploy_id:        str
  deployed_at:      datetime UTC
  author:           str
  status:           str          # 'succeeded' | 'failed' | 'building'
  age_days:         float

BuildInfo:
  run_id:           str
  conclusion:       str          # 'success' | 'failure' | 'cancelled' | 'in_progress' | 'unknown'
  run_url:          str
  trigger:          str          # 'push' | 'pull_request' | 'workflow_dispatch' | 'schedule'
  author:           str
  started_at:       datetime UTC

4.3 SecretMeta Shape (rotation UI)

SecretMeta:
  name:             str          # e.g. 'HEROKU_API_KEY'
  vendor:           str          # 'heroku' | 'cloudflare' | 'github' | 'smtp' | 'sentry'
  affected_sites:   list[str]    # surface IDs from §3 this secret affects
  last_rotated_at:  datetime UTC | None
  rotated_by:       str | None   # 'automated' | admin_id
  expires_at:       datetime UTC | None   # when known (CF tokens have this; API keys often don't)
  suggested_cadence_days: int    # 90 for most; 30 for CF tokens given the May 2026 expiry incident
  status:           str          # 'healthy' | 'stale' | 'expiring_soon' | 'expired' | 'rotating' | 'unknown'
  days_since_rotation: int | None

Secret metadata is fetched from Infisical's API (GET /api/v3/secrets/raw with include_imports=true). The response includes secretKey, createdAt, and updatedAt; last_rotated_at is inferred from updatedAt. expires_at is stored as a separate Infisical secret named <SECRET_NAME>__EXPIRES_AT (ISO 8601 string) for secrets where expiry is known (e.g., Cloudflare tokens). Feature-developer documents the __EXPIRES_AT convention in console/docs/secret-expiry-convention.md.
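One way the status field could be derived from this metadata is sketched below. The 14-day "expiring soon" window and the precedence of expiry over staleness are assumptions (the design does not state them), the 'rotating' state is left to the job tracker, and `derive_status` and `parse_expiry` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EXPIRING_SOON_DAYS = 14  # assumption: warning window is not specified in this doc


def parse_expiry(raw: str) -> datetime:
    """Parse an ISO 8601 <SECRET_NAME>__EXPIRES_AT value into an aware UTC datetime."""
    parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def derive_status(last_rotated_at: Optional[datetime],
                  expires_at: Optional[datetime],
                  cadence_days: int,
                  now: Optional[datetime] = None) -> str:
    """Return 'expired' | 'expiring_soon' | 'stale' | 'healthy' | 'unknown'."""
    now = now or datetime.now(timezone.utc)
    # Assumed precedence: a known expiry outranks rotation age.
    if expires_at is not None:
        if expires_at <= now:
            return "expired"
        if expires_at - now <= timedelta(days=EXPIRING_SOON_DAYS):
            return "expiring_soon"
    if last_rotated_at is None:
        return "unknown"
    if (now - last_rotated_at).days > cadence_days:
        return "stale"
    return "healthy"
```

With a 90-day cadence, a secret rotated 42 days ago comes back as healthy and one rotated 97 days ago as stale, matching the examples in the drill-down mockup.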

4.4 New Schema: console_poll_log

Every poll cycle's results are appended to a lightweight table for telemetry (see §11):

console_poll_log
  id              BIGSERIAL PK
  surface_id      TEXT NOT NULL
  probe_ok        BOOLEAN NOT NULL
  latency_ms      INTEGER NOT NULL
  status_code     INTEGER NULL
  vendor_call     TEXT NOT NULL   -- 'heroku_dyno' | 'cf_pages' | 'github_actions' | 'vault_health' | 'sentry_errors'
  error_msg       TEXT NULL
  polled_at       TIMESTAMP NOT NULL
  -- retention: 30 days (rolling delete via cron or pg_partman)

This table drives the 24-hour sparkline on the drill-down page and feeds the "Heroku API got slow at 4am" detection that §11 describes. Retention: 30 days. Migration: console/db/migrations/0002_poll_log.sql.


5. API Shape

All endpoints below are mounted under the dashboard blueprint. Auth: every endpoint requires an active session (enforced by @require_role from existing RBAC middleware). JSON responses only; Jinja2 templates call them via HTMX hx-get with hx-trigger="every 60s".

5.1 GET /api/status/sites

Returns the full surface list with current cached status.

Auth: readonly minimum.

Response:

{
  "sites": [ <SiteStatus>, ... ],
  "cache_age_seconds": 42,
  "degraded_vendors": ["heroku"]   // vendors that failed their last poll
}

Caching: served from in-memory cache; no DB read. If cache is empty (cold start), triggers synchronous refresh (max 5s).

Error: 200 always; individual site errors are inside each SiteStatus.error field. The endpoint itself does not 500 due to upstream failures.

5.2 GET /api/status/sites/<id>

Single-site detail for the drill-down page.

Auth: readonly minimum.

Response:

{
  "site": <SiteStatus>,
  "history": [
    { "polled_at": "...", "probe_ok": true, "latency_ms": 120 },
    ...   // last 288 rows from console_poll_log (24h at 5-min granularity)
  ],
  "recent_deploys": [ <DeployInfo x5> ],
  "recent_builds": [ <BuildInfo x5> ],
  "secrets": [ <SecretMeta> ]   // only secrets affecting this site
}

Caching: SiteStatus from cache; history from DB; recent_deploys + recent_builds from cache (refreshed every 5 min); secrets from Infisical (60s TTL, cached separately).

5.3 GET /api/status/builds

Recent CI runs across the repo, grouped by affected surface path.

Auth: support minimum.

Response:

{
  "builds": {
    "api": [ <BuildInfo x5> ],
    "console": [ <BuildInfo x5> ],
    "getraxx": [ <BuildInfo x5> ],
    "raxx-app-previews": [ <BuildInfo x5> ]
  },
  "cache_age_seconds": 180
}

Caching: 5-minute TTL. Source: GitHub Actions API, read-only token (GITHUB_API_READONLY_TOKEN).

5.4 GET /api/status/secrets

All tracked secrets with rotation metadata. No secret values.

Auth: superadmin only.

Response:

{
  "secrets": [ <SecretMeta>, ... ],
  "vault_reachable": true,
  "fetched_at": "2026-04-24T12:00:00Z"
}

Caching: 60s TTL in memory. If Infisical is unreachable, returns last cached response with vault_reachable: false and fetched_at showing staleness.

5.5 POST /api/secrets/<name>/rotate

Trigger rotation for a named secret. Superadmin only, requires fresh TOTP.

Auth: superadmin only + TOTP re-elevation (see §6 and ADR 0021).

Request body:

{
  "totp_code": "123456",
  "confirm_name": "HEROKU_API_KEY"   // must match <name> in path; defense against CSRF misdirection
}

Success response (202 Accepted):

{
  "job_id": "rot_abc123",
  "secret_name": "HEROKU_API_KEY",
  "status": "queued",
  "poll_url": "/api/secrets/HEROKU_API_KEY/rotate/rot_abc123"
}

Audit: Two rows are written to console_audit_log:
1. action: secret.rotate.triggered (synchronous, written before the pipeline call).
2. action: secret.rotate.completed or secret.rotate.failed (written by the callback from the rotation pipeline, #253).

Error responses:
- 400: confirm_name does not match path param
- 401: TOTP code invalid or expired
- 404: secret name not tracked in Infisical
- 503: ROTATION_PIPELINE_DISABLED=1 env var set
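The request checks for this endpoint could be ordered as in the framework-free sketch below. The check order, the error strings, and the `validate_rotate_request` name are assumptions for illustration; the real handler would also write the audit row and call the pipeline, which this sketch omits.

```python
import os
from typing import Callable, Mapping, Set, Tuple


def validate_rotate_request(path_name: str,
                            body: Mapping[str, str],
                            tracked: Set[str],
                            verify_totp: Callable[[str], bool],
                            env: Mapping[str, str] = os.environ) -> Tuple[int, dict]:
    """Return (http_status, json_body) for a POST /api/secrets/<name>/rotate request."""
    # Kill-switch first, so a disabled pipeline never consumes a TOTP code.
    if env.get("ROTATION_PIPELINE_DISABLED") == "1":
        return 503, {"error": "rotation_pipeline_disabled"}
    # confirm_name must repeat the path parameter exactly.
    if body.get("confirm_name") != path_name:
        return 400, {"error": "confirm_name_mismatch"}
    # Fresh second factor, independent of the session.
    if not verify_totp(str(body.get("totp_code", ""))):
        return 401, {"error": "totp_invalid"}
    # Only secrets tracked in Infisical can be rotated.
    if path_name not in tracked:
        return 404, {"error": "secret_not_tracked"}
    return 202, {"secret_name": path_name, "status": "queued"}
```

Keeping the checks in a pure function like this makes each error path easy to unit test without a Flask request context.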

5.6 GET /api/secrets/<name>/rotate/<job_id>

Poll rotation job status. Used by the HTMX progress panel.

Auth: superadmin only.

Response:

{
  "job_id": "rot_abc123",
  "secret_name": "HEROKU_API_KEY",
  "status": "running",   // 'queued' | 'running' | 'completed' | 'failed'
  "steps": [
    { "name": "generate", "status": "completed", "completed_at": "..." },
    { "name": "store_infisical", "status": "running", "started_at": "..." },
    { "name": "sync_consumers", "status": "pending" },
    { "name": "smoke_test", "status": "pending" },
    { "name": "revoke_old", "status": "pending" }
  ],
  "error": null
}

The rotation pipeline (#253) is responsible for updating job state. The console stores a lightweight console_rotation_jobs table as the source of truth for job state (see §5.7 below).

5.7 New Schema: console_rotation_jobs

console_rotation_jobs
  id              TEXT PK      -- 'rot_<uuid_short>'
  secret_name     TEXT NOT NULL
  triggered_by    TEXT NOT NULL FK -> admins.id
  triggered_at    TIMESTAMP NOT NULL
  status          TEXT NOT NULL  -- 'queued' | 'running' | 'completed' | 'failed'
  steps_json      JSONB NULL     -- step-level progress, as above
  completed_at    TIMESTAMP NULL
  error_msg       TEXT NULL
  -- retention: 1 year

The rotation pipeline authenticates to the console callback with a pre-shared ROTATION_CALLBACK_SECRET (Infisical / Heroku config var). The pipeline calls POST /api/secrets/<name>/rotate/<job_id>/callback with the updated step status. The console verifies the HMAC on the callback body and updates console_rotation_jobs.
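A minimal sketch of the callback verification follows, assuming HMAC-SHA256 over the raw request body with a hex-encoded signature. The digest algorithm, the encoding, and the function names are assumptions; the actual contract depends on what #253 publishes.

```python
import hashlib
import hmac


def sign_callback(secret: bytes, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 of the raw callback body (pipeline side)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_callback(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time check of the signature supplied with the callback (console side)."""
    expected = sign_callback(secret, body)
    # Compare as bytes so a non-ASCII signature is rejected rather than raising.
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))
```

The signature must be computed over the exact bytes received, before any JSON parsing, since re-serializing the body can change key order or whitespace.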


6. RBAC Extensions

The role matrix from console.md §5.1 extends as follows:

| Permission | superadmin | ops | support | readonly |
|---|---|---|---|---|
| View dashboard status grid | Y | Y | Y | Y |
| View per-site drill-down | Y | Y | Y | Y |
| View build status | Y | Y | Y | N |
| View secret metadata (no values) | Y | N | N | N |
| Trigger rotation ("Rotate now") | Y | N | N | N |
| View rotation job history | Y | N | N | N |

Changes needed in console/app/middleware/rbac.py:
- No new decorator signatures needed; the existing @require_role("superadmin") pattern covers rotation endpoints.
- New: a @require_totp_elevation decorator wraps the rotation endpoint. It verifies a fresh TOTP code from the request body against the admin's seed, independent of the session TOTP check at login. This decorator is separate from @require_role so that future endpoints can require it without re-inventing re-elevation. See ADR 0021.


7. Token Rotation UI Design

7.1 Per-Site Rotation Panel

Inside the drill-down page for each surface, a "Credentials" section lists every SecretMeta where affected_sites includes the site's ID. For each secret:

[ HEROKU_API_KEY ]  status: healthy  last rotated: 42 days ago  expires: never
  Suggested cadence: 90 days
  [ Rotate now ]  (superadmin only — button absent for other roles)

7.2 Secrets Index Page (/secrets)

Top-level secrets page (GET /secrets) lists all SecretMeta entries in a table. Columns: vendor, secret name, status badge, last rotated, days since rotation (colored red when it exceeds the cadence), expires_at, action. superadmin sees the "Rotate now" button; other roles see a grayed-out lock icon.

7.3 "Rotate Now" Flow

sequenceDiagram
    participant Admin as Operator (superadmin)
    participant UI as Console UI (HTMX)
    participant Console as Console Flask app
    participant Pipeline as Rotation Pipeline (#253)
    participant Infisical as vault.raxx.app

    Admin->>UI: Click "Rotate HEROKU_API_KEY"
    UI->>Admin: Confirm modal:<br/>"Rotate HEROKU_API_KEY?<br/>Enter TOTP code to confirm."
    Admin->>UI: Submit TOTP code + confirm
    UI->>Console: POST /api/secrets/HEROKU_API_KEY/rotate<br/>{totp_code, confirm_name}
    Console->>Console: Validate TOTP (fresh code, not session cache)
    Console->>Console: Write audit: secret.rotate.triggered
    Console->>Pipeline: POST <rotation_pipeline_url>/jobs<br/>{secret_name, job_id, callback_url}
    Console-->>UI: 202 {job_id, poll_url}
    UI->>UI: Begin polling GET /api/secrets/.../rot_abc123<br/>(hx-trigger="every 3s")

    loop step updates
        Pipeline->>Infisical: Generate + store new secret
        Pipeline->>Console: POST /api/secrets/.../rot_abc123/callback<br/>{step, status}
        Console->>Console: Update console_rotation_jobs
        UI->>Console: Poll for status
        Console-->>UI: {steps: [...]}
        UI->>Admin: Render step progress
    end

    Pipeline-->>Console: callback: completed
    Console->>Console: Write audit: secret.rotate.completed
    Console-->>UI: {status: "completed"}
    UI->>Admin: Show "Rotation complete" banner

7.4 Progress Display

The HTMX progress panel renders as an ordered step list. Each step has a status icon (pending / running / completed / failed). On status: completed, the panel shows the new last_rotated_at and the updated days_since_rotation: 0. On status: failed, the panel shows error_msg and a "Retry" button (which starts a new job; the failed job is not reused).


8. GitHub Build Status Integration

8.1 Source Path Mapping

| Surface | Workflow path filter | Branch |
|---|---|---|
| api-prod, api-staging | backend_v2/**, .github/workflows/deploy-heroku.yml | main |
| console-prod, console-staging | console/** | main |
| getraxx | frontend/trademaster_ui/** (until dedicated marketing-site/ exists) | main |
| raxx-app-previews | frontend/trademaster_ui/** | any PR branch |

8.2 GitHub API Calls

The console calls GET /repos/{owner}/{repo}/actions/runs with query params branch=main&per_page=10. The response is filtered by path (workflow filename) and head_commit.modified (source paths). The response is normalized into BuildInfo objects.
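The normalization step might look like the sketch below. It relies on the id, status, conclusion, html_url, event, actor.login, run_started_at, and path fields of a workflow run object; collapsing every non-completed run to 'in_progress' and every unlisted conclusion to 'unknown' is an assumption, as are the helper names. Source-path filtering is not shown.

```python
from dataclasses import dataclass
from typing import List

_KNOWN_CONCLUSIONS = {"success", "failure", "cancelled"}


@dataclass(frozen=True)
class BuildInfo:
    run_id: str
    conclusion: str
    run_url: str
    trigger: str
    author: str
    started_at: str  # kept as the ISO 8601 string for brevity


def normalize_run(run: dict) -> BuildInfo:
    """Map one workflow run object onto the BuildInfo shape from §4.2."""
    if run.get("status") != "completed":
        conclusion = "in_progress"  # assumption: queued/waiting runs are folded in
    else:
        conclusion = run.get("conclusion")
        if conclusion not in _KNOWN_CONCLUSIONS:
            conclusion = "unknown"  # e.g. timed_out, skipped, neutral
    return BuildInfo(
        run_id=str(run["id"]),
        conclusion=conclusion,
        run_url=run["html_url"],
        trigger=run["event"],
        author=(run.get("actor") or {}).get("login", "unknown"),
        started_at=run.get("run_started_at") or run["created_at"],
    )


def runs_for_workflow(runs: List[dict], workflow_path: str) -> List[BuildInfo]:
    """Keep only runs of the given workflow file, normalized."""
    return [normalize_run(r) for r in runs if r.get("path") == workflow_path]
```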

Token: GITHUB_API_READONLY_TOKEN — read-only PAT with repo:read scope (or a fine-grained PAT with actions:read). Stored in Infisical, injected into console's env. This token does not yet exist and must be created as a prerequisite. Marked as a dependency in sub-card M7-10.

Rate limit: GitHub's unauthenticated rate limit is 60 req/hr; authenticated is 5000 req/hr. With 8 surfaces polling every 5 minutes, that is ~96 req/hr authenticated (8 surfaces × 12 polls/hr), well within limits. If rate-limited (HTTP 403 or 429, or X-RateLimit-Remaining: 0), the console returns stale cache with an "as of N min ago" annotation and logs the rate-limit event to console_poll_log.

8.3 Caching

BuildInfo is cached in memory with a 5-minute TTL. On cache miss, the console fetches from the GitHub API. On rate-limit or network error, the console returns the last cached value with stale: true and cache_age_seconds in the response.
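The TTL-with-stale-fallback behavior can be sketched as a small wrapper around any fetch function. The class name `StaleOkCache`, the injectable clock, and the response keys other than stale and cache_age_seconds are assumptions for the example.

```python
import time
from typing import Any, Callable, Optional


class StaleOkCache:
    """Cache one value for ttl_s seconds; on fetch failure, serve the old value as stale."""

    def __init__(self, fetch: Callable[[], Any], ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_s
        self._clock = clock
        self._value: Any = None
        self._fetched_at: Optional[float] = None

    def get(self) -> dict:
        now = self._clock()
        fresh = self._fetched_at is not None and now - self._fetched_at < self._ttl
        stale = False
        if not fresh:
            try:
                self._value = self._fetch()
                self._fetched_at = now
            except Exception:
                # Rate limit or network error: keep whatever was cached before.
                stale = True
        age = None if self._fetched_at is None else int(now - self._fetched_at)
        return {"value": self._value, "stale": stale, "cache_age_seconds": age}
```

A failed refresh does not reset the timestamp, so the next call after the failure retries the fetch instead of waiting out another TTL.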


9. UI Layout

9.1 Dashboard Home (/dashboard)

┌─────────────────────────────────────────────────────────┐
│ Raxx Console   [production]              Kristerpher · superadmin  [sign out] │
├─────────────────────────────────────────────────────────┤
│ ALERTS (if any)                                          │
│  ⚠  CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN expires in 13d  │
│  ⚠  raxx-api-prod last deploy 8 days ago                 │
├─────────────────────────────────────────────────────────┤
│ INFRASTRUCTURE STATUS                  [last checked: 14s ago]     │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ api-prod     │ │ console-prod │ │ vault        │ │ getraxx.com  │ │
│ │  HEALTHY     │ │  HEALTHY     │ │  HEALTHY     │ │  HEALTHY     │ │
│ │ 42ms · 200   │ │ 38ms · 200   │ │ 91ms · 200   │ │ deploy OK    │ │
│ │ [detail →]   │ │ [detail →]   │ │ [detail →]   │ │ [detail →]   │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ api-staging  │ │console-staging│ │ raxx-mockups │ │ previews     │ │
│ │  DEGRADED    │ │  HEALTHY     │ │  HEALTHY     │ │  HEALTHY     │ │
│ │ Heroku unreach│ │ 55ms · 200  │ │ deploy OK    │ │ deploy OK    │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
├─────────────────────────────────────────────────────────┤
│ RECENT ACTIVITY                                          │
│ Deploys (last 5)  │  Builds (last 5)  │  Rotations (last 5) │
│ ...               │  ...              │  ...                 │
└─────────────────────────────────────────────────────────┘

The status grid auto-refreshes via hx-get="/api/status/sites" hx-trigger="every 60s" hx-target="#status-grid" hx-swap="outerHTML".

Alert banners are rendered server-side on page load. They check: any secret with status=expiring_soon or expired; any surface with last_deploy.age_days > 7; any surface where liveness.ok = false.
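The three banner checks could be expressed as one pass over the cached data, as in this sketch. It assumes dict-shaped SiteStatus and SecretMeta records; the message wording and the `collect_alerts` name are illustrative only.

```python
from typing import List


def collect_alerts(sites: List[dict], secrets: List[dict]) -> List[str]:
    """Build banner messages from cached SiteStatus and SecretMeta dicts."""
    alerts: List[str] = []
    for secret in secrets:
        # Check 1: secrets that are expiring soon or already expired.
        if secret["status"] in ("expiring_soon", "expired"):
            alerts.append(f"{secret['name']} is {secret['status'].replace('_', ' ')}")
    for site in sites:
        deploy = site.get("last_deploy")
        # Check 2: last deploy older than 7 days.
        if deploy and deploy["age_days"] > 7:
            alerts.append(f"{site['id']} last deploy {int(deploy['age_days'])} days ago")
        # Check 3: liveness probe failing.
        if not site["liveness"]["ok"]:
            alerts.append(f"{site['id']} liveness probe failing")
    return alerts
```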

9.2 Per-Site Drill-Down (/dashboard/sites/<id>)

┌─────────────────────────────────────────────────────────┐
│ ← Dashboard   api.raxx.app   [HEALTHY]   checked 8s ago │
├─────────────────────────────────────────────────────────┤
│ HEALTH SPARKLINE (last 24h)                              │
│  ▂▄█▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄  │
│  (green = pass, red = fail, grey = data missing)         │
├─────────────────────────────────────────────────────────┤
│ RECENT DEPLOYS            │ RECENT CI RUNS               │
│ 2026-04-24 14:32 · kris · ✓│ push · main · success      │
│ 2026-04-21 09:10 · kris · ✓│ push · main · success      │
│ ...                        │ ...                         │
├─────────────────────────────────────────────────────────┤
│ CREDENTIALS                                              │
│ HEROKU_API_KEY    healthy  42d ago  cadence: 90d         │
│   [Rotate now]  (superadmin only)                        │
│ HEROKU_PLATFORM_API_TOKEN  stale   97d ago  cadence: 90d │
│   [Rotate now]                                           │
└─────────────────────────────────────────────────────────┘

9.3 Secrets Index (/secrets)

Full table of all tracked secrets. Visible to superadmin only. Columns: vendor badge, secret name, status badge (color-coded), last rotated, days since rotation (red if > cadence), expires at, suggested cadence, action button.

9.4 Rotation Progress Panel

Rendered inline after clicking "Rotate now" in the confirm modal. The HTMX fragment replaces the confirm modal with a step list. Each step row has a spinner (running), checkmark (completed), or X (failed).


10. Failure Modes + Degraded States

| Failure | Dashboard behavior |
|---|---|
| Heroku Platform API unreachable | All Heroku surfaces show status: degraded, error: "Heroku API unreachable". Alert banner appears: "Heroku API unreachable — dyno status unavailable." Other surfaces unaffected. |
| Cloudflare API rate-limited | CF surfaces show status: stale, checked_at shows age. Banner: "Cloudflare status as of N min ago." |
| Infisical (vault.raxx.app) unreachable | Secrets tab shows "Vault unreachable" banner. Rotation endpoints return 503. SecretMeta shows last cached values with vault_reachable: false. |
| GitHub API rate-limited | Build status panel shows "as of N min ago" annotation. No badge degradation. |
| Polling thread missed a cycle | console_poll_log has a gap. Health sparkline shows grey segment for that window (not red, because missing data is not the same as a failure). |
| Console's own DB (Postgres) down | Health endpoint returns {"db": "error"}. All pages 503. In-memory cache still serves status data to dashboards already loaded, but new page loads fail. |
| All external APIs unreachable simultaneously | Dashboard renders with full degraded state from last cache; banner shows "Status data may be stale — multiple vendors unreachable." Never a blank page or 500. |
| Rotation pipeline unreachable | POST /api/secrets/*/rotate returns 503: {"error": "rotation_pipeline_unreachable"}. Audit row secret.rotate.pipeline_unreachable written. Job not created. |

11. Telemetry on Dashboard Polls

Every call to an external vendor API from the polling service writes a row to console_poll_log (schema in §4.4). Rows include:
- vendor_call: which vendor + which call type
- latency_ms: wall-clock time for the HTTP call
- probe_ok: whether the call succeeded
- error_msg: exception/HTTP error on failure

This enables:
- The 24-hour sparkline on the drill-down page (reads console_poll_log grouped by 5-minute windows)
- Trend detection: "Heroku API latency doubled at 04:00 UTC" visible by querying avg(latency_ms) group by hour
- Degraded-vendor detection: a vendor is marked degraded if more than 3 consecutive polls fail
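The sparkline bucketing and the degraded-vendor rule can be sketched in plain Python as below. Treating any failure inside a 5-minute window as a red segment is an assumption, and both function names are hypothetical; in practice the grouping may be done in SQL instead.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Sequence, Tuple


def bucket_sparkline(rows: Iterable[Tuple[datetime, bool]], start: datetime,
                     buckets: int = 288, width_s: int = 300) -> List[str]:
    """Return 'pass' | 'fail' | 'missing' for each 5-minute window since start."""
    state = ["missing"] * buckets  # grey unless a poll landed in the window
    for polled_at, probe_ok in rows:
        idx = int((polled_at - start).total_seconds() // width_s)
        if 0 <= idx < buckets:
            if not probe_ok:
                state[idx] = "fail"  # assumption: one failure marks the window red
            elif state[idx] == "missing":
                state[idx] = "pass"
    return state


def is_degraded(recent_results: Sequence[bool], threshold: int = 3) -> bool:
    """recent_results is oldest-first; degraded if more than threshold trailing failures."""
    streak = 0
    for ok in reversed(recent_results):
        if ok:
            break
        streak += 1
    return streak > threshold
```

A window with no rows stays 'missing', which keeps the grey-versus-red distinction that the failure-mode table calls for.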

Every audit-logged rotation trigger also records latency of the pipeline call in context JSONB.

The console_poll_log table is never shown to non-superadmin roles via the UI (it is an internal telemetry store). It is queryable via the drill-down page's sparkline endpoint and by the alert-banner logic.


12. Migrations

Migration 0002: console_poll_log

console/db/migrations/0002_poll_log.sql — creates console_poll_log table. Rollback: console/db/migrations/0002_poll_log_down.sql — drops it. No foreign keys to other tables; rollback is clean.

Migration 0003: console_rotation_jobs

console/db/migrations/0003_rotation_jobs.sql — creates console_rotation_jobs table. Rollback: console/db/migrations/0003_rotation_jobs_down.sql — drops it.

Both migrations are additive. Neither modifies existing tables.


13. Rollout Plan

| Phase | What ships | Gate |
|---|---|---|
| Dark (behind DASHBOARD_REAL=0 env var) | Status polling service, in-memory cache, /api/status/sites JSON | Feature flag in env; stub page still renders "Welcome." |
| Internal alpha | Real dashboard grid visible to logged-in admins on staging | No public exposure; subdomain gated by Cloudflare Access |
| Beta (secrets) | Secrets index page, per-site credential panel. No rotation yet | DASHBOARD_SECRETS=1 env var on staging only |
| Beta (rotation) | "Rotate now" flow, TOTP re-elevation, pipeline callback receiver | Staging only; rotation_pipeline_url points to staging pipeline |
| GA | All surfaces, full rotation UI, on production console | Prod flag flip; no code change required |

Dark-to-alpha transition requires: GitHub API readonly token created and stored in Infisical; GITHUB_API_READONLY_TOKEN propagated to Heroku config.


14. Security Considerations

PII collected: Admin email addresses are already stored (milestones 1–6). The dashboard does not collect new end-user PII. console_poll_log and console_rotation_jobs contain no PII, only infrastructure identifiers and admin IDs (which are UUIDs, not emails). rotated_by likewise holds only the UUID from console_admins.id, never an email; the display name is joined at read time, not stored in the log.

Retention: console_poll_log: 30 days (rolling). console_rotation_jobs: 1 year (matches audit log tier). console_audit_log: 2 years (per existing design).

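DSR (data subject request for operator erasure): The existing admin erasure path sets console_rotation_jobs.triggered_by to NULL via the foreign key's ON DELETE SET NULL rule; the job record is retained for audit but de-identified. This requires triggered_by to be nullable, which conflicts with the NOT NULL declared in §5.7 and must be reconciled in migration 0003. console_poll_log has no personal data; no special DSR handling needed.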

No secret values at rest or in transit in console: The GET /api/status/secrets endpoint is explicit: only metadata. Infisical's API is called with a service token scoped read-only to the console project. The token does not have the ability to read secret values — it only reads metadata. Feature-developer must verify this scope when creating the Infisical service token.

Rotation TOTP re-elevation: Covered in ADR 0021. The TOTP code submitted with a rotation request must be validated against the current 30-second window (±1 window tolerance). It must not reuse a code that was used within the last 90 seconds. The console tracks the last validated TOTP code per admin in console_totp_seeds.last_verified_code_hash + last_verified_at (new columns, migration 0002 or added to 0003 — feature-developer chooses).
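A stdlib-only sketch of the re-elevation check follows, assuming standard RFC 6238 TOTP (HMAC-SHA1, 30-second step, 6 digits) and an unsalted SHA-256 of the code for last_verified_code_hash. The hash choice, the function names, and passing the stored state in as arguments are all assumptions; ADR 0021 was not recovered, so treat this as illustrative only.

```python
import base64
import hashlib
import hmac
import struct
from typing import Optional, Tuple


def totp(seed_b32: str, at: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP (HMAC-SHA1) for the time step containing `at` (Unix seconds)."""
    key = base64.b32decode(seed_b32, casefold=True)
    counter = int(at // step)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation per RFC 4226
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify_elevation(seed_b32: str, code: str, now: float,
                     last_hash: Optional[str], last_at: Optional[float]
                     ) -> Tuple[bool, Optional[str], Optional[float]]:
    """Return (ok, last_verified_code_hash, last_verified_at) after checking `code`."""
    code_hash = hashlib.sha256(code.encode("utf-8")).hexdigest()
    # Replay guard: refuse a code already accepted within the last 90 seconds.
    if last_hash == code_hash and last_at is not None and now - last_at < 90:
        return False, last_hash, last_at
    # Accept the current window plus or minus one window of clock drift.
    for drift in (-1, 0, 1):
        expected = totp(seed_b32, now + drift * 30)
        if hmac.compare_digest(expected.encode("utf-8"), code.encode("utf-8")):
            return True, code_hash, now
    return False, last_hash, last_at
```

On success the caller would persist the returned hash and timestamp to the two new columns, so a second request carrying the same code inside the 90-second window is rejected.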

Rotation kill-switch: ROTATION_PIPELINE_DISABLED=1 in console's Heroku config. It can be toggled without a redeploy via heroku config:set.

Audit log redaction: context JSONB in console_audit_log must never contain secret values. The existing redact_payload() in console/app/services/crypto.py is used for all payloads stored in context. For rotation events, context contains {secret_name, job_id} only.

Breach notification: If console_rotation_jobs or console_poll_log is compromised, no credentials are exposed (neither table stores values). If console_admins or console_totp_seeds is compromised, existing ADR 0001 + console.md §13 breach procedures apply unchanged.

Secrets location: All new env vars (GITHUB_API_READONLY_TOKEN, ROTATION_PIPELINE_URL, ROTATION_CALLBACK_SECRET, ROTATION_PIPELINE_DISABLED) live in Infisical vault, synced to Heroku config vars. Rotatable without redeploy.


15. Open Questions

  1. Rotation pipeline contract: The POST /jobs call from the console to the rotation pipeline (#253) needs a shared interface spec. This design defines the console side; the pipeline side is out of scope here. Before the rotation UI sub-card can be claimed, #253 must publish its inbound API contract (or this design doc must be amended once it does).
  2. Infisical service token scope: Does the Infisical read-only token for the console have access to secret metadata (names, updatedAt) but NOT secret values? This depends on how Infisical scopes service tokens. Feature-developer must confirm and document. If Infisical's token granularity does not allow metadata-only reads, an alternative is a separate Infisical service that exposes a console-specific metadata API.
  3. CF Access service token for console health probe: console.raxx.app is itself behind Cloudflare Access. The health probe for that surface needs a CF Access service token (CF_ACCESS_SERVICE_TOKEN_CONSOLE) that bypasses the Access policy. This token does not yet exist; it is a prerequisite for the liveness probe of console-prod.
  4. ROTATION_PIPELINE_URL env var: The URL of the rotation pipeline's API endpoint. Not defined until #253 ships. Placeholder for now; rotation endpoints return 503 until it is set.
  5. Sentry API for 24h error count: The existing console.md §9 design references Sentry 24h error counts. A Sentry API token with org:read scope is required (SENTRY_API_TOKEN in Infisical). This is a separate prerequisite; the dashboard shows "Sentry: unavailable" until the token is provisioned.