Raxx · internal docs


Console Phase 2 Functional Audit — 2026-05-07 UTC

Status: Phase 2 complete (live functional click-through with operator-driven SSO).
Method: Playwright MCP browser session; the operator authenticated interactively (CF Access SSO + passkey + TOTP) and the agent drove the rest.
Auth identity: kris@moosequest.net, role superadmin.
Active env: prod (banner reads "PRODUCTION", surface badge "active env: prod").


Executive summary

15 surfaces tested: 7 healthy, 3 broken (real bugs), 5 correctly gated (3 flag-gated 404s and 2 role-gated 403s, all working as designed).

The console-completeness pain has a clearer shape now: everything that isn't working either has a clear cause and a cheap fix, or is correctly gated and needs only a flag flip or a role assignment.


Per-route detail

✓ Working

# | Route | Title | What's there
1 | /dashboard | Dashboard — Raxx Console | 9-tile grid (1 DEGRADED: support.raxx.app, http_404, with an "investigate" button), Recent activity (15 items), session bar, Sign out + Health link. The Investigate-from-status FreeScout integration is visible and clickable.
2 | /secrets | Secrets — Raxx Console | Full secrets list (40+ rows) with rotation status badges (auto-rotate ready / No SOP / Mode A — manual). "Rotate now" buttons live. The Recent rotations table at top shows multiple "Failed: GET /user/tokens returned status 403" entries; that is the CF token rotation handler hitting the same 401 we diagnosed earlier today.
3 | /console/flags/promotions | Flag Promotions Queue | Correctly empty: "No active promotions. Mark a flag active for prod on the flags page to start a promotion." The empty state is well handled.
4 | /console/deploy-freeze | Deploy Freeze — Raxx Console | "Active — deploys proceeding normally" status + red "Freeze Deploys" button. Clean, simple page.
5 | /status | Status — Raxx Console | Surface registry table: 12 surfaces with hosting type tags (HEROKU / CONSOLE_SELF / LIGHTSAIL / FREESCOUT / CLOUDFLARE_PAGES). Includes staging surfaces.
6 | /admin/console-versions | Console Versions | Two cards (staging / prod) with deploy state. Data is currently empty ("No completed deploy on record. Ref: unknown, Commit SHA: unknown, Status: unknown") because the deploy-audit-ingest endpoint hasn't been flipped on yet (#1267 secrets sit in unmerged #1314). The page itself renders correctly.
7 | /dashboard/sites/console-prod | console-prod — Surface Detail | Excellent, rich page. HEALTHY badge, liveness 102ms HTTP 200, latency history chart with markers, probe history table (last 48 entries with timestamp, OK, latency, HTTP, vendor, and error columns), Surface Control Flags toggle (FLAG_ENFORCE_CF_ORIGIN off; when on, it blocks requests bypassing CF). This is the deepest-functioning page in the console.

Screenshots: screenshots/01-dashboard.png, 02-secrets.png, 04-promotions.png, 05-deploy-freeze.png, 08-status.png, 12-console-versions.png, 13-site-detail.png.

🚩 Real bugs

# | Route | Issue | Severity | Likely cause
8 | /console/flags | "No feature flags declared in feature_flags.yaml" despite the YAML having 40+ flags | HIGH | The console blueprint reads from a different YAML path than the canonical backend_v2/api/feature_flags.yaml. Either console/config/feature_flags.yaml (the gitignored slug copy fixed in PR #1315) wasn't bundled correctly into the prod slug, OR the consumer is looking at the wrong path entirely. The page IS navigable; the data layer is empty. The "Feature Flags" UI is essentially non-functional in prod.
9 | /secrets/history | Page renders the rotations table at top, then dumps the same data as flat text below it, without the base.html template wrapper (no nav, no styling) | MEDIUM | Template inheritance bug: likely {% extends "base.html" %} is missing or {% block content %} is mis-scoped. Could also be a server-side concatenation that includes raw rendered output.
10 | /console/admins/online | HTTP 404 despite the route being registered in console/app/blueprints/admins_online.py:91 and the blueprint registered in console/app/__init__.py:108 | MEDIUM | Either the prod slug doesn't include the latest blueprint code (deploy lag), or a before_request flag check returns 404 when a flag is OFF. Worth verifying via heroku run whether the route is in the live Flask url_map.

Screenshots: screenshots/03-flags.png, 14-secrets-history.png, 06-admins-online-404.png.

🔒 Flag-gated 404s (need flag flip, not code)

# | Route | Gate | Status
11 | /security | flag_console_nav_v2 (default OFF in YAML) | 404; flip flag to enable
12 | /console/customers/ | flag_console_customer_admin (default OFF, risk: high) | 404; flip flag to enable
13 | /ops | flag_console_claude_menu (default OFF) | 404; flip flag to enable

The pattern: the route is wired in code AND the nav-link is wired in base.html, but BOTH are conditional on a flag that defaults OFF in prod. To activate, flip via heroku config:set (per memory feedback_bootstrap_via_heroku.md — first-time bootstrap goes via Heroku CLI direct, not via the in-UI promotions flow that itself needs console_flag_promotions ON).
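The gating pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the console's actual code: `flag_enabled`, `route_status`, the `YAML_DEFAULTS` dict, and the set of accepted truthy strings are all hypothetical. The only detail taken from this report is that a YAML flag such as flag_console_nav_v2 pairs with an uppercase config var such as FLAG_CONSOLE_NAV_V2.

```python
import os

# Hypothetical stand-in for the defaults declared in feature_flags.yaml.
YAML_DEFAULTS = {
    "flag_console_nav_v2": False,
    "flag_console_customer_admin": False,
    "flag_console_claude_menu": False,
}

TRUTHY = {"1", "true", "on", "yes"}  # assumed spellings, not confirmed by the source


def flag_enabled(name, env=None, defaults=YAML_DEFAULTS):
    """Return True if the flag is on; a config var overrides the YAML default."""
    env = os.environ if env is None else env
    raw = env.get(name.upper())
    if raw is None:
        # No override set: fall back to the YAML default (OFF for unknown flags).
        return defaults.get(name, False)
    return raw.strip().lower() in TRUTHY


def route_status(flag_name, env=None):
    """A gated route answers 404 while its flag is off and 200 once it is on."""
    return 200 if flag_enabled(flag_name, env) else 404
```

With an empty environment, `route_status("flag_console_nav_v2", env={})` gives 404; passing `{"FLAG_CONSOLE_NAV_V2": "true"}` gives 200, which is the behaviour a heroku config:set flip would be expected to produce if the console resolves flags this way.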

Screenshots: 07-security-404.png, 10-customers-404.png, 11-ops-404.png.

🔐 Role-gated 403s (RBAC working, role missing)

# | Route | Required role | Operator has?
14 | /billing | console-billing-read | NO; returns {"error":"forbidden","required_role":"console-billing-read"}
15 | /billing/alert-config | console-billing-read | NO; same response

The RBAC gate is doing exactly what it should. The fix is administrative: assign console-billing-read to the operator's admin record (per memory project_rbac_model.md — fine-grained roles, group composition).
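Because the 403 body names the missing role, a follow-up audit script could separate role-gated responses from other failures mechanically. The sketch below assumes only the JSON shape quoted in the table above; the helper name `missing_role` is hypothetical.

```python
import json


def missing_role(status_code, body):
    """Return the role named in a role-gated 403 body, or None otherwise."""
    if status_code != 403:
        return None
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON (for example an HTML error page): not a role-gated response.
        return None
    if not isinstance(payload, dict) or payload.get("error") != "forbidden":
        return None
    return payload.get("required_role")
```

Given the /billing response quoted above, this returns "console-billing-read"; a plain 404 or a non-JSON 403 returns None.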

Screenshots: 09-billing.png, 15-billing-alert-config.png.


Operator-visible nav (current state)

When kris@moosequest.net (superadmin) loads the console, the top nav shows:

Dashboard | Issues ↗ | Secrets | Feature Flags | Promotions | Sign out

That's 6 entries visible out of 11 wired in base.html. Missing because their gate flag is OFF in prod:


Real-world data observations (interesting bits)

  1. Rotation handler failures cluster around CLOUDFLARE_PAGES_READ_TOKEN. The Recent rotations table on /secrets shows multiple consecutive "Failed: GET /user/tokens returned status 403" entries from 2026-04-26, all with that same token name. This is the same upstream issue we diagnosed today (CF tokens 401-ing the rotation handler), and it is still unresolved.

  2. DEGRADED tile on support.raxx.app shows http_404 — operator hasn't pointed support.raxx.app at anything yet, so the probe gets 404. The Investigate button auto-files a FreeScout ticket on click — that integration works. Probably worth either (a) pointing support.raxx.app somewhere or (b) marking the surface as "not yet deployed" in the registry so it doesn't show DEGRADED.

  3. The "2 admins online" readout confirms the admins-presence widget is collecting data even though the /console/admins/online page returns 404. So the data layer is working; the page-render layer isn't.

  4. Probe history is dense — site-detail shows ~30 probe entries across the trailing 48 minutes (every ~30s). All yes / 200 / sub-150ms. Console health is solid.

  5. FLAG_ENFORCE_CF_ORIGIN toggle is OFF on console-prod, meaning the console DOES accept requests sent directly to its Heroku origin URL, bypassing CF Access (the same boundary the hook caught me trying to abuse this morning). Once this is flipped ON, the agent's earlier hook denial becomes irrelevant, because the bypass is closed at the origin itself.
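To make the origin-bypass point concrete, here is a minimal WSGI sketch of what "closing the bypass at the origin" could look like. How the console actually implements FLAG_ENFORCE_CF_ORIGIN is not shown in this report, so the wrapper `enforce_cf_origin`, the `demo_app`, and the `call` helper are all hypothetical. The sketch only checks for the presence of the Cf-Access-Jwt-Assertion header that Cloudflare Access attaches to proxied requests; a real enforcement layer would also need to validate that token, since a bare presence check can be spoofed.

```python
def enforce_cf_origin(app, enabled):
    """Wrap a WSGI app; when enabled, refuse requests lacking a CF Access token header."""

    def wrapped(environ, start_response):
        # WSGI exposes the Cf-Access-Jwt-Assertion request header under this key.
        has_token = bool(environ.get("HTTP_CF_ACCESS_JWT_ASSERTION"))
        if enabled and not has_token:
            body = b"forbidden: request did not arrive via Cloudflare Access\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)

    return wrapped


def demo_app(environ, start_response):
    """Trivial inner app used only to exercise the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


def call(app, environ):
    """Invoke a WSGI app once and return the status line it reported."""
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    b"".join(app(environ, start_response))
    return seen["status"]
```

With the flag off, a direct request with no token header passes through to the app; with it on, the same request is refused before the app runs.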


Recommendations (priority-ordered)

P0 (next 1-2 hours):

  1. Diagnose /console/flags empty-state. Run heroku run -a raxx-console-prod cat console/config/feature_flags.yaml | head -30 to confirm the slug copy is non-empty. If empty, the deploy bundling is broken (PR #1315 might have been incomplete). If non-empty, the consumer is reading from the wrong path.

  2. Diagnose /console/admins/online 404. Run heroku run -a raxx-console-prod python -c "from console.app import create_app; print('\n'.join(str(r) for r in create_app().url_map.iter_rules() if 'admin' in str(r)))". If the route is in url_map, it's a flag-before-request gate. If not, prod slug is stale.

  3. Fix /secrets/history template bug. console/app/blueprints/secrets.py:405 route + corresponding template. Should be a quick template-inheritance fix.
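The first P0 check (is the slug copy of the flags YAML present and non-empty?) can also be scripted. The sketch below is an assumption-laden illustration: the two paths are the ones named in this report, but the function name `inspect_flag_files` and the idea of running it from the slug root are hypothetical, and it says nothing about which path the console blueprint actually reads.

```python
from pathlib import Path

# Paths named in this report; which one the console reads is the open question.
CANDIDATES = (
    "console/config/feature_flags.yaml",
    "backend_v2/api/feature_flags.yaml",
)


def inspect_flag_files(root, candidates=CANDIDATES):
    """Map each candidate path to 'missing', 'empty', or its non-blank line count."""
    report = {}
    for rel in candidates:
        path = Path(root) / rel
        if not path.is_file():
            report[rel] = "missing"
            continue
        text = path.read_text(encoding="utf-8")
        lines = [ln for ln in text.splitlines() if ln.strip()]
        report[rel] = len(lines) if lines else "empty"
    return report
```

A result of "missing" or "empty" for console/config/feature_flags.yaml would point at deploy bundling; a healthy line count there would point at the consumer reading a different path.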

P1 (next day):

  1. Audit prod flag state once. heroku config -a raxx-console-prod | grep ^FLAG_ (operator). Decide which gates to flip: FLAG_CONSOLE_NAV_V2 is the highest-leverage single flip (unlocks Security + Status nav). FLAG_CONSOLE_CLAUDE_MENU second (unlocks Ops dropdown). FLAG_CONSOLE_CUSTOMER_ADMIN is high-risk, hold until customer onboarding starts.

  2. Assign console-billing-read role to the operator's admin record so /billing renders. Should be a one-row update to whatever admin/role table backs RBAC.

  3. Flip FLAG_ENFORCE_CF_ORIGIN on raxx-console-prod to close the Heroku-origin bypass. Defense in depth.
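For the one-time flag audit, the grep step can be replaced by a small parser if the result needs to be compared against the list of gates. This sketch assumes the config was captured in KEY=value form (for example via heroku config --shell); the function names `gate_flags` and `unset_gates` are hypothetical, and the sample values in the usage note are made up for illustration.

```python
def gate_flags(config_text, prefix="FLAG_"):
    """Parse KEY=value lines and keep only the vars whose name starts with prefix."""
    flags = {}
    for line in config_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.startswith(prefix):
            # Shell-format output may quote values; strip surrounding quotes.
            flags[key] = value.strip().strip("'\"")
    return flags


def unset_gates(flags, wanted):
    """List the wanted gate vars that are absent from the parsed config."""
    return [name for name in wanted if name not in flags]
```

For example, feeding it text containing only FLAG_CONSOLE_NAV_V2=true and asking about FLAG_CONSOLE_NAV_V2 and FLAG_CONSOLE_CLAUDE_MENU would report FLAG_CONSOLE_CLAUDE_MENU as unset.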

P2 (this week):

  1. support.raxx.app either gets pointed somewhere or marked pre-launch so the dashboard isn't permanently DEGRADED.

  2. Investigate the recurring CLOUDFLARE_PAGES_READ_TOKEN rotation failures. Multiple historical entries in /secrets — needs root-cause beyond just retrying.

  3. Document the UA-gating discovered in Phase 1 (see PR #1318): add the raxx-console-self-probe/1.0 UA requirement to the runbook.


What this report does NOT cover

Phase 2 confirmed that pages render or fail with explicit reason; it did NOT exercise:

Those are Phase 2.5 — happy to drive them when needed; the auth-state captured this session keeps working as long as the browser stays open.


Open in your queue

PR | Status | Note
#1314 | CLEAN, awaiting merge | SRE provisioning batch; flipping CONSOLE_AUDIT_INGEST_TOKEN is what unblocks the empty-data state on /admin/console-versions
#1316 | CLEAN, awaiting merge | Data-scientist Monte Carlo
#1317 | CLEAN, awaiting merge | Yesterday's static QA; needs the /auth/* correction noted in #1318
#1318 | CLEAN, awaiting merge | Phase 1 reachability + UA-gating finding
#1319 | CLEAN, awaiting merge | Phase 2 Playwright scripts
#1161 | UNSTABLE | Fidelity; your N + tier decisions

This report (Phase 2) lands as a separate PR in a moment.


Audit run by: Claude Code main agent + Playwright MCP, 2026-05-07 UTC
Operator hand-off points: SSO + passkey + TOTP at session start. Everything else autonomous.