Raxx · internal docs

Console Phase 1 Live Reachability — 2026-05-07 UTC

Status: Phase 1 complete (live test against console.raxx.app prod).
Method: HTTP probes via CF-Access-Client-Id/Secret (raxx-console-self-probe token) + User-Agent: raxx-console-self-probe/1.0.
Scope: 22 user-facing routes.
Functional spot-check (Phase 2) requires admin session; documented at end.


Setup notes (operationally interesting)

Three things had to line up before any HTTP probe got through:

  1. Right service-token credentials. CF_ACCESS_CLIENT_ID/SECRET in my env (d4efd...) were the wrong token — they're the console-app's outbound CF Access service token, not the inbound probe token. The right value lives in Heroku config-var CF_ACCESS_SVC_CONSOLE on raxx-console-prod and starts with c4749. Pulled and split as client_id:client_secret.

  2. Service-token allowlisted on the right Access app. Confirmed via screenshot: Application "Raxx Console" → Policy "raxx-console-self-probe (service token)" → Include = Service Token: raxx-console-self-probe. No additional Include/Require/Exclude conditions.

  3. User-Agent matters. With User-Agent: Python-urllib/3.12 → HTTP 403 + error code: 1010. With User-Agent: raxx-console-self-probe/1.0 → HTTP 200. Cloudflare is gating on UA ahead of the Access policy, likely via Bot Fight Mode or a custom WAF rule that rejects generic scraper UAs. Cloudflare documents error 1010 as a block based on the client's browser signature, so Browser Integrity Check is another candidate worth ruling out. The "self-probe" UA is the canonical/expected identity for this service token. Three back-to-back A/B tests confirmed the behavior is deterministic.

This UA gating is not documented in any of the runbooks I scanned. It should be added to cf-access-service-token-provisioning.md as a "things that bite you" note.
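A minimal sketch of a probe request that satisfies conditions 1 and 3 together. The header names and UA string come from the notes above; the function name, the placeholder credentials, and the use of urllib are assumptions for illustration, not necessarily how the real probe is built. The request is constructed but not sent.

```python
import urllib.request

# UA that passed the A/B test above; generic UAs got HTTP 403 + error code 1010.
PROBE_UA = "raxx-console-self-probe/1.0"


def build_probe_request(url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a GET carrying the service token and the probe UA."""
    headers = {
        "CF-Access-Client-Id": client_id,
        "CF-Access-Client-Secret": client_secret,
        # Overrides urllib's default Python-urllib/x.y UA, which Cloudflare rejected.
        "User-Agent": PROBE_UA,
    }
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder credentials; the real values come from CF_ACCESS_SVC_CONSOLE.
req = build_probe_request("https://console.raxx.app/health", "example-id", "example-secret")
```

Passing the result to urllib.request.urlopen(req, timeout=10) would perform the actual probe. Note that urllib stores header names in capitalized form (for example "User-agent"), which matters only when reading them back from the Request object.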


Results table

Route                              HTTP  Bytes    Title / first line
/health                            200   16       {"status":"ok"}
/                                  200   26719    Raxx Console (landing)
/dashboard                         200   32044    Login — Raxx Console
/security                          200   32044    Login — Raxx Console
/status                            200   32044    Login — Raxx Console
/secrets                           200   32044    Login — Raxx Console
/secrets/history                   200   32044    Login — Raxx Console
/console/flags                     200   32044    Login — Raxx Console
/console/flags/promotions          200   32044    Login — Raxx Console
/console/customers/                200   32044    Login — Raxx Console
/console/customers/invite          200   32044    Login — Raxx Console
/console/admins/online             200   32044    Login — Raxx Console
/console/deploy-freeze             200   32044    Login — Raxx Console
/billing                           200   32044    Login — Raxx Console
/billing/alert-config              200   32044    Login — Raxx Console
/ops                               200   32044    Login — Raxx Console
/admin/console-versions            200   32044    Login — Raxx Console
/auth/login                        200   (login)  Login — Raxx Console
/auth/totp/enroll                  200   (login)  Login — Raxx Console
/auth/totp/verify                  200   (login)  Login — Raxx Console
/api/internal/deploy-freeze/state  401   25       {"error":"unauthorized"}
/_alerts/drawer                    200   32044    Login — Raxx Console

Status histogram: 21 × 200, 1 × 401. Zero 5xx, zero 1010s under correct headers.
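For reference, the histogram can be recomputed from the table with a few lines. This is a sketch: the (route, status) pairs are copied from the results table, while the variable names are illustrative and not taken from the actual probe script.

```python
from collections import Counter

# (route, HTTP status) pairs copied from the results table.
results = [
    ("/health", 200),
    ("/", 200),
    ("/dashboard", 200),
    ("/security", 200),
    ("/status", 200),
    ("/secrets", 200),
    ("/secrets/history", 200),
    ("/console/flags", 200),
    ("/console/flags/promotions", 200),
    ("/console/customers/", 200),
    ("/console/customers/invite", 200),
    ("/console/admins/online", 200),
    ("/console/deploy-freeze", 200),
    ("/billing", 200),
    ("/billing/alert-config", 200),
    ("/ops", 200),
    ("/admin/console-versions", 200),
    ("/auth/login", 200),
    ("/auth/totp/enroll", 200),
    ("/auth/totp/verify", 200),
    ("/api/internal/deploy-freeze/state", 401),
    ("/_alerts/drawer", 200),
]

# Count responses per status code and collect any 5xx routes.
histogram = Counter(status for _, status in results)
server_errors = [route for route, status in results if status >= 500]

print(dict(histogram))   # {200: 21, 401: 1}
print(server_errors)     # []
```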


What Phase 1 confirms

  ✓ Console app is healthy. No 500s, no broken templates, no startup failures.
  ✓ CF Access service-token bypass works with the right token + UA.
  ✓ Front-door auth is enforced correctly: every protected route serves the login page (HTTP 200, title "Login — Raxx Console") rather than 302-redirecting. App-level session auth is working as designed.
  ✓ Public health endpoint responds correctly at /health with {"status":"ok"}.
  ✓ All 22 routes render server-side without throwing, which is the basic completeness signal.
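Because gated routes answer 200 with the login page instead of redirecting, a probe cannot rely on status codes alone to tell "login wall" from "real content". A minimal sketch of one way to classify responses by page title follows. The title string is the one observed in the results; the category names and the classify helper are hypothetical.

```python
from html.parser import HTMLParser

# Title observed on every gated route ("\u2014" is the dash in the page title).
LOGIN_TITLE = "Login \u2014 Raxx Console"


class TitleParser(HTMLParser):
    """Collects the text inside the first-level <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def classify(status: int, body: str) -> str:
    """Map a probe response to a coarse category (labels are illustrative)."""
    if status >= 500:
        return "server-error"
    if status in (401, 403):
        return "rejected"
    parser = TitleParser()
    parser.feed(body)
    if status == 200 and parser.title.strip() == LOGIN_TITLE:
        return "login-wall"
    return "other"
```

With this in place, a Phase 2 run could assert that a route is no longer classified as "login-wall" once an admin session is attached.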


What Phase 1 cannot tell us

Every authenticated-only route returns the login page (200 with the same "Login — Raxx Console" title). I see the frame of each surface, not the content an authenticated admin would see, so I cannot evaluate what any of these surfaces actually render or do once logged in.

That's all Phase 2 territory. To run Phase 2, the agent needs an authenticated admin session.


Phase 2 options — getting an admin session

To do real feature testing, one of these has to land:

Option A: service-account admin login
  What: Build a programmatic auth path: a test admin account whose credentials live in vault; the agent uses them to obtain a session token via a documented /auth/service-login endpoint that bypasses TOTP for this specific principal.
  Cost: M (need to build the endpoint and document the security boundary)
  Cleanliness: Cleanest; production-shape

Option B: passkey-replay shim
  What: Operator authenticates in their browser and gives me the resulting session cookie via ! export CONSOLE_SESSION_COOKIE=...; I attach it to Playwright.
  Cost: XS
  Cleanliness: Fragile (cookie expires); fine for one-off audits

Option C: Selenium/Playwright with operator-driven SSO
  What: Use Playwright with storageState from a real operator session (operator runs a one-time login and saves state to a JSON file; agent loads it).
  Cost: S
  Cleanliness: Better than B (state persists); still expires

Option D: agent identity in passkey allowlist
  What: Treat the agent as a first-class admin identity with its own passkey/TOTP. Requires WebAuthn negotiation infrastructure for headless agents.
  Cost: L (hard problem)
  Cleanliness: Most architecturally pure

Recommendation: Option A is the right long-term build, Option C is the right short-term unblock for getting Phase 2 done today.

For Option C concretely: operator runs gh workflow run "console-qa-storagestate" (or local equivalent) once a day; that fires a Playwright session that pops the operator's browser for SSO + passkey, then saves auth-state.json to a known location. Agent then loads it for any QA pass that day. Expires with operator's normal session lifetime.
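Since the saved state expires with the operator's session, the agent should check it before starting a QA pass. The sketch below assumes the file uses Playwright's storage-state JSON layout (a top-level "cookies" list whose entries carry "name", "domain", and "expires" as Unix seconds, with -1 meaning a session cookie). The cookie names and helper functions are hypothetical; the console's real session cookie name is not given in this report.

```python
import time


def cookie_applies(cookie_domain: str, host: str) -> bool:
    """True if a cookie scoped to cookie_domain would be sent to host."""
    d = cookie_domain.lstrip(".")
    return host == d or host.endswith("." + d)


def live_cookie_names(state: dict, host: str, now: float) -> list:
    """Names of cookies in a storage-state dict that apply to host and are unexpired."""
    names = []
    for cookie in state.get("cookies", []):
        if not cookie_applies(cookie.get("domain", ""), host):
            continue
        expires = cookie.get("expires", -1)
        # -1 marks a session cookie with no explicit expiry.
        if expires == -1 or expires > now:
            names.append(cookie["name"])
    return names


now = time.time()
# Illustrative state; a real run would json.load() the saved auth-state.json.
sample_state = {
    "cookies": [
        {"name": "session", "domain": "console.raxx.app", "expires": now + 3600},
        {"name": "stale", "domain": "console.raxx.app", "expires": now - 60},
        {"name": "other", "domain": "example.com", "expires": -1},
    ],
    "origins": [],
}
live = live_cookie_names(sample_state, "console.raxx.app", now)
```

An empty result is a cheap signal that the operator needs to re-run the login workflow before the agent attempts any authenticated click-through.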


Findings that fall out of Phase 1 alone

Even without admin session, the following observations are actionable:

  1. 404 on auth paths without /auth/ prefix. My initial Phase 1 hit /login, /totp/enroll, /totp/verify — all 404. Correct paths are /auth/login, /auth/totp/enroll, /auth/totp/verify. Confirmed via blueprint url_prefix="/auth". No bug here — but yesterday's PR #1317 listed these under "orphaned routes." Updating that report: those routes ARE wired (under /auth/), they just aren't entry points from the menu (which is correct since they're called from auth flow redirects, not nav).

  2. /api/internal/deploy-freeze/state returns 401 when called with the service token alone, which suggests the internal API requires app-level credentials on top of CF Access. Good defense-in-depth, but worth confirming that the deploy workflow's call site presents the right credential (probably an internal token, not a real admin session).

  3. No 5xx errors. Every protected route renders the login page. None throw. That's the simplest possible "console is healthy" signal we can get without Phase 2.

  4. The 32044-byte login page is identical across all protected routes. That's expected, since it's the same template. The practical effect is that the response body reveals nothing about what sits behind a given route: an unauthenticated prober sees the same login HTML everywhere. Route existence is still discoverable from the status code (200 vs 404, as in finding 1), so this hides content, not the route map. On balance this is a security positive, not a negative.


Updates to runbooks (proposed)

Add to docs/ops/runbooks/cf-access-service-token-provisioning.md:

Service token UA gating: CF Bot Fight Mode (or a CF WAF rule) rejects requests to console.raxx.app from generic UAs (e.g. Python-urllib/*, curl/*). Service-token-authenticated requests must use the canonical UA <token-name>/<version> (e.g. raxx-console-self-probe/1.0). Without the right UA, requests get HTTP 403 + body error code: 1010 even when the token is valid and on the allowlist.

Add to vault-token-taxonomy:

CF_ACCESS_SVC_CONSOLE format: client_id:client_secret (single env var, colon-separated). Consumers parse with cid, _, csec = raw.partition(":"). See console/app/services/site_probes.py for the canonical parsing pattern.
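A short sketch of the partition-based parse described above, with validation added. The function name and error message are illustrative; the canonical implementation is whatever console/app/services/site_probes.py actually contains, which this sketch does not claim to reproduce.

```python
def parse_service_token(raw: str) -> tuple:
    """Split a client_id:client_secret value on the first colon."""
    cid, sep, csec = raw.strip().partition(":")
    # partition() returns an empty separator when no colon is present.
    if not sep or not cid or not csec:
        raise ValueError("expected CF_ACCESS_SVC_CONSOLE as client_id:client_secret")
    return cid, csec


# Placeholder value; real credentials come from the Heroku config var.
client_id, client_secret = parse_service_token("example-id:example-secret")
```

Splitting on the first colon only means a secret that itself contains a colon survives intact, which is the main reason to prefer partition over split(":").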


Run by: Claude Code main agent, 2026-05-07 UTC
Phase 2 (functional click-through): awaiting admin session mechanism (recommend Option C above as today's unblock).