Raxx · internal docs


ADR-0029 — Status Surface Registry: where and how is the surface list stored?

Status: Accepted
Date: 2026-04-28 UTC
Refs: #601, #581, status-raxx-app.md


Context

The surface registry is the canonical list of services tracked on status.raxx.app. Every tile on the status page maps to one registry entry. The registry drives:

  - Which surfaces the poller checks
  - What FreeScout component_tag values are valid
  - What the public API enumerates

Four storage approaches were considered:

  1. Hardcoded Python list — a list of dicts (or dataclasses) in a module like status/surfaces.py. Adding a surface = a code commit + deploy.
  2. Postgres table — a status_surfaces table, rows editable via admin UI or migration.
  3. YAML/JSON config file in repo: config/status-surfaces.yaml, committed to the repo. Adding a surface = a code commit (but no deploy of Python logic, just config).
  4. Pulled from FreeScout component tags — the surface list is derived from the component_tag field options in FreeScout.

Decision

YAML config file in repo (config/status-surfaces.yaml).

The surface list changes on a deployment cadence, not an operational cadence. Adding a new owned service (e.g., a new Heroku app) or removing a 3P integration is a deliberate platform decision that belongs in version control with a PR and review, not a runtime database mutation open to anyone with DB access.

YAML gives:

  - Auditability: every surface addition or removal is a commit with an author, timestamp, and context. This is part of the audit trail requirement for any change that affects the public-facing status page.
  - Simplicity: no migration required to add a field to the surface schema. The YAML schema evolves with the codebase.
  - Operator workflow: operators add a surface by opening a PR to config/status-surfaces.yaml, which also triggers review and ensures the FreeScout runbook is updated in the same PR.
  - Bootstrap: the config is available at server start without a DB query. The poller reads it at process start (and on SIGHUP for hot-reload without redeploy).
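The bootstrap and hot-reload behaviour can be sketched roughly as follows. This is a minimal illustration, not the poller's actual code: SurfaceRegistry, request_reload, reload_if_requested, and install_sighup_reload are hypothetical names, and the load callable stands in for whatever parses config/status-surfaces.yaml into a list of dicts. The signal handler only sets a flag, so the reload itself runs in the poll loop rather than inside the handler.

```python
import signal
from typing import Callable


def _index_by_id(entries: list[dict]) -> dict[str, dict]:
    """Index parsed entries by id, rejecting duplicates."""
    by_id: dict[str, dict] = {}
    for entry in entries:
        if entry["id"] in by_id:
            raise ValueError(f"duplicate surface id: {entry['id']}")
        by_id[entry["id"]] = entry
    return by_id


class SurfaceRegistry:
    """In-memory registry (hypothetical): loaded at start, reloadable on request."""

    def __init__(self, load: Callable[[], list[dict]]) -> None:
        # Assumption: `load` reads and parses config/status-surfaces.yaml.
        self._load = load
        self._surfaces = _index_by_id(load())  # synchronous read at process start
        self._reload_requested = False

    def request_reload(self) -> None:
        """Safe to call from a signal handler: it only flips a flag."""
        self._reload_requested = True

    def reload_if_requested(self) -> bool:
        """Call once per poll cycle; keeps the old registry if the new one is bad."""
        if not self._reload_requested:
            return False
        self._reload_requested = False
        try:
            self._surfaces = _index_by_id(self._load())
        except (OSError, KeyError, ValueError):
            return False
        return True

    def ids(self) -> list[str]:
        return sorted(self._surfaces)


def install_sighup_reload(registry: SurfaceRegistry) -> None:
    """POSIX only; must be called from the main thread."""
    signal.signal(signal.SIGHUP, lambda signum, frame: registry.request_reload())
```

Keeping the previous registry when a reload fails is a design choice of this sketch; the ADR does not say how the poller treats an invalid file.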

The YAML file is the single source of truth. The Postgres status_state table stores runtime state (current status, last probe result, ticket_pending) keyed by surface_id. The YAML is authoritative for what surfaces exist; Postgres is authoritative for what state they are in.
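A rough sketch of that split, assuming the registry has already been parsed into a list of dicts and the status_state rows have been fetched as dicts. The function name current_view, the status column name, and the "unknown" fallback are illustrative assumptions, not taken from the codebase.

```python
def current_view(registry: list[dict], state_rows: list[dict]) -> list[dict]:
    """Join registry entries (what exists) with state rows (what state they are in)."""
    state_by_id = {row["surface_id"]: row for row in state_rows}
    view = []
    for surface in registry:  # the registry decides which surfaces appear
        row = state_by_id.get(surface["id"])
        view.append({
            "id": surface["id"],
            "display_name": surface["display_name"],
            # Assumption: a surface with no state row yet is shown as "unknown".
            "status": row["status"] if row else "unknown",
        })
    return view
```

State rows whose surface_id is no longer in the YAML are simply ignored here, which matches the idea that the YAML alone determines which surfaces exist.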


Consequences

Positive:

  - Adding a surface requires only a YAML edit + PR: no migration, no DB admin console needed.
  - The file is readable as documentation. Engineers and operators can see the full surface taxonomy by reading one file.
  - Bootstrap is synchronous: server start always has the full registry without a cold-start DB read.
  - The registry is version-controlled; rollback of a bad surface addition is git revert.

Negative:

  - Adding a surface requires a code deploy. For a platform that changes surfaces infrequently (estimated: once per quarter), this is an acceptable constraint.
  - FreeScout's component_tag dropdown options must be kept in sync with the YAML manually (the #605 runbook covers this). There is no automatic sync. This is a documented operational dependency.
  - If the surface list grows very large (>100 entries), YAML readability degrades. This is not a near-term concern.

Alternatives Rejected:


Schema

See status-raxx-app.md §3 for the full YAML field specification and example entries.

Key fields per entry: id, display_name, category (owned | upstream_3p | downstream_3p), probe_url (nullable), partner_status_url (nullable), public_description, partner_name (for 3P only — the partner's own product name, used in "tracking [partner]'s incident" copy).
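As an illustration of how those fields might be checked at load time, here is a minimal sketch. SurfaceEntry is a hypothetical class; which fields are required versus optional, and the rule that owned surfaces must not set partner_name, are assumptions inferred from the list above rather than the specification in status-raxx-app.md §3.

```python
from dataclasses import dataclass
from typing import Optional

CATEGORIES = {"owned", "upstream_3p", "downstream_3p"}


@dataclass(frozen=True)
class SurfaceEntry:
    """One registry entry (hypothetical model of the key fields)."""
    id: str
    display_name: str
    category: str
    public_description: str
    probe_url: Optional[str] = None           # nullable
    partner_status_url: Optional[str] = None  # nullable
    partner_name: Optional[str] = None        # 3P surfaces only

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"{self.id}: unknown category {self.category!r}")
        # Assumption: "for 3P only" means owned surfaces must leave this unset.
        if self.category == "owned" and self.partner_name is not None:
            raise ValueError(f"{self.id}: partner_name is for 3P surfaces only")
```

A parsed YAML mapping could then be validated with SurfaceEntry(**raw_entry), which also rejects unexpected keys.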


Amendment — public field (2026-05-05, #1149)

A public: bool field (default true) was added to the surface entry schema following a security audit that identified four operator-internal surfaces leaking on the customer status page.

Semantics

public value       Customer status page   Operator console   Probes + 3P poller
true (or absent)   Visible                Visible            Active
false              Hidden                 Visible            Active

public: false means hidden from customers, not removed from monitoring. Probes and the 3P poller continue to write state for public: false surfaces. The operator console and /api/internal/status/* endpoints always see all surfaces.
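These semantics can be sketched as three small selectors over the parsed registry. The function names are hypothetical; the only behaviour taken from this document is that an absent public key counts as true and that hiding a surface never removes it from monitoring.

```python
def is_public(surface: dict) -> bool:
    """An absent `public` key defaults to true."""
    return bool(surface.get("public", True))


def customer_surfaces(registry: list[dict]) -> list[dict]:
    """What the customer status page may show."""
    return [s for s in registry if is_public(s)]


def operator_surfaces(registry: list[dict]) -> list[dict]:
    """The operator console and internal endpoints see everything."""
    return list(registry)


def probe_targets(registry: list[dict]) -> list[dict]:
    """Monitoring ignores the `public` flag entirely."""
    return list(registry)
```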

Enforcement

Surfaces currently marked public: false

ID                       Reason
console-raxx-app         Exposes existence and uptime of admin panel
internal-docs-raxx-app   Name announces it is internal; no customer relevance
vault-raxx-app           Exposes secrets infrastructure to attackers; "Secrets Vault" is an enumeration vector
ci-runners               Internal build infrastructure; no customer relevance

Adding or changing public: false surfaces

  1. Update config/status-surfaces.yaml and backend_v2/status/surface_registry.yaml (add/remove public: false).
  2. Update INTERNAL_ONLY_SURFACE_IDS in frontend/status-worker/src/index.ts.
  3. Update INTERNAL_ONLY_IDS in backend_v2/tests/unit/test_status_surface_registry.py and the mirror set in frontend/status-worker/test/worker.test.ts.
  4. Open a PR — the test suite enforces the contract so CI will catch drift.
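The kind of drift check that step 4 relies on might look like the sketch below. It is not the contents of test_status_surface_registry.py: the helper names are hypothetical, and the registry is assumed to be already parsed into a list of dicts. The ID set mirrors the surfaces listed above as public: false.

```python
INTERNAL_ONLY_IDS = {
    "console-raxx-app",
    "internal-docs-raxx-app",
    "vault-raxx-app",
    "ci-runners",
}


def hidden_ids(registry: list[dict]) -> set[str]:
    """IDs the registry marks as public: false."""
    return {s["id"] for s in registry if not s.get("public", True)}


def contract_drift(registry: list[dict], expected: set[str] = INTERNAL_ONLY_IDS):
    """Return (ids expected hidden but public, ids hidden but not expected)."""
    actual = hidden_ids(registry)
    return sorted(expected - actual), sorted(actual - expected)
```

A unit test could assert that contract_drift(registry) returns two empty lists, so a surface flipped in the YAML without updating the ID sets fails CI.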