Status: Accepted
Date: 2026-04-28 UTC
Refs: #601, #581, status-raxx-app.md
The surface registry is the canonical list of services tracked on status.raxx.app. Every tile on the status page maps to one registry entry. The registry drives:
- Which surfaces the poller checks
- What FreeScout component_tag values are valid
- What the public API enumerates
Four storage approaches were considered:
- In code: status/surfaces.py. Adding a surface = a code commit + deploy.
- In the database: a status_surfaces table, rows editable via admin UI or migration.
- In a YAML file: config/status-surfaces.yaml committed to the repo. Adding a surface = a code commit (but no deploy of Python logic, just config).
- In FreeScout: the component_tag field options in FreeScout.

Decision: YAML config file in repo (config/status-surfaces.yaml).
The surface list changes on a deployment cadence, not an operational cadence. Adding a new owned service (e.g., a new Heroku app) or removing a 3P integration is a deliberate platform decision. It belongs in version control with a PR and review, not in a runtime database mutation available to anyone with DB access.
YAML gives:
- Auditability: every surface addition or removal is a commit with an author, timestamp, and context. This is part of the audit trail requirement for any change that affects the public-facing status page.
- Simplicity: no migration required to add a field to the surface schema. The YAML schema evolves with the codebase.
- Operator workflow: operators add a surface by opening a PR to config/status-surfaces.yaml, which also triggers review and ensures the FreeScout runbook is updated in the same PR.
- Bootstrap: the config is available at server start without a DB query. The poller reads it at process start (and on SIGHUP for hot-reload without redeploy).
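The start-up read and SIGHUP reload described in the last bullet can be sketched roughly as follows. This is a minimal sketch, not the poller's actual code: RegistryHolder, install_sighup_reload, and the injected loader callable are hypothetical names, and the loader stands in for whatever parses config/status-surfaces.yaml. Keeping the previous registry when a reload fails is an assumption of this sketch, not something the ADR specifies.

```python
import signal
from typing import Callable

Surface = dict  # one parsed registry entry; shape assumed for illustration


class RegistryHolder:
    """Holds the in-memory registry and swaps it on reload (hypothetical helper)."""

    def __init__(self, loader: Callable[[], list[Surface]]) -> None:
        self._loader = loader
        # Synchronous read at process start: no DB query involved.
        self._surfaces = loader()

    @property
    def surfaces(self) -> list[Surface]:
        return self._surfaces

    def reload(self) -> bool:
        """Re-read the config; keep the old registry if loading fails (assumed policy)."""
        try:
            fresh = self._loader()
        except Exception:
            return False
        self._surfaces = fresh  # single reference assignment
        return True


def install_sighup_reload(holder: RegistryHolder) -> None:
    """Reload the registry whenever the process receives SIGHUP (POSIX only)."""
    if hasattr(signal, "SIGHUP"):
        signal.signal(signal.SIGHUP, lambda signum, frame: holder.reload())
```

With this shape, an operator could trigger a reload with `kill -HUP <pid>` after the config file on disk has been updated.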
The YAML file is the single source of truth. The Postgres status_state table stores runtime state (current status, last probe result, ticket_pending) keyed by surface_id. The YAML is authoritative for what surfaces exist; Postgres is authoritative for what state they are in.
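The split of authority can be illustrated with a small join: the registry decides which surfaces appear, and the state rows only fill in their current state. This is a sketch under assumptions: build_status_view is a hypothetical helper, the status column name and the "unknown" default are illustrative, and only surface_id and ticket_pending are taken from the text above.

```python
from typing import Any


def build_status_view(
    registry: list[dict[str, Any]],
    state_rows: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Join registry entries (what exists) with state rows (what state they are in)."""
    state_by_id = {row["surface_id"]: row for row in state_rows}
    view = []
    for entry in registry:
        row = state_by_id.get(entry["id"], {})
        view.append(
            {
                "id": entry["id"],
                # A surface with no state row yet gets an assumed default.
                "status": row.get("status", "unknown"),
                "ticket_pending": row.get("ticket_pending", False),
            }
        )
    # State rows whose surface_id is not in the registry are ignored.
    return view
```

Iterating over the registry rather than the state rows means a stale row left behind by a removed surface never reaches the output.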
Positive:
- Adding a surface requires only a YAML edit + PR — no migration, no DB admin console needed.
- The file is readable as documentation. Engineers and operators can see the full surface taxonomy by reading one file.
- Bootstrap is synchronous — server start always has the full registry without a cold-start DB read.
- The registry is version-controlled; rollback of a bad surface addition is git revert.
Negative:
- Adding a surface requires a code deploy. For a platform that changes surfaces infrequently (estimated: once per quarter), this is an acceptable constraint.
- FreeScout's component_tag dropdown options must be kept in sync with the YAML manually (the #605 runbook covers this). There is no automatic sync. This is a documented operational dependency.
- If the surface list grows very large (>100 entries), YAML readability degrades. This is not a near-term concern.
Alternatives Rejected:
- FreeScout as the registry: FreeScout should take its component_tag options from the canonical registry, not the other way around. The registry must be authoritative for the status system; FreeScout is a downstream consumer of it.

See status-raxx-app.md §3 for the full YAML field specification and example entries.
Key fields per entry: id, display_name, category (owned | upstream_3p | downstream_3p), probe_url (nullable), partner_status_url (nullable), public_description, partner_name (for 3P only — the partner's own product name, used in "tracking [partner]'s incident" copy).
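As an illustration of the fields listed above, one entry could be modelled like this after parsing. This is a sketch only: SurfaceEntry and its validation rules are hypothetical and are not the schema defined in status-raxx-app.md §3. In particular, rejecting partner_name on owned surfaces is one reading of "for 3P only".

```python
from dataclasses import dataclass
from typing import Optional

CATEGORIES = {"owned", "upstream_3p", "downstream_3p"}


@dataclass(frozen=True)
class SurfaceEntry:
    """One registry entry (hypothetical model of the key fields)."""

    id: str
    display_name: str
    category: str
    public_description: str
    probe_url: Optional[str] = None           # nullable
    partner_status_url: Optional[str] = None  # nullable
    partner_name: Optional[str] = None        # 3P entries only

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"{self.id}: unknown category {self.category!r}")
        # Assumed rule: owned surfaces must not carry a partner name.
        if self.category == "owned" and self.partner_name is not None:
            raise ValueError(f"{self.id}: partner_name is for 3P surfaces only")
```

Validating at load time means a malformed entry fails when the registry is read rather than when the status page is rendered.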
public field (2026-05-05, #1149)

A public: bool field (default true) was added to the surface entry schema following a security audit that identified four operator-internal surfaces leaking on the customer status page.
| public value | Customer status page | Operator console | Probes + 3P poller |
|---|---|---|---|
| true (or absent) | Visible | Visible | Active |
| false | Hidden | Visible | Active |
public: false means hidden from customers, not removed from monitoring. Probes and the 3P poller continue to write state for public: false surfaces. The operator console and /api/internal/status/* endpoints always see all surfaces.
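The "hidden, not removed" rule amounts to two different views of the same list. The following is a sketch of that idea, not the code in status_registry.py: filter_public and is_known are hypothetical stand-ins that take the registry as an argument, and the "api" and "web" IDs in the usage note are made up for illustration.

```python
from typing import Any

Surface = dict[str, Any]


def filter_public(registry: list[Surface]) -> list[Surface]:
    """Customer-facing view: keep entries whose public flag is true or absent."""
    return [s for s in registry if s.get("public", True) is not False]


def is_known(surface_id: str, registry: list[Surface]) -> bool:
    """Monitoring view: check against the full registry, internal surfaces included."""
    return any(s["id"] == surface_id for s in registry)
```

For a registry containing "api", "web", and vault-raxx-app with public: false, filter_public returns only "api" and "web", while is_known("vault-raxx-app", registry) is still true, so state updates for it keep being accepted.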
- Backend (status_registry.py): get_public_surface_registry() filters to public != false. All /api/status/public/* routes must call this function, not get_surface_registry(). The full get_surface_registry() is used by the prober, the 3P poller, and the webhook receiver so they accept state updates for all surfaces.
- Worker (frontend/status-worker/src/index.ts): INTERNAL_ONLY_SURFACE_IDS (a Set<string>) holds the IDs of public: false surfaces. handleGetSurfaces filters D1 rows through this set before building the response. The set must be kept in sync with the YAML whenever the public flag changes.
- is_known_surface(): Always checks the full registry, because internal surfaces must be accepted by the webhook receiver.

Surfaces marked public: false

| ID | Reason |
|---|---|
| console-raxx-app | Exposes existence and uptime of admin panel |
| internal-docs-raxx-app | Name announces it is internal; no customer relevance |
| vault-raxx-app | Exposes secrets infrastructure to attackers; "Secrets Vault" is an enumeration vector |
| ci-runners | Internal build infrastructure; no customer relevance |
Changing public: false surfaces

1. Edit config/status-surfaces.yaml and backend_v2/status/surface_registry.yaml (add/remove public: false).
2. Update INTERNAL_ONLY_SURFACE_IDS in frontend/status-worker/src/index.ts.
3. Update INTERNAL_ONLY_IDS in backend_v2/tests/unit/test_status_surface_registry.py and the mirror set in frontend/status-worker/test/worker.test.ts.
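Because the hardcoded ID sets must mirror the YAML by hand, a drift check is one way to catch a missed update. The helper below is a hypothetical sketch of such a check, not the contents of test_status_surface_registry.py. It assumes the registry has already been parsed into a list of dicts.

```python
from typing import Any, Iterable


def internal_id_drift(
    registry: Iterable[dict[str, Any]],
    hardcoded_ids: set[str],
) -> tuple[set[str], set[str]]:
    """Compare a hardcoded ID set with the registry's public: false entries.

    Returns (missing_from_set, stale_in_set); both are empty when in sync.
    """
    yaml_internal = {s["id"] for s in registry if s.get("public", True) is False}
    return yaml_internal - hardcoded_ids, hardcoded_ids - yaml_internal
```

A unit test could assert that both returned sets are empty, so a YAML change without the matching set update fails in CI.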