Status: Accepted · Date: 2026-05-06 UTC · Owner: software-architect · Parent card: #649 · ADRs: 0052 · 0053
Raxx surfaces were bootstrapped ad hoc. Each new surface repeats the same boilerplate: CF Pages project creation, DNS CNAME, CF Access gating, GitHub Actions workflow, status-tile wiring, audit-log hookup. That work took a full day per surface. This document codifies the convention so that standing up the next surface takes 30 minutes and involves no ad-hoc decisions.
The doc covers: hosting-tier decision tree, domain attachment, workflow templates (deploy + preview), auth/access pattern, audit-log hookup, status tile registration, Sentry wiring, and the memory-note convention.
It does not prescribe exact library versions or force retrofits of existing surfaces. Surface owners retain flex room within the tier's boundary.
All platform invariants apply. Surface-specific constraints:

- Secrets such as CF_PAGES_DEPLOY_TOKEN, HEROKU_API_KEY, etc. are never written into workflow YAML.
- Every deploy emits POST /api/internal/audit on the console's internal endpoint, with continue-on-error: true on the emit step so audit failure never blocks a deploy.
- Every workflow runs .github/actions/check-deploy-freeze before touching any environment. Break-glass override via the DEPLOY_FREEZE_OVERRIDE=1 repo secret.
- Every surface has a config/status-surfaces.yaml entry and a health probe URL registered before the first production deploy. The console dashboard polls every entry; a missing tile is a monitoring blind spot.
- Production deploys DM D0AJ7K184TV on health-check failure, with continue-on-error: true on the Slack step.

Hosting-tier decision tree:

```
Is the surface stateless / static / SSG output?
├─ Yes → CF Pages (Tier A)
└─ No
   Is it a long-running daemon / VM workload (ticketing, self-hosted apps)?
   ├─ Yes → Lightsail + Terraform (Tier C)
   └─ No
      Does it need server-side rendering, session state, or a DB connection?
      ├─ Yes → Heroku (Tier B)
      └─ No → re-evaluate; likely fits Tier A or needs a CF Worker (Tier A+)
```
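The decision tree can be captured as a small helper. This is a sketch only, not tooling the repo ships; the function name and boolean parameters are illustrative, and the A+ shortcut mirrors the rule of thumb below (edge persistence without SSR).

```python
def pick_tier(stateless_static: bool, long_running_daemon: bool,
              needs_ssr_or_db: bool, needs_edge_persistence: bool = False) -> str:
    """Mirror of the hosting-tier decision tree (illustrative sketch)."""
    if stateless_static:
        # Static/SSG output goes to CF Pages, unless KV/D1 persistence is needed.
        return "A+" if needs_edge_persistence else "A"
    if long_running_daemon:
        return "C"  # Lightsail + Terraform; reserved for self-hosted apps
    if needs_ssr_or_db:
        return "B"  # Heroku
    return "A"      # re-evaluate: likely Tier A after all, or a CF Worker (A+)
```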
| Tier | Platform | Examples | Preview support | Staging equivalent |
|---|---|---|---|---|
| A | Cloudflare Pages | getraxx.com, docs.raxx.app, status.raxx.app, internal-docs.raxx.app | Auto (branch deploys) | Preview URL per branch |
| A+ | CF Worker + D1/KV | status-worker, velvet-worker | Manual (wrangler deploy --env staging) | Separate worker name *-staging |
| B | Heroku | console.raxx.app (Raptor backend_v2) | Review Apps (#350, optional) | Separate Heroku app raxx-<surface>-staging |
| C | Lightsail + Terraform | tickets.raxx.app (FreeScout), vault.raxx.app (Infisical) | None — prod-only by design | None |
Rule of thumb: default to Tier A. Move to A+ only if edge persistence (KV/D1) is required but no SSR. Move to Tier B only when Flask/Node SSR or a persistent Postgres connection is unavoidable. Never use Tier C for new surfaces — it is reserved for third-party self-hosted apps that cannot run on managed PaaS.
All *.raxx.app subdomains live in the Cloudflare-managed raxx.app zone
(zone ID f12dbb5cac57d5591a5058874498a6d1). Every surface needs one
CNAME record added to that zone. The DNS token scoped to DNS:Edit on
the raxx.app zone is stored in vault at /MooseQuest/cloudflare/ as
CLOUDFLARE_EDIT_DNS.
- Tier A (CF Pages): CNAME <surface>.raxx.app → <project>.pages.dev (proxied)
- Tier A+ (CF Worker): CNAME <surface>.raxx.app → <worker>.workers.dev (proxied)
- Tier B (Heroku): CNAME <surface>.raxx.app → <heroku-app>.herokuapp.com (proxied)
- Tier C (Lightsail): A record <surface>.raxx.app → <static-ip> (proxied)
DNS steps are idempotent: the workflow checks for an existing record before
creating. Pattern in deploy-customer-docs.yml §"Ensure docs.raxx.app DNS CNAME".
Any surface that is not intentionally public requires a CF Access application.
Token required: CF_ACCESS_MGMT_TOKEN stored in vault at /MooseQuest/cloudflare/
as CLOUDFLARE_ACCESS_MGMT_TOKEN.
Steps:
1. List existing Access apps: GET /accounts/{id}/access/apps
2. If hostname already has an app → no-op
3. If not → POST /accounts/{id}/access/apps with:
- type: self_hosted
- session_duration: 24h
- policies[0].include: [{email: {email: "kris@moosequest.net"}}]
Pattern in deploy-internal-docs.yml §"Ensure CF Access app".
Public surfaces (getraxx.com, docs.raxx.app, status.raxx.app) skip this step and
instead emit X-Robots-Tag: index, follow in their _headers file.
Heroku surfaces expose an origin URL (e.g., raxx-console-prod-ff30a22abccb.herokuapp.com)
allowlisted by the CF Access bypass for /health. The health check in the deploy
workflow uses the Heroku origin URL, not the CF-fronted domain, to avoid 302
redirects from Access. Pattern in deploy-console.yml §"Post-deploy health check".
Every deploy workflow must contain these jobs in order:
freeze-check → [build] → deploy → notify
freeze-check

```yaml
freeze-check:
  name: Deploy freeze check
  runs-on: ubuntu-latest
  # For Tier A: always runs (deploys are triggered from main)
  # For Tier B prod-only: if: github.event_name == 'workflow_dispatch' && inputs.environment == 'production'
  env:
    DEPLOY_FREEZE_OVERRIDE: ${{ secrets.DEPLOY_FREEZE_OVERRIDE }}
  steps:
    - uses: actions/checkout@v4
    - uses: ./.github/actions/check-deploy-freeze
      with:
        service-token-id: ${{ secrets.CF_DEPLOY_FREEZE_CLIENT_ID }}
        service-token-secret: ${{ secrets.CF_DEPLOY_FREEZE_CLIENT_SECRET }}
        console-url: https://console.raxx.app
```
build-and-deploy (Tier A) or deploy (Tier B)
Must include, in this order:
1. Notify — building (via .github/actions/notify-deploy-status)
2. Load vault secrets (via .github/actions/load-vault-secrets)
3. Build step (surface-specific)
4. Infrastructure bootstrap (idempotent: CF Pages project create, DNS CNAME, CF Access)
5. Notify — deploying
6. Deploy step (wrangler / heroku push)
7. Health check (5 retries × 10s; GET <health-url>/health must return 200)
8. Emit audit event — success
9. Emit audit event — failure (on: failure())
10. Slack DM on prod health-check failure (on: failure(), prod-only)
11. Notify — succeeded / Notify — failed
```yaml
env:
  INFISICAL_CLIENT_ID: ${{ secrets.INFISICAL_CLIENT_ID }}
  INFISICAL_CLIENT_SECRET: ${{ secrets.INFISICAL_CLIENT_SECRET }}
  INFISICAL_PROJECT_ID: ${{ secrets.INFISICAL_PROJECT_ID }}
  CF_ACCESS_CLIENT_ID: ${{ secrets.CF_ACCESS_CLIENT_ID }}
  CF_ACCESS_CLIENT_SECRET: ${{ secrets.CF_ACCESS_CLIENT_SECRET }}

concurrency:
  group: deploy-<surface>[-${{ inputs.environment }}]
  cancel-in-progress: false  # never cancel an in-flight deploy
```
Tier B surfaces that support staging/production environments should include the
environment name in the group key (pattern from deploy-console.yml).
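Putting the required steps together, a minimal Tier A job might look like the sketch below. Only the named composite actions and the step ordering come from this doc; the build command, the bootstrap and emit-audit scripts, and the notify input names are illustrative placeholders.

```yaml
# Sketch of a minimal Tier A deploy job (placeholders marked in comments).
build-and-deploy:
  needs: freeze-check
  runs-on: ubuntu-latest
  concurrency:
    group: deploy-<surface>
    cancel-in-progress: false
  steps:
    - uses: actions/checkout@v4
    - uses: ./.github/actions/notify-deploy-status   # input name assumed
      with: { status: building }
    - uses: ./.github/actions/load-vault-secrets
    - run: npm ci && npm run build                   # surface-specific build
    - run: ./scripts/bootstrap-infra.sh              # hypothetical: Pages project, DNS, Access (idempotent)
    - uses: ./.github/actions/notify-deploy-status
      with: { status: deploying }
    - run: npx wrangler pages deploy dist --project-name <surface>
    - name: Health check (5 × 10s)
      run: |
        for i in 1 2 3 4 5; do
          curl -fsS "https://<surface>.raxx.app/health" && exit 0
          sleep 10
        done
        exit 1
    - name: Emit audit event — success
      continue-on-error: true
      run: ./scripts/emit-audit.sh success           # hypothetical wrapper around POST /api/internal/audit
    - name: Emit audit event — failure
      if: failure()
      continue-on-error: true
      run: ./scripts/emit-audit.sh deploy_failed
```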
Every workflow emits a POST /api/internal/audit with this payload shape.
The emit step is continue-on-error: true on both success and failure paths.
```
{
  "action": "<surface>.deploy.<environment>",
  "actor": "<github.actor>",
  "outcome": "success | deploy_failed | health_check_failed",
  "timestamp_utc": "2026-01-01T00:00:00Z",
  "source_sha": "<sha>",
  "target_app": "<app-name-or-project>",
  "environment": "production | staging",
  "trigger": "ci | manual",
  "workflow_run_id": "<run-id>",
  "workflow_run_url": "<run-url>"
}
```
Token: CONSOLE_AUDIT_INGEST_TOKEN (GitHub Environment secret).
URL: CONSOLE_INTERNAL_URL (GitHub var, defaults to the Heroku origin URL).
Pattern: deploy-console.yml §"Emit audit event — success/failure".
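The payload shape can be pinned down with a small builder; a sketch only, and the function name is illustrative rather than anything the workflows actually call.

```python
import datetime

VALID_OUTCOMES = {"success", "deploy_failed", "health_check_failed"}

def build_audit_event(surface: str, environment: str, outcome: str,
                      actor: str, sha: str, target_app: str,
                      run_id: str, run_url: str, trigger: str = "ci") -> dict:
    """Assemble the audit payload in the documented shape."""
    assert outcome in VALID_OUTCOMES, f"unknown outcome: {outcome}"
    return {
        "action": f"{surface}.deploy.{environment}",
        "actor": actor,
        "outcome": outcome,
        "timestamp_utc": datetime.datetime.now(datetime.timezone.utc)
                         .strftime("%Y-%m-%dT%H:%M:%SZ"),
        "source_sha": sha,
        "target_app": target_app,
        "environment": environment,
        "trigger": trigger,
        "workflow_run_id": run_id,
        "workflow_run_url": run_url,
    }
```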
CF Pages auto-generates preview URLs for every branch that gets a Pages
deployment. No additional workflow required. The preview URL pattern is:
https://<branch-name>.<project>.pages.dev
To enable branch-based previews for a surface, set --branch in the wrangler
deploy command. Branches other than main automatically land in the preview
environment. Pattern in deploy-status-page.yml.
Heroku Review Apps are optional and tracked in #350. When enabled, the
app.json at the surface's Heroku root configures the review app environment.
Review apps do not emit audit events and do not gate behind CF Access.
No preview environments. Tier A+ staging is a separate named worker
(<surface>-worker-staging). Tier C has no staging counterpart by design.
Every surface must have an entry in config/status-surfaces.yaml before its
first production deploy. Minimum required fields:
```yaml
- id: <surface>              # stable slug; never change post-deploy
  hostname: <surface>.raxx.app
  label: "<Human label>"
  probe_url: https://<surface>.raxx.app/health  # or /ping, / for static sites
  probe_type: http           # http | tcp | none
  tier: A                    # A | A+ | B | C
  access: public             # public | internal
  workflow: deploy-<surface>.yml
  sentry_project: <sentry-slug>  # omit if no server code
```
The console status poller (console/app/services/status_poller.py) reads this
YAML on startup. Adding a new entry automatically creates the tile on the
dashboard. No code change required — only the YAML update.
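Because a malformed entry silently costs a tile, it is worth checking the required fields before committing. A sketch of such a validator; the poller's real validation (if any) may differ, and the function is not part of the console codebase.

```python
REQUIRED = {"id", "hostname", "label", "probe_url", "probe_type",
            "tier", "access", "workflow"}

def validate_surface(entry: dict) -> list[str]:
    """Return a list of problems with a status-surfaces.yaml entry (sketch)."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED - entry.keys())]
    if entry.get("probe_type") not in {"http", "tcp", "none"}:
        problems.append("probe_type must be http | tcp | none")
    if entry.get("tier") not in {"A", "A+", "B", "C"}:
        problems.append("tier must be A | A+ | B | C")
    if entry.get("access") not in {"public", "internal"}:
        problems.append("access must be public | internal")
    return problems  # sentry_project is optional, so it is not checked
```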
Every Tier B (Heroku) surface that runs server code must include a Sentry DSN.

- The DSN lives in vault at /MooseQuest/sentry/<surface>/SENTRY_DSN and is loaded via .github/actions/load-vault-secrets.
- It is set with heroku config:set SENTRY_DSN="$SENTRY_DSN" --app <app> >/dev/null 2>&1 (stdout silenced per feedback_heroku_config_set_echoes_secrets.md).
- The status tile reports sentry_errors_24h when sentry_project is set in the surface registry.

Tier A (static) surfaces do not run server code, so Sentry is not applicable. Tier A+ (Worker) surfaces may use Sentry's Cloudflare integration; it is treated as optional for now.
A new surface follows this sequence:
Dark → Flag → Beta → GA
| Phase | What happens |
|---|---|
| Dark | Scaffold created (workflow, status entry, DNS). Surface not deployed. |
| Flag | First deploy to staging (or CF Pages preview). Access restricted. Operator validates. |
| Beta | Production deployed. CF Access or invite-only access gate active. |
| GA | Public (or fully-internal): Access gate lifted for public surfaces; CF Access confirmed for internal. |
- The load-vault-secrets action is the only allowed mechanism for loading secrets into a workflow; heroku config:set is always silenced with >/dev/null 2>&1.
- An unreachable console freezes all deploys; the DEPLOY_FREEZE_OVERRIDE=1 repo secret is the break-glass override.
- Run tracing follows workflow-uuid-tracing.md.

When a new surface is significant enough to persist across sessions, add a memory entry at:
.claude/agents/memory/project_<surface>.md
Minimum content:
- Surface name, URL, hosting tier
- Key vault paths used by the surface
- Any operational quirks (e.g., "health check uses origin URL to bypass CF Access")
- Links to the deploy workflow and architecture doc
None blocking. The following are tracked for future refinement:

- The <worker>-staging naming is convention, not enforced by tooling. A lint check (ADR-0051 §drift controls) could enforce it.
- An automated check that every deploy workflow has a matching config/status-surfaces.yaml entry would close the gap between "workflow exists" and "surface is monitored."