New-Surface Deploy/Preview Convention
Status: Accepted Date: 2026-05-06 UTC Owner: software-architect Parent card: #649 ADRs: 0124 · 0053
1. Context
Raxx surfaces were bootstrapped ad hoc. Each new surface repeated the same boilerplate: CF Pages project creation, DNS CNAME, CF Access gating, GitHub Actions workflow, status-tile wiring, audit-log hookup. The work took a full day per surface. This document codifies the convention so the next surface takes 30 minutes and requires no ad hoc decisions.
The doc covers: hosting-tier decision tree, domain attachment, workflow templates (deploy + preview), auth/access pattern, audit-log hookup, status tile registration, Sentry wiring, and the memory-note convention.
It does not prescribe exact library versions or force retrofits of existing surfaces. Surface owners retain flex room within the tier's boundary.
2. Invariants
All platform invariants apply. Surface-specific constraints:
- No credentials in workflow files. Every secret is a GitHub Environment secret sourced from vault at runtime. CF_PAGES_DEPLOY_TOKEN, HEROKU_API_KEY, etc. are never written into workflow YAML.
- Audit row on every deploy. Every surface deploy (success and failure) emits a row to POST /api/internal/audit on the console's internal endpoint. continue-on-error: true on the emit step so audit failure never blocks a deploy.
- Deploy-freeze gate. Every workflow calls .github/actions/check-deploy-freeze before touching any environment. Break-glass override via DEPLOY_FREEZE_OVERRIDE=1 repo secret.
- Status tile mandatory. Every surface gets an entry in config/status-surfaces.yaml and a health probe URL registered before the first production deploy. The console dashboard polls every entry; a missing tile is a monitoring blind spot.
- Internal surfaces behind CF Access. Any surface not intentionally public is gated via a Cloudflare Access application with an email-allowlist policy. No surface is "internal by convention"; the gate must be enforced in infrastructure.
- Slack DM on prod deploy failure. Every surface workflow must send a DM to D0AJ7K184TV on health-check failure for production deploys. continue-on-error: true on the Slack step.
3. Hosting Tier Classes
3.1 Decision Tree
    Is the surface stateless / static / SSG output?
    ├─ Yes → CF Pages (Tier A)
    └─ No
       Is it a long-running daemon / VM workload (ticketing, self-hosted apps)?
       ├─ Yes → Lightsail + Terraform (Tier C)
       └─ No
          Does it need server-side rendering, session state, or a DB connection?
          ├─ Yes → Heroku (Tier B)
          └─ No → re-evaluate; likely fits Tier A or needs a CF Worker (Tier A+)
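
The tree above can be read as three ordered questions. The sketch below encodes that ordering in Python; `SurfaceTraits` and `choose_tier` are hypothetical names introduced for illustration and are not part of any existing Raxx tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurfaceTraits:
    """Hypothetical answers to the three decision-tree questions."""
    is_static: bool            # stateless / static / SSG output
    is_long_running_vm: bool   # long-running daemon or VM workload
    needs_server_state: bool   # SSR, session state, or a DB connection


def choose_tier(traits: SurfaceTraits) -> str:
    """Ask the questions in the same order as the tree and return a tier label."""
    if traits.is_static:
        return "A"
    if traits.is_long_running_vm:
        return "C"
    if traits.needs_server_state:
        return "B"
    # Final "No" branch: the tree says to re-evaluate (Tier A or A+).
    return "re-evaluate (A or A+)"
```

Because the static check comes first, a surface that answers "yes" to it never reaches the Tier B or Tier C questions, which matches the "default to Tier A" rule of thumb.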
3.2 Tier Definitions
| Tier | Platform | Examples | Preview support | Staging equivalent |
|---|---|---|---|---|
| A | Cloudflare Pages | getraxx.com, docs.raxx.app, status.raxx.app, internal-docs.raxx.app | Auto (branch deploys) | Preview URL per branch |
| A+ | CF Worker + D1/KV | status-worker, velvet-worker | Manual (wrangler deploy --env staging) | Separate worker name *-staging |
| B | Heroku | console.raxx.app (Raptor backend_v2) | Review Apps (#350, optional) | Separate Heroku app raxx-<surface>-staging |
| C | Lightsail + Terraform | tickets.raxx.app (FreeScout), vault.raxx.app (Infisical) | None — prod-only by design | None |
Rule of thumb: default to Tier A. Move to A+ only if the surface needs edge persistence (KV/D1) but not SSR. Move to Tier B only when Flask/Node SSR or a persistent Postgres connection is unavoidable. Never use Tier C for new surfaces; it is reserved for third-party self-hosted apps that cannot run on managed PaaS.
4. Domain Attachment Pattern
4.1 Required DNS record
All *.raxx.app subdomains live in the Cloudflare-managed raxx.app zone
(zone ID f12dbb5cac57d5591a5058874498a6d1). Every surface needs one
CNAME record added to that zone. The DNS token scoped to DNS:Edit on
the raxx.app zone is stored in vault at /MooseQuest/cloudflare/ as
CLOUDFLARE_EDIT_DNS.
For Tier A (CF Pages): CNAME <surface>.raxx.app → <project>.pages.dev (proxied)
For Tier A+ (CF Worker): CNAME <surface>.raxx.app → <worker>.workers.dev (proxied)
For Tier B (Heroku): CNAME <surface>.raxx.app → <heroku-app>.herokuapp.com (proxied)
For Tier C (Lightsail): A <surface>.raxx.app → <static-ip> (proxied)
DNS steps are idempotent: the workflow checks for an existing record before
creating. Pattern in deploy-customer-docs.yml §"Ensure docs.raxx.app DNS CNAME".
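
A minimal sketch of the check-then-create idea, written in Python rather than the shell used by the actual workflow. The `list_records` and `create_record` callables are hypothetical stand-ins for whatever wraps the Cloudflare DNS API; they are injected so the logic can be exercised without network access. The record fields (`type`, `name`, `content`, `proxied`) follow the Cloudflare DNS record shape.

```python
from typing import Callable, Dict, List

# Hypothetical adapter signatures; a real workflow would back these with
# authenticated calls against the raxx.app zone.
ListRecords = Callable[[str], List[Dict]]   # record name -> existing records
CreateRecord = Callable[[Dict], None]


def ensure_cname(name: str, target: str,
                 list_records: ListRecords,
                 create_record: CreateRecord) -> str:
    """Create a proxied CNAME only when no record exists for `name`."""
    if list_records(name):
        return "noop"          # record already present: leave it untouched
    create_record({
        "type": "CNAME",
        "name": name,
        "content": target,
        "proxied": True,
    })
    return "created"
```

Running the function twice with the same arguments creates the record once and reports `noop` the second time, which is the property that makes the bootstrap step safe to run on every deploy.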
4.2 CF Access for internal surfaces
Any surface that is not intentionally public requires a CF Access application.
Token required: CF_ACCESS_MGMT_TOKEN stored in vault at /MooseQuest/cloudflare/
as CLOUDFLARE_ACCESS_MGMT_TOKEN.
Steps:
1. List existing Access apps: GET /accounts/{id}/access/apps
2. If hostname already has an app → no-op
3. If not → POST /accounts/{id}/access/apps with:
   - type: self_hosted
   - session_duration: 24h
   - policies[0].include: [{email: {email: "kris@moosequest.net"}}]
Pattern in deploy-internal-docs.yml §"Ensure CF Access app".
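
The three steps above could be sketched in Python as follows. This is an illustration, not the workflow's actual implementation: `list_apps` and `create_app` are hypothetical injected callables, and the `name`, `domain`, policy `name`, and policy `decision` fields are assumptions added to make the payload plausible. Only `type`, `session_duration`, and the `include` rule come from the steps listed here.

```python
from typing import Callable, Dict, List


def access_app_payload(hostname: str, allowed_emails: List[str]) -> Dict:
    """Build the POST body for a self-hosted Access app with an email allowlist."""
    return {
        "name": hostname,              # assumption: app named after its hostname
        "domain": hostname,            # assumption: field used to match the hostname
        "type": "self_hosted",
        "session_duration": "24h",
        "policies": [{
            "name": "email-allowlist",  # assumption: illustrative policy name
            "decision": "allow",        # assumption: allowlist implies "allow"
            "include": [{"email": {"email": e}} for e in allowed_emails],
        }],
    }


def ensure_access_app(hostname: str, allowed_emails: List[str],
                      list_apps: Callable[[], List[Dict]],
                      create_app: Callable[[Dict], None]) -> str:
    """No-op when an app already covers `hostname`; otherwise create one."""
    if any(app.get("domain") == hostname for app in list_apps()):
        return "noop"
    create_app(access_app_payload(hostname, allowed_emails))
    return "created"
```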
Public surfaces (getraxx.com, docs.raxx.app, status.raxx.app) skip this step and
instead emit X-Robots-Tag: index, follow in their _headers file.
4.3 CF Access for Heroku surfaces
Heroku surfaces expose an origin URL (e.g., raxx-console-prod-ff30a22abccb.herokuapp.com)
allowlisted by the CF Access bypass for /health. The health check in the deploy
workflow uses the Heroku origin URL, not the CF-fronted domain, to avoid 302
redirects from Access. Pattern in deploy-console.yml §"Post-deploy health check".
5. Deploy Workflow Template
Every deploy workflow must contain these jobs in order:
freeze-check → [build] → deploy → notify
5.1 Required jobs
freeze-check
    freeze-check:
      name: Deploy freeze check
      runs-on: ubuntu-latest
      # For Tier A: always runs (deploys are triggered from main)
      # For Tier B prod-only: if: github.event_name == 'workflow_dispatch' && inputs.environment == 'production'
      env:
        DEPLOY_FREEZE_OVERRIDE: ${{ secrets.DEPLOY_FREEZE_OVERRIDE }}
      steps:
        - uses: actions/checkout@v4
        - uses: ./.github/actions/check-deploy-freeze
          with:
            service-token-id: ${{ secrets.CF_DEPLOY_FREEZE_CLIENT_ID }}
            service-token-secret: ${{ secrets.CF_DEPLOY_FREEZE_CLIENT_SECRET }}
            console-url: https://console.raxx.app
build-and-deploy (Tier A) or deploy (Tier B)
Must include, in this order:
1. Notify — building (via .github/actions/notify-deploy-status)
2. Load vault secrets (via .github/actions/load-vault-secrets)
3. Build step (surface-specific)
4. Infrastructure bootstrap (idempotent: CF Pages project create, DNS CNAME, CF Access)
5. Notify — deploying
6. Deploy step (wrangler / heroku push)
7. Health check (5 retries × 10s; GET <health-url>/health must return 200)
8. Live verification gate: after the health check passes, verify that the custom domain root
   returns 200 from a clean network (not a preview URL). Required command:

       curl -o /dev/null -sSL -w "%{http_code}" https://<custom-domain>/ | grep -q '^200$'

   A non-200 response must fail the workflow. This step is mandatory for prod deploys.
   Internal surfaces must include the CF Access service-token headers in this call.
   See §9.1 for why this gate exists.
9. Emit audit event — success
10. Emit audit event — failure (on: failure())
11. Slack DM on prod health-check failure (on: failure(), prod-only)
12. Notify — succeeded / Notify — failed
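
Step 7's retry policy (5 attempts, 10 seconds apart, success only on HTTP 200) can be sketched in Python as below. The actual workflows use shell; this version is only meant to make the loop semantics explicit. `http_status` and `wait_for_healthy` are hypothetical names, and the `fetch` and `sleep` parameters exist so the loop can be exercised without a network or real delays.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET, or 0 when the request cannot complete."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code            # non-2xx responses carry a status code
    except OSError:
        return 0                   # DNS failure, refused connection, timeout


def wait_for_healthy(url: str, retries: int = 5, delay_s: float = 10.0,
                     fetch: Callable[[str], int] = http_status,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `url` until it returns 200 or the retry budget is spent."""
    for attempt in range(1, retries + 1):
        if fetch(url) == 200:
            return True
        if attempt < retries:
            sleep(delay_s)         # no pointless wait after the last attempt
    return False
```

A caller would exit non-zero when `wait_for_healthy` returns `False`, so the failure-path steps (audit event, Slack DM) fire.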
5.2 Required env vars in every workflow
    env:
      INFISICAL_CLIENT_ID: ${{ secrets.INFISICAL_CLIENT_ID }}
      INFISICAL_CLIENT_SECRET: ${{ secrets.INFISICAL_CLIENT_SECRET }}
      INFISICAL_PROJECT_ID: ${{ secrets.INFISICAL_PROJECT_ID }}
      CF_ACCESS_CLIENT_ID: ${{ secrets.CF_ACCESS_CLIENT_ID }}
      CF_ACCESS_CLIENT_SECRET: ${{ secrets.CF_ACCESS_CLIENT_SECRET }}
5.3 Concurrency
    concurrency:
      group: deploy-<surface>[-${{ inputs.environment }}]
      cancel-in-progress: false  # never cancel an in-flight deploy
Tier B surfaces that support staging/production environments should include the
environment name in the group key (pattern from deploy-console.yml).
5.4 Audit event shape
Every workflow emits a POST /api/internal/audit with this payload shape.
The emit step is continue-on-error: true on both success and failure paths.
    {
      "action": "<surface>.deploy.<environment>",
      "actor": "<github.actor>",
      "outcome": "success | deploy_failed | health_check_failed",
      "timestamp_utc": "2026-01-01T00:00:00Z",
      "source_sha": "<sha>",
      "target_app": "<app-name-or-project>",
      "environment": "production | staging",
      "trigger": "ci | manual",
      "workflow_run_id": "<run-id>",
      "workflow_run_url": "<run-url>"
    }
Token: CONSOLE_AUDIT_INGEST_TOKEN (GitHub Environment secret).
URL: CONSOLE_INTERNAL_URL (GitHub var, defaults to the Heroku origin URL).
Pattern: deploy-console.yml §"Emit audit event — success/failure".
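
For illustration, the payload shape and the "never block the deploy" rule could be expressed in Python as follows. `build_audit_event` and `emit_audit` are hypothetical helpers, not the code used in deploy-console.yml; the `post` callable stands in for an authenticated POST to /api/internal/audit.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Optional

ALLOWED_OUTCOMES = {"success", "deploy_failed", "health_check_failed"}


def build_audit_event(surface: str, environment: str, outcome: str, actor: str,
                      sha: str, target_app: str, trigger: str,
                      run_id: str, run_url: str,
                      now: Optional[datetime] = None) -> Dict[str, str]:
    """Assemble the audit payload; `now` must be timezone-aware when given."""
    if outcome not in ALLOWED_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    stamp = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "action": f"{surface}.deploy.{environment}",
        "actor": actor,
        "outcome": outcome,
        "timestamp_utc": stamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "source_sha": sha,
        "target_app": target_app,
        "environment": environment,
        "trigger": trigger,
        "workflow_run_id": run_id,
        "workflow_run_url": run_url,
    }


def emit_audit(event: Dict[str, str], post: Callable[[Dict[str, str]], None]) -> bool:
    """Mirror continue-on-error: true by reporting failure instead of raising."""
    try:
        post(event)
        return True
    except Exception:
        return False
```

Returning a boolean rather than raising keeps the audit step informational: the caller can log a warning, but the deploy outcome is decided elsewhere.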
6. Preview Workflow Template
6.1 Tier A (CF Pages)
CF Pages auto-generates preview URLs for every branch that gets a Pages
deployment. No additional workflow required. The preview URL pattern is:
https://<branch-name>.<project>.pages.dev
To enable branch-based previews for a surface, set --branch in the wrangler
deploy command. Branches other than main automatically land in the preview
environment. Pattern in deploy-status-page.yml.
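
A small Python sketch of how a preview URL might be derived from a branch name. It assumes Cloudflare's alias normalization lowercases the branch and replaces each non-alphanumeric character with a hyphen (so `fix/api` becomes `fix-api`); that rule, and any length limit Cloudflare applies, should be verified against the Pages documentation before relying on it. `preview_url` is a hypothetical helper.

```python
import re


def preview_url(branch: str, project: str) -> str:
    """Derive the expected branch-alias URL for a CF Pages preview deployment."""
    # Assumption: lowercase, then one hyphen per non-alphanumeric character.
    alias = re.sub(r"[^a-z0-9]", "-", branch.lower())
    return f"https://{alias}.{project}.pages.dev"
```

For example, `preview_url("fix/api", "my-project")` yields `https://fix-api.my-project.pages.dev` under the stated assumption.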
6.2 Tier B (Heroku Review Apps)
Heroku Review Apps are optional and tracked in #350. When enabled, the
app.json at the surface's Heroku root configures the review app environment.
Review apps do not emit audit events and do not gate behind CF Access.
6.3 Tier A+ and Tier C
No preview environments. Tier A+ staging is a separate named worker
(<surface>-worker-staging). Tier C has no staging counterpart by design.
7. Status Tile Registration
Every surface must have an entry in config/status-surfaces.yaml before its
first production deploy. Minimum required fields:
    - id: <surface>                     # stable slug; never change post-deploy
      hostname: <surface>.raxx.app
      label: "<Human label>"
      probe_url: https://<surface>.raxx.app/health  # or /ping, / for static sites
      probe_type: http                  # http | tcp | none
      tier: A                           # A | A+ | B | C
      access: public                    # public | internal
      workflow: deploy-<surface>.yml
      sentry_project: <sentry-slug>     # omit if no server code
The console status poller (console/app/services/status_poller.py) reads this
YAML on startup. Adding a new entry automatically creates the tile on the
dashboard. No code change required — only the YAML update.
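
A hedged sketch of how a registry entry could be checked against the fields and allowed values listed above, once the YAML has been parsed into a dict. `validate_surface_entry` is a hypothetical helper and is not taken from status_poller.py; `sentry_project` is treated as optional because the template says to omit it when there is no server code.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("id", "hostname", "label", "probe_url",
                   "probe_type", "tier", "access", "workflow")

ALLOWED_VALUES = {
    "probe_type": {"http", "tcp", "none"},
    "tier": {"A", "A+", "B", "C"},
    "access": {"public", "internal"},
}


def validate_surface_entry(entry: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the entry looks valid."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in entry]
    for name, allowed in ALLOWED_VALUES.items():
        if name in entry and entry[name] not in allowed:
            problems.append(f"invalid {name}: {entry[name]!r}")
    return problems
```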
8. Sentry Integration
Every Tier B (Heroku) surface that runs server code must include a Sentry DSN.
- Vault path: /MooseQuest/sentry/<surface>/SENTRY_DSN
- Loaded at deploy time via .github/actions/load-vault-secrets
- Set as a Heroku config var: heroku config:set SENTRY_DSN="$SENTRY_DSN" --app <app> >/dev/null 2>&1 (stdout silenced per feedback_heroku_config_set_echoes_secrets.md)
- The console dashboard tile shows sentry_errors_24h when sentry_project is set in the surface registry.
Tier A (static) surfaces do not run server code — Sentry is not applicable. Tier A+ (Worker) surfaces may use Sentry's Cloudflare integration; treated as optional for now.
9. Rollout Plan
A new surface follows this sequence:
Dark → Flag → Beta → GA
| Phase | What happens |
|---|---|
| Dark | Scaffold created (workflow, status entry, DNS). Surface not deployed. |
| Flag | First deploy to staging (or CF Pages preview). Access restricted. Operator validates. |
| Beta | Production deployed. CF Access or invite-only access gate active. |
| GA | Public (or fully-internal): Access gate lifted for public surfaces; CF Access confirmed for internal. |
9.1 Surface launch acceptance criteria
A new-surface task is not complete until every item below is checked. These
gates apply at the Beta → GA transition (or at the Flag phase for internal-only
surfaces). RCA: docs/ops/incidents/2026-05-08-getraxx-403.md — getraxx.com was
in a 403 state for ~15 days because React components were committed with no live
HTTP check.
- [ ] HTTP-200 live gate. curl -o /dev/null -sSL -w "%{http_code}" https://<custom-domain>/ returns 200 from a clean network (not a preview URL, not a branch deploy URL, not the Heroku origin URL). Must be run by the person marking the task done, not just by CI. Internal surfaces: pass CF-Access-Client-Id and CF-Access-Client-Secret headers.
- [ ] Status tile registered. config/status-surfaces.yaml has an entry for the surface with the correct probe_url. This entry is a hard prerequisite: it must exist before any production deploy fires (invariant 4). A missing tile is a monitoring blind spot.
- [ ] Health endpoint responds. curl -o /dev/null -sSL -w "%{http_code}" https://<custom-domain>/health (or /ping, or / for static sites) returns 200. This is distinct from the root-path check above; both must pass.
- [ ] CF Access gate confirmed (internal surfaces only). Accessing the surface without credentials returns 403 or redirects to the Access login page. Confirm with an incognito browser or curl without auth headers.
- [ ] PR description records the live check. The PR that ships the surface must include the output of the curl command above in the "How to test" section. A PR without this output has not satisfied the gate.
Include these checklist items verbatim in every new-surface task body and in the PR description for the deploy PR. The task is not closeable until the checklist is fully checked.
10. Security Considerations
- No secret ever travels through a workflow YAML file. Vault is the only source. The load-vault-secrets action is the only allowed mechanism.
- CF Access gates are infrastructure-level, not application-level. An internal surface without a CF Access app is a misconfiguration regardless of any login wall inside the surface.
- Heroku stdout must never log secrets. heroku config:set is always silenced with >/dev/null 2>&1.
- The deploy freeze mechanism is the kill switch for all deploy paths. Break-glass: the DEPLOY_FREEZE_OVERRIDE=1 repo secret bypasses the freeze check, so deploys can proceed even when the console is unreachable.
- Audit rows capture actor, SHA, and outcome for every deploy. Retention is governed by the console's audit log retention policy (90 days, per workflow-uuid-tracing.md).
11. Memory-Note Convention
When a new surface is significant enough to persist across sessions, add a memory entry at:
.claude/agents/memory/project_<surface>.md
Minimum content:
- Surface name, URL, hosting tier
- Key vault paths used by the surface
- Any operational quirks (e.g., "health check uses origin URL to bypass CF Access")
- Links to the deploy workflow and architecture doc
12. Open Questions
None blocking. The following are tracked for future refinement:
- #350 (Heroku Review Apps): not yet enabled for any Tier B surface. The convention treats them as optional.
- CF Worker staging pattern: <worker>-staging naming is convention, not enforced by tooling. A lint check (ADR-0051 §drift controls) could enforce it.
- Automated status-tile validation: a CI lint job that verifies every workflow file has a corresponding config/status-surfaces.yaml entry would close the gap between "workflow exists" and "surface is monitored."
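
As a starting point for the status-tile validation idea, the core comparison might look like the Python sketch below. It assumes deploy workflows follow the deploy-<surface>.yml naming used in the registry's `workflow` field and that the registry has already been parsed into a list of dicts; `unmonitored_workflows` is a hypothetical name, and no such lint job exists yet.

```python
from typing import Dict, Iterable, List


def unmonitored_workflows(workflow_files: Iterable[str],
                          surface_entries: Iterable[Dict[str, str]]) -> List[str]:
    """List deploy workflows that no status-surfaces entry points at."""
    registered = {entry.get("workflow") for entry in surface_entries}
    return sorted(
        name for name in workflow_files
        # Assumption: only files named deploy-*.yml are surface deploy workflows.
        if name.startswith("deploy-") and name.endswith(".yml")
        and name not in registered
    )
```

A CI job could fail whenever the returned list is non-empty, printing the file names so the missing registry entries are obvious.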