Raxx · internal docs

internal · gated ↑ index

New-Surface Deploy/Preview Convention

Status: Accepted Date: 2026-05-06 UTC Owner: software-architect Parent card: #649 ADRs: 0052 · 0053

1. Context

Raxx surfaces were bootstrapped ad hoc. Each new surface repeats the same boilerplate: CF Pages project creation, DNS CNAME, CF Access gating, GitHub Actions workflow, status-tile wiring, audit-log hookup. The work took a full day per surface. This document codifies the convention so the next surface takes 30 minutes and makes no ad-hoc decisions.

The doc covers: hosting-tier decision tree, domain attachment, workflow templates (deploy + preview), auth/access pattern, audit-log hookup, status tile registration, Sentry wiring, and the memory-note convention.

It does not prescribe exact library versions or force retrofits of existing surfaces. Surface owners retain flex room within the tier's boundary.

2. Invariants

All platform invariants apply. Surface-specific constraints:

No credentials in workflow files. Every secret is a GitHub Environment secret sourced from vault at runtime. CF_PAGES_DEPLOY_TOKEN, HEROKU_API_KEY, etc. are never written into workflow YAML.
Audit row on every deploy. Every surface deploy — success and failure — emits a row to POST /api/internal/audit on the console's internal endpoint. continue-on-error: true on the emit step so audit failure never blocks a deploy.
Deploy-freeze gate. Every workflow calls .github/actions/check-deploy-freeze before touching any environment. Break-glass override via DEPLOY_FREEZE_OVERRIDE=1 repo secret.
Status tile mandatory. Every surface gets an entry in config/status-surfaces.yaml and a health probe URL registered before the first production deploy. The console dashboard polls every entry; a missing tile is a monitoring blind spot.
Internal surfaces behind CF Access. Any surface not intentionally public is gated via a Cloudflare Access application with an email-allowlist policy. No surface is "internal by convention" — the gate must be enforced in infrastructure.
Slack DM on prod deploy failure. Every surface workflow must send a DM to D0AJ7K184TV on health-check failure for production deploys. continue-on-error: true on the Slack step.

3. Hosting Tier Classes

3.1 Decision Tree

Is the surface stateless / static / SSG output?
  ├─ Yes → CF Pages (Tier A)
  └─ No
      Is it a long-running daemon / VM workload (ticketing, self-hosted apps)?
        ├─ Yes → Lightsail + Terraform (Tier C)
        └─ No
            Does it need server-side rendering, session state, or a DB connection?
              ├─ Yes → Heroku (Tier B)
              └─ No → re-evaluate; likely fits Tier A or needs a CF Worker (Tier A+)

3.2 Tier Definitions

Tier	Platform	Examples	Preview support	Staging equivalent
A	Cloudflare Pages	getraxx.com, docs.raxx.app, status.raxx.app, internal-docs.raxx.app	Auto (branch deploys)	Preview URL per branch
A+	CF Worker + D1/KV	status-worker, velvet-worker	Manual (wrangler deploy --env staging)	Separate worker name `*-staging`
B	Heroku	console.raxx.app (Raptor backend_v2)	Review Apps (#350, optional)	Separate Heroku app `raxx-<surface>-staging`
C	Lightsail + Terraform	tickets.raxx.app (FreeScout), vault.raxx.app (Infisical)	None — prod-only by design	None

Rule of thumb: default to Tier A. Move to A+ only if edge persistence (KV/D1) is required but no SSR. Move to Tier B only when Flask/Node SSR or a persistent Postgres connection is unavoidable. Never use Tier C for new surfaces — it is reserved for third-party self-hosted apps that cannot run on managed PaaS.

4. Domain Attachment Pattern

4.1 Required DNS record

All *.raxx.app subdomains live in the Cloudflare-managed raxx.app zone (zone ID f12dbb5cac57d5591a5058874498a6d1). Every surface needs one CNAME record added to that zone. The DNS token scoped to DNS:Edit on the raxx.app zone is stored in vault at /MooseQuest/cloudflare/ as CLOUDFLARE_EDIT_DNS.

For Tier A (CF Pages): CNAME <surface>.raxx.app → <project>.pages.dev (proxied) For Tier A+ (CF Worker): CNAME <surface>.raxx.app → <worker>.workers.dev (proxied) For Tier B (Heroku): CNAME <surface>.raxx.app → <heroku-app>.herokuapp.com (proxied) For Tier C (Lightsail): A <surface>.raxx.app → <static-ip> (proxied)

DNS steps are idempotent: the workflow checks for an existing record before creating. Pattern in deploy-customer-docs.yml §"Ensure docs.raxx.app DNS CNAME".

4.2 CF Access for internal surfaces

Any surface that is not intentionally public requires a CF Access application. Token required: CF_ACCESS_MGMT_TOKEN stored in vault at /MooseQuest/cloudflare/ as CLOUDFLARE_ACCESS_MGMT_TOKEN.

Steps: 1. List existing Access apps: GET /accounts/{id}/access/apps 2. If hostname already has an app → no-op 3. If not → POST /accounts/{id}/access/apps with: - type: self_hosted - session_duration: 24h - policies[0].include: [{email: {email: "kris@moosequest.net"}}]

Pattern in deploy-internal-docs.yml §"Ensure CF Access app".

Public surfaces (getraxx.com, docs.raxx.app, status.raxx.app) skip this step and instead emit X-Robots-Tag: index, follow in their _headers file.

4.3 CF Access for Heroku surfaces

Heroku surfaces expose an origin URL (e.g., raxx-console-prod-ff30a22abccb.herokuapp.com) allowlisted by the CF Access bypass for /health. The health check in the deploy workflow uses the Heroku origin URL, not the CF-fronted domain, to avoid 302 redirects from Access. Pattern in deploy-console.yml §"Post-deploy health check".

5. Deploy Workflow Template

Every deploy workflow must contain these jobs in order:

freeze-check → [build] → deploy → notify

5.1 Required jobs

freeze-check

freeze-check:
  name: Deploy freeze check
  runs-on: ubuntu-latest
  # For Tier A: always runs (deploys are triggered from main)
  # For Tier B prod-only: if: github.event_name == 'workflow_dispatch' && inputs.environment == 'production'
  env:
    DEPLOY_FREEZE_OVERRIDE: ${{ secrets.DEPLOY_FREEZE_OVERRIDE }}
  steps:
    - uses: actions/checkout@v4
    - uses: ./.github/actions/check-deploy-freeze
      with:
        service-token-id: ${{ secrets.CF_DEPLOY_FREEZE_CLIENT_ID }}
        service-token-secret: ${{ secrets.CF_DEPLOY_FREEZE_CLIENT_SECRET }}
        console-url: https://console.raxx.app

build-and-deploy (Tier A) or deploy (Tier B)

Must include, in this order: 1. Notify — building (via .github/actions/notify-deploy-status) 2. Load vault secrets (via .github/actions/load-vault-secrets) 3. Build step (surface-specific) 4. Infrastructure bootstrap (idempotent: CF Pages project create, DNS CNAME, CF Access) 5. Notify — deploying 6. Deploy step (wrangler / heroku push) 7. Health check (5 retries × 10s; GET <health-url>/health must return 200) 8. Emit audit event — success 9. Emit audit event — failure (on: failure()) 10. Slack DM on prod health-check failure (on: failure(), prod-only) 11. Notify — succeeded / Notify — failed

5.2 Required env vars in every workflow

env:
  INFISICAL_CLIENT_ID: ${{ secrets.INFISICAL_CLIENT_ID }}
  INFISICAL_CLIENT_SECRET: ${{ secrets.INFISICAL_CLIENT_SECRET }}
  INFISICAL_PROJECT_ID: ${{ secrets.INFISICAL_PROJECT_ID }}
  CF_ACCESS_CLIENT_ID: ${{ secrets.CF_ACCESS_CLIENT_ID }}
  CF_ACCESS_CLIENT_SECRET: ${{ secrets.CF_ACCESS_CLIENT_SECRET }}

5.3 Concurrency

concurrency:
  group: deploy-<surface>[-${{ inputs.environment }}]
  cancel-in-progress: false   # never cancel an in-flight deploy

Tier B surfaces that support staging/production environments should include the environment name in the group key (pattern from deploy-console.yml).

5.4 Audit event shape

Every workflow emits a POST /api/internal/audit with this payload shape. The emit step is continue-on-error: true on both success and failure paths.

{
  "action": "<surface>.deploy.<environment>",
  "actor": "<github.actor>",
  "outcome": "success | deploy_failed | health_check_failed",
  "timestamp_utc": "2026-01-01T00:00:00Z",
  "source_sha": "<sha>",
  "target_app": "<app-name-or-project>",
  "environment": "production | staging",
  "trigger": "ci | manual",
  "workflow_run_id": "<run-id>",
  "workflow_run_url": "<run-url>"
}

Token: CONSOLE_AUDIT_INGEST_TOKEN (GitHub Environment secret). URL: CONSOLE_INTERNAL_URL (GitHub var, defaults to the Heroku origin URL). Pattern: deploy-console.yml §"Emit audit event — success/failure".

6. Preview Workflow Template

6.1 Tier A (CF Pages)

CF Pages auto-generates preview URLs for every branch that gets a Pages deployment. No additional workflow required. The preview URL pattern is: https://<branch-name>.<project>.pages.dev

To enable branch-based previews for a surface, set --branch in the wrangler deploy command. Branches other than main automatically land in the preview environment. Pattern in deploy-status-page.yml.

6.2 Tier B (Heroku Review Apps)

Heroku Review Apps are optional and tracked in #350. When enabled, the app.json at the surface's Heroku root configures the review app environment. Review apps do not emit audit events and do not gate behind CF Access.

6.3 Tier A+ and Tier C

No preview environments. Tier A+ staging is a separate named worker (<surface>-worker-staging). Tier C has no staging counterpart by design.

7. Status Tile Registration

Every surface must have an entry in config/status-surfaces.yaml before its first production deploy. Minimum required fields:

- id: <surface>          # stable slug; never change post-deploy
  hostname: <surface>.raxx.app
  label: "<Human label>"
  probe_url: https://<surface>.raxx.app/health   # or /ping, / for static sites
  probe_type: http                               # http | tcp | none
  tier: A                                        # A | A+ | B | C
  access: public                                 # public | internal
  workflow: deploy-<surface>.yml
  sentry_project: <sentry-slug>                  # omit if no server code

The console status poller (console/app/services/status_poller.py) reads this YAML on startup. Adding a new entry automatically creates the tile on the dashboard. No code change required — only the YAML update.

8. Sentry Integration

Every Tier B (Heroku) surface that runs server code must include a Sentry DSN.

Vault path: /MooseQuest/sentry/<surface>/SENTRY_DSN
Loaded at deploy time via .github/actions/load-vault-secrets
Set as a Heroku config var: heroku config:set SENTRY_DSN="$SENTRY_DSN" --app <app> >/dev/null 2>&1 (stdout silenced per feedback_heroku_config_set_echoes_secrets.md)
The console dashboard tile shows sentry_errors_24h when sentry_project is set in the surface registry.

Tier A (static) surfaces do not run server code — Sentry is not applicable. Tier A+ (Worker) surfaces may use Sentry's Cloudflare integration; treated as optional for now.

9. Rollout Plan

A new surface follows this sequence:

Dark → Flag → Beta → GA

Phase	What happens
Dark	Scaffold created (workflow, status entry, DNS). Surface not deployed.
Flag	First deploy to staging (or CF Pages preview). Access restricted. Operator validates.
Beta	Production deployed. CF Access or invite-only access gate active.
GA	Public (or fully-internal): Access gate lifted for public surfaces; CF Access confirmed for internal.

10. Security Considerations

No secret ever travels through a workflow YAML file. Vault is the only source. The load-vault-secrets action is the only allowed mechanism.
CF Access gates are infrastructure-level, not application-level. An internal surface without a CF Access app is a misconfiguration regardless of any login wall inside the surface.
Heroku stdout must never log secrets. heroku config:set is always silenced with >/dev/null 2>&1.
The deploy freeze mechanism is the kill switch for all deploy paths. Break-glass: DEPLOY_FREEZE_OVERRIDE=1 repo secret freezes all deploys even when the console is unreachable.
Audit rows capture actor, SHA, and outcome for every deploy. Retention governed by the console's audit log retention policy (90 days, per workflow-uuid-tracing.md).

11. Memory-Note Convention

When a new surface is significant enough to persist across sessions, add a memory entry at:

.claude/agents/memory/project_<surface>.md

Minimum content: - Surface name, URL, hosting tier - Key vault paths used by the surface - Any operational quirks (e.g., "health check uses origin URL to bypass CF Access") - Links to the deploy workflow and architecture doc

12. Open Questions

None blocking. The following are tracked for future refinement:

#350 — Heroku Review Apps: not yet enabled for any Tier B surface. The convention treats them as optional.
CF Worker staging pattern — <worker>-staging naming is convention, not enforced by tooling. A lint check (ADR-0051 §drift controls) could enforce it.
Automated status-tile validation — a CI lint job that verifies every workflow file has a corresponding config/status-surfaces.yaml entry would close the gap between "workflow exists" and "surface is monitored."

Auto-generated from docs/ in raxx-app/TradeMasterAPI. Gated behind Cloudflare Access. Re-deployed on every push to main.