ADR 0053 — New-surface deploy workflow template structure
Status: Accepted
Date: 2026-05-06 UTC
Refs: #649, ADR-0052, docs/architecture/new-surface-convention.md
Context
Existing deploy workflows (deploy-console.yml, deploy-customer-docs.yml,
deploy-internal-docs.yml, deploy-status-page.yml) converged organically on
a shared structure: freeze-check → build → deploy → health-check → audit emit
→ notify. However, each workflow re-implements that structure from scratch, so
an improvement made in one workflow (e.g., the Slack DM on health-check failure
landing in deploy-console.yml) was not propagated to the others.
We need to codify the required step sequence so that new surfaces get it right the first time and deviations are visible rather than silent.
Decision
Every deploy workflow for a new surface must contain these steps, in this order, with the documented behavior:
- freeze-check: .github/actions/check-deploy-freeze. Runs before any build or deploy step. On Tier B prod-only surfaces, skipped for staging.
- Notify building: .github/actions/notify-deploy-status with status: building. Fires at start of build-and-deploy job.
- Load vault secrets: .github/actions/load-vault-secrets. No secrets from environment variables or hardcoded values.
- Build: surface-specific. Fails fast; failing build blocks deploy.
- Infrastructure bootstrap: idempotent steps for CF Pages project creation, DNS CNAME, CF Access (if internal). Safe to re-run.
- Notify deploying: .github/actions/notify-deploy-status with status: deploying.
- Deploy: wrangler (Tier A/A+) or heroku git push (Tier B).
- Health check: 5 retries × 10 s. GET <health-url>/health → 200. Stale data annotations do not fail the gate (pattern from deploy-console.yml).
- Emit audit event (success): continue-on-error: true. POST to console internal audit endpoint.
- Emit audit event (failure): if: failure(), continue-on-error: true.
- Slack DM on prod health-check failure: if: failure(), prod-only, continue-on-error: true. Channel D0AJ7K184TV.
- Notify succeeded / failed: .github/actions/notify-deploy-status.
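The health-check gate in the list above can be sketched as a small retry loop. This is a minimal illustration, not the logic used in deploy-console.yml: the function name health_check, its parameters, and the injectable fetch and sleep callables are assumptions made so the sketch can be exercised without a network. Only the status code decides the outcome, so content in the response body (such as stale-data annotations) does not fail the gate.

```python
import time
import urllib.error
import urllib.request


def _http_status(url):
    """Fetch the URL and return its HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses raise HTTPError; report the code instead.
        return err.code


def health_check(base_url, retries=5, delay_s=10.0, fetch=None, sleep=time.sleep):
    """Return True once GET <base_url>/health answers 200, else False.

    Defaults mirror the ADR (5 retries, 10 s apart). The fetch and sleep
    parameters are hypothetical hooks for testing.
    """
    fetch = fetch or _http_status
    url = base_url.rstrip("/") + "/health"
    for attempt in range(1, retries + 1):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # Connection errors count as a failed attempt, not a crash.
            pass
        if attempt < retries:
            sleep(delay_s)
    return False
```

A workflow step could call health_check with the surface's health URL and exit non-zero when it returns False, which is what would let the later if: failure() steps fire.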
The scaffold script generates this structure from a template. Deviations from the order require a comment explaining the reason.
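One way to make deviations visible rather than silent is to check a workflow's step order against the required sequence. The sketch below is an assumption about how such a check might look, not part of the scaffold script: the step identifiers in REQUIRED_ORDER are hypothetical names derived from the list above, and the function works on a plain list of step ids rather than parsing YAML. It does not model the documented exception for deviations that carry an explanatory comment.

```python
# Hypothetical step ids, one per required step, in the ADR's order.
REQUIRED_ORDER = [
    "freeze-check",
    "notify-building",
    "load-vault-secrets",
    "build",
    "infrastructure-bootstrap",
    "notify-deploying",
    "deploy",
    "health-check",
    "emit-audit-success",
    "emit-audit-failure",
    "slack-dm-health-failure",
    "notify-result",
]


def check_step_order(step_ids):
    """Return a list of problems; an empty list means the order matches."""
    # Record the first position at which each step id appears.
    positions = {}
    for index, step_id in enumerate(step_ids):
        positions.setdefault(step_id, index)

    problems = [f"missing step: {s}" for s in REQUIRED_ORDER if s not in positions]

    # Compare each pair of neighbouring required steps that are present.
    present = [s for s in REQUIRED_ORDER if s in positions]
    for earlier, later in zip(present, present[1:]):
        if positions[earlier] > positions[later]:
            problems.append(f"{earlier} must come before {later}")
    return problems
```

Extra surface-specific steps between the required ones are ignored, so a workflow can add its own build or bootstrap steps without tripping the check.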
Consequences
- New surfaces start with the full audit/notify/freeze pattern rather than discovering it incrementally.
- The console_deploy_id input (for callback tracking) is included in every workflow even if the surface is not yet wired to the console deploy UI; it is a no-op when empty, and its presence means wiring is a one-line change.
- The audit emit step uses continue-on-error: true so a console outage during a deploy does not block the deploy itself.
- Existing workflows are not required to be retrofitted; this ADR governs new surfaces.
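The two audit-related consequences above can be illustrated with a short sketch. It is not the inline Python block used in the existing workflows: the function names, the payload field names, and the endpoint argument are all hypothetical. In a real workflow the non-blocking behavior comes from continue-on-error: true on the step; the try/except here only mirrors that behavior in-process for illustration.

```python
import json
import urllib.request


def build_audit_event(surface, environment, outcome, console_deploy_id=""):
    """Assemble the audit payload; every field name here is illustrative."""
    event = {"surface": surface, "environment": environment, "outcome": outcome}
    # An empty console_deploy_id is a no-op: the field is simply left out.
    if console_deploy_id.strip():
        event["console_deploy_id"] = console_deploy_id.strip()
    return event


def _post_json(endpoint, event):
    """Send the event as a JSON POST using only the standard library."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        resp.read()


def emit_audit_event(event, endpoint, post=None):
    """POST the event; return False on failure instead of raising."""
    post = post or _post_json  # post is a hypothetical hook for testing
    try:
        post(endpoint, event)
        return True
    except OSError as err:
        # Covers connection errors and HTTP error responses from urllib.
        print(f"audit emit failed (non-blocking): {err}")
        return False
```

Because the deploy id is only attached when non-empty, a surface that is not yet wired to the console deploy UI sends the same payload shape minus one field.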
Alternatives Considered
Reusable workflow (workflow_call): A single reusable workflow would
enforce the structure mechanically. Rejected because the build step is too
surface-specific to parameterise cleanly today. A future ADR could revisit
this if the number of surfaces grows to 10+.
Composite action for audit emit: The audit emit block (30 lines of Python) is duplicated across workflows. A composite action would DRY it. Deferred to a separate card — the Python inline block is readable and the duplication is contained to one block per workflow.
Skip freeze-check for Tier A surfaces: CF Pages deploys are lower risk than Heroku deploys, so the freeze check adds latency without proportional value. Rejected: the freeze gate is an ops-wide invariant, not risk-weighted. A frozen deploy window means all surfaces are frozen.