Status: Accepted
Date: 2026-05-06 UTC
Refs: #649, ADR-0052, docs/architecture/new-surface-convention.md
Existing deploy workflows (deploy-console.yml, deploy-customer-docs.yml,
deploy-internal-docs.yml, deploy-status-page.yml) converged organically on
a shared structure: freeze-check → build → deploy → health-check → audit emit
→ notify. But each workflow re-implements that structure from scratch. When a
step was improved in one workflow (e.g., the Slack DM on health-check failure
landing in deploy-console.yml), it was not propagated to the others.
We need to codify the required job sequence so new surfaces get it right the first time and so deviations are visible rather than silent.
Every deploy workflow for a new surface must contain these steps, in this order, with the documented behavior:
1. .github/actions/check-deploy-freeze. Runs before any build or deploy step.
   On Tier B prod-only surfaces, skipped for staging.
2. .github/actions/notify-deploy-status with status: building. Fires at start
   of build-and-deploy job.
3. .github/actions/load-vault-secrets. No secrets from environment variables
   or hardcoded values.
4. .github/actions/notify-deploy-status with status: deploying.
5. GET <health-url>/health → 200. Stale data annotations do not fail the gate
   (pattern from deploy-console.yml).
6. continue-on-error: true. POST to console internal audit endpoint.
7. if: failure(), continue-on-error: true.
8. if: failure(), prod-only, continue-on-error: true. Channel D0AJ7K184TV.
9. .github/actions/notify-deploy-status.

The scaffold script generates this structure from a template. Deviations from the order require a comment explaining the reason.
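The goal of making deviations visible could be backed by a small order check. The following is a minimal sketch, not the scaffold script itself. It assumes a workflow's steps have already been extracted into a flat list of identifiers. The action paths are the ones named above; the `health-check` and `audit-emit` labels and the function name are hypothetical.

```python
# Required markers, in the order the steps must appear.
# Action paths come from the step list; "health-check" and "audit-emit"
# are hypothetical step ids used only for this sketch.
REQUIRED_ORDER = [
    ".github/actions/check-deploy-freeze",
    ".github/actions/notify-deploy-status",  # status: building
    ".github/actions/load-vault-secrets",
    ".github/actions/notify-deploy-status",  # status: deploying
    "health-check",
    "audit-emit",
]


def missing_or_out_of_order(steps: list[str]) -> list[str]:
    """Return the required markers that cannot be matched in order.

    Other steps (for example a surface-specific build) may appear
    anywhere in between; only the relative order of markers is checked.
    """
    position = 0
    unmatched = []
    for marker in REQUIRED_ORDER:
        try:
            # Search only after the previously matched marker.
            position = steps.index(marker, position) + 1
        except ValueError:
            unmatched.append(marker)
    return unmatched
```

An empty result means the required sequence is present. A non-empty result lists the markers that are missing or misplaced, which is the point at which a reviewer would look for the explanatory comment.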
- console_deploy_id input (for callback tracking) is included in every workflow even if the surface is not yet wired to the console deploy UI. It is a no-op when empty, and its presence means wiring is a one-line change.
- continue-on-error: true is set so a console outage during a deploy does not block the deploy itself.

Reusable workflow (workflow_call): A single reusable workflow would enforce the structure mechanically. Rejected because the build step is too surface-specific to parameterise cleanly today. A future ADR could revisit this if the number of surfaces grows to 10+.
Composite action for audit emit: The audit emit block (30 lines of Python) is duplicated across workflows. A composite action would remove the duplication. Deferred to a separate card: the inline Python block is readable and the duplication is contained to one block per workflow.
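For orientation, the best-effort behavior that continue-on-error: true gives the audit emit can be sketched in Python. This is an illustration under stated assumptions, not the inline block used in the workflows. The event fields, the function names, and the injectable `post` callable are all hypothetical. The handling of an empty console_deploy_id is one possible reading of "no-op when empty".

```python
import json
from typing import Callable


def build_audit_event(surface: str, status: str, console_deploy_id: str = "") -> dict:
    """Build a hypothetical audit payload for one deploy."""
    event = {"surface": surface, "status": status}
    # An empty id is treated as "not wired to the console deploy UI yet",
    # so the field is simply left out (assumed no-op behavior).
    if console_deploy_id:
        event["console_deploy_id"] = console_deploy_id
    return event


def emit_audit(event: dict, post: Callable[[bytes], None]) -> bool:
    """Send the event, but never raise.

    `post` stands in for the HTTP POST to the console audit endpoint.
    Swallowing the error plays the role of continue-on-error: true,
    so a console outage cannot fail the deploy.
    """
    try:
        post(json.dumps(event).encode("utf-8"))
        return True
    except Exception as exc:
        print(f"audit emit failed, continuing: {exc}")
        return False
```

Passing the transport in as a callable keeps the sketch free of network access and makes the failure path easy to exercise.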
Skip freeze-check for Tier A surfaces: CF Pages deploys are lower risk than Heroku deploys, so the freeze check adds latency without proportional value. Rejected: the freeze gate is an ops-wide invariant, not risk-weighted. A frozen deploy window means all surfaces are frozen.
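The rejection rationale can be illustrated with a gate whose decision does not depend on the surface tier. This is a rough sketch only. The representation of freeze windows as (start, end) pairs and both function names are assumptions, and the staging skip for Tier B prod-only surfaces is not modeled.

```python
from datetime import datetime, timezone


def is_frozen(now, freeze_windows):
    """True if `now` falls inside any (start, end) freeze window."""
    return any(start <= now < end for start, end in freeze_windows)


def may_deploy(surface_tier, now, freeze_windows):
    """Decide whether a deploy may proceed.

    `surface_tier` is accepted but deliberately unused: the freeze is
    treated as an ops-wide invariant, not a risk-weighted one.
    """
    return not is_frozen(now, freeze_windows)
```

With a hypothetical window covering 2026-05-06 UTC, `may_deploy("A", ...)` and `may_deploy("B", ...)` give the same answer for any timestamp.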