Slack notification policy runbook
- System: GitHub Actions (Slack notification gate)
- Owner: sre-agent
- Last reviewed: 2026-06-18
Policy summary
Slack notifications from CI/CD workflows are gated by a single repo-level
Actions variable: SLACK_NOTIFY_LEVEL.
| Variable value | Effect |
|---|---|
| (not set / any value except `all`) | Only loud-tier paths fire (see below) |
| `all` | Every Slack step fires as originally authored |
The variable is intentionally left unset once the gate is deployed, so by default the silenced tier posts nothing to Slack.
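The policy reduces to a two-input decision. The following is a minimal Python sketch of it, not code taken from the workflows: the function name `slack_step_fires` and the tier labels `"loud"` and `"silenced"` are hypothetical, and the exact, case-sensitive match on `all` is an assumption (a workflow `if:` expression compares strings case-insensitively).

```python
from typing import Optional


def slack_step_fires(tier: str, level: Optional[str]) -> bool:
    """Return True if a Slack step in the given tier would post.

    tier  -- "loud" or "silenced" (hypothetical labels for the two tiers)
    level -- value of SLACK_NOTIFY_LEVEL, or None when the variable is unset
    """
    if tier == "loud":
        # Loud-tier paths are never gated.
        return True
    if tier == "silenced":
        # Silenced-tier paths post only when the variable is "all".
        # Exact, case-sensitive matching is an assumption of this sketch.
        return level == "all"
    raise ValueError(f"unknown tier: {tier!r}")
```

For example, `slack_step_fires("silenced", None)` is `False`, while `slack_step_fires("loud", None)` is `True`.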
How to revert (re-enable all notifications)
- Go to the repo Settings → Secrets and variables → Actions → Variables tab.
- Create or update `SLACK_NOTIFY_LEVEL` with value `all`.
- The change takes effect on the next workflow trigger; no deploy is needed.
To silence again: delete the variable or set it to any other value.
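If the UI is not convenient, the same change can in principle be scripted against the GitHub REST API endpoints for repository variables. The sketch below only builds the request and does not send it. The function name, the `owner`/`repo` arguments, and the token handling are hypothetical, and it assumes a token that is allowed to manage repository variables.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"
VARIABLE_NAME = "SLACK_NOTIFY_LEVEL"


def build_set_level_request(owner: str, repo: str, token: str,
                            value: str = "all",
                            exists: bool = True) -> urllib.request.Request:
    """Build (but do not send) the request that sets SLACK_NOTIFY_LEVEL.

    exists=True  -> PATCH the existing repository variable
    exists=False -> POST to create it
    """
    base = f"{API_ROOT}/repos/{owner}/{repo}/actions/variables"
    url = f"{base}/{VARIABLE_NAME}" if exists else base
    body = json.dumps({"name": VARIABLE_NAME, "value": value}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH" if exists else "POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Setting `value` to anything other than `all` silences the gated tier again, as described above.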
Loud tier (always fires, gate does NOT apply)
These paths are never gated and remain loud regardless of SLACK_NOTIFY_LEVEL:
| Workflow | What it posts | Channel / target |
|---|---|---|
| nightly-security-scan.yml | Alert when the scan job itself fails, is skipped, or is cancelled (operational gap: a dark scan cannot be silenced) | SLACK_WEBHOOK_URL |
| security-zap.yml | Alert when ZAP scan job fails or is skipped | SLACK_WEBHOOK_URL |
| synth-probe-waitlist.yml | SEV2 alert when the prod waitlist API probe fails (customer-facing lead capture) | SLACK_WEBHOOK_URL → #raxx-ops-alert-sev2 |
| bcp-vault-snapshot-daily.yml | Alert when the vault snapshot (BCP Win 3) fails; explicitly marked pageable in the workflow | SLACK_WEBHOOK_URL |
| deploy-console.yml (health check step only) | Immediate DM when the console prod health check fails post-deploy | SLACK_BOT_TOKEN → D0AJ7K184TV |
Silenced tier (gated by SLACK_NOTIFY_LEVEL == 'all')
| Workflow | What was being posted | Channel / target |
|---|---|---|
| slack-notify.yml | CI failure on main | SLACK_WEBHOOK_URL |
| ci-digest-cron.yml | Daily CI digest | SLACK_WEBHOOK_URL |
| bcp-smoke-monthly.yml | Monthly BCP smoke failure | SLACK_WEBHOOK_URL |
| daily-bot-token-smoke.yml | Bot token smoke failure | SLACK_WEBHOOK_URL |
| daily-card-groomer.yml | Groomer completion DM | SLACK_BOT_TOKEN → D0AJ7K184TV |
| deploy-antlers-cutover.yml | Cutover + rollback outcome | SLACK_WEBHOOK_URL |
| deploy-antlers-next-prod.yml | Antlers prod deploy failure | SLACK_WEBHOOK_URL |
| deploy-antlers-next-staging.yml | Antlers staging deploy failure | SLACK_WEBHOOK_URL |
| deploy-console.yml (notify job, DM step) | Console prod deploy outcome DM | SLACK_BOT_TOKEN → D0AJ7K184TV |
| deploy-customer-docs.yml | Docs health check failure DM | SLACK_BOT_TOKEN → D0AJ7K184TV |
| deploy-failure-streak-alert.yml | Deploy failure streak alert | SLACK_WEBHOOK_URL → #raxx-ops-alert-sev2-5 |
| deploy-heroku.yml | Staging SEV-3 alert | SLACK_WEBHOOK_URL |
| deploy-queue-failure-monitor.yml | Queue deploy streak alert | SLACK_WEBHOOK_URL → #raxx-ops-alert-sev2-5 |
| deploy-queue.yml | Queue deploy outcome DM | SLACK_BOT_TOKEN → D0AJ7K184TV |
| deploy-velvet.yml | Velvet deploy outcome DM | SLACK_BOT_TOKEN → D0AJ7K184TV |
| drift-orchestrator-cron.yml | Drift reconciler failure DM | SLACK_BOT_TOKEN → D0AJ7K184TV |
| flag-drift-check.yml | Flag drift detection (embedded in Python) | SLACK_WEBHOOK_URL / SLACK_BOT_TOKEN |
| freescout-apply.yml | Terraform apply failure DM | SLACK_BOT_TOKEN → D0AJ7K184TV |
| freescout-backup.yml | FreeScout backup failure | SLACK_WEBHOOK_URL |
| heroku-config-health-nightly.yml | Config health critical findings | SLACK_WEBHOOK_URL |
| queue-zero-dyno-monitor.yml | Zero-dyno alert | SLACK_WEBHOOK_URL → #raxx-ops-alert-sev2-5 |
| synthetic-gate.yml | Staging synthetic check failure | SLACK_WEBHOOK_URL |
| terraform-email-delivery-stack.yml | TF plan/apply failure DM | SLACK_BOT_TOKEN → D0AJ7K184TV |
| tickets-e2e-smoke.yml | Tickets e2e smoke failure | SLACK_WEBHOOK_URL |
| waf-synthetic-probe.yml | WAF probe failure DM | SLACK_BOT_TOKEN → D0AJ7K184TV |
Workflows with no Slack posting (no change needed)
| Workflow | Note |
|---|---|
| alembic-version-cron.yml | Comment reference only; no Slack post step |
| console-degraded-auto-file.yml | GH issues only; no Slack post |
| launch-readiness-check.yml | Step summary only; no Slack post |
| mbt-assignment-at-expiry-cron.yml | Sentry + digest only; no Slack post |
| mbt-drift-daily.yml | Digest only; no Slack post |
| mbt-drift-per-symbol-weekly.yml | Digest only; no Slack post |
| mbt-resting-orders-cron.yml | Sentry + digest only; no Slack post |
| freescout-apply.yml | TF_VAR_license_slack is a Terraform input, not a Slack post |
| ci-pr.yml | Reference to slack-notify workflow path in step summary only |
Implementation notes
- The gate is applied at the individual step `if:` level (not job level) wherever possible, to preserve job-level transitive-skip guards (feedback_gh_actions_transitive_skip).
- flag-drift-check.yml uses an embedded Python call for Slack. There the gate is applied via a `SLACK_NOTIFY_LEVEL` env var passed to the step and checked in the Python code.
- Workflow cron/logic is fully intact; only the Slack posting side effect is gated.
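For the embedded-Python case, the check might look roughly like the sketch below. It is not the code in flag-drift-check.yml: `post_if_enabled`, the injected `send` callable, and the `env` parameter are hypothetical names, and the exact match on `all` is an assumption.

```python
import os
from typing import Callable, Mapping, Optional


def post_if_enabled(message: str,
                    send: Callable[[str], None],
                    env: Optional[Mapping[str, str]] = None) -> bool:
    """Call send(message) only when SLACK_NOTIFY_LEVEL is "all".

    Returns True if the message was sent, False if it was gated.
    """
    env = os.environ if env is None else env
    if env.get("SLACK_NOTIFY_LEVEL", "") != "all":
        # Gate closed: skip only the Slack side effect. The caller's
        # own logic (the drift check itself) still runs as before.
        return False
    send(message)
    return True
```

Injecting `send` keeps the gate separate from the transport, so the same check works whether the step posts through a webhook or a bot token.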
Escalation
If a silenced path represents a genuine prod incident, escalate via:
1. GH Actions run URL (always visible in the Actions tab)
2. ops@raxx.app (Postmark alerts remain active where configured)
3. Set SLACK_NOTIFY_LEVEL=all to re-enable all paths during a high-tempo incident window