CI Notification Posture
Decision date: 2026-05-08 UTC · Issue: #1364 · Status: Active (pre-launch digest posture)
Summary
Routine CI and cron-job Slack notifications are consolidated into a single
daily digest (07:00 UTC via `ci-digest-cron.yml`). Per-event pings are
reserved for production incidents, CRITICAL/HIGH security findings, and the
failure or missed-run surfaces listed under "Per-event alert surfaces" below.
This is a pre-launch posture. At launch, when Raxx has real customers using the platform, routine surfaces revert to per-event notifications by following the flip-back checklist in this document.
Decision
Before Raxx reaches real production users, CI/cron notifications generate noise without actionable signal. Most pings are routine successes or no-drift confirmations. The operator confirmed on 2026-05-08 UTC:
"Digest posture is the standing default until Raxx has real customers."
Condensing to a daily digest reduces routine Slack posts from roughly 25–35 on a typical weekday (see "Baseline noise levels" below) to 1 digest plus per-incident pings only.
Digest surface
One Slack message at 07:00 UTC daily, via .github/workflows/ci-digest-cron.yml.
Covers the prior 24 h window:
| Section | Source |
|---|---|
| Cron job pass/fail counts | GH Actions API (/runs) per workflow file |
| Flag drift detections | drift_run_results DB rows (only when CONSOLE_DATABASE_URL is set) |
| Security scan status | nightly-security-scan + security-zap run conclusions |
| PR throughput (merged/open) | GH Actions API |
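The "Cron job pass/fail counts" row comes down to a per-workflow tally. The sketch below shows one way to compute it and makes several assumptions:

- The input is a list of run dicts with `name` and `conclusion` keys, shaped like GitHub workflow-run objects.
- The runs have already been filtered to the 24 h window.
- `count_runs_by_workflow` and `format_digest_lines` are hypothetical names. They are not necessarily what `scripts/ci/build_ci_digest.py` does.

```python
from collections import Counter, defaultdict


def count_runs_by_workflow(runs):
    """Tally pass/fail/other conclusions per workflow name (hypothetical helper)."""
    tallies = defaultdict(Counter)
    for run in runs:
        conclusion = run.get("conclusion")
        if conclusion == "success":
            bucket = "pass"
        elif conclusion == "failure":
            bucket = "fail"
        else:
            # cancelled, skipped, timed_out, or None (run still in progress)
            bucket = "other"
        tallies[run["name"]][bucket] += 1
    return {name: dict(counts) for name, counts in tallies.items()}


def format_digest_lines(tallies):
    """Render one line per workflow, sorted by name for stable output."""
    lines = []
    for name in sorted(tallies):
        c = tallies[name]
        lines.append(
            f"{name}: {c.get('pass', 0)} pass / {c.get('fail', 0)} fail"
            f" / {c.get('other', 0)} other"
        )
    return lines


runs = [
    {"name": "synthetic-gate", "conclusion": "success"},
    {"name": "synthetic-gate", "conclusion": "failure"},
    {"name": "flag-drift-check", "conclusion": "success"},
    {"name": "flag-drift-check", "conclusion": None},
]
for line in format_digest_lines(count_runs_by_workflow(runs)):
    print(line)
# flag-drift-check: 1 pass / 0 fail / 1 other
# synthetic-gate: 1 pass / 1 fail / 0 other
```

Non-success, non-failure conclusions are grouped as "other". This keeps cancelled or skipped runs visible in the digest without counting them as failures.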
Per-event alert surfaces (never digested)
These surfaces post to Slack immediately regardless of digest posture:
| Surface | Trigger | Why |
|---|---|---|
| Production incidents | H12 / WORKER TIMEOUT / vault degradation tile | Customer impact; cannot wait |
| Security findings | CRITICAL or HIGH severity | Requires immediate triage |
| Nightly security scan — failed | scan job result = failure | Missed scan = operational gap |
| Nightly security scan — skipped | scan job result = skipped/cancelled | Missed scan = operational gap |
| ZAP scan — failed | zap-antlers or zap-api result = failure | HIGH alerts require immediate review |
| ZAP scan — skipped (no target) | resolve-targets.has_target == false on schedule/dispatch | Schedule miss = operational gap |
| CI (PR) — failure | CI workflow conclusion = failure | PR gate failure blocks the team |
| CI Digest build failure | build-digest result != success | Broken digest = blind spot |
| Flag drift detected | mismatch_count > 0 | Unauthorized flag change on prod |
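The routing rule in the table can be read as a set of per-surface predicates. The sketch below is a hypothetical classifier built on that reading. The surface keys and event fields are illustrative labels, not identifiers taken from the workflows. The ZAP rows are left out because their condition also depends on the triggering event.

```python
# Hypothetical per-surface rules mirroring the per-event table.
# Surface keys and event fields are illustrative, not workflow identifiers.
IMMEDIATE_RULES = {
    "production_incident": lambda e: True,  # H12 / WORKER TIMEOUT / vault tile
    "security_finding": lambda e: e["severity"] in {"CRITICAL", "HIGH"},
    "nightly_security_scan": lambda e: e["result"] in {"failure", "skipped", "cancelled"},
    "ci_pr": lambda e: e["conclusion"] == "failure",
    "ci_digest_build": lambda e: e["result"] != "success",
    "flag_drift": lambda e: e["mismatch_count"] > 0,
}


def route(event):
    """Return "immediate" if the event must ping Slack now, else "digest".

    "digest" means no individual ping; the event is at most summarized
    in the next 07:00 UTC digest.
    """
    rule = IMMEDIATE_RULES.get(event["surface"])
    if rule is not None and rule(event):
        return "immediate"
    return "digest"


print(route({"surface": "security_finding", "severity": "HIGH"}))    # immediate
print(route({"surface": "security_finding", "severity": "MEDIUM"}))  # digest
print(route({"surface": "flag_drift", "mismatch_count": 0}))         # digest
```

Unknown surfaces fall through to "digest". That matches the posture's default that anything not listed as per-event is digested.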
Workflows modified by this posture (#1364)
.github/workflows/slack-notify.yml
- Before: Fires on every `CI` workflow completion (success + failure).
- After: Fires only when the `CI` workflow conclusion is `failure`.
- Flip-back: Remove the `conclusion == 'failure'` guard from the `notify` job's `if:` clause.
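The change to the `notify` job amounts to switching between two predicates. The sketch below models both postures in Python for review purposes only. The function and its `posture` parameter are hypothetical; the real gate is the `if:` clause in `slack-notify.yml`.

```python
def slack_notify_fires(conclusion, posture="digest"):
    """Whether the notify job would post for a completed CI run (hypothetical model).

    posture="digest":    #1364 behaviour, failures only.
    posture="per_event": pre-#1364 / post-launch behaviour, every conclusion.
    """
    if posture == "digest":
        return conclusion == "failure"
    if posture == "per_event":
        return True
    raise ValueError(f"unknown posture: {posture!r}")


print(slack_notify_fires("success"))               # False
print(slack_notify_fires("failure"))               # True
print(slack_notify_fires("success", "per_event"))  # True
```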
.github/workflows/synthetic-gate.yml
- Before: 31 cron entries — every 30 min during market hours (13:30–20:00 UTC Mon–Fri) + hourly off-hours. ~28 scheduled runs/day.
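- After: 1 anchor cron, Mon–Fri at 13:00 UTC (`'0 13 * * 1-5'`). Failure-triggered reruns via `workflow_dispatch`.
- Flip-back: Restore the dense market-hours schedule from git history, using a commit from before #1364 (e.g. `git show <pre-#1364 commit>:.github/workflows/synthetic-gate.yml | grep cron`). Once #1364 is merged, `origin/main` no longer carries the dense schedule.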
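GitHub Actions evaluates `schedule` cron expressions in UTC. The sketch below checks which instants match the new anchor expression `'0 13 * * 1-5'` and counts firings across one week. The helper names are hypothetical. The matcher handles only this single expression, not general cron syntax.

```python
from datetime import datetime, timedelta, timezone


def matches_anchor_cron(dt):
    """True if UTC datetime `dt` matches '0 13 * * 1-5'.

    Cron numbers weekdays 0-6 starting on Sunday, so 1-5 is Mon-Fri.
    Python's weekday() starts at Monday=0, hence the +1 shift.
    """
    cron_dow = (dt.weekday() + 1) % 7
    return dt.minute == 0 and dt.hour == 13 and 1 <= cron_dow <= 5


def count_firings(start, days):
    """Count matching minutes in [start, start + days)."""
    t, end = start, start + timedelta(days=days)
    n = 0
    while t < end:
        if matches_anchor_cron(t):
            n += 1
        t += timedelta(minutes=1)
    return n


week_start = datetime(2026, 5, 4, tzinfo=timezone.utc)  # a Monday
print(count_firings(week_start, 7))  # 5: one run per weekday, none on weekends
```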
.github/workflows/nightly-security-scan.yml
- Added: `notify-scan-status` job: Slack alert when the `scan` result is `failure`, `skipped`, or `cancelled`. Successful runs are silent.
.github/workflows/security-zap.yml
- Added: `notify-zap-status` job: Slack alert when `has_target == false` on a scheduled/dispatch run (scan skipped because no target resolved), or when the `zap-antlers` or `zap-api` result is `failure`.
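The ZAP alert differs from the other scan alerts in one respect: a missing target counts as an alert only on scheduled or manually dispatched runs. The sketch below models that condition. It assumes `event_name` carries the GitHub event name (`schedule`, `workflow_dispatch`, `pull_request`, ...). `zap_should_alert` is a hypothetical name, not the job's actual `if:` expression.

```python
SCHEDULED_EVENTS = {"schedule", "workflow_dispatch"}


def zap_should_alert(event_name, has_target, zap_antlers, zap_api):
    """Hypothetical model of the notify-zap-status condition.

    Alerts when a scheduled/dispatched run resolved no target (scan
    silently skipped) or when either ZAP job failed.
    """
    missed_schedule = event_name in SCHEDULED_EVENTS and not has_target
    scan_failed = "failure" in (zap_antlers, zap_api)
    return missed_schedule or scan_failed


print(zap_should_alert("schedule", False, "skipped", "skipped"))      # True
print(zap_should_alert("pull_request", False, "skipped", "skipped"))  # False
```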
.github/workflows/ci-digest-cron.yml (new)
- Daily digest cron at 07:00 UTC.
- Builds the message body via `scripts/ci/build_ci_digest.py`.
- Posts ONE Slack summary per day.
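One option for defining "the prior 24 h" is to anchor the window to the 07:00 UTC boundary rather than to the moment the job starts. Scheduled Actions runs can start late under load, and anchoring means consecutive digests neither overlap nor leave gaps. The sketch below assumes a timezone-aware `now`. `digest_window` and `DIGEST_HOUR_UTC` are hypothetical names; whether `build_ci_digest.py` works this way is not stated here.

```python
from datetime import datetime, timedelta, timezone

DIGEST_HOUR_UTC = 7  # matches the 07:00 UTC schedule


def digest_window(now):
    """Return (start, end) of the 24 h window the digest should cover.

    `end` is the most recent 07:00 UTC at or before `now`, so a run that
    starts late still reports the same fixed window.
    """
    now = now.astimezone(timezone.utc)
    end = now.replace(hour=DIGEST_HOUR_UTC, minute=0, second=0, microsecond=0)
    if end > now:
        end -= timedelta(days=1)
    return end - timedelta(hours=24), end


start, end = digest_window(datetime(2026, 5, 8, 7, 12, tzinfo=timezone.utc))
print(start.isoformat(), end.isoformat())
# 2026-05-07T07:00:00+00:00 2026-05-08T07:00:00+00:00
```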
Flip-back checklist (at launch)
When Raxx has real customers, execute the following to restore per-event posture:
- [ ] `slack-notify.yml`: remove the `conclusion == 'failure'` guard from the `notify` job's `if:` clause, restoring the "fires on every CI conclusion" behaviour.
- [ ] `synthetic-gate.yml`: restore the dense market-hours schedule. Replace the single `'0 13 * * 1-5'` cron with the full list from git history.
- [ ] `ci-digest-cron.yml`: decide whether to retire it, repurpose it as a weekly summary, or keep it daily. Operator decision at launch.
- [ ] `flag-drift-check.yml`: confirm the Slack gating logic (`mismatch_count > 0`) is appropriate for live operations. Consider adding a clean-run notification channel if regulatory audit requires it.
- [ ] Update this document: change status to "Retired — per-event posture active".
Baseline noise levels (pre-#1364)
| Source | Estimated daily Slack posts |
|---|---|
| flag-drift-check (every 4 h) | ~6 |
| synthetic-gate (every 30 min market hours) | ~14 weekdays / ~24 weekends |
| slack-notify (every PR CI conclusion) | ~5–15 (varies by PR activity) |
| nightly-security-scan (silent on success, no failure alert) | 0 (coverage gap: failed scans went unreported) |
| security-zap (no schedule alert) | 0 (coverage gap: skipped scans went unreported) |
Total routine Slack posts (pre-#1364, typical weekday): ~25–35
Post-#1364: 1 digest + per-incident pings only.
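The weekday total follows from summing the per-source estimates in the baseline table. The sketch below reproduces that arithmetic using the table's values. The `(low, high)` ranges are transcribed from the table, not measured.

```python
# Estimated daily Slack posts per source on a typical weekday, as (low, high),
# transcribed from the baseline table above.
BASELINE_WEEKDAY = {
    "flag-drift-check": (6, 6),
    "synthetic-gate": (14, 14),
    "slack-notify": (5, 15),
    "nightly-security-scan": (0, 0),
    "security-zap": (0, 0),
}

low = sum(lo for lo, _ in BASELINE_WEEKDAY.values())
high = sum(hi for _, hi in BASELINE_WEEKDAY.values())
print(f"pre-#1364: ~{low}-{high} posts/day; post-#1364: 1 digest + incidents")
# pre-#1364: ~25-35 posts/day; post-#1364: 1 digest + incidents
```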
Related
- Issue #1364: implementation
- `.github/workflows/ci-digest-cron.yml`: digest cron
- `scripts/ci/build_ci_digest.py`: digest body builder
- `docs/agents/onboarding.md`: repo structure reference