CI Notification Posture

Decision date: 2026-05-08 UTC
Issue: #1364
Status: Active (pre-launch digest posture)

Summary

Routine CI and cron-job Slack notifications are consolidated into a single daily digest (07:00 UTC via ci-digest-cron.yml). Per-event pings are reserved for production incidents and security findings at CRITICAL/HIGH.

This is a pre-launch posture. At launch, when Raxx has real customers using the platform, routine surfaces flip back to per-event notifications by following the flip-back checklist in this document.

Decision

Before Raxx reaches real production users, CI/cron notifications generate noise without actionable signal. Most pings are routine successes or no-drift confirmations. The operator confirmed on 2026-05-08 UTC:

"Digest posture is the standing default until Raxx has real customers."

Condensing to a daily digest reduces routine Slack posts from roughly 25–35 on a typical weekday (see "Baseline noise levels" in this document) to 1 digest plus per-incident pings.

Digest surface

One Slack message at 07:00 UTC daily, via .github/workflows/ci-digest-cron.yml.

Covers the prior 24 h window:

Section	Source
Cron job pass/fail counts	GH Actions API (`/runs`) per workflow file
Flag drift detections	`drift_run_results` DB rows (if `CONSOLE_DATABASE_URL` set)
Security scan status	nightly-security-scan + security-zap run conclusions
PR throughput (merged/open)	GH Actions API
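
As an illustration of the first table row, here is a minimal sketch of how pass/fail counts for the prior 24 h window could be tallied. It is not the contents of scripts/ci/build_ci_digest.py; `tally_runs` and `format_section` are hypothetical names, and the input is assumed to be a list of workflow-run objects carrying the `path`, `conclusion`, and `created_at` fields that the GH Actions runs API returns.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta, timezone


def tally_runs(runs, now, window_hours=24):
    """Count run conclusions per workflow file for the window ending at `now`."""
    cutoff = now - timedelta(hours=window_hours)
    counts = defaultdict(Counter)
    for run in runs:
        # GitHub timestamps look like "2026-05-08T06:30:00Z" (UTC).
        created = datetime.strptime(
            run["created_at"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if cutoff <= created < now:
            # A run that has not finished yet has conclusion None.
            counts[run["path"]][run["conclusion"] or "in_progress"] += 1
    return {path: dict(c) for path, c in counts.items()}


def format_section(counts):
    """Render one line per workflow file, sorted by path."""
    lines = []
    for path in sorted(counts):
        c = counts[path]
        passed = c.get("success", 0)
        failed = c.get("failure", 0)
        other = sum(c.values()) - passed - failed
        lines.append(f"{path}: {passed} passed, {failed} failed, {other} other")
    return "\n".join(lines)
```

Keeping the tally as a pure function over already-fetched run objects means the windowing logic can be checked without network access; fetching and pagination would sit in a separate layer.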

Per-event alert surfaces (never digested)

These surfaces post to Slack immediately regardless of digest posture:

Surface	Trigger	Why
Production incidents	H12 / WORKER TIMEOUT / vault degradation tile	Customer impact; cannot wait
Security findings	CRITICAL or HIGH severity	Requires immediate triage
Nightly security scan — failed	`scan` job result = `failure`	Missed scan = operational gap
Nightly security scan — skipped	`scan` job result = `skipped`/`cancelled`	Missed scan = operational gap
ZAP scan — failed	`zap-antlers` or `zap-api` result = `failure`	HIGH alerts require immediate review
ZAP scan — skipped (no target)	`resolve-targets.has_target == false` on schedule/dispatch	Schedule miss = operational gap
CI (PR) — failure	`CI` workflow conclusion = `failure`	PR gate failure blocks the team
CI Digest build failure	`build-digest` result != `success`	Broken digest = blind spot
Flag drift detected	`mismatch_count > 0`	Unauthorized flag change on prod
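
The routing rule implied by the table above can be sketched as a single function. This is an illustrative model only: the `route` function, the event dictionary shape, and the `kind` values are assumptions made for the example, and the real gating lives in the individual workflow files. Sending anything not listed in the table to the digest is also an assumption.

```python
IMMEDIATE_SEVERITIES = {"CRITICAL", "HIGH"}


def route(event):
    """Return "immediate" for per-event surfaces and "digest" otherwise."""
    kind = event["kind"]  # hypothetical discriminator field
    if kind == "production_incident":
        return "immediate"
    if kind == "security_finding":
        severe = event.get("severity") in IMMEDIATE_SEVERITIES
        return "immediate" if severe else "digest"
    if kind == "flag_drift":
        return "immediate" if event.get("mismatch_count", 0) > 0 else "digest"
    if kind == "ci":
        return "immediate" if event.get("conclusion") == "failure" else "digest"
    if kind == "digest_build":
        # Anything other than success counts as a broken digest.
        return "immediate" if event.get("result") != "success" else "digest"
    # Assumption: surfaces not named in the table are routine.
    return "digest"
```

For example, `route({"kind": "flag_drift", "mismatch_count": 0})` yields `"digest"`, while the same event with `mismatch_count` of 2 yields `"immediate"`.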

Workflows modified by this posture (#1364)

`.github/workflows/slack-notify.yml`

Before: Fires on every CI workflow completion (success + failure).
After: Fires only on CI workflow conclusion = failure.
Flip-back: Remove the conclusion == 'failure' guard from the notify job's if: clause.

`.github/workflows/synthetic-gate.yml`

Before: 31 cron entries — every 30 min during market hours (13:30–20:00 UTC Mon–Fri) + hourly off-hours. ~28 scheduled runs/day.
After: 1 anchor cron — Mon–Fri at 13:00 UTC. Failure-triggered reruns via workflow_dispatch.
Flip-back: Restore the dense market-hours schedule from git history (git show origin/main:.github/workflows/synthetic-gate.yml | grep cron).
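
As a sanity check on the market-hours portion of the old schedule, the sketch below enumerates half-hour slots between 13:30 and 20:00 UTC. It assumes both endpoints are included and does not reproduce the actual cron entries, which should be taken from git history; `half_hour_slots` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def half_hour_slots(start="13:30", end="20:00"):
    """List HH:MM slots every 30 minutes from start to end, inclusive."""
    fmt = "%H:%M"
    t = datetime.strptime(start, fmt)
    stop = datetime.strptime(end, fmt)
    slots = []
    while t <= stop:
        slots.append(t.strftime(fmt))
        t += timedelta(minutes=30)
    return slots


slots = half_hour_slots()
print(len(slots), slots[0], slots[-1])  # 14 13:30 20:00
```

Under that assumption the market-hours window contributes 14 runs per weekday, which lines up with the "~14 weekdays" figure in the baseline noise table.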

`.github/workflows/nightly-security-scan.yml`

Added: notify-scan-status job, which posts a Slack alert when the scan result is failure, skipped, or cancelled. Successful runs are silent.

`.github/workflows/security-zap.yml`

Added: notify-zap-status job — Slack alert when has_target == false on a scheduled/dispatch run (scan skipped, no target resolved), or when zap-antlers or zap-api result = failure.
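
The alert condition described above can be modelled as follows. This is a sketch of the decision logic only, not the workflow's actual `if:` expression; `zap_alert_reason` and its parameter names are hypothetical, and "dispatch" is assumed to mean the `workflow_dispatch` event.

```python
def zap_alert_reason(event_name, has_target, zap_antlers, zap_api):
    """Return a short reason string if Slack should be alerted, else None."""
    # Skipped scan: a scheduled or manually dispatched run found no target.
    if event_name in {"schedule", "workflow_dispatch"} and not has_target:
        return "skipped: no target resolved"
    # Failed scan: either ZAP job reported failure.
    results = (("zap-antlers", zap_antlers), ("zap-api", zap_api))
    failed = [name for name, result in results if result == "failure"]
    if failed:
        return "failed: " + ", ".join(failed)
    return None
```

Returning a reason rather than a bare boolean lets the Slack message say which of the two conditions fired.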

`.github/workflows/ci-digest-cron.yml` (new)

Daily digest cron at 07:00 UTC.
Builds body via scripts/ci/build_ci_digest.py.
Posts ONE Slack summary per day.

Flip-back checklist (at launch)

When Raxx has real customers, execute the following to restore per-event posture:

[ ] slack-notify.yml — remove conclusion == 'failure' guard from notify job if: clause. Restore "fires on every CI conclusion" behaviour.
[ ] synthetic-gate.yml — restore dense market-hours schedule. Replace the single '0 13 * * 1-5' cron with the full list from git history.
[ ] ci-digest-cron.yml — decide whether to retire, repurpose as a weekly summary, or keep daily. Operator decision at launch.
[ ] flag-drift-check.yml — confirm the Slack gating logic (mismatch_count > 0) is appropriate for live operations. Consider adding a clean-run notification channel if regulatory audit needs it.
[ ] Update this document: change status to "Retired — per-event posture active".

Baseline noise levels (pre-#1364)

Source	Estimated daily Slack posts
flag-drift-check (every 4 h)	~6
synthetic-gate (every 30 min market hours)	~14 weekdays / ~24 weekends
slack-notify (every PR CI conclusion)	~5–15 (varies by PR activity)
nightly-security-scan (silent on success, no failure alert)	0 (missed gap)
security-zap (no schedule alert)	0 (missed gap)

Total routine Slack posts (pre-#1364, typical weekday): ~25–35

Post-#1364: 1 digest + per-incident pings only.
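
The weekday total follows directly from the per-source estimates in the table. The short sketch below reproduces the arithmetic, treating each estimate as a (low, high) range; the dictionary is just a transcription of the table's weekday figures.

```python
# (low, high) estimated Slack posts per source on a typical weekday.
baseline = {
    "flag-drift-check": (6, 6),        # every 4 h: 24 / 4 = 6
    "synthetic-gate": (14, 14),        # weekday market-hours figure
    "slack-notify": (5, 15),           # varies with PR activity
    "nightly-security-scan": (0, 0),   # no alerts before #1364
    "security-zap": (0, 0),            # no alerts before #1364
}

low = sum(lo for lo, _ in baseline.values())
high = sum(hi for _, hi in baseline.values())
print(f"~{low}-{high} routine posts per weekday")  # ~25-35 routine posts per weekday
```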

Issue #1364 — implementation
.github/workflows/ci-digest-cron.yml — digest cron
scripts/ci/build_ci_digest.py — digest body builder
docs/agents/onboarding.md — repo structure reference
