Raxx · internal docs

internal · gated

RCA — CI baseline degradation blocking merge queue (5 PRs, 5 failure classes)

Incident ID: 2026-05-13-ci-baseline-degradation-merge-queue
Date: 2026-05-13
Severity: SEV-2
Duration: ~27h total (first blocked PR opened 2026-05-13T16:46Z → fix PRs open 2026-05-13T20:15Z; merge pending CI)
Blast radius: All 5 open PRs blocked (#1998, #1999, #2000, #2001, #2002); no user-facing production impact
Author: sre-agent

Summary

Five CI failure classes accumulated on main over several weeks; all of them predate the five blocked PRs. With 9 days to v1 launch (2026-05-23 UTC), the blocked merge queue was classified SEV-2. Root causes: (1) the gitleaks false-positive (FP) allowlist missed three subdirectory path classes and two pre-restructure test paths; (2) the backend-tests-postgres job was never updated to create the raptor_app database role, although ci-pr.yml had already been patched to do so; (3) the Velvet editable install fails under newer setuptools because of the package_dir = velvet = . indirection; (4) console tests transitively import numpy through the Raptor app factory; and (5) a test-local session expiry used modulo arithmetic that produces past timestamps after 16:00 UTC. Two fix PRs (#2004 for gitleaks, #2006 for the CI fixes) address all five classes.
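The session-expiry failure class is the easiest to reproduce in isolation. The sketch below is illustrative only and is not the actual test helper: the 8-hour TTL and both function names are assumptions, with the TTL chosen because it wraps past midnight exactly when the current hour reaches 16, which matches the reported 16:00 UTC threshold. It contrasts wrapping the hour with modulo against adding a timedelta.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical TTL: the source does not state the real value. 8 hours is
# assumed because (hour + 8) % 24 wraps once the hour reaches 16.
SESSION_TTL_HOURS = 8


def expiry_with_modulo(now: datetime) -> datetime:
    """Buggy pattern: wrap the hour with modulo but keep the same calendar date."""
    return now.replace(hour=(now.hour + SESSION_TTL_HOURS) % 24)


def expiry_with_timedelta(now: datetime) -> datetime:
    """Safer pattern: timedelta carries the overflow into the next day."""
    return now + timedelta(hours=SESSION_TTL_HOURS)


morning = datetime(2026, 5, 13, 9, 30, tzinfo=timezone.utc)
evening = datetime(2026, 5, 13, 17, 30, tzinfo=timezone.utc)

for now in (morning, evening):
    print(
        now.isoformat(),
        "modulo in future:", expiry_with_modulo(now) > now,
        "timedelta in future:", expiry_with_timedelta(now) > now,
    )
```

Under these assumptions the modulo version yields a valid expiry for the 09:30 run but a timestamp earlier the same day for the 17:30 run, so the session is already expired when the test uses it. That would explain a test that passes or fails depending on the time of day CI runs.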

Timeline (all times UTC)

Impact

What went well

What didn't go well

Root cause analysis

Detection

Resolution

Action items

| # | Action | Owner | Due | Issue |
|---|--------|-------|-----|-------|
| 1 | Add a lint-rule or comment-gate requiring both ci.yml and ci-pr.yml Postgres setup sections to be updated atomically when a new migration adds a GRANT | operator | 2026-05-20 | file after #2006 merges |
| 2 | Add gitleaks CI step that also runs on push to main (not just PRs) so FP accumulation is caught at merge time, not blocking the next PR | sre-agent | 2026-05-20 | file after #2006 merges |
| 3 | Pin setuptools version in the Velvet CI job (or add a pip install setuptools>=X guard) to catch setuptools regressions in a dedicated dependency-audit PR rather than breaking smoke tests | operator | 2026-05-20 | file after #2006 merges |
| 4 | Add timedelta as a required import to the console test fixture linting pass (advisory ruff rule) to prevent future clock-arithmetic bugs in test session helpers | operator | 2026-05-23 | file after #2006 merges |
| 5 | Merge #2004 (gitleaks), then #2006 (CI fixes), then rebase and merge blocked PRs in sequence: #2002, #2001, #1998, #1999, #2000 | operator | 2026-05-13 | |
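One possible shape for the lint rule in action item 1 is a small drift check that fails when a workflow file lacks the raptor_app role setup. This is a rough sketch under stated assumptions, not the planned implementation: the function name is hypothetical, and it assumes the role is created with a literal CREATE ROLE raptor_app statement somewhere in each workflow file, which the source does not confirm.

```python
import re

# Assumption: the role setup appears as a literal SQL statement in the
# workflow text. Adjust the pattern if the real setup uses a script or action.
ROLE_PATTERN = re.compile(r"CREATE\s+ROLE\s+raptor_app\b", re.IGNORECASE)


def files_missing_role_setup(workflow_texts: dict[str, str]) -> list[str]:
    """Return the names of workflow files whose text never creates raptor_app."""
    return sorted(
        name
        for name, text in workflow_texts.items()
        if not ROLE_PATTERN.search(text)
    )


# In-memory example; the file contents here are invented for illustration.
example = {
    "ci.yml": "run: psql -c 'CREATE DATABASE raptor_test'",
    "ci-pr.yml": "run: psql -c 'CREATE ROLE raptor_app LOGIN'",
}
print(files_missing_role_setup(example))
```

In CI, the dictionary would be built by reading ci.yml and ci-pr.yml from the repository, and the job would fail whenever the returned list is non-empty. A text match like this is deliberately crude: it catches the kind of one-sided patch described in this incident, but it does not verify that the two setup sections are otherwise equivalent.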

References