RCA — Queue service CI blocked by three layered C++ build failures

Incident ID: 2026-06-17-queue-ci-cpp-build-failures
Date: 2026-06-17
Severity: SEV-2
Duration: ~75 minutes detection to resolution (across session)
Blast radius: Every push to main that touched queue/ was blocked from deploying; raxx-queue-prod had zero dynos running; Queue billing go-live gated.
Author: sre-agent

Summary

Three layered C++ compilation and test failures blocked every Queue CI run. The first was a CMake duplicate imported target error: tests/unit/CMakeLists.txt called find_package(Drogon) a second time after the root CMakeLists.txt had already called it. That duplicate call was a pre-existing issue, and it began failing CI when the WAF/origin-guard feature was merged without fixing it. The second and third failures were Drogon API mismatches in two test files. The tests first called getContentTypeString(), a getter that does not exist, and then getHeader("Content-Type"), which is always empty for enum-typed content types in Drogon 1.9.13 unit tests. Both were resolved by switching to getContentType(), which returns the internal ContentType enum directly.
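The getHeader("Content-Type") versus getContentType() distinction can be illustrated with a small stand-in model. This is a sketch only: FakeResponse, its ContentType enum, isJson(), and contentTypeChecks() are hypothetical names invented for illustration and are not Drogon's classes. The model assumes, consistent with the behaviour described in the summary, that a content type set by enum code is stored apart from the explicit header map, so a header lookup comes back empty while the enum getter works.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in types; NOT Drogon's real HttpResponse or ContentType.
enum class ContentType { None, ApplicationJson, TextPlain };

class FakeResponse {
public:
    // Plays the role of setContentTypeCode(): records only the enum value.
    void setContentTypeCode(ContentType type) { contentType_ = type; }

    // Plays the role of getContentType(): returns the stored enum directly.
    ContentType getContentType() const { return contentType_; }

    // Plays the role of getHeader(): consults only the explicit header map,
    // which (by assumption) never receives the enum-typed content type.
    std::string getHeader(const std::string& key) const {
        auto it = headers_.find(key);
        return it == headers_.end() ? std::string{} : it->second;
    }

private:
    ContentType contentType_ = ContentType::None;
    std::map<std::string, std::string> headers_;
};

// The assertion style the fixed tests moved to: compare the enum.
bool isJson(const FakeResponse& resp) {
    return resp.getContentType() == ContentType::ApplicationJson;
}

// Walks through the failing and the passing style of check on the model.
void contentTypeChecks() {
    FakeResponse resp;
    resp.setContentTypeCode(ContentType::ApplicationJson);

    // Header-based check: empty in this model, so a string comparison fails.
    assert(resp.getHeader("Content-Type").empty());

    // Enum-based check: passes.
    assert(isJson(resp));
}
```

In a real unit test the same idea would be expressed against the Drogon response object by asserting on the value returned by getContentType() rather than on a header string.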

Timeline (all times UTC)

Impact

What went well

What didn't go well

Root cause analysis

Detection

Resolution

Action items

| # | Action | Owner | Due | Issue |
|---|--------|-------|-----|-------|
| 1 | Write Queue service runbook at docs/ops/runbooks/queue.md covering CI failure modes, deploy process, and go-live checklist | sre-agent | 2026-06-18 | (this incident) |
| 2 | Verify deploy-queue-failure-monitor.yml and queue-zero-dyno-monitor.yml are active and would have paged on this failure streak | sre-agent | 2026-06-18 | (this incident) |
| 3 | Add Drogon API note to docs/ops/runbooks/queue.md: setContentTypeCode() vs getContentType() distinction for unit test authors | sre-agent | 2026-06-18 | (this incident) |
| 4 | Update Node.js 20 action versions (actions/cache@v4 → v5, actions/checkout@v4 → v5, actions/github-script@v7 → v7+) before Sept 16, 2026 forced cutover | sre-agent | 2026-08-01 | SEV-4 drift |
| 5 | Ensure all new C++ test files are compiled and run locally before merge (or add a fast pre-compile lint step to the PR gate) | operator | 2026-06-30 | (this incident) |

References