Raxx · internal docs

internal · gated

RCA — raxx-queue-prod zero-dyno state: vcpkg shallow clone blocks all CI deploys

Incident ID: 2026-05-27-queue-deploy-vcpkg-shallow-clone Date: 2026-05-27 Severity: SEV-2 Duration: ~9 days (first failure 2026-05-18 23:27 UTC → fix PR #2830 filed 2026-05-27) Blast radius: raxx-queue-prod never received a container deploy; Queue prod is not serving. Staging was unaffected (last deployed correctly, web.1 up). Author: sre-agent

Summary

Every deploy-queue.yml CI run triggered by a push to main since 2026-05-18 has failed at the build-test job. The build-test job shallow-clones vcpkg with --depth 1, but queue/vcpkg.json pins builtin-baseline to a historical SHA (3508985146f1b1d248c67ead13f8f54be5b4f5da). A shallow clone cannot resolve historical tree objects, causing vcpkg to abort with "failed to unpack tree object" for every port (jwt-cpp, spdlog, gtest, nlohmann-json, sentry-native). Because build-test gates both build-container and deploy-prod, no container has ever been pushed to raxx-queue-prod — Repo Size: 0 B. Fix is to remove --depth 1 from the CI test job clone, matching what the production Dockerfile already does.

Timeline (all times UTC)

Impact

What went well

What didn't go well

Root cause analysis

Detection

Resolution

Action items

# Action Owner Due Issue
1 Merge PR #2830, trigger workflow_dispatch prod deploy operator 2026-05-28 #2719
2 Write live Stripe keys to vault at /MooseQuest/Queue/prod/ operator 2026-05-28 #2443
3 Add deploy-queue CI failure monitor (alert when >2 consecutive failures on deploy-queue.yml) sre-agent 2026-06-03 new
4 Add Heroku zero-dyno monitor for raxx-queue-prod (alert if web dynos == 0 for >10 min) sre-agent 2026-06-03 new
5 Add comment in deploy-queue.yml build-test job cross-referencing Dockerfile vcpkg clone note sre-agent 2026-06-03 new

References