RCA — Investigate workflow silently creates no FreeScout tickets
Incident ID: 2026-06-19-investigate-no-tickets
Date: 2026-06-19
Severity: SEV-3
Duration: Unknown — regression was present since feature shipped; detected 2026-06-19 UTC when operator reported "NO tickets" in the Investigate workflow
Blast radius: Internal ops tooling only. No customer-facing impact. Every Investigate click on any degraded surface returned {degraded: true} silently rather than creating a FreeScout ticket.
Author: sre-agent
Summary
The Investigate button on the Console status dashboard has never successfully created a FreeScout ticket since the feature shipped. Two compounding root causes: (1) five required Heroku env vars were never set on raxx-console-prod (FREESCOUT_API_URL, FREESCOUT_API_KEY, CF_ACCESS_SVC_TICKETS_CLIENT_ID, CF_ACCESS_SVC_TICKETS_CLIENT_SECRET, FREESCOUT_INSTANCE_URL), so freescout_client.create_conversation() returned reason="freescout_not_configured" before making any network call; and (2) even if those vars had been set, the freescout_client did not send CF Access service-token headers, so every outbound call to tickets.raxx.app/api/* would have received HTTP 403 from CF Access (the /api path is gated by CF Access app ca6fd315, decision=non_identity). The recent SSL fix to tickets.raxx.app (CF 526 resolved) was a necessary but not sufficient condition for the path to work. Fix: env vars set via Heroku Platform API from vault; code change adds cf_access_tickets_headers() helper and _base_headers() merger into all six freescout_client call sites; 13 new tests; PR #3717.
Timeline (all times UTC)
- 2026-06-19 ~19:40 - Operator reports "Console Investigate workflow shows NO tickets" and requests end-to-end verification
- 2026-06-19 19:50 - sre-agent reads freescout_client.py and status_page.py to trace code path
- 2026-06-19 19:52 - Live probe of tickets.raxx.app/api/conversations without service-token headers: HTTP 403. CF Access gate confirmed blocking
- 2026-06-19 19:53 - Heroku config-vars check: FREESCOUT_API_URL, FREESCOUT_API_KEY, CF_ACCESS_SVC_TICKETS_CLIENT_ID, CF_ACCESS_SVC_TICKETS_CLIENT_SECRET, FREESCOUT_INSTANCE_URL all absent from raxx-console-prod
- 2026-06-19 19:54 - Vault check: CF_ACCESS_SVC_TICKETS_CLIENT_ID, CF_ACCESS_SVC_TICKETS_CLIENT_SECRET, FREESCOUT_API_KEY, FREESCOUT_OPERATIONS_MAILBOX_ID all present at correct vault paths
- 2026-06-19 19:55 - Heroku Platform API PATCH sets all five missing vars on raxx-console-prod; dynos restart automatically
- 2026-06-19 20:00 - Code fix: _cf_access.py gains cf_access_tickets_headers(); freescout_client.py gains _base_headers() and all six call sites updated; 13 tests written and passing
- 2026-06-19 20:02 - E2E smoke: POST /api/conversations → HTTP 201, conversation_id=26 (tickets.raxx.app/conversation/26)
- 2026-06-19 20:05 - PR #3717 filed; CI running
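The 19:55 remediation step can be sketched as follows. This is a minimal illustration of a Heroku Platform API config-vars update, not the command that was actually run during the incident. The function name build_config_vars_patch and the placeholder token are hypothetical; the endpoint shape and the version-3 Accept header follow the public Heroku Platform API. The request is only constructed here, never sent.

```python
import json
import urllib.request

HEROKU_API = "https://api.heroku.com"


def build_config_vars_patch(app, token, config):
    """Build (but do not send) a PATCH /apps/{app}/config-vars request.

    `config` maps config-var names to values; Heroku merges it into the
    app's existing config vars and restarts the dynos.
    """
    return urllib.request.Request(
        url=f"{HEROKU_API}/apps/{app}/config-vars",
        data=json.dumps(config).encode("utf-8"),
        method="PATCH",
        headers={
            "Accept": "application/vnd.heroku+json; version=3",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder token; real secret values would come from vault.
    req = build_config_vars_patch(
        "raxx-console-prod",
        "<heroku-api-token>",
        {"FREESCOUT_API_URL": "https://tickets.raxx.app"},
    )
    print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen() would apply the change. Building the request separately makes it easy to review exactly which vars are about to be written before anything touches prod.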
Impact
- Users affected: 0 (internal ops workflow)
- User-visible symptoms: none (Investigate is operator-only)
- Data integrity: ok
- Revenue / billing: ok
- Operational: every Investigate click since feature ship returned {degraded: true}, so ops could not auto-file investigation tickets from degraded tiles. Manual FreeScout ticket creation was the workaround.
What went well
- The freescout_client is designed fail-open (HTTP 200 with degraded: true), so the console didn't 500 on every Investigate click
- Vault paths for the service token and API key were correctly provisioned (the tokens existed; they were just never wired to the Heroku app)
- The 2026-06-05 FreeScout config audit (docs/ops/runbooks/2026-06-05-freescout-config-audit.md) had correctly flagged that FREESCOUT_OPERATIONS_MAILBOX_ID was SET in vault but did not catch that the key was missing from Heroku
- CF Access service-token provisioning SOP (docs/ops/runbooks/cf-access-service-token-provisioning.md) was complete and covered the tickets service token
What didn't go well
- The feature ship (a02c5279, "surface linked open FreeScout ticket") enabled FLAG_CONSOLE_INVESTIGATE_LINKED_TICKET=1 and FLAG_CONSOLE_INVESTIGATE_FROM_STATUS=1 on prod but did not include a deployment step to set FREESCOUT_API_URL, FREESCOUT_API_KEY, and the CF Access service-token vars on the Heroku app. The migration (0053_promote_console_investigate_from_status.py) listed these as prerequisites but they were never actioned.
- warn_if_investigate_misconfigured() was not being called at startup (or its warnings were not visible), so there was no observable signal that the feature was misconfigured.
- The freescout_client had no logging at the freescout_not_configured branch, so a dyno log search for the feature would show no errors at all, making silent failure hard to detect.
- The CF Access requirement for the /api path was documented in freescout.md but the freescout_client.py code comment mentioned only X-FreeScout-API-Key. The code and the runbook were inconsistent.
Root cause analysis
- Contributing factor 1: Missing env vars on raxx-console-prod. FREESCOUT_API_URL, FREESCOUT_API_KEY, CF_ACCESS_SVC_TICKETS_CLIENT_ID, CF_ACCESS_SVC_TICKETS_CLIENT_SECRET, and FREESCOUT_INSTANCE_URL were never set on the Heroku app. The freescout_client checks for FREESCOUT_API_URL and FREESCOUT_API_KEY on every call and returns reason="freescout_not_configured" immediately when they are not set. The system allowed this because there was no deployment gate asserting env var presence when the flags were enabled.
- Contributing factor 2: CF Access headers missing from freescout_client. Even with correct env vars, freescout_client.py only sent X-FreeScout-API-Key to tickets.raxx.app/api/*. CF Access app ca6fd315 (decision=non_identity) gates that path and would have returned HTTP 403 for all requests without CF-Access-Client-Id / CF-Access-Client-Secret. The system allowed this because the CF Access requirement for FreeScout's API path was documented in the runbook but not enforced in code or tests.
- Contributing factor 3: Silent failure mode. create_conversation() returning reason="freescout_not_configured" is logged at WARNING level but the status_page.investigate() route returns HTTP 200 with degraded: true. The UI shows a toast (behavior documented in the blueprint) but ops clicking the button during normal operations may have dismissed the toast without reading it carefully.
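The fail-open behavior behind contributing factors 1 and 3 can be sketched as below. This is a hedged illustration, not the actual freescout_client code: the function name create_conversation_sketch, the shape of the returned dict, and the elided network call are all assumptions. Only the two checked env vars and the reason string come from this RCA.

```python
import logging
import os

logger = logging.getLogger("freescout_client")

# The two vars the RCA says are checked on every call.
_REQUIRED = ("FREESCOUT_API_URL", "FREESCOUT_API_KEY")


def create_conversation_sketch(subject, env=None):
    """Return a degraded result instead of raising when config is missing.

    `env` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if env is None else env
    missing = [name for name in _REQUIRED if not env.get(name)]
    if missing:
        # Logged at WARNING, so nothing surfaces unless someone reads logs.
        logger.warning("freescout_not_configured: missing %s", ", ".join(missing))
        return {"ok": False, "degraded": True, "reason": "freescout_not_configured"}
    # The real client would POST to FreeScout here; elided in this sketch.
    return {"ok": True, "degraded": False, "reason": None, "subject": subject}


if __name__ == "__main__":
    print(create_conversation_sketch("probe", env={}))
```

Because the caller receives a normal return value rather than an exception, a route built on top of it can answer HTTP 200 with degraded: true, which is the silent path described above.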
Detection
- What alerted us: Operator explicitly reported the symptom ("NO tickets") and requested investigation
- How long between cause and detection: Unknown. Feature shipped at PR a02c5279; detection on 2026-06-19 UTC
- How to detect faster next time: See action items. Add a startup health check assertion for Investigate prerequisites; add a Sentry alert for repeated freescout_not_configured events
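The "repeated events" condition can be illustrated with a small sliding-window counter. This is only a sketch of the threshold logic from action item 2 (more than 2 events in 10 minutes); the real alert would be configured as a Sentry rule, and the class name RepeatedEventDetector is hypothetical.

```python
from collections import deque


class RepeatedEventDetector:
    """Fire when more than `threshold` events fall inside the window."""

    def __init__(self, threshold=2, window_seconds=600):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events = deque()  # event timestamps in seconds, oldest first

    def record(self, ts):
        """Record an event at time `ts`; return True if the alert should fire."""
        self._events.append(ts)
        cutoff = ts - self.window_seconds
        # Drop events that have aged out of the window.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) > self.threshold


if __name__ == "__main__":
    detector = RepeatedEventDetector()
    for t in (0, 60, 120):
        print(t, detector.record(t))
```

With the default settings, the third freescout_not_configured event within ten minutes trips the alert, while isolated events spread further apart do not.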
Resolution
- What was changed:
  1. Set on raxx-console-prod via Heroku Platform API: FREESCOUT_API_URL=https://tickets.raxx.app, FREESCOUT_INSTANCE_URL=https://tickets.raxx.app, FREESCOUT_API_KEY (from vault/MooseQuest/freescout/), CF_ACCESS_SVC_TICKETS_CLIENT_ID and CF_ACCESS_SVC_TICKETS_CLIENT_SECRET (from vault/MooseQuest/cloudflare/)
  2. Code: _cf_access.py gains cf_access_tickets_headers(); freescout_client.py gains _base_headers() helper; all six HTTP call sites updated; warn_if_investigate_misconfigured() now checks CF Access vars
  3. PR #3717 filed against main
- Validation: E2E smoke. POST https://tickets.raxx.app/api/conversations with vault-sourced credentials → HTTP 201, conversation_id=26 (tickets.raxx.app/conversation/26). All 13 new tests + 103 existing investigate/CF-access tests pass.
Action items
| # | Action | Owner | Due | Issue |
|---|---|---|---|---|
| 1 | Add startup_check_investigate_configured() call in console/app/__init__.py (log ERROR-level + emit Sentry event if any Investigate prereq var is missing when flag is ON) | feature-developer | 2026-07-03 | file as type:reliability |
| 2 | Add Sentry alert rule: freescout_not_configured in freescout_client logger > 2 events in 10m → alert to ops@ | sre-agent | 2026-06-26 | file as type:reliability |
| 3 | Add FREESCOUT_API_URL to console deployment checklist / migration prerequisite list so future flag enables check that it's provisioned | sre-agent | 2026-06-26 | update docs/ops/runbooks/auto-ticketing-runbook.md |
| 4 | Update freescout_client module docstring and migration 0053 to list CF Access headers as a named prerequisite (not just FREESCOUT_API_KEY / FREESCOUT_API_URL) | sre-agent | 2026-06-26 | in this PR (already done) |
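Action item 1 could take a shape like the sketch below. The function name comes from the table; everything else is an assumption: the list of prerequisite vars is taken from the five named in the Summary, the flag gate uses FLAG_CONSOLE_INVESTIGATE_FROM_STATUS, and the Sentry event is left out so the example stays dependency-free.

```python
import logging
import os

logger = logging.getLogger("console.startup")

# Assumed prerequisite list: the five vars named in the Summary.
INVESTIGATE_PREREQS = (
    "FREESCOUT_API_URL",
    "FREESCOUT_API_KEY",
    "FREESCOUT_INSTANCE_URL",
    "CF_ACCESS_SVC_TICKETS_CLIENT_ID",
    "CF_ACCESS_SVC_TICKETS_CLIENT_SECRET",
)
INVESTIGATE_FLAG = "FLAG_CONSOLE_INVESTIGATE_FROM_STATUS"


def startup_check_investigate_configured(env=None):
    """Return the missing prereq vars; log at ERROR if the flag is on."""
    env = os.environ if env is None else env
    if env.get(INVESTIGATE_FLAG) != "1":
        return []  # Feature is off, so nothing to assert.
    missing = [name for name in INVESTIGATE_PREREQS if not env.get(name)]
    if missing:
        logger.error(
            "Investigate is enabled but prerequisites are missing: %s",
            ", ".join(missing),
        )
    return missing


if __name__ == "__main__":
    print(startup_check_investigate_configured({INVESTIGATE_FLAG: "1"}))
```

Returning the list of missing names, rather than only logging, lets the caller decide whether to emit a Sentry event, fail a health check, or both.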
References
- PR: https://github.com/raxx-app/TradeMasterAPI/pull/3717
- Runbook: docs/ops/runbooks/freescout.md (tickets-agent-rw service token section)
- CF Access provisioning SOP: docs/ops/runbooks/cf-access-service-token-provisioning.md
- 2026-06-05 FreeScout config audit: docs/ops/runbooks/2026-06-05-freescout-config-audit.md
- E2E smoke ticket: https://tickets.raxx.app/conversation/26