RCA — tickets.raxx.app HTTP 526 (CF SSL mode Full Strict / snakeoil origin cert)
Incident ID: 2026-06-19-freescout-526-ssl-strict
Date: 2026-06-19
Severity: SEV-2
Duration: ~45m total (detection ~19:15 UTC — resolved ~20:00 UTC)
Blast radius: Customer support portal tickets.raxx.app fully unavailable; Console Investigate chip links broken; operator unable to reach FreeScout inbox.
Author: sre-agent
Summary
Cloudflare returned HTTP 526 "Invalid SSL Certificate" for tickets.raxx.app. The FreeScout Lightsail origin (raxx-tickets, 54.146.13.200) was healthy (Apache active, port 443 listening, snakeoil cert valid until 2036), but the raxx.app zone SSL mode was set to Full (strict) (API value `strict`; written `full_strict` elsewhere in this document). Full (strict) requires the origin certificate to chain to a publicly trusted CA or to Cloudflare Origin CA, so it rejects the self-signed snakeoil cert. The fix was to issue a Cloudflare Origin Certificate for tickets.raxx.app (valid 15 years) using a short-lived token scoped to Zone:SSL and Certificates:Write, minted from `CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN`, then install the cert on the origin and reload Apache. This kept the zone on Full (strict) with a valid origin cert instead of downgrading the zone security posture.
Timeline (all times UTC)
- ~19:15 — Operator reports `https://tickets.raxx.app` returns CF error 526
- 19:15 — sre-agent begins investigation (this session)
- 19:17 — SSH to origin confirms Apache active, port 443 listening, snakeoil cert valid until 2036; no certbot installed; `apache2ctl configtest`: Syntax OK
- 19:20 — CF zone SSL setting confirmed `strict` via API — root cause identified
- 19:25 — Vault authentication via Infisical universal-auth confirmed working
- 19:30 — `CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN` verified active; `permission_groups` list fetched; SSL+Certificates Write group ID `c03055bc037c4ea9afb9a9f104b7b721` confirmed
- 19:33 — Short-lived token `CF_SSL_CERT_ORIGIN_CA_RAXX_APP_TMP` minted (expires 2026-06-20T23:59:59Z); verified active
- 19:34 — Lightsail SSH key retrieved from SSM `/raxx/freescout/ssh_key`
- 19:35 — CSR retrieved from `/tmp/cf-origin.csr` on origin (pre-staged by prior agent)
- 19:36 — Origin CA certificate issued: ID `54595718253663594481859538935220508661437382197`, expires 2041-06-15, via `POST /certificates`
- 19:37 — Cert PEM written to `/etc/ssl/certs/cf-origin.pem` on origin; Apache vhost updated (snakeoil refs replaced); `apache2ctl configtest`: Syntax OK; `systemctl reload apache2`: success
- 19:40 — `curl -sS -o /dev/null -w "%{http_code}" https://tickets.raxx.app/` → 302 (FreeScout login redirect — correct)
- 19:40 — CF-Ray header present; server: cloudflare; public LE cert on edge confirmed unchanged
- 19:40 — Origin serving `CN=CloudFlare Origin Certificate` confirmed via SSH s_client probe
- 19:40 — Zone SSL mode confirmed `strict` (unchanged — never touched)
- 19:42 — Temporary SSL cert tokens revoked (2 tokens: cert issuance complete)
- 19:43 — Cert metadata written to vault: `CF_ORIGIN_CERT_TICKETS_RAXX_APP_ID`, `CF_ORIGIN_CERT_TICKETS_RAXX_APP_EXPIRES`
- ~20:00 — RCA, runbook update, and git commit completed
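The 19:36 issuance step can be illustrated with a minimal sketch of the JSON body sent to the Origin CA `POST /certificates` endpoint. The helper name `build_origin_cert_request` is hypothetical, and `request_type="origin-rsa"` is an assumption: this report does not record whether the pre-staged key was RSA or ECDSA. The CSR shown is a placeholder, not the one used in the incident.

```python
import json

# Validity periods (in days) accepted by the Origin CA API; 5475 is about 15 years.
ALLOWED_VALIDITY_DAYS = {7, 30, 90, 365, 730, 1095, 5475}


def build_origin_cert_request(csr_pem, hostnames, validity_days=5475,
                              request_type="origin-rsa"):
    """Build the JSON body for POST /client/v4/certificates.

    request_type="origin-rsa" is an assumption; "origin-ecc" would be
    used for an ECDSA key.
    """
    if validity_days not in ALLOWED_VALIDITY_DAYS:
        raise ValueError(f"unsupported validity: {validity_days} days")
    if not hostnames:
        raise ValueError("at least one hostname is required")
    return {
        "csr": csr_pem,
        "hostnames": list(hostnames),
        "request_type": request_type,
        "requested_validity": validity_days,
    }


if __name__ == "__main__":
    # Placeholder CSR text, not a real signing request.
    placeholder_csr = (
        "-----BEGIN CERTIFICATE REQUEST-----\n"
        "...\n"
        "-----END CERTIFICATE REQUEST-----"
    )
    body = build_origin_cert_request(placeholder_csr, ["tickets.raxx.app"])
    print(json.dumps(body, indent=2))
```

Validating the requested validity locally avoids spending a short-lived token on a request the API would reject.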
Impact
- Users affected: All beta testers attempting to reach `tickets.raxx.app` (support portal, login, customer portal)
- User-visible symptoms: Browser showed CF error 526 / "Invalid SSL certificate" — site completely inaccessible
- Data integrity: OK (no writes failed; MariaDB and FreeScout data intact on origin throughout)
- Revenue / billing: N/A (pre-launch)
What went well
- Root cause identified within 5 minutes of starting investigation — CF SSL mode API check (`GET /zones/{id}/settings/ssl`) is fast and unambiguous
- Origin health confirmed quickly: Apache status, port 443, cert validity, certbot absence — all via SSH in under 2 minutes
- The CSR and private key pre-staged by the prior agent (`/tmp/cf-origin.csr`, `/etc/ssl/private/cf-origin.key`) left the fix path unblocked
- `CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN` had `API Tokens:Write` scope — short-lived scoped token minted without operator action
- SSM path `/raxx/freescout/ssh_key` was documented and functional; no manual key retrieval needed
- End-to-end fix (token mint → cert issue → SSH install → Apache reload → verify) completed in under 10 minutes of active work
- Zone SSL mode stayed `strict` throughout — no security posture downgrade
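Minting the short-lived scoped token amounts to a `POST /user/tokens` call with a single allow policy. The sketch below builds such a request body under stated assumptions: the helper name and the `zone_id` value are hypothetical, the 24-hour TTL mirrors the figure quoted in this report, and the permission group ID is the one recorded in the timeline.

```python
from datetime import datetime, timedelta, timezone

# Permission group ID recorded in the timeline for "SSL and Certificates Write".
SSL_CERTS_WRITE_GROUP_ID = "c03055bc037c4ea9afb9a9f104b7b721"


def build_scoped_token_request(name, zone_id, ttl_hours=24, now=None):
    """Build a token-creation body limited to one zone and one permission group."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(hours=ttl_hours)
    return {
        "name": name,
        "expires_on": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "policies": [
            {
                "effect": "allow",
                # Resource key scopes the policy to a single zone.
                "resources": {f"com.cloudflare.api.account.zone.{zone_id}": "*"},
                "permission_groups": [{"id": SSL_CERTS_WRITE_GROUP_ID}],
            }
        ],
    }


if __name__ == "__main__":
    # "ZONE_ID_PLACEHOLDER" is a hypothetical value, not the real raxx.app zone ID.
    request = build_scoped_token_request(
        "CF_SSL_CERT_ORIGIN_CA_RAXX_APP_TMP", "ZONE_ID_PLACEHOLDER"
    )
    print(request["expires_on"])
```

Setting `expires_on` at mint time means the token lapses on its own even if the explicit revocation step is missed.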
What didn't go well
- CF SSL mode was changed to `strict` without a corresponding origin cert — this is the second most common 526 cause and the runbook only documented snakeoil as valid for `full` mode, not `full_strict`
- No monitoring existed for CF zone SSL mode drift — operator had to report the 526; no automated alert fired
- The `freescout-cert-renewal.md` runbook described snakeoil as permanently sufficient; it did not document the Full Strict failure mode or the CF Origin Certificate path until this incident
- Multiple temporary SSL tokens were minted across session boundaries (3 total; 2 revoked mid-session, 2 revoked post-install) — a single-session approach would have been cleaner; shell state loss across Bash calls required repeated vault re-auth
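One way to get the single-session behaviour mentioned in the last point is to tie the token's lifetime to one block of code, so that exactly one token is minted and it is revoked even if a step fails. The following is a minimal sketch: `temporary_token`, `mint`, and `revoke` are hypothetical names, and the real mint and revoke calls against the Cloudflare API are replaced here by injected callables.

```python
from contextlib import contextmanager


@contextmanager
def temporary_token(mint, revoke):
    """Mint one token, hand its value to the caller, and always revoke it.

    mint() must return (token_id, token_value); revoke(token_id) deletes it.
    """
    token_id, token_value = mint()
    try:
        yield token_value
    finally:
        # Runs on success and on error, so no token outlives the block.
        revoke(token_id)


if __name__ == "__main__":
    revoked = []

    def fake_mint():
        # Stand-in for the real token-creation call.
        return "tok-1", "secret-value"

    with temporary_token(fake_mint, revoked.append) as value:
        print("issuing cert with token of length", len(value))
    print("revoked:", revoked)
```

Because all steps run inside one process, this also sidesteps the shell state loss that forced repeated vault re-authentication.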
Root cause analysis
- Contributing factor 1: CF zone SSL mode changed to `full_strict` — The `raxx.app` zone SSL mode was at some point changed from `full` to `full_strict` (CF dashboard, Terraform, or CF auto-upgrade). Full Strict validates the origin certificate chain. The snakeoil cert (`CN=ip-172-26-11-76`, self-signed) is signed by neither a publicly trusted CA nor Cloudflare Origin CA, so CF rejects it with 526. This is a zone-wide setting; it affected all origins in the zone, but only `tickets.raxx.app` (the only origin using snakeoil) surfaced the 526.

- Contributing factor 2: No CF Origin Certificate pre-installed — The architecture was designed for `full` (non-strict) mode, where the snakeoil cert is sufficient. Had a CF Origin Certificate been installed on the origin from the start, the zone mode change from `full` to `full_strict` would have been a no-op for this origin.

- Contributing factor 3: No CF zone SSL mode drift monitoring — No synthetic probe or API poll watched the zone SSL setting. A daily `GET /zones/{id}/settings/ssl` check would have caught the drift to `strict` before users experienced a 526.

- Contributing factor 4: Runbook described snakeoil as the permanent origin cert — The `freescout-cert-renewal.md` runbook explicitly documented snakeoil as "correct by design" and did not address the Full Strict failure mode. A runbook update after this incident adds the CF Origin Certificate as the documented cert and documents Failure Mode E.
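The interaction between factors 1 and 2 can be captured in a small decision model. This is a deliberately simplified sketch, not Cloudflare's actual validation logic: it covers only the `full` and `strict` modes, and the `OriginCert` type and issuer labels are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OriginCert:
    issuer: str  # "self-signed", "public-ca", or "cloudflare-origin-ca"
    expired: bool = False
    hostname_matches: bool = True


def edge_accepts_origin(ssl_mode: str, cert: OriginCert) -> bool:
    """Return True if the edge would accept the origin cert in this mode."""
    if ssl_mode == "full":
        # Traffic to the origin is encrypted, but the cert is not validated.
        return True
    if ssl_mode == "strict":
        trusted = cert.issuer in {"public-ca", "cloudflare-origin-ca"}
        return trusted and not cert.expired and cert.hostname_matches
    raise ValueError(f"mode not modelled in this sketch: {ssl_mode!r}")


if __name__ == "__main__":
    snakeoil = OriginCert(issuer="self-signed")
    origin_ca = OriginCert(issuer="cloudflare-origin-ca")
    for mode in ("full", "strict"):
        print(mode, "snakeoil:", edge_accepts_origin(mode, snakeoil),
              "origin-ca:", edge_accepts_origin(mode, origin_ca))
```

In this model the snakeoil cert passes only under `full`, while the Origin CA cert passes under both modes, which is why pre-installing it would have made the mode change a no-op.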
Detection
- What alerted us: Operator report
- Time between cause and detection: Unknown (CF Audit Log would show when SSL mode changed)
- How to detect faster: Synthetic probe (`curl tickets.raxx.app`, alert on 526/525); daily CF SSL mode API poll
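A synthetic probe along these lines only needs to fetch the page and map the status code to an alert level. The sketch below uses the standard library; the function names and the three-level result (`ok`, `warn`, `alert`) are assumptions, not an existing probe in the repo. Note that `urlopen` follows redirects, so a healthy site reports the status of the login page rather than the 302 seen with `curl`.

```python
import urllib.error
import urllib.request


def classify_probe(status_code: int) -> str:
    """Map an HTTP status from the edge to an alert level."""
    if status_code == 526:
        return "alert"  # edge rejected the origin certificate
    if status_code == 525:
        return "alert"  # TLS handshake between edge and origin failed
    if 200 <= status_code < 400:
        return "ok"
    return "warn"  # anything else deserves a look but is not cert-specific


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for url, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (including 525/526) arrive as HTTPError.
        return err.code


if __name__ == "__main__":
    status = fetch_status("https://tickets.raxx.app/")
    print(status, classify_probe(status))
```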
Resolution
- What was changed:
  1. Short-lived CF token minted with `Zone:SSL and Certificates:Write` for `raxx.app` zone (24h TTL, revoked after use)
  2. CF Origin Certificate issued for `tickets.raxx.app` via `POST /client/v4/certificates` (15-year validity, expires 2041-06-15)
  3. Cert PEM written to `/etc/ssl/certs/cf-origin.pem` on `raxx-tickets` (54.146.13.200)
  4. Apache vhost `/etc/apache2/sites-available/freescout-ssl.conf` updated: `SSLCertificateFile` and `SSLCertificateKeyFile` now point to `cf-origin.pem` / `cf-origin.key` (snakeoil refs removed)
  5. `apache2ctl configtest` passed; `systemctl reload apache2` completed
  6. Cert metadata written to vault: `CF_ORIGIN_CERT_TICKETS_RAXX_APP_ID` and `CF_ORIGIN_CERT_TICKETS_RAXX_APP_EXPIRES` at `/MooseQuest/cloudflare`
- Validation:
  - `curl tickets.raxx.app` → 302 (FreeScout login redirect — correct, not 526)
  - CF-Ray header present; server: cloudflare
  - Public-facing cert: CN=raxx.app / Let's Encrypt E7 — unchanged
  - Origin cert (via SSH s_client): CN=CloudFlare Origin Certificate, notAfter Jun 15 2041
  - Zone SSL mode: `strict` — confirmed unchanged
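Step 4 of the change list (repointing the vhost at the new cert and key) can be sketched as a text rewrite of the two directives. This is an illustration under assumptions rather than the commands run during the incident: `repoint_vhost` is a hypothetical helper, and the sample vhost uses the Debian default snakeoil paths, which may differ from the actual file contents.

```python
import re

# Matches an active (uncommented) cert or key directive on a single line.
_DIRECTIVE = re.compile(
    r"^([ \t]*)(SSLCertificateFile|SSLCertificateKeyFile)[ \t]+\S+[ \t]*$",
    re.MULTILINE,
)


def repoint_vhost(conf_text: str, cert_path: str, key_path: str) -> str:
    """Replace the cert and key paths while preserving indentation."""
    targets = {
        "SSLCertificateFile": cert_path,
        "SSLCertificateKeyFile": key_path,
    }

    def swap(match):
        indent, directive = match.group(1), match.group(2)
        return f"{indent}{directive} {targets[directive]}"

    return _DIRECTIVE.sub(swap, conf_text)


if __name__ == "__main__":
    sample = (
        "<VirtualHost *:443>\n"
        "    SSLEngine on\n"
        "    SSLCertificateFile /etc/ssl/certs/ssl-cert-snakeoil.pem\n"
        "    SSLCertificateKeyFile /etc/ssl/private/ssl-cert-snakeoil.key\n"
        "</VirtualHost>\n"
    )
    print(repoint_vhost(sample, "/etc/ssl/certs/cf-origin.pem",
                        "/etc/ssl/private/cf-origin.key"))
```

Any rewrite like this should still be followed by `apache2ctl configtest` before the reload, as was done in step 5.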
Action items
| # | Action | Owner | Due | Notes |
|---|---|---|---|---|
| 1 | Add synthetic 526/525 probe for tickets.raxx.app to `docs/ops/runbooks/synth-probes.md` | sre-agent | 2026-06-26 | Closes detection gap; pairs with existing cert probe in #715 |
| 2 | Add daily CF zone SSL mode poll (`GET /zones/{id}/settings/ssl` != expected) to ops sweep | sre-agent | 2026-06-26 | Catches zone-level drift before next 526 |
| 3 | Add `Zone:SSL and Certificates:Write` scope note to `cloudflare-tokens.md` inventory (which token can issue origin certs, which cannot) | sre-agent | 2026-06-21 | `CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN` is confirmed capable |
| 4 | Check CF Audit Log to determine what changed SSL mode to `strict` (dashboard, Terraform, or CF auto-upgrade) | operator | 2026-06-20 | CF dashboard: Account Home → Audit Log → filter zone=raxx.app, resource=ssl |
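Action item 2 could be implemented as a comparison between the value returned by `GET /zones/{id}/settings/ssl` and the expected mode. The sketch below handles only the response parsing, so the HTTP call and alert routing are left to the ops sweep; `ssl_mode_drift` is a hypothetical name, and the sample body in the usage section is illustrative, not captured output.

```python
import json


def ssl_mode_drift(response_body: str, expected: str = "strict"):
    """Return a drift message, or None if the zone SSL mode matches.

    response_body is the raw JSON returned by the zone SSL setting endpoint.
    """
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise RuntimeError(f"Cloudflare API error: {payload.get('errors')}")
    actual = payload["result"]["value"]
    if actual == expected:
        return None
    return f"zone SSL mode is {actual!r}, expected {expected!r}"


if __name__ == "__main__":
    # Illustrative response body, not real API output.
    sample = json.dumps({
        "success": True,
        "errors": [],
        "result": {"id": "ssl", "value": "full"},
    })
    print(ssl_mode_drift(sample))
```

Treating an unsuccessful API response as an error rather than as "no drift" keeps an expired or under-scoped token from silently disabling the check.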
References
- Runbook: `docs/ops/runbooks/freescout-cert-renewal.md` (updated this incident — see Failure Mode E)
- Runbook: `docs/ops/runbooks/freescout.md`
- CF Origin Certificates: https://developers.cloudflare.com/ssl/origin-configuration/origin-ca/
- CF cert API: `POST https://api.cloudflare.com/client/v4/certificates`
- Vault cert metadata: `/MooseQuest/cloudflare/CF_ORIGIN_CERT_TICKETS_RAXX_APP_ID`
- Prior diagnosis: `docs/ops/incidents/` (prior agent worktree: `agent-a83b7a1dd15af2397`)
- Issue #715 (cert monitoring — open)