Raxx · internal docs


docs-customer-deploy runbook

System: Cloudflare Pages, project raxx-docs (customer docs site at docs.raxx.app)
Owner: operator
Last incident: none
Last reviewed: 2026-05-12

How to tell it's broken

How to diagnose (in order)

  1. Check the GH Actions workflow run for deploy-customer-docs.yml:
     https://github.com/raxx-app/TradeMasterAPI/actions/workflows/deploy-customer-docs.yml
     Expected: the last run on main is green. A red run means the most recent push to docs/customer/** failed to deploy.

  2. Check the Cloudflare Pages dashboard for the raxx-docs project:
     https://dash.cloudflare.com/ → Pages → raxx-docs
     Expected: the last deployment shows "Active" on the production branch. A "Failed" deployment means the wrangler upload was rejected.

  3. Verify the DNS CNAME exists and points at raxx-docs.pages.dev:
     dig CNAME docs.raxx.app +short
     Expected output: raxx-docs.pages.dev. (with the trailing dot). A proxied record does not expose its CNAME target, so dig returns CF anycast IPs or nothing; in that case, check the record through the CF API.

  4. Confirm the custom domain is attached to the raxx-docs Pages project:
     curl -sS "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/pages/projects/raxx-docs/domains" \
       -H "Authorization: Bearer $CF_PAGES_DEPLOY_TOKEN" | python3 -m json.tool
     Expected: the response contains "name": "docs.raxx.app" with "status": "active".

  5. Smoke the live site manually:
     curl -sS https://docs.raxx.app/ | grep "Raxx Documentation"
     Expected: non-empty output containing the docs marker string.

  6. Check the Cloudflare status page for Pages outages: https://www.cloudflarestatus.com/
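The smoke check in step 5 can be wrapped so that an HTTP error or a hung connection is reported as a failure instead of as empty grep output. This is a minimal sketch, not part of the workflow: the function names and the 15-second timeout are assumptions, and only the marker string and URL come from this runbook.

```shell
# Marker string taken from diagnostic step 5.
DOCS_MARKER="Raxx Documentation"

# Succeeds if the HTML on stdin contains the docs marker string.
# Output is discarded so the whole input is read before grep exits.
has_docs_marker() {
  grep "$DOCS_MARKER" >/dev/null
}

# Fetches the site and checks for the marker.
# --fail makes curl exit non-zero on HTTP 4xx/5xx responses;
# --max-time bounds a hung connection (15 seconds is an assumed value).
smoke_docs_site() {
  url="${1:-https://docs.raxx.app/}"
  if curl -sS --fail --max-time 15 "$url" | has_docs_marker; then
    echo "OK: marker found at $url"
  else
    echo "FAIL: marker missing or request failed for $url" >&2
    return 1
  fi
}
```

Run `smoke_docs_site` with no arguments to check production, or pass a different URL such as the raxx-docs.pages.dev address to separate a Pages problem from a custom-domain problem.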

Known failure modes

Failure mode A: Deploy freeze blocked the workflow

Symptom: The workflow fails at the Deploy freeze check step. No deploy happens.
Cause: A global deploy freeze is active (see docs/ops/runbooks/deploy-freeze.md).
Fix: Wait for the operator to lift the freeze. Do not bypass the freeze gate.
Verification: Re-run the workflow after the freeze is cleared.

Failure mode B: CF_PAGES_DEPLOY_TOKEN missing or expired

Symptom: The workflow fails at the Load CF Pages deploy token from vault step, or at the wrangler deploy step with Authentication error or code 10000.
Cause: The CF_PAGES_DEPLOY_TOKEN secret is absent from Infisical at /MooseQuest/cloudflare/, or the token has been rotated and the vault value is stale.
Fix:
  1. Find the Pages deploy token for raxx-docs in the Cloudflare dashboard (Account → API Tokens) and confirm its scope includes Cloudflare Pages: Edit and Account: Read.
  2. If the token is compromised, or the vault value is stale or missing, roll the token to obtain a new value. The dashboard shows a token's value only when it is created or rolled, so an existing value cannot be read back.
  3. Update the vault entry:
     infisical secrets set CF_PAGES_DEPLOY_TOKEN=<new_value> \
       --path /MooseQuest/cloudflare/ --env prod
  4. Re-run the workflow.

Verification: Workflow Load CF Pages deploy token from vault step succeeds without warnings.
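Before re-running the workflow, the token can be checked directly against Cloudflare's token-verify endpoint. The sketch below assumes a user-owned token (account-owned tokens use a different verify path) and assumes the standard Cloudflare response envelope with `success` and `result.status`; the function names are hypothetical. A status of active confirms only that the token is valid, not that its scopes are correct.

```shell
# Prints the token status from a token-verify response on stdin,
# or "invalid" if the response is not JSON or does not report success.
token_status() {
  python3 -c '
import json, sys
try:
    body = json.load(sys.stdin)
except ValueError:
    print("invalid"); sys.exit(0)
result = body.get("result") or {}
print(result.get("status", "invalid") if body.get("success") else "invalid")
'
}

# Calls the verify endpoint with the deploy token from the environment.
# Assumes CF_PAGES_DEPLOY_TOKEN is already exported in this shell.
verify_deploy_token() {
  curl -sS --max-time 15 \
    -H "Authorization: Bearer $CF_PAGES_DEPLOY_TOKEN" \
    "https://api.cloudflare.com/client/v4/user/tokens/verify" | token_status
}
```

`verify_deploy_token` printing anything other than active suggests the vault value is stale and the token should be rolled as described in the fix.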

Failure mode C: raxx-docs Pages project does not exist

Symptom: The wrangler deploy step fails with Project not found or code 8000007.
Cause: The raxx-docs Pages project was deleted from the Cloudflare account, or was never created.
Fix: The workflow self-heals this case via the Ensure CF Pages project (raxx-docs) step; re-run the workflow. If the project was deleted intentionally and the wrangler step still fails, run:

export CLOUDFLARE_API_TOKEN=$(infisical secrets get CF_PAGES_DEPLOY_TOKEN \
  --path /MooseQuest/cloudflare/ --plain)
export CLOUDFLARE_ACCOUNT_ID=$(infisical secrets get CLOUDFLARE_ACCOUNT_ID \
  --path /MooseQuest/cloudflare/ --plain)
npx wrangler@4.84.0 pages project create raxx-docs --production-branch=main

Verification: wrangler pages project list shows raxx-docs.

Failure mode D: Custom domain not attached or in pending state

Symptom: https://docs.raxx.app/ redirects to raxx-docs.pages.dev or returns a CF domain-verification error. The custom domain API response shows "status": "pending_validation".
Cause: The custom domain was removed from the Pages project, or the DNS CNAME is wrong and CF Pages could not verify ownership.
Fix:
  1. Confirm the CNAME exists (see diagnostic step 3 above).
  2. Re-attach the domain via the Ensure CF Pages custom domain step by re-running the workflow. Alternatively, run the API call manually:
     curl -sS -X POST \
       -H "Authorization: Bearer $CF_PAGES_DEPLOY_TOKEN" \
       -H "Content-Type: application/json" \
       "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/pages/projects/raxx-docs/domains" \
       -d '{"name":"docs.raxx.app"}'
  3. Wait up to 60 seconds for CF Pages to verify the CNAME and set the domain to active.
Verification: The domain API response shows "status": "active" and curl https://docs.raxx.app/ returns HTTP 200.
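The 60-second wait in the fix can be scripted as a short polling loop against the same domains endpoint used in diagnostic step 4. This is a sketch under assumptions: the function names are hypothetical, the 10-second interval is arbitrary, and the response is assumed to list domains under a top-level `result` array with `name` and `status` fields.

```shell
# Prints the status of the named custom domain from a Pages domains
# response on stdin, or "missing" if the domain is not listed.
domain_status() {
  DOMAIN="$1" python3 -c '
import json, os, sys
body = json.load(sys.stdin)
for entry in body.get("result") or []:
    if entry.get("name") == os.environ["DOMAIN"]:
        print(entry.get("status", "unknown"))
        break
else:
    print("missing")
'
}

# Fetches the domain list; assumes both variables are exported.
fetch_domains() {
  curl -sS --max-time 15 \
    -H "Authorization: Bearer $CF_PAGES_DEPLOY_TOKEN" \
    "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/pages/projects/raxx-docs/domains"
}

# Polls every 10 seconds for about 60 seconds; returns 0 once active.
wait_for_active() {
  attempts=0
  while [ "$attempts" -lt 6 ]; do
    status="$(fetch_domains | domain_status docs.raxx.app)"
    [ "$status" = "active" ] && return 0
    echo "status: ${status:-unknown}; retrying in 10s" >&2
    attempts=$((attempts + 1))
    sleep 10
  done
  return 1
}
```

If `wait_for_active` returns non-zero with the status stuck at pending_validation, the CNAME is the more likely culprit and failure mode E applies.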

Failure mode E: DNS CNAME missing (first-time setup)

Symptom: https://docs.raxx.app/ returns NXDOMAIN or a connection error. dig CNAME docs.raxx.app returns nothing.
Cause: The DNS CNAME was never created. This is expected on first deploy if Terraform has not been applied.
Fix (Terraform, preferred):

cd terraform/cf-pages-docs-customer
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CLOUDFLARE_EDIT_DNS \
  --path /MooseQuest/cloudflare/ --plain)
terraform init
terraform apply -target=cloudflare_record.docs_customer_cname

Fix (manual fallback, if Terraform is not available): The workflow's Ensure docs.raxx.app DNS CNAME step creates the CNAME automatically. Trigger the workflow via workflow_dispatch from the Actions tab.

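Verification: dig CNAME docs.raxx.app +short returns raxx-docs.pages.dev. (with the trailing dot). If the record is proxied, dig does not show the CNAME target (CF returns IPs instead), so confirm that the site responds correctly using diagnostic step 5.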
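Because the dig result has several possible shapes, a small classifier can make the verification outcome explicit. This is an illustrative sketch: the function name and the four labels are assumptions, and the IPv4 pattern is deliberately loose.

```shell
# Expected dig output for an unproxied record (note the trailing dot).
EXPECTED_TARGET="raxx-docs.pages.dev."

# Classifies the output of `dig CNAME docs.raxx.app +short`:
#   cname-ok    the CNAME points at the Pages project
#   ip          one or more IPv4 addresses were returned (likely proxied)
#   empty       no answer (record missing, or CNAME hidden by the proxy)
#   unexpected  a CNAME exists but points somewhere else
classify_dns_answer() {
  answer="$1"
  if [ -z "$answer" ]; then
    echo "empty"
  elif [ "$answer" = "$EXPECTED_TARGET" ]; then
    echo "cname-ok"
  elif printf '%s\n' "$answer" | grep -Eq '^[0-9]+(\.[0-9]+){3}$'; then
    echo "ip"
  else
    echo "unexpected"
  fi
}
```

Typical use is `classify_dns_answer "$(dig CNAME docs.raxx.app +short)"`. Only cname-ok confirms the record from DNS alone; ip and empty both mean dig cannot settle the question, so fall back to the live-site check.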

Failure mode F: Lint violation blocks build

Symptom: The workflow fails at the Build customer docs step with LINT FAIL in the logs, from either the broker-name lint or the forward-looking-phrase lint.
Cause: A docs/customer/*.md file contains a broker vendor name (e.g., "Alpaca", "SnapTrade") or a forward-looking phrase (e.g., "will generate", "expected to").
Fix:
  1. Identify the violating file and line from the workflow log output.
  2. Edit the file to remove the violating phrase. Use vendor-agnostic copy per feedback_no_backend_branding.md and feedback_no_forward_looking_framing.md.
  3. Push the fix to main. The workflow re-runs automatically.
Verification: The workflow's Build customer docs step completes with LINT PASS for both checks.
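A rough local pre-push check can catch the most obvious violations before the workflow does. The sketch below is not the workflow's lint: its pattern lists contain only the four examples quoted above, the real lists are presumably longer, and the function name and case-sensitive matching are assumptions.

```shell
# Patterns are only the examples quoted in this runbook (assumption:
# the real lint in the build step uses longer lists).
BROKER_PATTERN='Alpaca|SnapTrade'
FORWARD_PATTERN='will generate|expected to'

# Prints file:line:text for each match under the given directory and
# returns non-zero if either pattern matched in any .md file.
lint_customer_docs() {
  dir="${1:-docs/customer}"
  failed=0
  for pattern in "$BROKER_PATTERN" "$FORWARD_PATTERN"; do
    if grep -rnE --include='*.md' "$pattern" "$dir"; then
      failed=1
    fi
  done
  if [ "$failed" -eq 0 ]; then
    echo "LINT PASS"
  else
    echo "LINT FAIL" >&2
  fi
  return "$failed"
}
```

A clean local run does not guarantee a green workflow, since the workflow log remains the authority on which phrases are rejected.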

Emergency stop

To take the docs site offline (e.g., during a content incident requiring immediate removal):

Option 1 — Remove the DNS CNAME (fastest; site becomes unreachable at docs.raxx.app; Pages project and content remain intact):

cd terraform/cf-pages-docs-customer
export CLOUDFLARE_API_TOKEN=$(infisical secrets get CLOUDFLARE_EDIT_DNS \
  --path /MooseQuest/cloudflare/ --plain)
terraform destroy -target=cloudflare_record.docs_customer_cname

Option 2 — Delete the Pages project (destructive; removes all deployments): Do not do this without operator authorization. File a type:reliability issue first.

Operator action items (first-time setup)

These are one-time actions required before the workflow can complete a production deploy:

  1. Confirm CF_PAGES_DEPLOY_TOKEN exists in Infisical at /MooseQuest/cloudflare/ with Pages:Edit + Account:Read scope.
  2. Confirm CLOUDFLARE_ACCOUNT_ID exists in Infisical at /MooseQuest/cloudflare/.
  3. Confirm CLOUDFLARE_EDIT_DNS exists in Infisical at /MooseQuest/cloudflare/ with DNS:Edit scope on the raxx.app zone.
  4. Apply the Terraform stack to provision the Pages project and DNS CNAME:
     cd terraform/cf-pages-docs-customer
     terraform init
     terraform apply
  5. Trigger the first deploy via workflow_dispatch from the Actions tab or push a change to docs/customer/**.
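Items 1 to 3 above can be partly checked from a shell that has already loaded the three secrets into variables, for example with the infisical secrets get commands shown under the failure modes. This is a minimal sketch with a hypothetical function name; it detects only missing or empty values and cannot confirm token scopes.

```shell
# Prints the name of each listed variable that is unset or empty.
# Variable names are passed as arguments; eval is used only to read them.
missing_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Reports on the three secrets this runbook requires.
preflight_secrets() {
  missing="$(missing_vars CF_PAGES_DEPLOY_TOKEN CLOUDFLARE_ACCOUNT_ID CLOUDFLARE_EDIT_DNS)"
  if [ -n "$missing" ]; then
    echo "Missing secrets:" >&2
    echo "$missing" >&2
    return 1
  fi
  echo "All three secrets are present."
}
```

Running `preflight_secrets` before the first terraform apply gives an early signal that a vault entry is absent, before Terraform or the workflow fails on it.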

References