Raxx · internal docs

status-d1 Runbook

Status: Active
Created: 2026-04-30 UTC
Parent issue: #646
Architecture: docs/architecture/status-raxx-app.md

Status state lives in a Cloudflare D1 database (raxx-status-db) served by a Cloudflare Worker (raxx-status-worker) at status.raxx.app/api/*. This runbook covers provisioning, migration, debugging, and rotation.


1. Architecture overview

FreeScout webhook       3P poller (Raptor)
       |                       |
       v                       v
POST /api/internal/status/*  (X-Internal-Status-Token)
       |
       v
raxx-status-worker (CF Worker)
       |
       v
raxx-status-db (CF D1 / SQLite)
       |
       v
GET /api/status/public/*  (no auth)
       |
       v
status.raxx.app (React app, CF Pages)

Writers (the FreeScout webhook and the 3P poller, both running in Raptor) POST to the Worker's internal endpoints, which persist state to D1. The Worker reads from D1 to serve the public endpoints, and the React status page fetches those endpoints from its own origin (status.raxx.app/api/*).


2. One-shot operator setup (run once per environment)

2a. Provision D1 database

cd frontend/status-worker
npm install

# Create the database (run once per environment; re-running with the same name fails rather than creating a duplicate)
npx wrangler d1 create raxx-status-db
# Output: database_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

Copy the database_id into frontend/status-worker/wrangler.toml:

[[d1_databases]]
binding = "DB"
database_name = "raxx-status-db"
database_id = "PASTE_YOUR_DATABASE_ID_HERE"

2b. Run migrations

npx wrangler d1 migrations apply raxx-status-db --remote
# Output: Applied migration 0001_initial.sql

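Migrations are safe to re-run: wrangler records applied migrations (by default in a d1_migrations table) and skips them, and 0001_initial.sql itself uses CREATE TABLE IF NOT EXISTS.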
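Because D1 is SQLite-based, the CREATE TABLE IF NOT EXISTS behaviour can be checked locally with Python's built-in sqlite3 module. The sketch below is an illustration only: the DDL is a guess reconstructed from the surface_state columns in §10 and is not the contents of 0001_initial.sql, and the helper names are hypothetical.

```python
import sqlite3

# Hypothetical DDL, reconstructed from the surface_state columns in §10.
# The real 0001_initial.sql may differ.
MIGRATION_SQL = """
CREATE TABLE IF NOT EXISTS surface_state (
    surface_id        TEXT PRIMARY KEY,
    state             TEXT NOT NULL,
    state_since       TEXT NOT NULL,
    state_source      TEXT NOT NULL,
    ticket_pending    INTEGER NOT NULL DEFAULT 0,
    public_note       TEXT,
    last_probe_at     TEXT,
    maintenance_until TEXT,
    updated_at        TEXT NOT NULL
);
"""


def apply_migration(conn: sqlite3.Connection) -> None:
    """Run the migration script; IF NOT EXISTS makes repeat runs a no-op."""
    conn.executescript(MIGRATION_SQL)


def table_names(conn: sqlite3.Connection) -> list[str]:
    """List user tables so the effect of each run can be inspected."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
apply_migration(conn)
conn.execute(
    "INSERT INTO surface_state "
    "(surface_id, state, state_since, state_source, updated_at) "
    "VALUES ('app-raxx-app', 'UNKNOWN', '2026-04-30T12:00:00Z', "
    "'manual', '2026-04-30T12:00:00Z')"
)
apply_migration(conn)  # second run: no error, existing row untouched
print(table_names(conn))
print(conn.execute("SELECT COUNT(*) FROM surface_state").fetchone()[0])
```

The second apply_migration call neither raises nor disturbs the row inserted in between. The same would not hold for a bare ALTER TABLE, which is why the migration tracking done by wrangler matters for later migrations.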

2c. Mint and vault the service token

The STATUS_INTERNAL_WRITE_TOKEN authenticates writes from Raptor (3P poller, FreeScout webhook) to the Worker.

Generate:

python3 -c "import secrets; print(secrets.token_urlsafe(32))"

Add to CF Worker secrets:

npx wrangler secret put STATUS_INTERNAL_WRITE_TOKEN --name raxx-status-worker
# Paste the token value at the prompt

Vault in Infisical at /MooseQuest/cloudflare/STATUS_INTERNAL_WRITE_TOKEN (production environment).

Add to Raptor (Heroku) config:

heroku config:set STATUS_INTERNAL_WRITE_TOKEN=<value> --app raxx-api-prod
heroku config:set STATUS_WORKER_URL=https://status.raxx.app --app raxx-api-prod

2d. Deploy the Worker

npx wrangler deploy

The CI workflow (deploy-status-worker.yml) handles this automatically on push to main when frontend/status-worker/** changes.


3. Inspecting D1 state

Via wrangler CLI

# List all surface states
npx wrangler d1 execute raxx-status-db --remote \
  --command "SELECT surface_id, state, state_since, ticket_pending, public_note FROM surface_state ORDER BY surface_id;"

# List open incidents
npx wrangler d1 execute raxx-status-db --remote \
  --command "SELECT * FROM status_incidents WHERE resolved_at IS NULL ORDER BY opened_at DESC;"

# Last 20 audit log entries
npx wrangler d1 execute raxx-status-db --remote \
  --command "SELECT * FROM status_audit_log ORDER BY created_at DESC LIMIT 20;"

# Audit log for a specific surface
npx wrangler d1 execute raxx-status-db --remote \
  --command "SELECT * FROM status_audit_log WHERE surface_id = 'app-raxx-app' ORDER BY created_at DESC LIMIT 20;"

Via public API (read-only, no auth)

# All surfaces + overall status
curl -s https://status.raxx.app/api/status/public/surfaces | python3 -m json.tool

# Current incidents (30-day window)
curl -s https://status.raxx.app/api/status/public/incidents | python3 -m json.tool

# Market time widget
curl -s https://status.raxx.app/api/status/public/widgets/market-time | python3 -m json.tool

4. Manual state override

Inject a state update directly via the internal API:

TOKEN=$(infisical secrets get STATUS_INTERNAL_WRITE_TOKEN --path /MooseQuest/cloudflare --env prod --plain)

curl -X POST https://status.raxx.app/api/internal/status/update-surface \
  -H "Content-Type: application/json" \
  -H "X-Internal-Status-Token: $TOKEN" \
  -d '{
    "surface_id": "app-raxx-app",
    "state": "DEGRADED",
    "state_source": "manual",
    "actor": "ops",
    "public_note": "Investigating login issues",
    "ticket_pending": true
  }'

To clear a surface back to OPERATIONAL:

curl -X POST https://status.raxx.app/api/internal/status/update-surface \
  -H "Content-Type: application/json" \
  -H "X-Internal-Status-Token: $TOKEN" \
  -d '{"surface_id": "app-raxx-app", "state": "OPERATIONAL", "state_source": "manual", "actor": "ops", "public_note": null, "ticket_pending": false}'
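For scripted overrides, the same request can be assembled in Python. This is a minimal sketch using only the standard library: build_update_request is a hypothetical helper, and the client-side checks (allowed states, 280-character note limit) are taken from the schema reference in §10 on the assumption that the Worker enforces something similar. The request is built but not sent.

```python
import json
import urllib.request

# Allowed values copied from the schema reference in §10.
VALID_STATES = {"OPERATIONAL", "DEGRADED", "PARTIAL", "DOWN", "MAINTENANCE", "UNKNOWN"}
MAX_NOTE_CHARS = 280


def build_update_request(worker_url, token, surface_id, state,
                         public_note=None, ticket_pending=False, actor="ops"):
    """Build (but do not send) a POST to the update-surface endpoint."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown state: {state!r}")
    if public_note is not None and len(public_note) > MAX_NOTE_CHARS:
        raise ValueError(f"public_note exceeds {MAX_NOTE_CHARS} characters")
    payload = {
        "surface_id": surface_id,
        "state": state,
        "state_source": "manual",
        "actor": actor,
        "public_note": public_note,
        "ticket_pending": ticket_pending,
    }
    return urllib.request.Request(
        url=f"{worker_url.rstrip('/')}/api/internal/status/update-surface",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Internal-Status-Token": token,
        },
        method="POST",
    )


req = build_update_request(
    "https://status.raxx.app", "example-token", "app-raxx-app", "DEGRADED",
    public_note="Investigating login issues", ticket_pending=True,
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=10)
```

Validating before sending keeps a typo such as "DEGRADE" from reaching the Worker at all, whatever the Worker itself would do with it.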

5. Running migrations (re-run after schema changes)

Migrations live in frontend/status-worker/migrations/. Filename convention: NNNN_description.sql.

To add a new migration:

  1. Create frontend/status-worker/migrations/0002_description.sql
  2. Write CREATE TABLE IF NOT EXISTS or ALTER TABLE statements
  3. Commit and push; the deploy workflow applies it automatically

Manual apply:

npx wrangler d1 migrations apply raxx-status-db --remote
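The NNNN_description.sql convention is easy to get wrong by hand once several migrations exist. The helper below is a hypothetical sketch that picks the next number from a list of existing filenames; the lowercase-with-underscores form of the description is an assumption, since the runbook only fixes the NNNN_ prefix and the .sql suffix.

```python
import re

# Four-digit prefix, underscore, description, .sql (description charset is assumed).
MIGRATION_RE = re.compile(r"^(\d{4})_[a-z0-9_]+\.sql$")


def next_migration_name(existing: list[str], description: str) -> str:
    """Return the next NNNN_description.sql name after the highest existing one."""
    numbers = [int(m.group(1)) for name in existing if (m := MIGRATION_RE.match(name))]
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    if not slug:
        raise ValueError("description must contain letters or digits")
    return f"{max(numbers, default=0) + 1:04d}_{slug}.sql"


print(next_migration_name(["0001_initial.sql"], "add maintenance window"))
```

In practice the existing list would come from something like os.listdir on frontend/status-worker/migrations/; files that do not match the pattern are ignored.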

6. Seeding surface_state rows

On a freshly provisioned database, surface_state is empty, and the Worker serves an empty surfaces list until rows are seeded. To seed all 24 surfaces as UNKNOWN:

TOKEN="your-token"
# Run from the backend_v2 directory:
python3 scripts/seed_surface_state.py --worker-url https://status.raxx.app --token "$TOKEN"

(If the seed script doesn't exist yet, the 3P poller will create rows on first successful poll. The FreeScout webhook will create rows on first ticket open. Alternatively, use the manual override in §4 for each surface.)
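If the seed script has to be written, its core could look roughly like the sketch below. It only generates the payloads, one per surface, in the shape used by the manual override in §4. Everything here is an assumption rather than the actual script: seed_payloads is a hypothetical name, the surface IDs would really come from surface_registry.yaml, and it is assumed that the Worker accepts UNKNOWN with state_source "manual".

```python
import json


def seed_payloads(surface_ids, actor="ops"):
    """Yield one update-surface payload per surface, all set to UNKNOWN."""
    seen = set()
    for surface_id in surface_ids:
        if surface_id in seen:
            continue  # skip duplicate IDs so each surface is seeded once
        seen.add(surface_id)
        yield {
            "surface_id": surface_id,
            "state": "UNKNOWN",
            "state_source": "manual",
            "actor": actor,
            "public_note": None,
            "ticket_pending": False,
        }


# Example IDs only: app-raxx-app appears in §4, and cloudflare is taken from
# the poller log lines and assumed to be a surface ID. The full list of 24
# would come from surface_registry.yaml.
for payload in seed_payloads(["app-raxx-app", "cloudflare"]):
    print(json.dumps(payload))
```

Each payload would then be POSTed to /api/internal/status/update-surface with the X-Internal-Status-Token header.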


7. Rotating the service token

  1. Generate a new token: python3 -c "import secrets; print(secrets.token_urlsafe(32))"
  2. Update in CF Worker: npx wrangler secret put STATUS_INTERNAL_WRITE_TOKEN --name raxx-status-worker
  3. Update in Heroku: heroku config:set STATUS_INTERNAL_WRITE_TOKEN=<new> --app raxx-api-prod
  4. Update in Infisical: /MooseQuest/cloudflare/STATUS_INTERNAL_WRITE_TOKEN

Writes fail for a brief window (seconds) between step 2 and the Heroku dyno restart triggered by step 3, because Raptor is still sending the old token. The 3P poller is resilient to single-poll failures. The FreeScout webhook returns 500 on failure, which triggers FreeScout's built-in retry.


8. Debugging the 3P poller

Enable the flag and check Heroku logs:

heroku logs --tail --app raxx-api-prod | grep partner_status_poller

Expected log lines:

partner_status_poller: scheduler started (interval=60s)
partner_status_poller: polling 4 of 24 surfaces
partner_status_poller: cloudflare -> OPERATIONAL

If writes fail:

partner_status_poller: Worker write returned HTTP 401 for cloudflare: ...

→ Check STATUS_INTERNAL_WRITE_TOKEN is set and matches the Worker secret.

partner_status_poller: STATUS_WORKER_URL is not set

→ Run heroku config:set STATUS_WORKER_URL=https://status.raxx.app --app raxx-api-prod
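Both failure modes above come down to two config vars, so a pre-flight check can catch them before the poller starts. The function below is a hypothetical sketch, not part of Raptor; the https requirement is an assumption that suits production and would need relaxing for local development.

```python
from urllib.parse import urlparse

REQUIRED_VARS = ("STATUS_WORKER_URL", "STATUS_INTERNAL_WRITE_TOKEN")


def check_poller_config(env):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    url = env.get("STATUS_WORKER_URL")
    if url:
        parsed = urlparse(url)
        # Assumed policy: production writes go over https only.
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append(f"STATUS_WORKER_URL is not an https URL: {url!r}")
    return problems


print(check_poller_config({"STATUS_WORKER_URL": "https://status.raxx.app"}))
```

Calling check_poller_config(os.environ) at startup would surface a missing variable immediately. It cannot tell whether the token matches the Worker secret; only a real write (or a 401 in the logs) shows that.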


9. Debugging the FreeScout webhook

See docs/ops/runbooks/freescout-webhook-debug.md for the full debug guide.

Key check: verify the Worker URL and token are set on Raptor:

heroku config --app raxx-api-prod | grep STATUS
# Should show:
# STATUS_INTERNAL_WRITE_TOKEN: <redacted>
# STATUS_WORKER_URL: https://status.raxx.app

Test the webhook locally:

SECRET="your-freescout-secret"
SERVICE_TOKEN="your-service-token"
PAYLOAD='{"event":"conversation.created","event_id":"manual-test-001","ticket":{"id":9001,"status":"open","updated_at":"2026-04-30T12:00:00Z","custom_fields":[{"slug":"component_tag","value":"app-raxx-app"},{"slug":"public_status","value":"Login degraded test"},{"slug":"incident_severity","value":"degraded"}]}}'
SIG=$(printf '%s' "$PAYLOAD" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print "sha256="$2}')
curl -X POST http://localhost:5001/api/internal/status/freescout-webhook \
  -H "Content-Type: application/json" \
  -H "X-FreeScout-Signature: $SIG" \
  -H "X-Service-Token: $SERVICE_TOKEN" \
  -d "$PAYLOAD"
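The signature computed by the openssl pipeline is an HMAC-SHA256 over the exact request body, hex-encoded and prefixed with "sha256=". The same value can be produced in Python, which is handy in tests. The function names below are hypothetical, and how Raptor actually verifies the header is not shown in this runbook; the sketch only mirrors the format used by the shell command.

```python
import hashlib
import hmac


def freescout_signature(secret: str, payload: bytes) -> str:
    """Return 'sha256=<hex HMAC-SHA256>' over the exact payload bytes."""
    digest = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def signature_matches(secret: str, payload: bytes, header_value: str) -> bool:
    """Compare a received header against the expected value in constant time."""
    expected = freescout_signature(secret, payload)
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))


body = b'{"event":"conversation.created","event_id":"manual-test-001"}'
sig = freescout_signature("your-freescout-secret", body)
print(sig[:7], len(sig))
print(signature_matches("your-freescout-secret", body, sig))
```

The signature covers bytes, not parsed JSON, so any re-serialization of the payload (whitespace, key order) between signing and sending produces a mismatch.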

10. Schema reference

surface_state

One row per surface. Written by: 3P poller, FreeScout webhook, probe worker, manual override.

Column             Type     Notes
surface_id         TEXT PK  FK to surface_registry.yaml id
state              TEXT     OPERATIONAL|DEGRADED|PARTIAL|DOWN|MAINTENANCE|UNKNOWN
state_since        TEXT     ISO 8601 UTC; updated on state change
state_source       TEXT     prober|freescout|3p_poller|manual|schedule
ticket_pending     INTEGER  0 or 1; true = open FreeScout ticket
public_note        TEXT     Operator-written, max 280 chars; NULL if none
last_probe_at      TEXT     Last probe timestamp (probe worker only)
maintenance_until  TEXT     Set when state=MAINTENANCE
updated_at         TEXT     Always updated on write

status_incidents

Incident history. Whether a surface currently has an active incident is signalled by ticket_pending=1 on surface_state, not by this table.

Column               Type                      Notes
id                   INTEGER PK AUTOINCREMENT  Opaque; public ID is inc_NNNNN
surface_id           TEXT
opened_at            TEXT
resolved_at          TEXT                      NULL = still open
public_note          TEXT                      Final operator note at close
freescout_ticket_id  INTEGER UNIQUE            Internal only; never in public API
severity             TEXT                      degraded|partial|down|maintenance

status_audit_log

Every state transition. Retained indefinitely. No PII.

Column          Type                      Notes
id              INTEGER PK AUTOINCREMENT
surface_id      TEXT
actor           TEXT                      prober|freescout|3p_poller|operator_id
previous_state  TEXT                      NULL on first write
new_state       TEXT
source          TEXT                      Same values as state_source
note            TEXT                      Human-readable context
created_at      TEXT
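The column notes above imply a small write protocol: state_since moves only on a state change, updated_at moves on every write, and each transition adds an audit row whose previous_state is NULL the first time a surface is written. The sketch below models that with Python's sqlite3 on a trimmed-down schema. It is an interpretation of the notes, not the Worker's actual code; write_state and the reduced column set are assumptions.

```python
import sqlite3
from datetime import datetime, timezone

# Reduced schema: only the columns needed to show the transition logic.
SCHEMA = """
CREATE TABLE IF NOT EXISTS surface_state (
    surface_id TEXT PRIMARY KEY, state TEXT, state_since TEXT,
    state_source TEXT, updated_at TEXT
);
CREATE TABLE IF NOT EXISTS status_audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT, surface_id TEXT, actor TEXT,
    previous_state TEXT, new_state TEXT, source TEXT, note TEXT, created_at TEXT
);
"""


def write_state(conn, surface_id, state, source, actor, note=None):
    """Upsert surface_state; log to status_audit_log only on a state change."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    row = conn.execute(
        "SELECT state FROM surface_state WHERE surface_id = ?", (surface_id,)
    ).fetchone()
    previous = row[0] if row else None
    if row is None:
        conn.execute(
            "INSERT INTO surface_state VALUES (?, ?, ?, ?, ?)",
            (surface_id, state, now, source, now),
        )
    elif previous != state:
        conn.execute(
            "UPDATE surface_state SET state = ?, state_since = ?, "
            "state_source = ?, updated_at = ? WHERE surface_id = ?",
            (state, now, source, now, surface_id),
        )
    else:
        # Same state: refresh updated_at only, leave state_since alone.
        conn.execute(
            "UPDATE surface_state SET updated_at = ? WHERE surface_id = ?",
            (now, surface_id),
        )
        return
    conn.execute(
        "INSERT INTO status_audit_log "
        "(surface_id, actor, previous_state, new_state, source, note, created_at) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (surface_id, actor, previous, state, source, note, now),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
write_state(conn, "app-raxx-app", "DEGRADED", "manual", "ops", "Investigating login issues")
write_state(conn, "app-raxx-app", "DEGRADED", "manual", "ops")  # no transition
write_state(conn, "app-raxx-app", "OPERATIONAL", "manual", "ops")
for row in conn.execute(
    "SELECT previous_state, new_state FROM status_audit_log ORDER BY id"
):
    print(row)
```

Three writes produce two audit rows: the first with previous_state NULL, the second recording DEGRADED to OPERATIONAL. The repeated DEGRADED write leaves the audit log untouched.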