status-d1 Runbook
Status: Active
Created: 2026-04-30 UTC
Parent issue: #646
Architecture: docs/architecture/status-raxx-app.md
Status state lives in a Cloudflare D1 database (raxx-status-db) served by a Cloudflare Worker (raxx-status-worker) at status.raxx.app/api/*. This runbook covers provisioning, migration, debugging, and rotation.
1. Architecture overview
    FreeScout webhook        3P poller (Raptor)
            |                        |
            v                        v
    POST /api/internal/status/*  (X-Internal-Status-Token)
                  |
                  v
    raxx-status-worker (CF Worker)
                  |
                  v
    raxx-status-db (CF D1 / SQLite)
                  |
                  v
    GET /api/status/public/*  (no auth)
                  |
                  v
    status.raxx.app (React app, CF Pages)
Writers (FreeScout webhook, 3P poller) POST to the Worker's internal endpoints, and the Worker persists those writes to D1. The Worker also serves the public endpoints by reading from D1. The React status page calls the public endpoints on the same origin (status.raxx.app/api/*).
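The split between the token-gated write path and the unauthenticated read path can be modeled as a small routing check. This is a hypothetical Python sketch for illustration only: the Worker's real code is not shown in this runbook, and returning 404 for paths outside the two prefixes is an assumption.

```python
import hmac

# Hypothetical model of the Worker's auth split (not the Worker's actual code).
INTERNAL_PREFIX = "/api/internal/status/"
PUBLIC_PREFIX = "/api/status/public/"


def authorize(path: str, headers: dict, expected_token: str) -> int:
    """Return 200 if the request may proceed, 401 for a bad token, 404 otherwise."""
    if path.startswith(PUBLIC_PREFIX):
        return 200  # public reads need no auth
    if path.startswith(INTERNAL_PREFIX):
        # HTTP header names are case-insensitive, so normalize before lookup.
        normalized = {k.lower(): v for k, v in headers.items()}
        supplied = normalized.get("x-internal-status-token", "")
        # compare_digest runs in constant time, so response timing does not
        # reveal how much of the token matched.
        if expected_token and hmac.compare_digest(supplied.encode(), expected_token.encode()):
            return 200
        return 401
    return 404  # assumption: anything else is not a Worker API route
```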
Token storage exception — CF_STATUS_WORKER_DEPLOY_TOKEN
This token lives in GitHub Actions repo secrets, NOT Infisical vault.
| Field | Value |
|---|---|
| Secret name | CF_STATUS_WORKER_DEPLOY_TOKEN |
| Storage | GH Actions repo secret (raxx-app/TradeMasterAPI) |
| CF Token ID | 099b43b3c253aa906998a5e8a3157085 |
| Scopes | D1 Write, Workers Scripts Write, Account Settings Read, Workers Routes Write |
| Expires | 2027-05-28 (rotate annually) |
| Minted | 2026-05-28 UTC |
Rationale: Infisical's client-side E2EE blocks agent secret writes. The agent machine identity can read secrets (the server decrypts them) but cannot write new secrets without the project workspace key. The operator authorized the repo-secret path as an autonomous unblock in the 2026-05-28 decision (issue #2921). All other CF tokens default to the vault; this is a named exception.
Rotation procedure (annual, due 2027-05-28):
1. Mint a new CF token via CF API or dashboard with identical scopes.
2. Update the repo secret (gh prompts for the value, which keeps it out of shell history):
gh secret set CF_STATUS_WORKER_DEPLOY_TOKEN --repo raxx-app/TradeMasterAPI
3. Revoke the old token (ID 099b43b3c253aa906998a5e8a3157085) via CF dashboard.
4. Update the token ID in this runbook and in the workflow comment header.
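A reminder check for the annual rotation can be as small as a date comparison. A minimal sketch: the expiry date comes from the token table above, while the 30-day warning window is an assumption, not a policy set by this runbook.

```python
from datetime import date

# Expiry recorded in the CF_STATUS_WORKER_DEPLOY_TOKEN table above.
TOKEN_EXPIRES = date(2027, 5, 28)
# Assumption: start rotating this many days before expiry.
WARN_DAYS = 30


def rotation_status(today: date, expires: date = TOKEN_EXPIRES, warn_days: int = WARN_DAYS) -> str:
    """Summarize how close the deploy token is to expiry."""
    days_left = (expires - today).days
    if days_left < 0:
        return f"EXPIRED: {-days_left} day(s) past expiry"
    if days_left <= warn_days:
        return f"ROTATE NOW: {days_left} days left"
    return f"ok: {days_left} days left"
```

Call rotation_status(date.today()) from a scheduled job and alert on anything that does not start with "ok".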
Future migration path: Once the agent has a vault write path (Velvet-style rotation worker, tracked in docs/architecture/velvet.md), migrate this token to vault and remove the repo-secret exception.
2. One-shot operator setup (run once per environment)
2a. Provision D1 database
cd frontend/status-worker
npm install
# Create the database (run once; re-running fails because the name already exists)
npx wrangler d1 create raxx-status-db
# Output: database_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Copy the database_id into frontend/status-worker/wrangler.toml:
[[d1_databases]]
binding = "DB"
database_name = "raxx-status-db"
database_id = "PASTE_YOUR_DATABASE_ID_HERE"
2b. Run migrations
npx wrangler d1 migrations apply raxx-status-db --remote
# Output: Applied migration 0001_initial.sql
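Re-running is safe: wrangler records each applied migration in the database's d1_migrations table and skips those on later runs. The initial migration also uses CREATE TABLE IF NOT EXISTS as a second guard.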
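The difference between "tracked, so skipped on re-run" and "safe to run twice" can be shown with plain SQLite, which D1 is built on. This is a simplified model, not wrangler's implementation: the demo table and the 0002 migration are hypothetical, and the tracking table only mimics the idea behind d1_migrations.

```python
import sqlite3

# Hypothetical migrations for the demo (not the real 0001_initial.sql).
MIGRATIONS = {
    "0001_initial.sql": "CREATE TABLE IF NOT EXISTS demo (id INTEGER PRIMARY KEY);",
    # SQLite has no ADD COLUMN IF NOT EXISTS, so this statement fails if run twice.
    "0002_add_note.sql": "ALTER TABLE demo ADD COLUMN note TEXT;",
}


def apply_all(conn: sqlite3.Connection) -> list[str]:
    """Apply unrecorded migrations in filename order; return the names applied."""
    conn.execute("CREATE TABLE IF NOT EXISTS d1_migrations (name TEXT PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT name FROM d1_migrations")}
    applied = []
    for name in sorted(MIGRATIONS):
        if name in done:
            continue  # already recorded: skipping is what makes re-runs safe
        conn.executescript(MIGRATIONS[name])
        conn.execute("INSERT INTO d1_migrations (name) VALUES (?)", (name,))
        applied.append(name)
    conn.commit()
    return applied
```

A second apply_all call returns an empty list, but executing the 0002 SQL directly a second time raises sqlite3.OperationalError (duplicate column), which is why ALTER TABLE migrations should only go through the tracked apply command.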
2c. Mint and vault the service token
The STATUS_INTERNAL_WRITE_TOKEN authenticates writes from Raptor (3P poller, FreeScout webhook) to the Worker.
Generate:
python3 -c "import secrets; print(secrets.token_urlsafe(32))"
Add to CF Worker secrets:
npx wrangler secret put STATUS_INTERNAL_WRITE_TOKEN --name raxx-status-worker
# Paste the token value at the prompt
Vault in Infisical at /MooseQuest/cloudflare/STATUS_INTERNAL_WRITE_TOKEN (production environment).
Add to Raptor (Heroku) config:
heroku config:set STATUS_INTERNAL_WRITE_TOKEN=<value> --app raxx-api-prod
heroku config:set STATUS_WORKER_URL=https://status.raxx.app --app raxx-api-prod
2d. Deploy the Worker
npx wrangler deploy
The CI workflow (deploy-status-worker.yml) handles this automatically on push to main when frontend/status-worker/** changes.
3. Inspecting D1 state
Via wrangler CLI
# List all surface states
npx wrangler d1 execute raxx-status-db --remote \
--command "SELECT surface_id, state, state_since, ticket_pending, public_note FROM surface_state ORDER BY surface_id;"
# List open incidents
npx wrangler d1 execute raxx-status-db --remote \
--command "SELECT * FROM status_incidents WHERE resolved_at IS NULL ORDER BY opened_at DESC;"
# Last 20 audit log entries
npx wrangler d1 execute raxx-status-db --remote \
--command "SELECT * FROM status_audit_log ORDER BY created_at DESC LIMIT 20;"
# Audit log for a specific surface
npx wrangler d1 execute raxx-status-db --remote \
--command "SELECT * FROM status_audit_log WHERE surface_id = 'app-raxx-app' ORDER BY created_at DESC LIMIT 20;"
Via public API (read-only, no auth)
# All surfaces + overall status
curl -s https://status.raxx.app/api/status/public/surfaces | python3 -m json.tool
# Current incidents (30-day window)
curl -s https://status.raxx.app/api/status/public/incidents | python3 -m json.tool
# Market time widget
curl -s https://status.raxx.app/api/status/public/widgets/market-time | python3 -m json.tool
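The same read-only calls from Python, for scripts or notebooks. A minimal sketch: the endpoint paths come from the curl examples above, the 10-second timeout is an arbitrary choice, and the function returns the decoded JSON without assuming its shape, since the response schema is not documented in this runbook. The opener parameter exists only so the function can be exercised without network access.

```python
import json
import urllib.request
from typing import Any, Callable

BASE_URL = "https://status.raxx.app/api/status/public"
# Paths taken from the curl examples above.
ENDPOINTS = {
    "surfaces": "/surfaces",
    "incidents": "/incidents",
    "market_time": "/widgets/market-time",
}


def fetch_public(name: str, opener: Callable[..., Any] = urllib.request.urlopen,
                 timeout: float = 10.0) -> Any:
    """GET one public endpoint (no auth) and return the decoded JSON body."""
    url = BASE_URL + ENDPOINTS[name]
    with opener(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, fetch_public("incidents") returns the same document that the second curl command pretty-prints.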
4. Manual state override
Inject a state update directly via the internal API:
TOKEN=$(infisical secrets get STATUS_INTERNAL_WRITE_TOKEN --path /MooseQuest/cloudflare --env prod --plain)
curl -X POST https://status.raxx.app/api/internal/status/update-surface \
-H "Content-Type: application/json" \
-H "X-Internal-Status-Token: $TOKEN" \
-d '{
"surface_id": "app-raxx-app",
"state": "DEGRADED",
"state_source": "manual",
"actor": "ops",
"public_note": "Investigating login issues",
"ticket_pending": true
}'
To clear a surface back to OPERATIONAL:
curl -X POST https://status.raxx.app/api/internal/status/update-surface \
-H "Content-Type: application/json" \
-H "X-Internal-Status-Token: $TOKEN" \
-d '{"surface_id": "app-raxx-app", "state": "OPERATIONAL", "state_source": "manual", "actor": "ops", "public_note": null, "ticket_pending": false}'
5. Running migrations (re-run after schema changes)
Migrations live in frontend/status-worker/migrations/. Filename convention: NNNN_description.sql.
To add a new migration:
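1. Create frontend/status-worker/migrations/NNNN_description.sql with the next unused number (e.g. 0002_description.sql).
2. Write CREATE TABLE IF NOT EXISTS or ALTER TABLE statements. SQLite has no IF NOT EXISTS form for ALTER TABLE ... ADD COLUMN, so such a migration is not safe to run twice by hand; let wrangler's migration tracking apply it exactly once.
3. Commit and push; the deploy workflow applies it automatically.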
Manual apply:
npx wrangler d1 migrations apply raxx-status-db --remote
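To catch naming mistakes before they reach CI, the filename convention can be checked mechanically. A minimal sketch, assuming numbers start at 0001 with no gaps and that descriptions are lowercase snake_case (the runbook only specifies NNNN_description.sql). wrangler d1 migrations create raxx-status-db <description> also creates the next numbered file.

```python
import re

# Assumed convention: four digits, underscore, lowercase snake_case, .sql
NAME_RE = re.compile(r"^(\d{4})_[a-z0-9_]+\.sql$")


def next_migration_name(existing: list[str], description: str) -> str:
    """Validate existing migration filenames and return the next NNNN_description.sql."""
    numbers = []
    for name in existing:
        match = NAME_RE.match(name)
        if not match:
            raise ValueError(f"bad migration filename: {name}")
        numbers.append(int(match.group(1)))
    numbers.sort()
    if numbers != list(range(1, len(numbers) + 1)):
        raise ValueError(f"gap or duplicate in migration numbers: {numbers}")
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    if not slug:
        raise ValueError("description must contain letters or digits")
    return f"{len(numbers) + 1:04d}_{slug}.sql"
```

Pass it the .sql filenames from frontend/status-worker/migrations/ (filter out anything else in the directory first).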
6. Seeding surface_state rows
After a fresh DB, surface_state is empty. The Worker serves an empty surfaces list until rows are seeded. To seed all 24 surfaces as UNKNOWN:
TOKEN="your-token"
# Run from the backend_v2 directory:
python3 scripts/seed_surface_state.py --worker-url https://status.raxx.app --token "$TOKEN"
(If the seed script doesn't exist yet, the 3P poller will create rows on first successful poll. The FreeScout webhook will create rows on first ticket open. Alternatively, use the manual override in §4 for each surface.)
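If you fall back to per-surface overrides, the loop is short. This is a hypothetical sketch, not the contents of seed_surface_state.py: the send callback (which POSTs one JSON body to update-surface and returns the HTTP status) is an assumption, and the surface IDs would presumably come from surface_registry.yaml. It skips surfaces that already have rows, because seeding UNKNOWN over a live row would overwrite its real state.

```python
from collections.abc import Callable, Container


def seed_unknown(surface_ids: list[str], send: Callable[[dict], int],
                 already_present: Container[str] = ()) -> list[str]:
    """POST state=UNKNOWN for each missing surface; return the IDs that failed."""
    failed = []
    for surface_id in surface_ids:
        if surface_id in already_present:
            continue  # never clobber a surface that already has a real state
        body = {
            "surface_id": surface_id,
            "state": "UNKNOWN",
            "state_source": "manual",
            "actor": "ops",
            "public_note": None,
            "ticket_pending": False,
        }
        status = send(body)
        if not 200 <= status < 300:
            failed.append(surface_id)
    return failed
```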
7. Rotating the service token
- Generate a new token:
  python3 -c "import secrets; print(secrets.token_urlsafe(32))"
- Update in CF Worker:
  npx wrangler secret put STATUS_INTERNAL_WRITE_TOKEN --name raxx-status-worker
- Update in Heroku:
  heroku config:set STATUS_INTERNAL_WRITE_TOKEN=<new> --app raxx-api-prod
- Update in Infisical:
  /MooseQuest/cloudflare/STATUS_INTERNAL_WRITE_TOKEN
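Writes from Raptor fail with HTTP 401 from the moment the Worker secret changes until the Heroku dynos restart with the new value, so run the Heroku step immediately after the Worker step to keep that window to seconds. The 3P poller tolerates single-poll failures. The FreeScout webhook returns 500 on failure, which triggers FreeScout's built-in retry.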
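After rotating, you can confirm Heroku and Infisical hold the same value without printing it by comparing short hashes. A minimal sketch; the 12-character prefix length is an arbitrary choice. Cloudflare Worker secrets cannot be read back once set, so only the Heroku and Infisical copies can be compared this way.

```python
import hashlib


def fingerprint(secret: str) -> str:
    """Short, non-reversible fingerprint for comparing secrets without revealing them."""
    # strip() drops the trailing newline CLI output adds; token_urlsafe values
    # never contain whitespace, so nothing meaningful is lost.
    return hashlib.sha256(secret.strip().encode()).hexdigest()[:12]
```

Feed it the output of heroku config:get STATUS_INTERNAL_WRITE_TOKEN --app raxx-api-prod and of the infisical secrets get command from §4; matching fingerprints mean the two stores agree.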
8. Debugging the 3P poller
Enable the flag and check Heroku logs:
heroku logs --tail --app raxx-api-prod | grep partner_status_poller
Expected log lines:
partner_status_poller: scheduler started (interval=60s)
partner_status_poller: polling 4 of 24 surfaces
partner_status_poller: cloudflare -> OPERATIONAL
If writes fail:
partner_status_poller: Worker write returned HTTP 401 for cloudflare: ...
→ Check STATUS_INTERNAL_WRITE_TOKEN is set and matches the Worker secret.
partner_status_poller: STATUS_WORKER_URL is not set
→ Run: heroku config:set STATUS_WORKER_URL=https://status.raxx.app --app raxx-api-prod
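When scanning a long log capture, the two documented failure messages can be matched automatically. A minimal sketch: the patterns and hints come only from the log lines listed above, so any other failure passes through unflagged.

```python
import re

# Failure messages and remediations documented in this section.
RULES = [
    (re.compile(r"Worker write returned HTTP 401"),
     "Check STATUS_INTERNAL_WRITE_TOKEN is set and matches the Worker secret."),
    (re.compile(r"STATUS_WORKER_URL is not set"),
     "heroku config:set STATUS_WORKER_URL=https://status.raxx.app --app raxx-api-prod"),
]


def triage(lines):
    """Yield (line, hint) for partner_status_poller lines matching a known failure."""
    for line in lines:
        if "partner_status_poller" not in line:
            continue
        for pattern, hint in RULES:
            if pattern.search(line):
                yield line.strip(), hint
                break
```

Save a capture (heroku logs --app raxx-api-prod > poller.log) and iterate over triage(open("poller.log")).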
9. Debugging the FreeScout webhook
See docs/ops/runbooks/freescout-webhook-debug.md for the full debug guide.
Key check: verify the Worker URL and token are set on Raptor:
heroku config --app raxx-api-prod | grep STATUS
# Should show:
# STATUS_INTERNAL_WRITE_TOKEN: <redacted>
# STATUS_WORKER_URL: https://status.raxx.app
Test the webhook locally:
SECRET="your-freescout-secret"
SERVICE_TOKEN="your-service-token"
PAYLOAD='{"event":"conversation.created","event_id":"manual-test-001","ticket":{"id":9001,"status":"open","updated_at":"2026-04-30T12:00:00Z","custom_fields":[{"slug":"component_tag","value":"app-raxx-app"},{"slug":"public_status","value":"Login degraded test"},{"slug":"incident_severity","value":"degraded"}]}}'
SIG=$(printf '%s' "$PAYLOAD" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print "sha256="$NF}')
curl -X POST http://localhost:5001/api/internal/status/freescout-webhook \
-H "Content-Type: application/json" \
-H "X-FreeScout-Signature: $SIG" \
-H "X-Service-Token: $SERVICE_TOKEN" \
-d "$PAYLOAD"
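The same signature in Python, useful when generating test payloads from a script. A minimal sketch: sign() reproduces the sha256=<hex> format of the openssl command above, while verify() is a hypothetical illustration of receiver-side checking, not Raptor's actual code. The HMAC covers the exact bytes sent, so re-serializing the JSON (even changing whitespace) invalidates the signature.

```python
import hashlib
import hmac


def sign(payload: bytes, secret: str) -> str:
    """X-FreeScout-Signature value in the sha256=<hex> form used by the curl test."""
    return "sha256=" + hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, secret: str, header: str) -> bool:
    """Hypothetical receiver check: constant-time compare against the expected value."""
    return hmac.compare_digest(sign(payload, secret).encode(), header.encode())
```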
10. Schema reference
surface_state
One row per surface. Written by: 3P poller, FreeScout webhook, probe worker, manual override.
| Column | Type | Notes |
|---|---|---|
| surface_id | TEXT PK | FK to surface_registry.yaml id |
| state | TEXT | One of OPERATIONAL, DEGRADED, PARTIAL, DOWN, MAINTENANCE, UNKNOWN |
| state_since | TEXT | ISO 8601 UTC; updated on state change |
| state_source | TEXT | One of prober, freescout, 3p_poller, manual, schedule |
| ticket_pending | INTEGER | 0 or 1; 1 = open FreeScout ticket |
| public_note | TEXT | Operator-written, max 280 chars; NULL if none |
| last_probe_at | TEXT | Last probe timestamp (probe worker only) |
| maintenance_until | TEXT | Set when state=MAINTENANCE |
| updated_at | TEXT | Always updated on write |
status_incidents
Incident history. The status page's active-incident signal is ticket_pending=1 on surface_state, not this table; here, resolved_at stays NULL until an incident is closed.
| Column | Type | Notes |
|---|---|---|
| id | INTEGER PK AUTOINCREMENT | Opaque; public ID is inc_NNNNN |
| surface_id | TEXT | |
| opened_at | TEXT | |
| resolved_at | TEXT | NULL = still open |
| public_note | TEXT | Final operator note at close |
| freescout_ticket_id | INTEGER UNIQUE | Internal only; never in public API |
| severity | TEXT | One of degraded, partial, down, maintenance |
status_audit_log
Every state transition. Retained indefinitely. No PII.
| Column | Type | Notes |
|---|---|---|
| id | INTEGER PK AUTOINCREMENT | |
| surface_id | TEXT | |
| actor | TEXT | prober, freescout, 3p_poller, or an operator ID |
| previous_state | TEXT | NULL on first write |
| new_state | TEXT | |
| source | TEXT | Same values as state_source |
| note | TEXT | Human-readable context |
| created_at | TEXT | |
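For trying out the §3 queries locally before running them with --remote, the three tables can be recreated in SQLite. This DDL is reconstructed from the schema tables above and is not the real 0001_initial.sql: indexes, defaults, and any CHECK constraints the actual migration may define are not reproduced.

```python
import sqlite3

# Reconstructed from the schema reference; the real DDL lives in
# frontend/status-worker/migrations/ and may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS surface_state (
    surface_id TEXT PRIMARY KEY,
    state TEXT,
    state_since TEXT,
    state_source TEXT,
    ticket_pending INTEGER,
    public_note TEXT,
    last_probe_at TEXT,
    maintenance_until TEXT,
    updated_at TEXT
);
CREATE TABLE IF NOT EXISTS status_incidents (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    surface_id TEXT,
    opened_at TEXT,
    resolved_at TEXT,
    public_note TEXT,
    freescout_ticket_id INTEGER UNIQUE,
    severity TEXT
);
CREATE TABLE IF NOT EXISTS status_audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    surface_id TEXT,
    actor TEXT,
    previous_state TEXT,
    new_state TEXT,
    source TEXT,
    note TEXT,
    created_at TEXT
);
"""


def open_local_copy(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local SQLite database with the reconstructed schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```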