Raxx · internal docs

internal · gated

DET-BETA-002 — preview screen scraping / automated traversal

Rule ID: DET-BETA-002 Title: Automated traversal of beta preview screens — sequential rapid-fire GET requests on /api/beta/preview/<token>/screen/* Category: beta Last validated: 2026-06-12 (beta-launch campaign) State: live — beta preview endpoints active in prod. Manual query path until Heroku drain is wired.

Why this detection exists

The beta preview flow is designed as a human-paced 5-screen walkthrough. Each screen requires reading (40–120 seconds of human engagement) and a deliberate POST before the next screen unlocks. An automated scraper that harvests all five screens for their motive copy, screenshot IDs, and rubric questions would:

  1. Extract NDA-bound intellectual property (the motive copy and feature framing is pre-launch strategy language).
  2. Potentially submit synthetic rubric responses, contaminating the feedback dataset used to calibrate messaging.
  3. Probe the API surface for injection or state-machine bypass (e.g., can you GET screen/5 without POSTing through screens 1–4?).

The screen-lock enforcement in beta_preview.py (sequential access check via screen_completed_max) provides structural protection against skip-ahead. This detection confirms the structural control is firing correctly and catches a scraper that completes each screen legitimately but at machine speed.

Telemetry source

Telemetry gap: same as DET-BETA-001 — Heroku Logplex drain not wired. Manual query required until P1 prerequisite from the June 4 catalog is met.

Statistical method + baseline window

Threshold + expected FP rate

Alert route

Escalation owner

Test fixture / synthetic positive

See _fixtures/preview_screen_scraping_positive.json — a synthetic session for token hash synth-scraper-token-001 completing all 5 screens in 34 seconds, with inter-request gaps of 0.8–2.1 seconds.

Manual query (until Heroku drain exists)

# Identify all token slugs that appear in logs with screen requests,
# then compute time from first GET /screen/1 to last POST /screen/5
heroku logs --app raxx-api-prod --num 3000 \
  | grep 'beta/preview' \
  | grep -E '/screen/' \
  | awk '{print $1, $7}' \
  | sort -k2
# Manually compute per-token time spans from first to last request.

What NOT to do