Raxx · internal docs

internal · gated

Console Ghost-Reset Tool — Design

Status: Accepted
Date: 2026-06-26 UTC
Author: software-architect
Parent card: #3843
Parent epic: #3842 (passkey account recovery and re-enrollment)
ADR: 0133-ghost-reset-tool.md


1. Context

A beta tester can reach a "ghost" state: the Console walkthrough token is consumed (so re-invite is blocked), but no users row and/or no webauthn_credentials row was persisted in Raptor because the passkey ceremony failed mid-flight. The operator has manually resolved this 3x by executing raw SQL. This design productizes that procedure as a Console operator action.

The same flow applies when a users row exists but carries 0 credentials and 0 sessions — an empty shell that never completed enrollment.

Root-cause fix for the underlying ceremony failure shipped in #2771. This tool handles the remediation path for accounts already stuck in ghost state.


2. Invariants


3. Data model

3.1 Raptor — users FK children (complete enumeration)

All tables with a FK to users.id, their ondelete policy, and the correct handling during a ghost reset:

Table FK column ondelete Rows at ghost time Reset handling
webauthn_credentials user_id CASCADE 0 (ghost predicate) CASCADE — already empty
customer_sessions user_id CASCADE 0 live (ghost predicate) CASCADE — already empty
email_verifications user_id CASCADE 0 or 1 unconsumed CASCADE — acceptable loss on a ghost
onboarding_state user_id CASCADE 0 (never advanced) CASCADE
support_pending_submissions customer_id CASCADE 0 CASCADE
support_customer_map raxx_user_id CASCADE 0 CASCADE
signup_reminders user_id CASCADE possibly 1 CASCADE — no real data on a ghost
backtest_runs user_id CASCADE 0 CASCADE
customer_notifications user_id CASCADE 0 CASCADE
order_strategy_links user_id CASCADE 0 CASCADE
customer_trading_mode_overrides user_id CASCADE 0 CASCADE
customer_reminders user_id CASCADE 0 CASCADE
backup_codes customer_id CASCADE 0 CASCADE
freescout_ticket_cache customer_id SET NULL 0 SET NULL (existing behaviour)
lcc_roll_alerts user_id SET NULL 0 SET NULL (existing behaviour)
audit_log actor_user_id no FK (TEXT col, nullable) 0–N rows Preserve. No FK constraint — rows survive deletion automatically. The actor_user_id text value remains as a historical reference to a now-deleted user, which is correct for an immutable audit log.

Key findings:

3.2 No Raptor schema migration needed

The ghost reset operates via: 1. A safety-predicate query (read-only; no schema change). 2. DELETE FROM users WHERE id = :uid (Postgres CASCADE handles children).

No new columns. No Raptor migration.

3.3 Console — beta_walkthrough_tokens side

Token revocation already works via revoked_at = now(). Fresh token minting already works via generate_walkthrough_token() in app/services/beta_token.py. No Console schema change is needed for the ghost-reset action itself.

Flag migration: If the route is gated behind a new feature flag beta_ghost_reset, one promote-flag migration (0179_promote_ghost_reset.py, revising 0178) is required. See §7 for the flag decision.


4. APIs / contracts

4.1 Raptor — ghost-check endpoint (new)

POST /api/internal/users/ghost-check

Auth: Console-to-Raptor machine-auth HMAC service token, same pattern as the existing beta_cross_service_token_verifier endpoint. Header: Authorization: Bearer <RAPTOR_INTERNAL_TOKEN> (already in secret store).

Request body:

{ "email": "user@example.com" }

Response (200):

{
  "state": "ghost_no_users_row" | "ghost_empty_shell" | "mid_enrollment" | "healthy",
  "user_id": 42,
  "credential_count": 0,
  "active_session_count": 0,
  "created_at": "2026-06-12T10:00:00Z"
}

State definitions:

State Condition
ghost_no_users_row No users row for this email
ghost_empty_shell users row exists; credential_count == 0 AND active_session_count == 0; created_at older than 5 minutes
mid_enrollment users row exists; credential_count == 0 AND active_session_count == 0; created_at within last 5 minutes
healthy credential_count >= 1

The 5-minute grace window is the ghost/mid-enrollment disambiguation. A row created within the last 5 minutes with 0 credentials may still be in-flight through the passkey ceremony. After 5 minutes with 0 credentials, it is definitively a ghost.

4.2 Raptor — ghost-delete endpoint (new)

POST /api/internal/users/ghost-delete

Auth: Same HMAC service token.

Request body:

{ "email": "user@example.com", "actor_admin_id": "console-admin-uuid" }

Steps (inside a single DB transaction):

  1. Re-run the ghost-check predicate (belt-and-suspenders inside the TX).
  2. If state is healthy or mid_enrollment: ROLLBACK, return 409 Conflict with {"error": "not_ghost", "state": "<state>"}.
  3. Write account.ghost_reset to Raptor audit_log: actor_user_id = actor_admin_id, action = "account.ghost_reset", target_kind = "user", target_id = str(user_id) or email, context = {"credential_count": 0, "session_count": 0, "path": "ghost_reset"}.
  4. If state is ghost_empty_shell: DELETE FROM users WHERE id = :uid (CASCADE removes all child rows).
  5. If state is ghost_no_users_row: no-op on Raptor DB (no row to delete).
  6. COMMIT. Return 204 No Content.

If step 3 (audit write) fails: ROLLBACK, return 500.

This endpoint lives under /api/internal/ and is not exposed to the public API.

4.3 Console — ghost-reset action endpoint (new)

POST /console/beta/testers/<int:tester_id>/ghost-reset

RBAC: @require_role("superadmin") (matches the rest of beta.py). When RBAC v2 console-users-write role is provisioned, add @require_rbac_role(...) alongside.

Steps (ordered):

  1. Look up BetaTester by tester_id; return 404 if absent.
  2. Call Raptor POST /api/internal/users/ghost-check with tester.email.
  3. If state is healthy: return 409 {"error": "account_healthy", "credential_count": N}.
  4. If state is mid_enrollment: return 409 {"error": "mid_enrollment", "retry_after_seconds": 300}.
  5. Write Console audit event console.beta.ghost_reset.initiated to audit_log (actor = g.admin_id, target_type = "beta_tester", target_id = str(tester_id), payload = {"email": tester.email, "raptor_state": state}). If this write fails, do not proceed; flash error and return.
  6. Call Raptor POST /api/internal/users/ghost-delete {"email": tester.email, "actor_admin_id": str(g.admin_id)}. If Raptor returns non-204: flash error, write console.beta.ghost_reset.failed audit event, redirect.
  7. Revoke all existing BetaWalkthroughToken rows for tester.email: sql UPDATE beta_walkthrough_tokens SET revoked_at = now() WHERE tester_email = :email AND revoked_at IS NULL
  8. Idempotency check: query for an unexpired, unrevoked token minted in the last 60 seconds for this email. If one exists, reuse its raw token value for the email (skip minting).
  9. If no recent token: mint a fresh token via generate_walkthrough_token(tester.email).
  10. Send re-enrollment email via _send_welcome_email(email, name, walkthrough_url).
  11. Write Console audit event console.beta.ghost_reset.completed: {"email": tester.email, "new_token_id": token_id, "email_sent": bool, "actor": g.admin_id}.
  12. Flash success message and redirect to beta.beta_index. If Accept: application/json: return: json { "reset_at": "<ISO-8601 UTC>", "new_token_id": 99, "email_sent_to": "user@example.com", "actor_user_id": "console-admin-uuid" }

Idempotency guarantee: Step 8 prevents double-minting within 60 seconds. A second run on the same (still-ghost) account produces one additional audit event and one additional email — this is acceptable and matches the card's AC.


5. Ghost-state detection predicate

The canonical SQL, run by Raptor in both the ghost-check and ghost-delete endpoints:

SELECT
  CASE
    WHEN u.id IS NULL
      THEN 'ghost_no_users_row'
    WHEN creds.cnt = 0
         AND sess.cnt = 0
         AND u.created_at < now() - INTERVAL '5 minutes'
      THEN 'ghost_empty_shell'
    WHEN creds.cnt = 0
         AND sess.cnt = 0
         AND u.created_at >= now() - INTERVAL '5 minutes'
      THEN 'mid_enrollment'
    ELSE 'healthy'
  END                AS state,
  u.id               AS user_id,
  COALESCE(creds.cnt, 0) AS credential_count,
  COALESCE(sess.cnt, 0)  AS active_session_count,
  u.created_at
FROM (SELECT id, created_at FROM users WHERE email = :email) u
LEFT JOIN LATERAL (
  SELECT count(*) AS cnt
  FROM webauthn_credentials
  WHERE user_id = u.id
) creds ON true
LEFT JOIN LATERAL (
  SELECT count(*) AS cnt
  FROM customer_sessions
  WHERE user_id = u.id
    AND revoked_at IS NULL
    AND expires_at > now()
) sess ON true
;

This single query is the source of truth. If u.id IS NULL (no row), the LATERAL joins return null which COALESCE converts to 0.


6. Sequence diagram

sequenceDiagram
    participant Op as Operator (Console UI)
    participant Con as Console (beta.py)
    participant Rpt as Raptor (internal API)
    participant DB_C as Console DB
    participant DB_R as Raptor DB
    participant Mail as Postmark

    Op->>Con: POST /console/beta/testers/<id>/ghost-reset
    Con->>Con: Lookup BetaTester (404 if missing)
    Con->>Rpt: POST /api/internal/users/ghost-check {email}
    Rpt->>DB_R: Run ghost-state predicate query
    DB_R-->>Rpt: {state: "ghost_empty_shell", user_id: 42, ...}
    Rpt-->>Con: 200 {state: "ghost_empty_shell", user_id: 42}
    Con->>DB_C: INSERT audit: console.beta.ghost_reset.initiated
    Con->>Rpt: POST /api/internal/users/ghost-delete {email, actor_admin_id}
    Note over Rpt,DB_R: BEGIN TRANSACTION
    Rpt->>DB_R: Re-run ghost predicate (belt-and-suspenders)
    Rpt->>DB_R: INSERT audit_log: account.ghost_reset
    Rpt->>DB_R: DELETE FROM users WHERE id=42 (CASCADE)
    Note over Rpt,DB_R: COMMIT
    Rpt-->>Con: 204 No Content
    Con->>DB_C: UPDATE beta_walkthrough_tokens SET revoked_at=now()
    Con->>DB_C: Idempotency check: recent token?
    Con->>DB_C: INSERT new beta_walkthrough_tokens row
    Con->>Mail: _send_welcome_email(email, name, url)
    Mail-->>Con: 200
    Con->>DB_C: INSERT audit: console.beta.ghost_reset.completed
    Con-->>Op: Flash success / JSON response

7. Flag decision

Recommendation: gate behind beta_ghost_reset flag. The action is not needed by any automated path; it is a manual operator action. The flag acts as a kill switch if the endpoint causes regressions. One promote-flag migration required: 0179_promote_ghost_reset.py (revises 0178, additive, raxx-api-prod target, no soak required — this is a Console-only operator tool, not customer-facing).

The beta blueprint checks _flag_enabled() at the top of every route. Add the same check to the new route.

If the operator prefers to ship without a flag (given the small blast radius of a Console-only tool), the route can launch without the guard and the migration is skipped. Operator decides; default recommendation is flag-guarded.


8. Migrations

Component Migration Required?
Raptor (backend_v2) None Not required
Console (schema) None Not required
Console (flag promote) 0179_promote_ghost_reset.py If flag-guarded (recommended)

Rollback

No schema to roll back. If the endpoints cause regressions, flip the feature flag off (kills the route immediately without redeploy) or deploy a revert commit.


9. Rollout plan

Phase Gate Action
Dark beta_ghost_reset flag off Route registered, returns 404. No UI exposure.
Staging validation Operator flips flag on staging Run full flow against staging DB fixture. Assert: audit events written, old token revoked, new token minted, email enqueued.
Prod Operator flips flag on prod via heroku config:set After staging validates.
GA (optional) After 2+ successful prod resets Remove flag guard inline, keep flag row as no-op.

10. Security considerations

PII collected: users.email is already held by Console on beta_testers. No new PII fields are introduced. Audit events record the email and actor; they are stored in the Console audit_log under existing retention policy.

Credential replay: The ghost reset deletes a users row that has 0 credentials. No credential or recoverable secret is read, stored, or transmitted at any point.

Audit-before-mutate: console.beta.ghost_reset.initiated is written before any row mutation. If that write fails, the action does not proceed.

Server-side gate: The ghost-check predicate runs inside the Raptor ghost-delete endpoint as a re-check inside the database transaction. UI state does not gate the deletion.

RBAC: @require_role("superadmin") is the v1 gate. The endpoint checks server-side; the UI disabling the button for read-only roles is advisory.

Force flag deferral: The card's AC mentions force=true for non-ghost accounts. This is explicitly deferred (see open questions). The MVP returns 409 on any non-ghost account. The force path touches accounts with live credentials and requires its own threat model.

Kill switch: The beta_ghost_reset feature flag disables the route without a redeploy.

Secrets: No new secrets. RAPTOR_INTERNAL_TOKEN is already in the secret store and rotatable without redeploy.

Breach notification: No new PII is collected. Existing breach-notification runbook applies.

GDPR / DSR: The action deletes a users row, which is consistent with erasure rights. The beta_nda_acknowledgements row (keyed by email) is preserved — it is an immutable legal record of the NDA signing, not PII that must be erased on reset.

Retention: Console audit_log rows for this action are retained under existing policy (2 years per DPA requirement per ADR-0003).


11. Security + GDPR checklist


12. Open questions (deferred, do not block implementation)

  1. force=true for non-ghost accounts. Deferred to a follow-on card. The MVP returns 409 on any healthy account. The force path requires its own threat model because it involves deleting an account with live passkey credentials.

  2. RBAC v2 console-users-write role. The role does not exist yet. V1 ships under @require_role("superadmin"). Add @require_rbac_role("console-users-write") when the role is provisioned. Tracked in sub-card #4 below.

  3. Flag decision (ship with or without flag). Operator decides. Default recommendation is flag-guarded; blast radius is small because this is a Console-only operator tool.


13. Sub-cards

Filed on GitHub linked to this design and to #3843: