Raxx · internal docs

internal · gated

Raptor Postgres Roles Runbook

System: Raptor (raxx-api-staging, raxx-api-prod) Owner: operator / sre-agent Card: #1481 (SC-A Phase 0), #1455 (append-only invariant enforcement, pre-launch-blocker) Last incident: 2026-05-15 UTC (prod Essential-0 tier blocks pg:credentials:create — see #1455 + escalation issue) Last reviewed: 2026-05-15 UTC


Why a separate role

Raptor's DATABASE_URL credential is the Postgres owner. Running application queries under the owner credential means a bug (or a compromised process) can silently modify any table, alter schema, or wipe migration state — there is no database-level enforcement boundary.

The raptor_app role solves this:

This role is a prerequisite for every SC-A card from Phase 1 onward. Without it, REVOKE-based append-only enforcement is a no-op.


Prerequisites

bash heroku pg:info -a raxx-api-staging # Expect: Plan: Standard 0


How to tell it's broken


How to diagnose (in order)

  1. Verify the role exists in Postgres:

bash heroku pg:psql -a raxx-api-staging -c "SELECT rolname, rolcanlogin FROM pg_catalog.pg_roles WHERE rolname = 'raptor_app';" # Expected: raptor_app | t # If empty: role was never created — run the provisioning procedure below.

  1. Verify grants are intact:

bash heroku pg:psql -a raxx-api-staging -c "\dp historical_bars" | grep raptor_app # Expect: raptor_app=arwd/... (a=INSERT, r=SELECT, w=UPDATE, d=DELETE) # If absent: run the "Re-apply grants if drift occurs" procedure below.

  1. Verify the env var is set and non-empty:

bash heroku config:get RAPTOR_APP_DATABASE_URL -a raxx-api-staging | cut -c1-30 # Expect: postgres://raptor_app:... # If empty: run Step 3 of the provisioning procedure below.

  1. Verify the feature flag:

bash heroku config:get FLAG_RAPTOR_APP_ROLE_SEPARATION -a raxx-api-staging # Expected (once soak is complete): 1 # Expected (during soak period): 0


Known failure modes

Failure mode A: rds_password error on CREATE ROLE

Symptom:

ERROR: must be a member of rds_password to alter passwords

Cause: Heroku Postgres runs on AWS RDS. The DATABASE_URL owner is NOT in the rds_password group. Direct CREATE ROLE ... WITH LOGIN PASSWORD '...' fails because only rds_password members can set passwords.

Fix: Do NOT use raw SQL to create the role. Use heroku pg:credentials:create which goes through Heroku's RDS-aware path. See provisioning procedure below.

Verification: heroku pg:credentials:url DATABASE --name raptor_app -a <app> returns a valid postgres:// URL without error.


Failure mode B: Essential-tier block

Symptom:

Error: You can't create a custom credential on Essential-tier databases.

Cause: Essential-0/1/2 plans do not support the pg:credentials:create command at all. Role separation is impossible on Essential-tier.

Fix: Upgrade the Heroku Postgres addon to Standard-0 before attempting any role-separation work. Standard-0 is the minimum tier that supports this.

heroku addons:upgrade postgresql-adjacent-27271 heroku-postgresql:standard-0 -a raxx-api-staging

Verification: heroku pg:info -a raxx-api-staging shows Plan: Standard 0.


Failure mode C: Grant drift (new table missing grants)

Symptom: permission denied for table <new_table> in Raptor logs after a migration that added a new table.

Cause: GRANT ... ON ALL TABLES applies only to tables that exist at the time the statement runs. Tables created in later migrations do not inherit permissions automatically; the migration must include an explicit GRANT.

Fix: Run the GRANT block in the re-apply procedure below (it uses ON ALL TABLES IN SCHEMA public which re-covers newly added tables).

Then: update the migration file for the new table to include:

GRANT SELECT, INSERT, UPDATE, DELETE ON <new_table> TO raptor_app;
-- If the table has a sequence:
GRANT USAGE ON SEQUENCE <new_table_id_seq> TO raptor_app;

Verification: \dp <new_table> in heroku pg:psql shows raptor_app=arwd.

2026-06-01 catch-up (PR #2172): Migrations 0011-0020 created 6 tables without explicit raptor_app grants. Migration 0026_raptor_app_grant_catchup.py backfills the missing grants. After running alembic upgrade head on staging and prod, run \dp order_strategy_links (and the other 5 tables) to confirm. The regression test backend_v2/tests/integration/test_raptor_app_grant_catchup_2172.py can be run against a live Postgres to verify the full matrix.

The lint gate scripts/ci/lint_migration_grants.sh was expanded (#2172) to enforce that every op.create_table() call in a future Alembic migration includes a raptor_app GRANT. New migrations that create tables without grants will now fail CI before reaching staging.


Failure mode D: RAPTOR_APP_DATABASE_URL unset after dyno restart

Symptom: Raptor logs show connections from the DATABASE_URL owner, not raptor_app. heroku config:get RAPTOR_APP_DATABASE_URL is empty.

Cause: Heroku config vars persist across restarts. If this is empty, the var was never set or was explicitly removed (e.g., an operator ran heroku config:unset).

Fix: Re-provision via heroku pg:credentials:url and re-set the config var. See Step 3 of the provisioning procedure below.

Verification: heroku config:get RAPTOR_APP_DATABASE_URL -a <app> returns a non-empty postgres://raptor_app:... URL.


Provisioning procedure (staging-first)

Run on staging. Repeat for prod only after 72 h staging soak with no errors.

Step 1 — Verify Postgres tier

heroku pg:info -a raxx-api-staging
# Expect: Plan: Standard 0
# If Essential-tier: STOP — upgrade first (see Failure mode B above)

Step 2 — Create the role via Heroku CLI

heroku pg:credentials:create DATABASE --name raptor_app -a raxx-api-staging

DATABASE is the literal addon name (the positional argument to the CLI). --name raptor_app is the credential/role name. Do not swap them — the CLI will return Unknown database: raptor_app if the role name goes in the positional slot.

pg:credentials:create creates the Postgres role, generates a password via RDS, and grants CONNECT + SELECT on all tables automatically.

Step 3 — Retrieve and store the connection URL

heroku pg:credentials:url DATABASE --name raptor_app -a raxx-api-staging
# Copy the "Connection URL:" value (postgres://raptor_app:...) into the next step.

Set in Heroku config (stdout silenced — URL contains the password):

heroku config:set RAPTOR_APP_DATABASE_URL="<URL from above>" -a raxx-api-staging >/dev/null 2>&1

Verify the var is set without revealing the value:

heroku config:get RAPTOR_APP_DATABASE_URL -a raxx-api-staging | cut -c1-30
# Expect: postgres://raptor_app:...

Step 4 — Apply the audit role grant matrix (migration 031)

This step applies the full privilege matrix from backend_v2/db/migrations/031_audit_role_split.sql. The matrix has two tiers:

Open a psql session as the DATABASE owner:

heroku pg:psql -a raxx-api-staging

Run each block in order (all blocks are idempotent):

-- Block A: reset to clean slate
REVOKE ALL ON ALL TABLES IN SCHEMA public FROM raptor_app;
REVOKE ALL ON ALL SEQUENCES IN SCHEMA public FROM raptor_app;
REVOKE ALL ON TABLE alembic_version FROM raptor_app;

-- Block B: full DML on non-audit application tables
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE
    audit_log, backup_codes,
    bootstrap_tokens_pending, bootstrap_tokens_revoked, bootstrap_tokens_used,
    customer_sessions, email_verifications, freescout_ticket_cache,
    historical_bars, login_challenges, onboarding_state, paper_orders,
    postmark_delivery_events, postmark_inbound_dedup, quebec_waitlist,
    support_customer_map, support_pending_submissions, symbols,
    trace_events, trace_workflows, users,
    vendor_billing_alert_config, vendor_billing_alert_log,
    vendor_billing_fixed, vendor_billing_snapshots, webauthn_credentials
TO raptor_app;

-- Block C: INSERT + SELECT only on audit/history tables (append-only invariant)
GRANT INSERT, SELECT ON TABLE
    audit_archival_runs, audit_integrity_log, customer_audit_events,
    dsr_log, passkeys_history, positions_history, sessions_history,
    subscriptions_history, support_audit_log, trades_history, users_history
TO raptor_app;

-- Safety guard: explicit REVOKE of UPDATE/DELETE on audit tables
REVOKE UPDATE, DELETE ON TABLE
    audit_archival_runs, audit_integrity_log, customer_audit_events,
    dsr_log, passkeys_history, positions_history, sessions_history,
    subscriptions_history, support_audit_log, trades_history, users_history
FROM raptor_app;

-- Block D: sequence access for all INSERT-capable tables
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO raptor_app;

After applying, verify the invariant holds:

SELECT
  t.relname AS table_name,
  has_table_privilege('raptor_app', t.relname, 'INSERT') AS ins,
  has_table_privilege('raptor_app', t.relname, 'SELECT') AS sel,
  has_table_privilege('raptor_app', t.relname, 'UPDATE') AS upd,
  has_table_privilege('raptor_app', t.relname, 'DELETE') AS del
FROM pg_class t
JOIN pg_namespace n ON n.oid = t.relnamespace
WHERE n.nspname = 'public' AND t.relkind = 'r'
ORDER BY t.relname;
-- For audit/history tables: ins=t sel=t upd=f del=f
-- For application tables: ins=t sel=t upd=t del=t
-- For alembic_version: ins=f sel=f upd=f del=f

Step 5 — Verify the role

Still in psql:

SELECT rolname, rolcanlogin FROM pg_catalog.pg_roles WHERE rolname = 'raptor_app';
-- Expected: raptor_app | t

-- Spot-check DML on an application table
\dp historical_bars
-- Expected: raptor_app=arwd/... (a=INSERT r=SELECT w=UPDATE d=DELETE)

-- Confirm migration tables are locked out
\dp alembic_version
-- Expected: no raptor_app entry (or raptor_app= with no permissions)

Step 6 — Enable the flag on staging

heroku config:set FLAG_RAPTOR_APP_ROLE_SEPARATION=1 -a raxx-api-staging >/dev/null 2>&1

Restart and tail logs:

heroku restart -a raxx-api-staging
heroku logs -a raxx-api-staging --tail
# Watch for: no "permission denied" errors; normal request processing resumes

Run the smoke suite:

scripts/ci/run_smoke.sh --env=staging

Step 7 — Soak (72 h before prod)

Leave staging running for 72 hours. Check at 24 / 48 / 72 h:

Step 8 — Prod cutover (operator action, RM-10 gate)

Repeat Steps 1–6 for raxx-api-prod. Prod cutover is tracked as RM-10 (#1568). Do NOT perform prod cutover without operator sign-off.

heroku pg:credentials:create DATABASE --name raptor_app -a raxx-api-prod
heroku pg:credentials:url DATABASE --name raptor_app -a raxx-api-prod
heroku config:set RAPTOR_APP_DATABASE_URL="<URL>" -a raxx-api-prod >/dev/null 2>&1
heroku pg:psql -a raxx-api-prod
# (run the same GRANT/REVOKE block from Step 4)
heroku config:set FLAG_RAPTOR_APP_ROLE_SEPARATION=1 -a raxx-api-prod >/dev/null 2>&1
heroku restart -a raxx-api-prod
scripts/ci/run_smoke.sh --env=prod

Re-apply grants if drift occurs

If grants are missing (e.g. after a new migration added tables without explicit GRANTs), re-run the GRANT block from Step 4 against the affected environment:

heroku pg:psql -a raxx-api-staging
GRANT INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO raptor_app;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO raptor_app;
REVOKE ALL ON TABLE alembic_version FROM raptor_app;
REVOKE ALL ON TABLE schema_migrations FROM raptor_app;

This is idempotent. Running it when grants are already correct is safe.


Adding a new table after Phase 0

Every migration that creates a new table must include explicit grants:

CREATE TABLE IF NOT EXISTS my_new_table (
  id BIGSERIAL PRIMARY KEY,
  ...
);

-- Required: explicit grant for raptor_app on new tables
GRANT SELECT, INSERT, UPDATE, DELETE ON my_new_table TO raptor_app;
GRANT USAGE ON SEQUENCE my_new_table_id_seq TO raptor_app;

Omitting this grant causes permission denied for table my_new_table on the first application request that touches the new table when FLAG_RAPTOR_APP_ROLE_SEPARATION=1.


Environment variable reference

Var App Description
DATABASE_URL raxx-api-* Owner credential (Heroku-managed). Used for migrations (alembic upgrade) and emergency fallback. Never used by Raptor application code when flag is 1.
RAPTOR_APP_DATABASE_URL raxx-api-* Restricted application credential. Set by operator from heroku pg:credentials:url output. Raptor reads this at startup when FLAG_RAPTOR_APP_ROLE_SEPARATION=1.
FLAG_RAPTOR_APP_ROLE_SEPARATION raxx-api-* Feature flag. Set to 1 after role is provisioned and verified. Set to 0 to fall back to DATABASE_URL without a code deploy.

Rollback

The flag is the fast rollback mechanism. No code deploy required.

heroku config:set FLAG_RAPTOR_APP_ROLE_SEPARATION=0 -a raxx-api-staging >/dev/null 2>&1
heroku restart -a raxx-api-staging
# Verify Raptor is back up and using DATABASE_URL
heroku logs -a raxx-api-staging -n 20

The raptor_app role and its grants do not need to be removed for rollback. Leave them in place so re-enable is a single config:set.


Emergency stop

Take Raptor off the restricted role instantly (no data impact, no migrations):

heroku config:set FLAG_RAPTOR_APP_ROLE_SEPARATION=0 -a raxx-api-prod >/dev/null 2>&1
heroku restart -a raxx-api-prod

Escalation

Escalate to the operator when:


Connects to