Raxx · internal docs

internal · gated

pgaudit external sink runbook

System: Postgres DML audit → Heroku Logplex → S3 (raxx-heroku-logs-prod) Owner: sre-agent / operator Last incident: n/a Last reviewed: 2026-06-30 Migration: backend_v2/alembic/versions/0051_pgaudit_equivalent_dml_audit.py Related runbook: docs/ops/runbooks/heroku-log-drain-s3.md Issue: #1494 (SC-A14) ADRs: 0058, 0059, 0060, 0063


Overview

This runbook documents the DB-level DML audit layer for customer_audit_events and all *_history shadow tables, and how those events route to the external write-only log sink.

Platform constraint: pgaudit not available on standard Heroku Postgres

pgaudit requires shared_preload_libraries to be set in postgresql.conf. Heroku Postgres (managed Amazon RDS) does not expose this parameter to database users. ALTER SYSTEM SET is blocked.

Confirmed by:

SELECT * FROM pg_available_extensions WHERE name = 'pgaudit';
-- Returns 0 rows on standard Heroku Postgres plans.

See "Shield-plan upgrade path" section below if native pgaudit is required for a compliance certification.

Implemented equivalent

Two mechanisms together provide the same security property:

raptor_app or audit_archiver executes DML
          │
          ├─ DB-level trigger (trg_users_history, trg_sessions_history,
          │  trg_passkeys_history) captures row-level change → *_history table
          │  (raptor_app has REVOKE UPDATE/DELETE on *_history — append-only)
          │
          └─ Postgres statement log (log_statement = 'mod' set via ALTER ROLE)
             captures the SQL statement text
                       │
                       ▼
             Heroku Logplex (automatic — no config needed)
                       │
                       ▼
             Lambda log-drain receiver (raxx-heroku-log-drain-receiver)
                       │
                       ▼
             S3 bucket: raxx-heroku-logs-prod  ← write-only for Heroku drain
             Key layout: logs/<app>/<YYYY>/<MM>/<DD>/<epoch_ms>-<uuid>.gz

raptor_app has no AWS credentials to read, modify, or delete S3 objects. The S3 bucket has versioning enabled; even if an object is overwritten, prior versions are retained.


How to tell it's broken


How to diagnose (in order)

  1. Check drain status:

bash heroku drains --app raxx-api-prod heroku drains --app raxx-api-staging

Expected: one drain entry pointing to the Lambda Function URL. If empty or failed: follow heroku-log-drain-s3.md §Re-adding a drain.

  1. Check role log_statement setting:

bash heroku pg:psql --app raxx-api-prod

sql \drds

Expected output includes: Role Name | Database | Unset | Config --------------+----------+-------+-------- raptor_app | | | log_statement=mod audit_archiver | | | log_statement=mod

If missing: re-run the ALTER ROLE statements from the "Re-applying role log_statement" section below.

  1. Check S3 bucket for recent objects:

bash aws s3 ls s3://raxx-heroku-logs-prod/logs/raxx-api-prod/$(date +%Y/%m/%d)/ \ --recursive | tail -10

If empty: the drain isn't writing to S3. Check Lambda function logs: bash aws logs tail /aws/lambda/raxx-heroku-log-drain-receiver --since 30m

  1. Check trigger existence:

sql SELECT tgname, tgrelid::regclass, tgenabled FROM pg_trigger WHERE tgname IN ( 'trg_users_history', 'trg_sessions_history', 'trg_passkeys_history' );

Expected: 3 rows, all with tgenabled = 'O' (origin).


Known failure modes

Failure mode A: Log drain not configured

Symptom: No objects appear in S3 for the app. Cause: Heroku log drain was not added or was removed. Fix:

# Get the drain URL from SSM
DRAIN_URL=$(aws ssm get-parameter \
    --name /raxx/log-drain/drain_url \
    --with-decryption \
    --query Parameter.Value \
    --output text)

heroku drains:add "$DRAIN_URL" --app raxx-api-prod
heroku drains:add "$DRAIN_URL" --app raxx-api-staging

Verification: heroku drains --app raxx-api-prod shows the drain. Wait 60 seconds, then check S3 for new objects.

Failure mode B: Role log_statement not set

Symptom: \drds shows no log_statement entry for raptor_app. Cause: Migration 0051 may not have run, or was rolled back. Fix — Re-applying role log_statement:

heroku pg:psql --app raxx-api-prod
ALTER ROLE raptor_app SET log_statement TO 'mod';
ALTER ROLE audit_archiver SET log_statement TO 'mod';

Note: This takes effect for new connections. Existing connections retain the prior setting until they reconnect. Restart raptor dynos to force reconnection:

heroku dyno:restart --app raxx-api-prod

Verification: \drds shows log_statement=mod for both roles.

Failure mode C: Trigger missing or disabled

Symptom: Users/sessions/credentials table changes are not appearing in _history tables. Cause: Trigger was dropped or disabled (e.g., via ALTER TABLE ... DISABLE TRIGGER). Fix:*

Re-run the Alembic migration upgrade (safe — idempotent):

cd backend_v2
alembic upgrade 0051

Or manually via heroku pg:psql — copy the CREATE FUNCTION + CREATE TRIGGER blocks from migration 0051.

Verification: Run the DML smoke test below.

Failure mode D: S3 retention < 90 days

Symptom: Compliance audit flags log retention below 90-day requirement. Cause: S3 lifecycle rule misconfigured. Fix:

Check current lifecycle:

aws s3api get-bucket-lifecycle-configuration \
    --bucket raxx-heroku-logs-prod

Update the expiration rule via Terraform:

cd terraform/log-drain
# Update var.log_retention_days to 90 or higher
terraform apply

Smoke test — verify end-to-end DML logging

Run after applying migration 0051 or after any drain/role configuration change.

heroku pg:psql --app raxx-api-staging
-- Step 1: Insert a tracer audit event as raptor_app.
SET ROLE raptor_app;

INSERT INTO customer_audit_events
    (id, dimension, customer_id, actor_id, actor_type, action, at_utc)
VALUES (
    'pgaudit-smoke-' || extract(epoch from now())::bigint::text,
    'system_automated',
    0,
    'sre-smoke',
    'system_actor',
    'smoke.pgaudit_check',
    NOW()
);

RESET ROLE;

Then check the S3 drain bucket (~30 seconds propagation time):

# List recent objects
LATEST_KEY=$(aws s3 ls \
    s3://raxx-heroku-logs-prod/logs/raxx-api-staging/$(date -u +%Y/%m/%d)/ \
    --recursive \
    | sort | tail -1 | awk '{print $4}')

# Download and search for the INSERT statement
aws s3 cp "s3://raxx-heroku-logs-prod/$LATEST_KEY" /tmp/drain-check.gz
zcat /tmp/drain-check.gz | grep 'customer_audit_events' | grep 'pgaudit-smoke'

Expected: the INSERT statement text appears in the log.

To verify trigger capture:

-- Users_history should have a recent INSERT row from any users mutation.
-- Trigger test (connect as DB owner, not raptor_app):
INSERT INTO users (email, email_verified_at)
VALUES ('smoke-trigger-test@example.invalid', NULL)
ON CONFLICT DO NOTHING;

SELECT * FROM users_history
WHERE changed_at > NOW() - INTERVAL '2 minutes'
ORDER BY changed_at DESC LIMIT 5;

Expected: a row in users_history with operation = 'INSERT'.


Access control model

Actor S3 bucket access Postgres log access *_history table access
raptor_app None (no AWS creds) Cannot read/delete own logs INSERT + SELECT only
audit_archiver None (no AWS creds) Cannot read/delete own logs SELECT + UPDATE (DSR pseudonymization)
Heroku log drain (Lambda) Write only (PutObject) N/A None
Operator Read via AWS console (requires MFA) Read via CloudWatch / AWS CLI Via heroku pg:psql (owner role)
raptor_audit_compliance None None SELECT only

The S3 drain bucket uses: - Server-side encryption (SSE-S3) - Bucket versioning enabled (overwrites retain prior versions) - Block Public Access: all four options enabled - No bucket policy grants to raptor_app or any Heroku IAM identity


Log retention policy

Target: 90 days minimum for compliance.

Current S3 lifecycle rules (from terraform/log-drain/): - Tier: S3 Standard for 30 days - Transition to S3 Standard-IA at 30 days - Expiration at 90 days

Verify:

aws s3api get-bucket-lifecycle-configuration \
    --bucket raxx-heroku-logs-prod \
    --query 'Rules[*].{ID:ID,Expiration:Expiration}'

To increase retention, update terraform/log-drain/variables.tf:

variable "log_retention_days" {
  default = 90  # increase here
}

Shield-plan upgrade path (native pgaudit)

If native pgaudit becomes a compliance requirement (SOC 2 Type II, PCI DSS, HIPAA), upgrade to Heroku Postgres Shield plan:

  1. Verify Shield plan is available for the current Heroku team:

bash heroku addons:plans heroku-postgresql | grep shield

  1. Upgrade the addon:

bash heroku addons:upgrade <postgres-addon-name> shield-7 --app raxx-api-prod # (plan name may vary; shield-7 is approximately $700/mo as of 2026)

  1. After upgrade, verify pgaudit is available:

bash heroku pg:psql --app raxx-api-prod

```sql SELECT * FROM pg_available_extensions WHERE name = 'pgaudit'; -- Should return 1 row.

CREATE EXTENSION IF NOT EXISTS pgaudit;

ALTER SYSTEM SET pgaudit.log = 'write'; ALTER SYSTEM SET pgaudit.log_relation = 'on'; SELECT pg_reload_conf();

-- Scope to audit tables only (reduces log volume vs. cluster-wide logging): SECURITY LABEL FOR pgaudit ON TABLE customer_audit_events IS 'OBJECT'; SECURITY LABEL FOR pgaudit ON TABLE users_history IS 'OBJECT'; SECURITY LABEL FOR pgaudit ON TABLE subscriptions_history IS 'OBJECT'; SECURITY LABEL FOR pgaudit ON TABLE positions_history IS 'OBJECT'; SECURITY LABEL FOR pgaudit ON TABLE trades_history IS 'OBJECT'; SECURITY LABEL FOR pgaudit ON TABLE sessions_history IS 'OBJECT'; SECURITY LABEL FOR pgaudit ON TABLE passkeys_history IS 'OBJECT'; ```

  1. Verify pgaudit events appear in the log drain:

Run the smoke test above and confirm AUDIT: prefixed lines appear in the S3 drain objects.

  1. The trigger-based capture (from migration 0051) remains in place and provides a second, redundant capture path.

Escalation

Wake the operator when: - S3 drain is silent for > 15 minutes during a write-active period (no new objects, Lambda function is erroring) - A SELECT on *_history tables shows fewer rows than expected after a known DML operation on a source table (trigger may have been disabled) - \drds shows log_statement has been reset to null for raptor_app or audit_archiver (someone altered the role default without an incident ticket) - Any Sentry error involving fn_capture_*_history trigger functions

Contact: ops@raxx.app