pgaudit external sink runbook
System: Postgres DML audit → Heroku Logplex → S3 (raxx-heroku-logs-prod)
Owner: sre-agent / operator
Last incident: n/a
Last reviewed: 2026-06-30
Migration: backend_v2/alembic/versions/0051_pgaudit_equivalent_dml_audit.py
Related runbook: docs/ops/runbooks/heroku-log-drain-s3.md
Issue: #1494 (SC-A14)
ADRs: 0058, 0059, 0060, 0063
Overview
This runbook documents the DB-level DML audit layer for customer_audit_events
and all *_history shadow tables, and how those events route to the external
write-only log sink.
Platform constraint: pgaudit not available on standard Heroku Postgres
pgaudit requires shared_preload_libraries to be set in postgresql.conf.
Heroku Postgres (managed Amazon RDS) does not expose this parameter to database
users. ALTER SYSTEM SET is blocked.
Confirmed by:
SELECT * FROM pg_available_extensions WHERE name = 'pgaudit';
-- Returns 0 rows on standard Heroku Postgres plans.
See "Shield-plan upgrade path" section below if native pgaudit is required for a compliance certification.
Implemented equivalent
Two mechanisms together provide the same security property:
raptor_app or audit_archiver executes DML
│
├─ DB-level trigger (trg_users_history, trg_sessions_history,
│ trg_passkeys_history) captures row-level change → *_history table
│ (raptor_app has REVOKE UPDATE/DELETE on *_history — append-only)
│
└─ Postgres statement log (log_statement = 'mod' set via ALTER ROLE)
captures the SQL statement text
│
▼
Heroku Logplex (automatic — no config needed)
│
▼
Lambda log-drain receiver (raxx-heroku-log-drain-receiver)
│
▼
S3 bucket: raxx-heroku-logs-prod ← write-only for Heroku drain
Key layout: logs/<app>/<YYYY>/<MM>/<DD>/<epoch_ms>-<uuid>.gz
raptor_app has no AWS credentials to read, modify, or delete S3 objects. The S3 bucket has versioning enabled; even if an object is overwritten, prior versions are retained.
How to tell it's broken
- Symptom 1:
heroku drains --app raxx-api-prodreturns an empty list or a drain with statusfailed. - Symptom 2: S3 bucket
raxx-heroku-logs-prodshows no new objects underlogs/raxx-api-prod/for > 15 minutes during a write-active period. - Symptom 3:
\drdsin heroku pg:psql shows log_statement is not set to 'mod' for raptor_app. - Symptom 4: An INSERT into
customer_audit_eventsdoes not produce a log entry in S3 within 30 seconds (smoke test failure).
How to diagnose (in order)
- Check drain status:
bash
heroku drains --app raxx-api-prod
heroku drains --app raxx-api-staging
Expected: one drain entry pointing to the Lambda Function URL. If empty or failed: follow heroku-log-drain-s3.md §Re-adding a drain.
- Check role log_statement setting:
bash
heroku pg:psql --app raxx-api-prod
sql
\drds
Expected output includes:
Role Name | Database | Unset | Config
--------------+----------+-------+--------
raptor_app | | | log_statement=mod
audit_archiver | | | log_statement=mod
If missing: re-run the ALTER ROLE statements from the "Re-applying role log_statement" section below.
- Check S3 bucket for recent objects:
bash
aws s3 ls s3://raxx-heroku-logs-prod/logs/raxx-api-prod/$(date +%Y/%m/%d)/ \
--recursive | tail -10
If empty: the drain isn't writing to S3. Check Lambda function logs:
bash
aws logs tail /aws/lambda/raxx-heroku-log-drain-receiver --since 30m
- Check trigger existence:
sql
SELECT tgname, tgrelid::regclass, tgenabled
FROM pg_trigger
WHERE tgname IN (
'trg_users_history',
'trg_sessions_history',
'trg_passkeys_history'
);
Expected: 3 rows, all with tgenabled = 'O' (origin).
Known failure modes
Failure mode A: Log drain not configured
Symptom: No objects appear in S3 for the app. Cause: Heroku log drain was not added or was removed. Fix:
# Get the drain URL from SSM
DRAIN_URL=$(aws ssm get-parameter \
--name /raxx/log-drain/drain_url \
--with-decryption \
--query Parameter.Value \
--output text)
heroku drains:add "$DRAIN_URL" --app raxx-api-prod
heroku drains:add "$DRAIN_URL" --app raxx-api-staging
Verification: heroku drains --app raxx-api-prod shows the drain. Wait
60 seconds, then check S3 for new objects.
Failure mode B: Role log_statement not set
Symptom: \drds shows no log_statement entry for raptor_app.
Cause: Migration 0051 may not have run, or was rolled back.
Fix — Re-applying role log_statement:
heroku pg:psql --app raxx-api-prod
ALTER ROLE raptor_app SET log_statement TO 'mod';
ALTER ROLE audit_archiver SET log_statement TO 'mod';
Note: This takes effect for new connections. Existing connections retain the prior setting until they reconnect. Restart raptor dynos to force reconnection:
heroku dyno:restart --app raxx-api-prod
Verification: \drds shows log_statement=mod for both roles.
Failure mode C: Trigger missing or disabled
Symptom: Users/sessions/credentials table changes are not appearing in
_history tables.
Cause: Trigger was dropped or disabled (e.g., via ALTER TABLE ... DISABLE TRIGGER).
Fix:*
Re-run the Alembic migration upgrade (safe — idempotent):
cd backend_v2
alembic upgrade 0051
Or manually via heroku pg:psql — copy the CREATE FUNCTION + CREATE TRIGGER blocks from migration 0051.
Verification: Run the DML smoke test below.
Failure mode D: S3 retention < 90 days
Symptom: Compliance audit flags log retention below 90-day requirement. Cause: S3 lifecycle rule misconfigured. Fix:
Check current lifecycle:
aws s3api get-bucket-lifecycle-configuration \
--bucket raxx-heroku-logs-prod
Update the expiration rule via Terraform:
cd terraform/log-drain
# Update var.log_retention_days to 90 or higher
terraform apply
Smoke test — verify end-to-end DML logging
Run after applying migration 0051 or after any drain/role configuration change.
heroku pg:psql --app raxx-api-staging
-- Step 1: Insert a tracer audit event as raptor_app.
SET ROLE raptor_app;
INSERT INTO customer_audit_events
(id, dimension, customer_id, actor_id, actor_type, action, at_utc)
VALUES (
'pgaudit-smoke-' || extract(epoch from now())::bigint::text,
'system_automated',
0,
'sre-smoke',
'system_actor',
'smoke.pgaudit_check',
NOW()
);
RESET ROLE;
Then check the S3 drain bucket (~30 seconds propagation time):
# List recent objects
LATEST_KEY=$(aws s3 ls \
s3://raxx-heroku-logs-prod/logs/raxx-api-staging/$(date -u +%Y/%m/%d)/ \
--recursive \
| sort | tail -1 | awk '{print $4}')
# Download and search for the INSERT statement
aws s3 cp "s3://raxx-heroku-logs-prod/$LATEST_KEY" /tmp/drain-check.gz
zcat /tmp/drain-check.gz | grep 'customer_audit_events' | grep 'pgaudit-smoke'
Expected: the INSERT statement text appears in the log.
To verify trigger capture:
-- Users_history should have a recent INSERT row from any users mutation.
-- Trigger test (connect as DB owner, not raptor_app):
INSERT INTO users (email, email_verified_at)
VALUES ('smoke-trigger-test@example.invalid', NULL)
ON CONFLICT DO NOTHING;
SELECT * FROM users_history
WHERE changed_at > NOW() - INTERVAL '2 minutes'
ORDER BY changed_at DESC LIMIT 5;
Expected: a row in users_history with operation = 'INSERT'.
Access control model
| Actor | S3 bucket access | Postgres log access | *_history table access |
|---|---|---|---|
| raptor_app | None (no AWS creds) | Cannot read/delete own logs | INSERT + SELECT only |
| audit_archiver | None (no AWS creds) | Cannot read/delete own logs | SELECT + UPDATE (DSR pseudonymization) |
| Heroku log drain (Lambda) | Write only (PutObject) | N/A | None |
| Operator | Read via AWS console (requires MFA) | Read via CloudWatch / AWS CLI | Via heroku pg:psql (owner role) |
| raptor_audit_compliance | None | None | SELECT only |
The S3 drain bucket uses: - Server-side encryption (SSE-S3) - Bucket versioning enabled (overwrites retain prior versions) - Block Public Access: all four options enabled - No bucket policy grants to raptor_app or any Heroku IAM identity
Log retention policy
Target: 90 days minimum for compliance.
Current S3 lifecycle rules (from terraform/log-drain/):
- Tier: S3 Standard for 30 days
- Transition to S3 Standard-IA at 30 days
- Expiration at 90 days
Verify:
aws s3api get-bucket-lifecycle-configuration \
--bucket raxx-heroku-logs-prod \
--query 'Rules[*].{ID:ID,Expiration:Expiration}'
To increase retention, update terraform/log-drain/variables.tf:
variable "log_retention_days" {
default = 90 # increase here
}
Shield-plan upgrade path (native pgaudit)
If native pgaudit becomes a compliance requirement (SOC 2 Type II, PCI DSS, HIPAA), upgrade to Heroku Postgres Shield plan:
- Verify Shield plan is available for the current Heroku team:
bash
heroku addons:plans heroku-postgresql | grep shield
- Upgrade the addon:
bash
heroku addons:upgrade <postgres-addon-name> shield-7 --app raxx-api-prod
# (plan name may vary; shield-7 is approximately $700/mo as of 2026)
- After upgrade, verify pgaudit is available:
bash
heroku pg:psql --app raxx-api-prod
```sql SELECT * FROM pg_available_extensions WHERE name = 'pgaudit'; -- Should return 1 row.
CREATE EXTENSION IF NOT EXISTS pgaudit;
ALTER SYSTEM SET pgaudit.log = 'write'; ALTER SYSTEM SET pgaudit.log_relation = 'on'; SELECT pg_reload_conf();
-- Scope to audit tables only (reduces log volume vs. cluster-wide logging): SECURITY LABEL FOR pgaudit ON TABLE customer_audit_events IS 'OBJECT'; SECURITY LABEL FOR pgaudit ON TABLE users_history IS 'OBJECT'; SECURITY LABEL FOR pgaudit ON TABLE subscriptions_history IS 'OBJECT'; SECURITY LABEL FOR pgaudit ON TABLE positions_history IS 'OBJECT'; SECURITY LABEL FOR pgaudit ON TABLE trades_history IS 'OBJECT'; SECURITY LABEL FOR pgaudit ON TABLE sessions_history IS 'OBJECT'; SECURITY LABEL FOR pgaudit ON TABLE passkeys_history IS 'OBJECT'; ```
- Verify pgaudit events appear in the log drain:
Run the smoke test above and confirm AUDIT: prefixed lines appear in the
S3 drain objects.
- The trigger-based capture (from migration 0051) remains in place and provides a second, redundant capture path.
Escalation
Wake the operator when:
- S3 drain is silent for > 15 minutes during a write-active period
(no new objects, Lambda function is erroring)
- A SELECT on *_history tables shows fewer rows than expected after a known
DML operation on a source table (trigger may have been disabled)
- \drds shows log_statement has been reset to null for raptor_app or
audit_archiver (someone altered the role default without an incident ticket)
- Any Sentry error involving fn_capture_*_history trigger functions
Contact: ops@raxx.app