Postmark runbook
System: Postmark (email delivery for raxx.app)
Owner: operator
Last incident: 2026-06-05 (see docs/ops/incidents/2026-06-05-session-auth-waitlist-postmark-exemptions.md)
Last reviewed: 2026-06-17 UTC
Related runbook: docs/ops/runbooks/postmark-delivery-monitor.md, the hourly delivery-event
silence monitor (issue #3310). It closes a detection gap: the in-process webhook monitor only
works while webhooks are arriving, whereas the delivery monitor polls Postmark's Messages API directly.
Architecture
Postmark is the transactional email provider for raxx.app. Two parallel notification paths exist:
Path A — Postmark native notifications (currently active):
Email event (Bounce/SpamComplaint)
-> Postmark internal notification engine
-> Postmark dashboard: Server -> Settings -> Notifications -> Slack webhook
-> TradeMasterAPI-Notify Slack channel
This path is active and has no minimum-denominator floor or dedup window.
Path B — Raptor in-process delivery monitor (currently dormant):
Postmark -> POST /webhooks/postmark/delivery (Raptor)
-> postmark_delivery_events DB table
-> _check_alert_thresholds() -> Slack DM (with 60-min suppression + min-denominator floor)
This path is dormant: FLAG_POSTMARK_DELIVERY_MONITOR=false and POSTMARK_SERVER_TOKEN is empty on raxx-api-prod. It is also blocked by CF Access on the webhook endpoint (issue #669).
Pre-launch posture: Path A produces per-event pings. Per feedback_pre_launch_digest_notifications.md, routine CI/cron Slack pings should be in a daily digest. Remove or reconfigure the Postmark Slack webhook when Path B (Raptor monitor) is enabled post-launch.
Vault and credentials
| Secret | Vault path | Heroku config var |
|---|---|---|
| Server API token (transactional send) | /MooseQuest/postmark/POSTMARK_SERVER_API_KEY | POSTMARK_SERVER_TOKEN |
| Account API token (admin) | /MooseQuest/postmark/POSTMARK_ACCOUNT_API_KEY | not set on Heroku |
| Delivery webhook secret | /MooseQuest/postmark/POSTMARK_DELIVERY_WEBHOOK_SECRET | POSTMARK_DELIVERY_WEBHOOK_SECRET |
Fetch the server token:
# From vault (if vault is accessible):
export POSTMARK_SERVER_TOKEN=$(python3 scripts/ops/read_vault_secret.py \
/MooseQuest/postmark/POSTMARK_SERVER_API_KEY 2>/dev/null)
# Or from Heroku (currently empty; must be set first). Capture the value rather than printing it:
export POSTMARK_SERVER_TOKEN=$(heroku config:get POSTMARK_SERVER_TOKEN --app raxx-api-prod)
Setting POSTMARK_SERVER_TOKEN on Heroku (post-incident procedure)
Follow these steps in order whenever the token is set or rotated. Per
feedback_heroku_config_set_echoes_secrets.md, always silence stdout on config:set.
1. Fetch value from vault:
# Read from Infisical (preferred — never paste from memory):
export POSTMARK_SERVER_TOKEN=$(python3 scripts/ops/read_vault_secret.py \
/MooseQuest/postmark/POSTMARK_SERVER_API_KEY 2>/dev/null)
2. Set on Heroku (silence stdout so the token never appears in terminal output):
heroku config:set POSTMARK_SERVER_TOKEN="$POSTMARK_SERVER_TOKEN" \
--app raxx-api-prod >/dev/null 2>&1
3. Confirm the var is set (length check — does not reveal value):
heroku config:get POSTMARK_SERVER_TOKEN --app raxx-api-prod | wc -c
# Expect: a number > 20 (a valid Postmark server token is longer than 20 characters;
# wc -c also counts the trailing newline)
# If 0 or 1: the set did not take, so repeat step 2.
4. Verify token works. Run the post-set smoke:
heroku run --app raxx-api-prod \
python -c "from api.services.postmark_client import test_postmark_token; test_postmark_token()"
Expected output: ✓ Postmark token valid server=<ServerName> id=<ServerID>
Any other output means the token is misconfigured — do not leave the runbook
until you see the ✓ line. Common failure modes:
- ✗ Postmark token missing: the config:set in step 2 did not take; try again.
- ✗ Postmark token rejected — HTTP 401: wrong token value; verify vault path.
- ✗ Network error: dyno cannot reach Postmark; check firewall / egress.
5. Repeat for raxx-api-staging (same steps, substitute raxx-api-staging).
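The smoke check in step 4 can be pictured roughly as follows. This is a minimal sketch, not the actual test_postmark_token in api.services.postmark_client, whose implementation is not shown in this runbook. It assumes the check calls Postmark's GET /server endpoint with the X-Postmark-Server-Token header; the helper names classify_response and check_token, and the "Unexpected response" line, are hypothetical.

```python
import json
import os
import urllib.error
import urllib.request

POSTMARK_SERVER_URL = "https://api.postmarkapp.com/server"


def classify_response(status, body):
    """Map an HTTP status and decoded JSON body to a runbook-style result line."""
    if status == 200:
        return f"✓ Postmark token valid server={body.get('Name')} id={body.get('ID')}"
    if status == 401:
        return "✗ Postmark token rejected — HTTP 401"
    # Hypothetical catch-all; the real helper may word this differently.
    return f"✗ Unexpected response: HTTP {status}"


def check_token(token=None):
    """Call GET /server with the token. The token itself is never printed."""
    if token is None:
        token = os.environ.get("POSTMARK_SERVER_TOKEN", "")
    if not token:
        return "✗ Postmark token missing"
    request = urllib.request.Request(
        POSTMARK_SERVER_URL,
        headers={"Accept": "application/json", "X-Postmark-Server-Token": token},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return classify_response(response.status, json.load(response))
    except urllib.error.HTTPError as exc:  # must precede URLError (its parent class)
        return classify_response(exc.code, {})
    except urllib.error.URLError:
        return "✗ Network error"
```

Keeping the classification separate from the network call means the three failure modes listed in step 4 can be exercised without reaching Postmark.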
How to tell it's broken
- Symptom 1: Slack alerts in TradeMasterAPI-Notify: "Postmark bounce alert — Bounce rate over last 1h: X% (N/M)"
- Symptom 2: Email delivery failures in FreeScout (support ticket emails not arriving at customers)
- Symptom 3: GET /api/_internal/postmark/recent-deliveries returns empty or shows a sustained bounce rate above 1%
- Symptom 4: DMARC aggregate report (to kris@moosequest.net) shows high dkim=fail or spf=fail rates
How to diagnose (in order)
1. Check Postmark dashboard: sign in at https://account.postmarkapp.com/ → select raxx.app server
- Activity tab: filter by Bounced. Any recent bounces?
- Suppressions: any addresses stuck in the suppression list?
2. Run diagnostic script:
export POSTMARK_SERVER_TOKEN=<token-from-vault>
python3 scripts/ops/postmark_bounce_check.py
3. Check Raptor delivery monitor status (post-launch, when enabled):
curl -H "CF-Access-Client-Id: $CF_ACCESS_CLIENT_ID" \
-H "CF-Access-Client-Secret: $CF_ACCESS_CLIENT_SECRET" \
"https://api.raxx.app/api/_internal/postmark/recent-deliveries"
4. Check alert path: the Postmark Slack webhook is configured in Postmark dashboard → Server → Settings → Notifications. If alerts are reaching Slack but the Raptor monitor shows nothing, Path A (Postmark native) is the source.
5. Check DNS authentication:
dig TXT _dmarc.raxx.app
dig TXT pm._domainkey.raxx.app
dig TXT google._domainkey.raxx.app
# All should return records
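A returned record is not necessarily a correct one, so it can help to read the DMARC TXT value rather than only confirm it exists. The following is a minimal sketch for interpreting the string that dig prints for _dmarc.raxx.app. The function names are hypothetical and the sample record in the usage note uses a placeholder address, not the real raxx.app record.

```python
def parse_dmarc(txt_record):
    """Split a DMARC TXT value such as 'v=DMARC1; p=quarantine' into a tag dict."""
    tags = {}
    # dig wraps TXT values in double quotes; strip them before splitting on ';'.
    for part in txt_record.strip().strip('"').split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_is_enforcing(txt_record):
    """True when the record is DMARC1 and the policy is quarantine or reject."""
    tags = parse_dmarc(txt_record)
    return tags.get("v") == "DMARC1" and tags.get("p") in {"quarantine", "reject"}
```

For example, parse_dmarc('"v=DMARC1; p=quarantine; rua=mailto:dmarc@example.com"') yields the tags v, p, and rua, and dmarc_is_enforcing returns True for that record.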
Known failure modes
Failure mode A: Low-denominator alert misfire (pre-launch / low-volume)
Symptom: Repeated Slack pings "Bounce rate 100.0% (1/1)", "50.0% (1/2)", "33.3% (1/3)"
Cause: One hard-bounce event (typically ops@raxx.app suppressed pre-provisioning) stays in
Postmark's trailing window while new successful deliveries increment the denominator.
Postmark fires a notification on each new threshold crossing. No dedup window.
Fix:
1. Clear the suppression list (see "Reactivate suppressed addresses" below)
2. Remove or pause the Postmark Slack notification webhook in the dashboard
3. (Code) Raptor delivery monitor now has a minimum-denominator floor — won't fire at N<10
Verification: No new Slack pings after clearing suppressions + sending a test email
Incident: docs/ops/incidents/2026-05-13-postmark-bounce-alert-misfire.md
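The arithmetic behind this misfire is easy to reproduce. The sketch below shows how a single hard bounce keeps the rate above a threshold as the denominator grows. It borrows the 1% bounce threshold from the Raptor alert table purely as an illustrative value; the threshold Postmark's native notifications use is not stated in this runbook, and the function names are hypothetical.

```python
def bounce_rate_series(bounces, max_total):
    """Bounce rate (%) at each denominator as successful deliveries accumulate."""
    return [
        (total, round(100 * bounces / total, 1))
        for total in range(bounces, max_total + 1)
    ]


def first_total_at_or_below(bounces, threshold_pct):
    """Smallest denominator at which the rate no longer exceeds the threshold."""
    total = bounces
    while 100 * bounces / total > threshold_pct:
        total += 1
    return total


print(bounce_rate_series(1, 3))         # [(1, 100.0), (2, 50.0), (3, 33.3)]
print(first_total_at_or_below(1, 1.0))  # 100
```

With one bounce in the window, the rate stays above 1% until 100 messages have been sent, which is why low-volume periods need a minimum-denominator floor rather than a rate threshold alone.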
Failure mode B: ops@raxx.app hard-bounce (address suppressed)
Symptom: Every automated email to ops@raxx.app generates a bounce notification
Cause: ops@raxx.app was emailed before the Google Group was provisioned (pre-2026-05-06),
resulting in a hard-bounce record in Postmark's suppression list
Fix:
export POSTMARK_SERVER_TOKEN=<token>
python3 scripts/ops/postmark_bounce_check.py --reactivate ops@raxx.app
# Repeat for billing@raxx.app and no-reply@raxx.app if suppressed
Verification: python3 scripts/ops/postmark_bounce_check.py --suppressions-only shows no ops@ entry
Failure mode C: DKIM signing not active in Google Workspace
Symptom: Sends from Google-signed @raxx.app addresses show dkim=fail in headers;
DMARC failures at p=quarantine cause some receiving servers to bounce mail
Cause: google._domainkey.raxx.app DNS record is live but "Start authentication" not clicked
in Workspace Admin Console
Fix:
1. https://admin.google.com → Apps → Google Workspace → Gmail → Authenticate email
2. Select raxx.app → confirm status is "Authenticating email" (green)
3. If not, click "Start authentication"
Verification: Send test email from no-reply@raxx.app → check headers for dkim=pass
Incident: docs/ops/sre-reports/2026-05-07-postmark-bounce-alerts.md
Failure mode D: Postmark account suspended / deactivated
Symptom: All sends fail with 401 or 422 from Postmark API; Activity tab shows "Server inactive"
Cause: Account spam ratio exceeded Postmark's platform threshold; account reviewed and restricted
Fix: Contact Postmark support at https://account.postmarkapp.com/support
Verification: Test send succeeds; server status shows "Active"
Failure mode E: CF Access blocks Postmark delivery webhook
Symptom: FLAG_POSTMARK_DELIVERY_MONITOR=1 is set but no events appear in recent-deliveries;
Postmark delivery webhook Activity shows "Failed" with HTTP 302
Cause: /webhooks/postmark/delivery is behind Cloudflare Access; Postmark's IPs cannot authenticate
Fix: Add CF Access bypass rule for Postmark IPs on the /webhooks/postmark/delivery path:
- CF Zero Trust → Access → Applications → api.raxx.app → add Policy: Bypass for Postmark IP ranges
- Postmark IP list: https://postmarkapp.com/support/article/800-ips-for-postmark-servers
Verification: POST https://api.raxx.app/webhooks/postmark/delivery with valid token returns 200
Reactivate suppressed addresses
export POSTMARK_SERVER_TOKEN=<token-from-vault>
# See the full suppression list
python3 scripts/ops/postmark_bounce_check.py --suppressions-only
# Reactivate specific addresses
python3 scripts/ops/postmark_bounce_check.py --reactivate ops@raxx.app
python3 scripts/ops/postmark_bounce_check.py --reactivate billing@raxx.app
python3 scripts/ops/postmark_bounce_check.py --reactivate no-reply@raxx.app
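# Or via the Postmark Suppressions API directly ("outbound" is Postmark's default
# transactional stream ID; substitute the stream that holds the suppression):
curl -s -X POST \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-H "X-Postmark-Server-Token: $POSTMARK_SERVER_TOKEN" \
"https://api.postmarkapp.com/message-streams/outbound/suppressions/delete" \
-d '{"Suppressions": [{"EmailAddress": "ops@raxx.app"}]}'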
Enable Raptor delivery monitor (post-launch)
Prerequisites (all must be true before enabling):
1. CF Access bypass in place for Postmark IP ranges on /webhooks/postmark/delivery path
2. POSTMARK_SERVER_TOKEN set on raxx-api-prod
3. POSTMARK_DELIVERY_WEBHOOK_SECRET set on raxx-api-prod
4. Postmark dashboard → Delivery webhook URL configured to https://api.raxx.app/webhooks/postmark/delivery
5. Postmark Slack native notification webhook removed (or it will double-alert)
# Set the secret first so the monitor is never enabled without it (prerequisite 3):
heroku config:set POSTMARK_DELIVERY_WEBHOOK_SECRET="<value-from-vault>" --app raxx-api-prod >/dev/null 2>&1
heroku config:set FLAG_POSTMARK_DELIVERY_MONITOR=1 --app raxx-api-prod >/dev/null 2>&1
Verify:
curl -H "X-Postmark-Webhook-Token: $POSTMARK_DELIVERY_WEBHOOK_SECRET" \
-H "Content-Type: application/json" \
-d '{"RecordType":"Delivery","MessageID":"test-001","Recipient":"kris@moosequest.net"}' \
https://api.raxx.app/webhooks/postmark/delivery
# Expect: {"ok": true, "event_type": "Delivery", ...}
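On the receiving side, the webhook handler presumably compares the X-Postmark-Webhook-Token header against POSTMARK_DELIVERY_WEBHOOK_SECRET. Raptor's actual handler is not shown in this runbook, so the following is only a minimal sketch under that assumption. The function name is hypothetical, and headers is treated as a plain dict with the exact header name used in the curl command above.

```python
import hmac


def webhook_token_is_valid(headers, expected_secret):
    """Constant-time comparison of the webhook token header with the secret."""
    if not expected_secret:
        # An unset secret must never match a missing or empty header.
        return False
    supplied = headers.get("X-Postmark-Webhook-Token", "")
    return hmac.compare_digest(supplied.encode(), expected_secret.encode())
```

Using hmac.compare_digest instead of == avoids leaking how many leading characters matched through response timing, and rejecting an empty secret guards against the case where the config var was never set.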
Alert threshold reference (Raptor in-process monitor)
| Alert | Threshold | Window | Minimum denominator | Suppression |
|---|---|---|---|---|
| Bounce rate | >1% | 1h | 10 (configurable) | 60 min in-memory |
| Spam complaint rate | >0.1% | 24h | 25 (configurable) | 60 min in-memory |
Override minimum denominators without redeploy:
heroku config:set POSTMARK_ALERT_MIN_DENOMINATOR_BOUNCE=50 --app raxx-api-prod >/dev/null 2>&1
heroku config:set POSTMARK_ALERT_MIN_DENOMINATOR_SPAM=100 --app raxx-api-prod >/dev/null 2>&1
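How the threshold, minimum denominator, and suppression window interact can be sketched as below. This is an illustration of the rules in the table, not the actual _check_alert_thresholds() code, which is not shown here. The function names and the in-memory dict are hypothetical; the environment variable names and defaults are taken from this section.

```python
import os
import time

SUPPRESSION_SECONDS = 60 * 60  # 60-minute in-memory suppression, per the table
_last_alert_at = {}  # alert name -> timestamp of the last alert that fired


def min_denominator(env_var, default):
    """Read a floor from the environment, falling back to the table default."""
    return int(os.environ.get(env_var, default))


def should_alert(name, events, total, threshold_pct, floor, now=None):
    """Fire only above the threshold, at or above the floor, and outside suppression."""
    now = time.monotonic() if now is None else now
    if total < floor:
        return False  # too few messages for the rate to be meaningful
    if 100 * events / total <= threshold_pct:
        return False  # rate is at or below the threshold
    last = _last_alert_at.get(name)
    if last is not None and now - last < SUPPRESSION_SECONDS:
        return False  # an alert already fired within the suppression window
    _last_alert_at[name] = now
    return True
```

With the default bounce floor of 10, a 1/3 bounce rate never alerts, while 2/100 alerts once and then stays quiet for the next 60 minutes. Because the suppression state lives in process memory, a dyno restart would reset it.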
Emergency stop
To stop all Postmark-originated Slack pings immediately:
Option A — Remove the Postmark Slack webhook (recommended):
1. https://account.postmarkapp.com/ → raxx.app server → Settings → Notifications
2. Remove the Slack webhook entry
Option B — Disable Raptor delivery monitor (if it's active):
heroku config:set FLAG_POSTMARK_DELIVERY_MONITOR=0 --app raxx-api-prod >/dev/null 2>&1
Escalation
Wake the operator when:
- Spam complaint is from an external (non-raxx.app) address — sender reputation at risk
- Postmark account is suspended or restricted
- Hard-bounce rate exceeds 5% with a denominator above 100
- Any email to the operator's personal address (kris@moosequest.net) bounces
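The escalation conditions above can be written as a single predicate, which may be useful if the decision is ever automated. This is a hypothetical sketch: the function and parameter names are illustrative and do not correspond to existing Raptor code.

```python
def should_wake_operator(*, external_spam_complaint=False, account_restricted=False,
                         hard_bounces=0, total_sent=0, operator_address_bounced=False):
    """True when any escalation condition from the list above holds."""
    # The bounce condition needs both a rate above 5% and a denominator above 100.
    high_bounce_rate = total_sent > 100 and 100 * hard_bounces / total_sent > 5
    return (external_spam_complaint or account_restricted
            or high_bounce_rate or operator_address_bounced)
```

For example, 6 hard bounces out of 101 sends escalates, while 1 out of 3 does not because the denominator is too small to trust.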