Date: 2026-05-07

Prepared by: sre-agent

Symptom: Consistent Postmark bounce + open-complaint alerts appearing in the TradeMasterAPI-Notify Slack channel.

Severity classification: SEV-3 (no active mail outage; alerts are accurate but their root causes are remediable; no user data loss)
Three distinct root causes are contributing to the alerts. They are independent but compound each other. The most likely dominant cause is the Postmark account sandbox state, with DKIM misalignment as a contributing amplifier prior to 2026-05-06.
| # | Root Cause | Confidence | Urgency |
|---|---|---|---|
| 1 | Postmark account still in sandbox: unapproved sends trigger automatic bounce/complaint signals | HIGH | High: act now |
| 2 | Google google._domainkey.raxx.app DKIM added 2026-05-06 but DKIM signing not yet activated in Google Workspace | HIGH | High: complete #1210 |
| 3 | Delivery webhook endpoint is blocked by CF Access: Postmark cannot POST bounce/spam events to Raptor | CONFIRMED | Medium: blocks internal monitoring |
The Slack alerts are not coming from the Raptor delivery monitor (postmark_delivery.py). They are coming directly from Postmark's own notification system, routed to the TradeMasterAPI-Notify channel via a Postmark-configured Slack incoming webhook or email notification rule in the Postmark dashboard.
Evidence:
- FLAG_POSTMARK_DELIVERY_MONITOR is not set in raxx-api-prod config vars. The flag defaults to false, so the application answers any request that reaches /webhooks/postmark/delivery with HTTP 404.
- POSTMARK_DELIVERY_WEBHOOK_SECRET is also absent from raxx-api-prod config vars.
- A test POST to https://api.raxx.app/webhooks/postmark/delivery returns HTTP 302 → Cloudflare Access login. Even if the flag were on, Postmark could not reach this endpoint, because CF Access requires authentication that Postmark's delivery webhooks do not carry.
- SLACK_BOT_TOKEN is absent from raxx-api-prod config vars — the in-process Slack DM path in _post_slack_dm() would be a no-op even if the flag were enabled.
Therefore: Postmark's own notification channel (configured directly in the Postmark dashboard under Server → Settings → Notifications) is what is reaching Slack. The operator connected a Slack webhook to TradeMasterAPI-Notify in the Postmark dashboard at some point. Postmark sends these natively, bypassing Raptor entirely.
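The evidence above separates two layers that can answer a probe of the webhook URL: Cloudflare Access at the edge (HTTP 302 to a login page) and the application itself (HTTP 404 while the flag is off). The following is a minimal sketch of that classification, not code from Raptor. The function name, the return labels, and the redirect check (a `cloudflareaccess.com` host or a `/cdn-cgi/access/` path) are illustrative assumptions.

```python
from typing import Optional
from urllib.parse import urlparse


def classify_webhook_probe(status: int, location: Optional[str] = None) -> str:
    """Map a probe response to the layer that most likely answered it."""
    if status in (301, 302, 303, 307, 308) and location:
        parsed = urlparse(location)
        host = parsed.hostname or ""
        # Assumption: CF Access login redirects point at *.cloudflareaccess.com
        # or at a /cdn-cgi/access/ path on the protected host.
        if host.endswith("cloudflareaccess.com") or parsed.path.startswith("/cdn-cgi/access/"):
            return "blocked-by-cf-access"
        return "redirected-elsewhere"
    if status == 404:
        # The request reached the app, but the handler is disabled (flag off).
        return "app-reached-flag-off"
    if 200 <= status < 300:
        return "app-reached-handler-active"
    return "unclassified"
```

A probe that yields `blocked-by-cf-access` means the request never reached the application, so the flag state cannot be inferred from that response alone.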
The Postmark account is awaiting review (operator replied with description email; 1-2 business day SLA). While in sandbox, Postmark restricts outbound to verified recipients only. Attempts to send to unverified addresses result in bounce events. Postmark's sandbox bounce behavior triggers its own notification pipeline, which feeds the Slack channel.
Every transactional send attempt that hits a non-whitelisted address in sandbox mode generates a HardBounce record. If the account sent to ops@raxx.app, billing@raxx.app, or no-reply@raxx.app before those Google Groups were provisioned, those are permanent hard bounce records now. Postmark blocks all future sends to hard-bounced addresses automatically.
Timeline of email authentication state:
- 2026-04-30: Return-Path CNAME (pm-bounces.raxx.app → pm.mtasv.net) verified.
- 2026-05-05 04:27 UTC: SRE agent added pm._domainkey.raxx.app canonical DKIM record (fixing a prior NXDOMAIN). Postmark DKIM was already verified against the date-stamped selector 20260430051323pm._domainkey.raxx.app, which was always present and valid.
- 2026-05-06: Operator added google._domainkey.raxx.app DKIM record for Google Workspace signing. DNS confirms the record is now live (v=DKIM1; k=rsa; p=MIIBIjAN...).
Current DNS state (verified 2026-05-07 via Google DNS-over-HTTPS):
| Record | Status |
|---|---|
| TXT raxx.app (SPF): v=spf1 include:_spf.google.com include:spf.mtasv.net ~all | LIVE |
| TXT google._domainkey.raxx.app (DKIM1 RSA public key) | LIVE (added 2026-05-06) |
| TXT pm._domainkey.raxx.app (Postmark DKIM public key) | LIVE (added 2026-05-05) |
| TXT _dmarc.raxx.app: v=DMARC1; p=quarantine; rua=mailto:kris@moosequest.net; fo=1 | LIVE |
| CNAME pm-bounces.raxx.app → pm.mtasv.net | LIVE |
| MX (Google Workspace servers) | LIVE |
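The DNS state in the table can be re-checked with Google's DNS-over-HTTPS JSON endpoint. The sketch below is illustrative rather than the exact procedure used for this report: the helper names are hypothetical, and it assumes the JSON shape returned by `https://dns.google/resolve` (a `Status` code and an `Answer` list whose TXT entries have `type` 16).

```python
import json
import urllib.parse
import urllib.request

DOH_URL = "https://dns.google/resolve"


def txt_records(doh_response: dict) -> list:
    """Extract TXT strings from a DNS-over-HTTPS JSON response."""
    if doh_response.get("Status") != 0:  # 0 = NOERROR; 3 would be NXDOMAIN
        return []
    return [
        # Outer quotes are stripped; long records split into several quoted
        # chunks are returned as-is and may need joining by the caller.
        answer["data"].strip('"')
        for answer in doh_response.get("Answer", [])
        if answer.get("type") == 16  # 16 = TXT
    ]


def fetch_txt(name: str) -> list:
    """Query dns.google for TXT records (performs a network request)."""
    query = urllib.parse.urlencode({"name": name, "type": "TXT"})
    with urllib.request.urlopen(f"{DOH_URL}?{query}", timeout=10) as resp:
        return txt_records(json.load(resp))
```

For example, `fetch_txt("_dmarc.raxx.app")` should return the DMARC policy string if the record is still published.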
The Google DKIM record is in DNS. However, issue #1210 acceptance criteria include "DKIM signing enabled in Google Workspace" (step 6 of the operator checklist). If the operator added the TXT record but did not click "Start authentication" in the Workspace Admin console, Google is publishing a key but not signing with it. Mail from @raxx.app Google-sent addresses would have no DKIM-Signature header, causing DMARC failures under p=quarantine policy.
DMARC policy is p=quarantine, not p=none. Failed DMARC alignment causes receiving servers to quarantine (route to spam) or reject. Rejected mail generates bounce reports back to Postmark. Those reports feed Postmark's notification pipeline, which hits Slack.
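To make the DMARC reasoning concrete, here is a small sketch of how a receiver arrives at a disposition. It is a simplification under stated assumptions: alignment is approximated as "same domain or a subdomain of the From domain" (a full implementation compares organizational domains using the Public Suffix List), and the class and function names are hypothetical. Note that under DMARC an aligned SPF pass is sufficient on its own, so unsigned mail fails only when SPF also fails or is unaligned, for example after forwarding.

```python
from dataclasses import dataclass


@dataclass
class AuthResult:
    spf_pass: bool
    spf_domain: str   # envelope (Return-Path) domain
    dkim_pass: bool
    dkim_domain: str  # d= value of the signature, "" if unsigned


def _aligned(auth_domain: str, from_domain: str) -> bool:
    # Conservative approximation of relaxed alignment: the authenticated
    # domain equals the From domain or is a subdomain of it.
    a, f = auth_domain.lower(), from_domain.lower()
    return bool(a) and (a == f or a.endswith("." + f))


def dmarc_disposition(result: AuthResult, from_domain: str, policy: str) -> str:
    """Return "pass" or the action implied by the published policy."""
    spf_ok = result.spf_pass and _aligned(result.spf_domain, from_domain)
    dkim_ok = result.dkim_pass and _aligned(result.dkim_domain, from_domain)
    if spf_ok or dkim_ok:
        return "pass"
    actions = {"none": "deliver", "quarantine": "quarantine", "reject": "reject"}
    return actions.get(policy, "deliver")
```

With the current p=quarantine policy, an unsigned message whose SPF check also fails evaluates to `quarantine`, while Postmark mail with an aligned `pm-bounces.raxx.app` Return-Path evaluates to `pass`.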
ops@raxx.app hard bounce candidate

Issue #1212 comments confirm that ops@, billing@, and no-reply@ on raxx.app were provisioned as Google Groups on 2026-05-06. Before that date, these addresses did not accept mail. If any automated process (FreeScout ticket response, Raptor alert, or Velvet test) sent to ops@raxx.app before 2026-05-06, Postmark recorded a HardBounce. Hard bounces are sticky: Postmark suppresses all future sends to a hard-bounced address and notifies via the configured notification channel each time a suppressed send is attempted.
This is the most operationally urgent scenario: if a high-frequency automated sender (e.g., a status poller alert, a billing notification, or FreeScout reply-to header) is pointed at ops@raxx.app, it will produce a Slack alert on every attempted send until the bounce is manually cleared in the Postmark dashboard.
The Raptor delivery webhook is not active (FLAG_POSTMARK_DELIVERY_MONITOR is off; endpoint is also blocked by CF Access). The GET /api/_internal/postmark/recent-deliveries endpoint is therefore unavailable. The Postmark API itself requires POSTMARK_SERVER_TOKEN to query outbound message events programmatically.
Operator must query the Postmark dashboard directly to see the bounce list. Steps:

- Log in at https://account.postmarkapp.com/ and open the bounce list for the raxx.app server.
- Check the Type column: HardBounce is permanent; Transient is auto-retried; SpamComplaint is permanent.

Alternatively, using the Postmark Server API (requires POSTMARK_SERVER_TOKEN from vault at /MooseQuest/postmark/POSTMARK_SERVER_API_KEY):

    # List recent bounces (redact token before sharing output)
    curl -s \
      -H "Accept: application/json" \
      -H "X-Postmark-Server-Token: $POSTMARK_SERVER_TOKEN" \
      "https://api.postmarkapp.com/bounces?count=50&offset=0" \
      | python3 -m json.tool | grep -E '"Email"|"Type"|"Description"|"BouncedAt"'

    # Get bounce counts by type (delivery stats endpoint)
    curl -s \
      -H "Accept: application/json" \
      -H "X-Postmark-Server-Token: $POSTMARK_SERVER_TOKEN" \
      "https://api.postmarkapp.com/deliverystats" \
      | python3 -m json.tool

Do not echo the token value in any terminal with screen sharing active.
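Once the bounce list has been saved as JSON, a short script can show which bounce types and recipients dominate. This is a sketch only: the function name is hypothetical, and it assumes the response body has a `Bounces` list whose entries carry `Type` and `Email` fields plus a top-level `TotalCount`.

```python
from collections import Counter


def summarize_bounces(payload: dict) -> dict:
    """Tally bounces by type and flag recipients that bounced more than once."""
    bounces = payload.get("Bounces", [])
    by_type = Counter(b.get("Type", "Unknown") for b in bounces)
    # Lower-case addresses so OPS@ and ops@ count as the same recipient.
    by_email = Counter(b.get("Email", "").lower() for b in bounces)
    return {
        "total": payload.get("TotalCount", len(bounces)),
        "by_type": dict(by_type),
        "repeat_recipients": {e: n for e, n in by_email.items() if n > 1},
    }
```

Recipients listed under `repeat_recipients` are the likeliest targets of a recurring automated sender.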
The relay path is:

    Postmark event (Bounce/SpamComplaint)
      → Postmark internal notification engine
      → Postmark dashboard: Server → Settings → Notifications → Slack webhook URL
      → TradeMasterAPI-Notify Slack channel

This is entirely within Postmark's infrastructure. The Raptor delivery webhook (/webhooks/postmark/delivery) is a separate, not-yet-active parallel path that was designed to provide in-app visibility. It is NOT what is generating the Slack alerts the operator is seeing.
The Postmark-direct Slack integration was presumably configured when the operator set up the Postmark server for raxx.app. It is working correctly as designed — it alerts on every bounce and spam complaint. The alerts are accurate; the problem is that there are too many genuine bounces happening.
Primary hypothesis (confidence: HIGH): The Postmark account is in sandbox mode awaiting approval. Sandbox-mode bounces are generated for sends to non-whitelisted recipients and produce Postmark notification events. This is the most likely source of high-frequency, persistent alerts.
Secondary hypothesis (confidence: HIGH): ops@raxx.app (and possibly billing@raxx.app) received at least one send attempt before the Google Groups were provisioned on 2026-05-06. Those addresses hard-bounced. Postmark's suppression list is blocking all future sends to them and firing a notification on each retry attempt. If any recurring job is pointed at these addresses, this produces one alert per job execution.
Tertiary hypothesis (confidence: MEDIUM): DKIM signing was not fully activated in Google Workspace after the google._domainkey.raxx.app record was added on 2026-05-06. If so, mail sent through Google's infrastructure carries no valid DKIM signature and can fail DMARC alignment against the p=quarantine policy, causing some receiving servers to reject or bounce the messages. This would produce SpamComplaint or Bounce events for any mail sent from @raxx.app Google addresses.
Complete these in order. Each step can be done independently but they address different contributing causes.
The Postmark approval email was replied to on 2026-05-06. Postmark's SLA is 1-2 business days. If approval has not arrived by 2026-05-09 09:00 UTC, follow up via Postmark support chat at https://account.postmarkapp.com/support.
Verification: In the Postmark dashboard, the server badge should change from "Test" or "Sandbox" to "Active". The bounce volume should drop immediately after approval because legitimate sends to real addresses will no longer be sandbox-rejected.
ops@ and billing@

Now that the Google Groups exist, Postmark's suppression records for these addresses are stale. Reactivate them:

- Open https://account.postmarkapp.com/ → select the raxx.app server
- ops@raxx.app: if present, click Reactivate
- billing@raxx.app: if present, click Reactivate
- no-reply@raxx.app: if present, click Reactivate

Or via API (requires POSTMARK_SERVER_TOKEN from vault):

    # Reactivate a bounced address by deleting its suppression.
    # Assumes the default transactional stream ID "outbound"; replace the address as needed.
    curl -s -X POST \
      -H "Accept: application/json" \
      -H "Content-Type: application/json" \
      -H "X-Postmark-Server-Token: $POSTMARK_SERVER_TOKEN" \
      "https://api.postmarkapp.com/message-streams/outbound/suppressions/delete" \
      -d '{"Suppressions": [{"EmailAddress": "ops@raxx.app"}]}'
    # Expect a Suppressions entry for ops@raxx.app with "Status": "Deleted"

Repeat for billing@raxx.app and no-reply@raxx.app.
After reactivation, verify the suppression list no longer contains these addresses before sending test mail.
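The verification step can be scripted against a saved copy of the suppression list. The sketch below assumes the JSON shape of Postmark's suppression dump (a `Suppressions` list with `EmailAddress` and `SuppressionReason` fields); the function name is hypothetical.

```python
def still_suppressed(dump: dict, addresses: list) -> dict:
    """Return {address: reason} for each address still on the suppression list."""
    suppressed = {
        entry["EmailAddress"].lower(): entry.get("SuppressionReason", "")
        for entry in dump.get("Suppressions", [])
    }
    return {
        address: suppressed[address.lower()]
        for address in addresses
        if address.lower() in suppressed
    }
```

An empty result for ops@raxx.app, billing@raxx.app, and no-reply@raxx.app indicates it is safe to send test mail.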
The DNS record google._domainkey.raxx.app is live, but activation in the Workspace Admin console must be confirmed:

- Open https://admin.google.com
- In the DKIM authentication settings, select raxx.app and confirm that authentication has been started (click "Start authentication" if it has not)
- Send a test message from an @raxx.app Google-signed address to kris@moosequest.net
- Open the received message and inspect the Authentication-Results header
- Confirm dkim=pass and spf=pass are present

If dkim=fail appears in the header, the key in DNS does not match what Google is attempting to sign with. In that case, regenerate the DKIM key in Workspace Admin (same step), copy the new TXT value, update the Cloudflare DNS record for google._domainkey.raxx.app, and start authentication again.
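The header check can be done mechanically on a copied Authentication-Results value. The following is a rough sketch with hypothetical function names; it only extracts `dkim=`, `spf=`, and `dmarc=` verdicts with a regular expression and does not implement the full header grammar.

```python
import re

# Matches method=verdict pairs such as "dkim=pass" or "spf=softfail".
_RESULT = re.compile(r"\b(dkim|spf|dmarc)=([a-z]+)", re.IGNORECASE)


def auth_results(header_value: str) -> dict:
    """Collect every verdict per method; a header may carry several dkim= entries."""
    results = {}
    for method, verdict in _RESULT.findall(header_value):
        results.setdefault(method.lower(), []).append(verdict.lower())
    return results


def signing_ok(header_value: str) -> bool:
    """True when at least one DKIM verdict and one SPF verdict are "pass"."""
    results = auth_results(header_value)
    return "pass" in results.get("dkim", []) and "pass" in results.get("spf", [])
```

Paste the header value from the test message into `signing_ok(...)`; a `False` result means either DKIM or SPF did not pass and the steps above need another look.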
With DMARC at p=quarantine, misaligned DKIM causes quarantine, not hard bounce. However, some receiving servers may be configured to reject quarantined mail, which comes back as a bounce to Postmark. After completing R3:

- Send a test message from no-reply@raxx.app (Google-signed) to kris@moosequest.net
- Check the received headers for dkim=pass (google.com: domain of no-reply@raxx.app)
- If the header shows dkim=pass, the DMARC alignment issue is resolved

The /webhooks/postmark/delivery endpoint at api.raxx.app is behind CF Access, which returns HTTP 302 to Postmark's delivery webhook POST requests. Postmark cannot authenticate against CF Access. This means the Raptor-side delivery monitor will never receive events even after FLAG_POSTMARK_DELIVERY_MONITOR=1 is set.
This is a pre-existing structural issue separate from the alert storm, but it must be resolved before the delivery monitor can go live.
Two options:
Option A (recommended): Add a CF Access bypass rule for the webhook path
In Cloudflare Zero Trust → Access → Applications → find the api.raxx.app application → edit → add a policy bypass for path /webhooks/postmark/delivery:
- Policy type: Bypass
- Selector: IP ranges → add Postmark's sending IP ranges (documented at https://postmarkapp.com/support/article/800-ips-for-postmark-servers)
This allows Postmark's known IP ranges to bypass CF Access for the webhook path only.
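The intended effect of the bypass rule can be expressed as a small predicate: a request skips CF Access only if it targets the webhook path and originates from an allowed range. This sketch illustrates the rule's logic, not Cloudflare's implementation. The CIDR values are placeholders from the RFC 5737 documentation blocks and must be replaced with the ranges Postmark publishes.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks); NOT Postmark's real IPs.
ALLOWED_RANGES = ["203.0.113.0/28", "198.51.100.17/32"]
WEBHOOK_PATH = "/webhooks/postmark/delivery"


def bypasses_access(source_ip: str, path: str, ranges=ALLOWED_RANGES) -> bool:
    """True if the request would match the path-scoped IP bypass rule."""
    if path != WEBHOOK_PATH:
        # The bypass applies to the webhook path only; everything else
        # stays behind CF Access.
        return False
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in ranges)
```

Scoping the rule to both path and source range keeps the rest of api.raxx.app protected even if one of the listed addresses is spoofed or reassigned.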
Option B: Move the webhook to a separate subdomain not behind CF Access
Create webhooks.raxx.app pointing to the same Heroku app but excluded from CF Access. This is a larger change and is out of scope until the delivery monitor flag is enabled.
Operator action: implement Option A after sandbox exit (R1) and bounce clearance (R2) are done. Then:

    # Enable the delivery webhook after CF Access bypass is in place
    # (output is discarded so the secret is not echoed to the terminal)
    heroku config:set FLAG_POSTMARK_DELIVERY_MONITOR=1 --app raxx-api-prod >/dev/null 2>&1
    heroku config:set POSTMARK_DELIVERY_WEBHOOK_SECRET="<value-from-vault>" --app raxx-api-prod >/dev/null 2>&1

Vault path for the secret: /MooseQuest/postmark/POSTMARK_DELIVERY_WEBHOOK_SECRET
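While R1–R4 are in progress, the Slack alerts will continue. One option, tracked as action item 6, is to silence the Postmark-direct Slack webhook for 48h during the remediation window.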
DMARC aggregate reports are sent to kris@moosequest.net (per rua=mailto:kris@moosequest.net in the DMARC record). Reports arrive from Google's aggregate reporter (usually within 24h of the reporting day). Review the XML reports, which list DKIM and SPF results per sending source IP.
If DKIM was misaligned before 2026-05-06 (which it was — google._domainkey was not in DNS, and Workspace signing may not have been active), the reports for 2026-05-05 and 2026-05-06 will show dkim=fail entries. Those reports confirm the contribution of DKIM misalignment to the bounce volume.
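The aggregate reports can be scanned for dkim=fail rows with a few lines of code. The sketch below is a rough illustration with a hypothetical function name. It assumes the common un-namespaced aggregate report layout (`record` → `row` → `source_ip`, `count`, `policy_evaluated/dkim`); reports that declare an XML namespace would need the tag names adjusted.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def dkim_failures_by_source(report_xml: str) -> dict:
    """Sum message counts per source IP where the evaluated DKIM result is fail."""
    root = ET.fromstring(report_xml)
    failures = Counter()
    for record in root.iter("record"):
        row = record.find("row")
        if row is None:
            continue
        dkim = row.findtext("policy_evaluated/dkim", default="")
        if dkim.strip().lower() == "fail":
            count = int(row.findtext("count", default="0"))
            failures[row.findtext("source_ip", default="unknown")] += count
    return dict(failures)
```

Running this over the reports for 2026-05-05 and 2026-05-06 would show which sending sources account for the DKIM failures.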
| # | Action | Owner | Due | Blocks |
|---|---|---|---|---|
| 1 | Follow up on Postmark sandbox-exit approval if not received by 2026-05-09 09:00 UTC | Kristerpher | 2026-05-09 | Bounce volume reduction |
| 2 | Clear hard bounce suppressions for ops@, billing@, no-reply@ in Postmark dashboard | Kristerpher | 2026-05-08 | Recurring alert suppression |
| 3 | Confirm Google Workspace DKIM signing is active (not just DNS record added); close #1210 | Kristerpher | 2026-05-08 | DMARC alignment |
| 4 | Add CF Access bypass rule for /webhooks/postmark/delivery path (Postmark IPs) | Kristerpher | 2026-05-10 | Enables Raptor delivery monitor |
| 5 | Enable FLAG_POSTMARK_DELIVERY_MONITOR + set POSTMARK_DELIVERY_WEBHOOK_SECRET after R5 | Kristerpher | After #4 | In-app bounce observability |
| 6 | Decide: silence Postmark-direct Slack webhook for 48h during remediation window | Kristerpher | Immediate | Alert fatigue |
| Item | State |
|---|---|
| Postmark account sandbox | Active (awaiting approval) |
| Postmark DKIM pm._domainkey.raxx.app | LIVE (verified in Postmark) |
| Google DKIM google._domainkey.raxx.app | DNS record LIVE; signing activation status unknown |
| SPF raxx.app | LIVE (include:_spf.google.com include:spf.mtasv.net ~all) |
| DMARC _dmarc.raxx.app | LIVE (p=quarantine; rua=kris@moosequest.net) |
| Return-Path CNAME pm-bounces.raxx.app | LIVE (pm.mtasv.net) |
| ops@raxx.app Google Group | Provisioned 2026-05-06 |
| billing@raxx.app Google Group | Provisioned 2026-05-06 |
| no-reply@raxx.app Google Group | Provisioned 2026-05-06 |
| FLAG_POSTMARK_DELIVERY_MONITOR on raxx-api-prod | OFF (not set) |
| POSTMARK_DELIVERY_WEBHOOK_SECRET on raxx-api-prod | NOT SET |
| CF Access on /webhooks/postmark/delivery | BLOCKING (returns 302 to Postmark) |
| Postmark → Slack notification channel | ACTIVE (direct Postmark dashboard integration) |
Related documents:

- docs/ops/runbooks/freescout-postmark-relay.md
- docs/ops/runbooks/rotation/postmark-server-token.md
- docs/ops/triage/2026-05-05-velvet-first-deploy.md (Addendum: DKIM TXT record remediation #1144)

Confirmed correct, no changes needed:

- pm._domainkey.raxx.app Postmark DKIM record: confirmed live and correct (2026-05-05 fix, issue #1144 closed).
- SPF record: includes Google (_spf.google.com) and Postmark (spf.mtasv.net); no changes needed.
- p=quarantine is correct for this stage; do not change to p=reject until DKIM is fully verified across all sending paths.
- pm-bounces.raxx.app Return-Path CNAME: live and correct.