ADR-0074: Email Delivery v1 — Hybrid Architecture (Postmark + SNS/SQS/Lambda)
Status: Accepted
Date: 2026-05-11 UTC
Supersedes / amends: ADR-0072 (v1 implementation strategy only; ADR-0072's target-state topology is unchanged)
Refs: #1657, docs/architecture/durable-email-delivery.md
Context
ADR-0072 (merged 2026-05-11 UTC, PR #1663) adopted SNS FIFO + SQS FIFO + DLQ at both layers + Lambda as the durable email primitive. It specified SES as both the inbound parser (SES receipt rules) and the outbound delivery agent (ses:SendEmail). Sub-cards #1664–#1672 were filed to execute the migration.
The operator raised a follow-up question (2026-05-11 UTC):
"If we wanted to keep Postmark in the loop — how would that work. Would that save us some work here?"
This ADR evaluates two architectures:
Architecture A (Hybrid): Keep Postmark as the delivery agent (inbound webhook parser + outbound API sender). Add the SNS/SQS/DLQ durability primitive as originally designed. Postmark is the leaf node; the durability layer wraps it.
Architecture B (Full SES): The ADR-0072 target state. SES handles receipt rules (inbound) and ses:SendEmail (outbound). Postmark is fully retired.
The honest frame: SNS/SQS/DLQ/Lambda is the durability answer regardless. SES vs Postmark is a separate question about which delivery agent sits at the leaf. Both are valid.
Decision
Adopt Architecture A (Hybrid) for v1.
Keep Postmark as the delivery leaf. Build the SNS/SQS/DLQ/Lambda durability layer around it. Defer the SES migration to post-v1. Reserve SES as the future upgrade path; do not close the door.
This is not a reversal of ADR-0072's durability architecture — it is a v1 implementation scope decision. ADR-0072's topology (SNS FIFO → SQS FIFO → DLQ at both layers → Lambda) ships in v1 exactly as specified. Only the leaf node changes.
Architecture A (Hybrid) — Topology
Inbound
Customer email
  → MX record → Google Workspace (moosequest.net / raxx.app MX)
  → forward rule → Postmark Inbound Processing
  → Postmark POSTs webhook → API Gateway endpoint (HTTPS)
  → API Gateway: validate Postmark-Token signature header
  → publish to SNS FIFO: raxx-email-inbound.fifo
      ├── SNS DLQ: raxx-email-inbound-sns-dlq.fifo
      └── SQS FIFO: raxx-email-inbound-bridge.fifo
            ├── SQS DLQ: raxx-email-inbound-bridge-dlq.fifo
            └── Lambda: raxx-email-inbound-bridge
                  └── FreeScout API POST /api/conversations
Outbound
Raptor / Console / FreeScout publisher
  → sns:Publish → SNS FIFO: raxx-email-outbound.fifo
      ├── SNS DLQ: raxx-email-outbound-sns-dlq.fifo
      └── SQS FIFO: raxx-email-outbound-send.fifo
            ├── SQS DLQ: raxx-email-outbound-send-dlq.fifo
            └── Lambda: raxx-email-outbound-sender
                  → Postmark API: POST /email (SendEmail)
                  → customer
Sub-Card Delta — What Changes, What Survives, What Gets Cut
SC-E1 (#1664): SES domain verification + DKIM + sandbox-out request
Status: CUT
Rationale: Postmark's domain verification (raxx.app), DKIM records, and sender signatures are already done (2026-05-09 UTC, project_postmark_approved.md). Postmark is already out of sandbox. The 2026-05-16 UTC deadline to initiate SES sandbox-out is no longer on the critical path. SC-E1 can be closed.
SC-E2 (#1665): Terraform email-delivery stack — SNS/SQS/DLQs/alarms/IAM
Status: SAME (minor scope adjustment)
The entire Terraform stack ships as designed: SNS FIFO topics, SQS FIFO queues, DLQs, CloudWatch alarms, IAM roles, DynamoDB dedup table, SSM parameters. Three adjustments:
- raxx-email-lambda-outbound IAM role: remove ses:SendEmail and ses:SendRawEmail. Add ssm:GetParameter on /raxx/email/postmark_server_token.
- raxx-email-lambda-inbound IAM role: no execute-api:Invoke permission is needed (API Gateway is the entry point, not Lambda). Add whatever permission is needed to verify the Postmark webhook signature (implementation detail for feature-developer: HMAC with the Postmark-Token header).
- New SSM parameter: /raxx/email/postmark_server_token (SecureString). It replaces /raxx/email/ses_from_domain. The existing POSTMARK_SERVER_TOKEN on the Heroku apps is not the same token; the Lambda needs its own server token, scoped to a single Postmark server.
Everything else (SNS topics, SQS queues, DLQ topology, alarms, DynamoDB dedup table, CloudWatch alarm counts) is identical.
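A minimal sketch of the outbound role's policy after these adjustments, written as a Python dict for illustration (the real artifact is Terraform). The account ID, region, and ARN layout are placeholders, and the dedup-table and logging statements are omitted. The three SQS actions are the ones a Lambda SQS event source mapping requires; if the SecureString is encrypted with a customer-managed KMS key, kms:Decrypt on that key would also be needed.

```python
import json


def outbound_lambda_policy(region: str, account_id: str,
                           queue_name: str = "raxx-email-outbound-send.fifo",
                           token_param: str = "/raxx/email/postmark_server_token") -> dict:
    """Least-privilege policy for raxx-email-lambda-outbound in the hybrid design.

    There are deliberately no ses:* actions: Postmark, not SES, delivers mail.
    """
    # A hierarchical SSM name's leading "/" doubles as the ARN path separator.
    param_arn = f"arn:aws:ssm:{region}:{account_id}:parameter{token_param}"
    queue_arn = f"arn:aws:sqs:{region}:{account_id}:{queue_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read the Postmark server token (SecureString).
                "Sid": "ReadPostmarkToken",
                "Effect": "Allow",
                "Action": ["ssm:GetParameter"],
                "Resource": param_arn,
            },
            {   # Permissions the Lambda SQS event source mapping needs.
                "Sid": "ConsumeSendQueue",
                "Effect": "Allow",
                "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage",
                           "sqs:GetQueueAttributes"],
                "Resource": queue_arn,
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(outbound_lambda_policy("us-east-1", "123456789012"), indent=2))
```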
SC-E3 (#1666): Lambda — inbound email bridge
Status: MODIFIED (Postmark webhook instead of SES event)
Core change: the Lambda no longer parses a raw SES receipt event from S3. Instead, it receives a structured Postmark inbound webhook JSON payload (already parsed: subject, from, to, text body, HTML body, and attachments with their content and metadata). This is simpler: Postmark does the MIME parsing, whereas SES would hand over raw RFC 2822 bytes that the Lambda would need to parse itself.
Changes to the Lambda:
- Input schema: Postmark inbound JSON (not SES receipt notification).
- Signature verification: validate Postmark-Token header on the API Gateway request before SNS publish (reject unsigned requests at the API Gateway layer, not Lambda).
- Attachment handling: Postmark provides attachment content and metadata inline in the webhook payload (up to 10 MB total); SES stores to S3 and provides a key. No S3 read access needed on the inbound Lambda.
- Idempotency key: use Postmark's MessageID field (equivalent to RFC 2822 Message-ID).
- Routing logic (FreeScout mailbox dispatch by To address): unchanged.
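The input-schema, idempotency-key, and routing changes above could look roughly like this. It is a sketch under assumptions: the SNS→SQS subscription may or may not use raw message delivery (so both body shapes are handled), routing keys off Postmark's ToFull array, and the routing table and mailbox ID are hypothetical.

```python
import json
from typing import Any, Optional

# Hypothetical routing table: recipient address -> FreeScout mailbox ID.
MAILBOX_ROUTES = {"support@raxx.app": 1}


def unwrap_postmark_payload(sqs_record: dict[str, Any]) -> dict[str, Any]:
    """Return the Postmark inbound JSON carried by one SQS record."""
    body = json.loads(sqs_record["body"])
    # Without raw message delivery, SNS wraps the published string in a
    # notification envelope; the original payload is in its "Message" field.
    if body.get("Type") == "Notification" and "Message" in body:
        body = json.loads(body["Message"])
    return body


def idempotency_key(payload: dict[str, Any]) -> str:
    return payload["MessageID"]  # Postmark's MessageID, per SC-E3


def route_mailbox(payload: dict[str, Any],
                  routes: dict[str, int] = MAILBOX_ROUTES) -> Optional[int]:
    """FreeScout mailbox for the first recognized To address, else None."""
    for recipient in payload.get("ToFull", []):
        mailbox = routes.get(recipient.get("Email", "").lower())
        if mailbox is not None:
            return mailbox
    return None
```

A record whose To addresses match nothing in the table returns None, which the handler can log and send to the DLQ rather than guess a mailbox.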
Everything else — DynamoDB dedup check, FreeScout API POST, structured logging, visibility timeout handling — is identical.
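The DynamoDB dedup check that carries over unchanged can be sketched as a conditional write. The table name and the pk/expires_at attribute names are hypothetical; the client is assumed to be a low-level boto3 DynamoDB client, which raises ConditionalCheckFailedException when the condition expression fails.

```python
from typing import Any


def dedup_put_request(table_name: str, key: str, expires_at: int) -> dict[str, Any]:
    """Arguments for a low-level boto3 DynamoDB put_item that claims `key` once."""
    return {
        "TableName": table_name,
        "Item": {
            "pk": {"S": key},                      # e.g. Postmark MessageID
            "expires_at": {"N": str(expires_at)},  # epoch seconds, for a TTL attribute
        },
        # Fails if an item with this partition key already exists.
        "ConditionExpression": "attribute_not_exists(pk)",
    }


def claim(client: Any, table_name: str, key: str, expires_at: int) -> bool:
    """True if this delivery is the first to see `key`; False for a duplicate."""
    try:
        client.put_item(**dedup_put_request(table_name, key, expires_at))
        return True
    except client.exceptions.ConditionalCheckFailedException:
        return False
```

This claims the key before the FreeScout POST, so if the POST then fails, the handler should delete the item (or record a status attribute) before letting SQS retry; otherwise the retry would be skipped as a duplicate.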
Note on F18 (SES inbound rule misconfigured): this failure mode is eliminated. Replace in the failure matrix with "Postmark webhook endpoint unreachable / API Gateway misconfigured" — same detection approach (synthetic probe, F1 unchanged).
SC-E4 (#1667): Lambda — outbound email sender
Status: MODIFIED (Postmark API call instead of ses:SendEmail)
Core change: the Lambda calls Postmark's POST /email API instead of ses:SendEmail. The structural pattern (SQS event → dedup check → send → delete message) is identical.
Changes to the Lambda:
- Postmark API call: use server token from SSM /raxx/email/postmark_server_token. The boto3 SES client call is replaced with an httpx or requests POST to https://api.postmarkapp.com/email.
- Error handling: Postmark API returns HTTP 422 for validation errors (bad From address, suppressed recipient), 429 for rate limiting, 5xx for server errors. Map these to the same retry/DLQ logic as ADR-0072 specified for SES throttling.
- Postmark rate limit: 1 000 messages/second published limit (compared to SES new-account 14/s). No reserved concurrency adjustment needed; Postmark headroom is much larger.
- correlation_id dedup logic: unchanged.
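A sketch of the Postmark call and the status mapping, using only the standard library so it runs without httpx or requests. The /email endpoint and the X-Postmark-Server-Token header are Postmark's; the MessageStream value "outbound" (Postmark's default transactional stream) and the three action labels are assumptions for illustration. Treating every other 4xx, including a 401 from a bad token, as permanent means those messages end up in the DLQ and can be redriven once the cause is fixed.

```python
import json
import urllib.request

POSTMARK_EMAIL_URL = "https://api.postmarkapp.com/email"


def build_postmark_request(server_token: str, sender: str, to: str,
                           subject: str, text_body: str) -> urllib.request.Request:
    """Build (but do not send) a Postmark single-email API request."""
    payload = {
        "From": sender,
        "To": to,
        "Subject": subject,
        "TextBody": text_body,
        "MessageStream": "outbound",  # assumed stream name
    }
    return urllib.request.Request(
        POSTMARK_EMAIL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "X-Postmark-Server-Token": server_token,
        },
        method="POST",
    )


def classify_status(status: int) -> str:
    """Map a Postmark HTTP status to what the SQS consumer should do.

    "ok": delete the message. "retry": fail the record so SQS redelivers it
    (DLQ after maxReceiveCount). "dead_letter": permanent failure such as a
    422 validation error; retrying will not help.
    """
    if 200 <= status < 300:
        return "ok"
    if status == 429 or status >= 500:
        return "retry"
    return "dead_letter"
```

In the handler, urllib.request.urlopen(req, timeout=10) raises urllib.error.HTTPError for non-2xx responses, and its .code attribute is what feeds classify_status.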
IAM: remove ses:SendEmail. Add ssm:GetParameter for Postmark token path.
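Reading the token could be as simple as the sketch below, assuming a boto3 SSM client is passed in. WithDecryption=True is required to get a SecureString's plaintext. Caching in a module-level variable avoids an SSM call per invocation; the trade-off is that a rotated token is picked up only when a new Lambda execution environment starts.

```python
from typing import Any, Optional

TOKEN_PARAM = "/raxx/email/postmark_server_token"
_cached_token: Optional[str] = None  # survives across warm invocations


def get_postmark_token(ssm_client: Any) -> str:
    """Fetch the Postmark server token once per execution environment."""
    global _cached_token
    if _cached_token is None:
        resp = ssm_client.get_parameter(Name=TOKEN_PARAM, WithDecryption=True)
        _cached_token = resp["Parameter"]["Value"]
    return _cached_token
```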
SC-E5 (#1668): Migrate Raptor postmark_client.py → sns_publisher.py
Status: MODIFIED (scope reduced)
The migration now means: Raptor stops calling Postmark API directly and instead publishes to SNS. Raptor does not swap from Postmark to SES — it swaps from direct Postmark call to queued Postmark call. The Lambda (SC-E4) still calls Postmark at the end.
This is actually simpler than the full SES migration: the operator mental model stays "we use Postmark for email," only the call path changes from synchronous to async-via-queue. The sns_publisher.py wrapper and the feature flag logic are unchanged from the ADR-0072 spec.
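A minimal sketch of what sns_publisher.py might expose, assuming a boto3 SNS client. The topic ARN uses a placeholder account, the single message group "outbound" is an assumption (it serializes all sends; a per-recipient group would allow more parallelism), and direct_send is a hypothetical stand-in for the existing postmark_client.py path behind the feature flag. Reusing correlation_id as MessageDeduplicationId means SNS FIFO drops a republish with the same ID inside its 5-minute deduplication window.

```python
import json
from typing import Any, Callable

# Placeholder ARN: the real region/account come from configuration.
OUTBOUND_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:raxx-email-outbound.fifo"


def publish_email(sns_client: Any, message: dict[str, Any], correlation_id: str,
                  group_id: str = "outbound") -> str:
    """Publish one outbound email request to the FIFO topic; return the SNS MessageId."""
    resp = sns_client.publish(
        TopicArn=OUTBOUND_TOPIC_ARN,
        Message=json.dumps({**message, "correlation_id": correlation_id}),
        MessageGroupId=group_id,                # FIFO ordering scope
        MessageDeduplicationId=correlation_id,  # repeats within 5 min are dropped
    )
    return resp["MessageId"]


def send_email(message: dict[str, Any], correlation_id: str, *, queue_enabled: bool,
               sns_client: Any, direct_send: Callable[[dict[str, Any]], str]) -> str:
    """Feature-flagged path: queue via SNS when enabled, else the legacy direct call."""
    if queue_enabled:
        return publish_email(sns_client, message, correlation_id)
    return direct_send(message)
```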
SC-E6 (#1669): FreeScout SMTP cutover from Postmark to SES
Status: CUT
FreeScout continues using Postmark SMTP for outbound replies. No config change to FreeScout. SC-E6 can be closed.
SC-E7 (#1670): Synthetic probe + CloudWatch alarm
Status: SAME
The probe design (external sender → support@raxx.app → FreeScout conversation check) is completely delivery-agent-agnostic. No changes.
SC-E8 (#1671): DLQ redrive runbook
Status: SAME
Runbook covers SQS DLQ mechanics — independent of whether the leaf node is SES or Postmark. No changes.
SC-E9 (#1672): Postmark retirement
Status: CUT
Postmark stays. SC-E9 can be closed. If Postmark is ever retired post-v1, a new card handles it then.
New Card Required
SC-E10: API Gateway endpoint for Postmark inbound webhook → SNS publish
This work was implicit in ADR-0072 (SES receipt rules handled the inbound entry point natively). With Postmark, a new entry point is needed:
- AWS API Gateway REST API endpoint: POST /webhooks/postmark/inbound. (Direct SNS service integrations are available on REST APIs; HTTP APIs do not offer an SNS integration.)
- Request validation: check X-Postmark-Signature or the static server token header. Reject with 401 if invalid.
- API Gateway integration: direct SNS publish (API Gateway → SNS integration, no Lambda in between). This keeps the signature validation logic in API Gateway mapping templates, avoiding a Lambda hop just for routing.
- SSM parameter /raxx/email/postmark_inbound_webhook_token for the token validation.
- Terraform additions to SC-E2, or a separate terraform/email-delivery/api-gateway.tf.
- Update the Postmark inbound server webhook URL to the API Gateway endpoint.
- Retire the existing POST /webhooks/postmark/inbound route in Raptor (or feature-flag it off) once the new path is live.
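Whichever component ends up enforcing it, the validation rule amounts to a constant-time comparison against the value stored in /raxx/email/postmark_inbound_webhook_token. This sketch assumes the static shared-token variant rather than an HMAC signature, and the header name is hypothetical; it should match whatever Postmark's inbound webhook is actually configured to send.

```python
import hmac
from typing import Mapping

# Hypothetical header name; match the Postmark webhook configuration.
TOKEN_HEADER = "x-postmark-inbound-token"


def token_is_valid(headers: Mapping[str, str], expected_token: str,
                   header_name: str = TOKEN_HEADER) -> bool:
    """Constant-time check of a static shared token carried in a request header."""
    # HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get(header_name.lower())
    if supplied is None or not expected_token:
        return False  # missing header, or no token configured: fail closed
    return hmac.compare_digest(supplied.encode("utf-8"), expected_token.encode("utf-8"))


def response_status(headers: Mapping[str, str], expected_token: str) -> int:
    """200 lets the request through to SNS; 401 rejects it, per SC-E10."""
    return 200 if token_is_valid(headers, expected_token) else 401
```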
This is a size:S card — API Gateway → SNS direct integration is a well-documented AWS pattern. Estimate: 0.5–1 day for feature-developer or sre-agent.
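For the direct integration, the API Gateway mapping template has to turn the webhook body into a form-encoded SNS Publish request. The Python sketch below builds the equivalent body so the required fields are explicit. The topic ARN is a placeholder and the group ID is an assumption. The SHA-256 content hash mirrors what SNS FIFO content-based deduplication computes; since mapping templates have no hashing helper, enabling content-based deduplication on the topic is likely the practical route. SNS rejects messages larger than 256 KB, which Postmark payloads with inline base64 attachments can exceed, so the size check is worth carrying into the real design.

```python
import hashlib
from urllib.parse import parse_qs, urlencode

# Placeholder ARN; the real value comes from Terraform outputs.
INBOUND_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:raxx-email-inbound.fifo"
SNS_MAX_MESSAGE_BYTES = 256 * 1024  # SNS message size limit


def sns_publish_form(raw_body: str, group_id: str = "inbound") -> str:
    """Form-encoded SNS Publish request equivalent to the mapping template's output."""
    if len(raw_body.encode("utf-8")) > SNS_MAX_MESSAGE_BYTES:
        raise ValueError("webhook body exceeds the SNS message size limit")
    return urlencode({
        "Action": "Publish",
        "TopicArn": INBOUND_TOPIC_ARN,
        "Message": raw_body,
        "MessageGroupId": group_id,
        # Identical bodies within the 5-minute FIFO window are deduplicated.
        "MessageDeduplicationId": hashlib.sha256(raw_body.encode("utf-8")).hexdigest(),
    })


if __name__ == "__main__":
    fields = parse_qs(sns_publish_form('{"MessageID": "abc"}'))
    print(fields["Action"], fields["MessageGroupId"])
```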
Engineering-Day Delta
| Card | Full SES estimate | Hybrid estimate | Days saved |
|---|---|---|---|
| SC-E1: SES domain verify + sandbox-out | 1 day (operator) | 0 (CUT) | 1 |
| SC-E2: Terraform stack | 2 days | 1.5 days (minor IAM adjustment) | 0.5 |
| SC-E3: Inbound Lambda | 3 days | 2 days (no S3/MIME; simpler Postmark JSON) | 1 |
| SC-E4: Outbound Lambda | 2.5 days | 2 days (Postmark API call vs ses:SendEmail — similar effort) | 0.5 |
| SC-E5: Raptor migration | 2 days | 1.5 days (SNS publish instead of SES client — simpler) | 0.5 |
| SC-E6: FreeScout SMTP cutover | 0.5 days | 0 (CUT) | 0.5 |
| SC-E7: Synthetic probe | 1 day | 1 day (SAME) | 0 |
| SC-E8: DLQ runbook | 0.5 days | 0.5 days (SAME) | 0 |
| SC-E9: Postmark retirement | 1 day | 0 (CUT) | 1 |
| SC-E10: API Gateway webhook entry | 0 (SES handles inbound) | 1 day (NEW) | -1 |
| Total | ~13.5 days | ~9.5 days | ~4 days (~30%) |
The operator's estimate of 50–60% was optimistic; the honest delta is approximately a 30% reduction. The SC-E10 API Gateway work partially offsets the savings. The dominant savings are in SC-E1 (no sandbox-out wait), SC-E3 (no MIME parsing), and SC-E9 (no retirement work).
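The totals can be recomputed directly from the per-card rows of the table:

```python
# Per-card estimates from the table: (full SES days, hybrid days).
ESTIMATES = {
    "SC-E1": (1.0, 0.0),
    "SC-E2": (2.0, 1.5),
    "SC-E3": (3.0, 2.0),
    "SC-E4": (2.5, 2.0),
    "SC-E5": (2.0, 1.5),
    "SC-E6": (0.5, 0.0),
    "SC-E7": (1.0, 1.0),
    "SC-E8": (0.5, 0.5),
    "SC-E9": (1.0, 0.0),
    "SC-E10": (0.0, 1.0),
}

full = sum(f for f, _ in ESTIMATES.values())
hybrid = sum(h for _, h in ESTIMATES.values())
saved = full - hybrid
print(f"full={full} hybrid={hybrid} saved={saved} ({saved / full:.0%})")
```

This prints full=13.5 hybrid=9.5 saved=4.0 (30%); the SC-E10 row is what pulls the saving below 5 days.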
Critical path impact: The 2026-05-16 UTC deadline to initiate SES sandbox-out is eliminated. There is no longer a hard deadline gating the outbound path. The remaining critical path is SC-E2 (Terraform) → SC-E3 + SC-E10 (inbound) before 2026-05-23 UTC launch.
Failure Mode Delta
Changes to the 20-row failure matrix in docs/architecture/durable-email-delivery.md:
| Row | Change |
|---|---|
| F1 (domain/MX not verified) | SOFTENED — detection unchanged; "SES domain not verified" becomes "Postmark inbound webhook not configured"; already configured, so risk is near-zero at v1 |
| F2 (SES sandbox not lifted) | ELIMINATED — Postmark is already out of sandbox |
| F8 (SES throttles outbound) | REPLACED — "Postmark API rate limit hit (429)". Postmark published limit 1 000/s vs SES new-account 14/s. Risk is lower, not higher. Lambda error handling for 429 + exponential backoff is unchanged |
| F11 (SES signature verification) | REPLACED — "Postmark webhook token validation fails". Same detection approach: security log alert if postmark_signature_invalid count > 0. Recovery: rotate Postmark inbound webhook token in Postmark admin + update SSM |
| F15 (AWS region outage) | SOFTENED — Postmark is independent of AWS for outbound delivery. If us-east-1 goes down: (a) SQS/Lambda fails → mail queues until recovery (same as before); (b) but Postmark API itself is unaffected. The outbound path depends on SQS, not on AWS being the delivery rail. A partial improvement |
| F18 (SES inbound rule misconfigured) | REPLACED — "Postmark inbound webhook URL wrong or API Gateway misconfigured". Same detection: synthetic probe (F1 covers). Recovery: correct Postmark dashboard webhook URL |
| F19 (Lambda IAM missing ses:SendEmail) | REPLACED — "Lambda IAM missing ssm:GetParameter for Postmark token". Same detection: AccessDeniedException in Lambda logs |
| All other rows | UNCHANGED |
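One way to implement the exponential backoff for 429 and 5xx responses on an SQS-driven Lambda is to stretch the message's visibility timeout based on its receive count before reporting the failure. This is a sketch, not necessarily the mechanism ADR-0072 specified; the 30-second base and 900-second cap are hypothetical.

```python
import random
from typing import Optional

SQS_MAX_VISIBILITY_SECONDS = 12 * 60 * 60  # SQS upper bound for a visibility timeout


def backoff_visibility(receive_count: int, base: int = 30, cap: int = 900,
                       rng: Optional[random.Random] = None) -> int:
    """Jittered exponential delay (seconds) before SQS makes the message visible again.

    receive_count is the record's attributes["ApproximateReceiveCount"] (1 on first try).
    """
    rng = rng or random.Random()
    ceiling = min(cap, SQS_MAX_VISIBILITY_SECONDS, base * 2 ** max(receive_count - 1, 0))
    return rng.randint(1, ceiling)
```

Usage: pass int(record["attributes"]["ApproximateReceiveCount"]), apply the result with sqs.change_message_visibility(QueueUrl=..., ReceiptHandle=record["receiptHandle"], VisibilityTimeout=delay), then report the record as failed so it returns to the queue after the delay.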
What We Keep vs Lose
Keep (Hybrid wins these)
- Postmark's deliverability IP reputation. Fresh SES accounts start with lower reputation than Postmark's established sending infrastructure. At v1 launch with real customers, deliverability matters immediately.
- Postmark's 45-day inbound archive. If a message falls between systems (failure mode F4), Postmark's dashboard allows replay. SES does not retain raw inbound emails by default — they go to S3 and would need a separate retention setup.
- Postmark's MIME parser. Postmark delivers structured JSON (subject, from, to, body, attachments parsed). SES inbound delivers raw RFC 2822 bytes from S3. The inbound Lambda would need to parse MIME itself — extra complexity and a potential bug surface.
- Existing operator investment. Postmark domain verification done, DKIM records in DNS, sender signatures confirmed, sandbox-out complete (2026-05-09). That work is already paid for.
- Existing Heroku env vars and vault paths for POSTMARK_SERVER_TOKEN. The Lambda will use a separate, scoped server token, but the general pattern is familiar.
- No AWS Support wait. Hybrid ships the moment the Terraform and Lambda code are ready.
Lose (Full SES wins these)
- Cost at scale. Postmark is $1.25/1 000 vs SES $0.10/1 000, a 12.5x difference. At v1 (~10k/mo), the absolute difference is ~$12/mo. At 1M/mo (post-v1 scale), it's ~$1 250/mo vs ~$100/mo, a difference of ~$1 150/mo. The hybrid is the right v1 choice; the SES migration is the right post-v1 choice.
- AWS-native single-vendor consistency. Hybrid introduces a dependency on Postmark that the full-SES design eliminates. If Postmark has an extended outage, mail is delayed (messages durably in SQS, but not delivered). With SES, outbound delivery is AWS-native and benefits from the same regional durability.
- "We control the whole stack" optionality. Postmark is a vendor dependency. Their pricing, reliability, ToS, and rate limits are outside operator control.
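The cost figures follow from the per-1 000 rates quoted above. This sketch assumes flat, linear pricing, as the ADR does; actual vendor pricing may be tiered, so treat the output as a rough comparison.

```python
POSTMARK_USD_PER_1000 = 1.25  # rate quoted in this ADR
SES_USD_PER_1000 = 0.10       # rate quoted in this ADR


def monthly_costs(messages_per_month: int) -> tuple[float, float, float]:
    """(Postmark, SES, difference) in USD/month under flat per-1 000 pricing."""
    thousands = messages_per_month / 1000
    postmark = thousands * POSTMARK_USD_PER_1000
    ses = thousands * SES_USD_PER_1000
    return postmark, ses, postmark - ses


for volume in (10_000, 100_000, 1_000_000):
    pm, ses, delta = monthly_costs(volume)
    print(f"{volume:>9,}/mo  Postmark ${pm:,.2f}  SES ${ses:,.2f}  delta ${delta:,.2f}")
```

At 1,000,000 messages/month it reports $1,250.00 for Postmark vs $100.00 for SES, a $1,150.00 difference; at 100,000/month the difference is $115.00.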
The architecture explicitly preserves the SES escape hatch. On the outbound path, the Lambda consumer is the only thing that changes in a future SES migration; everything upstream (SNS, SQS, publishers) is delivery-agent-agnostic. Switching outbound delivery from Postmark to SES post-v1 is a one-Lambda change plus DNS/domain verification, not an architectural rebuild (the FreeScout SMTP cutover and Postmark retirement remain separate steps, per Phase 3).
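The escape hatch can be made concrete by keeping the leaf behind a small interface, so the outbound handler never names a vendor. A sketch with hypothetical class names: SesAgent uses the SES v1 send_email call on a boto3 "ses" client, and PostmarkAgent takes any callable that POSTs JSON to Postmark and returns the decoded response.

```python
from typing import Any, Callable, Protocol

POSTMARK_EMAIL_URL = "https://api.postmarkapp.com/email"


class DeliveryAgent(Protocol):
    """The only vendor-specific piece of the outbound consumer."""

    def send(self, sender: str, to: str, subject: str, text_body: str) -> str: ...


class PostmarkAgent:
    def __init__(self, post_json: Callable[[str, dict], dict]) -> None:
        # post_json(url, payload) performs the authenticated POST and returns parsed JSON.
        self._post = post_json

    def send(self, sender: str, to: str, subject: str, text_body: str) -> str:
        resp = self._post(POSTMARK_EMAIL_URL, {
            "From": sender, "To": to, "Subject": subject, "TextBody": text_body,
        })
        return resp["MessageID"]  # Postmark's identifier for the accepted message


class SesAgent:
    def __init__(self, ses_client: Any) -> None:
        self._ses = ses_client  # expected: boto3.client("ses")

    def send(self, sender: str, to: str, subject: str, text_body: str) -> str:
        resp = self._ses.send_email(
            Source=sender,
            Destination={"ToAddresses": [to]},
            Message={"Subject": {"Data": subject}, "Body": {"Text": {"Data": text_body}}},
        )
        return resp["MessageId"]


def deliver(record_body: dict, agent: DeliveryAgent) -> str:
    # Dedup and SQS bookkeeping (identical for every agent) would wrap this call.
    return agent.send(record_body["From"], record_body["To"],
                      record_body["Subject"], record_body["TextBody"])
```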
Migration Plan for Architecture A (Hybrid)
Phase 0: None needed
No SES sandbox-out, no domain re-verification. Postmark is already production-ready.
Phase 1: Terraform + inbound durability (target: before 2026-05-23 UTC)
- SC-E2: Terraform SNS/SQS/DLQ/DynamoDB/alarms/IAM + SSM parameters.
- SC-E10 (new): API Gateway endpoint, Postmark webhook → SNS publish. Update Postmark inbound webhook URL.
- SC-E3 (modified): inbound Lambda (Postmark JSON → FreeScout API).
- SC-E8: DLQ runbook.
- SC-E7: Synthetic probe Lambda + alarm.
- Cutover: update the Postmark inbound server to POST to the API Gateway URL instead of Raptor's /webhooks/postmark/inbound. Feature-flag postmark_inbound_to_freescout OFF in Raptor.
Phase 2: Outbound queue (post-v1 acceptable)
- SC-E4 (modified): outbound Lambda (Postmark API call).
- SC-E5 (modified): Raptor postmark_client.py → sns_publisher.py (feature-flagged).
- 7-day soak period.
Phase 3 (optional, post-launch): Switch to SES
If deliverability, cost, or vendor concerns justify it:
1. SC-E1 equivalent: SES domain verify + DKIM + sandbox-out request.
2. Swap outbound Lambda consumer from Postmark API → ses:SendEmail. No publisher changes.
3. FreeScout SMTP cutover (equivalent to original SC-E6).
4. Postmark retirement.
This is now an optional, undated future card — not a v1 commitment.
Consequences
Positive
- Ships faster: ~4 engineering-days (~30%) saved vs full SES; critical-path deadline eliminated.
- Preserves Postmark's deliverability, MIME parsing, and inbound archive.
- The durability architecture (SNS/SQS/DLQ/Lambda) ships exactly as designed in ADR-0072.
- SES migration remains available as a future option; the hybrid does not close the door.
Negative / Trade-offs
- Cost at scale: $12/mo additional at v1 volume; grows linearly. SES migration becomes economically compelling at ~100k/mo outbound.
- Two-vendor email dependency (Postmark for delivery, AWS for durability queue). An extended Postmark outage means messages queue in SQS but do not deliver until Postmark recovers.
- The API Gateway → SNS integration (SC-E10) is new scope that does not exist in the full-SES design.
Alternatives Considered
Architecture B: Full SES (ADR-0072 target state)
Rejected for v1 because:
1. SES sandbox-out requires an AWS Support case with 24–48 h turnaround, a hard deadline constraint at 12 days to launch.
2. SES inbound delivers raw MIME to S3; the Lambda must parse RFC 2822, which adds complexity and bug surface compared with Postmark's structured JSON.
3. Postmark's IP reputation at launch is stronger than a fresh SES account's.
4. All prior Postmark setup work (domain verification, DKIM, sandbox-out, sender signatures) was done and paid for.
Accepted as the future post-v1 migration path if cost or vendor risk justifies it.
Architecture A with SNS-direct (no API Gateway)
Alternative: Postmark POSTs directly to a Raptor endpoint, Raptor publishes to SNS. This avoids API Gateway but keeps the Raptor dyno in the inbound critical path — if Raptor restarts or is overloaded, the inbound webhook call fails. API Gateway is a more reliable entry point with retries and independent scaling.
Rejected: API Gateway cost at v1 volume is negligible (< $0.01/mo). Keeping Raptor out of the synchronous webhook path is worth it.
Notes
- This ADR amends ADR-0072 for v1 implementation scope only. The target-state topology in ADR-0072 is correct for post-v1. ADR-0072 remains Accepted.
- ADR numbering: ADR-0073 was concurrently used by the Stripe v1 home decision (also filed 2026-05-11 UTC). This ADR is therefore numbered 0074.
- SC-E1 (SES sandbox-out deadline 2026-05-16 UTC) is no longer a critical-path date. Operator does not need to act on SES before v1 launch.