ADR-0084 — Burr v2: Multi-Region OIDC Gateway with R53 Latency Routing + Auth Down Failover
Status: Accepted
Date: 2026-05-12 UTC
Deciders: Kristerpher (operator)
Scope: Post-launch Burr v2 architecture (target: post 2026-05-23)
Parent card: #1867
Refs: ADR-0082 (Terraform pipeline pattern), ADR-0083 (Infisical Google OIDC SSO via CF Access), #1859, #1864, #1866, #1868 (Auth Down UX), #1869 (security audit), terraform/modules/sso-oidc-gateway/
1. Context
What Burr v1 is
Burr v1 (PR #1866, ADR-0083) turns Cloudflare Access into an OIDC provider for downstream internal apps. CF Access, backed by Google Workspace, issues signed JWTs. Downstream apps — Infisical first, Grafana and future internal tools next — receive identity assertions without needing their own Google OAuth client registrations.
The v1 module (terraform/modules/sso-oidc-gateway/) is a Terraform-managed CF Zero Trust SaaS application (type = saas, auth_type = oidc) that exposes CF-standard OIDC endpoints:
issuer: https://<account_id>.cloudflareaccess.com
authorization_endpoint: .../sso/oidc/<client_id>/authorization
token_endpoint: .../sso/oidc/<client_id>/token
userinfo_endpoint: .../sso/oidc/<client_id>/userinfo
jwks_uri: .../cdn-cgi/access/certs
What v1 does not give us
Burr v1 sits entirely inside Cloudflare's global network. It does not have a regional deployment; there is no operator-controlled compute that can be replicated. A CF Access zone outage — partial or full — takes down all downstream SSO. For a single Infisical consumer, that is an acceptable v1 trade-off. Post-launch, as the number of downstream apps grows, the single zone becomes a hard dependency on CF's reliability posture.
The specific failure modes v1 does not address:
- CF Access zone outage: All OIDC endpoints become unreachable. Infisical (and any future consumers) fall back to their local session cache until it expires, then lock out all operators.
- CF Access misconfiguration propagation: A bad policy update applies globally and immediately. There is no regional staging ring.
- No operator-controlled OIDC signing key rotation: CF manages the signing keys. The jwks_uri is CF-hosted. The operator cannot rotate independently.
- No health-check-driven failover: Nothing monitors the OIDC endpoints and redirects traffic if they are degraded.
Why now (design only — not yet implementation)
v1 ships 2026-05-12 as part of the pre-launch sequence. v2 is post-launch work (post-2026-05-23). The design is authored now so implementation sub-cards are ready to claim after launch. The ADR establishes the architecture contract; implementation sub-cards will reference it.
2. Invariants
The following invariants from the platform's non-negotiable constraints apply to this design:
- No stored credentials. The self-owned OIDC service must not store, cache, or log credentials, tokens, or secrets in a form that could be replayed. OIDC signing keys are held in KMS; the service never has access to the private key material directly — it calls KMS Sign.
- Passkeys / WebAuthn only for user-facing authentication. Burr v2 is the identity gateway for internal tooling (Infisical, Grafana, etc.), not for end-user login. This ADR does not alter the passkey-only constraint on user-facing auth.
- Audit trail for every state change affecting permissions or data access. Every OIDC token issuance, every health-check state change, every failover event, and every JWKS rotation must emit an audit event.
- Secrets in infra, not in code. OIDC client secrets, KMS key ARNs, and downstream app registration data live in AWS SSM Parameter Store. Nothing sensitive ships in Terraform source or application code.
- Paper-first gating is not directly applicable to this component. Burr v2 is an internal identity service, not a trading execution path. This constraint is noted but does not gate this design.
- GDPR by default. Burr v2 processes identity assertions (email, name). Retention is session-scoped only. The service must not persist tokens or claims beyond the token lifetime. Operator interactive sessions are subject to the retention policies documented in ADR-0003.
3. Architecture decision summary
Deploy Burr v2 as a self-owned OIDC issuer running on AWS Lambda (Node.js or Python, choice deferred to implementer) in both us-west-2 and us-east-1. Each regional deployment is an independent compute unit with its own ALB. KMS multi-region keys provide OIDC signing key continuity across regions with a single logical key identity. Route53 latency-based routing with per-region health checks provides active-active serving when both regions are healthy and automatic failover to the surviving region when one degrades. A CloudFront + S3 static Auth Down page is the tertiary failover target when both regions are simultaneously unhealthy.
CF Access continues to serve as the upstream identity provider (Google Workspace IdP) — Burr v2 replaces CF Access as the OIDC issuer seen by downstream apps, while still delegating the upstream human authentication step to CF Access and Google.
4. Per-region stack
Compute: Lambda behind ALB (not Fargate)
Lambda is chosen over Fargate for the following reasons:
- OIDC token issuance and JWKS serving are stateless, low-latency operations with spiky demand. Lambda's cold-start characteristics are acceptable for an internal SSO path (staff-scale traffic, not end-user scale).
- Lambda provisioned concurrency eliminates cold starts for the health-check targets.
- ALB-to-Lambda integration is well-understood and reduces infrastructure surface (no ECS cluster, no task definitions, no ENI management per region).
- Lambda cost at v1 scale (one Infisical client) is a few dollars per month, almost all of it provisioned concurrency. See §10 for estimates.
Regional stack per region (us-west-2, us-east-1):
Internet Gateway / ALB (HTTPS:443, TLS terminated)
|
+-- Lambda: burr-oidc-handler
|
+-- /oidc/.well-known/openid-configuration (GET, unauthenticated)
+-- /oidc/.well-known/jwks.json (GET, unauthenticated)
+-- /oidc/authorize (GET, initiates CF Access redirect)
+-- /oidc/token (POST, exchanges CF Access code for Burr JWT)
+-- /oidc/userinfo (GET, bearer token required)
+-- /health (GET, used by R53 health check)
The ALB listener rules map path prefixes to the Lambda function. The Lambda uses provisioned concurrency (minimum 1 instance) so the health-check endpoint never cold-starts.
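As an illustration of the route table above, the following Python sketch shows how a single handler might serve the discovery document from an ALB event. The function names are hypothetical, and the issuer constant is an assumption taken from the value Infisical is given in the migration section; the event and response shapes follow the ALB Lambda target format. It is a sketch under those assumptions, not the implementation.

```python
import json

# Assumed issuer; matches the value configured for Infisical in the migration section.
ISSUER = "https://burr.raxx.app/oidc"


def build_discovery_document(issuer: str = ISSUER) -> dict:
    """Return OIDC discovery metadata whose paths match the per-region route table."""
    base = issuer.rstrip("/")
    return {
        "issuer": base,
        "authorization_endpoint": f"{base}/authorize",
        "token_endpoint": f"{base}/token",
        "userinfo_endpoint": f"{base}/userinfo",
        "jwks_uri": f"{base}/.well-known/jwks.json",
        "response_types_supported": ["code"],
        "subject_types_supported": ["public"],
        "id_token_signing_alg_values_supported": ["RS256"],
    }


def handler(event: dict, context: object = None) -> dict:
    """Hypothetical entry point: route an ALB event by path and method."""
    path, method = event.get("path", ""), event.get("httpMethod", "")
    if method == "GET" and path == "/oidc/.well-known/openid-configuration":
        status, body = 200, build_discovery_document()
    else:
        status, body = 404, {"error": "not_found"}
    return {
        "statusCode": status,
        "isBase64Encoded": False,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```

Only the discovery route is shown; the other paths would be added as further branches in the same handler.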
ALB configuration
- HTTPS:443 only. HTTP:80 redirects to HTTPS.
- TLS certificate from ACM, covering burr.raxx.app + us-west-2.burr.raxx.app + us-east-1.burr.raxx.app.
- Access logs to S3 (raxx-burr-alb-logs-<region>) with 90-day retention (audit trail requirement).
- WAF ACL attached (AWS Managed Core rule group + rate limiting; mirrors the pattern from ADR-0077).
KMS multi-region key strategy
OIDC JWTs must be signed with a stable key whose public component is discoverable via the JWKS endpoint. The key identifier (kid) in the JWKS must be consistent across both regions so that tokens issued by either region can be verified against either JWKS endpoint.
Decision: KMS multi-region key (primary in us-west-2, replica in us-east-1).
A KMS multi-region key is a single logical key (mrk-*) whose key material is replicated by KMS across specified regions. Both regional Lambda instances call kms:Sign against their region-local replica. Both replicas share the same key material and therefore produce JWTs that verify against the same public key. The JWKS endpoint in both regions returns the same public key material with the same kid.
Key lifecycle:
- Primary: arn:aws:kms:us-west-2:<ACCOUNT>:key/mrk-<UUID>
- Replica: arn:aws:kms:us-east-1:<ACCOUNT>:key/mrk-<UUID>
- Algorithm: RSA 2048, RSASSA_PKCS1_V1_5_SHA_256 (OIDC RS256)
- Rotation: annual. KMS automatic rotation applies only to symmetric encryption keys, so rotating this asymmetric signing key means provisioning a new multi-region key with its replica and switching the signing ARN. The JWKS endpoint serves both the active key and the previous key during the rotation window (30-day overlap). The Lambda reads the current and previous kid values from SSM at startup.
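To make the kid-consistency requirement concrete, here is a minimal Python sketch of how the JWKS document and the JWT signing input could be derived. All function names are hypothetical, and the modulus values used in any example call are toy numbers, not real key material. Because the JWKS is a pure function of the public key and kid, both regions produce identical output when they read the same values.

```python
import base64
import json
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWK and JWT encodings require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_uint(value: int) -> str:
    """Encode a non-negative integer as big-endian bytes, then base64url."""
    length = max(1, (value.bit_length() + 7) // 8)
    return b64url(value.to_bytes(length, "big"))


def rsa_public_jwk(kid: str, n: int, e: int = 65537) -> dict:
    """Build an RSA signing JWK from the public modulus and exponent."""
    return {"kty": "RSA", "use": "sig", "alg": "RS256", "kid": kid,
            "n": b64url_uint(n), "e": b64url_uint(e)}


def build_jwks(current: dict, previous: Optional[dict] = None) -> dict:
    """Serve the active key, plus the previous key during a rotation overlap."""
    keys = [current] if previous is None else [current, previous]
    return {"keys": keys}


def jwt_signing_input(kid: str, claims: dict) -> bytes:
    """Return the header.payload bytes that would be passed to KMS Sign."""
    header = {"alg": "RS256", "typ": "JWT", "kid": kid}

    def encode(obj: dict) -> str:
        return b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))

    return f"{encode(header)}.{encode(claims)}".encode("ascii")
```

The signing input would then be sent to the region-local replica with the RSASSA_PKCS1_V1_5_SHA_256 algorithm; that call is left out so the sketch runs without AWS credentials.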
Secret distribution:
- KMS key ARNs (not secret): stored in SSM at /raxx/burr/<region>/kms_signing_key_arn (non-sensitive string).
- ALB DNS names: SSM at /raxx/burr/<region>/alb_dns for use in R53 record configuration.
- OIDC client registrations (per downstream app): SSM at /raxx/burr/clients/<app_name>/client_secret (SecureString). The Lambda retrieves client registrations at cold start and caches them in-memory for the Lambda container lifetime.
- No secrets in environment variables, Terraform source, or application code.
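The in-memory caching of client registrations might look like the sketch below. The class and method names are hypothetical, and the SSM read is injected as a plain callable (an assumption made so the example runs without AWS access); in a Lambda it would wrap a decrypting parameter read. This version loads each secret on first use rather than eagerly at cold start, which is one possible reading of the requirement.

```python
import hmac
from typing import Callable, Dict


class ClientSecretCache:
    """Cache client secrets for the lifetime of one Lambda container."""

    def __init__(self, fetch_parameter: Callable[[str], str]) -> None:
        # fetch_parameter is a hypothetical hook: parameter name in, secret out.
        self._fetch_parameter = fetch_parameter
        self._secrets: Dict[str, str] = {}

    @staticmethod
    def parameter_name(app_name: str) -> str:
        """SSM path for a downstream app's client secret."""
        return f"/raxx/burr/clients/{app_name}/client_secret"

    def secret_for(self, app_name: str) -> str:
        """Fetch once per container, then serve from memory."""
        name = self.parameter_name(app_name)
        if name not in self._secrets:
            self._secrets[name] = self._fetch_parameter(name)
        return self._secrets[name]

    def verify(self, app_name: str, presented_secret: str) -> bool:
        """Compare in constant time so timing does not leak the secret."""
        expected = self.secret_for(app_name)
        return hmac.compare_digest(expected.encode(), presented_secret.encode())
```

The cache never writes the secret anywhere but process memory, which keeps it in line with the no-stored-credentials invariant.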
5. R53 hosted zone design
Decision: delegate burr.raxx.app to a Route53 hosted zone, not records in the CF-managed raxx.app zone
Rationale:
raxx.app DNS is managed by Cloudflare (nameservers delegated to CF). Creating a separate delegated zone for burr.raxx.app would require a Route53 hosted zone whose NS records are added to the raxx.app CF zone as a delegation. This is achievable but adds a layer: any CF zone incident could also affect the NS delegation lookup.
The alternative — keeping records inside the Cloudflare-managed raxx.app zone — means CF is in the DNS resolution path for burr.raxx.app. If CF has a zone outage, DNS for the OIDC endpoints fails even though the ALBs are healthy.
Chosen approach: Create a Route53 public hosted zone for burr.raxx.app and add NS delegation records in the Cloudflare raxx.app zone pointing to R53. This reduces the dependence of OIDC endpoint DNS on CF's DNS availability: resolvers that have already cached the NS delegation keep resolving burr.raxx.app through R53 during a CF zone outage, and R53 serves the zone independently. The protection lasts only as long as the cached delegation, so the delegation records should carry a long TTL.
The raxx.app apex and all other subdomains remain on Cloudflare. Only the burr.raxx.app subtree delegates to R53.
Zone structure:
raxx.app (CF-managed)
burr.raxx.app NS → R53 hosted zone (4 NS records from R53)
R53 hosted zone: burr.raxx.app
burr.raxx.app LATENCY → ALB alias per region (active-active)
us-west-2.burr.raxx.app A/ALIAS → us-west-2 ALB
us-east-1.burr.raxx.app A/ALIAS → us-east-1 ALB
burr.raxx.app (failover) ALIAS → CloudFront Auth Down distribution
TTL: All latency-based records use TTL=60. The failover record TTL is irrelevant (it is an R53 Alias record and inherits the target's TTL behavior). Health check poll interval: 10 seconds. Failure threshold: 3 consecutive failures → region marked unhealthy. Expected failover time: 30–60 seconds from regional degradation to R53 stopping traffic to that region.
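The timing figures above can be worked through with a small Python sketch. The function names are hypothetical, and the model is deliberately simple: it ignores how R53 aggregates results across its checkers and how individual resolvers treat TTLs.

```python
def detection_seconds(poll_interval_s: int, failure_threshold: int) -> int:
    """Time for one checker to see enough consecutive failures."""
    return poll_interval_s * failure_threshold


def worst_case_client_seconds(poll_interval_s: int, failure_threshold: int,
                              dns_ttl_s: int) -> int:
    """Detection time plus one full TTL for a client holding a cached answer."""
    return detection_seconds(poll_interval_s, failure_threshold) + dns_ttl_s


if __name__ == "__main__":
    # Values from this section: 10-second poll, 3 failures, TTL=60.
    print(detection_seconds(10, 3))
    print(worst_case_client_seconds(10, 3, 60))
```

Under these assumptions detection takes about 30 seconds, and a client that cached the record just before the failure may keep using the unhealthy region for up to one further TTL.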
6. Health checks
R53 health checks are defined per region. Each region requires all of the following checks to pass to be considered healthy. R53 health checks are composed using a calculated health check (AND logic) so that a single endpoint failure marks the region unhealthy.
HC-1: OIDC discovery document
- Type: HTTPS
- Endpoint: us-west-2.burr.raxx.app/oidc/.well-known/openid-configuration
- Passing condition: HTTP 200, response body is valid JSON containing issuer, jwks_uri, authorization_endpoint, token_endpoint keys
- Why: The discovery document is the entry point for all OIDC clients. A malformed document silently breaks client initialization.
HC-2: JWKS endpoint
- Type: HTTPS
- Endpoint: us-west-2.burr.raxx.app/oidc/.well-known/jwks.json
- Passing condition: HTTP 200, response body is valid JSON containing a keys array with at least one entry that has kty, kid, use, n, e fields (RSA public key structure)
- Why: Downstream apps cache the JWKS. A missing or malformed JWKS causes token verification failures on the next token validation.
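The passing conditions for HC-1 and HC-2 can be expressed as two small predicates. The Python sketch below is illustrative only: the function names are hypothetical, and it shows the check logic rather than how R53 itself evaluates a response.

```python
import json

REQUIRED_DISCOVERY_KEYS = ("issuer", "jwks_uri", "authorization_endpoint", "token_endpoint")
REQUIRED_JWK_FIELDS = ("kty", "kid", "use", "n", "e")


def _parse_object(body: str):
    """Return the parsed JSON object, or None if the body is not a JSON object."""
    try:
        doc = json.loads(body)
    except ValueError:
        return None
    return doc if isinstance(doc, dict) else None


def discovery_ok(status: int, body: str) -> bool:
    """HC-1: HTTP 200 and all required discovery keys present."""
    doc = _parse_object(body) if status == 200 else None
    return doc is not None and all(key in doc for key in REQUIRED_DISCOVERY_KEYS)


def jwks_ok(status: int, body: str) -> bool:
    """HC-2: HTTP 200 and at least one key carrying the RSA public key fields."""
    doc = _parse_object(body) if status == 200 else None
    if doc is None or not isinstance(doc.get("keys"), list):
        return False
    return any(
        isinstance(key, dict) and all(field in key for field in REQUIRED_JWK_FIELDS)
        for key in doc["keys"]
    )
```

Keeping these as pure functions lets the same logic be reused by whatever prober ends up evaluating the response bodies.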
HC-3: Token endpoint rejects malformed input without 5xx
- Type: HTTPS + string match
- Endpoint: POST us-west-2.burr.raxx.app/oidc/token with body grant_type=authorization_code&code=healthcheck-probe&redirect_uri=https://probe.invalid
- Passing condition: HTTP 4xx (400 or 401), not 5xx. The probe is not a valid exchange; the health check only verifies the endpoint is running and handling requests without crashing. Native R53 HTTPS health checks send GET requests and treat only 2xx/3xx responses as healthy, so this probe cannot be expressed as a plain R53 endpoint check and needs a separate prober whose result R53 consumes.
- Why: A 5xx indicates the Lambda is running but failing internally (KMS unreachable, SSM read failure, unhandled exception). A 4xx confirms the endpoint is operating correctly and rejecting invalid input as designed.
HC-4: Lambda /health endpoint
- Type: HTTPS
- Endpoint: us-west-2.burr.raxx.app/health
- Passing condition: HTTP 200, response body includes {"kms":"ok","ssm":"ok","region":"us-west-2"}
- What the /health handler checks internally:
  - KMS: calls kms:DescribeKey on the regional replica ARN. No signing; just confirms the key is reachable and KeyState == Enabled.
  - SSM: calls ssm:GetParameter on /raxx/burr/<region>/kms_signing_key_arn. Confirms parameter is readable.
  - Google OIDC discovery doc: HTTP GET to https://accounts.google.com/.well-known/openid-configuration. Confirms the upstream IdP (which CF Access uses) is reachable from the Lambda's VPC. A failure here does not necessarily mean Burr is broken (CF Access may cache the Google doc), but it is an early warning of upstream IdP issues.
- Why a dedicated /health: The /health endpoint combines the sub-checks that cannot be expressed in a single R53 endpoint check, and it adds the KMS + SSM reachability signal that HC-1, HC-2, HC-3 do not cover.
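A minimal Python sketch of the /health aggregation follows. The function name is hypothetical, and the probes are injected as callables (an assumption, so the example runs without AWS access); in the Lambda they would wrap the KMS and SSM calls listed above. The upstream Google probe is left out to keep the body identical to the documented passing condition.

```python
import json
from typing import Callable, Dict


def run_health(region: str, probes: Dict[str, Callable[[], bool]]) -> dict:
    """Run each probe, report ok/fail per dependency, and pick the status code."""
    results: Dict[str, str] = {}
    for name, probe in probes.items():
        try:
            results[name] = "ok" if probe() else "fail"
        except Exception:
            # A probe that raises is treated the same as a failed probe.
            results[name] = "fail"
    healthy = all(value == "ok" for value in results.values())
    results["region"] = region
    return {
        "statusCode": 200 if healthy else 503,
        "isBase64Encoded": False,
        "headers": {"Content-Type": "application/json"},
        # Compact separators keep the body a byte-for-byte match for string checks.
        "body": json.dumps(results, separators=(",", ":")),
    }
```

Returning 503 on any failed probe means a status-code check and a body string match agree with each other.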
HC-5: Calculated health check (the R53 gate)
R53 calculated health check (AND logic) across HC-1, HC-2, HC-3, HC-4. Only when all four pass is the region considered healthy by R53. A single sub-check failure routes traffic to the other region.
Duplicate the above four health checks for us-east-1.burr.raxx.app. Total: 8 health checks + 2 calculated checks = 10 health check resources.
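The AND gate and the resource count can be checked with a short Python sketch. The names are hypothetical and the sub-check results are passed in as booleans; it models the decision, not the R53 API.

```python
REQUIRED_SUB_CHECKS = ("HC-1", "HC-2", "HC-3", "HC-4")


def hc3_passes(status_code: int) -> bool:
    """HC-3 treats any 4xx as healthy and everything else as unhealthy."""
    return 400 <= status_code < 500


def region_healthy(sub_checks: dict) -> bool:
    """Calculated check: every sub-check must pass; a missing result counts as a failure."""
    return all(sub_checks.get(name, False) for name in REQUIRED_SUB_CHECKS)


def health_check_resource_count(regions: int, checks_per_region: int = 4) -> int:
    """Endpoint checks per region plus one calculated check per region."""
    return regions * checks_per_region + regions
```

For two regions this gives 8 endpoint checks plus 2 calculated checks, matching the total of 10 stated above.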
7. Failover logic
Scenario A: one region unhealthy
R53 latency-based routing stops returning the unhealthy region's ALB alias in DNS responses. The surviving region absorbs 100% of new requests. Authorization flows in flight at the failed region are lost; because Burr holds no session state, clients re-authenticate via the surviving region. No operator action required.
Scenario B: both regions unhealthy (calculated health checks both failing)
R53 routes to the tertiary failover record: a CloudFront distribution serving the static Auth Down page. The failover record is an R53 Alias pointing to the CloudFront distribution, with routing policy = failover, record type = secondary. The primary records (latency-based, one per region) are the primary failover group. When both primaries are unhealthy, R53 falls back to the secondary.
Downstream apps (Infisical, etc.) will receive a non-OIDC response (HTML static page) and will surface an error to the operator. This is the intended degraded-state UX — the static page instructs the operator to check status.raxx.app.
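Scenarios A and B reduce to one decision rule, sketched below in Python. The function name, the "auth-down" label, and the latency figures in any example call are hypothetical; R53 makes this choice internally from its own latency data.

```python
from typing import Dict

AUTH_DOWN = "auth-down"  # hypothetical label for the CloudFront failover target


def resolve_target(healthy: Dict[str, bool], latency_ms: Dict[str, float]) -> str:
    """Pick the lowest-latency healthy region, or Auth Down if none is healthy."""
    candidates = [region for region, ok in healthy.items() if ok]
    if not candidates:
        return AUTH_DOWN
    return min(candidates, key=lambda region: latency_ms[region])
```

With both regions healthy the lower-latency one wins, with one unhealthy the survivor takes all traffic, and with both unhealthy the result is the Auth Down target.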
Scenario C: CF Access zone outage (not a regional Burr outage)
Burr v2's Lambda and ALB remain healthy. R53 health checks pass. However, the authorization step of the OIDC flow redirects the user to CF Access (cloudflareaccess.com), which is unavailable. The token exchange fails. Burr's /oidc/authorize endpoint returns a 302 to CF Access; if CF Access is down, the browser receives a timeout or CF error page.
Mitigation: The /health check's upstream probe is the detection point. As specified in HC-4 it fetches Google's discovery document, which shows whether Google is reachable but does not by itself observe CF Access; detecting a CF Access outage requires the probe to also cover a CF Access endpoint. When the upstream probe fails, the health check marks the region unhealthy and R53 routes to the other region. If CF Access is globally down, both regions fail HC-4 and traffic falls to Auth Down. This is the correct behavior: if the upstream identity provider is unavailable, Burr cannot issue tokens regardless of compute health.
8. Auth Down integration
Static page delivery
The Auth Down page is a CloudFront distribution backed by an S3 bucket (raxx-auth-down-static). No React, no JavaScript framework, no server-side rendering. The page must be servable when the entire backend stack is down.
S3 bucket policy: read access for the specific object paths only (/auth-down/index.html, /auth-down/styles.css), granted to the CloudFront distribution through Origin Access Control (OAC). No public access to the bucket; CloudFront is the only delivery path.
CloudFront configuration:
- Default root object: auth-down/index.html
- Cache behavior: Cache-Control: no-store, no-cache on all responses. The Auth Down page must never be served stale: if Burr recovers, clients should get the live OIDC endpoints, not a cached Auth Down page.
- Custom error responses: 403 and 404 both return auth-down/index.html with HTTP 200 (handles direct URL access gracefully).
- Alternate domain: auth-down.burr.raxx.app (R53 alias to CloudFront). This domain is the tertiary failover record target.
Page content (copy):
Raxx — Authentication temporarily unavailable
We're working on it.
Current status and updates: status.raxx.app
If you need immediate assistance: support@raxx.app
No session state, no cookies, no JavaScript. Static HTML + minimal inline CSS only.
Cache-Control header rationale: CloudFront default TTL would serve a cached Auth Down page to returning users even after recovery. no-store prevents all caching layers from retaining the page, so the first request after recovery hits the live R53 record and resolves to the healthy Burr region.
9. OIDC client migration path
Current state (Burr v1)
Infisical is configured with:
- issuer: https://<CF_ACCOUNT_ID>.cloudflareaccess.com
- client_id: CF Access SaaS application client ID (output from terraform/modules/sso-oidc-gateway/)
- client_secret: CF Access SaaS application client secret (in SSM at /raxx/cf-access/infisical_oidc_client_secret)
- jwks_uri: https://<CF_ACCOUNT_ID>.cloudflareaccess.com/cdn-cgi/access/certs
Migration to Burr v2
When Burr v2 is deployed:
- Burr v2 is deployed and its health checks pass in both regions before any client migration.
- A new OIDC client registration for Infisical is created in Burr v2 (SSM: /raxx/burr/clients/infisical/client_secret). Burr v2 issues a new client_id and client_secret.
- Infisical's SSO config is updated with:
  - issuer: https://burr.raxx.app/oidc
  - client_id: the new Burr v2 client ID
  - client_secret: from Burr v2 registration
  - jwks_uri: https://burr.raxx.app/oidc/.well-known/jwks.json
- The old CF Access SaaS application (Burr v1 Infisical instance) is kept alive for 7 days as a rollback path, then decommissioned.
Rollback if Burr v2 fails post-migration
- Revert Infisical SSO config to Burr v1 values (v1 issuer, client_id, client_secret from SSM).
- All active Infisical sessions using Burr v2 tokens will expire and require re-authentication. Sessions are short-lived (24h max per CF Access policy); without forced session invalidation the operator waits up to 24h for full rollback, or triggers manual session invalidation in Infisical.
- Burr v2 can be left running in parallel (zero downstream clients) while the issue is diagnosed.
Migration for future downstream apps
Each new downstream app (Grafana, future internal tools) follows the same pattern: register a new client in Burr v2 first, configure the app to use Burr v2 OIDC, validate, then decommission the corresponding Burr v1 CF Access SaaS application.
10. Cost estimate
Assumptions: Lambda provisioned concurrency = 1 per region. Lambda duration: 50ms average per request. Memory: 256 MB.
v1 scale: one OIDC client (Infisical only)
- Lambda invocations: ~100 OIDC token exchanges per day (operator interactive sessions) + ~69,120 health-check invocations per day (8,640 polls per check at a 10-second interval × 4 HC types × 2 regions). Total: ~69,220 invocations/day, or ~2.1M/month (~25.3M/year). That exceeds the 1M/month free-tier request allowance by ~1.1M; at $0.20 per 1M requests the overage is ~$0.22/month. R53 probes each endpoint from multiple checker locations, so real volume is a multiple of this figure and should be measured after provisioning.
- Lambda provisioned concurrency: 1 instance × 730 hours/month × 3,600 s/hour × 0.25 GB (256 MB) × $0.000004646/GB-s = ~$3.05/month per region × 2 regions = ~$6.10/month.
- R53 health checks: 10 health checks × $0.50/month = $5.00/month.
- R53 latency routing queries: negligible at operator scale (<1000 queries/month). <$0.01/month.
- KMS: 1 multi-region key primary + 1 replica = $1.00/month + $1.00/month + ~3,000 Sign requests/month (negligible). ~$2.00/month.
- ALB: 2 ALBs × $0.008/LCU-hour. At near-zero traffic: 2 × ~$16/month (ALB minimum). ~$32/month.
- CloudFront + S3 Auth Down: S3 storage negligible (<1KB). CloudFront: 1000 requests/month = $0.0001. <$1/month.
- Total v1 scale: ~$46/month.
10x scale: 10 downstream OIDC clients
- Lambda invocations scale proportionally. At 10x interactive clients: ~1,000 exchanges/day, which is small next to the fixed health-check volume. Health checks are fixed. No material Lambda cost increase.
- ALB, R53, KMS costs are unchanged.
- Total 10x scale: ~$46/month (ALB dominates; invocation costs stay well under $1/month at the stated rate).
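As a cross-check on the Lambda line items, the Python sketch below recomputes the health-check volume and the provisioned-concurrency charge from the stated assumptions (10-second poll, 4 checks, 2 regions, 256 MB, 730 hours). The function names are hypothetical, and the $0.20 per 1M request price and 1M free requests are assumed figures; the checker-location multiplier is not modeled. Treat the results as a recomputation from inputs, which may differ from rounded figures.

```python
SECONDS_PER_DAY = 86_400


def health_check_invocations_per_day(poll_interval_s: int, checks_per_region: int,
                                     regions: int) -> int:
    """Polls per check per day, times checks per region, times regions."""
    return SECONDS_PER_DAY // poll_interval_s * checks_per_region * regions


def provisioned_concurrency_monthly_usd(instances: int, memory_mb: int,
                                        hours: int = 730,
                                        rate_per_gb_s: float = 0.000004646) -> float:
    """GB-seconds held warm for a month, priced at the rate quoted in this section."""
    gb_seconds = instances * hours * 3600 * (memory_mb / 1024)
    return gb_seconds * rate_per_gb_s


def request_overage_monthly_usd(invocations_per_day: int,
                                free_requests: int = 1_000_000,
                                usd_per_million: float = 0.20) -> float:
    """Request charge beyond the free allowance (both defaults are assumptions)."""
    monthly = invocations_per_day * 365 / 12
    return max(0.0, monthly - free_requests) / 1_000_000 * usd_per_million


if __name__ == "__main__":
    per_day = health_check_invocations_per_day(10, 4, 2) + 100  # plus token exchanges
    print(per_day)
    print(round(provisioned_concurrency_monthly_usd(1, 256) * 2, 2))  # two regions
    print(round(request_overage_monthly_usd(per_day), 2))
```

Changing the poll interval or memory size in the calls shows how sensitive the Lambda share of the bill is to those two settings.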
Note: The ALB cost ($32/month) dominates at small scale. If cost is a concern, API Gateway (HTTP API) can replace ALB at ~$1/month at this traffic level. This is flagged as an open question (§13) for implementer decision.
11. Risks and mitigations
| Risk | Blast radius | Mitigation |
|---|---|---|
| KMS multi-region key replication lag | Token signing fails in replica region during replication event | Key material is copied when the replica key is created, and each Sign call is served by the region-local replica with no cross-region dependency; the replica is available once provisioned. Health check HC-4 catches KMS unavailability before R53 routes to the region. |
| Lambda cold start on health check | HC-3 (token POST) gets a 5xx during cold start, triggering false-positive failover | Provisioned concurrency eliminates cold starts on the health-check path. |
| R53 health check DNS resolution failure | Health check probes cannot reach regional ALB DNS | Health checks target the per-region hostnames (us-west-2.burr.raxx.app, us-east-1.burr.raxx.app), which are served from the R53 hosted zone itself. IP-address-based checks are not a reliable alternative because ALB node IPs can change over time. |
| Auth Down page served stale after recovery | Clients see Auth Down for minutes after Burr recovers | Cache-Control: no-store on CloudFront distribution; no CDN caching of the failover page. R53 TTL=60 means clients refresh DNS within 60 seconds of recovery. |
| NS delegation in CF zone becomes stale | CF zone change removes or corrupts the burr.raxx.app NS delegation | NS delegation is Terraform-managed (cf-access root or a new burr-dns root). Protected by branch protection + plan-before-apply pipeline (ADR-0082). |
| OIDC client secret rotation disrupts downstream app | Client secret rotation invalidates all issued tokens mid-session | Rotation follows a mint-new → configure-downstream → revoke-old pattern (same model as Velvet ADR-0037). Old secret remains valid for 7 days during rotation window. |
| Burr v2 issues token with longer lifetime than v1 | Downstream app accepts overly-long-lived tokens, reducing revocability | Token lifetime is a config parameter per client registration. Default: 1-hour access token and 24-hour session, mirroring the CF Access session_duration from v1. |
| Single AWS account means no account-level isolation | A compromise of one region's IAM role could affect both | IAM roles are scoped per-region and per-function. KMS key policy restricts Sign permission to the Lambda execution role only. |
12. Out of scope for this ADR
- Terraform module implementation for the Burr v2 regional stack.
- Lambda function code.
- R53 record creation and management (Terraform sub-card).
- KMS key provisioning (Terraform sub-card).
- ALB provisioning (Terraform sub-card).
- Auth Down page HTML/CSS implementation (companion card #1868).
- Security audit of the OIDC implementation (companion card #1869).
- Implementation timelines and sprint assignment.
- Burr v2 Terraform pipeline workflow (follows ADR-0082 pattern; sub-card).
13. Open questions
- ALB vs API Gateway. ALB at $16/month/region ($32 total) is the dominant cost at small scale. API Gateway HTTP API would cost ~$0.50/month at this traffic level. Trade-off: API Gateway changes the front-end service in the request path; ALB gives lower latency and more granular WAF integration. Implementer should confirm cost posture with operator before provisioning ALBs.
- VPC or not. Lambda can run in a VPC (for KMS + SSM private endpoint access) or without a VPC (uses public KMS/SSM endpoints, faster cold starts). Running in a VPC adds NAT Gateway cost (~$32/month/region if NAT GW is needed for outbound). Running without a VPC is simpler and has the same security properties if IAM and KMS key policies are correct. Decision needed before implementation.
- R53 health check IP vs hostname. R53 HTTPS health checks against an ALB hostname vs IP address have different behavior. IP-based checks bypass DNS but require the ALB's TLS certificate to match the hostname in the Host header. Confirm R53 supports hostname-in-header override for HTTPS health checks (it does: when an IP address is specified, the health check's fully qualified domain name is sent in the Host header), and that the ALB certificate covers the per-region subdomain.
- CF Access upstream probe in HC-4. The /health endpoint's Google OIDC discovery check adds latency to the health check response. If Google returns a slow response, the R53 health check may time out and trigger false-positive failover. Consider caching the upstream probe result in Lambda memory (5-minute TTL) and only failing HC-4 if the cached result indicates Google has been unreachable for >5 minutes. Implementer decision.
- Burr v1 decommission timeline. After v2 migration, the CF Access SaaS application (Burr v1) should be decommissioned. A 7-day overlap window is proposed above. Confirm with operator whether a longer overlap is preferred for the initial Infisical migration.
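The caching idea raised in the upstream-probe question could be sketched as follows in Python. The class name and parameters are hypothetical, the probe and clock are injected so the example runs offline, and the choice to re-probe once per TTL and fail only after a continuous 5-minute outage is one interpretation of the proposal, not a settled design.

```python
import time
from typing import Callable, Optional


class UpstreamProbeCache:
    """Cache an upstream probe and report unhealthy only after a sustained outage."""

    def __init__(self, probe: Callable[[], bool], ttl_s: float = 300.0,
                 grace_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._probe = probe
        self._ttl_s = ttl_s        # how long a probe result is reused
        self._grace_s = grace_s    # how long failures are tolerated
        self._clock = clock
        self._last_checked: Optional[float] = None
        self._failing_since: Optional[float] = None

    def healthy(self) -> bool:
        now = self._clock()
        if self._last_checked is None or now - self._last_checked >= self._ttl_s:
            try:
                ok = bool(self._probe())
            except Exception:
                ok = False  # an exception counts as an unreachable upstream
            self._last_checked = now
            if ok:
                self._failing_since = None
            elif self._failing_since is None:
                self._failing_since = now
        if self._failing_since is None:
            return True
        # Still within the grace window: keep reporting healthy.
        return now - self._failing_since <= self._grace_s
```

Because the slow upstream call runs at most once per TTL, most /health responses return without waiting on Google, which addresses the timeout concern directly.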
14. Sequence diagram
sequenceDiagram
participant Client as Downstream App<br/>(Infisical)
participant R53 as Route53<br/>burr.raxx.app
participant BurrW as Burr Lambda<br/>us-west-2
participant BurrE as Burr Lambda<br/>us-east-1
participant AuthDown as CloudFront<br/>Auth Down
participant CF as Cloudflare Access<br/>(upstream IdP)
participant KMS as KMS<br/>(MRK Sign)
Note over R53: Healthy: both regions passing HCs
Client->>R53: DNS lookup burr.raxx.app
R53-->>Client: ALB alias (latency: us-west-2 wins)
Client->>BurrW: GET /oidc/.well-known/openid-configuration
BurrW-->>Client: 200 + discovery doc
Client->>BurrW: GET /oidc/authorize?...
BurrW-->>Client: 302 → CF Access authorization
Client->>CF: Google Workspace login
CF-->>Client: 302 → BurrW /oidc/token?code=...
Client->>BurrW: POST /oidc/token (code exchange)
BurrW->>KMS: Sign(JWT payload, mrk-key)
KMS-->>BurrW: Signed JWT
BurrW-->>Client: access_token + id_token
Note over R53,BurrW: us-west-2 HC fails
R53->>R53: Stop routing to us-west-2
Client->>R53: DNS lookup burr.raxx.app
R53-->>Client: ALB alias (us-east-1 only)
Client->>BurrE: POST /oidc/token
BurrE->>KMS: Sign(JWT payload, mrk-key replica)
KMS-->>BurrE: Signed JWT (same key material)
BurrE-->>Client: access_token + id_token
Note over R53,BurrE: Both regions HCs fail
R53->>R53: Route to failover record
Client->>R53: DNS lookup burr.raxx.app
R53-->>Client: CloudFront alias (Auth Down)
Client->>AuthDown: GET /
AuthDown-->>Client: 200 Static Auth Down page
15. Cross-references
- #1859 — Burr v1 deploy (Infisical OIDC SSO via CF Access)
- #1864 — terraform/cf-access state drift remediation
- #1866 — Burr v1 CF Access module PR (v2 baseline)
- #1867 — this card (parent)
- #1868 — Auth Down page UX design (companion card)
- #1869 — Burr v2 security audit (companion card)
- ADR-0082 — Terraform pipeline pattern (per-root workflow, OIDC credentials)
- ADR-0083 — Infisical Google OIDC SSO via CF Access (v1 design)
- terraform/modules/sso-oidc-gateway/ — Burr v1 module (v2 supersedes for self-owned issuer; v1 module remains for CF-Access-as-issuer use cases)
- memory/project_burr_sso_gateway.md — Burr codename and architectural role
16. Rollout plan
| Phase | Gate | Description |
|---|---|---|
| Dark | Post-2026-05-23 launch | Implementation sub-cards filed and groomed; no infra changes |
| Provision | Sub-card gate | KMS MRK, ALBs, Lambda functions deployed in both regions; no DNS changes; internal smoke test via per-region hostnames (us-west-2.burr.raxx.app, us-east-1.burr.raxx.app) |
| Health-check validation | All 10 HCs green for 48h | R53 health check resources created; monitor but do not route public traffic yet |
| Soft cutover | Operator confirmation | NS delegation added to CF raxx.app zone; burr.raxx.app begins serving from R53; Burr v1 CF Access OIDC app kept alive |
| Infisical migration | Operator action | Update Infisical SSO config to Burr v2 endpoints; validate interactive login |
| v1 decommission | 7-day overlap | Remove Burr v1 CF Access SaaS application for Infisical; keep module for other CF-Access-as-issuer use cases |