SRE Provisioning Batch — 2026-05-06
Date: 2026-05-06
Author: sre-agent
Session start: 10:10 UTC
Session end: 10:45 UTC
Session environment check
| Credential | Status | Notes |
|---|---|---|
| heroku auth | VALID: kris@moosequest.net | CLI authenticated |
| aws sts | VALID: arn:aws:iam::521228113048:user/aws-developer-alpha (AdministratorAccess) | |
| gh auth | VALID: MooseQuest | scopes: admin:public_key, repo, read:org |
| INFISICAL_TOKEN / vault | STALE: CF Access error 1010 | vault.raxx.app CF Access has no service token for agent access; unblocks when #680 lands |
| CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN | STALE: HTTP 401 | Token revoked or expired |
| CLOUDFLAREROLLED | PARTIAL: zone reads work, /user/tokens/verify returns 401 | Has Zone:Read; DNS write scope unconfirmed |
Impact of stale credentials: Infisical vault writes are blocked (Item 1 vault write, Item 2 vault write). Minting the CF billing token is blocked (Item 2). CF DNS writes are blocked (Item 5; Item 6 partially).
Item 1 — Rotate Heroku Platform API tokens (issue #454)
Status: PARTIAL (3/5 steps automated; vault write + old auth revoke require operator)
What was done
- Minted new Heroku OAuth authorization via POST /oauth/authorizations
  - scope=global
  - New auth_id: 9d5ec3c4-...-408b9cb3720f
  - Description: raxx-platform-token-2026-05-06
  - Token validated against GET /account: PASS
- Distributed new token as HEROKU_API_KEY to all 4 target apps:
  - raxx-console-prod: OK (HTTP 200)
  - raxx-console-staging: OK (HTTP 200)
  - raxx-api-prod: OK (HTTP 200)
  - raxx-api-staging: OK (HTTP 200)
- Updated GitHub Actions repo secret HEROKU_API_KEY (updated_at: 2026-05-06T10:27:39Z)
Pending (operator required)
Vault write is blocked (CF Access service token not provisioned for vault.raxx.app). Once fixed:
# Step A — Write new token to vault
# (run from a shell with vault access, e.g., heroku run --app raxx-console-prod)
# vault path: /MooseQuest/heroku/HEROKU_PLATFORM_API_TOKEN
# New token is live in Heroku config vars and GH secret — retrieve from Heroku to write back:
# heroku config:get HEROKU_API_KEY -a raxx-console-prod
# Then store in Infisical at /MooseQuest/heroku/HEROKU_PLATFORM_API_TOKEN
# Step B — Revoke old auth (once vault is updated)
heroku authorizations:revoke ba6a2961-...-7866b505a3a6
# ba6a2961 = "GitHub Actions deploy (raxx-console + raxx-api)" — the prior automation key
# Confirm: curl -H "Authorization: Bearer <old_token>" https://api.heroku.com/account
# Expect: HTTP 401
Do not revoke ba6a2961 until vault is updated. Old and new tokens coexist on Heroku; the old one is only needed if a rollback is required.
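The ordering constraint above (vault write first, revoke second) could be enforced with a small guard. This is a minimal sketch, not an existing script: the function names and the VAULT_WRITE_CONFIRMED flag are hypothetical, and the auth ID is passed as an argument because the full ID is elided in this log. The Accept header is the one the Heroku Platform API documents for version 3.

```shell
# Sketch only: function names and VAULT_WRITE_CONFIRMED are hypothetical.

# Refuse to revoke unless the operator has explicitly confirmed the vault write.
revoke_old_auth() {
  auth_id="$1"
  if [ "${VAULT_WRITE_CONFIRMED:-no}" != "yes" ]; then
    echo "refusing to revoke ${auth_id}: vault write not confirmed" >&2
    return 1
  fi
  heroku authorizations:revoke "${auth_id}"
}

# Print the HTTP status of GET /account for a token.
# After a successful revoke, the old token is expected to yield 401.
old_token_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $1" \
    -H "Accept: application/vnd.heroku+json; version=3" \
    https://api.heroku.com/account
}
```

With the flag unset, revoke_old_auth returns non-zero without calling the Heroku CLI. The operator would set VAULT_WRITE_CONFIRMED=yes only after the vault write has completed.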
Secret locations
| Secret | Location | Notes |
|---|---|---|
| HEROKU_API_KEY (new) | Heroku config vars on 4 apps + GH repo secret | Live as of 10:27 UTC |
| /MooseQuest/heroku/HEROKU_PLATFORM_API_TOKEN | Infisical vault | NEEDS OPERATOR WRITE |
| /MooseQuest/heroku/HEROKU_API_KEY__AUTH_ID | Infisical vault | NEEDS OPERATOR WRITE (value: 9d5ec3c4-...-408b9cb3720f) |
Issue #454: leave OPEN until vault write + revoke complete.
Item 2 — Provision read-only billing API tokens (issue #759)
Status: PARTIAL (2/3 providers done; Cloudflare blocked)
Heroku billing token
Done. Dedicated Heroku authorization minted with description billing-collector-readonly-2026-05-06.
- Auth ID: 99f3c0a4-...-dfc5842ef9a8
- Scope note: Heroku exposes no read-only billing scope. global scope is required to reach /account/invoices/*. This token is separate from the deploy token to limit blast radius, as specified in issue #759.
Scope tradeoff: because the Heroku Platform API does not offer fine-grained, billing-read-only permissions, a global-scoped token is the minimum required. The operator should review whether this is acceptable before activating the billing collector.
Interim storage: Vault is blocked; credentials stored in SSM until vault access is restored:
- SSM path: /raxx/billing-readonly/heroku_billing_api_key (SecureString, us-east-1)
- SSM path: /raxx/billing-readonly/heroku_billing_auth_id (value: 99f3c0a4-...-dfc5842ef9a8)
Operator action required: Once vault CF Access is fixed, migrate both values to Infisical at /Raxx/Console/Billing/HEROKU_BILLING_API_KEY (create the /Raxx/Console/Billing/ folder via POST /api/v1/folders first per memory feedback_vault_folder_must_exist.md).
AWS billing credentials
Done. IAM user raxx-billing-readonly created (arn: arn:aws:iam::521228113048:user/raxx-billing-readonly).
Policy attached: raxx-billing-readonly-policy (arn: arn:aws:iam::521228113048:policy/raxx-billing-readonly-policy) — permissions: ce:Get*, ce:Describe*, ce:List*, billing:*ReadOnly, budgets:ViewBudget, cur:DescribeReportDefinitions.
Access key provisioned. Credentials stored in SSM (following feedback_aws_workloads_use_ssm_not_vault.md):
- SSM path: /raxx/billing-readonly/aws_access_key_id (SecureString, us-east-1)
- SSM path: /raxx/billing-readonly/aws_secret_access_key (SecureString, us-east-1)
Cloudflare billing token
BLOCKED. Requires an API token with User:API Tokens:Edit scope to mint a new Account.Billing:Read token. Both CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN and CLOUDFLAREROLLED are stale.
Operator steps (3 minutes in CF dashboard):
1. Go to https://dash.cloudflare.com/profile/api-tokens
2. Click "Create Token" → "Custom token"
3. Token name: raxx-billing-readonly
4. Permissions: Account > Billing > Read
5. Account Resources: Include → your account
6. Click "Continue to summary" → "Create Token"
7. Copy the token value (shown once)
8. Store in SSM: aws ssm put-parameter --name "/raxx/billing-readonly/cloudflare_billing_token" --type SecureString --value "<token>" --region us-east-1 --overwrite >/dev/null
9. Also migrate to vault at /Raxx/Console/Billing/CLOUDFLARE_BILLING_TOKEN when vault access is restored
Issue #759: leave OPEN until CF billing token step is complete.
Item 3 — Secrets for deploy audit log (post-#1267)
Status: DONE
What was done
Generated CONSOLE_AUDIT_INGEST_TOKEN (32-byte random hex, 64-char string).
GitHub Environment secrets:
- production env: CONSOLE_AUDIT_INGEST_TOKEN — set 2026-05-06T10:29:26Z
- staging env: CONSOLE_AUDIT_INGEST_TOKEN — set 2026-05-06T10:29:28Z
GitHub Environment variables:
- staging env: CONSOLE_INTERNAL_URL = https://raxx-console-staging.herokuapp.com
- production env: CONSOLE_INTERNAL_URL = https://console.raxx.app
Heroku console apps:
- raxx-console-prod: CONSOLE_AUDIT_INGEST_TOKEN set, FLAG_CONSOLE_DEPLOY_AUDIT_INGEST=on
- raxx-console-staging: CONSOLE_AUDIT_INGEST_TOKEN set, FLAG_CONSOLE_DEPLOY_AUDIT_INGEST=on
Both apps verified via Heroku API: FLAG=on, AUDIT_TOKEN_PRESENT=True.
The same token value is set in the GH environment secret (used by deploy workflow to authenticate) and in the Heroku config var (used by the console endpoint to verify the incoming request). They match.
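For reference, a token of the shape described above (32 random bytes rendered as 64 hex characters) can be produced and sanity-checked as follows. This is a sketch under assumptions: the log does not record which command generated the value, and both function names are hypothetical.

```shell
# Sketch only: the session's actual generation command is not recorded here.

# Emit 32 random bytes as 64 lowercase hex characters (no trailing newline).
generate_ingest_token() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Succeed only if the argument is exactly 64 lowercase hex characters.
is_valid_ingest_token() {
  [ "${#1}" -eq 64 ] || return 1
  case "$1" in
    *[!0-9a-f]*) return 1 ;;
  esac
  return 0
}
```

Checking the shape before distributing the value catches truncation or stray whitespace early. That matters here because the GH environment secret cannot be read back afterwards to compare it with the Heroku config var.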
Item 4 — SSM + IAM for FreeScout backup (post-#1272)
Status: DONE — dry-run verified
What was done
SSM parameters (us-east-1):
- /raxx/freescout/db_password — already existed (version 1, 32 chars)
- /raxx/freescout/ssh_key — written from /tmp/lightsail_us_east_1.pem (RSA PEM, 1680 chars), SecureString, version 1
IAM user raxx-freescout-backup:
- ARN: arn:aws:iam::521228113048:user/raxx-freescout-backup
- Policy attached: raxx-freescout-backup-policy (ARN: arn:aws:iam::521228113048:policy/raxx-freescout-backup-policy)
- Permissions: ssm:GetParameter /raxx/freescout/*, s3:PutObject/GetObject/HeadObject/DeleteObject/ListBucket/CreateBucket on raxx-support-attachments, kms:GenerateDataKey/Decrypt/DescribeKey (via S3 KMS), lightsail:CreateInstanceSnapshot/GetInstanceSnapshots/DeleteInstanceSnapshot
- Access key: AKIA...DKLMQ (last 4 only — full ID stored in SSM + GH secrets)
GitHub Actions repo secrets:
- AWS_BACKUP_ACCESS_KEY_ID — set 2026-05-06T10:33:33Z
- AWS_BACKUP_SECRET_ACCESS_KEY — set 2026-05-06T10:33:34Z
Verification:
- IAM credentials validated: aws sts get-caller-identity as raxx-freescout-backup → PASS
- SSM read /raxx/freescout/db_password: accessible (version 1)
- SSM read /raxx/freescout/ssh_key: accessible (version 1)
- Dry-run workflow dispatch gh workflow run freescout-backup.yml --field dry_run=true → completed success at 2026-05-06T10:34:07Z (run ID 25430172194)
Note on verified restore: The runbook requires a verified restore against a scratch instance before marking #714 fully closed. That test requires SSH access to a restored Lightsail instance and is operator-run per the runbook.
Note on dry-run limitation: The workflow dry-run skips SSM reads (dry_run=true uses placeholder creds). The SSM access was validated separately via AWS CLI using the backup user's credentials. The first real cron run at 06:00 UTC will be the end-to-end proof.
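The separate SSM check mentioned above could look roughly like this. It is a sketch, not the command history of this session: the function names are hypothetical, and the AWS CLI profile name passed in is assumed to hold the raxx-freescout-backup access key. Only Parameter.Version is queried, so no secret value is printed.

```shell
# Sketch only: function names and the profile argument are hypothetical.

# Parameter paths recorded in this log.
freescout_params() {
  printf '%s\n' /raxx/freescout/db_password /raxx/freescout/ssh_key
}

# For each parameter, print "<path> version=<n>" if readable, "<path> FAILED" otherwise.
# --with-decryption exercises the full SecureString read path without showing the value.
check_ssm_reads() {
  profile="$1"
  freescout_params | while read -r path; do
    if version="$(aws ssm get-parameter --name "$path" --with-decryption \
        --region us-east-1 --profile "$profile" \
        --query 'Parameter.Version' --output text)"; then
      echo "$path version=$version"
    else
      echo "$path FAILED"
    fi
  done
}
```

Usage would be check_ssm_reads followed by the profile name. Both lines reporting version=1 would match the verification results listed above.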
Item 5 — Verify raxx.app domain + enable DKIM (issue #1210)
Status: PARTIAL (DNS state is favorable; DKIM record add blocked; Workspace UI steps required)
Current DNS state (as of 2026-05-06 10:40 UTC)
| Record | Status |
|---|---|
| TXT raxx.app google-site-verification | EXISTS: Ss0o0uHmeJNe-iafrf8DzI3ajM0SusH9JyUxEEl5DFY |
| TXT raxx.app SPF | EXISTS: v=spf1 include:_spf.google.com include:spf.mtasv.net ~all |
| TXT _dmarc.raxx.app | EXISTS: v=DMARC1; p=quarantine; rua=mailto:kris@moosequest.net; fo=1 |
| MX raxx.app | EXISTS: Google Workspace MX (aspmx.l.google.com + alt1-4) |
| TXT google._domainkey.raxx.app | MISSING: DKIM not yet configured |
The google-site-verification TXT is already present. Google Workspace may already show raxx.app as verified if the domain was added and this token was placed previously.
Operator checklist (Google Workspace Admin — 10 minutes)
1. Open https://admin.google.com → Domains → Manage domains
2. Check if raxx.app shows as "Verified". If yes, skip to step 5.
3. If not verified: in the domain entry click "Verify" → Google will check the TXT record Ss0o0uHmeJNe-iafrf8DzI3ajM0SusH9JyUxEEl5DFY that is already in Cloudflare DNS → confirm.
4. Once verified, confirm in a comment on issue #1210.
5. Navigate to: Apps → Google Workspace → Gmail → Authenticate email (DKIM).
6. Select domain raxx.app from the dropdown.
7. Click "Generate new record". Default prefix google, key length 2048-bit.
8. Copy the TXT record value (it will look like v=DKIM1; k=rsa; p=MIIBIjAN...).
9. Add to Cloudflare DNS:
   - In Cloudflare dashboard → raxx.app zone → DNS → Add record
   - Type: TXT, Name: google._domainkey, Content: <value from step 8>, TTL: Auto
   - (If CF DNS edit token is restored: curl -X POST "https://api.cloudflare.com/client/v4/zones/f12dbb5cac57d5591a5058874498a6d1/dns_records" -H "Authorization: Bearer $CF_DNS_EDIT_TOKEN" -H "Content-Type: application/json" -d '{"type":"TXT","name":"google._domainkey.raxx.app","content":"<value>","ttl":1}')
10. Wait 5-10 min for DNS propagation.
11. Back in Workspace Admin → Gmail → Authenticate email → Click "Start authentication" for raxx.app.
12. Verify: dig TXT google._domainkey.raxx.app returns the DKIM public key.
13. Send a test message from a @raxx.app Google address to an external address; check headers for dkim=pass and spf=pass.
Issue #1210: leave OPEN until operator completes steps 2-13 above.
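The dig verification in the checklist can be made a little more mechanical. A 2048-bit DKIM key normally exceeds 255 characters, so resolvers return the TXT value as several quoted chunks. The sketch below joins those chunks and checks that the result looks like a DKIM record. The function names are hypothetical, and the check is a shape test only, not a cryptographic validation.

```shell
# Sketch only: helper names are hypothetical.

# Join the quoted chunks that dig prints for a long TXT record into one string.
join_txt_chunks() {
  sed -e 's/" "//g' -e 's/^"//' -e 's/"$//'
}

# Succeed if the value starts with v=DKIM1; and carries a non-empty p= tag.
is_dkim_value() {
  case "$1" in
    "v=DKIM1;"*"p="?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Look up the live record (requires network access and dig).
lookup_dkim_txt() {
  dig +short TXT google._domainkey.raxx.app | join_txt_chunks
}
```

Once the record has propagated, is_dkim_value "$(lookup_dkim_txt)" should succeed. Until then the lookup returns nothing and the check fails, which matches the MISSING status recorded above.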
Item 6 — Provision ops@/billing@/no-reply@ on raxx.app (issue #1212)
Status: BLOCKED on item 5
Issue #1210 (domain verification + DKIM) must close before #1212 can proceed. The Google Workspace address provisioning requires the domain to be fully verified and active. Additionally, the raxx.app → secondary domain conversion (Group vs alias) is a one-way operation and the operator should confirm after reviewing the checklist output from item 5.
When item 5 is done, return here and follow this checklist:
Operator checklist (Google Workspace Admin — 15 minutes)
Step 1 — Convert raxx.app from alias to secondary domain (one-way)
1. Open https://admin.google.com → Domains → Manage domains
2. Find raxx.app → if it shows as "Domain alias", click the domain name → "Make this a secondary domain"
3. Confirm the warning. This is one-way. The domain becomes independently addressable.
Step 2 — Provision ops@raxx.app as a Google Group
1. Open https://admin.google.com → Directory → Groups → Create group
2. Group name: Raxx Ops, Group email: ops@raxx.app
3. Access type: Team (members can post)
4. Add Kristerpher as owner
5. Save. Verify: send a test email to ops@raxx.app and confirm it arrives in Kristerpher's inbox.
Step 3 — Provision billing@raxx.app as a Google Group (choice: Group, per operator default in task spec)
1. Same flow as ops@: Group name: Raxx Billing, Group email: billing@raxx.app
2. Add Kristerpher as owner and initial member
3. Save.
Step 4 — Configure no-reply@raxx.app as send-only
1. Open https://admin.google.com → Directory → Users → Add user
2. First name: No Reply, Last name: Raxx, Primary email: no-reply@raxx.app
3. Set a strong random password (this account will never be logged into interactively)
4. Navigate to the account → Add recovery email (optional, but recommended)
5. To make it send-only / bounce inbound: in Gmail settings for this account, set up a vacation responder with message "This address does not accept replies. For support, email support@raxx.app." — OR use a Group with external member receipt turned off.
6. The Postmark relay (already configured per #708) handles outbound; no-reply@ is a sender identity only.
Note: provisioning no-reply@ via Google Workspace API is possible but requires a service account with Domain-Wide Delegation — not available in this session. Steps 1-4 above are all achievable in the Admin UI.
Issue #1212: leave OPEN pending item 5 completion.
Summary table
| # | Item | Status | Issues |
|---|---|---|---|
| 1 | Heroku Platform API token rotation | PARTIAL — token live on 4 apps + GH, vault write + old auth revoke pending | #454 open |
| 2 | Billing read-only tokens | PARTIAL — Heroku + AWS done; CF blocked | #759 open |
| 3 | Secrets for deploy audit log (#1267) | DONE | — |
| 4 | SSM + IAM for FreeScout backup (#1272) | DONE — dry-run success | — |
| 5 | raxx.app domain verify + DKIM | PARTIAL — DNS state OK; DKIM add UI-locked | #1210 open |
| 6 | ops@/billing@/no-reply@ provisioning | BLOCKED on #1210 | #1212 open |
Blocked by: stale Cloudflare tokens
Both CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN and CLOUDFLAREROLLED return HTTP 401 on /user/tokens/verify. This blocks:
- Minting new API tokens (needs API Tokens:Edit scope)
- CF billing token for item 2
- DNS writes for items 5/6 (though CLOUDFLAREROLLED can read zone data)
To unblock: Rotate CF API tokens via Cloudflare dashboard:
1. Go to https://dash.cloudflare.com/profile/api-tokens
2. For each stale token, click "Roll" to generate a new value
3. Run source scripts/ops/session-bootstrap.sh to refresh the local env
4. Re-run items 2 (CF billing token) and 5 (DKIM record add) from this batch
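After rolling, each token can be re-checked against the same /user/tokens/verify endpoint that returned 401 in this session. This is a minimal sketch with hypothetical function names. The labels mirror the status vocabulary used in this log, and a 200 only shows that the token authenticates. It says nothing about which scopes the token carries.

```shell
# Sketch only: function names are hypothetical.

# Print the HTTP status returned by Cloudflare's token verify endpoint.
cf_token_http_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $1" \
    https://api.cloudflare.com/client/v4/user/tokens/verify
}

# Map a status code to the labels used in the credential table.
classify_cf_token() {
  case "$1" in
    200) echo "VALID" ;;
    401) echo "STALE" ;;
    *)   echo "UNKNOWN ($1)" ;;
  esac
}
```

For example, classify_cf_token "$(cf_token_http_status "$CLOUDFLARE_RAXX_AUTOMATION_API_TOKEN")" would be expected to print VALID once the rolled value has been loaded into the environment.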
Blocked by: vault CF Access service token missing
Infisical vault at https://vault.raxx.app is protected by Cloudflare Access. The current session CF Access service token (CF_ACCESS_CLIENT_ID) is not authorized for the vault.raxx.app Access app (error 1010). This blocks:
- Writing new Heroku token to vault (item 1, step A)
- Writing billing tokens to vault (item 2)
To unblock: Provision a CF Access service token for vault.raxx.app per docs/ops/runbooks/cf-access-service-token-provisioning.md, then add it to the agent's environment. Issue #680 tracks this.
Secret inventory (this session)
No secret values are recorded here. This table documents WHERE secrets were written only.
| Secret | Location | Owner |
|---|---|---|
| Heroku Platform API token (new) | Heroku config vars (4 apps) + GH repo secret HEROKU_API_KEY | Item 1 |
| Heroku billing token | SSM /raxx/billing-readonly/heroku_billing_api_key + /heroku_billing_auth_id | Item 2 |
| AWS billing access key | SSM /raxx/billing-readonly/aws_access_key_id + /aws_secret_access_key | Item 2 |
| CONSOLE_AUDIT_INGEST_TOKEN | GH env secrets (production + staging) + Heroku config vars (raxx-console-prod, raxx-console-staging) | Item 3 |
| FreeScout SSH key | SSM /raxx/freescout/ssh_key | Item 4 |
| FreeScout DB password | SSM /raxx/freescout/db_password (pre-existing, verified) | Item 4 |
| FreeScout backup IAM access key | GH repo secrets AWS_BACKUP_ACCESS_KEY_ID + AWS_BACKUP_SECRET_ACCESS_KEY | Item 4 |