Status: RESOLVED — both staging and prod deployed, migrations applied, healthz passing
Last updated: 2026-05-05 04:15 UTC
Author: sre-agent
Velvet (credential-rotation service) first deploy completed across staging and prod. Three sequential blockers required three hotfix PRs before migrations could run. Both environments are now healthy with all 3 DB migrations applied.
Related issues: #1129 #1130 #1136 #1137 #1140
| Environment | App | Release | Slug | Migrations | Healthz |
|---|---|---|---|---|---|
| Staging | raxx-velvet-staging | v15 | 2ef77cf3 | 3 applied | {"service":"velvet","status":"ok"} |
| Prod | raxx-velvet-prod | v9 | 2ef77cf3 | 3 applied | {"service":"velvet","status":"ok"} |
Tables in both DBs: rotation_jobs, rotation_job_consumers, velvet_schema_migrations
| # | Action | Status | Notes |
|---|---|---|---|
| 1 | git pull origin main (get -e . fix from #1138) | DONE | HEAD 9ded331 |
| 2 | git subtree push --prefix velvet heroku-velvet-staging main (v14, -e . fix) | DONE | Staging v14, slug 86ceec1d |
| 3 | Verify staging boot / healthz | DONE | {"service":"velvet","status":"ok"} — 200 OK |
| 4 | heroku run python -m velvet.db.migrate (staging) | BLOCKED → FIXED | ModuleNotFoundError: No module named 'sqlalchemy' — fixed by PR #1141 |
| 5 | Pull sqlalchemy fix (PR #1141, commit 98f2418) | DONE | git fetch origin main + deploy-temp branch |
| 6 | Subtree push staging with sqlalchemy fix | DONE | Staging v15, slug 2ef77cf3 |
| 7 | Run migrations on staging | DONE | 3 migrations applied cleanly |
| 8 | Verify staging migration tables | DONE | ['rotation_job_consumers', 'rotation_jobs', 'velvet_schema_migrations'] |
| 9 | Staging healthz smoke test | DONE | {"service":"velvet","status":"ok"} — 200 |
| 10 | Subtree push to prod | DONE | Prod v9, slug 2ef77cf3 (same as staging) |
| 11 | Run migrations on prod | DONE | 3 migrations applied cleanly |
| 12 | Verify prod migration tables | DONE | ['rotation_job_consumers', 'rotation_jobs', 'velvet_schema_migrations'] |
| 13 | Prod healthz smoke test | DONE | {"service":"velvet","status":"ok"} — 200 |
Staging (raxx-velvet-staging):
Release: v15 (v14 = -e . fix; v15 adds sqlalchemy)
Slug: 2ef77cf3 (subtree SHA from main@98f2418)
Dynos: web.1: up
Healthz: {"service":"velvet","status":"ok"} — HTTP 200
URL: raxx-velvet-staging-609f3019292a.herokuapp.com
Migration output (verbatim):
2026-05-05 04:10:37,530 INFO __main__: Applying migration: 001_create_rotation_jobs_v2.sql
2026-05-05 04:10:37,618 INFO __main__: Applied migration: 001_create_rotation_jobs_v2.sql
2026-05-05 04:10:37,619 INFO __main__: Applying migration: 002_create_rotation_job_consumers.sql
2026-05-05 04:10:37,642 INFO __main__: Applied migration: 002_create_rotation_job_consumers.sql
2026-05-05 04:10:37,643 INFO __main__: Applying migration: 003_indexes.sql
2026-05-05 04:10:37,661 INFO __main__: Applied migration: 003_indexes.sql
2026-05-05 04:10:37,662 INFO __main__: Migrations complete.
Applied 3 migration(s): 001_create_rotation_jobs_v2.sql, 002_create_rotation_job_consumers.sql, 003_indexes.sql
Prod (raxx-velvet-prod):
Slug: 2ef77cf3 (same as staging)
Dynos: web.1: up
Healthz: {"service":"velvet","status":"ok"} — HTTP 200
URL: raxx-velvet-prod-b0cea70d1b98.herokuapp.com
Migration output (verbatim):
2026-05-05 04:13:13,991 INFO __main__: Applying migration: 001_create_rotation_jobs_v2.sql
2026-05-05 04:13:14,088 INFO __main__: Applied migration: 001_create_rotation_jobs_v2.sql
2026-05-05 04:13:14,089 INFO __main__: Applying migration: 002_create_rotation_job_consumers.sql
2026-05-05 04:13:14,116 INFO __main__: Applied migration: 002_create_rotation_job_consumers.sql
2026-05-05 04:13:14,117 INFO __main__: Applying migration: 003_indexes.sql
2026-05-05 04:13:14,146 INFO __main__: Applied migration: 003_indexes.sql
2026-05-05 04:13:14,146 INFO __main__: Migrations complete.
Applied 3 migration(s): 001_create_rotation_jobs_v2.sql, 002_create_rotation_job_consumers.sql, 003_indexes.sql
Blocker: velvet package not installed (missing -e .)
PR: #1138 (closes issue #1136)
Error: ModuleNotFoundError: No module named 'velvet' — gunicorn crash on every boot
Root cause: Heroku-24 buildpack runs only pip install -r requirements.txt, does not
run pip install . even when setup.cfg is present. Without -e . in requirements.txt,
the velvet package was never registered in site-packages.
Fix: Added -e . as last line of velvet/requirements.txt.
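For reference, a minimal sketch of the change plus a follow-up import check (illustrative only — not the verbatim commands from PR #1138; it assumes the repo root as the working directory):

```bash
# Append the editable-install line so the buildpack's
# "pip install -r requirements.txt" also installs the velvet package itself.
printf -- '-e .\n' >> velvet/requirements.txt

# After deploying, confirm the package is importable on the dyno
# (uses the --no-tty one-shot syntax noted later in this report).
heroku run --app raxx-velvet-staging --no-tty "python -c 'import velvet; print(velvet.__file__)'"
```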
Blocker: missing sqlalchemy dependency
PR: #1141 (closes issue #1140)
Error: ModuleNotFoundError: No module named 'sqlalchemy' — migration runner crash
Root cause: velvet/db/__init__.py imports velvet.db.models at package-import time.
velvet/db/models.py imports sqlalchemy. Neither velvet/requirements.txt nor
velvet/setup.cfg install_requires listed sqlalchemy.
Fix: Added sqlalchemy>=2.0,<3.0 to both files.
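A quick sanity check (a sketch, not one of the session's verbatim commands) that the pin is declared in both places, since the two files can drift:

```bash
# Expect a matching line in each file; a missing match means the pin has drifted.
grep -n 'sqlalchemy' velvet/requirements.txt velvet/setup.cfg
```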
| Time (UTC) | Action | Outcome |
|---|---|---|
| 2026-05-05 01:09 | PR #1128 merged — setup.cfg + psycopg2-binary | First subtree push attempted |
| 03:09 | v11–v13 deploys (before the -e . fix; slug 2d03871a) | Worker crash loop — No module named 'velvet' |
| 03:43 | Last crash before fix | Dyno state: crashed |
| 03:50 | v14 deploy — subtree push with -e . fix (PR #1138, main@9ded331) | Slug 86ceec1d — gunicorn boots, healthz 200 |
| 03:52 | heroku run python -m velvet.db.migrate on staging | BLOCKED — No module named 'sqlalchemy' |
| 03:53 | Escalation filed; prod deploy gated | — |
| 04:06 | PR #1141 merged — sqlalchemy fix | main@98f2418 |
| 04:09 | Subtree push to staging with sqlalchemy (temp branch → Heroku main) | v15 deployed, slug 2ef77cf3 |
| 04:10 | Migrations on staging | 3 applied in 131ms |
| 04:10 | Healthz staging | {"service":"velvet","status":"ok"} — 200 |
| 04:12 | Subtree push to prod (same slug 2ef77cf3) | v9 deployed |
| 04:13 | Migrations on prod | 3 applied in 155ms |
| 04:13 | Healthz prod | {"service":"velvet","status":"ok"} — 200 |
| 04:15 | Deploy chain complete | Both envs healthy |
Total wall-clock time from PR #1128 merge → prod healthy: ~3h 6m (01:09 UTC merge → 04:15 UTC prod confirmed)
| Issue | Action | Status |
|---|---|---|
| #1129 | CI smoke test for velvet | OPEN — unblocked; test can now run against live DB |
| #1130 | Release phase (migrate in Procfile) | OPEN — hold for next sprint |
| #1136 | -e . fix | CLOSED — merged PR #1138, deployed in v14 |
| #1137 | Post-deploy healthz CI check | OPEN — hold for next sprint |
| #1140 | sqlalchemy missing from requirements | CLOSED — merged PR #1141, deployed in v15/v9 |
git subtree push --prefix <dir> <remote> <local-branch> pushes the subtree
split to a remote branch of the same name as <local-branch>. If that name is
not main or master, Heroku skips the build. The reliable pattern is:
# Compute subtree split hash
SPLIT=$(git subtree split --prefix velvet HEAD)
# Push directly to Heroku's main
git push heroku-velvet-staging $SPLIT:refs/heads/main
Or use a local branch named main:
git subtree push --prefix velvet heroku-velvet-staging main
(This works when your current local branch is also named main.)
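Putting the two together, a hedged sketch of the full pattern for both environments (the prod remote name heroku-velvet-prod is an assumption here — substitute whatever the actual git remote is called):

```bash
# Compute the subtree split once, then push the same commit to both Heroku apps
# so staging and prod build from an identical slug source.
SPLIT=$(git subtree split --prefix velvet HEAD)
git push heroku-velvet-staging "$SPLIT:refs/heads/main"
git push heroku-velvet-prod    "$SPLIT:refs/heads/main"
```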
The velvet_schema_migrations table tracks applied migrations. Re-running
python -m velvet.db.migrate is safe — already-applied migrations are skipped.
This skip behavior is by design; it was not exercised in this session.
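A possible spot check (not run in this session; assumes heroku pg:psql accepts piped SQL against the app's primary database):

```bash
# List the rows the migration runner has recorded as applied.
echo "SELECT * FROM velvet_schema_migrations;" | heroku pg:psql --app raxx-velvet-staging
```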
heroku run syntax:
The correct flag syntax for non-TTY one-shot commands:
heroku run --app <appname> --no-tty "<command>"
(Not heroku run -a <appname> <cmd> — the -a flag does not exist in the
current CLI version; it reports "Nonexistent flag: -a".)
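Applied to the migration runs from this report, the invocation looks like:

```bash
# One-shot migration run, staging then prod.
heroku run --app raxx-velvet-staging --no-tty "python -m velvet.db.migrate"
heroku run --app raxx-velvet-prod    --no-tty "python -m velvet.db.migrate"
```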
Apps: raxx-velvet-staging / raxx-velvet-staging-609f3019292a.herokuapp.com; raxx-velvet-prod / raxx-velvet-prod-b0cea70d1b98.herokuapp.com
Runbook: docs/ops/runbooks/velvet.md (action item #1130 follow-up)

Postmark DKIM canonical selector remediation (pm._domainkey.raxx.app)
Time: 2026-05-05 04:27 UTC
Severity: P2 HIGH (pre-condition for Postmark sandbox exit)
Issue: https://github.com/moosequest/TradeMasterAPI/issues/1144
Author: sre-agent
Security-agent triage (2026-05-05) confirmed pm._domainkey.raxx.app returned NXDOMAIN on 8.8.8.8. The canonical Postmark DKIM selector was absent from the Cloudflare DNS zone for raxx.app, even though a date-stamped selector (20260430051323pm._domainkey.raxx.app) was already present and verified by Postmark.
DNS state at 04:20 UTC:
20260430051323pm._domainkey.raxx.app TXT k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNA... (PRESENT — CF record ID 83a6227b782ce931f97fd3d889ea28f6)
pm._domainkey.raxx.app TXT (ABSENT — NXDOMAIN)
Postmark domain state (before fix):
DKIMVerified: True (against date-stamped selector only)
DKIMHost: 20260430051323pm._domainkey.raxx.app
DKIMUpdateStatus: Verified
Fetched from GET https://api.postmarkapp.com/domains/4616861 using X-Postmark-Account-Token from vault at /MooseQuest/postmark/POSTMARK_ACCOUNT_API_KEY:
DKIMHost: 20260430051323pm._domainkey.raxx.app
DKIMTextValue: k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCxX9MjFeCMegRCNnuM0DhSgLBL7WfOAISymao02MgPq20oXEQJILhSWQP9xJLz4Ie3aMJpJJXd9cKkRQb/rn6cMxTFUrgzyHIoznWTekXf5IU0orPm4tibKe9GZL0Rr+OxVwjcZttZ4modiJeCb+m1Yg2VGkdfrYSOxiCPwE4GAQIDAQAB
This is the same value used for both the date-stamped and canonical selectors (same key pair, both selectors point to the same public key).
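For reference, a sketch of that fetch (assumes the account token has been exported as POSTMARK_ACCOUNT_TOKEN after reading it from the vault path above; jq is used only to trim the output):

```bash
curl -s "https://api.postmarkapp.com/domains/4616861" \
  -H "Accept: application/json" \
  -H "X-Postmark-Account-Token: ${POSTMARK_ACCOUNT_TOKEN}" \
  | jq '{DKIMVerified, DKIMHost, DKIMUpdateStatus, SPFVerified, ReturnPathDomainVerified}'
```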
Created TXT record in Cloudflare via DNS-edit token (CLOUDFLARE_EDIT_DNS):
Zone: raxx.app (ID: f12dbb5cac57d5591a5058874498a6d1)
Name: pm._domainkey.raxx.app
Type: TXT
Content: k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCxX9MjFeCMegRCNnuM0DhSgLBL7WfOAISymao02MgPq20oXEQJILhSWQP9xJLz4Ie3aMJpJJXd9cKkRQb/rn6cMxTFUrgzyHIoznWTekXf5IU0orPm4tibKe9GZL0Rr+OxVwjcZttZ4modiJeCb+m1Yg2VGkdfrYSOxiCPwE4GAQIDAQAB
TTL: 1 (auto)
CF Record ID: e3963b1bf40bda34e11c99274915023c
Created: 2026-05-05T04:27:26.274254Z
API: POST https://api.cloudflare.com/client/v4/zones/f12dbb5cac57d5591a5058874498a6d1/dns_records
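A sketch of the equivalent API call (assumes the DNS-edit token is exported as CLOUDFLARE_EDIT_DNS; the p= value is elided here — use the full DKIMTextValue shown above):

```bash
curl -s -X POST \
  "https://api.cloudflare.com/client/v4/zones/f12dbb5cac57d5591a5058874498a6d1/dns_records" \
  -H "Authorization: Bearer ${CLOUDFLARE_EDIT_DNS}" \
  -H "Content-Type: application/json" \
  --data '{"type":"TXT","name":"pm._domainkey.raxx.app","content":"k=rsa; p=<full key elided>","ttl":1}'
```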
Checked at 04:27 UTC (~15s after record creation — Cloudflare propagated immediately):
$ dig +short TXT pm._domainkey.raxx.app @1.1.1.1
"k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCxX9MjFeCMegRCNnuM0DhSgLBL7WfOAISymao02MgPq20oXEQJILhSWQP9xJLz4Ie3aMJpJJXd9cKkRQb/rn6cMxTFUrgzyHIoznWTekXf5IU0orPm4tibKe9GZL0Rr+OxVwjcZttZ4modiJeCb+m1Yg2VGkdfrYSOxiCPwE4GAQIDAQAB"
$ dig +short TXT pm._domainkey.raxx.app @8.8.8.8
"k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCxX9MjFeCMegRCNnuM0DhSgLBL7WfOAISymao02MgPq20oXEQJILhSWQP9xJLz4Ie3aMJpJJXd9cKkRQb/rn6cMxTFUrgzyHIoznWTekXf5IU0orPm4tibKe9GZL0Rr+OxVwjcZttZ4modiJeCb+m1Yg2VGkdfrYSOxiCPwE4GAQIDAQAB"
Both public resolvers return the correct value. Local default resolver timed out (consistent with pre-existing behavior for all raxx.app DNS queries in this environment — not a propagation issue; the same timeout occurred for the pre-existing 20260430051323pm._domainkey record).
Postmark verifies DKIM against its own date-stamped selector (20260430051323pm._domainkey.raxx.app), not the canonical pm._domainkey selector. That date-stamped record was already present and verified before this remediation. Post-fix domain state:
DKIMVerified: True
DKIMHost: 20260430051323pm._domainkey.raxx.app
DKIMUpdateStatus: Verified
SPFVerified: True
ReturnPathDomainVerified: True
The pm._domainkey canonical selector is now present in DNS, satisfying the security-agent finding and unblocking sandbox-exit verification by recipient mail providers that check standard Postmark selectors.
SPF (v=spf1 a mx include:spf.mtasv.net ~all) and DMARC (v=DMARC1; p=quarantine; rua=mailto:kris@moosequest.net; fo=1) were not modified. pm-bounces.raxx.app CNAME (pm.mtasv.net) was not modified.
Kristerpher to complete Postmark sandbox-exit approval in the Postmark dashboard. Postmark's internal sandbox review does not require any further DNS changes — all three records (SPF, DKIM via date-stamped selector, Return-Path CNAME) are verified.
Issue #1144 closed — remediation applied, canonical selector live, no recurrence risk (both selectors now exist; Postmark's own selector was never missing).
Time: 2026-05-05 (approx. 17:00 UTC)
Operator action that preceded this: Kristerpher created the Operations mailbox in FreeScout Admin UI
SOP: docs/ops/runbooks/freescout-operations-mailbox-provisioning.md
Author: sre-agent
Retrieved via GET https://tickets.raxx.app/api/mailboxes with X-FreeScout-API-Key header
(auth header is X-FreeScout-API-Key, not Authorization: Bearer — FreeScout uses a custom header).
Response confirmed two mailboxes:
id=1 name=Raxx Support email=support@raxx.app
id=2 name=Operations email=ops@raxx.app
Mailbox 2 ("Operations", ops@raxx.app) matches the expected operations pattern. HTTP 200.
Note: GET /api/mailboxes/2 returns HTTP 405. Mailbox lookup must use GET /api/mailboxes (list) and filter by ID. Runbook does not call this out explicitly — updating the understanding here for reference.
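A hedged sketch of the lookup (FREESCOUT_API_KEY is assumed to be exported from the vault folder noted below; the jq filter is deliberately shape-agnostic since the exact payload structure is not reproduced in this report):

```bash
# List all mailboxes via the custom auth header, then pick the ops mailbox by id/email.
curl -s "https://tickets.raxx.app/api/mailboxes" \
  -H "X-FreeScout-API-Key: ${FREESCOUT_API_KEY}" \
  | jq '.. | objects | select(.id? == 2 and .email? == "ops@raxx.app")'
```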
- Infisical path: /MooseQuest/freescout/FREESCOUT_OPERATIONS_MAILBOX_ID, value 2, environment prod.
- Folder /MooseQuest/freescout/ pre-existed (FREESCOUT_API_KEY and 13 other secrets already present).
- Write response confirmed FREESCOUT_OPERATIONS_MAILBOX_ID, value 2.
- Readback: key=FREESCOUT_OPERATIONS_MAILBOX_ID value=2 confirmed.
- Both apps set with heroku config:set FREESCOUT_OPERATIONS_MAILBOX_ID=2 (stdout silenced per policy).
Readback verification:
| App | heroku config:get FREESCOUT_OPERATIONS_MAILBOX_ID | Result |
|---|---|---|
| raxx-console-staging | returned | 2 |
| raxx-console-prod | returned | 2 |
heroku dyno:restart -a raxx-console-staging → "Restarting all dynos... done"
heroku dyno:restart -a raxx-console-prod → "Restarting all dynos... done"
Both dynos confirmed up (web.* present in heroku ps output).
| URL | HTTP code | Interpretation |
|---|---|---|
| https://console-staging.raxx.app/health | 302 | CF Access redirect — healthy |
| https://console.raxx.app/health | 302 | CF Access redirect — healthy |
Tailed raxx-console-staging logs for ~12 seconds after restart. Grep for
FREESCOUT_OPERATIONS|missing.*mailbox|mailbox.*id|WARNING|WARN|startup|error|fatal
returned zero matches. No startup warnings about the new env var.
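A sketch of that check for reuse (the original session tailed live logs rather than fetching a fixed line count, so the scope here is an approximation):

```bash
heroku logs --app raxx-console-staging --num 500 \
  | grep -Ei 'FREESCOUT_OPERATIONS|missing.*mailbox|mailbox.*id|WARNING|WARN|startup|error|fatal' \
  || echo "no matches — clean startup"
```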
FLAG_CONSOLE_INVESTIGATE_FROM_STATUS and FLAG_CONSOLE_ALERTS_AUTO_TICKET remain
default-off. The env var is staged idle. Operator will flip flags via /console/flags
(staging first, ~24h soak, then prod) per docs/ops/runbooks/auto-ticketing-rollout.md.
Summary:
- GET /api/mailboxes — id=2, name=Operations, email=ops@raxx.app
- FREESCOUT_OPERATIONS_MAILBOX_ID written to Infisical at /MooseQuest/freescout/
- heroku config:set on raxx-console-staging (silenced)
- heroku config:set on raxx-console-prod (silenced)
- heroku config:get returns 2 on both apps
- Rollout runbook: docs/ops/runbooks/auto-ticketing-rollout.md

CONSOLE_CROSS_ENV_READ_TOKEN provisioning
Time: 2026-05-05 11:11 UTC
Issue: https://github.com/moosequest/TradeMasterAPI/issues/990
Epic: https://github.com/moosequest/TradeMasterAPI/issues/988 (console self-deploy)
Sub-card: S2 of the self-deploy chain (#988)
Author: sre-agent
Provisioned CONSOLE_CROSS_ENV_READ_TOKEN — the service token that authenticates
cross-env deploy status reads between the console apps and the CF Worker
(console-deploy-shim). This is the last open sub-card of the #988 self-deploy
epic. Token generation followed §7.4 of
docs/architecture/console-self-deploy-web-layer.md.
- Token generated with openssl rand -hex 32 — 64-char hex string (32 bytes of entropy).
- Both Infisical paths updated from version 1 (placeholder) to version 2 (fresh token).
- Same token value in both paths — verified by SHA-256 hash comparison.
| Infisical path | Environment | Version | Status |
|---|---|---|---|
| /Console/prod/CONSOLE_CROSS_ENV_READ_TOKEN | prod | 2 | PATCHED + verified |
| /Console/staging/CONSOLE_CROSS_ENV_READ_TOKEN | prod | 2 | PATCHED + verified |
Vault host: https://vault.raxx.app
Project ID: 29b77751-f761-4afa-b3fa-2c842988f95c
| App | Result |
|---|---|
| raxx-console-staging | config:get hash match confirmed |
| raxx-console-prod | config:get hash match confirmed |
Commands used (stdout silenced per policy):
heroku config:set CONSOLE_CROSS_ENV_READ_TOKEN=<value> --app raxx-console-staging >/dev/null 2>&1
heroku config:set CONSOLE_CROSS_ENV_READ_TOKEN=<value> --app raxx-console-prod >/dev/null 2>&1
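The hash-match readback can be reproduced with a sketch like the following (assumes the freshly generated token is still in $NEW_TOKEN in the same shell; the raw value is never printed):

```bash
# All three digests should be identical.
printf '%s' "$NEW_TOKEN" | sha256sum
heroku config:get CONSOLE_CROSS_ENV_READ_TOKEN --app raxx-console-staging | tr -d '\n' | sha256sum
heroku config:get CONSOLE_CROSS_ENV_READ_TOKEN --app raxx-console-prod | tr -d '\n' | sha256sum
```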
CF Worker secret (console-deploy-shim):
- Worker: console-deploy-shim, config infra/cf-workers/console-deploy-shim/wrangler.toml
- Auth: CF_WORKER_DEPLOY from Infisical /MooseQuest/cloudflare/CF_WORKER_DEPLOY
- Output: "Success! Uploaded secret CONSOLE_CROSS_ENV_READ_TOKEN" (wrangler 4.87.0)

heroku ps:restart --app raxx-console-staging → done (web.1: up ~12s after restart)
heroku ps:restart --app raxx-console-prod → done (web.1: up ~10s after restart)
FLAG_CONSOLE_DEPLOY_XENV_READ temporarily enabled on staging for verification,
then unset.
| Test | Authorization | Expected | Actual |
|---|---|---|---|
| Flag off | Bearer valid-token | 501 | 501 |
| Valid token + unknown deploy_id (flag on) | Bearer valid-token | 404 | 404 {"error":"not_found"} |
| No token (flag on) | none | 401 | 401 {"error":"unauthorized"} |
| Wrong token (flag on) | Bearer wrongtoken | 401 | 401 {"error":"unauthorized"} |
Endpoint under test: GET /api/internal/deploys/<id>/xenv
Host: https://raxx-console-staging-58974c77617a.herokuapp.com
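The 404/401 rows can be reproduced with curl against that host (sketch only; "does-not-exist" is a placeholder deploy id, and the token is read from the env var set above):

```bash
BASE="https://raxx-console-staging-58974c77617a.herokuapp.com"

# Valid token, unknown deploy_id -> expect 404 (with the flag enabled)
curl -s -o /dev/null -w '%{http_code}\n' \
  -H "Authorization: Bearer ${CONSOLE_CROSS_ENV_READ_TOKEN}" \
  "${BASE}/api/internal/deploys/does-not-exist/xenv"

# No Authorization header -> expect 401
curl -s -o /dev/null -w '%{http_code}\n' \
  "${BASE}/api/internal/deploys/does-not-exist/xenv"
```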
To rotate this token:
1. openssl rand -hex 32 — generate the new token.
2. Update /Console/prod/CONSOLE_CROSS_ENV_READ_TOKEN (env: prod).
3. Update /Console/staging/CONSOLE_CROSS_ENV_READ_TOKEN (env: prod, same value).
4. heroku config:set CONSOLE_CROSS_ENV_READ_TOKEN=<new> --app raxx-console-staging >/dev/null 2>&1
5. heroku config:set CONSOLE_CROSS_ENV_READ_TOKEN=<new> --app raxx-console-prod >/dev/null 2>&1
6. printf '%s' "$NEW_TOKEN" | CLOUDFLARE_API_TOKEN="$CF_WORKER_DEPLOY" npx wrangler@4 secret put CONSOLE_CROSS_ENV_READ_TOKEN --name=console-deploy-shim
7. CLOUDFLARE_API_TOKEN="$CF_WORKER_DEPLOY" npx wrangler@4 deploy --config infra/cf-workers/console-deploy-shim/wrangler.toml

Note: steps 4–5 trigger automatic dyno restarts (Heroku config change); an explicit ps:restart is not required.

Remaining operator action: FLAG_CONSOLE_DEPLOY_XENV_READ must be flipped on both consoles (via /console/flags).

Issue #990 closed — the token is provisioned in all three locations (both Infisical paths, both Heroku apps, CF Worker secret), and all three hold the same freshly generated value.