Raxx · internal docs

internal · gated

Console deploy — manual break-glass runbook

System: raxx-console-prod (Heroku)
Owner: operator / sre-agent
Last incident: 2026-05-15 (see #2201: dispatch timed out, 926-minute silent failure)
Last reviewed: 2026-05-15

Context

console/ is a subdirectory of the monorepo. Heroku expects to receive a repo whose root IS the app. The canonical CI path (deploy-console.yml) handles this with git subtree split and a direct git push. This runbook documents the equivalent shell procedure for use when the CI workflow itself is unavailable (e.g. runner outage, GitHub Actions incident, break-glass scenario).
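
One way to confirm that a split commit really has console/ as its root is to compare tree objects. The sketch below is illustrative only: the function name split_matches_subtree is hypothetical and not part of the CI workflow, and it assumes it runs from the monorepo root with the split commit available locally.

```shell
# Succeeds when the commit given as $1 has a root tree identical to the
# tree of the subdirectory $2 (default: console) at HEAD.
# Hypothetical helper, not part of deploy-console.yml.
split_matches_subtree() {
  sms_split_tree=$(git rev-parse --verify --quiet "$1^{tree}") || return 1
  sms_subdir_tree=$(git rev-parse --verify --quiet "HEAD:${2:-console}") || return 1
  [ "$sms_split_tree" = "$sms_subdir_tree" ]
}
```

After producing the split SHA, calling split_matches_subtree "$SUBTREE_SHA" console should succeed; a non-zero exit suggests the SHA points at something other than the console/ tree at HEAD.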

When to use this runbook

Use this only when deploy-console.yml workflow_dispatch is unavailable or has failed in a way that cannot be fixed quickly. Normal deploys go through CI.

Prerequisites

Manual deploy procedure

# 1. Confirm you are on the correct commit.
git log --oneline -5

# 2. Fetch your Heroku token from vault (or export from environment).
#    Never paste the raw token in a script stored in the repo.
HEROKU_EMAIL="kris@moosequest.net"
HEROKU_API_KEY="<token from Infisical /MooseQuest/heroku/HEROKU_API_KEY>"

# 3. Write a .netrc for authentication.
#    umask 077 ensures the file is created with 600 permissions.
#    Note: this overwrites any existing ~/.netrc; back it up first if needed.
umask 077
cat > "$HOME/.netrc" <<EOF
machine git.heroku.com
  login ${HEROKU_EMAIL}
  password ${HEROKU_API_KEY}
EOF
chmod 600 "$HOME/.netrc"

# 4. Produce a synthetic root commit whose tree IS the console/ subtree.
SUBTREE_SHA=$(git subtree split --prefix=console HEAD 2>/dev/null)
echo "Subtree SHA: $SUBTREE_SHA"

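# 5. Push to Heroku.
#    The :? guard aborts the push if SUBTREE_SHA is empty; an empty source
#    (":refs/heads/main") would ask git to delete the remote branch.
git push --force \
  "https://git.heroku.com/raxx-console-prod.git" \
  "${SUBTREE_SHA:?is empty, see Failure mode D}:refs/heads/main"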

# 6. Verify the deploy completed.
#    Wait ~60 s for dyno restart, then:
curl -fsS https://console.raxx.app/health | python3 -m json.tool
# Expected: HTTP 200 with {"status":"ok"}. A 302 redirect to the CF Access login
# is also acceptable, but its body is typically not JSON, so json.tool may report an error.
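
Because step 6 accepts either a 200 or a 302, it can help to look at the status code directly rather than at the body. The following is a minimal sketch under that assumption; classify_health_status and check_console_health are hypothetical names, and the 10-second timeout is an arbitrary choice.

```shell
# Map an HTTP status code to the outcomes named in step 6.
classify_health_status() {
  case "$1" in
    200) echo "ok" ;;
    302) echo "redirect" ;;   # CF Access login redirect
    *)   echo "fail" ;;       # includes 000, which curl reports on connection failure
  esac
}

# Fetch only the status code; --max-time keeps a hung dyno from blocking the shell.
# $1 = health URL (defaults to the one used in this runbook).
check_console_health() {
  cch_code=$(curl -sS -o /dev/null --max-time 10 -w '%{http_code}' \
    "${1:-https://console.raxx.app/health}")
  echo "HTTP ${cch_code}: $(classify_health_status "$cch_code")"
}
```

Running check_console_health about a minute after the push prints the status code and its classification on one line.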

How to tell it's broken

Known failure modes

Failure mode A: akhileshns setRawMode crash (Node 20+)

Symptom: CI step Deploy to Heroku fails with process.stdin.setRawMode is not a function

Cause: akhileshns/heroku-deploy uses an interactive TTY API that does not exist in CI runners

Fix: The CI workflow now uses the netrc + git-push pattern (PR #776). If this runbook is being used, CI itself is down, so proceed with the manual steps above.

Verification: curl -fsS https://console.raxx.app/health returns 200

Failure mode B: Heroku git auth rejection

Symptom: git push returns error: authentication failed or Do not authenticate with username and password using git

Cause: The .netrc login is not the registered Heroku account email, OR the token is expired/revoked

Fix:
1. Confirm the email matches the Heroku account: heroku auth:whoami (if CLI available)
2. Rotate the token: Heroku dashboard -> Account -> API Key -> Regenerate
3. Update the secret in Infisical and in the GitHub Environment secrets

Verification: Retry the push; it should proceed without auth errors
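
To check which login the .netrc actually carries for git.heroku.com, a small parser can help. This is a sketch only: netrc_login_for is a hypothetical helper, and it assumes the multi-line layout written by the manual deploy procedure (a machine line followed by separate login and password lines), not every layout the netrc format allows.

```shell
# Print the login recorded for a machine in a netrc file.
# $1 = netrc path, $2 = machine name.
# Assumes one "keyword value" pair per line, as written by this runbook.
netrc_login_for() {
  awk -v host="$2" '
    $1 == "machine" { in_host = ($2 == host) }
    in_host && $1 == "login" { print $2; exit }
  ' "$1"
}
```

The output of netrc_login_for "$HOME/.netrc" git.heroku.com can then be compared by eye with what heroku auth:whoami reports.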

Failure mode C: run_id_not_found_within_window — dispatch accepted but no run appears

Symptom: Console deploy modal shows "Deploy timed out — GitHub Actions run could not be matched within 90 seconds." Failure stage shows DISPATCH. All downstream stages (Smoke gate / Freeze check / Deploy / Health check) show "not reached." Log pane is empty.

Cause: workflow_dispatch returned HTTP 204 (accepted) but no GH Actions run materialized in the API within the reconciler's 30-minute window.

Known triggers:
1. Transient GH runner availability gap: the run entered queued and disappeared before the first reconciler poll.
2. Smoke gate failed before the building callback fired: the run exists but never called back, so github_run_id was not back-filled. The console row times out.
3. Token scope insufficient: the token lacks actions:write permission; dispatch silently accepts but queues nothing.

Diagnose:

# 1. Check if a run started at all in the dispatch window
gh run list \
  --repo raxx-app/TradeMasterAPI \
  --workflow=deploy-console.yml \
  --created '2026-05-15T06:20:00Z..2026-05-15T06:30:00Z' \
  --json databaseId,displayTitle,conclusion,createdAt,event

# 2. If a run exists but failed — check which job failed
gh run view <run_id> --repo raxx-app/TradeMasterAPI --json jobs

# 3. If no run exists — check GH status for active incidents
# https://www.githubstatus.com/

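# 4. Verify token has correct scopes (needs actions:write for dispatch).
#    The rate_limit body does not list scopes. Classic tokens report them in the
#    X-OAuth-Scopes response header; fine-grained tokens may print nothing here.
curl -sS -o /dev/null -D - \
  -H "Authorization: Bearer $GITHUB_API_DISPATCH_TOKEN" \
  https://api.github.com/rate_limit | grep -i '^x-oauth-scopes'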

Fix: If no run started due to a transient GH issue, retry the dispatch from the console UI. If smoke failed, fix the smoke failure first (check the failing run logs above), then retry. No rollback is needed because no Heroku push occurred.

Verification: Retry dispatch succeeds and progresses past the DISPATCH stage in the modal within 2 minutes.
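
The --created range used in the first diagnose command can be built from the dispatch time instead of typed by hand. The sketch below is one possible way to do that; created_window is a hypothetical helper, and it assumes a date implementation that accepts -d "@<epoch>" (GNU coreutils does; the stock macOS date does not).

```shell
# Print a "start..end" range in the ISO 8601 UTC form that
# gh run list --created accepts.
# $1 = window start in epoch seconds, $2 = window length in minutes.
created_window() {
  cw_start=$1
  cw_end=$(( cw_start + $2 * 60 ))
  cw_fmt='+%Y-%m-%dT%H:%M:%SZ'
  printf '%s..%s\n' \
    "$(date -u -d "@${cw_start}" "$cw_fmt")" \
    "$(date -u -d "@${cw_end}" "$cw_fmt")"
}
```

For example, gh run list --created "$(created_window "$(date -u -d '2026-05-15T06:20:00Z' +%s)" 10)" would query a ten-minute window starting at the dispatch time, again assuming GNU date.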

Failure mode D: subtree split fails (no output)

Symptom: git subtree split --prefix=console HEAD exits 0 but prints nothing; push target SHA is empty

Cause: The console/ directory does not exist at HEAD, or the git history has no commits touching console/

Fix: Verify the working directory and ref:

git log --oneline --follow -- console/app.py | head -3
ls console/

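If console/ is present but subtree split still fails, try with --rejoin. Note that --rejoin also merges the split history back into the current branch, which adds a merge commit locally: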

git subtree split --prefix=console --rejoin HEAD

Verification: SUBTREE_SHA is a 40-character SHA
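
The 40-character check can be scripted so that an empty or truncated value is caught before any push. This is a minimal sketch; is_full_sha is a hypothetical name, and it assumes the repository uses SHA-1 object names in lowercase hex, which is how git prints them.

```shell
# Succeeds only for a string of exactly 40 lowercase hex characters.
is_full_sha() {
  case "$1" in
    *[!0-9a-f]*) return 1 ;;   # contains a non-hex (or uppercase) character
  esac
  [ "${#1}" -eq 40 ]           # also rejects the empty string
}
```

A guard such as is_full_sha "$SUBTREE_SHA" || echo "SUBTREE_SHA looks wrong" fits between the split and the push.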

Enabling runtime-dyno-metadata (one-time setup)

Required for HEROKU_SLUG_COMMIT to be available in the dyno environment (used by the version-footer commit link per #775). Run once per app:

heroku labs:enable runtime-dyno-metadata --app raxx-console-prod

Verify after the next deploy:

heroku run "printenv | grep HEROKU_SLUG_COMMIT" --app raxx-console-prod

Emergency stop

To take raxx-console-prod offline cleanly (scale all dynos to 0):

heroku ps:scale web=0 --app raxx-console-prod

To bring it back:

heroku ps:scale web=1 --app raxx-console-prod

Escalation

Wake the operator when:
- The Heroku API token is revoked and cannot be rotated without account access
- The console/ subtree has a corrupted git history that blocks subtree split
- The dyno crashes on boot after a successful push (application-level bug, not deploy-level)

Contact: Kristerpher via Slack DM (D0AJ7K184TV) or kris@moosequest.net

References