Feature Flag Ops Runbook

System: Console + Raptor API (raxx-console-prod, raxx-console-staging, raxx-api-prod, raxx-api-staging)
Owner: operator / ops-agent
Created: 2026-05-04 UTC
Related design doc: docs/architecture/console-feature-flags.md, sections 6 (migrations), 7 (rollout plan), 8 (security)
Related promotion doc: docs/architecture/console-flag-promotion-flow.md
Parent epic: #551 (in-console feature flag management)
Deprecation table: docs/ops/feature-flags-heroku-deprecation.md


1. System overview

Feature flags resolve in this order (per flag, per env):

1. console_feature_flags DB row  — wins when FLAG_CONSOLE_FEATURE_FLAGS_DB=1 and row exists
2. FLAG_<UPPER_FLAG_KEY> Heroku config var  — bootstrap fallback when no DB row
3. feature_flags.yaml default value         — final fallback

The canonical flag declarations live in backend_v2/api/feature_flags.yaml. Every flag must have an entry there before it can be flipped.

Once FLAG_CONSOLE_FEATURE_FLAGS_DB=1 is set and migration 0009 is applied, a flip via the console UI writes a DB row. That row wins over the Heroku env var from that point forward. The env var is then dormant for that flag+env pair — it continues to exist on Heroku but has no effect. See the deprecation table for removal tracking.
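
The resolution order above can be sketched as a small pure function. This is a minimal illustration, not the flags service itself: the names resolve_flag and env_var_name are hypothetical, the DB lookup is replaced by a plain argument, and it assumes the env var value "1" means on and anything else means off.

```python
from typing import Mapping, Optional, Tuple


def env_var_name(flag_key: str) -> str:
    """Build FLAG_<UPPER_FLAG_KEY>, the layer-2 config var name."""
    return "FLAG_" + flag_key.upper()


def resolve_flag(
    flag_key: str,
    db_row: Optional[bool],            # None = no console_feature_flags row for this env
    environ: Mapping[str, str],        # e.g. os.environ
    yaml_defaults: Mapping[str, bool], # defaults loaded from feature_flags.yaml
) -> Tuple[bool, str]:
    """Return (value, source), where source is 'db', 'env', or 'yaml'."""
    db_layer_on = environ.get("FLAG_CONSOLE_FEATURE_FLAGS_DB") == "1"
    if db_layer_on and db_row is not None:
        return db_row, "db"
    raw = environ.get(env_var_name(flag_key))
    if raw is not None:
        # Assumption: "1" is on, any other value is off.
        return raw == "1", "env"
    return yaml_defaults[flag_key], "yaml"
```

With the DB layer on and a row present, the env var is ignored, which is the dormant state described above.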

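Cache: per-process LRU with a 30-second TTL. A flip via the UI explicitly invalidates the affected (flag_key, env) pair in the process that handles the request; because the cache is per-process, other processes pick up the new value when their own entry expires. The change is therefore visible on all dynos within 30 seconds.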
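
A rough sketch of the TTL-plus-invalidation behavior described above. The class name FlagCache and its methods are hypothetical, and LRU eviction is deliberately left out to keep the example short; the clock is injectable so the expiry logic can be exercised without sleeping.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str]  # (flag_key, env)


class FlagCache:
    """Per-process cache with a fixed TTL (LRU eviction omitted in this sketch)."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Key, Tuple[bool, float]] = {}

    def get(self, key: Key) -> Optional[bool]:
        """Return the cached value, or None if missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: force a fresh resolve
            return None
        return value

    def put(self, key: Key, value: bool) -> None:
        self._entries[key] = (value, self._clock())

    def invalidate(self, key: Key) -> None:
        """Drop one (flag_key, env) pair, as a UI flip does."""
        self._entries.pop(key, None)
```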


2. How to flip a flag via the console UI

Required role: ops (read-only) or superadmin (read + write). Flags tagged risk: high in feature_flags.yaml require superadmin and TOTP elevation.

  1. Navigate to https://console.raxx.app/console/flags (or the staging equivalent for the staging env).
  2. The page shows a table of all declared flags, their current resolved value, the source layer (db, env, or yaml), and the last operator who flipped them.
  3. Confirm the env banner in the header is set to the environment you intend to flip (red = prod, purple = staging). Flipping a flag always affects the currently active env only.
  4. Locate the flag row. The On/Off toggle shows the current resolved value.
  5. Click the toggle. An HTMX request fires to POST /console/flags/<flag_key>/flip.
  6. On success: the row updates in place showing the new value, the source column changes to db, and last changed by shows your account.
  7. The change takes effect on all dynos within 30 seconds (cache TTL).

Screenshot placeholder: [console-flags-page-screenshot]


3. How to flip a flag via the promotion flow (staging → prod)

The mark-promote → promote flow is the structured path for moving a verified staging flag to prod.

Required role: superadmin

  1. On the /console/flags page (staging env active), click Mark active for prod on the flag row.
  2. The flag enters the promotion queue with state=pending and a soak clock starts (24h by default, or the flag's soak_period_hours from feature_flags.yaml if set).
  3. After soak expires, the Promote button becomes available on the flag row (prod env active).
  4. Click Promote. For flags tagged risk: high, type the confirmation phrase when prompted.
  5. The prod flip fires. Both the promotion record and the flag audit row are written.
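
The soak gate in steps 2 and 3 comes down to simple time arithmetic. The sketch below assumes the promotion record stores the time the flag was marked; the function names are hypothetical and only illustrate when the Promote button would become available.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_SOAK_HOURS = 24  # default soak stated in step 2


def promote_available_at(marked_at: datetime,
                         soak_period_hours: Optional[int] = None) -> datetime:
    """Earliest time a promotion marked at `marked_at` can be promoted."""
    hours = DEFAULT_SOAK_HOURS if soak_period_hours is None else soak_period_hours
    return marked_at + timedelta(hours=hours)


def can_promote(marked_at: datetime, now: datetime,
                soak_period_hours: Optional[int] = None) -> bool:
    """True once the soak period has fully elapsed."""
    return now >= promote_available_at(marked_at, soak_period_hours)
```

Use timezone-aware UTC datetimes for both arguments so the comparison cannot mix naive and aware values.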

Full promotion state machine: pending → approved → promoted (or rejected / expired). Audit actions: console.flag.mark_promote, console.flag.approved, console.flag.promoted, console.flag.rejected, console.flag.expired.
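
As an illustration, the state machine can be written as a transition table. The states and audit action names come from the line above; which states may move to rejected or expired is an assumption here (only pending, in this sketch), and the promotion flow doc remains the authority on the real transitions.

```python
# Assumed transition table; the design doc defines the real one.
ALLOWED = {
    "pending": {"approved", "rejected", "expired"},
    "approved": {"promoted"},
    "promoted": set(),   # terminal
    "rejected": set(),   # terminal
    "expired": set(),    # terminal
}

# Audit action written when a promotion enters each state.
AUDIT_ACTION = {
    "pending": "console.flag.mark_promote",
    "approved": "console.flag.approved",
    "promoted": "console.flag.promoted",
    "rejected": "console.flag.rejected",
    "expired": "console.flag.expired",
}


def transition(current: str, target: str) -> str:
    """Validate a state change and return the audit action to record."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return AUDIT_ACTION[target]
```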

See docs/architecture/console-flag-promotion-flow.md for the full state machine and sequence diagrams.


4. Audit log

Every flip writes an immutable row to console_audit_log.

To query the audit log for flag flips:

SELECT created_at, admin_id, payload
FROM console_audit_log
WHERE action = 'console.flag.flip'
ORDER BY created_at DESC
LIMIT 50;

Audit rows follow the 2-year retention policy. They are immutable — no row is ever deleted.

In the console UI: navigate to /console/audit and filter by action console.flag.flip.
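
For scripted checks, the same query can be run with bound parameters. The sketch below uses an in-memory SQLite table purely as a stand-in so the example is self-contained: the column types, the sample rows, and the JSON shape of payload are all made up, and the production database driver may use a different placeholder style than "?".

```python
import sqlite3

FLIP_QUERY = """
SELECT created_at, admin_id, payload
FROM console_audit_log
WHERE action = ?
ORDER BY created_at DESC
LIMIT ?
"""


def recent_flips(conn, limit=50):
    """Return the newest flip rows as (created_at, admin_id, payload) tuples."""
    return conn.execute(FLIP_QUERY, ("console.flag.flip", limit)).fetchall()


def demo_connection():
    """In-memory stand-in for the console DB; schema and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE console_audit_log "
        "(created_at TEXT, admin_id TEXT, action TEXT, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO console_audit_log VALUES (?, ?, ?, ?)",
        [
            ("2026-05-04T10:00:00Z", "admin-1", "console.flag.flip", '{"flag_key": "my_new_feature"}'),
            ("2026-05-04T11:00:00Z", "admin-2", "console.flag.flip", '{"flag_key": "my_new_feature"}'),
            ("2026-05-04T12:00:00Z", "admin-1", "console.flag.promoted", "{}"),
        ],
    )
    return conn
```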


5. RBAC requirements

Action                                   Minimum role
View /console/flags page                 ops
Flip any flag                            superadmin
Flip a risk: low flag                    ops
Flip a risk: high flag                   superadmin + TOTP elevation
Mark a flag active-for-prod              superadmin
Approve / promote a pending promotion    superadmin
Reject a pending promotion               superadmin

support and readonly roles cannot access the flag management page.
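
One way to express the flip rows of the table as a check is sketched below. It follows the rows as written and reads "Flip any flag" as the rule for flags that are neither low nor high risk; that reading, and the function names, are assumptions rather than the console's actual authorization code.

```python
ROLES_WITH_ACCESS = {"ops", "superadmin"}  # support and readonly are excluded


def can_view_flags(role: str) -> bool:
    """ops is the minimum role for the /console/flags page."""
    return role in ROLES_WITH_ACCESS


def can_flip(role: str, risk: str, totp_elevated: bool = False) -> bool:
    """Apply the flip rows of the table for one role and one flag risk level."""
    if role == "superadmin":
        # risk: high additionally requires TOTP elevation.
        return risk != "high" or totp_elevated
    if role == "ops":
        return risk == "low"
    return False
```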


6. Emergency path — Heroku CLI (break-glass)

Use the Heroku CLI flip when the console is unavailable (e.g. console DB down, console dyno cycling) and you need to change a flag value immediately.

When the Heroku env var still works:
  - If no DB row exists for the (flag_key, env) pair, the env var wins over the YAML default.
  - If FLAG_CONSOLE_FEATURE_FLAGS_DB=0 (the DB layer is off), the env var always wins.

When the Heroku env var does NOT override:
  - Once a DB row exists for a (flag_key, env) pair, the DB wins. The Heroku env var is dormant for that pair and cannot override it via config:set alone.
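
The two cases above reduce to a single condition, sketched here as a quick pre-check before reaching for config:set. The function name is hypothetical.

```python
def env_var_is_effective(db_layer_on: bool, db_row_exists: bool) -> bool:
    """True when a plain `heroku config:set FLAG_<KEY>=...` will change behavior.

    The env var is dormant only when the DB layer is on AND a row exists
    for the (flag_key, env) pair.
    """
    return not (db_layer_on and db_row_exists)
```

If this returns False, a config:set alone will not take effect and one of the force-override options is needed.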

If you need to force-override when a DB row exists:

Option A: update the row directly in the console DB (if you have console DB access):

UPDATE console_feature_flags SET value=1, updated_at=CURRENT_TIMESTAMP
WHERE flag_key='<flag_key>' AND env='<env>';

Then wait up to 30 seconds for the cache TTL to expire, or restart the dyno to flush immediately.

Option B: disable the DB layer temporarily (emergency only; each heroku config:set creates a new release and restarts the app's dynos):

heroku config:set FLAG_CONSOLE_FEATURE_FLAGS_DB=0 --app <app-name> >/dev/null 2>&1
heroku config:set FLAG_<UPPER_KEY>=1 --app <app-name> >/dev/null 2>&1

Re-enable the DB layer once the incident is resolved:

heroku config:set FLAG_CONSOLE_FEATURE_FLAGS_DB=1 --app <app-name> >/dev/null 2>&1

Always silence Heroku config:set stdout, because it echoes the full config including any adjacent secrets:

heroku config:set FLAG_X=1 --app <app> >/dev/null 2>&1

After any break-glass Heroku flip: file a note in the relevant incident ticket and use the console UI to write a reconciling DB row so the audit log reflects the true current state.


7. How to add a new flag

  1. Add an entry to backend_v2/api/feature_flags.yaml:
  my_new_feature:
    default: false
    soak_period_hours: 24
    description: "one-line description of what the flag gates"
    risk: low   # low | medium | high
  2. Reference the flag in code via the flags service:
from app.services.flags import flags

if flags.is_on("my_new_feature"):
    # gated behavior goes here
    pass

For Raptor (backend_v2), the existing FLAG_<KEY>=1 env var pattern remains valid until #553 migrates all call sites.

  3. A DB row is not required at add time. The YAML default (false) applies until an operator flips the flag via the UI.

  4. To enable on staging first: navigate to /console/flags with the staging env active and toggle on.

  5. After staging soak (soak_period_hours): use the mark-promote → promote flow (section 3) to bring it to prod.
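
Before opening a PR for a new flag, a quick sanity check of the YAML entry can catch typos. The sketch below validates an already-parsed entry (a plain dict) against the fields shown in step 1; which fields are mandatory, and the fallback of 24 for a missing soak_period_hours, are assumptions, and validate_flag_entry is a hypothetical helper.

```python
ALLOWED_RISK = {"low", "medium", "high"}


def validate_flag_entry(name: str, entry: dict) -> list:
    """Return a list of problems found in one feature_flags.yaml entry."""
    errors = []
    if not isinstance(entry.get("default"), bool):
        errors.append(f"{name}: default must be true or false")
    soak = entry.get("soak_period_hours", 24)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(soak, bool) or not isinstance(soak, int) or soak <= 0:
        errors.append(f"{name}: soak_period_hours must be a positive integer")
    if not str(entry.get("description", "")).strip():
        errors.append(f"{name}: description is required")
    if entry.get("risk") not in ALLOWED_RISK:
        errors.append(f"{name}: risk must be one of low, medium, high")
    return errors
```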


8. When to remove a Heroku config var

Once a flag has been flipped via the console UI, a DB row exists for that (flag_key, env) pair and the Heroku env var is dormant. Remove the env var only after completing these steps in order:

  1. Confirm the DB row exists on both prod and staging for the flag (audit log shows at least one console.flag.flip entry per env).
  2. Soak >= 24h with the DB row active. Confirm no regressions.
  3. Remove the Heroku config var on both apps:
heroku config:unset FLAG_<UPPER_KEY> --app raxx-console-prod >/dev/null 2>&1
heroku config:unset FLAG_<UPPER_KEY> --app raxx-console-staging >/dev/null 2>&1
  4. Update the deprecation table status to removed.

The removal step is not automated — operators must confirm via audit log that the DB row is live before removing the env var safety net. Never remove a Heroku env var speculatively.
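
The mechanical part of steps 1 and 2 can be sketched as a check over audit timestamps. It assumes you have already pulled the earliest console.flag.flip time per env from the audit log; the env labels "prod" and "staging" and the function name are illustrative, and the "no regressions" judgment stays with the operator.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

REQUIRED_ENVS = ("prod", "staging")  # assumed env labels
MIN_SOAK = timedelta(hours=24)


def safe_to_unset(first_flip_at: Dict[str, Optional[datetime]],
                  now: datetime) -> bool:
    """True when every env has a flip on record that is at least 24h old."""
    for env in REQUIRED_ENVS:
        flipped = first_flip_at.get(env)
        if flipped is None or now - flipped < MIN_SOAK:
            return False
    return True
```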

See the deprecation table for the full list of FLAG_CONSOLE_* vars and their removal status.


9. GA sign-off checklist

Before tagging GA on the flag management feature (#551):