Feature Flag Ops Runbook
- System: Console + Raptor API (raxx-console-prod, raxx-console-staging, raxx-api-prod, raxx-api-staging)
- Owner: operator / ops-agent
- Created: 2026-05-04 UTC
- Related design doc: docs/architecture/console-feature-flags.md, sections 6 (migrations), 7 (rollout plan), 8 (security)
- Related promotion doc: docs/architecture/console-flag-promotion-flow.md
- Parent epic: #551 (in-console feature flag management)
- Deprecation table: docs/ops/feature-flags-heroku-deprecation.md
1. System overview
Feature flags resolve in this order (per flag, per env):
1. console_feature_flags DB row — wins when FLAG_CONSOLE_FEATURE_FLAGS_DB=1 and row exists
2. FLAG_<UPPER_FLAG_KEY> Heroku config var — bootstrap fallback when no DB row
3. feature_flags.yaml default value — final fallback
The canonical flag declarations live in backend_v2/api/feature_flags.yaml. Every flag must have an entry there before it can be flipped.
Once FLAG_CONSOLE_FEATURE_FLAGS_DB=1 is set and migration 0009 is applied, a flip via the console UI writes a DB row. That row wins over the Heroku env var from that point forward. The env var is then dormant for that flag+env pair — it continues to exist on Heroku but has no effect. See the deprecation table for removal tracking.
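The resolution order above can be sketched as a small pure function. This is a minimal illustration, not the actual flags service: the function name `resolve_flag`, the plain-dict inputs, and the assumption that an env var value of `"1"` means "on" are all hypothetical.

```python
from typing import Mapping, Tuple


def resolve_flag(
    flag_key: str,
    env: str,
    db_rows: Mapping[Tuple[str, str], bool],
    env_vars: Mapping[str, str],
    yaml_defaults: Mapping[str, bool],
) -> Tuple[bool, str]:
    """Return (value, source) using the db -> env -> yaml order."""
    # Layer 1: the DB row wins, but only when the DB layer is switched on.
    db_layer_on = env_vars.get("FLAG_CONSOLE_FEATURE_FLAGS_DB") == "1"
    if db_layer_on and (flag_key, env) in db_rows:
        return db_rows[(flag_key, env)], "db"

    # Layer 2: FLAG_<UPPER_FLAG_KEY> config var (assumed: "1" means on).
    env_name = "FLAG_" + flag_key.upper()
    if env_name in env_vars:
        return env_vars[env_name] == "1", "env"

    # Layer 3: declared default. An undeclared flag raises KeyError,
    # mirroring the rule that every flag needs a YAML entry.
    return yaml_defaults[flag_key], "yaml"
```

Returning the source alongside the value makes the dormant env var case easy to see: once a DB row exists and the DB layer is on, the result is tagged `db` regardless of what the config var says.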
Cache: per-process LRU with 30-second TTL. A flip via the UI explicitly invalidates the affected (flag_key, env) pair, so the change is visible within 30 seconds on all dynos.
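To illustrate the caching behaviour, here is a minimal sketch of a per-process TTL cache with explicit invalidation. The class name `FlagCache`, the `loader` callback, and the injectable `clock` are assumptions for illustration; the sketch also omits LRU eviction, which the real cache is described as having.

```python
import time
from typing import Callable, Dict, Tuple


class FlagCache:
    """Per-process cache keyed by (flag_key, env) with a fixed TTL."""

    def __init__(
        self,
        loader: Callable[[str, str], bool],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[bool, float]] = {}

    def get(self, flag_key: str, env: str) -> bool:
        key = (flag_key, env)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]  # still fresh
        value = self._loader(flag_key, env)  # expired or missing: reload
        self._entries[key] = (value, now)
        return value

    def invalidate(self, flag_key: str, env: str) -> None:
        # Drop only the affected pair so the next read reloads it.
        self._entries.pop((flag_key, env), None)
```

Because each process holds its own copy, a process that did not see the invalidation keeps serving the old value until its entry ages out, which is where the 30-second upper bound comes from.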
2. How to flip a flag via the console UI
Required role: ops (read-only) or superadmin (read + write). Flags tagged risk: high in feature_flags.yaml require superadmin and TOTP elevation.
- Navigate to https://console.raxx.app/console/flags (or the staging equivalent for the staging env).
- The page shows a table of all declared flags, their current resolved value, the source layer (db, env, or yaml), and the last operator who flipped them.
- Confirm the env banner in the header is set to the environment you intend to flip (red = prod, purple = staging). Flipping a flag always affects the currently active env only.
- Locate the flag row. The On/Off toggle shows the current resolved value.
- Click the toggle. An HTMX request fires to POST /console/flags/<flag_key>/flip.
- On success: the row updates in place showing the new value, the source column changes to db, and last changed by shows your account.
- The change takes effect on all dynos within 30 seconds (cache TTL).
Screenshot placeholder: [console-flags-page-screenshot]
3. How to flip a flag via the promotion flow (staging → prod)
The mark-promote → promote flow is the structured path for moving a verified staging flag to prod.
Required role: superadmin
- On the /console/flags page (staging env active), click Mark active for prod on the flag row.
- The flag is entered into the promotion queue with state=pending. A soak clock starts (default 24h, or the flag-specific soak_period_hours from feature_flags.yaml).
- After soak expires, the Promote button becomes available on the flag row (prod env active).
- Click Promote. For flags tagged risk: high, type the confirmation phrase when prompted.
- The prod flip fires. Both the promotion record and the flag audit row are written.
Full promotion state machine: pending → approved → promoted (or rejected / expired).
Audit actions: console.flag.mark_promote, console.flag.approved, console.flag.promoted, console.flag.rejected, console.flag.expired.
See docs/architecture/console-flag-promotion-flow.md for the full state machine and sequence diagrams.
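As a rough illustration of the state machine and soak clock, the sketch below encodes the transitions as a lookup table. The exact set of states from which `rejected` and `expired` are reachable is an assumption here (the promotion flow doc is authoritative), and the names `ALLOWED_TRANSITIONS`, `can_transition`, and `soak_elapsed` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed transition table: rejected/expired reachable from any
# non-terminal state. Check the promotion flow doc before relying on it.
ALLOWED_TRANSITIONS = {
    "pending": {"approved", "rejected", "expired"},
    "approved": {"promoted", "rejected", "expired"},
    "promoted": set(),
    "rejected": set(),
    "expired": set(),
}


def can_transition(current: str, target: str) -> bool:
    """True if the promotion record may move from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def soak_elapsed(
    marked_at: datetime, now: datetime, soak_period_hours: int = 24
) -> bool:
    """True once the soak period since mark-promote has fully passed."""
    return now - marked_at >= timedelta(hours=soak_period_hours)


if __name__ == "__main__":
    marked = datetime(2026, 5, 4, 12, 0, tzinfo=timezone.utc)
    print(soak_elapsed(marked, marked + timedelta(hours=23)))
    print(can_transition("pending", "promoted"))
```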
4. Audit log
Every flip writes an immutable row to console_audit_log:
- action: console.flag.flip
- target_type: feature_flag
- target_id: the flag key (e.g. console_billing)
- payload: {"flag": "<key>", "env": "prod|staging", "from": <bool>, "to": <bool>}
To query the audit log for flag flips:
SELECT created_at, admin_id, payload
FROM console_audit_log
WHERE action = 'console.flag.flip'
ORDER BY created_at DESC
LIMIT 50;
Audit rows follow the 2-year retention policy. They are immutable — no row is ever deleted.
In the console UI: navigate to /console/audit and filter by action console.flag.flip.
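The query above can be exercised end to end with a small script. This is a sketch against an in-memory SQLite database: the column types, the sample rows, the `console.login` action, and the helper names are all made up for illustration, and the real console database may use a different engine and driver.

```python
import json
import sqlite3
from typing import Any, Dict, List, Tuple


def flip_payload(flag: str, env: str, old: bool, new: bool) -> str:
    """Serialize a payload in the shape the runbook describes."""
    return json.dumps({"flag": flag, "env": env, "from": old, "to": new})


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway table with assumed columns and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE console_audit_log ("
        "created_at TEXT, admin_id INTEGER, action TEXT, "
        "target_type TEXT, target_id TEXT, payload TEXT)"
    )
    rows = [
        ("2026-05-04 10:00:00", 7, "console.flag.flip", "feature_flag",
         "console_billing", flip_payload("console_billing", "staging", False, True)),
        # Hypothetical non-flag action, present only to show the filter.
        ("2026-05-04 11:00:00", 7, "console.login", "admin", "7", "{}"),
        ("2026-05-05 09:30:00", 9, "console.flag.flip", "feature_flag",
         "console_billing", flip_payload("console_billing", "prod", False, True)),
    ]
    conn.executemany(
        "INSERT INTO console_audit_log VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    return conn


def recent_flips(
    conn: sqlite3.Connection, limit: int = 50
) -> List[Tuple[str, int, Dict[str, Any]]]:
    """Run the runbook query and decode each payload from JSON."""
    cursor = conn.execute(
        "SELECT created_at, admin_id, payload FROM console_audit_log "
        "WHERE action = 'console.flag.flip' "
        "ORDER BY created_at DESC LIMIT ?",
        (limit,),
    )
    return [(c, a, json.loads(p)) for c, a, p in cursor.fetchall()]
```

Decoding the payload makes it straightforward to filter further in Python, for example keeping only rows where `payload["env"] == "prod"`.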
5. RBAC requirements
| Action | Minimum role |
|---|---|
| View /console/flags page | ops |
| Flip any flag | superadmin |
| Flip a risk: low flag | ops |
| Flip a risk: high flag | superadmin + TOTP elevation |
| Mark a flag active-for-prod | superadmin |
| Approve / promote a pending promotion | superadmin |
| Reject a pending promotion | superadmin |
support and readonly roles cannot access the flag management page.
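One way to read the table is as a pair of permission checks. The sketch below is an interpretation, not the console's authorization code: the function names are hypothetical, and treating `risk: medium` as superadmin-only without TOTP is an assumption drawn from the "Flip any flag" row.

```python
def can_view_flags(role: str) -> bool:
    """ops and superadmin can open the flags page; other roles cannot."""
    return role in {"ops", "superadmin"}


def can_flip(role: str, risk: str, totp_elevated: bool = False) -> bool:
    """Decide whether a role may flip a flag of the given risk level."""
    if role == "superadmin":
        # High-risk flags additionally need TOTP elevation.
        return risk != "high" or totp_elevated
    if role == "ops":
        # ops is limited to low-risk flags.
        return risk == "low"
    # support, readonly, and unknown roles have no flag access.
    return False


if __name__ == "__main__":
    for role, risk, totp in [
        ("ops", "low", False),
        ("ops", "high", False),
        ("superadmin", "high", False),
        ("superadmin", "high", True),
    ]:
        print(role, risk, totp, can_flip(role, risk, totp))
```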
6. Emergency path — Heroku CLI (break-glass)
Use the Heroku CLI flip when the console is unavailable (e.g. console DB down, console dyno cycling) and you need to change a flag value immediately.
When the Heroku env var still works:
- If no DB row exists for the (flag_key, env) pair, the env var wins over the YAML default.
- If FLAG_CONSOLE_FEATURE_FLAGS_DB=0 (the DB layer is off), the env var always wins.
When the Heroku env var does NOT override:
- Once a DB row exists for a (flag_key, env) pair, the DB wins. The Heroku env var is dormant for that pair and cannot override it via config:set alone.
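The two cases above reduce to a single condition, which can be handy as a pre-flight check during an incident. This is a sketch only; the function name `env_var_takes_effect` is hypothetical.

```python
def env_var_takes_effect(db_layer_on: bool, db_row_exists: bool) -> bool:
    """True if a config:set on FLAG_<KEY> will change the resolved value.

    The env var is consulted when the DB layer is off, or when the DB
    layer is on but no row exists for the (flag_key, env) pair.
    """
    return (not db_layer_on) or (not db_row_exists)


if __name__ == "__main__":
    # DB layer on and a row exists: config:set alone will not help.
    print(env_var_takes_effect(db_layer_on=True, db_row_exists=True))
```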
If you need to force-override when a DB row exists:
Option A — flip via the flags service directly (if you have console DB access):
UPDATE console_feature_flags SET value=1, updated_at=CURRENT_TIMESTAMP
WHERE flag_key='<flag_key>' AND env='<env>';
Then wait up to 30 seconds for the cache TTL to expire, or restart the dyno to flush immediately.
Option B — disable the DB layer temporarily (emergency only, requires dyno restart):
heroku config:set FLAG_CONSOLE_FEATURE_FLAGS_DB=0 --app <app-name> >/dev/null 2>&1
heroku config:set FLAG_<UPPER_KEY>=1 --app <app-name> >/dev/null 2>&1
Re-enable the DB layer once the incident is resolved:
heroku config:set FLAG_CONSOLE_FEATURE_FLAGS_DB=1 --app <app-name> >/dev/null 2>&1
Always silence Heroku config:set stdout — it echoes the full config including any adjacent secrets:
heroku config:set FLAG_X=1 --app <app> >/dev/null 2>&1
After any break-glass Heroku flip: file a note in the relevant incident ticket and use the console UI to write a reconciling DB row so the audit log reflects the true current state.
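If the break-glass flip is scripted rather than typed, the same silencing rule applies. The sketch below wraps the `heroku config:set` command shown above with Python's `subprocess` module and discards both output streams; the wrapper name `heroku_config_set` and the injectable `runner` parameter are assumptions added for illustration and testing.

```python
import subprocess
from typing import Callable


def heroku_config_set(
    app: str,
    name: str,
    value: str,
    runner: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> int:
    """Run `heroku config:set NAME=VALUE --app APP` with output discarded.

    Returns the CLI exit code so the caller can still detect failure
    without ever capturing or printing the command's output.
    """
    cmd = ["heroku", "config:set", f"{name}={value}", "--app", app]
    result = runner(
        cmd,
        stdout=subprocess.DEVNULL,  # equivalent of >/dev/null
        stderr=subprocess.DEVNULL,  # equivalent of 2>&1
        check=False,
    )
    return result.returncode
```

Passing the command as a list avoids shell interpolation of the flag name or value. A non-zero return code should be treated as "flip not applied" and followed up manually.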
7. How to add a new flag
- Add an entry to backend_v2/api/feature_flags.yaml:

      my_new_feature:
        default: false
        soak_period_hours: 24
        description: "one-line description of what the flag gates"
        risk: low  # low | medium | high
- Reference the flag in code via the flags service:

      from app.services.flags import flags

      if flags.is_on("my_new_feature"):
          ...  # gated behavior goes here

  For Raptor (backend_v2), the existing FLAG_<KEY>=1 env var pattern remains valid until #553 migrates all call sites.
- A DB row is not required at add time. The YAML default (false) applies until an operator flips the flag via the UI.
- To enable on staging first: navigate to /console/flags with the staging env active and toggle on.
- After staging soak (soak_period_hours): use the mark-promote → promote flow (section 3) to bring it to prod.
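Before opening a PR with a new declaration, a quick sanity check on the entry can catch typos. The sketch below validates an already-parsed entry (a plain dict, so no YAML library is needed). Which fields are mandatory, the 24h fallback for a missing `soak_period_hours`, and the helper names are assumptions; the real loader may enforce different rules.

```python
from typing import Any, List, Mapping

ALLOWED_RISK = {"low", "medium", "high"}


def env_var_name(flag_key: str) -> str:
    """Derive the FLAG_<UPPER_FLAG_KEY> config var name for a flag."""
    return "FLAG_" + flag_key.upper()


def validate_declaration(entry: Mapping[str, Any]) -> List[str]:
    """Return a list of problems found in one flag entry (empty if OK)."""
    problems: List[str] = []
    if not isinstance(entry.get("default"), bool):
        problems.append("default must be true or false")
    if entry.get("risk") not in ALLOWED_RISK:
        problems.append("risk must be one of low, medium, high")
    soak = entry.get("soak_period_hours", 24)  # assumed 24h fallback
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(soak, bool) or not isinstance(soak, int) or soak < 0:
        problems.append("soak_period_hours must be a non-negative integer")
    if not entry.get("description"):
        problems.append("description must be a non-empty string")
    return problems
```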
8. When to remove a Heroku config var
Once a flag has been flipped via the console UI, a DB row exists for that (flag_key, env) pair and the Heroku env var is dormant. The removal window is:
- Confirm the DB row exists on both prod and staging for the flag (audit log shows at least one console.flag.flip entry per env).
- Soak >= 24h with the DB row active. Confirm no regressions.
- Remove the Heroku config var:

      heroku config:unset FLAG_<UPPER_KEY> --app raxx-console-prod >/dev/null 2>&1
      heroku config:unset FLAG_<UPPER_KEY> --app raxx-console-staging >/dev/null 2>&1

- Update the deprecation table status to removed.
The removal step is not automated — operators must confirm via audit log that the DB row is live before removing the env var safety net. Never remove a Heroku env var speculatively.
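The manual confirmation can be supported by a small helper that takes what the operator read from the audit log and reports whether the preconditions hold. This is a sketch: using the earliest `console.flag.flip` timestamp per env as a proxy for "DB row active since" is an assumption, and the function name `safe_to_unset` is hypothetical. It does not replace the regression check.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional


def safe_to_unset(
    first_flip_at: Mapping[str, Optional[datetime]],
    now: datetime,
    soak_hours: int = 24,
) -> bool:
    """True if both envs have a flip on record and have soaked long enough.

    first_flip_at maps "prod" and "staging" to the earliest audit-log flip
    timestamp for the flag, or None when no flip has been recorded.
    """
    for env in ("prod", "staging"):
        flipped = first_flip_at.get(env)
        if flipped is None:
            return False  # no DB row yet: the env var is still live
        if now - flipped < timedelta(hours=soak_hours):
            return False  # row exists but soak is incomplete
    return True


if __name__ == "__main__":
    now = datetime(2026, 5, 6, 12, 0, tzinfo=timezone.utc)
    flips = {
        "prod": now - timedelta(hours=30),
        "staging": now - timedelta(hours=50),
    }
    print(safe_to_unset(flips, now))
```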
See the deprecation table for the full list of FLAG_CONSOLE_* vars and their removal status.
9. GA sign-off checklist
Before tagging GA on the flag management feature (#551):
- [ ] Migration 0009 applied on prod console DB
- [ ] FLAG_CONSOLE_FEATURE_FLAGS_DB=1 set on both prod and staging
- [ ] All FLAG_CONSOLE_* Heroku config vars confirmed dormant (DB rows present for all flags that were previously set to 1)
- [ ] Audit log shows at least one console.flag.flip entry for each migrated flag
- [ ] FLAG_CONSOLE_FLAG_MGMT Heroku config var removed from both apps (the feature self-gate is cleaned up post-GA)
- [ ] No remaining os.environ.get("FLAG_") calls in console/app/ (confirm with grep -r 'os.environ.get.*FLAG_' console/app/)
- [ ] Deprecation table updated: all dormant vars marked, removal timeline recorded
10. Related resources
- Design doc: docs/architecture/console-feature-flags.md
- Promotion flow: docs/architecture/console-flag-promotion-flow.md
- ADR-0026 (flag persistence): docs/architecture/adr/0026-feature-flag-persistence.md
- ADR-0027 (env scoping): docs/architecture/adr/0027-feature-flag-env-scoping.md
- ADR-0035 (promotion flow): docs/architecture/adr/0035-flag-promotion-staging-to-prod.md
- Heroku runbook: docs/ops/runbooks/heroku.md
- Deprecation table: docs/ops/feature-flags-heroku-deprecation.md