Raxx · internal docs

ADR-0095: Deploy Modal Phase Progression — Option A (Fine-Grained Workflow Callbacks)

Date: 2026-05-16 UTC
Status: Accepted
Issue: #2232 (closes #1948)
Design doc: docs/architecture/deploy-modal-phase-progression-2026-05-16.md


Context

The deploy modal's four-step stepper (Smoke gate / Freeze check / Deploy / Health check) cannot advance beyond step 3 (Deploy) during a live deploy. deploy-console.yml emits only three status callbacks: building, deploying, succeeded/failed. The smoke and freeze-check jobs run before the deploy job and are not wired to notify-deploy-status at all.

Three options were presented by sre-agent:

Option A — Add fine-grained callbacks to all four layers (workflow, service model, DB migration, modal JS).

Option B — Infer phase client-side from elapsed time / log_tail keyword matching.

Option C — Simplify the stepper to match actual event granularity (3 states).


Decision

Option A is adopted.


Reasoning

Option B fabricates state. The elapsed-time estimate would advance the stepper regardless of whether the workflow is actually past the smoke gate. A slow smoke run or an unexpected stall would produce a false phase display — precisely the misleading behaviour the operator reported in #1948. The project invariant (feedback_deterministic_execution_ai_augments.md) prohibits the system from inferring state it does not actually have: authoritative events, not estimates.
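The failure mode can be illustrated with a minimal sketch. The phase names and time budgets below are hypothetical and chosen only for illustration; nothing here reflects the actual modal code. It compares an elapsed-time estimate (Option B style) with the last phase the workflow actually reported (Option A style) when a smoke run stalls.

```python
from typing import Optional

# Hypothetical phase names and per-phase time budgets, for illustration only.
PHASES = ["smoke", "freeze_check", "deploy", "health_check"]
ASSUMED_BUDGET_SECONDS = [60, 10, 120, 30]


def inferred_phase(elapsed_seconds: float) -> str:
    """Option B style: guess the phase from elapsed time alone."""
    remaining = elapsed_seconds
    for phase, budget in zip(PHASES, ASSUMED_BUDGET_SECONDS):
        if remaining < budget:
            return phase
        remaining -= budget
    # Past every budget: the estimate sits on the final phase.
    return PHASES[-1]


def reported_phase(events: list[str]) -> Optional[str]:
    """Option A style: the phase is the last one the workflow reported."""
    return events[-1] if events else None


# A smoke run stalled for five minutes; the workflow has only reported "smoke".
events = ["smoke"]
print(inferred_phase(300))     # health_check (fabricated)
print(reported_phase(events))  # smoke (authoritative)
```

In this sketch the estimate advances to the final step while the workflow is still in the smoke gate, which is the kind of false display described above.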

Option C removes real gates from the operator's view. The smoke gate and freeze-check gate are safety controls. Hiding them from the stepper reduces visibility into which gate a deploy is waiting on. The #1948 user story explicitly requires each phase to produce a visible state transition; Option C satisfies neither the story nor the acceptance criteria.

Option A requires changes across four layers but each layer is bounded and independently testable. The callback mechanism is already generic (apply_callback handles any valid status). The transition graph extension is additive. The workflow changes are two steps per job. The modal JS routing is an extension of the existing switch/map structure. The total scope is appropriate for a size:m multi-PR effort.
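As a rough sketch of what "additive" could mean for the transition graph: the four new status names below (smoke_running, smoke_passed, freeze_checking, freeze_passed) are hypothetical, since this ADR does not list them, and the edges shown for the existing statuses are assumed rather than taken from the service model. The function apply_callback_sketch is a stand-in for illustration, not the real apply_callback.

```python
# Assumed shape of the existing graph; the real edges may differ.
EXISTING_TRANSITIONS = {
    "building": {"deploying", "failed"},
    "deploying": {"succeeded", "failed"},
    "succeeded": set(),
    "failed": set(),
}

# Hypothetical new statuses for the smoke and freeze-check phases.
NEW_TRANSITIONS = {
    "smoke_running": {"smoke_passed", "failed"},
    "smoke_passed": {"freeze_checking", "failed"},
    "freeze_checking": {"freeze_passed", "failed"},
    "freeze_passed": {"building", "failed"},
}

# Additive extension: existing keys and their edges are carried over untouched.
TRANSITIONS = {**EXISTING_TRANSITIONS, **NEW_TRANSITIONS}


class InvalidTransition(ValueError):
    """Raised when a reported status is not reachable from the current one."""


def apply_callback_sketch(current: str, reported: str) -> str:
    """Accept the reported status only if the graph allows the transition."""
    if reported not in TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {reported}")
    return reported


def replay(start: str, callbacks: list[str]) -> str:
    """Apply a sequence of callbacks and return the final status."""
    status = start
    for reported in callbacks:
        status = apply_callback_sketch(status, reported)
    return status
```

Under these assumptions, a deploy that reports every phase and a deploy that starts at building both replay cleanly through the same generic function, which is why the callback handler itself would not need to change.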


Consequences

Positive:
- Stepper accurately reflects real workflow state at each phase.
- Audit log gains four new granular transition events per deploy.
- No fabricated state; operator can distinguish smoke-gate delay from deploy delay.

Negative:
- Four layers must ship in coordinated PRs (or one large PR).
- New status values require a DB migration (additive; low risk).
- New callback steps in smoke and freeze-check jobs must use continue-on-error: true to tolerate 422 rejections during a code rollback window.

Neutral:
- Existing building / deploying / succeeded / failed behaviour is unchanged.
- Non-console-triggered deploys (push-to-main, break-glass) are unaffected: they supply no console_deploy_id and the action no-ops silently.
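Two of the behaviours listed above, the silent no-op without a console_deploy_id and the tolerance of 422 rejections, can be modelled in a few lines. This is a sketch only: the real mechanism is a workflow action with continue-on-error: true, and the function name, its parameters, and the injected post callable are all hypothetical.

```python
from typing import Callable, Optional


def notify_status_sketch(
    console_deploy_id: Optional[str],
    status: str,
    post: Callable[[str, str], int],
    tolerate_rejection: bool = False,
) -> Optional[int]:
    """Send one status callback; `post` returns the HTTP status code."""
    if not console_deploy_id:
        # Push-to-main and break-glass deploys supply no id: do nothing.
        return None
    code = post(console_deploy_id, status)
    if code == 422 and tolerate_rejection:
        # Models continue-on-error: the rejection is recorded, the job goes on.
        return code
    if code >= 400:
        raise RuntimeError(f"callback for {status!r} failed with HTTP {code}")
    return code


# Stand-in for a service that rejects statuses it does not recognise.
def rejecting_post(deploy_id: str, status: str) -> int:
    return 422


print(notify_status_sketch(None, "building", rejecting_post))  # None
```

The tolerate_rejection flag stands in for continue-on-error on the new smoke and freeze-check callback steps; leaving it off models a step where a rejection fails the job.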


Alternatives Considered

| Option | Rejected reason |
| --- | --- |
| B: Client-side inference | Fabricates phase state; violates deterministic-execution invariant |
| C: Simplify stepper | Removes real gate visibility; does not satisfy #1948 AC |