ADR 0057 — Reasonator re-scoring: model SHA as first-class provenance field

Status: Accepted
Date: 2026-05-09 UTC
Refs: #1385, docs/architecture/reasonator/design.md (Decision 4), #1380 (Decision 2, model version governance)

Context

FinBERT model weights evolve. A ProsusAI/finbert checkpoint update could change the score for an identical headline. Without version provenance, a historical score is ambiguous: it is unclear which checkpoint produced it, and whether it was computed before or after a model update.

The operator requirement (from the handoff doc, OQ-2): every score must carry the exact model SHA used to produce it. If the model updates, historical rows must be re-scorable, with the provenance of both the old and the new score preserved.

Two re-scoring strategies were considered: overwrite-in-place, and append-with-audit.

Decision

Append-with-audit. sentiment_events stores the current (latest) score. Every score write, whether initial or re-score, appends a row to sentiment_score_audit containing the new score and, for re-scores, the previous score and its model SHA.
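The append-with-audit write can be sketched as follows. This is a minimal illustration, not the actual migration. Only sentiment_score, scorer_model_version and scored_at are named in this ADR; every other column name, the event_id key, the use of SQLite, and the placeholder SHAs are assumptions. For brevity the sketch reads the previous score from sentiment_events rather than from a rescore response.

```python
import sqlite3

# Hypothetical schema: only sentiment_score, scorer_model_version and scored_at
# come from the ADR; all other columns are assumptions for illustration.
SCHEMA = """
CREATE TABLE sentiment_events (
    event_id             TEXT PRIMARY KEY,
    sentiment_score      REAL NOT NULL,
    scorer_model_version TEXT NOT NULL
);
CREATE TABLE sentiment_score_audit (
    audit_id           INTEGER PRIMARY KEY AUTOINCREMENT,
    event_id           TEXT NOT NULL,
    new_score          REAL NOT NULL,
    new_model_sha      TEXT NOT NULL,
    previous_score     REAL,   -- NULL for an initial score
    previous_model_sha TEXT,   -- NULL for an initial score
    scored_at          TEXT NOT NULL
);
"""


def write_score(conn, event_id, score, model_sha, scored_at):
    """Update the current score and append one audit row in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        prev = conn.execute(
            "SELECT sentiment_score, scorer_model_version "
            "FROM sentiment_events WHERE event_id = ?",
            (event_id,),
        ).fetchone()
        prev_score, prev_sha = prev if prev else (None, None)
        # Upsert keeps sentiment_events at exactly one (latest) row per event.
        conn.execute(
            "INSERT INTO sentiment_events "
            "(event_id, sentiment_score, scorer_model_version) VALUES (?, ?, ?) "
            "ON CONFLICT(event_id) DO UPDATE SET "
            "sentiment_score = excluded.sentiment_score, "
            "scorer_model_version = excluded.scorer_model_version",
            (event_id, score, model_sha),
        )
        # The audit table is append-only: never updated, never deleted from.
        conn.execute(
            "INSERT INTO sentiment_score_audit "
            "(event_id, new_score, new_model_sha, "
            "previous_score, previous_model_sha, scored_at) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (event_id, score, model_sha, prev_score, prev_sha, scored_at),
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
write_score(conn, "evt-1", 0.42, "sha-old", "2026-05-01T00:00:00Z")  # initial
write_score(conn, "evt-1", 0.57, "sha-new", "2026-05-09T00:00:00Z")  # re-score
```

After the two calls, sentiment_events holds one row for evt-1 with the new score and SHA, while sentiment_score_audit holds two rows, the second carrying the old score and old SHA for comparison.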

POST /v1/score/rescore carries previous_score and previous_model_sha in the request; Reasonator echoes them in the response alongside the new scores. Raptor writes the full comparison row to sentiment_score_audit.
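One possible shape for the rescore exchange is sketched below. Only previous_score and previous_model_sha are fixed by this ADR; the remaining field names (event_id, headline, new_score, new_model_sha), the handler function and the stub scorer are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RescoreRequest:
    event_id: str            # hypothetical identifier field
    headline: str            # hypothetical: the text to re-score
    previous_score: float    # named in the ADR
    previous_model_sha: str  # named in the ADR


@dataclass(frozen=True)
class RescoreResponse:
    event_id: str
    new_score: float         # hypothetical field name
    new_model_sha: str       # hypothetical field name
    previous_score: float    # echoed unchanged from the request
    previous_model_sha: str  # echoed unchanged from the request


def handle_rescore(req, score_fn, current_model_sha):
    """Score the headline and echo the caller's previous values back."""
    return RescoreResponse(
        event_id=req.event_id,
        new_score=score_fn(req.headline),
        new_model_sha=current_model_sha,
        previous_score=req.previous_score,
        previous_model_sha=req.previous_model_sha,
    )


def stub_scorer(text):
    """Stand-in for the FinBERT call; returns a fixed value."""
    return 0.57


req = RescoreRequest("evt-1", "Acme beats earnings estimates", 0.42, "sha-old")
resp = handle_rescore(req, stub_scorer, "sha-new")
audit_row = asdict(resp)  # the comparison row Raptor could persist
```

Echoing the previous values means the response alone contains everything needed for the audit row, so the writer does not have to re-read the old score before persisting the comparison.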

scorer_model_version in sentiment_events is always the SHA of the model that produced the current score.

FINBERT_MODEL_SHA is an env var in the Reasonator config, sourced from Infisical. Changing the SHA and redeploying Reasonator (or sending it SIGHUP) triggers a model reload. A re-score sweep job is then triggered to update historical rows.
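A minimal sketch of the reload path, under stated assumptions: ModelHolder and install_sighup_handler are hypothetical names, model loading is stubbed out, and a plain dict stands in for the configuration source. A running process's environment is not changed from outside, so a real SIGHUP reload would need to re-fetch the value from its config source; how that happens is not specified in this ADR.

```python
import os
import signal


class ModelHolder:
    """Tracks which model SHA is loaded; the actual weight loading is stubbed."""

    def __init__(self, env=None):
        self._env = os.environ if env is None else env
        self.loaded_sha = None
        self.reload_count = 0
        self.reload()

    def reload(self, signum=None, frame=None):
        """Re-read FINBERT_MODEL_SHA and reload only if it changed.

        The (signum, frame) parameters let this method double as a
        signal handler.
        """
        sha = self._env.get("FINBERT_MODEL_SHA", "").strip()
        if not sha:
            raise RuntimeError("FINBERT_MODEL_SHA is not set")
        if sha != self.loaded_sha:
            # A real service would load the weights pinned to `sha` here.
            self.loaded_sha = sha
            self.reload_count += 1
        return self.loaded_sha


def install_sighup_handler(holder):
    """Register the reload on SIGHUP (Unix only; call from the main thread)."""
    if hasattr(signal, "SIGHUP"):
        signal.signal(signal.SIGHUP, holder.reload)


env = {"FINBERT_MODEL_SHA": "sha-old"}  # stand-in for the config source
holder = ModelHolder(env)
env["FINBERT_MODEL_SHA"] = "sha-new"    # simulate the operator changing the SHA
holder.reload()                         # what the SIGHUP handler would invoke
```

Comparing the configured SHA with the loaded one makes a repeated SIGHUP a no-op, so only a real SHA change causes a reload.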

Consequences

Positive: Full provenance — any historical score can be traced to an exact model checkpoint. Supports forensic comparison of scores across model versions.
Positive: sentiment_score_audit grows at the rate of score writes, not at the rate of article ingestion. The audit table is append-only and can be partitioned by scored_at for archival.
Negative: sentiment_score_audit is a new table; the re-scoring sub-card must add the migration.
Negative: Re-scoring large historical archives (Phase 2: potentially millions of rows) is a slow background operation. The re-score sweep job pages in 500-row batches with inter-page delays to avoid starving Pro+ sync requests.
Open: Model SHA governance (who approves a model update, what testing is required before updating the prod SHA) is OQ-2 in the design doc and was flagged as Decision 2 in #1380. This ADR records the storage and API design; the governance process is a separate operator decision.
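The paged re-score sweep mentioned in the consequences above could be structured roughly like this. The 500-row batch size is from this ADR; the delay value, the keyset pagination on a numeric row id, the fetch_page callback and the in-memory table are all assumptions made for illustration.

```python
import time
from typing import Callable, Iterator, List, Tuple

Row = Tuple[int, str]  # (row_id, scorer_model_version); hypothetical shape

BATCH_SIZE = 500  # from the ADR


def stale_batches(
    fetch_page: Callable[[int, int], List[Row]],
    target_sha: str,
    delay_s: float = 0.2,  # assumed value; the ADR gives no figure
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[List[Row]]:
    """Yield batches of rows whose model SHA differs from target_sha.

    fetch_page(after_id, limit) must return up to `limit` rows with
    row_id > after_id, ordered by row_id (keyset pagination).
    """
    after_id = 0
    while True:
        page = fetch_page(after_id, BATCH_SIZE)
        if not page:
            return
        after_id = page[-1][0]  # resume after the last row seen
        stale = [row for row in page if row[1] != target_sha]
        if stale:
            yield stale
        sleep(delay_s)  # pause between pages to leave room for sync requests


# In-memory stand-in for sentiment_events: 1,200 rows scored by the old model.
TABLE: List[Row] = [(i, "sha-old") for i in range(1, 1201)]


def fetch_page(after_id: int, limit: int) -> List[Row]:
    return [row for row in TABLE if row[0] > after_id][:limit]


sleeps: List[float] = []  # records delays instead of actually sleeping
batch_sizes = [
    len(batch)
    for batch in stale_batches(fetch_page, "sha-new", sleep=sleeps.append)
]
```

Keyset pagination is used here instead of OFFSET so that each page costs about the same regardless of how deep into the archive the sweep has progressed; the sweep is also resumable from the last seen row id.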

Alternatives Considered

Overwrite-in-place: Simpler; only sentiment_events.sentiment_score and scorer_model_version are updated. The old score is lost permanently. Rejected: the operator's reproducibility requirement means the old score must be preserved for comparison.

Separate score versions table (one row per version per event): More normalized. Querying current scores requires a MAX(scored_at) join. Higher query complexity with no additional benefit over the append-to-audit pattern. Rejected.
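To illustrate the query cost of this rejected alternative, the sketch below builds a hypothetical score_versions table and the MAX(scored_at) join needed to read current scores from it. The table name, column names, sample data and use of SQLite are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE score_versions (   -- hypothetical table for the rejected design
    event_id  TEXT NOT NULL,
    score     REAL NOT NULL,
    model_sha TEXT NOT NULL,
    scored_at TEXT NOT NULL     -- ISO 8601 UTC, so string order is time order
);
INSERT INTO score_versions VALUES
    ('evt-1',  0.42, 'sha-old', '2026-05-01T00:00:00Z'),
    ('evt-1',  0.57, 'sha-new', '2026-05-09T00:00:00Z'),
    ('evt-2', -0.10, 'sha-new', '2026-05-09T00:00:00Z');
""")

# Every read of "the current score" has to find the latest version per event.
CURRENT_SCORES = """
SELECT v.event_id, v.score, v.model_sha
FROM score_versions AS v
JOIN (
    SELECT event_id, MAX(scored_at) AS latest
    FROM score_versions
    GROUP BY event_id
) AS m ON m.event_id = v.event_id AND m.latest = v.scored_at
ORDER BY v.event_id
"""
current = conn.execute(CURRENT_SCORES).fetchall()
```

Under the accepted design the same read is a plain single-table SELECT on sentiment_events, because that table only ever holds the latest score per event.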

No re-scoring: Treat each model SHA as producing a distinct, non-comparable score series. Simple, but users cannot get a consistent historical view after a model update. Rejected: the operator explicitly requires re-scoring as a supported operation.