ADR 0057 — Reasonator re-scoring: model SHA as first-class provenance field
Status: Accepted
Date: 2026-05-09 UTC
Refs: #1385, docs/architecture/reasonator/design.md (Decision 4), #1380 (Decision 2 — model version governance)
Context
FinBERT model weights evolve. A ProsusAI/finbert checkpoint update could change the score for an identical headline. Without version provenance, a historical score is ambiguous: which model produced it? Was this computed before or after the model update?
The operator requirement (from the handoff doc, OQ-2): every score must carry the exact SHA of the model that produced it. If the model updates, historical rows must be re-scorable, with the provenance of both the old and the new score preserved.
Two re-scoring strategies were considered: overwrite-in-place, and append-with-audit.
Decision
Append-with-audit. sentiment_events stores only the current (latest) score. Every score write, initial or re-score, appends a row to sentiment_score_audit containing the new score and, for re-scores, the previous score and its model SHA.
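The write path described above can be sketched as follows. This is a minimal illustration using SQLite, not the actual migration: the column names event_id, new_score, new_model_sha and audit_id are assumptions, as is the choice to read the previous score from sentiment_events inside the same transaction.

```python
import sqlite3

# Hypothetical minimal schema. Only the table names, sentiment_score,
# scorer_model_version, previous_score, previous_model_sha and scored_at
# come from the ADR; every other column name is an assumption.
SCHEMA = """
CREATE TABLE sentiment_events (
    event_id TEXT PRIMARY KEY,
    sentiment_score REAL NOT NULL,
    scorer_model_version TEXT NOT NULL
);
CREATE TABLE sentiment_score_audit (
    audit_id INTEGER PRIMARY KEY AUTOINCREMENT,
    event_id TEXT NOT NULL,
    new_score REAL NOT NULL,
    new_model_sha TEXT NOT NULL,
    previous_score REAL,
    previous_model_sha TEXT,
    scored_at TEXT NOT NULL
);
"""


def write_score(conn, event_id, score, model_sha, scored_at):
    """Update the current score and append one audit row atomically."""
    with conn:  # commits on success, rolls back on any exception
        prev = conn.execute(
            "SELECT sentiment_score, scorer_model_version "
            "FROM sentiment_events WHERE event_id = ?",
            (event_id,),
        ).fetchone()
        if prev is None:
            # Initial score: no previous values to record.
            prev_score, prev_sha = None, None
            conn.execute(
                "INSERT INTO sentiment_events VALUES (?, ?, ?)",
                (event_id, score, model_sha),
            )
        else:
            # Re-score: overwrite the current row, keep the old values for audit.
            prev_score, prev_sha = prev
            conn.execute(
                "UPDATE sentiment_events SET sentiment_score = ?, "
                "scorer_model_version = ? WHERE event_id = ?",
                (score, model_sha, event_id),
            )
        conn.execute(
            "INSERT INTO sentiment_score_audit (event_id, new_score, "
            "new_model_sha, previous_score, previous_model_sha, scored_at) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (event_id, score, model_sha, prev_score, prev_sha, scored_at),
        )
```

Writing both tables in one transaction keeps sentiment_events and the audit trail from diverging if a write fails halfway.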
POST /v1/score/rescore carries previous_score and previous_model_sha in the request; Reasonator echoes them in the response alongside the new scores. Raptor writes the full comparison row to sentiment_score_audit.
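As an illustration of the echo behaviour, the request and response could be modelled as below. Only previous_score and previous_model_sha are named in this ADR; event_id, headline, new_score, new_model_sha, score_fn and handle_rescore are hypothetical names, and a single new score is assumed for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RescoreRequest:
    event_id: str            # hypothetical identifier field
    headline: str            # hypothetical: the text to re-score
    previous_score: float    # from the ADR
    previous_model_sha: str  # from the ADR


@dataclass(frozen=True)
class RescoreResponse:
    event_id: str
    new_score: float         # hypothetical field name
    new_model_sha: str       # hypothetical field name
    previous_score: float    # echoed unchanged from the request
    previous_model_sha: str  # echoed unchanged from the request


def handle_rescore(req, score_fn, current_model_sha):
    """Score the headline with the loaded model and echo the previous values.

    score_fn stands in for the model call; current_model_sha is the SHA of
    the model that score_fn wraps.
    """
    return RescoreResponse(
        event_id=req.event_id,
        new_score=score_fn(req.headline),
        new_model_sha=current_model_sha,
        previous_score=req.previous_score,
        previous_model_sha=req.previous_model_sha,
    )
```

Because the response already carries both sides of the comparison, the caller can build the audit row from it without a second lookup.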
scorer_model_version in sentiment_events is always the SHA of the model that produced the current score.
FINBERT_MODEL_SHA is an env var in the Reasonator config, sourced from Infisical. Changing the SHA and then redeploying Reasonator (or sending it SIGHUP) triggers a model reload. A re-score sweep job is then triggered to update historical rows.
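One way the reload could look is sketched below. ModelHolder, load_model, read_sha and install_sighup_reload are hypothetical names. Note that a running process's environment does not change by itself, so on the SIGHUP path read_sha would have to re-fetch the value from the config source; how Reasonator does that is not specified here.

```python
import os
import signal


def read_sha_from_env():
    """Read the pinned model SHA; raises KeyError if the variable is unset."""
    return os.environ["FINBERT_MODEL_SHA"]


class ModelHolder:
    """Holds the loaded model together with the SHA it was loaded from."""

    def __init__(self, load_model, read_sha=read_sha_from_env):
        self._load_model = load_model  # hypothetical: callable(sha) -> model
        self._read_sha = read_sha      # hypothetical: callable() -> sha string
        self.sha = None
        self.model = None

    def reload(self):
        """Reload only if the configured SHA differs; return True if reloaded."""
        sha = self._read_sha()
        if sha == self.sha:
            return False
        self.model = self._load_model(sha)
        self.sha = sha
        return True


def install_sighup_reload(holder):
    """Reload on SIGHUP (POSIX only; signal.SIGHUP does not exist on Windows)."""
    signal.signal(signal.SIGHUP, lambda signum, frame: holder.reload())
```

Keeping the SHA next to the model it was loaded from means the value written to scorer_model_version always comes from the same object that produced the score.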
Consequences
- Positive: Full provenance — any historical score can be traced to an exact model checkpoint. Supports forensic comparison of scores across model versions.
- Positive: sentiment_score_audit grows at the rate of score writes, not at the rate of article ingestion. The audit table is append-only and can be partitioned by scored_at for archival.
- Negative: sentiment_score_audit is a new table; the re-scoring sub-card must add the migration.
- Negative: Re-scoring large historical archives (Phase 2: potentially millions of rows) is a slow background operation. The re-score sweep job pages in 500-row batches with inter-page delays to avoid starving Pro+ sync requests.
- Open: Model SHA governance (who approves a model update, what testing is required before updating the prod SHA) is OQ-2 in the design doc and was flagged as Decision 2 in #1380. This ADR records the storage and API design; the governance process is a separate operator decision.
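The paged sweep mentioned in the consequences above could be structured roughly as follows. This is a sketch under assumptions: fetch_page and rescore_batch are hypothetical callables, rows are assumed to be dicts keyed by a hypothetical event_id, keyset pagination is one possible paging scheme, and the 0.5-second delay is a placeholder since this ADR does not give a value.

```python
import time


def run_rescore_sweep(fetch_page, rescore_batch, page_size=500,
                      delay_s=0.5, sleep=time.sleep):
    """Re-score historical rows page by page, pausing between pages.

    fetch_page(after_id, limit) returns up to `limit` rows with an id greater
    than after_id (or from the start when after_id is None), ordered by id.
    rescore_batch(rows) re-scores one page. Returns the number of rows seen.
    """
    after_id = None
    total = 0
    while True:
        rows = fetch_page(after_id, page_size)
        if not rows:
            break
        rescore_batch(rows)
        total += len(rows)
        after_id = rows[-1]["event_id"]  # keyset cursor for the next page
        if len(rows) < page_size:
            break  # short page: nothing left to fetch
        sleep(delay_s)  # yield to interactive traffic between pages
    return total
```

Paging by a cursor on the row id rather than by offset keeps each page query cheap even deep into a multi-million-row archive, and the sleep parameter can be swapped out in tests.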
Alternatives Considered
Overwrite-in-place: Simpler — just update sentiment_events.sentiment_score and scorer_model_version. Loses the old score permanently. Rejected: the operator requirement for reproducibility requires the old score to be preserved for comparison.
Separate score versions table (one row per version per event): More normalized. Querying current scores requires a MAX(scored_at) join. Higher query complexity with no additional benefit over the append-to-audit pattern. Rejected.
No re-scoring: Treat each model SHA as producing a distinct, non-comparable score series. Simple but means users cannot get a consistent historical view after a model update. Rejected — the operator explicitly requires re-scoring as a supported operation.