Status: Accepted (conditional — see open question in §Consequences)
Date: 2026-04-29
Deciders: software-architect
Refs: workflow-uuid-tracing.md, ADR-0003
The workflow-UUID tracing design requires a storage backend that can:
user_id, workflow_id, short action_type strings).Candidates evaluated: Timescale (Postgres + extension), plain Postgres (partitioned), ClickHouse, InfluxDB, AWS Timestream, SQLite (4-week MVP path).
Use Timescale (TimescaleDB extension on Postgres) as the production trace store, with SQLite as a valid 4-week MVP shortcut.
Postgres declarative partitioning on ts_emitted works but requires manual partition creation per time range, manual index management per partition, and a separate housekeeping job to detach old partitions for cold storage. Timescale's create_hypertable automates all of this:
add_compression_policy('trace_events', INTERVAL '90 days') compresses old chunks in-place with ~10–20× ratio on repetitive event rows. No separate ETL job.add_tiering_policy (Timescale Cloud) moves compressed chunks to S3 transparently, keeping hot Postgres storage bounded.time_bucket queries and first()/last() aggregate functions are purpose-built for event reconstruction.ClickHouse is purpose-built for OLAP on append-only data and outperforms Postgres at hundreds of millions of rows. But:
UPDATE or DELETE in the traditional sense (mutations are expensive async operations). This is fine for append-only trace data but complicates GDPR pseudonymization, which requires modifying user_id fields on erasure. The workaround (soft-delete + view layer) adds complexity.InfluxDB's data model (measurements, tags, fields) fits infrastructure metrics well but is a poor fit for Raxx's event schema, which is a heterogeneous record with varying context fields. The Flux query language is a separate skill. InfluxDB Cloud has a separate billing model and does not compose with the existing Postgres-centric Raptor codebase. Rejected.
AWS Timestream is managed, serverless, and cheap at low volume. But it couples the trace store to AWS permanently — a platform change from Heroku to Fly.io or Railway would require a migration path. Timestream queries use a custom SQL dialect. No Heroku add-on. Rejected for pre-launch; acceptable as a v2 option if Raxx migrates fully to AWS infrastructure.
At MVP volume (<10 users, <10,000 events total), SQLite with a composite index on (user_id, ts_emitted) is entirely sufficient. The schema is identical to the Timescale schema minus the create_hypertable call. When Raptor migrates from SQLite to Postgres (a planned step in the backend roadmap), the trace table migrates with it and create_hypertable is called on the existing data. No data is lost; Timescale applies partitioning retroactively.
If Raptor is already on Heroku Postgres (not SQLite), the SQLite path is moot — go straight to Timescale.
UPDATE on pseudonymization fields — no special ClickHouse-style mutation needed.timescaledb extension installed manually (requires a Heroku Private Space for custom buildpacks). This is the recommended path until managed Timescale is justified by volume.add_tiering_policy (cold storage to S3) is only available on Timescale Cloud, not the open-source self-hosted version. If Raxx self-hosts, cold storage requires a custom job. Feature-developer must choose between managed (cost) and self-hosted (complexity) when implementing SC-11.This ADR's database path decision is conditional on the answer to Open Question 1 in workflow-uuid-tracing.md §13: is Raptor on SQLite or Heroku Postgres today? If Postgres, the recommendation is Timescale immediately. If SQLite, the recommendation is SQLite for MVP, Timescale at the first Postgres migration.
| Candidate | Verdict | Reason |
|---|---|---|
| Timescale | Recommended | Postgres-native; auto-partitioning; compression; tiered storage; Heroku-compatible |
| Plain Postgres + partitioning | Acceptable fallback | Works, but manual partition management and no built-in compression |
| ClickHouse | Deferred to v2 at scale | Better at 100M+ rows; new ops paradigm; GDPR mutation complexity |
| InfluxDB | Rejected | Inappropriate data model; separate query language; no Heroku add-on |
| AWS Timestream | Rejected pre-launch | AWS vendor lock-in; custom SQL dialect; no Heroku integration |
| SQLite | MVP-only | Sufficient for <10K events; migrates forward to Timescale without data loss |