Language Tier Policy
Status: Accepted
Date: 2026-05-18 UTC
Owner: software-architect
Refs: operator decision 2026-05-10 UTC (project_language_tier_philosophy), ADR-0076 (Queue C++ election), docs/architecture/queue-cpp-scaffold-review-2026-05-13.md
Blocked by: nothing
Blocks: sub2 (philosophical doc), sub3 (SDLC template updates)
Issue: #2286
Purpose
This document is a policy, not a philosophy. It defines which language tier a service lives in, the numeric thresholds that trigger re-classification review, the contract requirements a service must satisfy before a Tier 1 rewrite begins, and the approval path for that rewrite.
The philosophical rationale for the two-phase language strategy belongs in sub2 (pending). This document cites operator decisions without re-arguing them.
1. Tier Definitions {#tier-definitions}
Tier 1 — Rust or C++ {#tier-1-criteria}
Tier 1 services are built in Rust or C++. They are expected to live indefinitely without a planned rewrite. The higher engineering cost is accepted in exchange for:
- Compile-time type safety with no GIL, no GC pauses, no managed-runtime overhead.
- Predictable sub-millisecond latency on auth hot paths, cryptographic operations, and high-frequency write paths.
- Foundation longevity: a Tier 1 service is the last version, not an interim one.
Criteria (any single criterion qualifies for Tier 1 review):
| # | Criterion | Threshold |
|---|---|---|
| C-1 | Security-sensitive hot path | Service handles passkey verification, session token mint/verify, cryptographic key material, or audit hash-chain writes |
| C-2 | P99 latency | P99 response latency > 100 ms sustained for 7 consecutive days under normal load |
| C-3 | Throughput demand | Service must sustain > 5,000 requests/sec on its primary endpoint |
| C-4 | Memory footprint | Resident set size (RSS) > 512 MB under normal operating load for 7 consecutive days |
| C-5 | Rewrite cost justification | Python rewrite was already planned (i.e., the service is a known short-term implementation) AND it owns a critical data domain |
| C-6 | Infrastructure cost | Dyno/compute cost for the service exceeds $500/month and profiling attributes > 50 % to interpreter overhead |
A service meeting C-1 alone qualifies for Tier 1 directly (no additional numeric gate). For C-2 through C-6, the condition must be sustained rather than transient, so that a short-lived spike does not trigger a review.
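The criteria table and the C-1 rule can be read as a small decision function. The following is a minimal sketch under stated assumptions: the `ServiceSnapshot` fields, the function names, and the result labels (`tier-1-direct`, `tier-1-review`, `tier-2`) are hypothetical and are not part of this policy. It also assumes the consecutive-day counts for C-2 and C-4 have already been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceSnapshot:
    """Hypothetical inputs, one or two fields per criterion C-1 through C-6."""
    handles_security_hot_path: bool = False      # C-1
    p99_days_over_100ms: int = 0                 # C-2: consecutive days above 100 ms
    required_rps: float = 0.0                    # C-3
    rss_days_over_512mb: int = 0                 # C-4: consecutive days above 512 MB
    planned_python_rewrite: bool = False         # C-5, first half
    owns_critical_data_domain: bool = False      # C-5, second half
    monthly_cost_usd: float = 0.0                # C-6
    interpreter_overhead_fraction: float = 0.0   # C-6: share attributed by profiling


def met_criteria(s: ServiceSnapshot) -> list[str]:
    """Return the IDs of every Tier 1 criterion the snapshot satisfies."""
    met = []
    if s.handles_security_hot_path:
        met.append("C-1")
    if s.p99_days_over_100ms >= 7:
        met.append("C-2")
    if s.required_rps > 5000:
        met.append("C-3")
    if s.rss_days_over_512mb >= 7:
        met.append("C-4")
    if s.planned_python_rewrite and s.owns_critical_data_domain:
        met.append("C-5")
    if s.monthly_cost_usd > 500 and s.interpreter_overhead_fraction > 0.5:
        met.append("C-6")
    return met


def classify(s: ServiceSnapshot) -> str:
    """C-1 qualifies directly; any other criterion opens a Tier 1 review."""
    met = met_criteria(s)
    if "C-1" in met:
        return "tier-1-direct"
    if met:
        return "tier-1-review"
    return "tier-2"
```

For example, `classify(ServiceSnapshot(required_rps=8000))` yields `"tier-1-review"`, because only C-3 is met. The outcome is a review, not an automatic promotion.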
Framework guidance (non-binding):
- C++: Drogon preferred for HTTP services (ADR-0076); libpqxx for Postgres.
- Rust: axum preferred for HTTP services; sqlx for Postgres.
- Policy is framework-agnostic at the tier level. The ADR for each rewrite selects the specific stack.
Current Tier 1 services:
| Service | Language | Rationale | Status |
|---|---|---|---|
| Queue | C++ (Drogon) | C-1 (passkey, session, RBAC, audit hash-chain), C-5 (long-term owner of identity domain) | In development — ADR-0076 |
Named Tier 1 candidates (pre-promotion):
| Service | Primary criterion | Notes |
|---|---|---|
| Velvet | C-1 (secret rotation, key material distribution) | Explicitly named by operator 2026-05-10 UTC; Python v1 ships first, Tier 1 rewrite follows |
| Audit hash-chain verifier | C-1 (cryptographic verification on audit read path) | Standalone verifier process; scope TBD at sub-card filing |
| Order-routing hot path | C-2/C-3 (latency + throughput at scale) | Conditional on v1 volume data; decision deferred to post-v1 |
Tier 2 — Python {#tier-2-criteria}
Tier 2 services are built in Python. Python is the default for all new services unless a Tier 1 criterion is met.
Tier 2 is not a consolation tier. Python with mypy --strict, Pydantic at data
boundaries, and Decimal for money arithmetic is the correct choice for the
majority of Raxx services for v1 and well beyond.
Criteria for staying Tier 2:
- Service does not meet any Tier 1 threshold (see §1 above).
- Iteration speed matters more than raw throughput (e.g., backtest engine where domain logic changes frequently).
- Service is a composition layer that orchestrates calls to other services (e.g., Console, Raptor).
- Service lifetime is expected to be short (< 18 months) before being absorbed by a Tier 1 service.
Style requirements for Tier 2 services:
- `mypy --strict` must pass in CI.
- All external data boundaries use Pydantic models.
- All money values use `Decimal`; no `float` arithmetic for amounts.
- SQLAlchemy or Alembic for DB access; no raw string interpolation in queries.
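The money rule is easiest to see with a short example. This sketch uses only the standard library. The helper names (`to_money`, `order_total`), the two-decimal quantum, and the banker's rounding mode are illustrative assumptions, not requirements of this policy. In a real Tier 2 service the parsing would sit behind a Pydantic model at the data boundary.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Assumed quantum: two decimal places. Adjust per currency or instrument.
CENT = Decimal("0.01")


def to_money(raw: str) -> Decimal:
    """Parse an amount from its string form so no binary-float error enters."""
    return Decimal(raw).quantize(CENT, rounding=ROUND_HALF_EVEN)


def order_total(unit_price: Decimal, quantity: int, fee: Decimal) -> Decimal:
    """Multiply and add in Decimal, then round once at the end."""
    return (unit_price * quantity + fee).quantize(CENT, rounding=ROUND_HALF_EVEN)


# float arithmetic drifts on values that look exact in decimal notation.
float_sum_is_exact = (0.1 + 0.2 == 0.3)
decimal_sum_is_exact = (to_money("0.10") + to_money("0.20") == to_money("0.30"))
```

Here `float_sum_is_exact` is `False` and `decimal_sum_is_exact` is `True`, which is the reason amounts never pass through `float`.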
Current Tier 2 services:
| Service | Notes |
|---|---|
| Raptor (`backend_v2/`) | Trade execution orchestration, historical data, backtest engine |
| Console | Operator UI and admin API; Python Flask |
| Velvet (v1) | Secret rotation service; Python v1 pending Tier 1 promotion |
| Reasonator | Sentiment/pattern scoring; Python |
| getraxx marketing site | Static/thin server; Python or Node |
2. Promotion Thresholds {#promotion-thresholds}
A promotion review from Tier 2 to Tier 1 is triggered when any of the Tier 1 criteria in §1 is met. Promotion itself follows the decision flow in §3.
Monitoring obligations
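For each Tier 2 service in production, the service owner must instrument and report the following metrics. Latency and memory are reported to Sentry; cost and security incidents are tracked through the instruments named in the table.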
| Metric | Instrument | Threshold that triggers review |
|---|---|---|
| `svc.<name>.p99_latency_ms` | Sentry performance transaction | > 100 ms sustained 7 days |
| `svc.<name>.rss_mb` | Sentry custom measurement, sampled every 5 min | > 512 MB sustained 7 days |
| `svc.<name>.dyno_cost_usd_mo` | Manual review in monthly cost audit | > $500/mo with > 50 % interpreter overhead |
| Security incident count | Filed as `severity:high` or `severity:critical` issues | Any incident where interpreter-level behavior (GIL, type coercion, dynamic dispatch) is a named contributing factor |
Thresholds are soft guidelines by default. The operator may override any threshold by editing this document directly. The override must note the service name and rationale.
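The "sustained 7 days" thresholds in the table are evaluated on daily P99 values, not on momentary peaks. The sketch below shows one way to check that. The function names, the nearest-rank percentile method, and the choice to treat a day with no data as breaking the streak are all assumptions made for illustration.

```python
from datetime import date, timedelta


def p99(samples: list[float]) -> float:
    """Nearest-rank 99th percentile of one day's samples."""
    if not samples:
        raise ValueError("no samples for this day")
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer math
    return ordered[rank - 1]


def longest_run_over(daily: dict[date, float], threshold: float) -> int:
    """Longest streak of consecutive calendar days whose value exceeds threshold."""
    best = run = 0
    prev = None
    for day in sorted(daily):
        over = daily[day] > threshold
        adjacent = prev is not None and day - prev == timedelta(days=1)
        if over and adjacent and run > 0:
            run += 1          # streak continues
        elif over:
            run = 1           # streak starts (first day, or after a gap)
        else:
            run = 0           # day at or below threshold resets the streak
        best = max(best, run)
        prev = day
    return best


def triggers_review(daily: dict[date, float], threshold: float, days: int = 7) -> bool:
    """True when the metric stayed above threshold for `days` consecutive days."""
    return longest_run_over(daily, threshold) >= days
```

A day with 99 requests at 50 ms and one at 900 ms has a P99 of 50 ms under this method, so a single slow request does not count toward the streak.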
Sustained-measurement definition
"Sustained 7 consecutive days" means: the metric is above the threshold on each of 7 calendar days as measured by the daily P99 (not momentary peak). A single-day spike does not trigger review; 7 consecutive days does.
3. Promotion Decision Flow {#promotion-decision-flow}
Tier 2 service hits threshold
│
▼
Service owner files a promotion-review issue
- Labels: area:architecture, type:decision, lang-tier:review
- Body: metric evidence (7-day chart), affected criterion, proposed target language
│
▼
software-architect reviews within 5 business days
- Confirms or disputes threshold evidence
- Drafts an ADR (ADR number assigned at this step)
│
▼
Operator reviews ADR
- Approves rewrite scope, language, and timeline
- Signs off on parallel-implementation strategy (see §4)
│
▼
ADR filed as Accepted → sub-cards filed → rewrite dispatched
Who approves: The operator (Kristerpher) is the sole approver of Tier 1 promotions. The software-architect may recommend but not unilaterally promote.
Timeline expectation: From threshold hit to rewrite dispatch is expected to take 2–4 weeks (issue → ADR → approval → sub-cards). The Tier 2 service remains live throughout; there is no freeze.
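The flow above is linear, with one role attached to each of the first three steps. A minimal sketch of it as a state machine follows. The `Stage` names and role strings are hypothetical labels for the steps in the diagram. The policy does not name who performs the final dispatch, so the sketch leaves that step unrestricted.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    THRESHOLD_HIT = "threshold_hit"
    ISSUE_FILED = "issue_filed"
    ADR_DRAFTED = "adr_drafted"
    ADR_ACCEPTED = "adr_accepted"
    REWRITE_DISPATCHED = "rewrite_dispatched"


# current stage -> (next stage, role required to make the move, or None)
TRANSITIONS: dict[Stage, tuple[Stage, Optional[str]]] = {
    Stage.THRESHOLD_HIT: (Stage.ISSUE_FILED, "service-owner"),
    Stage.ISSUE_FILED: (Stage.ADR_DRAFTED, "software-architect"),
    Stage.ADR_DRAFTED: (Stage.ADR_ACCEPTED, "operator"),
    Stage.ADR_ACCEPTED: (Stage.REWRITE_DISPATCHED, None),  # role not specified
}


def advance(stage: Stage, actor: str) -> Stage:
    """Move one step forward, refusing actors who lack the required role."""
    if stage not in TRANSITIONS:
        raise ValueError(f"{stage.value} is the final stage")
    next_stage, required_role = TRANSITIONS[stage]
    if required_role is not None and actor != required_role:
        raise PermissionError(
            f"{actor} may not move {stage.value} to {next_stage.value}; "
            f"requires {required_role}"
        )
    return next_stage
```

Calling `advance(Stage.ADR_DRAFTED, "software-architect")` raises `PermissionError`, which mirrors the rule that the architect may recommend but not approve.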
4. Contract Requirements for Tier 1 Services {#contract-requirements}
Before a Tier 1 rewrite begins, the Tier 2 service must have a stable, language-agnostic API contract. This contract is the rewrite's target; it is not defined during the rewrite.
Required artifacts before rewrite start
- OpenAPI 3.1 spec (or equivalent typed schema) covering all endpoints the rewrite must implement. Spec is committed to `docs/architecture/<service>/api-contract.md` or as an `openapi.yaml` file.
- Behavioral test suite: integration tests that run against the HTTP interface, not the Python internals. Tests must be framework-agnostic (i.e., they pass against both the Tier 2 and Tier 1 implementation). These tests are the parity gate for the rewrite.
- Contract-test CI job: a CI step that runs the behavioral suite against the Tier 1 implementation. The rewrite is not "done" until this job is green.
- Data migration plan: if the Tier 2 service owns a DB, the migration strategy (schema compatibility, dual-write window, cutover) must be documented before the rewrite is dispatched.
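A behavioral test in this sense talks to the service only over HTTP, so the same check can be pointed at either implementation by changing the base URL. The sketch below illustrates the idea with the standard library. The `/healthz` endpoint, its response body, and the stub server are hypothetical and exist only so the example runs locally. It handles plain `http` URLs only.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def check_health_contract(base_url: str) -> None:
    """One behavioral check. It knows nothing about the implementation language."""
    parts = urlsplit(base_url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=5)
    try:
        conn.request("GET", "/healthz")
        resp = conn.getresponse()
        body = json.loads(resp.read())
        assert resp.status == 200
        assert resp.getheader("Content-Type") == "application/json"
        assert body == {"status": "ok"}
    finally:
        conn.close()


class _StubHandler(BaseHTTPRequestHandler):
    """Stand-in for a Tier 2 or Tier 1 service (illustration only)."""

    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        payload = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep test output quiet


def run_against_stub() -> bool:
    """Start the stub on a free port and run the check against it."""
    server = HTTPServer(("127.0.0.1", 0), _StubHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        check_health_contract(f"http://{host}:{port}")
        return True
    finally:
        server.shutdown()
        server.server_close()
```

In a contract-test CI job, `check_health_contract` would be called once with the Tier 2 URL and once with the Tier 1 staging URL. The stub is not needed there.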
Parallel implementation model
Tier 1 rewrites ship as parallel implementations behind the same API contract, not in-place rewrites. The pattern:
- Tier 2 service continues to serve production.
- Tier 1 implementation is developed and deployed to staging under the same URL prefix (e.g., behind a feature flag or a header-based router).
- Behavioral parity tests pass on Tier 1 staging.
- Traffic is shifted to Tier 1 via feature flag (canary → 100 %).
- Tier 2 service is decommissioned after a soak period (minimum 14 days at 100 % traffic).
Velvet's transition model is an explicit example of this pattern: parallel Rust implementation behind the same rotation API contract, not an in-place rewrite of the Python service.
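The canary step and the soak rule can be sketched as two small functions. This is an illustration under assumptions, not the routing mechanism this policy mandates: the function names, the choice of a stable request key (for example a user or session ID), and the SHA-256 bucketing are all hypothetical.

```python
import hashlib
from datetime import date

MIN_SOAK_DAYS = 14  # minimum soak at 100 % traffic, per the list above


def routes_to_tier1(request_key: str, tier1_percent: int) -> bool:
    """Deterministically map a key to a bucket 0..99 and compare to the rollout."""
    if not 0 <= tier1_percent <= 100:
        raise ValueError("tier1_percent must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < tier1_percent


def soak_complete(full_traffic_since: date, today: date) -> bool:
    """True once Tier 1 has carried 100 % of traffic for the minimum soak period."""
    return (today - full_traffic_since).days >= MIN_SOAK_DAYS
```

Because the bucket depends only on the key, a caller that reaches Tier 1 at 10 % keeps reaching it at 50 % and 100 %. This keeps a user's experience stable while the percentage is raised.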
5. Cross-Language Testing Posture {#cross-language-testing}
The behavioral parity gate (§4) is the primary cross-language testing mechanism. Additional requirements:
- No language-specific test may be the sole coverage for a contract endpoint. Each endpoint must have at least one integration test that exercises the HTTP interface.
- Consumer-driven contract tests (e.g., Pact) are preferred when a Tier 1 service has multiple consumers (iOS, Antlers, Console). The consumer defines the contract; the Tier 1 service proves it satisfies it.
- Error semantics must match. The Tier 1 implementation must return the same HTTP status codes, error response shapes, and header contracts as the Tier 2 implementation. Behavioral divergence in error paths is a parity failure.
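The error-semantics rule can be checked mechanically by capturing the same failing request against both implementations and comparing the results. The following is a minimal sketch with assumptions: the `Response` type, the `CONTRACT_HEADERS` list, and the decision to compare body structure (keys and value types) rather than exact message text are illustrative choices.

```python
from dataclasses import dataclass

# Hypothetical set of headers treated as part of the contract.
CONTRACT_HEADERS = ("content-type", "www-authenticate")


@dataclass(frozen=True)
class Response:
    status: int
    headers: dict
    body: object  # decoded JSON


def shape(value):
    """Reduce a JSON value to its structure: keys and value types, not contents."""
    if isinstance(value, dict):
        return {key: shape(item) for key, item in value.items()}
    if isinstance(value, list):
        # Compared element by element, so list length is part of the shape.
        return [shape(item) for item in value]
    return type(value).__name__


def parity_failures(tier2: Response, tier1: Response) -> list[str]:
    """List every way the Tier 1 response diverges from the Tier 2 response."""
    failures = []
    if tier2.status != tier1.status:
        failures.append(f"status: {tier2.status} != {tier1.status}")
    if shape(tier2.body) != shape(tier1.body):
        failures.append("body shape differs")
    old = {k.lower(): v for k, v in tier2.headers.items()}
    new = {k.lower(): v for k, v in tier1.headers.items()}
    for name in CONTRACT_HEADERS:
        if old.get(name) != new.get(name):
            failures.append(f"header {name}: {old.get(name)!r} != {new.get(name)!r}")
    return failures
```

An empty list means the two responses agree on status, error shape, and the listed headers. Any entry is a parity failure in the sense described above.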
6. Rewrite Delegation {#rewrite-delegation}
Per operator decision 2026-05-10 UTC: Tier 1 rewrites are likely delegated to a separate agent or model purpose-built for the target language (e.g., a C++ specialist agent for Queue). The dispatching agent does not write the Tier 1 service; it files the ADR, scopes the sub-cards, and hands off.
Delegation constraints:
- The delegated agent receives: (a) the approved ADR, (b) the API contract spec, (c) the behavioral test suite, (d) the data migration plan.
- The delegated agent does not modify the Tier 2 service except to add instrumentation or contract tests required by the parity gate.
- The delegated agent's PRs are reviewed by the software-architect before merge.
7. Override Log {#override-log}
Operator threshold overrides are recorded here as they are applied.
| Date | Service | Metric | Default threshold | Override value | Rationale |
|---|---|---|---|---|---|
| — | — | — | — | — | No overrides applied |
8. Cross-References {#cross-references}
- Operator philosophy: `project_language_tier_philosophy` (memory). v1 Python, post-v1 Rust+C++ for critical services; Velvet explicitly named.
- Queue C++ election: [ADR-0076](https://internal-docs.raxx.app/architecture/adr/0076-queue-phase1-billing-v1-aggressive-12day.html)
- Queue vcpkg discipline: `queue-cpp-scaffold-review-2026-05-13.md`. First Tier 1 service operational lessons; propagates to all future Tier 1 services.
- Scale-tier upgrade thresholds (audit write path): [ADR-0063](https://internal-docs.raxx.app/architecture/adr/0063-scale-tier-latency-budget-trigger-upgrade.html). Numeric threshold pattern that this policy follows.
- ADR-0011: Superseded (premium-tier compute); historical reference only.
- Sub2 (philosophical rationale doc): blocked on this document.
- Sub3 (SDLC template updates): blocked on sub2.