Language Tier Policy

Status: Accepted
Date: 2026-05-18 UTC
Owner: software-architect
Refs: operator decision 2026-05-10 UTC (project_language_tier_philosophy), ADR-0076 (Queue C++ election), docs/architecture/queue-cpp-scaffold-review-2026-05-13.md
Blocked by: nothing
Blocks: sub2 (philosophical doc), sub3 (SDLC template updates)
Issue: #2286

Purpose

This document is a policy, not a philosophy. It defines which language tier a service lives in, the numeric thresholds that trigger re-classification review, the contract requirements a service must satisfy before a Tier 1 rewrite begins, and the approval path for that rewrite.

The philosophical rationale for the two-phase language strategy belongs in sub2 (pending). This document cites operator decisions without re-arguing them.

1. Tier Definitions {#tier-definitions}

Tier 1 — Rust or C++ {#tier-1-criteria}

Tier 1 services are built in Rust or C++. They are expected to live indefinitely without a planned rewrite. The higher engineering cost is accepted in exchange for:

Compile-time type safety with no GIL, no GC pauses, no managed-runtime overhead.
Predictable sub-millisecond latency on auth hot paths, cryptographic operations, and high-frequency write paths.
Foundation longevity: a Tier 1 service is the last version, not an interim one.

Criteria (any single criterion qualifies for Tier 1 review):

#	Criterion	Threshold
C-1	Security-sensitive hot path	Service handles passkey verification, session token mint/verify, cryptographic key material, or audit hash-chain writes
C-2	P99 latency	P99 response latency > 100 ms sustained for 7 consecutive days under normal load
C-3	Throughput demand	Service must sustain > 5,000 requests/sec on its primary endpoint
C-4	Memory footprint	Resident set size (RSS) > 512 MB under normal operating load for 7 consecutive days
C-5	Rewrite cost justification	A rewrite of the Python implementation was already planned (i.e., the service is a known short-term implementation) AND the service owns a critical data domain
C-6	Infrastructure cost	Dyno/compute cost for the service exceeds $500/month and profiling attributes > 50 % to interpreter overhead

A service meeting C-1 alone qualifies for Tier 1 directly (no additional numeric gate). For C-2 through C-6, the condition must be sustained rather than transient, so that a short-lived spike does not trigger a review.
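The criteria table can be read as a set of independent predicates. The sketch below is one possible encoding, not part of the policy: the `ServiceSnapshot` type and all of its field names are hypothetical, and the 7-day window for C-2 and C-4 is assumed to have been counted upstream.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ServiceSnapshot:
    """Hypothetical per-service summary; field names are illustrative only."""
    handles_security_hot_path: bool    # C-1
    days_p99_over_100ms: int           # C-2: consecutive days with daily P99 > 100 ms
    required_rps: int                  # C-3: requests/sec the primary endpoint must sustain
    days_rss_over_512mb: int           # C-4: consecutive days with RSS > 512 MB
    rewrite_planned: bool              # C-5, first condition
    owns_critical_domain: bool         # C-5, second condition
    monthly_cost_usd: Decimal          # C-6
    interpreter_overhead_share: float  # C-6: fraction attributed by profiling


def matched_criteria(s: ServiceSnapshot) -> list[str]:
    """Return the IDs of every Tier 1 criterion the snapshot satisfies."""
    checks = {
        "C-1": s.handles_security_hot_path,
        "C-2": s.days_p99_over_100ms >= 7,
        "C-3": s.required_rps > 5_000,
        "C-4": s.days_rss_over_512mb >= 7,
        "C-5": s.rewrite_planned and s.owns_critical_domain,
        "C-6": s.monthly_cost_usd > 500 and s.interpreter_overhead_share > 0.50,
    }
    return [cid for cid, hit in checks.items() if hit]


# Made-up numbers for illustration, not measurements of any real service.
example = ServiceSnapshot(True, 0, 200, 0, True, True, Decimal("120"), 0.1)
print(matched_criteria(example))  # ['C-1', 'C-5']
```

Any non-empty result qualifies the service for Tier 1 review; the function reports matches and leaves the decision to the approval path.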

Framework guidance (non-binding):

C++: Drogon preferred for HTTP services (ADR-0076); libpqxx for Postgres.
Rust: axum preferred for HTTP services; sqlx for Postgres.
Policy is framework-agnostic at the tier level. The ADR for each rewrite selects the specific stack.

Current Tier 1 services:

Service	Language	Rationale	Status
Queue	C++ (Drogon)	C-1 (passkey, session, RBAC, audit hash-chain), C-5 (long-term owner of identity domain)	In development — ADR-0076

Named Tier 1 candidates (pre-promotion):

Service	Primary criterion	Notes
Velvet	C-1 (secret rotation, key material distribution)	Explicitly named by operator 2026-05-10 UTC; Python v1 ships first, Tier 1 rewrite follows
Audit hash-chain verifier	C-1 (cryptographic verification on audit read path)	Standalone verifier process; scope TBD at sub-card filing
Order-routing hot path	C-2/C-3 (latency + throughput at scale)	Conditional on v1 volume data; decision deferred to post-v1

Tier 2 — Python {#tier-2-criteria}

Tier 2 services are built in Python. Python is the default for all new services unless a Tier 1 criterion is met.

Tier 2 is not a consolation tier. Python with mypy --strict, Pydantic at data boundaries, and Decimal for money arithmetic is the correct choice for the majority of Raxx services for v1 and well beyond.

Criteria for staying Tier 2:

Service does not meet any Tier 1 threshold (see the Tier 1 criteria above).
Iteration speed matters more than raw throughput (e.g., backtest engine where domain logic changes frequently).
Service is a composition layer that orchestrates calls to other services (e.g., Console, Raptor).
Service lifetime is expected to be short (< 18 months) before being absorbed by a Tier 1 service.

Style requirements for Tier 2 services:

mypy --strict must pass in CI.
All external data boundaries use Pydantic models.
All money values use Decimal; no float arithmetic for amounts.
SQLAlchemy for DB access and Alembic for schema migrations; no raw string interpolation in queries.
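The Decimal rule exists because binary floats cannot represent most decimal fractions exactly. The following is a minimal sketch using only the standard library; `order_total`, its parameters, and the banker's-rounding mode are illustrative assumptions, not something this policy prescribes.

```python
from decimal import Decimal, ROUND_HALF_EVEN

CENT = Decimal("0.01")


def order_total(unit_price: Decimal, quantity: int, fee_rate: Decimal) -> Decimal:
    """Return price * quantity plus a proportional fee, rounded to whole cents."""
    subtotal = unit_price * quantity
    fee = (subtotal * fee_rate).quantize(CENT, rounding=ROUND_HALF_EVEN)
    return (subtotal + fee).quantize(CENT, rounding=ROUND_HALF_EVEN)


# Float arithmetic drifts: this sum is not equal to 0.3.
assert 0.1 + 0.2 != 0.3
# Decimal values built from strings keep exact decimal digits.
assert Decimal("0.10") + Decimal("0.20") == Decimal("0.30")
```

Constructing Decimal from a string rather than from a float matters: `Decimal(0.1)` would carry the float's representation error into the Decimal value.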

Current Tier 2 services:

Service	Notes
Raptor (`backend_v2/`)	Trade execution orchestration, historical data, backtest engine
Console	Operator UI and admin API; Python Flask
Velvet (v1)	Secret rotation service; Python v1 pending Tier 1 promotion
Reasonator	Sentiment/pattern scoring; Python
getraxx marketing site	Static/thin server; Python or Node

2. Promotion Thresholds {#promotion-thresholds}

A promotion review from Tier 2 to Tier 1 is triggered when any of the Tier 1 criteria in §1 is met. Promotion itself follows the decision flow in §3.

Monitoring obligations

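For each Tier 2 service in production, the service owner must instrument and report the following metrics. Latency and memory metrics are reported to Sentry; cost and security incidents are tracked through the channels named in the table.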

Metric	Instrument	Threshold that triggers review
`svc.<name>.p99_latency_ms`	Sentry performance transaction	> 100 ms sustained 7 days
`svc.<name>.rss_mb`	Sentry custom measurement, sampled every 5 min	> 512 MB sustained 7 days
`svc.<name>.dyno_cost_usd_mo`	Manual review in monthly cost audit	> $500/mo with > 50 % interpreter overhead
Security incident count	Filed as `severity:high` or `severity:critical` issues	Any incident where interpreter-level behavior (GIL, type coercion, dynamic dispatch) is a named contributing factor

Thresholds are soft guidelines by default. The operator may override any threshold by editing this document directly. Each override must be recorded in the Override Log (§7) with the service name and rationale.

Sustained-measurement definition

"Sustained 7 consecutive days" means the metric is above the threshold on each of 7 consecutive calendar days, as measured by the daily P99 (not the momentary peak). A single-day spike does not trigger review; 7 consecutive days above the threshold does.

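The sustained-measurement rule can be sketched as two small functions: one that reduces a day's samples to a P99, and one that looks for a run of consecutive days above the threshold. The function names are hypothetical, and the nearest-rank percentile method is an assumption; the monitoring backend may compute P99 differently.

```python
from collections.abc import Sequence


def daily_p99(samples: Sequence[float]) -> float:
    """Nearest-rank 99th percentile of one day's samples (assumed method)."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n) using integer arithmetic
    return ordered[rank - 1]


def longest_run_above(daily_values: Sequence[float], threshold: float) -> int:
    """Length of the longest run of consecutive days strictly above the threshold."""
    longest = current = 0
    for value in daily_values:
        current = current + 1 if value > threshold else 0
        longest = max(longest, current)
    return longest


def is_sustained(daily_values: Sequence[float], threshold: float, days: int = 7) -> bool:
    """True when at least `days` consecutive daily values exceed the threshold."""
    return longest_run_above(daily_values, threshold) >= days
```

Under this reading, six days above 100 ms, one day at 90 ms, and six more days above 100 ms do not trigger review, because the longest run is six.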
3. Promotion Decision Flow {#promotion-decision-flow}

Tier 2 service hits threshold
        │
        ▼
Service owner files a promotion-review issue
  - Labels: area:architecture, type:decision, lang-tier:review
  - Body: metric evidence (7-day chart), affected criterion, proposed target language
        │
        ▼
software-architect reviews within 5 business days
  - Confirms or disputes threshold evidence
  - Drafts an ADR (ADR number assigned at this step)
        │
        ▼
Operator reviews ADR
  - Approves rewrite scope, language, and timeline
  - Signs off on parallel-implementation strategy (see §4)
        │
        ▼
ADR filed as Accepted → sub-cards filed → rewrite dispatched

Who approves: The operator (Kristerpher) is the sole approver of Tier 1 promotions. The software-architect may recommend but not unilaterally promote.

Timeline expectation: From threshold hit to rewrite dispatch is expected to take 2–4 weeks (issue → ADR → approval → sub-cards). The Tier 2 service remains live throughout; there is no freeze.

4. Contract Requirements for Tier 1 Services {#contract-requirements}

Before a Tier 1 rewrite begins, the Tier 2 service must have a stable, language-agnostic API contract. This contract is the rewrite's target; it is not defined during the rewrite.

Required artifacts before rewrite start

OpenAPI 3.1 spec (or equivalent typed schema) covering all endpoints the rewrite must implement. Spec is committed to docs/architecture/<service>/api-contract.md or as an openapi.yaml file.
Behavioral test suite — integration tests that run against the HTTP interface, not the Python internals. Tests must be implementation-agnostic (i.e., they pass against both the Tier 2 and Tier 1 implementations). These tests are the parity gate for the rewrite.
Contract-test CI job — a CI step that runs the behavioral suite against the Tier 1 implementation. The rewrite is not "done" until this job is green.
Data migration plan — if the Tier 2 service owns a DB, the migration strategy (schema compatibility, dual-write window, cutover) must be documented before the rewrite is dispatched.
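A behavioral test stays implementation-agnostic when it knows only a base URL and the contract. The sketch below illustrates that shape with the standard library; the `/v1/items/...` path, the `error` field, and the `SERVICE_BASE_URL` variable mentioned afterwards are hypothetical placeholders, not endpoints of any Raxx service.

```python
import json
import urllib.error
import urllib.request
from collections.abc import Callable
from typing import Any

# A fetcher maps a request path to (HTTP status, decoded JSON body).
Fetch = Callable[[str], tuple[int, Any]]


def http_fetch(base_url: str) -> Fetch:
    """Build a fetcher bound to one deployment (Tier 2 or Tier 1)."""
    def fetch(path: str) -> tuple[int, Any]:
        try:
            with urllib.request.urlopen(base_url + path, timeout=5) as resp:
                return resp.status, json.loads(resp.read())
        except urllib.error.HTTPError as err:
            # Error responses are part of the contract, so return them as data.
            return err.code, json.loads(err.read())
    return fetch


def check_unknown_id_returns_404(fetch: Fetch) -> None:
    """Hypothetical contract rule: an unknown ID yields 404 with an `error` field."""
    status, body = fetch("/v1/items/does-not-exist")
    assert status == 404, f"expected 404, got {status}"
    assert "error" in body, "error body must carry an `error` field"
```

A contract-test CI job could then run the same check twice, for example `check_unknown_id_returns_404(http_fetch(os.environ["SERVICE_BASE_URL"]))`, once per deployment. Nothing in the check refers to Python internals, so it applies unchanged to a Rust or C++ implementation.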

Parallel implementation model

Tier 1 rewrites ship as parallel implementations behind the same API contract, not in-place rewrites. The pattern:

Tier 2 service continues to serve production.
Tier 1 implementation is developed and deployed to staging under the same URL prefix (e.g., behind a feature flag or a header-based router).
Behavioral parity tests pass on Tier 1 staging.
Traffic is shifted to Tier 1 via feature flag (canary → 100 %).
Tier 2 service is decommissioned after a soak period (minimum 14 days at 100 % traffic).
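The canary step and the soak rule can be sketched as follows. This is an illustration under stated assumptions: the policy does not name a feature-flag system, so routing here is a deterministic hash of a request key (for example a user or session ID), and both function names are hypothetical.

```python
import hashlib
from datetime import datetime, timedelta, timezone

MIN_SOAK = timedelta(days=14)  # minimum time at 100 % traffic before decommission


def routes_to_tier1(request_key: str, tier1_percent: int) -> bool:
    """Deterministically send `tier1_percent` of request keys to Tier 1."""
    if not 0 <= tier1_percent <= 100:
        raise ValueError("tier1_percent must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return bucket < tier1_percent


def soak_complete(full_traffic_since: datetime, now: datetime) -> bool:
    """True once Tier 1 has carried 100 % of traffic for the minimum soak period."""
    return now - full_traffic_since >= MIN_SOAK
```

Hashing the key rather than drawing a random number keeps each caller on one implementation while the percentage is held, and raising the percentage only ever moves callers from Tier 2 to Tier 1, never back.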

Velvet's transition model is an explicit example of this pattern: parallel Rust implementation behind the same rotation API contract, not an in-place rewrite of the Python service.

5. Cross-Language Testing Posture {#cross-language-testing}

The behavioral parity gate (§4) is the primary cross-language testing mechanism. Additional requirements:

No language-specific test may be the sole coverage for a contract endpoint. Each endpoint must have at least one integration test that exercises the HTTP interface.
Consumer-driven contract tests (e.g., Pact) are preferred when a Tier 1 service has multiple consumers (iOS, Antlers, Console). The consumer defines the contract; the Tier 1 service proves it satisfies it.
Error semantics must match. The Tier 1 implementation must return the same HTTP status codes, error response shapes, and header contracts as the Tier 2 implementation. Behavioral divergence in error paths is a parity failure.
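One way to make the error-semantics rule checkable is to compare a Tier 2 and a Tier 1 response on status code, body shape, and a chosen set of headers. The sketch below assumes JSON bodies; the `Response` type, `shape`, and `parity_failures` are hypothetical names, and which headers count as contractual is left to the caller.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Response:
    """Hypothetical captured HTTP response from either implementation."""
    status: int
    headers: dict[str, str]
    body: Any


def shape(value: Any) -> Any:
    """Reduce a JSON value to its structure: keys and value types, not values."""
    if isinstance(value, dict):
        return {key: shape(item) for key, item in sorted(value.items())}
    if isinstance(value, list):
        # Element count is kept, so lists of different lengths differ in shape.
        return [shape(item) for item in value]
    return type(value).__name__


def parity_failures(tier2: Response, tier1: Response, contract_headers: list[str]) -> list[str]:
    """List every way the Tier 1 response diverges from the Tier 2 response."""
    failures: list[str] = []
    if tier2.status != tier1.status:
        failures.append(f"status: {tier2.status} != {tier1.status}")
    if shape(tier2.body) != shape(tier1.body):
        failures.append("body shape differs")
    # Header names are case-insensitive, so compare on lowercased names.
    h2 = {name.lower(): value for name, value in tier2.headers.items()}
    h1 = {name.lower(): value for name, value in tier1.headers.items()}
    for name in contract_headers:
        if h2.get(name.lower()) != h1.get(name.lower()):
            failures.append(f"header {name} differs")
    return failures
```

An empty list means parity on that request. Comparing shapes rather than exact values tolerates differences such as request IDs or timestamps in the error body while still catching a renamed or missing field.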

6. Rewrite Delegation {#rewrite-delegation}

Per operator decision 2026-05-10 UTC: Tier 1 rewrites are expected to be delegated to a separate agent or model purpose-built for the target language (e.g., a C++ specialist agent for Queue). The dispatching agent does not write the Tier 1 service; it files the ADR, scopes the sub-cards, and hands off.

Delegation constraints:

The delegated agent receives: (a) the approved ADR, (b) the API contract spec, (c) the behavioral test suite, (d) the data migration plan.
The delegated agent does not modify the Tier 2 service except to add instrumentation or contract tests required by the parity gate.
The delegated agent's PRs are reviewed by the software-architect before merge.

7. Override Log {#override-log}

Operator threshold overrides are recorded here as they are applied.

Date	Service	Metric	Default threshold	Override value	Rationale
—	—	—	—	—	No overrides applied

8. Cross-References {#cross-references}

Operator philosophy: project_language_tier_philosophy (memory) — v1 Python, post-v1 Rust+C++ for critical services; Velvet explicitly named.
Queue C++ election: [ADR-0076](https://internal-docs.raxx.app/architecture/adr/0076-queue-phase1-billing-v1-aggressive-12day.html)
Queue vcpkg discipline: queue-cpp-scaffold-review-2026-05-13.md — first Tier 1 service operational lessons; propagates to all future Tier 1 services.
Scale-tier upgrade thresholds (audit write path): [ADR-0063](https://internal-docs.raxx.app/architecture/adr/0063-scale-tier-latency-budget-trigger-upgrade.html) — numeric threshold pattern that this policy follows.
ADR-0011: Superseded (premium-tier compute); historical reference only.
Sub2 (philosophical rationale doc): blocked on this document.
Sub3 (SDLC template updates): blocked on sub2.