Raxx · internal docs


Language Tier Policy

Status: Accepted
Date: 2026-05-18 UTC
Owner: software-architect
Refs: operator decision 2026-05-10 UTC (project_language_tier_philosophy), ADR-0076 (Queue C++ election), docs/architecture/queue-cpp-scaffold-review-2026-05-13.md
Blocked by: nothing
Blocks: sub2 (philosophical doc), sub3 (SDLC template updates)
Issue: #2286


Purpose

This document is a policy, not a philosophy. It defines which language tier a service lives in, the numeric thresholds that trigger re-classification review, the contract requirements a service must satisfy before a Tier 1 rewrite begins, and the approval path for that rewrite.

The philosophical rationale for the two-phase language strategy belongs in sub2 (pending). This document cites operator decisions without re-arguing them.


1. Tier Definitions {#tier-definitions}

Tier 1 — Rust or C++ {#tier-1-criteria}

Tier 1 services are built in Rust or C++. They are expected to live indefinitely without a planned rewrite. The higher engineering cost is accepted in exchange for:

Criteria (any single criterion qualifies for Tier 1 review):

| # | Criterion | Threshold |
|---|---|---|
| C-1 | Security-sensitive hot path | Service handles passkey verification, session token mint/verify, cryptographic key material, or audit hash-chain writes |
| C-2 | P99 latency | P99 response latency > 100 ms sustained for 7 consecutive days under normal load |
| C-3 | Throughput demand | Service must sustain > 5,000 requests/sec on its primary endpoint |
| C-4 | Memory footprint | Resident set size (RSS) > 512 MB under normal operating load for 7 consecutive days |
| C-5 | Rewrite cost justification | Python rewrite was already planned (i.e., the service is a known short-term implementation) AND it owns a critical data domain |
| C-6 | Infrastructure cost | Dyno/compute cost for the service exceeds $500/month and profiling attributes > 50 % to interpreter overhead |

A service meeting C-1 alone qualifies for Tier 1 directly (no additional numeric gate). For C-2 through C-6, the condition must be sustained, not transient, so that a review is not triggered by a short-lived spike.
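
The criteria table can be read as a set of independent boolean checks. The following is a minimal sketch of that reading, not an official classifier: `ServiceProfile`, its field names, and `matched_criteria` are hypothetical, and the "sustained" fields assume the caller has already reduced a 7-day window to its lowest daily value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceProfile:
    """Hypothetical snapshot of one service; all field names are illustrative."""
    security_sensitive_hot_path: bool = False   # C-1
    p99_latency_ms_sustained: float = 0.0       # C-2: lowest daily P99 in the 7-day window
    required_rps: float = 0.0                   # C-3: required requests/sec on primary endpoint
    rss_mb_sustained: float = 0.0               # C-4: lowest daily RSS figure in the window
    rewrite_planned: bool = False               # C-5, first condition
    owns_critical_data_domain: bool = False     # C-5, second condition
    monthly_cost_usd: float = 0.0               # C-6, first condition
    interpreter_overhead_fraction: float = 0.0  # C-6, second condition (0.0 to 1.0)


def matched_criteria(p: ServiceProfile) -> list[str]:
    """Return the IDs of every Tier 1 criterion the profile meets."""
    checks = {
        "C-1": p.security_sensitive_hot_path,
        "C-2": p.p99_latency_ms_sustained > 100,
        "C-3": p.required_rps > 5000,
        "C-4": p.rss_mb_sustained > 512,
        "C-5": p.rewrite_planned and p.owns_critical_data_domain,
        "C-6": p.monthly_cost_usd > 500 and p.interpreter_overhead_fraction > 0.50,
    }
    return [name for name, hit in checks.items() if hit]
```

Any non-empty result qualifies the service for Tier 1 review; a result containing "C-1" qualifies it directly. All comparisons are strict, matching the ">" thresholds in the table.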

Framework guidance (non-binding):

Current Tier 1 services:

| Service | Language | Rationale | Status |
|---|---|---|---|
| Queue | C++ (Drogon) | C-1 (passkey, session, RBAC, audit hash-chain), C-5 (long-term owner of identity domain) | In development (ADR-0076) |

Named Tier 1 candidates (pre-promotion):

| Service | Primary criterion | Notes |
|---|---|---|
| Velvet | C-1 (secret rotation, key material distribution) | Explicitly named by operator 2026-05-10 UTC; Python v1 ships first, Tier 1 rewrite follows |
| Audit hash-chain verifier | C-1 (cryptographic verification on audit read path) | Standalone verifier process; scope TBD at sub-card filing |
| Order-routing hot path | C-2/C-3 (latency + throughput at scale) | Conditional on v1 volume data; decision deferred to post-v1 |

Tier 2 — Python {#tier-2-criteria}

Tier 2 services are built in Python. Python is the default for all new services unless a Tier 1 criterion is met.

Tier 2 is not a consolation tier. Python with mypy --strict, Pydantic at data boundaries, and Decimal for money arithmetic is the correct choice for the majority of Raxx services for v1 and well beyond.

Criteria for staying Tier 2:

Style requirements for Tier 2 services:

Current Tier 2 services:

| Service | Notes |
|---|---|
| Raptor (backend_v2/) | Trade execution orchestration, historical data, backtest engine |
| Console | Operator UI and admin API; Python Flask |
| Velvet (v1) | Secret rotation service; Python v1 pending Tier 1 promotion |
| Reasonator | Sentiment/pattern scoring; Python |
| getraxx marketing site | Static/thin server; Python or Node |

2. Promotion Thresholds {#promotion-thresholds}

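A promotion review (Tier 2 to Tier 1) is triggered when any of the Tier 1 criteria in §1 is met. Meeting a criterion starts the decision flow in §3; it does not promote the service by itself.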

Monitoring obligations

For each Tier 2 service in production, the service owner must instrument and report the following metrics. Metrics are reported to Sentry unless the Instrument column names a different channel:

| Metric | Instrument | Threshold that triggers review |
|---|---|---|
| svc.<name>.p99_latency_ms | Sentry performance transaction | > 100 ms sustained 7 days |
| svc.<name>.rss_mb | Sentry custom measurement, sampled every 5 min | > 512 MB sustained 7 days |
| svc.<name>.dyno_cost_usd_mo | Manual review in monthly cost audit | > $500/mo with > 50 % interpreter overhead |
| Security incident count | Filed as severity:high or severity:critical issues | Any incident where interpreter-level behavior (GIL, type coercion, dynamic dispatch) is a named contributing factor |

Thresholds are soft guidelines by default. The operator may override any threshold by editing this document directly. The override must note the service name and rationale.

Sustained-measurement definition

"Sustained 7 consecutive days" means the daily P99 of the metric (not a momentary peak) is above the threshold on each of 7 consecutive calendar days. A single-day spike does not trigger review; 7 consecutive days above the threshold does.

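The sustained-measurement rule reduces to a streak check over daily values. The sketch below is illustrative only: `is_sustained` is a hypothetical helper, and it assumes the input holds exactly one daily P99 value per consecutive calendar day, oldest first, with no missing days.

```python
from typing import Sequence


def is_sustained(daily_values: Sequence[float], threshold: float, days: int = 7) -> bool:
    """Return True if `days` consecutive daily values are strictly above `threshold`.

    Assumes one value per consecutive calendar day, with no gaps in the series.
    """
    streak = 0
    for value in daily_values:
        # A day at or below the threshold resets the streak to zero.
        streak = streak + 1 if value > threshold else 0
        if streak >= days:
            return True
    return False
```

For example, `is_sustained([101] * 7, 100)` is true, while six days above 100 ms followed by one day at exactly 100 ms is not, because the comparison is strict and the streak resets.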

3. Promotion Decision Flow {#promotion-decision-flow}

Tier 2 service hits threshold
        │
        ▼
Service owner files a promotion-review issue
  - Labels: area:architecture, type:decision, lang-tier:review
  - Body: metric evidence (7-day chart), affected criterion, proposed target language
        │
        ▼
software-architect reviews within 5 business days
  - Confirms or disputes threshold evidence
  - Drafts an ADR (ADR number assigned at this step)
        │
        ▼
Operator reviews ADR
  - Approves rewrite scope, language, and timeline
  - Signs off on parallel-implementation strategy (see §4)
        │
        ▼
ADR filed as Accepted → sub-cards filed → rewrite dispatched

Who approves: The operator (Kristerpher) is the sole approver of Tier 1 promotions. The software-architect may recommend but not unilaterally promote.

Timeline expectation: The path from threshold hit to rewrite dispatch is expected to take 2–4 weeks (issue → ADR → approval → sub-cards). The Tier 2 service remains live throughout; there is no freeze.


4. Contract Requirements for Tier 1 Services {#contract-requirements}

Before a Tier 1 rewrite begins, the Tier 2 service must have a stable, language-agnostic API contract. This contract is the rewrite's target; it is not defined during the rewrite.

Required artifacts before rewrite start

  1. OpenAPI 3.1 spec (or equivalent typed schema) covering all endpoints the rewrite must implement. The spec is committed to docs/architecture/<service>/api-contract.md or as an openapi.yaml file.

  2. Behavioral test suite: integration tests that run against the HTTP interface, not the Python internals. Tests must be framework-agnostic (that is, the same tests must pass against both the Tier 2 and the Tier 1 implementations). These tests are the parity gate for the rewrite.

  3. Contract-test CI job — a CI step that runs the behavioral suite against the Tier 1 implementation. The rewrite is not "done" until this job is green.

  4. Data migration plan — if the Tier 2 service owns a DB, the migration strategy (schema compatibility, dual-write window, cutover) must be documented before the rewrite is dispatched.
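
A behavioral test in the sense of artifact 2 touches only the HTTP interface and takes the target's base URL as input, so the same check can run against either implementation. The sketch below assumes a hypothetical `/v1/status` endpoint returning a JSON object with a `state` field; neither is taken from an actual Raxx contract, and the helper names are illustrative.

```python
import json
import urllib.request
from typing import Callable

# A fetcher takes (base_url, path) and returns (HTTP status, decoded JSON body).
Fetch = Callable[[str, str], tuple[int, dict]]


def fetch_json(base_url: str, path: str) -> tuple[int, dict]:
    """GET base_url + path using only the standard library.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    failing status surfaces as an exception rather than a return value.
    """
    with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=5) as resp:
        return resp.status, json.loads(resp.read().decode("utf-8"))


def check_status_contract(base_url: str, fetch: Fetch = fetch_json) -> None:
    """Parity check against the (hypothetical) /v1/status endpoint.

    Nothing here imports or inspects the service's own code, so the same
    function can be pointed at the Tier 2 or the Tier 1 deployment.
    """
    status, body = fetch(base_url, "/v1/status")
    assert status == 200, f"unexpected status {status}"
    assert body.get("state") in {"ok", "degraded"}, f"unexpected body {body}"
```

A contract-test CI job could call `check_status_contract` once per implementation, passing each deployment's base URL. The `fetch` parameter exists only so the check can be exercised without a live server.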

Parallel implementation model

Tier 1 rewrites ship as parallel implementations behind the same API contract, not in-place rewrites. The pattern:

  1. Tier 2 service continues to serve production.
  2. Tier 1 implementation is developed and deployed to staging under the same URL prefix (e.g., behind a feature flag or a header-based router).
  3. Behavioral parity tests pass on Tier 1 staging.
  4. Traffic is shifted to Tier 1 via feature flag (canary → 100 %).
  5. Tier 2 service is decommissioned after a soak period (minimum 14 days at 100 % traffic).

Velvet's transition model is an explicit example of this pattern: parallel Rust implementation behind the same rotation API contract, not an in-place rewrite of the Python service.
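
Step 4 of the pattern (canary → 100 %) needs a stable way to decide which requests go to the Tier 1 implementation. One common approach, shown here as a sketch and not as the mechanism Raxx uses, is to hash a request key into a fixed bucket and compare it with the current canary percentage. `routes_to_tier1` and the choice of key are assumptions for illustration.

```python
import hashlib


def routes_to_tier1(request_key: str, canary_percent: int) -> bool:
    """Deterministically send roughly canary_percent of keys to Tier 1.

    request_key is a hypothetical stable identifier (for example an account
    or session ID), so one caller always lands on the same implementation
    at a given percentage.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in [0, 99]
    return bucket < canary_percent
```

Because the bucket depends only on the key, raising the percentage adds keys to Tier 1 without moving any that were already routed there, which keeps behavior consistent for a given caller during the ramp.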


5. Cross-Language Testing Posture {#cross-language-testing}

The behavioral parity gate (§4) is the primary cross-language testing mechanism. Additional requirements:


6. Rewrite Delegation {#rewrite-delegation}

Per operator decision 2026-05-10 UTC: Tier 1 rewrites are likely delegated to a separate agent or model purpose-built for the target language (e.g., a C++ specialist agent for Queue). The dispatching agent does not write the Tier 1 service; it files the ADR, scopes the sub-cards, and hands off.

Delegation constraints:


7. Override Log {#override-log}

Operator threshold overrides are recorded here as they are applied.

| Date | Service | Metric | Default threshold | Override value | Rationale |
|---|---|---|---|---|---|
| No overrides applied | | | | | |

8. Cross-References {#cross-references}