New · Tier T0 cache · 18× cheaper repeat queries

Stop overpaying
for every LLM call.

Iudex Route scores each request, routes it to the cheapest model that can answer it, and ships per-query cost, latency, and tier telemetry to one dashboard. One OpenAI-compatible endpoint.

  • Average 47.2% spend reduction
  • Sub-30ms routing decision
  • 5 tiers — Cache → Opus 4.7
  • OpenAI, Anthropic, Google, Meta
Get $5 free credit · Read docs · No card · Cancel anytime
1,924 teams routing through Iudex Route this week
POST api.acbm.ai/v1/chat/completions
$ curl https://api.acbm.ai/v1/chat/completions \
    -H "Authorization: Bearer acbm_••••" \
    -H "Content-Type: application/json" \
    -d '{ "model": "auto", "messages": [
          { "role": "user", "content": "Format this address into JSON" }
        ]}'
200 OK · 184ms · routing decision · 28ms
T1 · gpt-4.1-nano · openai · $0.00018
difficulty 0.12
  • 47.2% avg cost reduction (observed across 1,924 teams)
  • 28ms routing decision (p95, ahead of upstream call)
  • 5 tiers (cache → Opus 4.7 / Deep Think)
  • 1 OpenAI-compatible endpoint (drop-in for any client)
Routes traffic to
OpenAI · Anthropic · Google AI · Meta · Mistral · Cohere · xAI · Together · DeepSeek · Groq
The hot path

From token to tier in 28 milliseconds.

Five-stage pipeline. No queue, no batch. Every request flows through the same path so latency stays predictable — and so does your bill.

01 · Score

Hybrid difficulty estimator runs in 8–24ms — token shape, embedding cluster, domain hint, and historical priors.

02 · Budget

Allocator checks remaining monthly quota, request budget, and per-route policy — picks the floor tier safely.

03 · Route

One of five tiers fires against the cheapest provider that owns that capability band right now.

04 · Verify

Optional verifier cascade — schema, contradiction, judge — escalates only when the answer fails its check.
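
As a rough illustration of "escalate only when the answer fails its check", here is a minimal sketch of a cascade that uses only a schema check. The tier functions, `passes_schema`, and `answer_with_escalation` are hypothetical names invented for this example; the page does not describe how Iudex Route implements its verifiers, and the contradiction and judge checks are not modeled.

```python
import json
from typing import Callable, List, Set, Tuple

# Hypothetical stand-in: a "tier" is any function from prompt text to raw answer text.
Tier = Callable[[str], str]


def passes_schema(raw: str, required_keys: Set[str]) -> bool:
    """Schema check: the answer must be a JSON object holding every required key."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and required_keys.issubset(parsed)


def answer_with_escalation(
    prompt: str, tiers: List[Tier], required_keys: Set[str]
) -> Tuple[int, str]:
    """Try tiers cheapest-first and move up only when the check fails.

    Returns (index of the tier that answered, raw answer). If no tier passes,
    the last tier's answer is returned as-is (an assumption of this sketch).
    """
    last_answer = ""
    for index, tier in enumerate(tiers):
        last_answer = tier(prompt)
        if passes_schema(last_answer, required_keys):
            return index, last_answer
    return len(tiers) - 1, last_answer
```

With a cheap tier that returns free text and a stronger tier that returns valid JSON, the call stops at the second tier; when the cheap tier already passes, the stronger tier is never invoked.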

05 · Log

Per-query cost, latency, tier, country, and difficulty land in your dashboard. No external observability hookup.

What you get

A router, an observability stack, and a budget guardrail — in one wire.

We do one job — picking the right model — and the dashboard exists so you can audit every pick.

Routing

Difficulty-aware tier selection

Each request is scored 0–1 and assigned a tier. Pin a tier, calibrate per route, or let auto handle it.

prompt · tier · cost
  • Translate to Polish · T1 · $0.00021
  • Pick a SKU from this image · T2 · $0.00187
  • Re-derive the SQL plan · T3 · $0.00412
  • Plan a 7-step agent run · T4 · $0.02140
  • "ok" · T0 · $0.00000
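
To make the 0–1 score-to-tier mapping concrete, here is a minimal sketch. The cut-offs in `TIER_BOUNDS` and the `pick_tier` function are assumptions made up for illustration; the page does not publish the real thresholds. The only data point taken from the page is the sample request, where difficulty 0.12 landed on T1.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical upper bounds on the difficulty score for T1, T2, and T3.
# Anything at or above the last bound falls through to T4.
TIER_BOUNDS = [0.25, 0.50, 0.80]


def pick_tier(
    difficulty: float, cache_hit: bool = False, pinned: Optional[str] = None
) -> str:
    """Map a 0-1 difficulty score to a tier label.

    A pinned tier wins outright, then a cache hit (T0), then the score.
    """
    if pinned is not None:
        return pinned
    if cache_hit:
        return "T0"
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    return f"T{1 + bisect_right(TIER_BOUNDS, difficulty)}"


print(pick_tier(0.12))                  # low score, cheapest model tier
print(pick_tier(0.90))                  # high score, top tier
print(pick_tier(0.90, cache_hit=True))  # repeat query served from cache
```

Per-route calibration would then amount to supplying a different bounds list for each route, which is one plausible reading of "calibrate per route".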
Drop-in

One-line integration

Speaks the OpenAI chat-completions protocol. Anything that talks to GPT, talks to Iudex Route.
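
For a sense of what the integration looks like without any SDK, here is a minimal sketch using only the Python standard library. The host and path come from the sample request on this page; the `https` scheme, the `build_chat_request` helper, and the placeholder key are assumptions. The request is built but not sent, so the snippet runs offline.

```python
import json
import urllib.request

# Host and path taken from the sample request on this page; https is assumed.
BASE_URL = "https://api.acbm.ai/v1"


def build_chat_request(
    api_key: str, user_text: str, model: str = "auto"
) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions POST aimed at the router."""
    body = {
        "model": model,  # "auto" lets the router pick the tier
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key for illustration only.
request = build_chat_request("acbm_your_key_here", "Format this address into JSON")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

With an existing OpenAI-compatible client the equivalent change should be smaller still: point the client's base URL at the router and leave the rest of the code alone.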

Integrate from: Python · Node · Go · curl
Guardrail

Monthly budget caps + spike alerts

The router self-throttles toward cheaper tiers as you approach the cap. No 4 a.m. surprises.

Pro plan
$68.00 / $200.00
Forecast hits cap on day 23
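
One way to picture "self-throttles toward cheaper tiers" is a ceiling on the allowed tier that drops as spend approaches the cap. The sketch below is illustrative only: the thresholds in `THROTTLE_STEPS`, the behavior at or above the cap, and both function names are assumptions, not Iudex Route's documented policy.

```python
TIERS = ["T0", "T1", "T2", "T3", "T4"]  # cheapest to most expensive

# Hypothetical policy: (fraction of monthly cap used, highest tier still allowed).
THROTTLE_STEPS = [(0.00, "T4"), (0.80, "T3"), (0.90, "T2"), (1.00, "T1")]


def tier_ceiling(spent: float, cap: float) -> str:
    """Return the most expensive tier allowed at the current spend level."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    used = spent / cap
    ceiling = THROTTLE_STEPS[0][1]
    for threshold, tier in THROTTLE_STEPS:
        if used >= threshold:
            ceiling = tier
    return ceiling


def apply_budget(requested: str, spent: float, cap: float) -> str:
    """Downgrade the requested tier if it exceeds the budget ceiling."""
    ceiling = tier_ceiling(spent, cap)
    return min(requested, ceiling, key=TIERS.index)


# Using the figures shown above: $68.00 spent against a $200.00 cap.
print(apply_budget("T4", spent=68.00, cap=200.00))   # still well under the cap
print(apply_budget("T4", spent=185.00, cap=200.00))  # close to the cap, downgraded
```

Requests that already target a cheap tier pass through unchanged; only picks above the ceiling are pulled down.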
Telemetry

Per-country breakdown

Origin country is logged on every query — surfaced as latency, cost, and tier maps.

  • US · 42.0%
  • DE · 23.0%
  • IN · 14.0%
  • BR · 11.0%
  • JP · 6.0%
  • NG · 4.0%
Per-tier

Where every dollar lands

Audit any window: spend share, avg cost, queue depth, and the exact model that fired.

tier · share · avg cost · models
  • T0 · 8% · $0.00 · cache
  • T1 · 41% · $0.02 · haiku, gemini-flash-lite
  • T2 · 27% · $0.18 · gpt-4.1-mini, gemini-flash
  • T3 · 18% · $0.91 · sonnet, gpt-5, gemini-pro
  • T4 · 6% · $4.17 · opus, deep-think, gpt-5-pro
Row-level isolated (RLS) · Export CSV / Parquet on Pro
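
If you export the query log, a per-tier audit like the one above can be reproduced offline. The sketch below assumes each exported row carries a `tier` and a `cost_usd` field; those field names, the `summarize_by_tier` helper, and the sample rows are made up for illustration and are not taken from the actual export schema.

```python
from collections import defaultdict
from typing import Dict, Iterable


def summarize_by_tier(records: Iterable[dict]) -> Dict[str, dict]:
    """Aggregate query-log rows into per-tier query count, avg cost, and spend share."""
    totals = defaultdict(lambda: {"queries": 0, "cost": 0.0})
    for row in records:
        bucket = totals[row["tier"]]
        bucket["queries"] += 1
        bucket["cost"] += row["cost_usd"]

    total_cost = sum(bucket["cost"] for bucket in totals.values())
    summary = {}
    for tier in sorted(totals):
        bucket = totals[tier]
        summary[tier] = {
            "queries": bucket["queries"],
            "avg_cost": bucket["cost"] / bucket["queries"],
            # Guard against a window where every query was a free cache hit.
            "spend_share": bucket["cost"] / total_cost if total_cost else 0.0,
        }
    return summary


# Made-up sample rows, not real telemetry.
sample = [
    {"tier": "T0", "cost_usd": 0.00},
    {"tier": "T1", "cost_usd": 0.02},
    {"tier": "T1", "cost_usd": 0.02},
    {"tier": "T3", "cost_usd": 0.96},
]
for tier, stats in summarize_by_tier(sample).items():
    print(tier, stats)
```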
The ladder

Five tiers. Each gets the queries that actually deserve it.

2 + 2 never needs an Opus chain. A novel combinatorics proof shouldn't be answered by a 3B flash model. Iudex Route scores each request 0–1 and picks the floor that still answers it.

tier · difficulty · models · typical queries · cost
  • T0 · Trivial repeats · Cache · exact-match, templated responses · ~$0
  • T1 · Easy · Haiku 4.5, Gemini Flash-Lite, GPT-4.1 nano · simple factual, classification, formatting · $0.01–0.05
  • T2 · Moderate · GPT-4.1 mini, Gemini 2.5 Flash · multi-step reasoning, synthesis · $0.05–0.30
  • T3 · Hard · Sonnet 4.6, GPT-5, Gemini 2.5 Pro · complex reasoning, expert-level · $0.30–2.00
  • T4 · Extreme · Opus 4.7, GPT-5 Pro, Gemini Deep Think · novel proofs, research, agentic · $2.00–15.00

Every pick is logged with its tier, difficulty score, latency, and the actual upstream cost. Drill into any query from the dashboard.
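
The T0 tier in the ladder above is described as an exact-match cache. A minimal sketch of that idea follows, assuming the cache key is a hash of the model name plus the full message list. The `ExactMatchCache` class and its keying scheme are hypothetical; the page does not say how Iudex Route keys or expires cached answers.

```python
import hashlib
import json
from typing import Dict, List, Optional


class ExactMatchCache:
    """In-memory exact-match cache keyed by a hash of the request."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    @staticmethod
    def key_for(model: str, messages: List[dict]) -> str:
        # Canonical JSON (sorted keys, fixed separators) so that dict key
        # order in the request does not change the cache key.
        canonical = json.dumps(
            {"model": model, "messages": messages},
            sort_keys=True,
            separators=(",", ":"),
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, model: str, messages: List[dict]) -> Optional[str]:
        return self._store.get(self.key_for(model, messages))

    def put(self, model: str, messages: List[dict], response: str) -> None:
        self._store[self.key_for(model, messages)] = response


cache = ExactMatchCache()
question = [{"role": "user", "content": "ok"}]
print(cache.get("auto", question))   # first sighting: nothing cached yet
cache.put("auto", question, "Acknowledged.")
print(cache.get("auto", question))   # identical repeat: served from cache
```

Because the match is exact, any change to the prompt text produces a different key and falls through to the model tiers.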

Where we sit

Routers move bytes. Gateways count them. Iudex Route picks the model.

Most tools in the LLM-ops aisle either let you call any model, or tell you what calling cost. Iudex Route does the third thing.

Capability · Iudex Route · OpenRouter · Helicone · Portkey
Difficulty-aware tier routing
OpenAI-compatible drop-in
Per-query cost telemetry
Budget cap + auto-downgrade
Verifier cascade
Per-region (country) breakdown
Free tier (no card) · $5/mo · credits · 10k req · 10k logs
Paid plan entry · $29/mo · PAYG · $79/mo · $49/mo

Source: vendor docs and 2026 LLM-gateway roundups (Braintrust, Inworld, EdenAI). Pricing accurate as of May 2026; check vendor sites for updates.

"Iudex Route cut our reasoning-model spend by 47% without a measurable accuracy drop. We were paying for Opus 4.7 on queries a flash model could answer. Now the router handles that for us, and we can actually see where every dollar went."

47.2% avg savings

Ship cheaper inference this afternoon.

The free tier covers most prototype work. Change one base URL, keep your existing client, watch the spend chart move.