Field notes on LLM cost and optimization
Practical, measured writing on model routing, prompt caching, output control, and running an LLM gateway in production. Backed by our own benchmarks, not borrowed numbers.
LLM cost calculator
Enter your workload. We estimate your current monthly API spend, then apply the optimization levers from the guide and show what each one saves.
We apply typical results in the background: about half of input tokens are reused context that gets cached, output is trimmed by roughly 30 percent, and 60 percent of traffic is routed to a cheaper model.
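The arithmetic behind an estimate like this can be sketched in a few lines. This is a minimal sketch, not the calculator's actual implementation. The three lever defaults mirror the typical results above. The cache-read price multiplier (0.1) and the cheaper model's price multiplier (0.2) are hypothetical placeholders, because real discounts vary by provider and model. The sketch also assumes the levers act independently and that routing scales input and output prices by the same factor.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    requests_per_month: int
    input_tokens_per_request: int
    output_tokens_per_request: int
    input_price_per_mtok: float   # USD per million input tokens
    output_price_per_mtok: float  # USD per million output tokens


# Defaults taken from the typical results described above.
CACHED_INPUT_SHARE = 0.5  # about half of input is reused, cached context
OUTPUT_TRIM = 0.30        # output trimmed by roughly 30 percent
ROUTED_SHARE = 0.60       # 60 percent of traffic goes to a cheaper model

# Hypothetical multipliers for illustration only; check your provider's pricing.
CACHE_READ_MULTIPLIER = 0.1   # cached input billed at 10% of the base input price
CHEAP_MODEL_MULTIPLIER = 0.2  # cheaper model priced at 20% of the primary model


def monthly_cost(w, cached_share=0.0, output_trim=0.0, routed_share=0.0):
    """Estimated monthly spend in USD with the given levers applied."""
    mtok_in = w.requests_per_month * w.input_tokens_per_request / 1e6
    mtok_out = (w.requests_per_month * w.output_tokens_per_request
                * (1 - output_trim) / 1e6)
    # Cached input is billed at a discount; uncached input pays full price.
    input_factor = (1 - cached_share) + cached_share * CACHE_READ_MULTIPLIER
    base = (mtok_in * w.input_price_per_mtok * input_factor
            + mtok_out * w.output_price_per_mtok)
    # Routed traffic pays the cheaper model's proportionally scaled price.
    price_factor = (1 - routed_share) + routed_share * CHEAP_MODEL_MULTIPLIER
    return base * price_factor


def savings_report(w):
    """USD saved per month by each lever on its own, and by all three combined."""
    current = monthly_cost(w)
    return {
        "current": current,
        "caching": current - monthly_cost(w, cached_share=CACHED_INPUT_SHARE),
        "output control": current - monthly_cost(w, output_trim=OUTPUT_TRIM),
        "routing": current - monthly_cost(w, routed_share=ROUTED_SHARE),
        "combined": current - monthly_cost(
            w, CACHED_INPUT_SHARE, OUTPUT_TRIM, ROUTED_SHARE),
    }


if __name__ == "__main__":
    # Illustrative workload: 1M requests/month, 2,000 input and 500 output tokens
    # each, at $3 / $15 per million tokens.
    w = Workload(1_000_000, 2000, 500, 3.0, 15.0)
    for lever, usd in savings_report(w).items():
        print(f"{lever:>15}: ${usd:,.0f}")
```

For that example workload, current spend is $13,500 a month: $6,000 for input and $7,500 for output. Output costs more than input even though there is a quarter as much of it. The per-lever savings do not add up to the combined figure. Each lever shrinks the base that the others act on, so the combined estimate is always smaller than the sum of the individual savings.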
Where the savings come from
LLM cost optimization: the complete guide
Every lever that reduces LLM costs, with real pricing, worked math, and measured results.
Does model routing actually save money?
How difficulty classifiers and up-routing work, what the savings really are, and where routing breaks down.
OpenAI vs Anthropic prompt caching
The mechanics and economics of both, plus the real cache-hit rates we measured in production.
How to reduce output token costs
Why output is the dominant cost, and the controls that cut it without hurting answers.
Proving LLM quality held: TOST
How to show an optimization did not degrade quality, using formal equivalence testing (two one-sided tests).
The Phantm evaluation report
A 13,491-prompt evaluation of the pipeline, with full methodology and per-stage results.