Phantm blog

Field notes on LLM cost and optimization

Practical, measured writing on model routing, prompt caching, output control, and running an LLM gateway in production. Backed by our own benchmarks, not borrowed numbers.

LLM cost calculator

Enter your workload. We estimate your current monthly API spend, then apply the optimization levers from the guide and show what each one saves.

We apply typical results in the background: about half of input is reused context that gets cached, output is trimmed roughly 30 percent, and 60 percent of traffic routes to a cheaper model. Read the methodology.
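As a rough illustration of how those three levers combine, here is a minimal Python sketch. It is not the calculator's actual code. The `Prices` class, the `estimate` function, the 90 percent discount on cached input, the order in which the levers are applied, and every price in the usage note are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Hypothetical USD prices per million tokens (placeholders, not list prices)."""
    input_per_m: float
    output_per_m: float


def estimate(input_m, output_m, base, cheap,
             cached_share=0.5,       # from the text: about half of input is cached
             output_trim=0.30,       # from the text: output trimmed roughly 30 percent
             routed_share=0.60,      # from the text: 60 percent routes to a cheaper model
             cached_discount=0.90):  # ASSUMPTION: cached input is billed at 10% of normal
    """Estimate monthly spend before and after the three levers.

    input_m and output_m are monthly token volumes in millions.
    Levers are applied in a fixed order (caching, output, routing), so each
    lever's saving is measured against the cost left by the previous one.
    """
    def cost(prices, in_factor, out_factor):
        return (input_m * prices.input_per_m * in_factor
                + output_m * prices.output_per_m * out_factor)

    in_factor = 1 - cached_share * cached_discount  # blended input price multiplier
    out_factor = 1 - output_trim                    # share of output tokens kept

    current = cost(base, 1.0, 1.0)
    after_caching = cost(base, in_factor, 1.0)
    after_output = cost(base, in_factor, out_factor)
    # Routed traffic pays the cheaper model's prices; the rest stays on the base model.
    after_routing = ((1 - routed_share) * after_output
                     + routed_share * cost(cheap, in_factor, out_factor))

    return {
        "current": current,
        "with_levers": after_routing,
        "prompt_caching": current - after_caching,
        "output_control": after_caching - after_output,
        "model_routing": after_output - after_routing,
        "monthly_savings": current - after_routing,
        "yearly_savings": 12 * (current - after_routing),
    }


if __name__ == "__main__":
    result = estimate(100, 20, Prices(2.0, 10.0), Prices(0.5, 2.0))
    for name, value in result.items():
        print(f"{name}: ${value:,.2f}")
```

With the placeholder inputs above (100M input tokens, 20M output tokens, a base model at $2 and $10 per million, a cheaper model at $0.50 and $2), the sketch goes from $400.00 to about $133.30 per month. Because each lever is measured against the cost left by the previous one, a different ordering would change the per-lever split but not the total.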

Current spend: $0
With Phantm: $0
Estimated monthly savings: $0 (about $0 per year)

Where the savings come from

Prompt caching: $0
Output control: $0
Model routing: $0
A heuristic estimate based on June 2026 list prices, not a quote. Want the real number on your own traffic? Start a free pilot →

Guide

LLM cost optimization: the complete guide

Every lever that reduces LLM costs, with real pricing, worked math, and measured results.

June 25, 2026 · 14 min

Model routing

Does model routing actually save money?

How difficulty classifiers and up-routing work, what the savings really are, and where routing breaks down.

Coming soon

Caching

OpenAI vs Anthropic prompt caching

The mechanics and economics of both, plus the real cache-hit rates we measured in production.

Coming soon

Output

How to reduce output token costs

Why output is the dominant cost, and the controls that cut it without hurting answers.

Coming soon

Methodology

Proving LLM quality held: TOST

How to show an optimization did not degrade quality, using formal equivalence testing.

Coming soon

Research

The Phantm evaluation report

A 13,491-prompt evaluation of the pipeline, with full methodology and per-stage results.

May 2026 · Report