Phantm blog

Field notes on LLM cost and optimization

Practical, measured writing on model routing, prompt caching, output control, and running an LLM gateway in production. Backed by our own benchmarks, not borrowed numbers.

Cornerstone guide

Start here

LLM cost optimization: how to cut LLM and AI API spend

The complete guide to reducing LLM costs: the six levers that move an API bill, real provider pricing, worked math, and the trade-offs, built on a 13,491-request evaluation.

Phantm · June 25, 2026 · 14 min read

Read the guide →

47.1%

cost reduction, measured across 13,491 requests

Free tool

LLM cost calculator

Enter your workload. We estimate your current monthly API spend, then apply the optimization levers from the guide and show what each one saves.

Requests per month

Model you use today

Input tokens / request

Output tokens / request

We apply typical results in the background: about half of input is reused context that gets cached, output is trimmed by roughly 30 percent, and 60 percent of traffic routes to a cheaper model. Read the methodology.

Current spend

With Phantm

Estimated monthly savings

about $0 per year

Where the savings come from

Prompt caching: $0

Output control: $0

Model routing: $0

A heuristic estimate on June 2026 list prices, not a quote. Want the real number on your own traffic? Start a free pilot →
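The background assumptions the calculator describes (about half of input cached, output trimmed by roughly 30 percent, 60 percent of traffic routed to a cheaper model) can be sketched in a few lines of Python. This is a rough illustration, not the calculator's actual code: the `Workload` and `Price` types, the function names, the choice to apply the levers in sequence (caching, then output control, then routing), and every price in the usage example are assumptions made up for this sketch, not real list prices.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    requests_per_month: int
    input_tokens: int   # per request
    output_tokens: int  # per request


@dataclass
class Price:
    # USD per million tokens. All values used below are hypothetical.
    input_per_m: float
    output_per_m: float
    cached_input_per_m: float


def monthly_cost(w: Workload, p: Price,
                 cached_share: float = 0.0,
                 output_trim: float = 0.0) -> float:
    """Monthly spend in USD for one model under the given lever settings."""
    # Blend the normal and cached input rates by the share of input that is cached.
    input_rate = (1 - cached_share) * p.input_per_m + cached_share * p.cached_input_per_m
    input_cost = w.input_tokens * input_rate
    output_cost = w.output_tokens * (1 - output_trim) * p.output_per_m
    return w.requests_per_month * (input_cost + output_cost) / 1_000_000


def estimate(w: Workload, current: Price, cheaper: Price,
             cached_share: float = 0.5,
             output_trim: float = 0.3,
             routed_share: float = 0.6) -> dict:
    """Apply the three levers one after another and attribute savings to each."""
    baseline = monthly_cost(w, current)
    after_cache = monthly_cost(w, current, cached_share)
    after_output = monthly_cost(w, current, cached_share, output_trim)
    # Assumption: routed traffic keeps the same caching and output settings.
    routed = monthly_cost(w, cheaper, cached_share, output_trim)
    optimized = (1 - routed_share) * after_output + routed_share * routed
    return {
        "current": baseline,
        "optimized": optimized,
        "prompt_caching": baseline - after_cache,
        "output_control": after_cache - after_output,
        "model_routing": after_output - optimized,
    }


# Usage with made-up prices (not any provider's list prices).
workload = Workload(requests_per_month=1_000_000, input_tokens=2_000, output_tokens=500)
big_model = Price(input_per_m=10.0, output_per_m=30.0, cached_input_per_m=1.0)
small_model = Price(input_per_m=1.0, output_per_m=3.0, cached_input_per_m=0.1)
result = estimate(workload, big_model, small_model)
for name, value in result.items():
    print(f"{name}: ${value:,.2f}")
```

Because the levers are applied in sequence, the three per-lever savings add up to the difference between current and optimized spend. Changing the order would shift how much each lever is credited with, but not the total.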

Latest

Guide

LLM cost optimization: the complete guide

Every lever that reduces LLM costs, with real pricing, worked math, and measured results.

June 25, 2026 · 14 min

Model routing

Does model routing actually save money?

How difficulty classifiers and up-routing work, what the savings really are, and where routing breaks down.

Coming soon

Caching

OpenAI vs Anthropic prompt caching

The mechanics and economics of both, plus the real cache-hit rates we measured in production.

Coming soon

Output

How to reduce output token costs

Why output is the dominant cost, and the controls that cut it without hurting answers.

Coming soon

Methodology

Proving LLM quality held: TOST

How to show an optimization did not degrade quality, using formal equivalence testing.

Coming soon

Research

The Phantm evaluation report

A 13,491-prompt evaluation of the pipeline, with full methodology and per-stage results.

May 2026 · Report