Skip to main content

Prompt Caching ROI Calculator

Find the breakeven point where prompt caching starts saving you money, per provider.

100% client-side⛁ prices verified 2026-06-11⌁ zero network calls
tokens
60
8h
$308.88

saved per month on Claude Sonnet 4.5$41.79 cached vs $350.67 uncached, on the 8k-token prefix alone. Caching pays for itself after 2 calls.

Monthly bill, write vs read

Cache writes / mo
$7.31
Cache reads / mo
$34.48
Calls / month
14.6k
Call interval
60s · within TTL ✓
18
models priced, 4 vendors
2026-06-11
prices verified against vendor pages
90d
price staleness tripwire in CI
0
network requests per keystroke

How it works

Prompt caching is the highest-leverage cost optimization for any workload that re-sends a stable prefix — a system prompt, tool definitions, project instructions. This calculator models Anthropic's explicit caching scheme: you pay a 25% premium to write a prefix into the cache, then 90% less every time you read it back. Enter your prefix size and call rate and the monthly savings, breakeven point and write-versus-read split compute instantly, entirely in your browser.

The math: an uncached month costs calls × prefix_tokens × input_rate ÷ 1M. The cached month replaces most of those with reads at the cache-read rate, plus periodic writes at 1.25× the input rate. We assume one cache write per active hour — conservative, since the TTL refreshes on every read, but deploys and restarts invalidate caches in practice. Breakeven uses the exact write-premium-versus-read-saving formula: on Sonnet an 8k-token prefix pays for its write after the second call.

The trap this tool exists to catch: the 5-minute TTL. If your calls arrive further apart than the TTL, the cache has always expired before the next call — every single call pays the write premium, and caching makes your bill higher, not lower. A monitoring job polling every 10 minutes is the classic case. The calculator switches to an explicit warning state when your call interval exceeds the TTL, because this failure mode is silent on your invoice: nothing errors, you just quietly pay 25% extra on every prefix.

What this deliberately leaves out: OpenAI's automatic caching (no write premium, so the ROI question barely exists), the 1-hour TTL tier, and multi-breakpoint cache layouts where you cache tools and context separately. Real hit rates also depend on request routing — concurrent requests can race the first write. FORG measures actual cache hit rates per session from your live traffic, so you can compare the estimate here against what your agents really achieve. Prices verified 2026-06-11 against vendor pricing pages.

Frequently asked questions

What is the difference between cache write and cache read pricing?

Writing a prefix into the cache costs 25% more than a normal input token on Anthropic models — Sonnet input is $3/M, a cache write is $3.75/M. Reading it back costs a tenth of the input rate: $0.30/M. So the first call pays a premium and every subsequent hit pays 90% less, which is why caching only pays off after a couple of calls.

How do cache TTL rules work, and why do they matter so much?

Anthropic's default cache TTL is 5 minutes, refreshed on every read. If your calls arrive more than 5 minutes apart, the cache has always expired by the next call — so every call pays the 25% write premium and you save nothing. The calculator flags this state explicitly. A 1-hour TTL tier exists at a higher write price for genuinely sparse traffic.

When does prompt caching actually lose money?

Two cases: call intervals longer than the TTL (every call is a write at 1.25× the input rate), and prefixes used exactly once or twice (the write premium never amortizes). The breakeven is fast — typically 2 calls — but a cron job that fires every 10 minutes against a 5-minute TTL pays the premium forever and would be cheaper uncached.

Do OpenAI and Google price caching the same way?

No. OpenAI caches automatically with no write premium — repeated prefixes over 1,024 tokens are billed at roughly a tenth of the input rate with zero configuration. Google's Gemini offers explicit cached content with storage billed per hour. This calculator models Anthropic's explicit write/read scheme because it is the one where you can actively lose money with a wrong setup.

How big should my cached prefix be for this to matter?

It scales linearly: a 2k-token system prompt at 60 calls/hour saves a few dollars a month, while a 50k-token project context in an agentic coding tool saves hundreds. Agent tools like Claude Code resend large stable prefixes every turn, which is why their effective cache hit rates of 60–90% dominate the economics. Prices last verified 2026-06-11.

FORG tracks this automatically across every agent session — live cost attribution, budgets, and alerts.

Start tracking with FORG