Question 1

What is the difference between cache write and cache read pricing?

Accepted Answer

Writing a prefix into the cache costs 25% more than a normal input token on Anthropic models — Sonnet input is $3/M, a cache write is $3.75/M. Reading it back costs a tenth of the input rate: $0.30/M. So the first call pays a premium and every subsequent hit pays 90% less, which is why caching only pays off after a couple of calls.
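The arithmetic above can be sketched in a few lines. This is an illustrative model only: the rates are the Sonnet figures quoted in the answer, the function name `prefix_cost` is made up for this example, and it assumes every call after the first is a cache hit (no expiry).

```python
# Rates in USD per million tokens, copied from the answer above (Sonnet).
INPUT_RATE = 3.00
CACHE_WRITE_RATE = 3.75  # 1.25x the input rate
CACHE_READ_RATE = 0.30   # 0.10x the input rate


def prefix_cost(prefix_tokens: int, calls: int, cached: bool) -> float:
    """Cost in USD of sending the same prefix `calls` times.

    Assumption: with caching on, the first call is a write and every
    later call is a read (the cache never expires in between).
    """
    if calls <= 0:
        return 0.0
    millions = prefix_tokens / 1_000_000
    if not cached:
        return calls * millions * INPUT_RATE
    return millions * (CACHE_WRITE_RATE + (calls - 1) * CACHE_READ_RATE)


for calls in (1, 2, 10):
    plain = prefix_cost(10_000, calls, cached=False)
    with_cache = prefix_cost(10_000, calls, cached=True)
    print(f"{calls:>2} calls: uncached ${plain:.4f}, cached ${with_cache:.4f}")
```

With one call the cached path is the more expensive one; from the second call on it is cheaper.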

Question 2

How do cache TTL rules work, and why do they matter so much?

Accepted Answer

Anthropic's default cache TTL is 5 minutes, refreshed on every read. If your calls arrive more than 5 minutes apart, the cache has always expired by the next call, so every call pays the 25% write premium and caching costs more than sending the prefix uncached. The calculator flags this state explicitly. A 1-hour TTL tier exists at a higher write price for genuinely sparse traffic.
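A rough way to see the TTL effect is to label each call as a write or a read from its timestamp. The sketch below is a simplified model, not Anthropic's implementation: `classify_calls` is a hypothetical helper, it assumes both writes and reads restart the timer, and it treats a call arriving exactly at the TTL boundary as expired.

```python
def classify_calls(timestamps_s, ttl_s=300):
    """Label each call 'write' or 'read'.

    timestamps_s: call times in seconds, sorted ascending.
    ttl_s: cache lifetime in seconds (300 = the 5-minute default).
    Assumption: every call, hit or miss, resets the expiry timer.
    """
    labels = []
    expires_at = None  # no cache entry exists before the first call
    for t in timestamps_s:
        if expires_at is not None and t < expires_at:
            labels.append("read")   # entry still alive: cache hit
        else:
            labels.append("write")  # missing or expired: pay the write rate
        expires_at = t + ttl_s      # the entry now lives ttl_s from this call
    return labels


# Calls every 10 minutes against a 5-minute TTL: every call is a write.
print(classify_calls([0, 600, 1200]))
# Calls every 200 seconds: each read keeps the entry alive.
print(classify_calls([0, 200, 400, 600]))
```

The second pattern spans 10 minutes in total yet never expires, because each read pushes the expiry forward.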

Question 3

When does prompt caching actually lose money?

Accepted Answer

Two cases: call intervals longer than the TTL (every call is a write at 1.25× the input rate), and prefixes used only once (the write premium is never recovered). The breakeven is fast, typically 2 calls: one write plus one read costs 1.35× the input rate, against 2× for two uncached calls. But a cron job that fires every 10 minutes against a 5-minute TTL pays the premium forever and would be cheaper uncached.
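The breakeven can be checked directly by working in multiples of the input rate. A minimal sketch, assuming the 1.25× write and 0.1× read multipliers quoted in these answers; the function names are hypothetical.

```python
WRITE_MULT = 1.25  # cache write, as a multiple of the input rate
READ_MULT = 0.10   # cache read, as a multiple of the input rate


def breakeven_calls(write_mult=WRITE_MULT, read_mult=READ_MULT, max_calls=1000):
    """Smallest number of calls at which caching is strictly cheaper.

    Assumes one write followed by reads only (no expiry in between).
    Returns None if caching never wins within max_calls.
    """
    for n in range(1, max_calls + 1):
        cached = write_mult + (n - 1) * read_mult
        uncached = float(n)
        if cached < uncached:
            return n
    return None


def always_expired_overhead(write_mult=WRITE_MULT):
    """Extra cost fraction when every call is a write (cache always expired)."""
    return write_mult - 1.0


print(breakeven_calls())          # calls needed before caching wins
print(always_expired_overhead())  # fractional overhead for the cron-job case
```

Passing a larger `write_mult` shows how a pricier write tier pushes the breakeven out by a call or so; the exact multiplier for any such tier should be taken from the provider's price list.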

Question 4

Do OpenAI and Google price caching the same way?

Accepted Answer

No. OpenAI caches automatically with no write premium — repeated prefixes over 1,024 tokens are billed at roughly a tenth of the input rate with zero configuration. Google's Gemini offers explicit cached content with storage billed per hour. This calculator models Anthropic's explicit write/read scheme because it is the one where you can actively lose money with a wrong setup.

Question 5

How big should my cached prefix be for this to matter?

Accepted Answer

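It scales linearly with prefix size. At the Sonnet rates above, each cache hit saves $2.70 per million prefix tokens, so a 2k-token system prompt saves about half a cent per hit, while a 50k-token project context in an agentic coding tool saves about 13.5 cents per hit. At 60 calls/hour with every call a hit, that is roughly $0.32 versus $8.10 per hour of traffic. Agent tools like Claude Code resend large stable prefixes every turn, which is why their effective cache hit rates of 60–90% dominate the economics. Prices last verified 2026-06-11.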
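The linear scaling and the role of hit rate can be sketched as follows. The rates are the Sonnet figures from the first answer; `saving_per_hit` and `saving_fraction` are hypothetical names, and the hit-rate model assumes every miss is billed as a cache write, which is a simplification.

```python
INPUT_RATE = 3.00        # USD per million tokens (Sonnet, from the first answer)
CACHE_WRITE_RATE = 3.75
CACHE_READ_RATE = 0.30


def saving_per_hit(prefix_tokens: int) -> float:
    """USD saved on one cache hit versus sending the prefix uncached."""
    return prefix_tokens / 1_000_000 * (INPUT_RATE - CACHE_READ_RATE)


def saving_fraction(hit_rate: float) -> float:
    """Fraction of prefix cost saved at a given cache hit rate.

    Assumption: every miss is billed at the cache-write rate.
    A negative result means caching costs more than not caching.
    """
    blended = hit_rate * CACHE_READ_RATE + (1 - hit_rate) * CACHE_WRITE_RATE
    return 1 - blended / INPUT_RATE


for tokens in (2_000, 50_000):
    print(f"{tokens:>6} tokens: ${saving_per_hit(tokens):.4f} saved per hit")

for rate in (0.0, 0.6, 0.9):
    print(f"hit rate {rate:.0%}: {saving_fraction(rate):+.1%} of prefix cost")
```

A 25× larger prefix saves 25× as much per hit, and under this model a 0% hit rate reproduces the always-expired case where caching adds 25% to the bill.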

Prompt Caching ROI Calculator

