Context Window Comparison

Every major model's context window, max output, price and speed in one sortable table.

100% client-side · exact o200k tokenizer · zero uploads

Context window and spec comparison for major AI models
Model	Vendor · tier	Context window	Max output	Input $/1M tokens	Output $/1M tokens	Tok/sec
Gemini 3.5 Flash (largest)	google · frontier	1M	65.5k	$1.5	$9	160
Gemini 3.1 Pro	google · frontier	1M	65.5k	$2	$12	80
Gemini 3 Flash	google · mid	1M	65.5k	$0.5	$3	180
Gemini 3.1 Flash-Lite	google · small	1M	65.5k	$0.25	$1.5	250
Gemini 2.5 Pro	google · frontier	1M	65.5k	$1.25	$10	85
Claude Fable 5	anthropic · frontier	1M	64k	$10	$50	60
Claude Opus 4.8	anthropic · frontier	1M	64k	$5	$25	55
Claude Sonnet 4.6	anthropic · frontier	1M	64k	$3	$15	78
DeepSeek V4 Flash	deepseek · mid	1M	384k	$0.14	$0.28	70
DeepSeek V4 Pro	deepseek · frontier	1M	384k	$0.435	$0.87	55
GPT-5.5	openai · frontier	400k	128k	$5	$30	85
GPT-5.5 Pro	openai · frontier	400k	128k	$30	$180	45
GPT-5.4	openai · frontier	400k	128k	$2.5	$15	90
GPT-5.4 mini	openai · mid	400k	128k	$0.75	$4.5	150
GPT-5.4 nano	openai · small	400k	128k	$0.2	$1.25	210
GPT-5.3 Codex	openai · mid	400k	128k	$1.75	$14	95
Claude Sonnet 4.5	anthropic · frontier	200k	64k	$3	$15	78
Claude Haiku 4.5	anthropic · mid	200k	64k	$1	$5	130

Specs checked 2026-06-11 · sources: vendor docs and pricing pages. Tok/sec are median public benchmark figures — expect variance.

o200k (exact GPT tokenizer, in-browser)

≈3.6 chars/token (Claude estimate, documented)


How it works

Context windows are the most-quoted and least-understood model spec. This table puts every major model side by side — total window, maximum output per response, current pricing and streaming speed — sortable by any column, with the data verification date shown (2026-06-11).

Two distinctions trip people up. First, window versus max output: a 200k window does not mean 200k of generation. Output caps for the models in this table run from 64k to 384k, and input plus output must fit the window together. Second, advertised versus usable context: published needle-in-a-haystack benchmarks show retrieval accuracy degrading well before windows fill, especially for facts buried mid-context. A million-token window is a real capability, but treating it as a database is how agents start hallucinating file contents they read an hour ago.
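The window-versus-output rule can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's API: the function name `max_generation` and its parameters are hypothetical, and the figures are taken from the Claude Sonnet 4.5 row of the table.

```python
def max_generation(window: int, max_output: int, input_tokens: int) -> int:
    """Return the most tokens a single response can contain.

    Output is limited both by the per-response cap and by whatever
    room the input leaves in the window.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    room_left = max(window - input_tokens, 0)
    return min(max_output, room_left)


# Claude Sonnet 4.5 row: 200k window, 64k max output.
print(max_generation(200_000, 64_000, 50_000))   # 64000: the output cap binds
print(max_generation(200_000, 64_000, 180_000))  # 20000: the window binds
```

With a small prompt the output cap is the limit; once the prompt passes 136k tokens on this model, the remaining window becomes the limit instead.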

The cost dimension is on the same table because window and price interact: carrying a big context is not free, it is re-billed on every call. At the $3 per million input rate listed for Claude Sonnet, holding 150k tokens of session history costs about $0.45 of input per turn before the model writes a word; caching lowers the rate on repeated tokens, and compaction shrinks the history being re-sent. The bars next to each window size keep the scale honest: a 1M window is five times the 200k of Claude Sonnet 4.5 or Haiku 4.5, and the bar shows it.
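The per-turn figure is simple arithmetic, shown here as a short Python sketch. The helper name `input_cost_per_turn` is hypothetical, and the calculation assumes no caching discount, since cache rates are not part of this table.

```python
def input_cost_per_turn(context_tokens: int, price_per_million: float) -> float:
    """Input cost in dollars of sending `context_tokens` once, uncached."""
    return context_tokens / 1_000_000 * price_per_million


# 150k tokens of history at the $3 input rate listed for Claude Sonnet.
print(round(input_cost_per_turn(150_000, 3.0), 2))  # 0.45
```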

Speed figures are median public benchmark numbers for streaming throughput and vary with load, region and prompt shape — treat them as ratios between models rather than guarantees. For how your own sessions actually fill these windows, FORG tracks context size per turn across every real session.
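Reading the speed column as a ratio might look like the sketch below. The function `stream_seconds` is a hypothetical helper; it ignores time to first token and assumes throughput stays constant, which real traffic will not guarantee.

```python
def stream_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Rough time to stream a response at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# A 4,000-token response at the table's median speeds:
# Gemini 3 Flash (180 tok/sec) versus Claude Sonnet 4.6 (78 tok/sec).
fast = stream_seconds(4_000, 180)
slow = stream_seconds(4_000, 78)
print(round(slow / fast, 2))  # 2.31
```

The absolute seconds will drift with load and region, but the ratio between two rows is the part worth comparing.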

When two models look equivalent on this table, the tiebreakers usually live elsewhere: tokenizer efficiency (the same document can differ ten percent in token count between vocabularies), cache pricing, and how gracefully each model degrades as its window fills. The comparison here is the factual baseline; pair it with the Context Rot Simulator for the degradation story and the Model Capability Picker when the decision is about task fit rather than specs.

Frequently asked questions

Which model has the biggest context window in 2026?

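Ten models in this table tie at 1M tokens: the five Gemini entries, Claude Fable 5, Claude Opus 4.8, Claude Sonnet 4.6, and both DeepSeek V4 models. The GPT-5.x family sits at 400k, and Claude Sonnet 4.5 and Claude Haiku 4.5 at 200k. Sort the table by context window for the current ranking, and remember that usable context is smaller than advertised context (see context rot).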

What is the difference between context window and max output?

The context window is the total the model can hold: your input plus its output combined. Max output caps how much it can generate in one response, and is usually much smaller than the window (64k-384k for the models in this table). A 200k-window model with 64k max output cannot write you a 100k-token document in one call.

Does a bigger context window actually help?

Up to a point. Retrieval accuracy degrades as windows fill — models reliably use the start and end of context but lose facts from the middle ('lost in the middle'). Past roughly 50-70% fill, adding more context often hurts. Bigger windows help most for code, where structure aids retrieval.

Why does context size matter for cost?

You pay input rates on every token in context, every call. An agent that carries 150k tokens of history pays for them on each turn. Because that history also grows with every turn, the total billed input grows with the square of the number of turns, which is why long sessions cost quadratically, not linearly. Pair this table with our Session Cost Estimator to see the effect.
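The quadratic growth can be made concrete with a small Python sketch. Everything here is an illustrative assumption: the function `session_input_cost` is hypothetical, each turn is assumed to add a fixed 5,000 tokens to the history, and no caching or compaction is applied.

```python
def session_input_cost(
    turns: int, tokens_added_per_turn: int, price_per_million: float
) -> float:
    """Total uncached input cost when every turn re-sends all prior history."""
    context = 0        # tokens currently held in the session history
    billed_tokens = 0  # input tokens billed across the whole session
    for _ in range(turns):
        context += tokens_added_per_turn  # history grows linearly
        billed_tokens += context          # but the full history is billed each turn
    return billed_tokens / 1_000_000 * price_per_million


# Assumed: 5,000 new tokens per turn at a $3 per million input rate.
ten = session_input_cost(10, 5_000, 3.0)
twenty = session_input_cost(20, 5_000, 3.0)
print(round(ten, 3), round(twenty, 3))  # 0.825 3.15
```

Doubling the session from 10 to 20 turns raises the input bill by nearly four times under these assumptions, not two.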


Related tools

AI Model Pricing Comparison

Context Window Visualizer

Model Capability Picker

Streaming Latency Estimator