
Context Window Visualizer

See how your system prompt, history and files fill a 200k window — and when you overflow.

100% client-side · exact o200k tokenizer · zero uploads

51% of Claude Sonnet 4.5's 200k window used (102k tokens). 98k free. At +4k/turn you overflow in ~25 turns.

Window fill (each percentage is a share of the 102k tokens in use)

  • System prompt: 8k · 8%
  • Tool definitions: 4k · 4%
  • Conversation history: 30k · 29%
  • Files in context: 60k · 59%

Token figures are whatever you enter — measure each component with the Token Counter for exact numbers. Overflow projection assumes linear growth.

  • o200k: exact GPT tokenizer, in-browser
  • ≈3.6 chars/token: Claude estimate, documented
  • 18 models in the cost dataset
  • 0 network requests per keystroke

How it works

A context window is a hard budget, but most developers only discover how theirs is spent when a session breaks. This visualizer makes the budget explicit: list each component that ships with your calls — system prompt, tool definitions, conversation history, files — with its token size, pick a model, and see a stacked bar of the window with the exact percentage used and how many tokens remain.

The overflow projection is the part that prevents incidents. Agent sessions do not have a fixed context size; history grows every turn as new messages and tool results accumulate. Enter your average growth per turn and the tool projects how many turns remain before you hit the wall. If that number is smaller than a typical working session, you need a compaction strategy in place before your users run into the limit, not after the first truncated session ships a wrong answer.

The math is deliberately simple and stated plainly: components are summed, divided by the selected model's published window size, and the overflow turn is free space divided by growth per turn, assuming linear growth. Real sessions grow unevenly — a single large file read can add 30k tokens in one turn — so treat the projection as a central estimate, not a guarantee. Window sizes come from our models dataset, checked against vendor documentation on 2026-06-11.
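
The arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the visualizer's actual source: the function name `window_budget` and the dictionary keys are hypothetical, and the figures are the sample values shown at the top of this page.

```python
import math

def window_budget(components, window_tokens, growth_per_turn):
    """Sum the components, then project overflow assuming linear growth."""
    used = sum(components.values())
    free = window_tokens - used
    pct_used = 100 * used / window_tokens
    if growth_per_turn > 0:
        # Free space divided by growth per turn; clamp at 0 if already over.
        turns_left = max(free, 0) / growth_per_turn
    else:
        turns_left = math.inf  # no growth means no projected overflow
    return {"used": used, "free": free, "pct_used": pct_used, "turns_left": turns_left}

# Sample figures from this page (hypothetical key names).
components = {
    "system_prompt": 8_000,
    "tool_definitions": 4_000,
    "conversation_history": 30_000,
    "files_in_context": 60_000,
}
budget = window_budget(components, window_tokens=200_000, growth_per_turn=4_000)
print(budget)
```

With these inputs the sketch reports 102,000 tokens used (51%), 98,000 free, and 24.5 turns of headroom, which the page rounds to ~25.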

What the numbers usually teach: tool definitions cost more than people expect (a rich MCP setup easily eats 5–10k tokens before the first user message), and file contents dominate agentic coding sessions. Remember the cost dimension too — a 150k-token context re-sent for 30 turns is millions of billed input tokens per session. The Session Cost Estimator prices that loop, and FORG measures it live from your real traffic so the budget you sketch here matches what production actually does. Sketch the budget once, then re-check it whenever you add a tool, a bigger system prompt, or a new file-heavy workflow — windows fill by accretion, exactly like the prompts inside them.
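
To make the cost point concrete, here is a rough sketch of how billed input tokens add up when the whole context is re-sent every turn. The function name `billed_input_tokens` is hypothetical, and the sketch assumes every input token is billed in full on every turn, with no caching discounts or compaction.

```python
def billed_input_tokens(start_tokens, growth_per_turn, turns):
    """Total input tokens billed over a session that re-sends the full context each turn."""
    # Turn t (counting from 0) sends the starting context plus t turns of growth.
    return sum(start_tokens + growth_per_turn * t for t in range(turns))

# The flat case from the paragraph above: 150k tokens re-sent for 30 turns.
flat = billed_input_tokens(150_000, 0, 30)

# A growing case using this page's sample budget: 102k start, +4k per turn, 24 turns.
growing = billed_input_tokens(102_000, 4_000, 24)

print(f"flat: {flat:,} tokens, growing: {growing:,} tokens")
```

The flat case comes to 4.5 million input tokens and the growing case to about 3.55 million, so either way a single session is billed for millions of input tokens.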

Frequently asked questions

What actually fills a context window?

Everything the model reads on each call: the system prompt, tool and function definitions, the full conversation history, file contents pulled into context, and retrieved documents. In agentic coding sessions the biggest line items are usually file contents and accumulated history — tool results from earlier turns get re-sent every single turn unless something compacts them away.

What happens when I overflow the window?

It depends on the client. Raw API calls fail with a validation error when input exceeds the window. Agent harnesses like Claude Code instead compact: they summarize or drop older history to make room, which silently loses detail. Either way, performance degrades well before the hard limit — retrieval accuracy drops as windows fill, which is exactly what our Context Rot Simulator charts.

How do I find the token size of each component?

Paste each piece (system prompt, a representative file, your tool JSON) into our Token Counter, which runs the real o200k tokenizer locally in your browser. That count is exact for GPT models; for Claude models it is an estimate based on roughly 3.6 characters per token. As rules of thumb: English prose runs about 4 characters per token, source code about 3, and JSON tool schemas are dense because every brace, quote and key tokenizes separately.
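
When a tokenizer is not at hand, the rules of thumb above can be turned into a quick estimator. This is a rough sketch only: the function name `estimate_tokens` and the `kind` labels are hypothetical, and the ratios are the approximate figures quoted in this answer, not measured values.

```python
import math

# Approximate characters per token, taken from the rules of thumb above.
CHARS_PER_TOKEN = {"prose": 4.0, "code": 3.0}

def estimate_tokens(text, kind="prose"):
    """Estimate a token count from character length; rounds up to stay conservative."""
    return math.ceil(len(text) / CHARS_PER_TOKEN[kind])

prose_estimate = estimate_tokens("a" * 400, kind="prose")
code_estimate = estimate_tokens("a" * 300, kind="code")
print(prose_estimate, code_estimate)
```

An estimate like this is good enough for sketching a budget; for numbers you intend to rely on, measure with a real tokenizer.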

What are the main compaction strategies?

Four cover most cases: summarize old turns into a short digest, truncate oversized tool results before they enter history, move reference material out of context into retrieval so only relevant chunks are loaded, and reset the session at natural task boundaries. Each trades recall for headroom; the right mix depends on whether your sessions die from history growth or from file bloat.
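
As one illustration, the second strategy (truncating oversized tool results before they enter history) might look like the sketch below. It is a simplified example under stated assumptions: the function name `truncate_tool_result` is hypothetical, the token budget is converted to characters with the rough 4-characters-per-token rule rather than a real tokenizer, and keeping the head and tail of the output is just one possible policy.

```python
def truncate_tool_result(text, max_tokens, chars_per_token=4.0):
    """Keep the head and tail of an oversized tool result and mark what was cut."""
    # Convert the token budget to a character budget (rough approximation).
    max_chars = max(2, int(max_tokens * chars_per_token))
    if len(text) <= max_chars:
        return text  # small results pass through unchanged
    head = max_chars // 2
    tail = max_chars - head
    omitted = len(text) - max_chars
    marker = f"\n[... {omitted} characters truncated ...]\n"
    return text[:head] + marker + text[-tail:]

short_result = truncate_tool_result("ok", max_tokens=10)
long_result = truncate_tool_result("x" * 1000, max_tokens=50)
print(len(long_result))
```

Keeping both ends is a common compromise because command output often puts the summary or the error at the end; the marker tells the model that detail was dropped rather than leaving a silent gap.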

Does a bigger window solve this?

Partially, and at a price. A 1M-token window delays overflow, but every input token is billed on every turn, so filling a huge window makes each call proportionally more expensive. Long-context accuracy also degrades before the limit. Budgeting the window deliberately is usually cheaper and more reliable than buying a bigger one and filling it.

FORG tracks this automatically across every agent session — live cost attribution, budgets, and alerts.
