
RAG Chunking Visualizer

Visualize chunk size and overlap on your document — chunk count, waste and query cost.

100% client-side · exact o200k tokenizer · zero uploads
Never leaves your browser
Chunk size: 120 tokens
Overlap: 20 tokens


Document split — overlap shaded brighter

token 0 to token 459
Document tokens: 459
Tokens embedded (incl. overlap): 539
Duplication waste: 17.4%
One-time embedding cost: $0.0000

Embedding cost uses an illustrative $0.02/1M-token rate — embedding models are not in our verified pricing dataset; check your provider's rate. Token positions are computed from the exact o200k count; chunk boundaries shown are positional, not semantic.

o200k: exact GPT tokenizer, in-browser
≈3.6 chars/token: Claude estimate, documented
18 models in the cost dataset
0 network requests per keystroke

How it works

Chunking is the most consequential decision in a RAG pipeline and the least examined. Chunk size determines what a retrieval hit actually contains; overlap determines whether facts that straddle boundaries survive; and together they set the embedding bill, the vector count, and the per-query prompt cost. Most teams copy a tutorial's 1,000/200 settings and never look again. This visualizer is the looking.

Paste a document — a realistic deployment runbook is prefilled — and the page tokenizes it with the exact o200k_base encoder, locally in your browser. The horizontal bar is your document laid out in token space, segmented into chunks at your chosen size, with each chunk's overlap region shaded brighter where it duplicates the tail of its neighbor. Drag the sliders and watch the trade-off move: bigger chunks mean fewer segments and less duplication; more overlap means brighter seams and a higher waste percentage.
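The segmentation described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the page's own code: the function name `chunk_windows` is hypothetical, and so is the rule that windowing stops as soon as a window reaches the end of the document. With a 120-token chunk size and 20-token overlap it yields five windows totaling 539 tokens for a 459-token document, which agrees with the figures shown above.

```python
def chunk_windows(n_tokens: int, chunk_size: int, overlap: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) token ranges for fixed-size chunking.

    Assumption: each window starts (chunk_size - overlap) tokens after the
    previous one, and windowing stops once a window reaches the last token.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    stride = chunk_size - overlap
    windows = []
    start = 0
    while start < n_tokens:
        end = min(start + chunk_size, n_tokens)
        windows.append((start, end))
        if end == n_tokens:  # document fully covered
            break
        start += stride
    return windows


for start, end in chunk_windows(459, 120, 20):
    print(f"tokens {start:>3}..{end:>3}  ({end - start} tokens)")
```

Each window after the first begins 20 tokens before the previous one ends; that shared span is the brighter seam in the bar.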

The numbers below the bar are the ones that end up on invoices. Total embedded tokens counts every chunk in full, overlap included, so overlap is paid for. A 20-token overlap on 120-token chunks re-embeds 20 tokens for every 100 new ones: roughly 20% duplication on a long document, and 17.4% on the 459-token sample, which has only four 20-token seams. That surprises people who think of overlap as free insurance. The one-time embedding cost prices that total at a flat $0.02 per million tokens, explicitly labeled illustrative: embedding models are not in our verified pricing dataset, so substitute your provider's real rate before budgeting.
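The duplication arithmetic can be checked with a short Python sketch. It assumes waste is measured as duplicated tokens divided by document tokens, which is the definition that reproduces the 17.4% shown for the 459-token sample; the function names are hypothetical.

```python
import math


def embedded_tokens(n_tokens: int, chunk_size: int, overlap: int) -> int:
    """Total tokens sent to the embedder, counting overlap twice."""
    stride = chunk_size - overlap
    # Chunks needed beyond the first; each one re-embeds `overlap` tokens.
    extra_chunks = math.ceil(max(n_tokens - chunk_size, 0) / stride)
    return n_tokens + extra_chunks * overlap


def duplication_waste(n_tokens: int, chunk_size: int, overlap: int) -> float:
    """Duplicated tokens as a fraction of document tokens (assumed definition)."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return (embedded_tokens(n_tokens, chunk_size, overlap) - n_tokens) / n_tokens


print(embedded_tokens(459, 120, 20))             # 539
print(f"{duplication_waste(459, 120, 20):.1%}")  # 17.4%
print(f"{20 / (120 - 20):.1%}")                  # 20.0%, the long-document limit
```

The long-document limit is overlap divided by stride, so the same 20-token overlap costs far less on 400-token chunks (20 / 380, about 5%) than on 120-token ones.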

Stated assumptions: chunks are fixed-size token windows (real splitters snap to sentence or paragraph boundaries, which barely changes the totals), the document is embedded once (re-index churn multiplies the cost), and query-time costs are out of scope here — retrieved chunks become input tokens, which the Embeddings Cost Calculator and Token Cost Calculator price.

A workflow that works: visualize your most representative document here, pick the smallest chunk size that keeps its sections intact in single chunks, set overlap to 10-15% of that, and then validate with real queries before scaling the index. Chunking mistakes are cheap to fix before you embed a million documents and expensive after.

Frequently asked questions

What chunk size should I use for RAG?

Most production pipelines land between 200 and 800 tokens. Small chunks retrieve precisely but lose surrounding context, so answers cite fragments that need neighbors to make sense; large chunks preserve context but dilute the embedding — one vector has to represent several topics, which hurts retrieval ranking. Start near 400, evaluate on your own queries, and adjust from evidence.

Why use overlap at all, and how much?

Overlap protects against a fact straddling a chunk boundary, where neither chunk contains the complete statement and retrieval misses it. Ten to twenty percent of chunk size is the common range. The cost is real and this tool quantifies it: overlapping tokens are embedded twice and stored twice, and the waste percentage shown is exactly that duplication.
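A toy example makes the straddling problem concrete. This sketch uses whitespace-separated words as a stand-in for tokens (an assumption made for readability; the visualizer itself counts o200k tokens), and the sentence, chunk sizes and helper names are all hypothetical.

```python
def chunk(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split tokens into fixed-size windows that share `overlap` tokens."""
    stride = size - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # last window reached the end
            break
        start += stride
    return chunks


def contains_phrase(chunk_tokens: list[str], phrase: list[str]) -> bool:
    """True if `phrase` appears contiguously inside one chunk."""
    n = len(phrase)
    return any(chunk_tokens[i:i + n] == phrase
               for i in range(len(chunk_tokens) - n + 1))


def survives(tokens: list[str], phrase: list[str], size: int, overlap: int) -> bool:
    """True if at least one chunk holds the whole phrase."""
    return any(contains_phrase(c, phrase) for c in chunk(tokens, size, overlap))


words = ("restart the worker pool only after the database "
         "migration has finished and the health check passes").split()
fact = "database migration has finished".split()

print(survives(words, fact, size=8, overlap=0))  # False: the fact is split at word 8
print(survives(words, fact, size=8, overlap=2))  # True: the middle chunk holds it whole
```

Overlap helps here because the extra window starts before the boundary; a fact much longer than the overlap can still be split, which is one reason to validate with real queries.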

How is the embedding cost calculated, and why is it labeled illustrative?

Total embedded tokens (document tokens plus overlap duplication) multiplied by a flat $0.02 per million tokens. We label it illustrative because embedding models are not part of our verified pricing dataset; the rate is in the range of current small embedding models, but you should substitute your provider's actual figure. The token counts, by contrast, are exact o200k counts.
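As a rough sketch of that formula in Python: the function name is hypothetical, the default rate is the page's illustrative $0.02 per million tokens rather than any provider's real price, and the corpus of one million documents of identical size is assumed purely to show scale.

```python
ILLUSTRATIVE_RATE_PER_MILLION = 0.02  # USD; illustrative only, substitute your provider's rate


def embedding_cost(embedded_tokens: int,
                   rate_per_million: float = ILLUSTRATIVE_RATE_PER_MILLION) -> float:
    """One-time embedding cost in USD for a given number of embedded tokens."""
    return embedded_tokens * rate_per_million / 1_000_000


single_doc = embedding_cost(539)  # the 459-token sample plus overlap
print(f"${single_doc:.4f}")       # $0.0000 at four decimal places

# Hypothetical corpus: one million documents the size of the sample.
corpus = embedding_cost(539 * 1_000_000)
print(f"${corpus:,.2f}")          # $10.78
```

A single short document rounds to zero at four decimal places, which is why the figure only becomes meaningful at corpus scale or with re-index churn.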

Are the chunk boundaries shown where a real splitter would cut?

No — they are positional. The visualizer divides the document's exact token count into fixed-size windows, which is what naive fixed-size chunking does. Production splitters usually prefer paragraph or sentence boundaries (recursive character splitting), which shifts boundaries slightly but leaves chunk count, overlap waste and cost almost identical for a given size and overlap setting.

Does chunking also affect query-time cost?

Substantially. Every retrieved chunk is pasted into the prompt as input tokens, so retrieving five 800-token chunks costs 4,000 input tokens per query before the question itself. Smaller chunks with a higher retrieval count give you finer control over that budget. Pair this tool with the Embeddings Cost Calculator to model the full pipeline including query volume.
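The per-query arithmetic is simple enough to sketch directly. The function names and the query volume below are hypothetical, and the result is an upper bound, since real chunks can be shorter than the configured chunk size.

```python
def retrieved_input_tokens(top_k: int, chunk_size: int) -> int:
    """Upper bound on prompt tokens contributed by retrieved chunks per query."""
    return top_k * chunk_size


def tokens_for_queries(top_k: int, chunk_size: int, queries: int) -> int:
    """Retrieved-context input tokens across a number of queries."""
    return retrieved_input_tokens(top_k, chunk_size) * queries


print(retrieved_input_tokens(5, 800))      # 4000, the example above
print(retrieved_input_tokens(10, 200))     # 2000, smaller chunks with more hits
print(tokens_for_queries(5, 800, 10_000))  # 40000000 over a hypothetical 10,000 queries
```

Multiplying the last figure by your model's input-token price gives the retrieval share of the query bill; that price is deliberately left out here because it is not part of this page.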

Built by FORG — AI cost observability for agentic coding. Free tools, no signup, nothing leaves your browser.
