Question 1

What chunk size should I use for RAG?

Accepted Answer

Most production pipelines land between 200 and 800 tokens. Small chunks retrieve precisely but lose surrounding context, so answers cite fragments that need their neighbors to make sense. Large chunks preserve context but dilute the embedding: one vector has to represent several topics, which hurts retrieval ranking. Start near 400, evaluate on your own queries, and adjust based on what the results show.

Question 2

Why use overlap at all, and how much?

Accepted Answer

Overlap protects against a fact straddling a chunk boundary, where neither chunk contains the complete statement and retrieval misses it. Ten to twenty percent of chunk size is the common range. The cost is real, and this tool quantifies it: overlapping tokens are embedded twice and stored twice, and the waste percentage shown is exactly that duplication.
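The duplication arithmetic can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name is hypothetical, it assumes fixed-size windows that advance by chunk size minus overlap, and it assumes the waste percentage is the share of all embedded tokens that are duplicates (the tool may use document tokens as the denominator instead).

```python
import math


def overlap_waste(doc_tokens: int, chunk_size: int, overlap: int) -> dict:
    """Count chunks and duplicated tokens for fixed-size windows with overlap.

    Hypothetical helper: assumes each window starts (chunk_size - overlap)
    tokens after the previous one.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if doc_tokens <= 0:
        return {"chunks": 0, "duplicated_tokens": 0,
                "embedded_tokens": 0, "waste_pct": 0.0}

    stride = chunk_size - overlap
    if doc_tokens <= chunk_size:
        chunks = 1
    else:
        # The first window covers chunk_size tokens; each later window
        # contributes up to `stride` new tokens.
        chunks = 1 + math.ceil((doc_tokens - chunk_size) / stride)

    # Every chunk after the first re-embeds `overlap` tokens of its predecessor.
    duplicated = (chunks - 1) * overlap
    embedded = doc_tokens + duplicated
    # Assumption: waste is expressed relative to total embedded tokens.
    waste_pct = 100.0 * duplicated / embedded
    return {"chunks": chunks, "duplicated_tokens": duplicated,
            "embedded_tokens": embedded, "waste_pct": waste_pct}
```

For a 10,000-token document with 400-token chunks and 40 tokens of overlap (ten percent), this gives 28 chunks and 1,080 duplicated tokens, so a little under ten percent of everything embedded is repeat material.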

Question 3

How is the embedding cost calculated, and why is it labeled illustrative?

Accepted Answer

The cost is total embedded tokens (document tokens plus overlap duplication) times a flat $0.02 per million tokens. We label it illustrative because embedding models are not part of our verified pricing dataset; the rate is in the range of current small embedding models, but you should substitute your provider's actual figure. The token counts, by contrast, are exact o200k counts.
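As a rough sketch of that calculation (the function and constant names are hypothetical, and the rate is the illustrative $0.02 per million tokens quoted above, not a verified price):

```python
# Illustrative flat rate from the answer above; replace with your provider's figure.
RATE_USD_PER_MILLION = 0.02


def embedding_cost_usd(doc_tokens: int, duplicated_tokens: int,
                       rate_per_million: float = RATE_USD_PER_MILLION) -> float:
    """Cost of embedding the document once, including overlap duplication."""
    total_embedded = doc_tokens + duplicated_tokens
    return total_embedded * rate_per_million / 1_000_000
```

For example, 1,000,000 document tokens plus 100,000 duplicated tokens is 1.1 million embedded tokens, which at the illustrative rate comes to about $0.022.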

Question 4

Are the chunk boundaries shown where a real splitter would cut?

Accepted Answer

No — they are positional. The visualizer divides the document's exact token count into fixed-size windows, which is what naive fixed-size chunking does. Production splitters usually prefer paragraph or sentence boundaries (recursive character splitting), which shifts boundaries slightly but leaves chunk count, overlap waste and cost almost identical for a given size and overlap setting.
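Positional chunking of this kind can be sketched in a few lines. This is an assumption about how naive fixed-size windows behave in general, not the visualizer's source; the function name and the way overlap is applied (each window starts chunk size minus overlap after the previous one) are illustrative choices.

```python
def positional_boundaries(doc_tokens: int, chunk_size: int,
                          overlap: int = 0) -> list[tuple[int, int]]:
    """Return (start, end) token offsets for fixed-size windows.

    Offsets are half-open: a chunk covers tokens start .. end - 1.
    Boundaries ignore sentence and paragraph structure entirely.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    stride = chunk_size - overlap
    bounds = []
    start = 0
    while start < doc_tokens:
        end = min(start + chunk_size, doc_tokens)
        bounds.append((start, end))
        if end == doc_tokens:
            break  # the last window reached the end of the document
        start += stride
    return bounds
```

A 1,000-token document with 400-token chunks and 40 tokens of overlap yields three windows: (0, 400), (360, 760) and (720, 1000). A boundary-aware splitter would nudge those offsets to the nearest sentence or paragraph break.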

Question 5

Does chunking also affect query-time cost?

Accepted Answer

Substantially. Every retrieved chunk is pasted into the prompt as input tokens, so retrieving five 800-token chunks costs 4,000 input tokens per query before the question itself. Smaller chunks with a higher retrieval count give you finer control over that budget. Pair this tool with the Embeddings Cost Calculator to model the full pipeline including query volume.
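The per-query budget is simple multiplication, sketched here with hypothetical names. It treats every retrieved chunk as full-size, so the result is an upper bound when some chunks are shorter.

```python
def context_tokens_per_query(chunks_retrieved: int, chunk_size: int,
                             question_tokens: int = 0) -> int:
    """Input tokens added to the prompt by retrieval, plus the question."""
    # Assumption: each retrieved chunk is a full chunk_size tokens long.
    return chunks_retrieved * chunk_size + question_tokens
```

Five 800-token chunks give 4,000 input tokens before the question. Ten 200-token chunks give 2,000, and each extra chunk moves the budget by only 200 tokens, which is the finer control described above.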

RAG Chunking Visualizer
