Question 1

What is the chunk size tradeoff in RAG?

Accepted Answer

Smaller chunks (128-256 tokens) retrieve more precisely but multiply the chunk count. That enlarges the vector index and forces more chunks into the prompt to reconstruct context. Larger chunks (1,000-2,000 tokens) produce fewer vectors, so they are cheaper to store and index, and each carries more context. The tradeoff is fuzzier retrieval and more chat-model input tokens per retrieved chunk. For prose and docs, 512 tokens with ~10% overlap is the boring, defensible default.
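The sketch below shows how chunk count grows as chunk size shrinks. It assumes each document has already been tokenized to a known length and is split with a fixed sliding window. `sliding_window_chunks` is a hypothetical helper, not part of the calculator.

```python
def sliding_window_chunks(doc_tokens: int, chunk_size: int, overlap: int) -> list[tuple[int, int]]:
    """Return (start, end) token offsets for fixed-size chunks with overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    start = 0
    while start < doc_tokens:
        end = min(start + chunk_size, doc_tokens)
        chunks.append((start, end))
        if end == doc_tokens:  # last chunk reached the end of the document
            break
        start += stride
    return chunks


# Hypothetical 1,000-token document, ~10% overlap at each chunk size.
for size in (256, 512, 1024):
    chunks = sliding_window_chunks(1000, size, size // 10)
    embedded = sum(end - start for start, end in chunks)
    print(f"chunk_size={size}: {len(chunks)} chunks, {embedded} tokens embedded")
```

For this document the counts come out to 5, 3 and 1 chunks. The embedded token totals stay near 1,000 (1,100, 1,102 and 1,000). With percentage overlap, the embedding API bill barely moves with chunk size, while the number of vectors to store and retrieve changes a lot.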

Question 2

What forces a full re-embedding of my corpus?

Accepted Answer

Two changes force a full re-embedding:
- Switching embedding models. Vectors from different models live in different spaces and cannot be compared.
- Changing chunk size or overlap. The chunks themselves change.

Content updates are cheaper: only the changed documents need re-embedding, provided you track which ones changed. Because the first two mean re-embedding everything, the calculator surfaces the full re-index cost rather than letting it hide.
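A minimal sketch of that decision rule follows. It assumes you store the indexing settings and a SHA-256 hash of each document alongside your vectors. `IndexConfig`, `content_hash` and `docs_to_embed` are hypothetical names used only for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexConfig:
    """Settings that determine what the stored vectors mean (hypothetical)."""
    model: str
    chunk_size: int
    overlap: int


def content_hash(text: str) -> str:
    """Fingerprint a document so later runs can detect edits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def docs_to_embed(
    old_config: IndexConfig,
    new_config: IndexConfig,
    old_hashes: dict[str, str],
    docs: dict[str, str],
) -> set[str]:
    """Return the IDs of documents whose vectors must be (re)computed."""
    if old_config != new_config:
        # New model or new chunking: every stored vector is stale.
        return set(docs)
    # Same config: only new or edited documents need embedding.
    return {
        doc_id
        for doc_id, text in docs.items()
        if old_hashes.get(doc_id) != content_hash(text)
    }


small = IndexConfig("text-embedding-3-small", 512, 51)
large = IndexConfig("text-embedding-3-large", 512, 51)
old_docs = {"a": "chunking guide", "b": "pricing notes"}
old_hashes = {doc_id: content_hash(text) for doc_id, text in old_docs.items()}

new_docs = {"a": "chunking guide", "b": "pricing notes v2"}
print(sorted(docs_to_embed(small, small, old_hashes, new_docs)))  # ['b']
print(sorted(docs_to_embed(small, large, old_hashes, new_docs)))  # ['a', 'b']
```

This sketch does not handle deleted documents. Their vectors need to be removed from the index, not re-embedded.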

Question 3

Which embedding model is cheapest, and is cheapest wise?

Accepted Answer

OpenAI's text-embedding-3-small and Voyage 3.5 Lite, both at $0.02 per million tokens, are the budget anchors: an entire 10k-document corpus indexes for pennies. Larger models such as text-embedding-3-large ($0.13/M) buy measurably better retrieval on harder corpora. Indexing is usually a one-time cost dwarfed by ongoing chat spend, so choosing on quality rather than price is normally the right call.
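The "pennies" claim can be checked with the arithmetic below. It uses the per-million prices quoted above and assumes a hypothetical corpus of 10,000 documents averaging 1,000 tokens, plus ~10% overlap overhead. The model names are plain labels here, not API identifiers, and prices may have changed since this was written.

```python
# Per-million-token prices as quoted in the answer; verify against current
# provider pricing before relying on them.
PRICE_PER_MILLION_USD = {
    "text-embedding-3-small": 0.02,
    "Voyage 3.5 Lite": 0.02,
    "text-embedding-3-large": 0.13,
}


def embedding_cost_usd(total_tokens: float, price_per_million: float) -> float:
    """Cost of embedding `total_tokens` tokens at a per-million-token price."""
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical corpus: 10,000 documents averaging 1,000 tokens each,
# plus ~10% extra tokens from chunk overlap.
corpus_tokens = 10_000 * 1_000
embedded_tokens = corpus_tokens * 1.10

for model, price in PRICE_PER_MILLION_USD.items():
    print(f"{model}: ${embedding_cost_usd(embedded_tokens, price):.2f}")
```

That works out to $0.22 for either $0.02/M model and $1.43 for text-embedding-3-large. Even the larger model's full re-index is a small line item next to recurring chat-model spend. Longer documents scale these figures linearly.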

Question 4

What about vector storage and search costs?

Accepted Answer

This calculator prices embedding API calls only. Storage adds a second bill. A managed vector database charges for stored vectors and for queries, and the storage side scales roughly with chunk count times embedding dimensions. Self-hosted pgvector or a local index makes storage nearly free at small scale. Chunk count, shown in the results, is the number that drives both bills, so over-chunking hurts twice.
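A rough sketch of the storage side follows. It assumes vectors are stored as float32 (4 bytes per dimension) and counts only the raw vector payload, ignoring index structures, metadata and replication. The 30,000-chunk corpus is hypothetical. The dimension figures are the default output sizes of OpenAI's text-embedding-3 models.

```python
# Default output dimensions for the OpenAI models named above.
DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def raw_vector_bytes(chunk_count: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw payload of the stored vectors, assuming float32 (4-byte) values.

    Ignores index structures, metadata, and replication, which real
    vector databases add on top.
    """
    return chunk_count * dimensions * bytes_per_value


# Hypothetical corpus: 10,000 documents at ~3 chunks each.
chunks = 30_000
for model, dims in DIMENSIONS.items():
    megabytes = raw_vector_bytes(chunks, dims) / 1_000_000
    print(f"{model}: {megabytes:.1f} MB")
```

This prints about 184.3 MB and 368.6 MB. Because the result is a straight product, halving chunk size roughly doubles chunk count and doubles the stored payload. That is the storage half of "over-chunking hurts twice".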

Embeddings Cost Calculator
