Embeddings Cost Calculator
Price your RAG pipeline: document indexing cost plus monthly query embedding spend.
one-time to index 10k docs (40k chunks) on OpenAI text-embedding-3-small, plus $0.19/month to embed 5k queries/day.
- Chunks per doc
- 4
- Total chunks
- 40k
- Tokens embedded (index)
- 20.5M
- Query tokens / month
- 9.7M
- Full re-index cost
- $0.41
- First-year total
- $2.75
Re-indexing note: switching embedding models or changing chunk size means re-embedding everything — budget the $0.41 index cost again, not just the delta. Assumes 64-token average queries.
How it works
Every RAG pipeline has two embedding bills: a one-time cost to index the corpus and a recurring cost to embed queries. This calculator prices both from first principles — document count, average length, chunk size, overlap and your embedding model — so you can budget a retrieval feature before writing any code. Everything computes in your browser as you type.
The chunk math is where intuition fails. Overlap means each chunk advances by chunk_size × (1 − overlap) tokens, so chunks per document is the document length divided by that effectivestride — at 512 tokens and 10% overlap, a 1,500-token document needs 4 chunks, not 3, and you embed slightly more tokens than your corpus contains. Index cost is total chunks × chunk size × the model's per-million rate. Query cost assumes 64-token average queries across a 30.44-day month; real queries in chat-style RAG often run longer once you embed conversational context, so treat it as a floor.
The headline result for most teams is how cheap indexing is: ten thousand documents at $0.02 per million tokens costs well under a dollar. The real money hides downstream — in re-indexing and in chat tokens. Changing embedding model or chunking strategy re-embeds the whole corpus, which is fine at 10k documents and a budget line at 10 million. And every retrieved chunk becomes input tokens to your chat model at rates 100×+ the embedding price, so a retrieval setup that stuffs five chunks where two would do costs far more in chat spend than the entire embedding pipeline.
Embedding prices here are maintained alongside our chat-model dataset (last verified 2026-06-11) and shown per model in the selector. What this tool cannot see is your production behavior: actual query volume, retrieval counts, and the chat-side token cost of the chunks you inject. FORG measures that end-to-end spend per session in real traffic, which is how you find out whether your RAG feature costs what this estimate says it should. The share link preserves your exact scenario for design reviews.
Frequently asked questions
What is the chunk size tradeoff in RAG?
Smaller chunks (128-256 tokens) retrieve more precisely but multiply chunk count, raising index cost and forcing more chunks into the prompt to reconstruct context. Larger chunks (1k-2k) are cheaper to index and carry more context each, but retrieval gets fuzzier and you pay more chat-model input tokens per retrieved chunk. 512 tokens with ~10% overlap is the boring, defensible default for prose and docs.
What forces a full re-embedding of my corpus?
Three things: switching embedding models (vectors from different models live in different spaces and cannot be compared), changing chunk size or overlap (the chunks themselves change), and content updates (only the changed documents, if you track them). The first two mean re-embedding everything — which is why the calculator surfaces the full re-index cost rather than letting it hide.
Which embedding model is cheapest, and is cheapest wise?
OpenAI's text-embedding-3-small and Voyage 3.5 Lite at $0.02 per million tokens are the budget anchors — an entire 10k-document corpus indexes for pennies. Larger models like text-embedding-3-large ($0.13/M) buy measurably better retrieval on harder corpora. Because indexing is usually a one-time cost dwarfed by chat spend, picking on quality rather than price is normally correct.
What about vector storage and search costs?
This calculator prices embedding API calls only. Storage adds a second bill: a managed vector database charges for stored vectors and queries, roughly proportional to chunk count times embedding dimensions. Self-hosted pgvector or a local index makes storage nearly free at small scale. Chunk count — shown in the results — is the number that drives both bills, so over-chunking hurts twice.
Built by FORG — AI cost observability for agentic coding. Free tools, no signup, nothing leaves your browser.
Learn about FORG