
Tokens to Words Converter

Convert between tokens, words, characters and pages with per-language ratios.

100% client-side · exact o200k tokenizer · zero uploads

English: ≈ 4 chars/token, ≈ 5.1 chars/word. Pages assume ~500 words single-spaced.

78,431 words ≈ 100k tokens of English: about 133 pages, and $0.30 as input on Claude Sonnet 4.5.

Tokens: 100k
Words: 78,431
Characters: 400k
Pages: 133

Common sizes for reference

Item: ≈ tokens
Tweet (280 chars): 70
Typical email: 300
One printed page: 750
Long blog post: 3k
Academic paper: 12k
Average novel: 120k

All conversions are approximations from average ratios — exact counts depend on the specific text. For exact counts, use the Token Counter.

o200k: exact GPT tokenizer, in-browser
≈3.6 chars/token: Claude estimate, documented
18 models in the cost dataset
0 network requests per keystroke

How it works

Token budgets are everywhere — context windows, pricing pages, rate limits — but humans think in words and pages. This converter translates between tokens, words, characters and pages in any direction: enter a quantity in any unit and every other unit updates instantly, with a per-language ratio applied and a cost line showing what that volume costs as model input at current prices.
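The cost line is plain arithmetic, and a minimal sketch makes it easy to check by hand. The function and constant names below are illustrative, not the converter's own code, and the $3.00 per million input tokens is only what the page's example implies ($0.30 for 100k tokens on Claude Sonnet 4.5); current prices may differ.

```python
def input_cost_usd(tokens: float, usd_per_million_tokens: float) -> float:
    """Cost of sending `tokens` as model input at a per-million-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens


# Assumption: rate implied by the page's example, not an authoritative price list.
SONNET_45_INPUT_USD_PER_MTOK = 3.00

cost = input_cost_usd(100_000, SONNET_45_INPUT_USD_PER_MTOK)
print(f"${cost:.2f}")
```

Keeping the price as a parameter rather than a hard-coded value lets the same function serve any model in a pricing table.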

The mechanics are transparent. Everything normalizes through characters: English prose averages about 4 characters per token and 5.1 characters per word, so 100k tokens ≈ 400k characters ≈ 78k words ≈ 130 pages. Pick a different content type and the ratios shift — source code runs about 3 characters per token because symbols and identifiers fragment heavily, while Chinese and Japanese drop below 2 because tokenizer vocabularies are English-weighted. These ratios are documented approximations from observed o200k behavior, not exact measurements of your specific text.
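The normalize-through-characters idea can be sketched in a few lines. This is an illustration under stated assumptions, not the converter's actual source: the names are hypothetical, the English ratios are the ones quoted above, and the 3,000 characters per page comes from the FAQ on this page.

```python
# Average ratios quoted on this page for English prose (approximations).
CHARS_PER_TOKEN = 4.0
CHARS_PER_WORD = 5.1
CHARS_PER_PAGE = 3000.0  # assumption taken from the page's "3,000 characters" convention

CHARS_PER_UNIT = {
    "tokens": CHARS_PER_TOKEN,
    "words": CHARS_PER_WORD,
    "characters": 1.0,
    "pages": CHARS_PER_PAGE,
}


def convert(quantity: float, unit: str, chars_per_token: float = CHARS_PER_TOKEN) -> dict:
    """Convert a quantity in one unit into every unit by going through characters."""
    per_unit = dict(CHARS_PER_UNIT, tokens=chars_per_token)
    if unit not in per_unit:
        raise ValueError(f"unknown unit: {unit!r}")
    chars = quantity * per_unit[unit]
    return {name: round(chars / size) for name, size in per_unit.items()}


# 100k tokens of English prose, expressed in every unit.
english = convert(100_000, "tokens")
print(english)

# The same 400k characters treated as source code (about 3 chars/token).
code = convert(400_000, "characters", chars_per_token=3.0)
print(code["tokens"])
```

Because every unit is defined by its size in characters, adding a content type only means swapping the characters-per-token value; the words and pages conversions are unaffected.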

The reference table anchors the abstractions in real artifacts: a tweet is about 70 tokens, a printed page about 750, an average novel about 120k. These anchors make budget conversations concrete: when someone proposes shipping a 50k-token system preamble, you can say that is about sixty-seven pages of instructions (50,000 ÷ 750) and ask whether a model (or a person) actually needs that much standing context to do the job.

When you have the actual text, stop estimating: the Token Counter runs the real o200k tokenizer in your browser and gives exact figures, privately. Use this converter for the planning conversations that happen before text exists — sizing a context budget, scoping a RAG chunk strategy, or sanity-checking a vendor quote denominated in tokens. And once estimates become production traffic, FORG tracks the real token flow per session so your plans stay calibrated against what teams actually consume. The estimate-then-measure loop is the habit worth building: convert here to plan, count exactly when text exists, and reconcile against live telemetry once it ships — three numbers that should agree within a few percent if your assumptions hold.
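One way to make the estimate-then-measure loop checkable is to compare the three numbers against a tolerance. The sketch below is hypothetical: the function names, the 5% threshold (one reading of "a few percent"), and the sample figures are all assumptions for illustration, not values from FORG or the Token Counter.

```python
def relative_gap(reference: float, other: float) -> float:
    """Absolute difference between two counts, as a fraction of the reference."""
    return abs(other - reference) / reference


def reconcile(estimated: float, counted: float, observed: float,
              tolerance: float = 0.05) -> list:
    """Return the names of comparisons whose gap exceeds the tolerance.

    The exact count is used as the reference for both comparisons.
    """
    gaps = {
        "estimate_vs_count": relative_gap(counted, estimated),
        "count_vs_observed": relative_gap(counted, observed),
    }
    return [name for name, gap in gaps.items() if gap > tolerance]


# Made-up sample figures: planned estimate, exact tokenizer count, live telemetry.
flagged = reconcile(estimated=100_000, counted=97_500, observed=104_000)
print(flagged)
```

An empty list means the planning assumptions held; a flagged comparison points at which stage drifted, for example production traffic carrying more context than the text that was counted.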

Frequently asked questions

How many words is 100k tokens?

For English prose, roughly 78,000 words: about 133 single-spaced pages, or a short novel. The conversion uses the standard ≈4 characters per token and ≈5.1 characters per word averages for English, so 100k tokens ≈ 400k characters ≈ 78k words. Code, other languages, and heavily formatted text all shift this ratio, which is why the converter lets you pick a content type instead of assuming everything is English prose.

Why does the ratio change by language?

Tokenizer vocabularies are trained mostly on English, so English words frequently map to single tokens while other languages fragment more. German compounds split into several pieces, and CJK languages can run below two characters per token because each character often becomes its own token. The practical effect: the same document translated to Japanese can cost about twice as many tokens as the English original.

How does source code convert?

Code packs more tokens into the same number of characters than prose: roughly 3 characters per token versus 4 for English. Braces, operators, underscores and short identifiers each tend to tokenize separately, and indentation whitespace adds up. A 1,000-line source file at 40 characters per line is 40,000 characters, or around 13k tokens, which is why pulling a handful of files into an agent's context fills a window faster than people expect.
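The file-size arithmetic above is easy to reproduce. This is a rough sketch with hypothetical names; it uses the page's average of about 3 characters per token for code, and the 200k-token window is only an example size, so treat the results as planning figures rather than exact counts.

```python
CODE_CHARS_PER_TOKEN = 3.0  # the page's average for source code (an approximation)


def estimate_code_tokens(lines: int, avg_chars_per_line: float,
                         chars_per_token: float = CODE_CHARS_PER_TOKEN) -> int:
    """Estimate tokens for a source file from its line count and average line length."""
    return round(lines * avg_chars_per_line / chars_per_token)


def files_that_fit(window_tokens: int, tokens_per_file: int) -> int:
    """How many files of the given size fit in a context window, ignoring other content."""
    return window_tokens // tokens_per_file


per_file = estimate_code_tokens(lines=1_000, avg_chars_per_line=40)
fit = files_that_fit(200_000, per_file)
print(per_file, fit)
```

The second function ignores system prompts, conversation history and tool output, so the real number of files that fit will be lower.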

What does a 'page' assume here?

About 500 words single-spaced, or 3,000 characters — the common publishing convention. Real pages vary with font, margins and spacing, so treat page counts as a mental model rather than a layout prediction. The useful intuition is the scale: one printed page is roughly 750 tokens, so a 200k window holds about 270 pages of English prose.
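As a quick sanity check on the page intuition, here is a minimal sketch using the 750 tokens per page anchor (3,000 characters at 4 characters per token). The names are illustrative, and the inputs are the 200k window above and the 50k-token preamble used as an example earlier on this page.

```python
TOKENS_PER_PAGE = 750  # 3,000 characters per page / 4 characters per token (English prose)


def tokens_to_pages(tokens: float) -> float:
    """Approximate printed pages of English prose for a token count."""
    return tokens / TOKENS_PER_PAGE


window_pages = round(tokens_to_pages(200_000))
preamble_pages = round(tokens_to_pages(50_000))
print(window_pages, preamble_pages)
```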

Why is this approximate when token counters are exact?

An exact count requires running the actual tokenizer over actual text — which our Token Counter does, locally in your browser. This converter answers the inverse question: you have a quantity (a word-count target, a context budget, a document length) and want the equivalent in other units before any text exists. Average ratios are the only honest way to do that, and we state them on the page.

Built by FORG — AI cost observability for agentic coding. Free tools, no signup, nothing leaves your browser.
