Question 1

How many words is 100k tokens?

Accepted Answer

For English prose, roughly 75,000 words: about 150 single-spaced pages, or a short novel. The conversion uses the standard English averages of ≈4 characters per token and ≈5.1 characters per word. Code, other languages, and heavily formatted text all shift this ratio, which is why the converter lets you pick a content type instead of assuming everything is English prose.
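
The arithmetic behind that figure can be sketched in a few lines of Python. This is an illustration only, not the converter's actual code: the function name is hypothetical, and the two constants are simply the averages quoted above.

```python
# Averages quoted in the answer above (English prose).
CHARS_PER_TOKEN = 4.0
CHARS_PER_WORD = 5.1


def tokens_to_words(tokens: float) -> float:
    """Estimate an English word count from a token count via characters."""
    return tokens * CHARS_PER_TOKEN / CHARS_PER_WORD


if __name__ == "__main__":
    words = tokens_to_words(100_000)
    print(f"{words:,.0f} words")
```

Run with these exact constants, the estimate lands a little above the rounded 75,000 quoted in the answer, which is a reminder that both averages are approximations.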

Question 2

Why does the ratio change by language?

Accepted Answer

Tokenizer vocabularies are trained on mostly English text, so English words frequently map to single tokens while other languages fragment more. German compounds split into several pieces, and CJK languages can run below two characters per token because each character often becomes its own token. The practical effect: the same document translated to Japanese can cost about twice as many tokens as the English original.
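
To see how much the characters-per-token ratio matters, here is a minimal sketch that applies two ratios to the same character count. The English value is the average used on this page; the CJK value of 2.0 is an assumption taken as the upper bound of "below two", so the CJK result is a floor rather than a prediction. The function and dictionary names are hypothetical.

```python
import math

# Characters per token. 4.0 is the English average used on this page;
# 2.0 is an assumed upper bound for CJK text, so CJK estimates are a floor.
CHARS_PER_TOKEN = {"english": 4.0, "cjk": 2.0}


def estimate_tokens(char_count: int, language: str) -> int:
    """Estimate tokens for a text of char_count characters."""
    return math.ceil(char_count / CHARS_PER_TOKEN[language])


if __name__ == "__main__":
    for language in CHARS_PER_TOKEN:
        print(language, estimate_tokens(3_000, language))
```

The same character count is used for both rows only to isolate the effect of the ratio; a real translation would also change the number of characters.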

Question 3

How does source code convert?

Accepted Answer

Code packs more tokens into the same number of characters than prose does: roughly 3 characters per token versus 4 for English. Braces, operators, underscores and short identifiers each tend to tokenize separately, and indentation whitespace adds up. A 1,000-line source file at 40 characters per line is around 13k tokens (40,000 characters ÷ 3), which is why pulling a handful of files into an agent's context fills a window faster than people expect.
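
The file-size estimate above can be reproduced as a rough sketch. The function names are hypothetical, and the 200k window is used purely as an illustrative budget; the calculation ignores whatever the prompt and the model's output also consume.

```python
CODE_CHARS_PER_TOKEN = 3  # rough average for source code, from the answer above


def estimate_code_tokens(lines: int, chars_per_line: int) -> int:
    """Estimate tokens for a source file, rounded to the nearest token."""
    return round(lines * chars_per_line / CODE_CHARS_PER_TOKEN)


def files_that_fit(window_tokens: int, lines: int, chars_per_line: int) -> int:
    """Count whole files that fit, working in characters to stay in integers."""
    window_chars = window_tokens * CODE_CHARS_PER_TOKEN
    return window_chars // (lines * chars_per_line)


if __name__ == "__main__":
    print(estimate_code_tokens(1_000, 40), "tokens per file")
    print(files_that_fit(200_000, 1_000, 40), "files in a 200k window")
```

Under these assumptions only about fifteen such files fit before the window is full.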

Question 4

What does a 'page' assume here?

Accepted Answer

About 500 words single-spaced, or 3,000 characters, which is the common publishing convention. Real pages vary with font, margins and spacing, so treat page counts as a mental model rather than a layout prediction. The useful intuition is the scale: one printed page is roughly 750 tokens (3,000 characters ÷ 4 characters per token), so a 200k window holds about 270 pages of English prose.
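
The page figures follow directly from the character convention. A minimal sketch, with hypothetical function names and the constants taken from the answer above:

```python
CHARS_PER_PAGE = 3_000  # page convention stated above
CHARS_PER_TOKEN = 4     # English prose average used on this page


def tokens_per_page() -> float:
    """Tokens on one page of English prose under the stated convention."""
    return CHARS_PER_PAGE / CHARS_PER_TOKEN


def pages_in_window(window_tokens: int) -> float:
    """Pages of English prose that fit in a context window of window_tokens."""
    return window_tokens / tokens_per_page()


if __name__ == "__main__":
    print(f"{tokens_per_page():.0f} tokens per page")
    print(f"{pages_in_window(200_000):.0f} pages in a 200k window")
```

The unrounded result for a 200k window is just under 267 pages, which the answer rounds to about 270.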

Question 5

Why is this approximate when token counters are exact?

Accepted Answer

An exact count requires running the actual tokenizer over actual text — which our Token Counter does, locally in your browser. This converter answers the inverse question: you have a quantity (a word-count target, a context budget, a document length) and want the equivalent in other units before any text exists. Average ratios are the only honest way to do that, and we state them on the page.
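
One way to picture this kind of before-the-text conversion is to treat an estimated character count as a common pivot between units. The sketch below is an assumption about how such a converter could be structured, not a description of this tool's internals; the names are hypothetical and the ratios are the English-prose averages stated on this page.

```python
# Characters per unit, used as the common pivot. These are the English-prose
# averages this page states; other content types would need other values.
CHARS_PER_UNIT = {"tokens": 4.0, "words": 5.1, "characters": 1.0}


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert between units by going through an estimated character count."""
    chars = amount * CHARS_PER_UNIT[from_unit]
    return chars / CHARS_PER_UNIT[to_unit]


if __name__ == "__main__":
    # A word-count target expressed as a token estimate.
    print(round(convert(1_500, "words", "tokens")))
    # A context budget expressed as a character estimate.
    print(round(convert(200_000, "tokens", "characters")))
```

Because every unit is defined only by its ratio to characters, supporting another content type means swapping the table of ratios rather than rewriting the conversion.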

Item	≈ tokens
Tweet (280 chars)	70
Typical email	300
One printed page	750
Long blog post	3k
Academic paper	12k
Average novel	120k

Tokens to Words Converter
