Question 1

Why does pretty-printed JSON cost more tokens than minified JSON?

Accepted Answer

Indentation. Every level of nesting adds two or more spaces to every line, plus a newline, and those whitespace runs become real tokens. On a typically nested payload the pretty version runs 20-50% more tokens than the minified version of the same data. The model gains nothing from the formatting: it parses both equally well.
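A minimal sketch of where the overhead comes from, using Python's standard `json` module. The payload is a made-up example, and character counts are only a rough stand-in for token counts; an exact figure needs a real tokenizer.

```python
import json

# Hypothetical nested payload, invented for illustration.
payload = {
    "user": {
        "id": 42,
        "roles": ["admin", "editor"],
        "profile": {"name": "Ada", "active": True},
    }
}

# Pretty form: two-space indentation and one line per value.
pretty = json.dumps(payload, indent=2)

# Minified form: no whitespace after separators, no newlines.
minified = json.dumps(payload, separators=(",", ":"))

# Both strings decode to exactly the same data.
same_data = json.loads(pretty) == json.loads(minified)

# Characters spent purely on formatting whitespace.
whitespace_overhead = len(pretty) - len(minified)

print(f"pretty: {len(pretty)} chars, minified: {len(minified)} chars")
print(f"formatting overhead: {whitespace_overhead} chars, same data: {same_data}")
```

Deeper nesting widens the gap, because each extra level adds indentation to every line beneath it.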

Question 2

Is YAML always cheaper than JSON?

Accepted Answer

No, and this tool shows the honest answer for a realistic payload. YAML drops braces, brackets and most quotes, which helps, but it pays for structure with newlines and indentation that JSON's minified form avoids entirely. For flat or shallow data YAML often beats pretty JSON yet loses to minified JSON. Measure your own payload shape before standardizing on a format.
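One way to measure your own shape is a small ranking helper like the sketch below. The name `rank_formats` and the sample strings are hypothetical. The default counter is `len`, which counts characters, not tokens, so its ranking can differ from a token-based one; pass a real tokenizer's counting function as `count` for a meaningful result.

```python
from typing import Callable, Dict, List, Tuple


def rank_formats(
    renderings: Dict[str, str],
    count: Callable[[str], int] = len,
) -> List[Tuple[str, int]]:
    """Return (format name, count) pairs, cheapest first.

    `count` should map a string to its token count. The default, `len`,
    is only a character-count placeholder (an assumption of this sketch).
    """
    sized = [(name, count(text)) for name, text in renderings.items()]
    return sorted(sized, key=lambda pair: pair[1])


# The same tiny record rendered by hand in three formats.
renderings = {
    "json_pretty": '{\n  "id": 42,\n  "name": "Ada"\n}',
    "json_min": '{"id":42,"name":"Ada"}',
    "yaml": "id: 42\nname: Ada\n",
}

ranking = rank_formats(renderings)
for name, size in ranking:
    print(f"{name}: {size}")
```

Run it on a payload that matches your production data, since the ordering depends on nesting depth and on the tokenizer used.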

Question 3

Does format choice affect model output quality?

Accepted Answer

Sometimes, and you should weigh it. Models have seen enormous amounts of JSON in training and follow JSON schemas reliably; YAML's significant whitespace gives models slightly more room to produce subtly invalid output. If you need structured output back, JSON with a schema is the safer ask. You can still keep the token premium down on the input side by minifying the JSON you send.

Question 4

How is the monthly dollar figure calculated?

Accepted Answer

Each format's exact o200k token count is priced as fresh input at the selected model's verified per-million-token rate, multiplied by your payloads-per-day and our 30.44 days-per-month convention. Output tokens, caching and batching are excluded so the comparison isolates the one variable you control here: serialization format.
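The arithmetic described above can be sketched as follows. The function name and the example numbers (token count, volume, and the 2.50 USD rate) are hypothetical placeholders, not real model prices.

```python
# Days-per-month convention stated in the answer above.
DAYS_PER_MONTH = 30.44


def monthly_input_cost(
    tokens_per_payload: int,
    payloads_per_day: float,
    usd_per_million_tokens: float,
) -> float:
    """Monthly input cost in USD for one serialization format.

    Output tokens, caching and batching are deliberately left out,
    so the only thing that varies between formats is the token count.
    """
    tokens_per_month = tokens_per_payload * payloads_per_day * DAYS_PER_MONTH
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


# Hypothetical inputs: a 1,200-token payload sent 10,000 times a day
# at an assumed rate of 2.50 USD per million input tokens.
example_cost = monthly_input_cost(1200, 10_000, 2.50)
print(f"${example_cost:,.2f} per month")
```

Calling the function once per format with that format's token count gives the figures to compare.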

Question 5

Are the token counts exact?

Accepted Answer

Yes, for the GPT family: the page runs the real o200k_base encoding via js-tiktoken, entirely in your browser, on each rendered format. Claude's tokenizer is not public, so Claude-priced rows reuse the same o200k counts as an approximation. Relative format rankings generally hold across tokenizers even when absolute counts shift a few percent.

Structured Data Token Overhead

