Question 1

Why budget a system prompt at all?

Accepted Answer

Because the system prompt is re-sent on every single API call, forever. A 2,000-token preamble at 10,000 calls a day is 20 million input tokens daily before anyone types a word. Setting an explicit budget turns prompt growth from invisible accretion into a deliberate trade-off you can see and defend in review.
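The arithmetic behind that figure is a single multiplication, sketched below. The function name is hypothetical and used only for illustration; the inputs are the numbers quoted in the answer above.

```python
def daily_system_prompt_tokens(prompt_tokens: int, calls_per_day: int) -> int:
    """Input tokens spent per day on the system prompt alone."""
    return prompt_tokens * calls_per_day


# The figures from the answer: a 2,000-token preamble at 10,000 calls a day.
daily = daily_system_prompt_tokens(2_000, 10_000)
print(f"{daily:,} tokens per day")  # 20,000,000 tokens per day
```

Multiplying the result by your provider's per-token input price gives a rough daily cost for the preamble; no price is assumed here.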

Question 2

What is a reasonable budget?

Accepted Answer

For most production agents, 1,000-3,000 tokens covers identity, rules, tool guidance and a couple of examples comfortably. Coding agents with rich project context often justify 3,000-6,000. If you are past 8,000, parts of the prompt almost certainly belong in retrieval or prompt caching instead of being paid fresh on every call.
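As a rough sketch, the ranges above can be turned into a simple check on a prompt's total size. The function name and the section figures are hypothetical, and the 6,000-8,000 band is an assumption, since the answer does not say how to treat it.

```python
def classify_budget(total_tokens: int) -> str:
    """Map a total system prompt size onto the ranges from the answer above."""
    if total_tokens <= 3_000:
        return "within the range for most production agents"
    if total_tokens <= 6_000:
        return "plausible for a coding agent with rich project context"
    if total_tokens <= 8_000:
        # Assumption: the answer does not name this band explicitly.
        return "large; worth reviewing section by section"
    return "past 8,000; consider retrieval or prompt caching"


# Hypothetical per-section estimates, not measurements.
sections = {"identity": 150, "rules": 900, "tools": 700, "examples": 1_400}
total = sum(sections.values())
print(total, "->", classify_budget(total))
```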

Question 3

How do I get token estimates for each section?

Accepted Answer

Paste each section into our Token Counter, which runs the exact o200k tokenizer in your browser, and copy the number into the matching field here. The prefilled values are realistic figures for a mid-sized agentic coding prompt, so the tool gives a meaningful picture before you measure anything.

Question 4

Which section should I cut first when over budget?

Accepted Answer

Examples, almost always. They are the largest section in most prompts and the one with the steepest diminishing returns. One excellent demonstration usually outperforms four mediocre ones. After examples, look for rules that restate model defaults, then tool prose that duplicates what the API tool schemas already convey.
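A minimal sketch of that review order, assuming section estimates are kept in a dictionary keyed by section name. The key names, the function name, and the figures are all hypothetical; the code only reports how far over budget the prompt is and which sections to look at first, not how much to trim from each.

```python
# Review order taken from the answer: examples, then rules, then tool prose.
CUT_ORDER = ("examples", "rules", "tools")


def review_order(sections: dict[str, int], budget: int) -> tuple[int, list[str]]:
    """Return the token overage and the sections to review, in priority order."""
    overage = max(0, sum(sections.values()) - budget)
    if overage == 0:
        return 0, []
    return overage, [name for name in CUT_ORDER if name in sections]


# Hypothetical estimates for illustration.
sections = {"identity": 150, "rules": 900, "tools": 700, "examples": 1_400}
overage, order = review_order(sections, budget=3_000)
print(f"Over by {overage} tokens; review in order: {', '.join(order)}")
```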

Question 5

Does prompt caching make budgeting unnecessary?

Accepted Answer

It softens the cost but not the other penalties. Cached input is still billed (at roughly a tenth of fresh rates on Anthropic), still consumes context window, and long prompts still dilute instruction-following: models tend to follow a 6,000-token rule list less reliably than a 600-token one. Budget first, cache second.
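To see why caching softens the bill without removing it, the sketch below converts cached reads into "fresh-equivalent" tokens using the roughly one-tenth rate mentioned above. The function name and the hit-rate parameter are hypothetical. It also simplifies deliberately: it ignores any extra charge for writing to the cache and any cache expiry, so treat the result as a lower bound, not a bill.

```python
def fresh_equivalent_tokens(
    prompt_tokens: int,
    calls: int,
    cache_hit_rate: float,
    cached_rate: float = 0.1,  # assumption: cached reads cost about a tenth of fresh input
) -> float:
    """Billed system prompt tokens, expressed in units of fresh input tokens."""
    total = prompt_tokens * calls
    cached = total * cache_hit_rate
    fresh = total - cached
    return fresh + cached * cached_rate


# A 6,000-token prompt at 10,000 calls, with and without a perfect cache.
uncached = fresh_equivalent_tokens(6_000, 10_000, cache_hit_rate=0.0)
cached = fresh_equivalent_tokens(6_000, 10_000, cache_hit_rate=1.0)
print(f"{uncached:,.0f} vs {cached:,.0f} fresh-equivalent tokens")
```

Even with every call hitting the cache, the prompt still costs about a tenth of its uncached price and occupies the same 6,000 tokens of context on each call.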

System Prompt Budgeter

