System Prompt Budgeter
Allocate a token budget across identity, rules, examples and tools — see what fits.
Estimate each section's size with the Token Counter, or start from the prefilled typical values for an agentic coding prompt.
200 tokens over budget: 2,200 tokens allocated against a 2,000-token budget (110%). Every call re-sends the overage, so trim the sections flagged below or raise the budget deliberately.
Allocation vs budget
- Identity & role: 150 tokens · 7%
- Rules & constraints: 500 tokens · 23%
- Few-shot examples: 800 tokens · 36%
- Tool instructions: 400 tokens · 18%
- Project context: 350 tokens · 16%
⚠ Over budget. At 10k calls/day, 200 extra tokens per call is ~2M wasted tokens daily.
How it works
System prompts fail by accretion, not by design. Nobody writes a 9,000-token preamble on day one. Instead, someone adds a clarifying rule after an incident, a new example after a bad output, three paragraphs of tool guidance after an integration, and a year later the prompt is an archaeological dig: it costs real money on every call, and the model follows its instructions less reliably than it used to. The fix is the one finance uses: a budget, allocated by line item.
This tool implements that budget. Set a total (2,000 tokens is a sane default for a production agent), then enter your estimate for each of the five sections that almost every system prompt breaks down into: identity, rules, few-shot examples, tool instructions and project context. The stacked bar shows each section's share of the allocation against the budget mark, and as soon as your allocation crosses that mark you get an explicit over-budget warning that quantifies the overage.
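The arithmetic behind the stacked bar and the over-budget warning is simple enough to reproduce. The sketch below is a hypothetical re-implementation, not the tool's actual source: `allocate` and `PREFILLED` are illustrative names, the figures are the prefilled values listed above, and the 2,000-token budget is the default suggested here.

```python
# Hypothetical re-implementation of the allocation math; not the tool's source.
# Figures are the prefilled values; 2,000 is the default budget from the text.
PREFILLED = {
    "Identity & role": 150,
    "Rules & constraints": 500,
    "Few-shot examples": 800,
    "Tool instructions": 400,
    "Project context": 350,
}


def allocate(sections: dict, budget: int) -> dict:
    """Compare the summed section estimates against a budget line."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    total = sum(sections.values())
    return {
        "total": total,
        # Tokens above the budget line; 0 when the allocation fits.
        "overage": max(0, total - budget),
        # Allocation as a percentage of the budget (110 means 10% over).
        "pct_of_budget": 100 * total / budget,
        # Each section's rounded share of the total allocation, as in the bar.
        "shares": {
            name: round(100 * tokens / total) if total else 0
            for name, tokens in sections.items()
        },
    }


report = allocate(PREFILLED, budget=2000)
print(report["total"], report["overage"], f"{report['pct_of_budget']:.0f}%")  # 2200 200 110%
for name, share in report["shares"].items():
    print(f"{name}: {PREFILLED[name]} tokens, {share}%")
```

For other inputs the rounded shares can drift a point away from a 100% total; the prefilled figures happen to sum cleanly.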
The recommendations are heuristic and honest about it. They fire on patterns we see repeatedly: example sections past 45% of the prompt (cut to your best one or two pairs), rule lists that have grown contradictory, tool prose duplicating API schemas, and static project context that belongs in retrieval rather than in every call. They are starting points for an edit, not a substitute for reading your own prompt.
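Of these heuristics, only the examples rule comes with a hard number, so it is the one that translates directly into code. A minimal sketch, assuming the 45% threshold is measured against the total allocation; `examples_warning` and its message text are hypothetical. The other three checks depend on what the rules and tool prose actually say rather than on token counts, so they are not modeled here.

```python
from typing import Optional

# The 45% threshold comes from the text; the function name and message are illustrative.
EXAMPLES_SHARE_LIMIT = 0.45


def examples_warning(sections: dict, key: str = "Few-shot examples") -> Optional[str]:
    """Return a recommendation if examples exceed 45% of the prompt, else None."""
    total = sum(sections.values())
    if total == 0:
        return None
    share = sections.get(key, 0) / total
    if share > EXAMPLES_SHARE_LIMIT:
        return f"Examples are {share:.0%} of the prompt: cut to your best one or two pairs."
    return None


prefilled = {"Identity & role": 150, "Rules & constraints": 500,
             "Few-shot examples": 800, "Tool instructions": 400,
             "Project context": 350}
print(examples_warning(prefilled))  # None: examples are about 36% of 2,200 tokens

# Same prompt with the examples section grown to 2,000 tokens (2,000 of 3,400).
bloated = {**prefilled, "Few-shot examples": 2000}
print(examples_warning(bloated))  # Examples are 59% of the prompt: cut to ...
```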
Assumptions are minimal: token figures are whatever you enter (measure them with the Token Counter for accuracy), and the budget line is yours to set. No pricing model is applied here because the budgeter is about discipline, not dollars — though the over-budget warning shows the daily token waste at a representative call volume to keep the stakes visible.
Pair it with the System Prompt Linter, which grades prompt quality rather than size, and re-run the budget after every meaningful prompt edit. Teams that keep this number under review ship prompts that stay lean; the rest discover the bloat on their invoice.
Frequently asked questions
Why budget a system prompt at all?
Because the system prompt is re-sent on every single API call, forever. A 2,000-token preamble at 10,000 calls a day is 20 million input tokens daily before anyone types a word. Setting an explicit budget turns prompt growth from invisible accretion into a deliberate trade-off you can see and defend in review.
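The daily figure is just tokens per call times calls per day; the same multiplication produces the ~2M figure in the over-budget warning. A minimal sketch with a hypothetical `resent_tokens` helper; the 30-day horizon is an illustrative choice, not something the tool reports.

```python
def resent_tokens(tokens_per_call: int, calls_per_day: int, days: int = 1) -> int:
    """Input tokens spent re-sending a fixed preamble over a period."""
    return tokens_per_call * calls_per_day * days


print(f"{resent_tokens(2000, 10_000):,}")           # 20,000,000 per day
print(f"{resent_tokens(2000, 10_000, days=30):,}")  # 600,000,000 over 30 days
print(f"{resent_tokens(200, 10_000):,}")            # 2,000,000: the 200-token overage alone
```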
What is a reasonable budget?
For most production agents, 1,000-3,000 tokens covers identity, rules, tool guidance and a couple of examples comfortably. Coding agents with rich project context often justify 3,000-6,000. If you are past 8,000, parts of the prompt almost certainly belong in retrieval or prompt caching instead of being paid fresh on every call.
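Those ranges can be expressed as a rough classifier. This is a sketch under stated assumptions: `budget_band` is a hypothetical helper, the ceilings are the upper ends of the ranges quoted in this answer, and the stretch between a ceiling and 8,000 tokens is labeled "above the typical range" as an interpretation, since the answer does not say what to do there.

```python
def budget_band(total_tokens: int, coding_agent: bool = False) -> str:
    """Map a prompt size onto the rough ranges from the FAQ (interpretive sketch)."""
    if total_tokens > 8000:
        return "past 8,000: move parts into retrieval or prompt caching"
    # Most agents: up to ~3,000 tokens; coding agents with rich project context: up to ~6,000.
    ceiling = 6000 if coding_agent else 3000
    if total_tokens <= ceiling:
        return "typical range"
    # Assumption: between the ceiling and 8,000, the extra size needs a justification.
    return "above the typical range: justify the extra tokens in review"


print(budget_band(2200))                     # typical range
print(budget_band(4500))                     # above the typical range: ...
print(budget_band(4500, coding_agent=True))  # typical range
print(budget_band(9000, coding_agent=True))  # past 8,000: ...
```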
How do I get token estimates for each section?
Paste each section into our Token Counter, which runs the exact o200k tokenizer in your browser, and copy the number into the matching field here. The prefilled values are realistic figures for a mid-sized agentic coding prompt, so the tool gives a meaningful picture before you measure anything.
Which section should I cut first when over budget?
Examples, almost always — they are the largest section in most prompts and the one with the steepest diminishing returns. One excellent demonstration usually outperforms four mediocre ones. After examples, look for rules that restate model defaults, then tool prose that duplicates what the API tool schemas already convey.
Does prompt caching make budgeting unnecessary?
It softens the cost but not the other penalties. Cached input is still billed (at roughly a tenth of the fresh input rate on Anthropic), still consumes context window, and a long prompt still dilutes instruction-following: models apply a 6,000-token rule list less reliably than a 600-token one. Budget first, cache second.
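The cost-versus-context distinction is easy to see numerically. A minimal sketch, assuming cached reads bill at 0.1× the fresh input rate as quoted in this answer, and ignoring both output tokens and the cache-write premium (Anthropic bills the initial write to the cache above the normal input rate); `per_call_footprint` is a hypothetical helper.

```python
# Assumption: cached reads bill at 0.1x the fresh input rate ("roughly a tenth").
# Cache-write premiums and output tokens are deliberately ignored.
CACHE_READ_MULTIPLIER = 0.1


def per_call_footprint(prompt_tokens: int, cached: bool) -> dict:
    """Billing weight and context usage of a system prompt for one call."""
    billed = prompt_tokens * (CACHE_READ_MULTIPLIER if cached else 1.0)
    return {
        # What you pay for, expressed in fresh-input-token units.
        "billed_fresh_equivalent": billed,
        # Context window consumed: caching does not change this.
        "context_tokens": prompt_tokens,
    }


big_cached = per_call_footprint(6000, cached=True)
small_fresh = per_call_footprint(600, cached=False)
print(f"{big_cached['billed_fresh_equivalent']:.0f} vs {small_fresh['billed_fresh_equivalent']:.0f}")  # 600 vs 600
print(big_cached["context_tokens"], small_fresh["context_tokens"])  # 6000 600
```

A cached 6,000-token prompt bills about the same per call as a fresh 600-token one, yet still occupies ten times the context, which is why caching complements a budget rather than replacing it.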