Prompt Compressor
Strip token waste from prompts — whitespace, markdown decoration and duplicate lines.
137 → 92 tokens (33% smaller, estimated). At 1,000 calls/day on Claude Sonnet 4.5 that is $4.11/month saved.
Per-option contribution (chars removed)
- Strip markdown decoration: 27
- Dedupe repeated lines: 77
- Minify JSON blocks: 0
- Collapse whitespace: 59
Whitespace collapse and line dedupe are lossless for meaning; markdown stripping and JSON minification can change semantics — always diff before shipping.
How it works
System prompts accrete waste: duplicated instructions from copy-paste edits, markdown bold nobody needed, pretty-printed JSON schemas indented four deep, triple blank lines. None of it improves model behavior, and all of it is re-billed on every call. This tool applies four targeted cleanups — whitespace collapse, markdown decoration stripping, duplicate-line removal and JSON minification — and shows exactly how many tokens each one recovers, plus what that saving is worth per month at your call volume.
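As an illustration of the two lossless cleanups and the per-option accounting, here is a minimal Python sketch. It is not the tool's actual implementation: the function names, the pass order and the choice to leave leading indentation alone are all assumptions made for this example.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Trim trailing whitespace, squeeze interior runs, cap blank lines at one."""
    lines = [re.sub(r"[ \t]+$", "", line) for line in text.split("\n")]
    # Squeeze runs of spaces/tabs that follow a non-space character,
    # so leading indentation is left as it was (an assumption of this sketch).
    lines = [re.sub(r"(?<=\S)[ \t]{2,}", " ", line) for line in lines]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines))


def dedupe_lines(text: str) -> str:
    """Drop exact repeats of non-blank lines, keeping the first occurrence."""
    seen = set()
    kept = []
    for line in text.split("\n"):
        if line.strip() and line in seen:
            continue
        seen.add(line)
        kept.append(line)
    return "\n".join(kept)


def compress(text: str):
    """Run the passes in order and record the characters each one removed."""
    report = {}
    for name, fn in [("collapse_whitespace", collapse_whitespace),
                     ("dedupe_lines", dedupe_lines)]:
        before = len(text)
        text = fn(text)
        report[name] = before - len(text)
    return text, report


prompt = "You are helpful.  \n\n\n\nBe concise.\nBe concise.\nUse    JSON."
compressed, report = compress(prompt)
print(report)  # {'collapse_whitespace': 7, 'dedupe_lines': 12}
```

Because each pass sees the output of the one before it, the per-option numbers depend on the order in which the passes run.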
The math is straightforward and visible. Your text is tokenized in the browser with the real o200k_base encoding (lazily loaded, never uploaded), compressed, and tokenized again. The difference, multiplied by your calls per day and current Claude Sonnet input pricing over a 30.44-day month, gives the dollar figure. A 200-token saving sounds trivial until you multiply: at 1,000 calls a day it is about six million tokens a month, or roughly 73 million a year, on that prompt alone.
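The arithmetic can be written out in a few lines of Python. The price of $3.00 per million input tokens is an assumption: it is the figure that reproduces the $4.11/month shown at the top of this page, so check current pricing before relying on it. The function name is hypothetical.

```python
DAYS_PER_MONTH = 30.44  # average month length used on this page


def monthly_savings_usd(tokens_before: int, tokens_after: int,
                        calls_per_day: int,
                        usd_per_million_input_tokens: float) -> float:
    """Dollar value per month of the tokens removed from one prompt."""
    saved_per_call = tokens_before - tokens_after
    tokens_per_month = saved_per_call * calls_per_day * DAYS_PER_MONTH
    return tokens_per_month * usd_per_million_input_tokens / 1_000_000


# Assumed price: $3.00 per million input tokens (not confirmed by this page).
savings = monthly_savings_usd(137, 92, 1_000, 3.00)
print(f"${savings:.2f}/month")  # $4.11/month
```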
Honesty about loss matters more than the savings number. Whitespace collapse and exact duplicate removal are semantically safe. Markdown stripping is almost always safe but removes emphasis; JSON minification is safe for schemas the model reads but dangerous for output examples the model is supposed to imitate, because the formatting is the instruction. The tool flags when JSON blocks were rewritten so you review rather than trust. Compression you cannot diff is compression you should not ship.
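A minimal Python sketch of a JSON pass that reports what it touched, so a reviewer knows where to look. It assumes JSON blocks are marked with a fenced "json" code block, which is only one possible detection rule; the function name and return shape are hypothetical.

```python
import json
import re

FENCE = "`" * 3  # built indirectly so this example can itself sit inside a fence
JSON_BLOCK = re.compile(FENCE + r"json\n(.*?)\n" + FENCE, re.DOTALL)


def minify_json_blocks(text: str):
    """Minify fenced JSON blocks; return the new text and how many blocks changed."""
    rewritten = 0

    def _minify(match):
        nonlocal rewritten
        body = match.group(1)
        try:
            compact = json.dumps(json.loads(body),
                                 separators=(",", ":"), ensure_ascii=False)
        except json.JSONDecodeError:
            return match.group(0)  # not valid JSON: leave the block untouched
        if compact != body:
            rewritten += 1
        return f"{FENCE}json\n{compact}\n{FENCE}"

    return JSON_BLOCK.sub(_minify, text), rewritten


sample = "Schema:\n" + FENCE + 'json\n{\n    "a": 1,\n    "b": [1, 2]\n}\n' + FENCE
minified, changed = minify_json_blocks(sample)
if changed:
    print(f"{changed} JSON block(s) rewritten: review before shipping")
```

Round-tripping through a parser can also change more than whitespace (for example, duplicate keys collapse to one), which is one more reason to treat the count as a prompt for review.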
Use this on the static prompts you control — system messages, tool descriptions, CLAUDE.md files (the Markdown Token Heatmap shows which blocks to attack first). The dynamic waste in live agent sessions — oversized tool results, re-sent file contents — needs enforcement at run time, which is what the FORG rule engine does: the trimming policy you validate here becomes a rule that runs on every session automatically. A good cadence: compress and diff once per quarter for every prompt that ships with production traffic, and treat any prompt that grew more than twenty percent since last review as a candidate for a rewrite rather than a trim.
Frequently asked questions
Is the compression lossless?
Two passes are safe for meaning: collapsing runs of whitespace and removing exactly-duplicated lines never change what a model understands. The other two are lossy by design — stripping markdown removes emphasis the model might have been told to notice, and minifying JSON destroys formatting that matters if the block is an output example the model should mimic. The tool warns when JSON was touched; always diff before shipping a compressed prompt.
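One way to diff outside the tool is Python's standard difflib module. This is a sketch; the file labels and the function name are placeholders.

```python
import difflib


def prompt_diff(original: str, compressed: str) -> str:
    """Return a unified diff of two prompt versions (empty string if identical)."""
    lines = difflib.unified_diff(
        original.splitlines(),
        compressed.splitlines(),
        fromfile="prompt.original",    # placeholder label
        tofile="prompt.compressed",    # placeholder label
        lineterm="",
    )
    return "\n".join(lines)


diff = prompt_diff("Be concise.\nUse JSON.\nUse JSON.\n", "Be concise.\nUse JSON.\n")
print(diff)
```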
Do whitespace and indentation really cost tokens?
Yes, and more than people expect. Runs of spaces tokenize separately from words, deep indentation in pretty-printed JSON repeats on every line, and trailing whitespace is pure waste. A four-space-indented JSON schema can shrink 30–40% by minification alone. Since a system prompt is re-sent on every single call, each wasted token is billed thousands of times per month at production volume.
Does markdown decoration help or hurt model performance?
Structure helps; decoration mostly does not. Headings and lists give models useful anchors, but bold, italics and horizontal rules add tokens without measurably improving instruction-following in most evaluations. The strip pass removes asterisk emphasis and decoration while keeping list structure and line breaks intact, so the prompt's organization survives the diet.
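A strip pass along these lines could look like the Python sketch below. The regular expressions are assumptions made for illustration, not the tool's actual rules: they remove asterisk bold and italics and horizontal rules, and leave headings, list markers and line breaks alone.

```python
import re

HRULE = re.compile(r"\s*([-*_])(\s*\1){2,}\s*")            # ---, ***, * * *, ___
BOLD = re.compile(r"\*\*(.+?)\*\*")                        # **bold**
ITALIC = re.compile(r"(?<![\*\w])\*(?!\s)(.+?)(?<!\s)\*(?![\*\w])")  # *italic*


def strip_markdown_decoration(text: str) -> str:
    """Remove asterisk emphasis and horizontal rules; keep lists and headings."""
    out = []
    for line in text.split("\n"):
        if HRULE.fullmatch(line):
            continue  # drop horizontal rules entirely
        line = BOLD.sub(r"\1", line)
        # A list bullet is a star followed by a space, so ITALIC never matches it.
        line = ITALIC.sub(r"\1", line)
        out.append(line)
    return "\n".join(out)


source = "# Rules\n\n**Always** answer in *plain* text.\n\n---\n\n- keep lists\n* and *these* too"
print(strip_markdown_decoration(source))
```

Underscore emphasis is deliberately left untouched in this sketch so that identifiers such as __init__ inside a prompt are not mangled.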
How accurate are the token counts?
When the tokenizer loads, counts are exact for the o200k_base encoding used by current OpenAI models — computed entirely in your browser, nothing uploaded. Claude's tokenizer is not public, so treat the counts as close proxies there. If the tokenizer fails to load we fall back to a documented characters ÷ 3.6 estimate and label the result as estimated.
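The exact-then-fallback behaviour can be sketched in Python as follows. It assumes the optional tiktoken package for the o200k_base encoding and rounds the fallback estimate up; the rounding rule and the function names are assumptions, since the page only states the ÷ 3.6 ratio.

```python
import math

CHARS_PER_TOKEN = 3.6  # fallback ratio quoted on this page


def estimate_tokens(text: str) -> int:
    """Character-based estimate; rounding up is an assumption of this sketch."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def count_tokens(text: str):
    """Return (count, is_exact): exact if the tokenizer loads, else estimated."""
    try:
        import tiktoken  # optional dependency; may be missing or unable to load

        encoding = tiktoken.get_encoding("o200k_base")
        return len(encoding.encode(text)), True
    except Exception:
        return estimate_tokens(text), False


count, exact = count_tokens("Strip token waste from prompts.")
print(count, "exact" if exact else "estimated")
```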
Can this trimming be automated in production?
Manual compression catches the static waste in a prompt you control, but agentic sessions generate dynamic waste — oversized tool results, repeated file contents, accumulating history. That is rule-engine territory: FORG's rule engine can trim, truncate and route live traffic by policy, so the cleanup you prototype here runs automatically on every call instead of once at design time.
Turn this analysis into a live rule with the FORG rule engine — route models and enforce limits automatically.