Context Compaction Savings
See how much summarizing conversation history every N turns saves on long sessions.
$9.43 saved per session on Claude Sonnet 4.5 (66% cheaper): $4.79 with compaction every 10 turns vs $14.22 without, summarization calls included.
Cost per turn: without compaction (top) vs with compaction (bottom)
- Without compaction: $14.22
- With compaction: $4.79
- Final context (without compaction): 210k tokens
- Compactions performed: 3
Each compaction is modeled as one extra call that reads the full context and writes the summary, after which context resets to the summary size.
How it works
Long agent sessions have a quietly brutal cost structure: because every turn re-sends the entire conversation so far, per-turn input grows linearly with the turn number and total session cost grows quadratically. A 40-turn session that starts at 15k tokens and grows 5k per turn sends 210k tokens on its final turn, and you paid for every intermediate step along the way.
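The quadratic claim can be checked with a few lines of arithmetic. This is a minimal sketch assuming the constant-growth model described above (turn k re-sends the starting context plus k-1 increments); the function names are hypothetical and not part of the calculator. The closed form and the turn-by-turn loop should agree.

```python
def total_input_tokens(turns: int, start: int, growth: int) -> int:
    """Input tokens billed over a session with no compaction.

    Turn k (1-based) re-sends start + growth * (k - 1) tokens, so the total is
    turns * start + growth * turns * (turns - 1) / 2, which is quadratic in turns.
    """
    # turns * (turns - 1) is always even, so integer division is exact.
    return turns * start + growth * turns * (turns - 1) // 2


def total_input_tokens_loop(turns: int, start: int, growth: int) -> int:
    """Same quantity summed turn by turn, as a cross-check on the closed form."""
    return sum(start + growth * (k - 1) for k in range(1, turns + 1))


if __name__ == "__main__":
    for n in (10, 20, 40):
        closed = total_input_tokens(n, 15_000, 5_000)
        assert closed == total_input_tokens_loop(n, 15_000, 5_000)
        final_turn = 15_000 + 5_000 * (n - 1)
        print(f"{n} turns: final turn sends {final_turn:,} tokens, "
              f"session total {closed:,} input tokens")
```

For the 40-turn example that is 4.5M input tokens in total. Doubling a session from 20 to 40 turns roughly doubles the final turn (110k to 210k tokens) but multiplies the session's input bill by 3.6.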
Compaction breaks the quadratic. Summarize the history every N turns, reset the context to a few thousand tokens, and the per-turn cost saw-tooths instead of climbing forever. This calculator models both worlds with the same arithmetic the Agent Session Cost Estimator uses: turn k sends the accumulated context as input at the model's verified per-million rates, plus your configured output per turn.
We model compaction honestly. Each compaction is an extra API call: the full context goes in as input and the summary comes out as output, both billed at real rates, and only then does the context reset to the summary size. Tools that treat summarization as free overstate savings by 10-30% on aggressive intervals. The hero number here is net of those summary calls.
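The accounting described in the two paragraphs above can be sketched as a short simulation. This is a minimal sketch, not the calculator's source: `SessionParams`, `simulate`, the 1,200 output tokens per turn, the 3,000-token summary, and the $3/$15 per-million rates are all illustrative assumptions (check current pricing before relying on them). Each compaction is billed as its own call and charged to the turn that triggers it, and the final turn never compacts.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SessionParams:
    # Illustrative defaults only; these are assumptions, not the calculator's settings.
    turns: int = 40
    start_tokens: int = 15_000
    growth_per_turn: int = 5_000
    output_per_turn: int = 1_200
    summary_tokens: int = 3_000
    input_usd_per_m: float = 3.0    # assumed input rate per million tokens
    output_usd_per_m: float = 15.0  # assumed output rate per million tokens


def simulate(p: SessionParams, interval: Optional[int]) -> Tuple[float, List[float], int]:
    """Return (total USD, per-turn USD, compactions performed).

    interval=None disables compaction. A compaction reads the full context,
    writes the summary, and its cost is added to the turn that triggered it.
    """
    def call_cost(input_tokens: int, output_tokens: int) -> float:
        return (input_tokens * p.input_usd_per_m
                + output_tokens * p.output_usd_per_m) / 1_000_000

    context = p.start_tokens
    per_turn: List[float] = []
    compactions = 0
    for k in range(1, p.turns + 1):
        cost = call_cost(context, p.output_per_turn)  # re-send everything so far
        context += p.growth_per_turn                  # this turn's additions
        # Never compact on the final turn: no later turn would benefit.
        if interval and k % interval == 0 and k < p.turns:
            cost += call_cost(context, p.summary_tokens)  # summarization call
            context = p.summary_tokens                    # reset to summary size
            compactions += 1
        per_turn.append(cost)
    return sum(per_turn), per_turn, compactions


if __name__ == "__main__":
    p = SessionParams()
    without, _, _ = simulate(p, None)
    with_c, per_turn, n = simulate(p, 10)
    print(f"${without:.2f} without, ${with_c:.2f} with, {n} compactions")
    # prints "$14.22 without, $4.79 with, 3 compactions"
```

With these assumed inputs the sketch happens to reproduce the page's headline figures. The per-turn list spikes at turns 10, 20 and 30 (normal turn plus summary call) and drops on the following turn, which is the saw-tooth pattern of the compacted chart.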
The assumptions are deliberately simple and stated up front: context grows by a constant amount per turn, the summary always lands at your configured size, and we never compact on the final turn (no later turn would benefit from it). Real sessions are lumpier (file reads spike the context, some turns are tiny), but constant-growth math brackets the answer well enough to make the architectural decision.
Use the comparison bars to find your interval: if the gray (no-compaction) bars dwarf the green ones for most of the session, you are compacting at roughly the right cadence. If the two charts look the same, your sessions are too short or your contexts too small for compaction to matter; spend the engineering effort elsewhere. And if you want to see what your sessions actually cost rather than what this model predicts, FORG measures it from live traffic.
Frequently asked questions
What is context compaction?
Compaction replaces the accumulated conversation history with a short summary, so subsequent turns send a few thousand tokens of summary instead of tens of thousands of tokens of raw transcript. Claude Code exposes it as the /compact command and also compacts automatically as the context window fills; most agent frameworks do something similar. The trade is fidelity for cost: the model loses verbatim detail but keeps the gist.
Does the calculator include the cost of the summarization call itself?
Yes. Every compaction is modeled as one extra API call that reads the entire accumulated context as input and writes the summary as output, at the selected model's real rates. That is why very frequent compaction on small contexts can come out more expensive than doing nothing: the summary calls eat the savings.
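Why frequent compaction on small contexts can lose money falls out of a single-compaction break-even check. This is a simplified sketch under stated assumptions: it ignores any later compactions, uses assumed $3/$15 per-million rates, and `compaction_net_saving` and `break_even_context` are hypothetical helpers, not calculator functions. Every remaining turn re-sends (context minus summary) fewer tokens, while the compaction call itself reads the whole context and writes the summary at the output rate.

```python
import math


def compaction_net_saving(context: int, summary: int, remaining_turns: int,
                          in_usd_per_m: float = 3.0,
                          out_usd_per_m: float = 15.0) -> float:
    """Net USD saved by one compaction, assuming no further compactions follow.

    Positive means the compaction pays for itself; negative means it loses money.
    """
    saved = (context - summary) * remaining_turns * in_usd_per_m / 1e6
    cost = (context * in_usd_per_m + summary * out_usd_per_m) / 1e6
    return saved - cost


def break_even_context(summary: int, remaining_turns: int,
                       in_usd_per_m: float = 3.0,
                       out_usd_per_m: float = 15.0) -> float:
    """Smallest context size (tokens) at which compacting breaks even.

    Solves (C - S) * R * in = C * in + S * out for C. With one or zero
    remaining turns, the summary call can never pay for itself.
    """
    if remaining_turns <= 1:
        return math.inf
    return (summary * (remaining_turns * in_usd_per_m + out_usd_per_m)
            / ((remaining_turns - 1) * in_usd_per_m))


if __name__ == "__main__":
    # Large context, many turns left: compaction clearly pays.
    print(compaction_net_saving(65_000, 3_000, 30))
    # Tiny context, summary barely smaller, one turn left: it loses money.
    print(compaction_net_saving(2_500, 2_000, 1))
    print(break_even_context(3_000, 10))
```

Under these assumptions, with a 3k summary and 10 turns remaining, compacting breaks even only once the context exceeds 5k tokens; with one turn or fewer left it never does.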
How do I choose the compaction interval?
Watch the two bar charts. Compact too rarely and per-turn cost climbs toward the no-compaction curve before each reset; compact too often and you pay for summary calls that barely shrink anything. For typical agentic coding sessions with 4-6k tokens of growth per turn, intervals of 8-15 turns usually land near the sweet spot, but slide the control and check your own numbers.
Is a smaller summary always better?
Cheaper, yes; better, not necessarily. A 1k-token summary of a 100k-token session discards a lot, and the agent may re-fetch files or re-ask questions, which costs tokens elsewhere. The calculator prices only the direct token math, so leave some slack in the summary size for the context your agent genuinely needs to keep working.
Why do per-turn costs saw-tooth in the compacted chart?
Each compaction turn pays double: the normal turn plus the summarization call, then the next turn restarts from the cheap summary-sized context. The spikes are the summaries; the drops are the resets. The area under the green curve versus the gray one is your saving, and the hero number is exactly that difference.
FORG tracks this automatically across every agent session: live cost attribution, budgets, and alerts.