Question 1

What is the difference between a sliding window and a running summary?

Accepted Answer

A sliding window keeps the last N turns verbatim and silently drops everything older: simple, lossless within the window, total amnesia beyond it. A running summary maintains a compressed digest of the whole conversation, periodically refreshed by an extra model call. The window spends its tokens on verbatim recent turns; the summary pays refresh overhead in exchange for unlimited (lossy) history.
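A minimal Python sketch of the two strategies, for illustration only. The class names, the `refresh_every` parameter, and the `summarize` hook are hypothetical. `summarize` stands in for the extra model call, and the toy version below only keeps the first word of each turn.

```python
from collections import deque
from typing import Callable, Deque, List


class SlidingWindowMemory:
    """Keeps the last `max_turns` turns verbatim; older turns are dropped."""

    def __init__(self, max_turns: int) -> None:
        # A deque with maxlen evicts its oldest item automatically on append
        self.turns: Deque[str] = deque(maxlen=max_turns)

    def add(self, turn: str) -> None:
        self.turns.append(turn)

    def context(self) -> List[str]:
        return list(self.turns)


class RunningSummaryMemory:
    """Keeps a compressed digest plus the turns not yet folded into it."""

    def __init__(self, summarize: Callable[[str, List[str]], str],
                 refresh_every: int) -> None:
        # `summarize` is a hypothetical hook standing in for an extra model call
        self.summarize = summarize
        self.refresh_every = refresh_every
        self.summary = ""
        self.pending: List[str] = []

    def add(self, turn: str) -> None:
        self.pending.append(turn)
        if len(self.pending) >= self.refresh_every:
            # Refresh: old summary + unsummarized turns -> new summary
            self.summary = self.summarize(self.summary, self.pending)
            self.pending = []

    def context(self) -> List[str]:
        head = [f"Summary so far: {self.summary}"] if self.summary else []
        return head + self.pending


def toy_summarize(old: str, turns: List[str]) -> str:
    # Stand-in for a model call: keeps only the first word of each turn
    return " ".join(filter(None, [old] + [t.split()[0] for t in turns]))


window = SlidingWindowMemory(max_turns=2)
summary = RunningSummaryMemory(toy_summarize, refresh_every=2)
for turn in ["hello there", "foo bar", "baz qux"]:
    window.add(turn)
    summary.add(turn)

print(window.context())   # ['foo bar', 'baz qux']
print(summary.context())  # ['Summary so far: hello foo', 'baz qux']
```

The window has already lost "hello there" entirely. The summary still carries a lossy trace of it ("hello"), at the price of one extra call.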

Question 2

When does each strategy win on cost?

Accepted Answer

Short sessions and small windows favor the window: a few hundred tokens of recent history cost less than maintaining any summary. Long sessions, chatty turns, or wide windows flip the result. Re-sending twelve 600-token turns (7,200 tokens) on every call costs more than sending a 1,500-token digest plus occasional refreshes. Adjust the session length in the planner and watch where the two cost bars cross; that crossover is the real answer for your numbers.
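The crossover can be reproduced with a simplified cost model. This sketch makes several assumptions:

- Every turn is the same size.
- The digest stays a fixed size.
- A refresh runs every `refresh_every` turns.
- The system prompt and the current message are left out, because both strategies send them identically.

The prices are placeholder integers in USD per million tokens, not verified rates. Multiplying tokens by such a price gives micro-dollars, which keeps the comparison exact. All function names are hypothetical.

```python
from typing import Optional


def window_cost(session_turns: int, turn_tokens: int, window_turns: int,
                in_price: int) -> int:
    """Session cost of a sliding window, in micro-dollars.

    Prices are integer USD per million tokens, so tokens * price is micro-dollars
    and all arithmetic stays exact.
    """
    total = 0
    for call in range(session_turns):
        # Call k can re-send only the k earlier turns, capped at the window size
        total += min(call, window_turns) * turn_tokens * in_price
    return total


def summary_cost(session_turns: int, turn_tokens: int, summary_tokens: int,
                 refresh_every: int, in_price: int, out_price: int) -> int:
    """Session cost of a running summary, refresh calls included, in micro-dollars."""
    total = 0
    digest = 0   # digest size in tokens; zero until the first refresh
    pending = 0  # turns not yet folded into the digest
    for _ in range(session_turns):
        # Main call: digest plus the unsummarized turns, sent as input
        total += (digest + pending * turn_tokens) * in_price
        pending += 1
        if pending == refresh_every:
            # Refresh call: reads old digest + pending turns, writes a new digest
            total += (digest + pending * turn_tokens) * in_price
            total += summary_tokens * out_price
            digest, pending = summary_tokens, 0
    return total


def crossover(max_turns: int, **p: int) -> Optional[int]:
    """Shortest session length at which the summary is strictly cheaper, else None."""
    for n in range(1, max_turns + 1):
        w = window_cost(n, p["turn_tokens"], p["window_turns"], p["in_price"])
        s = summary_cost(n, p["turn_tokens"], p["summary_tokens"],
                         p["refresh_every"], p["in_price"], p["out_price"])
        if s < w:
            return n
    return None


# Placeholder prices: 3 and 15 USD per million input/output tokens
params = dict(turn_tokens=600, window_turns=12, summary_tokens=1500,
              refresh_every=6, in_price=3, out_price=15)
print(crossover(100, **params))  # 15
```

Take the numbers from the answer above (twelve 600-token turns versus a 1,500-token digest), add a refresh every six turns, and apply these placeholder prices. Under those inputs the summary becomes cheaper at a 15-turn session. Before its first refresh the summary strategy has no digest and sends its history verbatim, so it cannot undercut the window on very short sessions.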

Question 3

Are the summary refresh calls included in the cost?

Accepted Answer

Yes, at full price. Every refresh is modeled as a real API call that reads the old summary plus all unsummarized turns as input and writes the new summary as output, at the selected model's verified rates. Calculators that treat summarization as free systematically flatter the summary strategy — ours bills it.
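The following sketch shows how a single refresh call is billed under the description above. The old summary plus the unsummarized turns count as input, and the new summary counts as output. `Rates` and `refresh_call_cost` are hypothetical names. The prices are placeholders, not any model's verified pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-million-token prices in USD (hypothetical placeholders)."""
    input_per_mtok: float
    output_per_mtok: float


def refresh_call_cost(old_summary_tokens: int, unsummarized_tokens: int,
                      new_summary_tokens: int, rates: Rates) -> float:
    """Dollar cost of one summary refresh, billed like any other API call."""
    # Input: the previous digest plus every turn not yet folded into it
    input_tokens = old_summary_tokens + unsummarized_tokens
    # Output: the newly written digest
    return (input_tokens * rates.input_per_mtok
            + new_summary_tokens * rates.output_per_mtok) / 1_000_000


# Example: a 1,500-token digest refreshed after six 600-token turns
rates = Rates(input_per_mtok=3.0, output_per_mtok=15.0)
cost = refresh_call_cost(1_500, 6 * 600, 1_500, rates)
print(f"${cost:.4f} per refresh")  # $0.0378 per refresh
```

In this example the output side ($0.0225) outweighs the input side ($0.0153), because writing the digest is billed at the higher output rate. Dropping this term is what makes a free-summarization calculator flatter the summary strategy.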

Question 4

Which strategy is better for quality, not just cost?

Accepted Answer

They fail differently. Windows preserve exact wording but forget commitments made before the cutoff, which users notice as the bot contradicting itself. Summaries remember the whole arc but blur specifics — exact numbers, code identifiers, precise phrasing. Many production systems hybridize: a summary for the distant past plus a short verbatim window for recent turns.
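One way to sketch the hybrid is to let turns that slide out of a short verbatim window accumulate until they are folded into the summary. `HybridMemory`, `fold_every`, and the toy `summarize` function are illustrative assumptions, not a production design. A real system would call a model where the toy keeps only the first word of each turn.

```python
from collections import deque
from typing import Callable, Deque, List


class HybridMemory:
    """Summary for the distant past plus a short verbatim window (sketch)."""

    def __init__(self, window_turns: int,
                 summarize: Callable[[str, List[str]], str],
                 fold_every: int) -> None:
        self.window: Deque[str] = deque()
        self.window_turns = window_turns
        self.summarize = summarize   # hypothetical hook for a model call
        self.fold_every = fold_every
        self.evicted: List[str] = []  # left the window, not yet summarized
        self.summary = ""

    def add(self, turn: str) -> None:
        self.window.append(turn)
        if len(self.window) > self.window_turns:
            # The oldest verbatim turn leaves the window but is kept for the digest
            self.evicted.append(self.window.popleft())
        if len(self.evicted) >= self.fold_every:
            # Fold evicted turns into the summary with one extra call
            self.summary = self.summarize(self.summary, self.evicted)
            self.evicted = []

    def context(self) -> List[str]:
        head = [f"Summary of earlier conversation: {self.summary}"] if self.summary else []
        # Evicted-but-unfolded turns are still sent verbatim, so nothing is lost
        return head + self.evicted + list(self.window)


def toy_summarize(old: str, turns: List[str]) -> str:
    # Stand-in for a model call: keeps only the first word of each turn
    return " ".join(filter(None, [old] + [t.split()[0] for t in turns]))


memory = HybridMemory(window_turns=2, summarize=toy_summarize, fold_every=2)
for turn in ["one x", "two x", "three x", "four x", "five x"]:
    memory.add(turn)
print(memory.context())
# ['Summary of earlier conversation: one two', 'three x', 'four x', 'five x']
```

Recent turns keep their exact wording, while older commitments survive in compressed form instead of vanishing at the window's edge.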

Question 5

Why no prompt caching in this comparison?

Accepted Answer

Caching helps both strategies, and by different amounts depending on the provider, which would bury the structural comparison under provider-specific assumptions. The uncached numbers isolate the memory-design decision. Once you pick a strategy, the Prompt Caching ROI Calculator shows what caching does to its absolute cost.

Conversation Memory Planner
