Agent Loop Cost Simulator
Monte Carlo your agent's retry and loop behavior to see p95 spend and runaway risk.
1,000 runs · seed 42 — shares reproduce exactly
$4.61 p95 session cost on Claude Sonnet 4.5 (median $2.81, p99 $5.21). 0.0% of sessions hit the 50-turn cap. Suggested cap: ~45 turns (1.5× p95 turn count).
Cost distribution (1,000 simulated sessions)
- p50: $2.81
- p95: $4.61
- p99: $5.21
How it works
This simulator runs 1,000 Monte Carlo agent sessions in your browser and shows the cost distribution — not the average, the distribution. Set a model, an expected session length, a tool-failure retry probability and a max-turn cap, and you get p50/p95/p99 session cost, a histogram of all 1,000 runs, and the percentage of sessions that hit the cap: your runaway risk.
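The summary numbers described above can be sketched as follows. This is a minimal illustration, not the simulator's actual code: the `SessionResult` shape, the `summarize` helper, and the choice of the nearest-rank percentile rule are all assumptions made for the example.

```typescript
// Hypothetical shape for the outcome of one simulated session.
interface SessionResult {
  costUsd: number; // total session cost in dollars
  turns: number; // turns actually executed
  hitCap: boolean; // true if the session reached the max-turn cap
}

// Nearest-rank percentile: the smallest sample with at least p% of all
// samples at or below it. Other interpolation rules give slightly
// different values.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))];
}

// Reduce a batch of sessions to the headline figures: p50/p95/p99 cost
// and the percentage of sessions that hit the cap (the runaway risk).
function summarize(results: SessionResult[]) {
  const costs = results.map((r) => r.costUsd);
  const capped = results.filter((r) => r.hitCap).length;
  return {
    p50: percentile(costs, 50),
    p95: percentile(costs, 95),
    p99: percentile(costs, 99),
    runawayPct: (100 * capped) / results.length,
  };
}
```

Sorting a copy keeps the caller's array in run order, which matters if the same results also feed a histogram.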
Each simulated session works like a real agent loop. It starts with a base context and walks turn by turn; every turn sends the accumulated context (which grows as history and tool results pile up) and generates output tokens at the model's prices. On each turn there is a chance — your retry slider — that a tool call fails and adds extra turns. With loop detection on, repeated failures get progressively suppressed, modeling an agent framework that notices it is going in circles; with it off, failures can chain freely, which is how a $0.40 session becomes a $12 one. Sessions that reach the max-turn cap are counted as runaways.
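A single session of the kind described above might look roughly like this. Treat it as a sketch under stated assumptions: every parameter name is hypothetical, context is assumed to grow by a fixed amount per turn, and "progressively suppressed" is modeled here by multiplying the retry probability by a `suppression` factor after each failure, which is one plausible reading rather than the simulator's confirmed rule.

```typescript
// All field names are illustrative assumptions, not the simulator's real inputs.
interface SimParams {
  baseContextTokens: number; // context present before the first turn
  growthPerTurnTokens: number; // history and tool results added each turn
  outputTokensPerTurn: number;
  inputPricePerMTok: number; // USD per million input tokens
  outputPricePerMTok: number; // USD per million output tokens
  plannedTurns: number; // session length if nothing fails
  retryProbability: number; // per-turn chance that a tool call fails
  extraTurnsPerFailure: number; // turns added by each failure
  maxTurns: number; // hard turn cap
  loopDetection: boolean;
  suppression: number; // assumed: retry probability is scaled by this after each failure
}

function simulateSession(p: SimParams, rng: () => number = Math.random) {
  let context = p.baseContextTokens;
  let remaining = p.plannedTurns;
  let retryProb = p.retryProbability;
  let turns = 0;
  let costUsd = 0;

  while (remaining > 0 && turns < p.maxTurns) {
    turns++;
    remaining--;
    // Each turn resends the whole accumulated context and emits new output.
    costUsd +=
      (context * p.inputPricePerMTok +
        p.outputTokensPerTurn * p.outputPricePerMTok) / 1e6;
    context += p.growthPerTurnTokens;

    // A failed tool call extends the session.
    if (rng() < retryProb) {
      remaining += p.extraTurnsPerFailure;
      // With loop detection on, each failure makes the next one less likely.
      if (p.loopDetection) retryProb *= p.suppression;
    }
  }
  // Sessions that reach the cap count as runaways.
  return { costUsd, turns, hitCap: turns >= p.maxTurns };
}
```

Passing the random source in as `rng` lets the same function run against a seeded generator for reproducible results or a fixed stub for testing.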
The reason to care about p95 rather than the mean is that costs compound nonlinearly with session length. A session twice as long does not cost twice as much: later turns carry all the context of earlier ones, so cost grows roughly quadratically in turns. That is why the histogram has a long right tail and why teams that budget on average session cost get surprise invoices. The cap recommendation line applies a simple heuristic, about 1.5× your simulated p95 turn count, which keeps legitimate long sessions alive while cutting the tail off the distribution.
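The quadratic growth can be made concrete with a simplified cost model. The symbols are assumptions introduced for illustration: a base context of $c_0$ tokens, a fixed growth of $g$ tokens per turn, $o$ output tokens per turn, and per-token prices $p_{\text{in}}$ and $p_{\text{out}}$. Real context growth is not constant, so this is a rough shape rather than an exact bill.

```latex
% Cost of an n-turn session: turn t sends c_0 + (t-1) g input tokens.
C(n) = \sum_{t=1}^{n} \Bigl[ p_{\text{in}} \bigl( c_0 + (t-1)\, g \bigr) + p_{\text{out}}\, o \Bigr]
     = n \bigl( p_{\text{in}}\, c_0 + p_{\text{out}}\, o \bigr)
       + p_{\text{in}}\, g \, \frac{n(n-1)}{2}

% Doubling the session length:
C(2n) = 2n \bigl( p_{\text{in}}\, c_0 + p_{\text{out}}\, o \bigr)
        + p_{\text{in}}\, g \, n(2n-1)
```

The first term doubles when $n$ doubles, but the context-growth term goes from $n(n-1)/2$ to $n(2n-1)$, roughly a fourfold increase. Once accumulated history dominates the base context, a session twice as long costs close to four times as much.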
The randomness is seeded with mulberry32, so a shared link reproduces the exact histogram you are looking at — useful when you are arguing for a turn cap in a design review and want everyone staring at the same 1,000 runs. And to be clear about the model's limits: real retry probabilities are not constant per turn, context growth varies by task, and your actual distribution will differ. The simulation's job is to make the tail risk visible and roughly sized; measuring it for real, per session and per developer, is what FORG does.
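For reference, here is the widely circulated mulberry32 routine, written out as a sketch. The simulator's own source is not shown on this page, so whether it uses exactly this variant is an assumption; the point is only to show why a fixed seed yields a fixed sequence.

```typescript
// mulberry32: a small seeded PRNG with 32 bits of state.
// Returns a function that yields floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed | 0; // coerce the seed to a 32-bit integer
  return () => {
    a = (a + 0x6d2b79f5) | 0; // advance the state
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two generators built from the same seed produce the same draws,
// which is what makes a shared result reproducible.
const first = mulberry32(42);
const second = mulberry32(42);
const runA = Array.from({ length: 5 }, () => first());
const runB = Array.from({ length: 5 }, () => second());
```

The generator keeps its state in a closure, so each simulation run can own its own instance without touching global randomness.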
Frequently asked questions
Why simulate agent costs instead of just multiplying averages?
Because agent session costs are not normally distributed: they have a long right tail. A typical session might cost $0.40, but retries compound: each failed tool call adds a turn, each added turn carries the full accumulated context, and occasionally a session spirals to the turn cap and costs 10× the median. Budgeting on the typical session systematically underestimates your bill; the p95 and p99 sessions are where the overrun comes from.
What causes runaway agent sessions in practice?
The classic patterns: a tool that fails deterministically (so every retry fails the same way), the model oscillating between two approaches without converging, a test suite the agent keeps re-running after non-fixes, and lost context after compaction causing the agent to redo completed work. Each retry adds turns, each turn resends the growing context, and cost compounds quadratically until something — ideally your turn cap — stops it.
What is a sane max-turn cap for coding agents?
Look at your p95 successful-session turn count and set the cap at 1.5 to 2× that value. For typical coding agents that means 40-80 turns. Too low and you kill legitimate long sessions mid-task, wasting everything spent so far; too high and runaways burn budget for no benefit, because a session that has looped 60 times almost never recovers on turn 90. The simulator's cap recommendation applies this heuristic to your distribution, using the 1.5× end of the range.
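The heuristic is simple enough to sketch in a few lines. The function name `suggestTurnCap` and the nearest-rank percentile rule are assumptions for illustration; the multiplier defaults to 1.5 and can be raised toward 2 for a more permissive cap.

```typescript
// Suggest a max-turn cap from the turn counts of sessions that finished
// successfully: take the p95 turn count and scale it by a multiplier.
function suggestTurnCap(successfulTurnCounts: number[], multiplier = 1.5): number {
  if (successfulTurnCounts.length === 0) throw new Error("no sessions");
  if (multiplier < 1) throw new Error("multiplier must be at least 1");
  const sorted = [...successfulTurnCounts].sort((a, b) => a - b);
  // Nearest-rank p95 (an assumed percentile rule).
  const rank = Math.ceil((95 * sorted.length) / 100);
  const p95 = sorted[Math.max(0, rank - 1)];
  return Math.ceil(p95 * multiplier);
}
```

For example, a p95 of 30 turns gives a cap of 45 at the default multiplier and 60 at 2×.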
How does FORG help with runaway sessions?
This simulator estimates the risk in advance; FORG measures it live. It tracks per-session spend across every agent in your team in real time, alerts when a session's cost trajectory looks like a runaway (cost accelerating turn-over-turn), and enforces hard budget caps that stop the spend automatically instead of waiting for a human to notice the bill. The simulation tells you what cap to set; FORG enforces it.
Why do shared links reproduce the exact same histogram?
The simulation uses a seeded pseudo-random number generator (mulberry32), so the same seed and inputs always produce the identical 1,000 runs. When you share a result link, the seed travels with it — your teammate sees exactly the distribution you saw, not a fresh roll of the dice. Hit re-roll to draw a new seed if you want to check the result is stable across seeds.