AI Bill Diagnostic
A guided diagnostic that finds what blew up your AI bill — loops, cache misses or retries.
How it works
A surprise AI bill almost always has one of five causes, and each leaves a different fingerprint. This quiz walks through the five questions an experienced operator would ask (bill shape, model mix, retry behavior, cache setup and agent loop limits), then ranks the likely culprits and gives a concrete fix checklist for each. It takes five skippable steps and runs entirely in your browser; your answers are encoded into the URL, so you can share the diagnosis with your team.
The ranking logic mirrors real incident triage. A sudden spike points at runaway loops or retry storms, because both are burst phenomena: a stuck agent compounds context turn after turn, and a retry storm multiplies traffic during a provider wobble. Gradual creep points at structural waste — cache misses and frontier models doing routine work — because structural waste scales smoothly with usage. A step change usually means a default changed: a tool update switched models, a refactor moved a cache breakpoint, a new team came aboard.
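If you have daily spend totals, the bill shape can be labeled mechanically rather than by eye. The sketch below assumes a plain list of daily totals. The function name and the 1.5× step, 3× spike and 1.2× creep thresholds are hypothetical choices for illustration, not the quiz's internal logic. It checks for a sustained step before a spike, so a jump that holds for at least two days is not mistaken for a one-day burst.

```python
from statistics import median


def classify_bill_shape(daily_spend: list[float],
                        step_factor: float = 1.5,
                        spike_factor: float = 3.0,
                        creep_factor: float = 1.2) -> str:
    """Label daily spend totals as 'step', 'spike', 'creep' or 'flat'.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    if len(daily_spend) < 4:
        raise ValueError("need at least four days of spend data")

    # Step: a day-over-day jump whose new level holds for at least two days
    # (a changed default keeps costing money after the day it changed).
    for i in range(1, len(daily_spend) - 1):
        prev = daily_spend[i - 1]
        if prev > 0 and daily_spend[i] >= step_factor * prev:
            before = median(daily_spend[:i])
            after = median(daily_spend[i:])
            if before > 0 and after >= step_factor * before:
                return "step"

    # Spike: a burst far above the typical day that does not persist
    # (the signature of runaway loops and retry storms).
    typical = median(daily_spend)
    if typical > 0 and max(daily_spend) > spike_factor * typical:
        return "spike"

    # Creep: no single jump, but the later half runs noticeably higher
    # (structural waste scaling with usage).
    half = len(daily_spend) // 2
    early = median(daily_spend[:half])
    late = median(daily_spend[half:])
    if early > 0 and late >= creep_factor * early:
        return "creep"
    return "flat"


print(classify_bill_shape([10, 11, 9, 10, 60, 10, 11]))       # spike
print(classify_bill_shape([10, 10, 10, 10, 20, 20, 20, 20]))  # step
print(classify_bill_shape([10, 11, 12, 13, 14, 15, 16, 17]))  # creep
```

Medians are used instead of means so that one burst day does not drag the baseline upward and hide itself.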
The fifth diagnosis is the meta-cause: if you answered "not sure" more than once, your real problem is that the bill is a single opaque number. Monthly totals cannot localize anything — you need spend attributed per session, per developer and per repository before any of the other four causes can even be confirmed. That is why the "no visibility" diagnosis scores up sharply with every unknown answer, and why its fix list starts with attribution rather than optimization.
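Attribution has to happen when each call is made, because a monthly total cannot be split apart afterwards. The sketch below assumes every billed call is logged with a session, developer and repository tag. The UsageRecord schema and the attribute_spend helper are hypothetical names used for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One billed API call, tagged when it is made (hypothetical schema)."""
    session_id: str
    developer: str
    repository: str
    cost_usd: float


def attribute_spend(records: list[UsageRecord]) -> dict[str, dict[str, float]]:
    """Roll call-level cost up per session, per developer and per repository."""
    totals = {
        "session": defaultdict(float),
        "developer": defaultdict(float),
        "repository": defaultdict(float),
    }
    for r in records:
        totals["session"][r.session_id] += r.cost_usd
        totals["developer"][r.developer] += r.cost_usd
        totals["repository"][r.repository] += r.cost_usd
    # Convert to plain dicts so lookups of unknown keys raise instead of reading 0.
    return {dimension: dict(by_key) for dimension, by_key in totals.items()}


records = [
    UsageRecord("s1", "ana", "api", 2.50),
    UsageRecord("s1", "ana", "api", 1.50),
    UsageRecord("s2", "ben", "web", 40.00),
]
print(attribute_spend(records)["session"])  # {'s1': 4.0, 's2': 40.0}
```

With even this much tagging, a single expensive session stands out immediately, while the same $44 in a monthly total says nothing about where it came from.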
Treat the confidence bars honestly: they are relative weightings from a five-question heuristic, not a forensic audit. The quiz tells you where to look first, and each diagnosis links to the calculator that quantifies it: the session estimator for loops, the caching ROI tool for misses and the downgrade advisor for model overuse. Confirming the culprit requires real per-session data, which is what FORG records continuously; every diagnosis on this page is a condition it alerts on automatically, before the anomaly compounds into a surprise invoice.
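One way to produce relative weightings like the confidence bars is to multiply per-answer evidence into a score for each cause, then normalize the scores so they sum to 1. The sketch below works under stated assumptions: the question keys, answer values and weights are hypothetical, and doubling the no-visibility score for each "not sure" answer is an illustrative stand-in for "scores up sharply".

```python
CAUSES = ("runaway_loop", "retry_storm", "cache_miss", "model_overuse", "no_visibility")

# Hypothetical evidence weights for (question, answer) pairs. Answers that are
# not listed here leave every score unchanged.
EVIDENCE: dict[tuple[str, str], dict[str, float]] = {
    ("bill_shape", "spike"): {"runaway_loop": 3.0, "retry_storm": 3.0},
    ("bill_shape", "creep"): {"cache_miss": 3.0, "model_overuse": 3.0},
    ("bill_shape", "step"): {"cache_miss": 2.0, "model_overuse": 2.0},
    ("model_mix", "frontier_everywhere"): {"model_overuse": 2.0},
    ("retries", "unbounded"): {"retry_storm": 2.0},
    ("cache", "not_configured"): {"cache_miss": 2.0},
    ("loop_limits", "none"): {"runaway_loop": 2.0},
}


def rank_causes(answers: dict[str, str]) -> list[tuple[str, float]]:
    """Return (cause, relative weight) pairs, highest first; weights sum to 1."""
    scores = dict.fromkeys(CAUSES, 1.0)  # every cause starts with equal weight
    for question, answer in answers.items():
        if answer == "not_sure":
            # Each unknown doubles the meta-cause: the bill is too opaque to tell.
            scores["no_visibility"] *= 2.0
            continue
        for cause, weight in EVIDENCE.get((question, answer), {}).items():
            scores[cause] *= weight
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(cause, score / total) for cause, score in ranked]


answers = {"bill_shape": "spike", "model_mix": "not_sure", "retries": "not_sure",
           "cache": "not_sure", "loop_limits": "none"}
for cause, weight in rank_causes(answers):
    print(f"{cause:14} {weight:.2f}")
```

In this example, three "not sure" answers put no_visibility (8/19) ahead of runaway_loop (6/19), even though the spike and missing loop limits both point at a loop. Because the output is normalized, the numbers only say how the causes compare with each other; they are not probabilities that any one cause is present.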
Frequently asked questions
What does a runaway agent loop look like on a bill?
A sharp single-day spike, often from one developer's machine. An agent gets stuck retrying the same failing approach — edit, test, fail, edit — and because context accumulates every turn, the late turns of a stuck session cost many times the early ones. One overnight loop on a frontier model can burn a four-figure sum. Hard turn caps and same-action loop detection are the fix.
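Both guards fit in a few lines. The sketch below assumes each agent turn can be reduced to an action signature string, for example a tool name plus its arguments. The LoopGuard class and its default limits are hypothetical. It counts repeats inside a sliding window instead of only consecutive repeats, so it also catches the edit, test, fail, edit cycle, where the same edit recurs every few turns.

```python
from collections import deque
from typing import Optional


class LoopGuard:
    """Stops an agent session on a hard turn cap or on a repeating action.

    Hypothetical helper; the default limits are illustrative, not recommendations.
    """

    def __init__(self, max_turns: int = 50, window: int = 10, repeat_limit: int = 3) -> None:
        self.max_turns = max_turns
        self.repeat_limit = repeat_limit
        self.turns = 0
        self.recent = deque(maxlen=window)  # last `window` action signatures

    def check(self, action: str) -> Optional[str]:
        """Record one turn's action signature; return a stop reason, or None to continue."""
        self.turns += 1
        if self.turns > self.max_turns:
            return f"hard cap of {self.max_turns} turns reached"
        self.recent.append(action)
        # The same action recurring inside the window means the agent is cycling.
        if self.recent.count(action) >= self.repeat_limit:
            return f"{action!r} repeated {self.repeat_limit} times in {len(self.recent)} turns"
        return None


guard = LoopGuard(max_turns=50, window=10, repeat_limit=3)
for action in ["edit app.py", "run tests"] * 5:
    reason = guard.check(action)
    if reason:
        print(f"stopping at turn {guard.turns}: {reason}")  # stops at turn 5
        break
```

The turn cap bounds the worst case. The repeat check usually fires much earlier, which matters because the late turns of a stuck session are the expensive ones.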
How do cache misses inflate a bill without anything breaking?
Prompt caching fails silently: if the prefix is not byte-identical between calls, or calls arrive further apart than the 5-minute TTL, every call pays full input rates instead of the ~90%-discounted cache-read rate. Nothing errors, latency barely changes, and the bill is simply 3-5× what it should be for agentic workloads. The tell is a high bill with normal usage patterns.
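The arithmetic behind the multiplier, and a check for the two silent failure modes, can be sketched as follows. The sketch assumes cache reads bill at 10% of the base input rate (the ~90% discount above) and that the 5-minute TTL is measured between consecutive calls. It counts only input tokens and ignores any cache-write surcharge a provider may add. The function names are hypothetical.

```python
import hashlib

CACHE_TTL_SECONDS = 5 * 60    # the 5-minute TTL described above
CACHE_READ_MULTIPLIER = 0.10  # cached input billed at ~10% of the base rate


def fingerprint(prefix: str) -> str:
    """Hash the exact bytes of a prompt prefix so calls can be compared in logs."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


def expect_cache_hit(prev_fingerprint: str, prev_sent_at: float,
                     prefix: str, sent_at: float) -> bool:
    """A hit needs a byte-identical prefix AND a gap no longer than the TTL."""
    same_bytes = fingerprint(prefix) == prev_fingerprint
    within_ttl = sent_at - prev_sent_at <= CACHE_TTL_SECONDS
    return same_bytes and within_ttl


def miss_penalty(prefix_share: float) -> float:
    """How many times more a full-price call costs than a cache hit (input only).

    prefix_share is the fraction of input tokens that sit in the cached prefix.
    """
    if not 0.0 <= prefix_share <= 1.0:
        raise ValueError("prefix_share must be between 0 and 1")
    hit_cost = (1.0 - prefix_share) + prefix_share * CACHE_READ_MULTIPLIER
    return 1.0 / hit_cost


fp = fingerprint("You are a coding agent.\n")
print(expect_cache_hit(fp, 0.0, "You are a coding agent.\n", 120.0))   # True
print(expect_cache_hit(fp, 0.0, "You are a coding agent. \n", 120.0))  # False: one extra space
print(round(miss_penalty(0.9), 1))                                     # 5.3
```

With 80% to 90% of input tokens in the cached prefix, the miss penalty works out to roughly 3.6× to 5.3×, which is consistent with the 3-5× range above. Logging the fingerprint per call shows when a prefix starts drifting, for example because it embeds a timestamp.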
What is a retry storm and why is it expensive?
An integration that retries failures without backoff or caps. During a provider incident, every client retries simultaneously, gets rate-limited and retries again, multiplying traffic exactly when the provider is degraded. The expensive subtlety: requests that fail after generating partial output still bill those tokens. Bounded retries (at most 3 retries, with exponential backoff and jitter) cap the damage at 4× a single call: the original request plus three retries.
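Here is a minimal sketch of a bounded retry wrapper. RetryableError is a hypothetical stand-in for whatever your client raises on a 429 or 5xx, and the delays and the injectable sleep parameter are illustrative. It uses "full jitter" (a random delay between zero and the exponential cap), so clients hit by the same incident spread out instead of retrying in lockstep.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying, such as HTTP 429 or 5xx."""


def call_with_retries(call: Callable[[], T],
                      max_retries: int = 3,
                      base_delay: float = 1.0,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying at most `max_retries` times with full-jitter backoff.

    Worst case is 1 + max_retries calls: 4x a single call with the defaults.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RetryableError:
            if attempt == max_retries:
                raise  # budget spent: surface the error instead of retrying forever
            # Exponential cap (1s, 2s, 4s, ... up to max_delay) with full jitter:
            # a random delay below the cap keeps clients from retrying in lockstep.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0.0, cap))
    raise AssertionError("unreachable")


calls = []


def always_rate_limited() -> str:
    calls.append(1)
    raise RetryableError("429 Too Many Requests")


try:
    call_with_retries(always_rate_limited, sleep=lambda seconds: None)
except RetryableError:
    print(len(calls))  # 4
```

With max_retries=3, the worst case is four calls, which is the 4× bound above. Raising once the budget is spent is what turns a silent retry storm into a visible error.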
How would alerting have caught this before the invoice?
Every cause this quiz diagnoses leaves a real-time signature: a session exceeding 3× the median cost, a cache hit rate dropping below baseline, a burst of 429-then-retry patterns, a developer's daily spend doubling. FORG watches those signals per session and per developer and alerts when they fire — so you find out about a runaway loop in minutes, not when finance forwards the invoice.
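Here is a minimal sketch of those four checks, evaluated for one developer's day. The Snapshot fields, the 20-retry burst threshold and the 80%-of-baseline tolerance for the cache hit rate are hypothetical. The 3× median and doubling thresholds come from the text above. The sketch illustrates the signals, not how FORG computes them.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One developer's metrics for one day (hypothetical fields)."""
    session_costs: list[float]   # cost of each session today
    median_session_cost: float   # typical session cost from recent history
    cache_hit_rate: float        # share of input tokens read from cache today
    baseline_hit_rate: float     # e.g. a trailing average hit rate
    retries_after_429: int       # retried requests that followed a 429 today
    daily_spend: float
    prev_daily_spend: float


def fired_alerts(s: Snapshot,
                 session_factor: float = 3.0,
                 hit_rate_tolerance: float = 0.8,
                 retry_burst: int = 20) -> list[str]:
    """Return the alert conditions that hold for this snapshot."""
    alerts = []
    expensive = [c for c in s.session_costs if c > session_factor * s.median_session_cost]
    if expensive:
        alerts.append(f"{len(expensive)} session(s) above {session_factor:g}x median cost")
    # Allow some noise: only fire once the hit rate falls well below baseline.
    if s.cache_hit_rate < hit_rate_tolerance * s.baseline_hit_rate:
        alerts.append("cache hit rate below baseline")
    if s.retries_after_429 >= retry_burst:
        alerts.append("429-then-retry burst")
    if s.prev_daily_spend > 0 and s.daily_spend >= 2 * s.prev_daily_spend:
        alerts.append("daily spend doubled")
    return alerts


today = Snapshot(session_costs=[4.0, 5.0, 60.0], median_session_cost=5.0,
                 cache_hit_rate=0.30, baseline_hit_rate=0.85,
                 retries_after_429=2, daily_spend=69.0, prev_daily_spend=20.0)
print(fired_alerts(today))
# ['1 session(s) above 3x median cost', 'cache hit rate below baseline', 'daily spend doubled']
```

Evaluated per session as calls arrive, rather than once per day, the first check would flag the $60 session while it was still running.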
FORG tracks this automatically across every agent session — live cost attribution, budgets, and alerts.