Cost Per Task Comparator
Price a complete task — calls, retries and context included — across models honestly.
per task on Gemini 3 Flash — the cheapest of your four picks. Claude Sonnet 4.5 costs 5.6× more ($0.54) for the same task.
Total task cost — 9.2 effective calls (8 + 15% retries)
Task volume: 124.2k tokens billed per completed task.
How it works
Per-million-token prices are how vendors sell; cost per completed task is how teams actually spend. This comparator prices a whole task — the number of API calls it takes, the input and output tokens each call carries, and the retries your pipeline really incurs — then shows the total side by side across four models you pick. Everything computes locally in your browser.
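The calculation described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the comparator's actual source: the `Price` type, the `cost_per_task` and `compare` functions, the model names and every price are illustrative placeholders (none are taken from the FORG model table), and the 12,000-input / 1,500-output call shape is an assumed default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per million tokens. Values used below are placeholders, not vendor quotes."""
    input_per_mtok: float
    output_per_mtok: float


def cost_per_task(price: Price, calls: float, retry_rate: float,
                  input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one completed task, with retries billed as full extra calls."""
    effective_calls = calls * (1 + retry_rate)
    cost_per_call = (input_tokens * price.input_per_mtok
                     + output_tokens * price.output_per_mtok) / 1_000_000
    return effective_calls * cost_per_call


# Hypothetical price table: model names and prices are assumptions for illustration.
PRICES = {
    "model-a": Price(input_per_mtok=0.50, output_per_mtok=3.00),
    "model-b": Price(input_per_mtok=3.00, output_per_mtok=15.00),
}


def compare(calls=8, retry_rate=0.15, input_tokens=12_000, output_tokens=1_500):
    """Return (model, cost, multiple of cheapest) tuples, cheapest first."""
    costs = {name: cost_per_task(p, calls, retry_rate, input_tokens, output_tokens)
             for name, p in PRICES.items()}
    cheapest = min(costs.values())
    return [(name, cost, cost / cheapest)
            for name, cost in sorted(costs.items(), key=lambda kv: kv[1])]


for name, cost, multiple in compare():
    print(f"{name}: ${cost:.4f} per task ({multiple:.1f}x the cheapest)")
```

With these placeholder inputs the two models land at roughly $0.10 and $0.54 per task. Swapping in real prices and your own call shape is the only change needed to reuse the sketch.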
The retry rate is the honest part most calculators skip. A retried call resends the full input context and regenerates the output, so it bills as a complete extra call: at 8 calls per task and a 15% retry rate you pay for 8 × 1.15 = 9.2 effective calls. Agentic pipelines routinely see 10-20% retry rates from 429s, malformed JSON and failed validations, and on long-context calls those retries are expensive precisely because the input is large. The effective-calls figure is shown above the bars so the multiplier is never hidden.
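The retry arithmetic, and the reason it hurts more on long-context calls, can be illustrated with a short sketch. The function names are hypothetical and the two call shapes (2,000 versus 30,000 input tokens, both with 1,500 output tokens) are assumed values chosen only to show the contrast.

```python
def effective_calls(calls_per_task: float, retry_rate: float) -> float:
    """Calls billed per task when every retry bills as one complete extra call."""
    if calls_per_task < 0 or retry_rate < 0:
        raise ValueError("calls_per_task and retry_rate must be non-negative")
    return calls_per_task * (1 + retry_rate)


def retry_overhead_tokens(calls_per_task: float, retry_rate: float,
                          input_tokens: int, output_tokens: int) -> float:
    """Tokens billed per task purely because of retries."""
    extra_calls = effective_calls(calls_per_task, retry_rate) - calls_per_task
    return extra_calls * (input_tokens + output_tokens)


print(round(effective_calls(8, 0.15), 2))  # 9.2

# Assumed call shapes: same output size, small versus large input context.
print(round(retry_overhead_tokens(8, 0.15, 2_000, 1_500)))
print(round(retry_overhead_tokens(8, 0.15, 30_000, 1_500)))
```

At the same 15% retry rate, the long-context shape spends about 37,800 tokens per task on retries against about 4,200 for the short one. Nearly all of the difference is resent input.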
Define a task as whatever unit you budget by: a resolved ticket, a generated pull request, a summarized document. Estimate the average calls a completed instance takes and the typical context per call — for agentic coding, input context of 10k-30k tokens per call with 1k-2k output is a reasonable starting shape. The defaults model exactly that, so a real result renders before you touch anything.
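If you already log API calls, the estimates described above can be derived rather than guessed. The following is a sketch only: the `CALL_LOG` records, their field names and the `task_shape` helper are all hypothetical, and the log is assumed to contain completed tasks only.

```python
from statistics import mean

# Hypothetical call log, one record per API call. The field names are
# illustrative and do not correspond to any particular vendor's log format.
CALL_LOG = [
    {"task": "ticket-1", "input_tokens": 10_000, "output_tokens": 1_000, "retry": False},
    {"task": "ticket-1", "input_tokens": 14_000, "output_tokens": 2_000, "retry": False},
    {"task": "ticket-1", "input_tokens": 14_000, "output_tokens": 2_000, "retry": True},
    {"task": "ticket-1", "input_tokens": 18_000, "output_tokens": 1_000, "retry": False},
    {"task": "ticket-2", "input_tokens": 6_000, "output_tokens": 2_000, "retry": False},
]


def task_shape(call_log):
    """Estimate the comparator's inputs from logged calls of completed tasks."""
    first_attempts = [c for c in call_log if not c["retry"]]
    if not first_attempts:
        raise ValueError("call_log contains no first-attempt calls")
    retries = len(call_log) - len(first_attempts)
    tasks = {c["task"] for c in call_log}
    return {
        # Retries are excluded here because the retry rate accounts for them.
        "calls_per_task": len(first_attempts) / len(tasks),
        "retry_rate": retries / len(first_attempts),
        # Token shape is averaged over first attempts; a retry repeats the same shape.
        "input_tokens": mean(c["input_tokens"] for c in first_attempts),
        "output_tokens": mean(c["output_tokens"] for c in first_attempts),
    }


print(task_shape(CALL_LOG))
```

Counting retries separately from first attempts keeps the two sliders independent: calls per task describes how the task is shaped, and retry rate describes how reliable the pipeline is.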
Read the verdict line with appropriate suspicion. The comparison holds calls-per-task constant across models, which flatters cheaper, weaker models — in reality a smaller model may need more turns or more retries to finish the same task, and sometimes fails it outright. The useful signal is magnitude: a 10× gap justifies an experiment with a cheaper model on a slice of traffic; a 1.3× gap rarely justifies the capability risk.
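One way to apply that suspicion is to re-run the gap with a different call count and retry rate for the cheaper model. The sketch below is an illustration, not part of the comparator: the function names are hypothetical, tokens per call are assumed to be identical for both models, and the 12 calls at a 25% retry rate for the weaker model are made-up numbers.

```python
def effective_calls(calls: float, retry_rate: float) -> float:
    """Calls billed per task when each retry bills as one full extra call."""
    return calls * (1 + retry_rate)


def adjusted_gap(gap_at_equal_calls: float,
                 strong_calls: float, strong_retry_rate: float,
                 weak_calls: float, weak_retry_rate: float) -> float:
    """Cost multiple of the stronger model over the cheaper one once each
    model gets its own call count and retry rate (same tokens per call)."""
    return (gap_at_equal_calls
            * effective_calls(strong_calls, strong_retry_rate)
            / effective_calls(weak_calls, weak_retry_rate))


def breakeven_calls(gap_at_equal_calls: float,
                    strong_calls: float, strong_retry_rate: float,
                    weak_retry_rate: float) -> float:
    """Calls per task at which the cheaper model costs as much as the stronger one."""
    return (gap_at_equal_calls
            * effective_calls(strong_calls, strong_retry_rate)
            / (1 + weak_retry_rate))


# Stronger model: 8 calls, 15% retries. Weaker model (assumed): 12 calls, 25% retries.
for gap in (10.0, 1.3):
    print(gap, round(adjusted_gap(gap, 8, 0.15, 12, 0.25), 2))

print(round(breakeven_calls(1.3, 8, 0.15, 0.25), 2))
```

Under these assumed numbers a 10× gap shrinks but stays large, while a 1.3× gap drops below 1, meaning the nominally cheaper model ends up costing more per completed task.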
Model prices come from the shared FORG model table, verified 2026-06-11 against the Anthropic, OpenAI, Google and DeepSeek pricing pages. What this tool estimates from sliders, FORG measures from your real sessions: actual calls per task, actual retry rates and actual cost per merged PR. Those are the numbers worth putting in a planning doc. The share link preserves your full scenario, four models included.
Frequently asked questions
Why price per task instead of per token?
Because nobody ships tokens — they ship completed tasks. A task is several API calls, each carrying context, plus the retries your pipeline actually incurs. Per-token comparisons hide all of that: a model that needs more calls or more retries to finish can cost more in practice than one with a higher sticker price per million tokens.
How does the retry rate affect the total?
Each retry resends the full input context and regenerates output, so it bills like a complete extra call. The calculator multiplies your calls per task by (1 + retry rate): at 8 calls and a 15% retry rate, you are billed for 9.2 effective calls. Retry rates of 10-20% are normal for agentic pipelines hitting rate limits, malformed outputs and validation failures.
What counts as one task?
Whatever unit your team budgets by: a resolved support ticket, a generated pull request, a summarized document, a repaired test suite. Estimate the average number of API calls a completed instance takes, the typical input context per call (system prompt plus accumulated history) and the typical output per call. With those three estimates the comparison works for any task shape.
Should I always pick the cheapest model in the verdict?
No — the comparison assumes every model completes the task in the same number of calls, which favors weaker models. In practice a smaller model may need more turns, more retries or human cleanup. The right read: if the cheapest model is 10× cheaper, it is worth an experiment; if it is 1.3× cheaper, the capability risk probably is not worth it.
FORG tracks this automatically across every agent session — live cost attribution, budgets, and alerts.