Multi-Model Blend Calculator
Split traffic across models and see your true blended rate per million tokens.
blended rate across your mix — $1,100.10/month at 500M tokens. Claude Sonnet 4.5 drives 49% of cost on 20% of traffic.
Cost share vs traffic share
Effective rate per model = input rate × 80% + output rate × 20%.
How it works
Almost no team runs one model anymore. The routine work goes to a small model, the hard reasoning to a frontier one, and somewhere in between sits a default. This calculator turns that routing reality into a single number — the blended rate per million tokens — and a monthly cost at your volume, so a multi-model stack can be budgeted as plainly as a single-model one.
Set up to four models with traffic-share sliders. Shares that do not sum to 100% are normalized proportionally and flagged, so partial edits never silently corrupt the math. The output-token share slider sets your input/output mix: because output rates run 4-8× input rates on every provider, each model's effective rate is computed as input rate × (1 − output share) + output rate × output share. Agentic coding traffic typically runs 10-25% output; verbose chat workloads run higher.
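As a minimal sketch of the effective-rate formula above: the function name and the example prices ($3/M input, $15/M output) are hypothetical and are not taken from the FORG model table.

```python
def effective_rate(input_rate: float, output_rate: float, output_share: float) -> float:
    """Blend a model's input and output prices (USD per million tokens).

    output_share is the fraction of tokens that are output, between 0 and 1.
    """
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    return input_rate * (1.0 - output_share) + output_rate * output_share


# Hypothetical prices: $3/M input, $15/M output, with 20% output tokens.
rate = effective_rate(3.0, 15.0, 0.20)
print(f"${rate:.2f}/M")  # prints $5.40/M
```

Raising the output share moves the result toward the output price, which is why verbose workloads cost more per token than agentic ones on the same model.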
The most useful thing on the page is the cost-versus-traffic comparison. Per-token prices span two orders of magnitude across the current model table, so a frontier model carrying 20% of your traffic can produce 80% or more of your bill when the rest runs on a much cheaper model. The bars show each model's cost share next to its traffic share, and the verdict line names the dominant cost driver explicitly: that is the model where a routing change or a downgrade experiment pays off first.
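The comparison can be sketched as follows, assuming cost is proportional to traffic share times effective rate. The function name, model names and rates are hypothetical, chosen only to show a 25× price gap.

```python
def cost_shares(models):
    """Return {name: cost share} for (name, traffic_share, effective_rate) rows.

    Each model's cost is proportional to traffic_share * effective_rate.
    """
    weighted = {name: share * rate for name, share, rate in models}
    total = sum(weighted.values())
    if total <= 0:
        raise ValueError("mix must carry some cost")
    return {name: cost / total for name, cost in weighted.items()}


# Hypothetical mix: a frontier model at $5.00/M and a small model at $0.20/M.
mix = [("frontier", 0.20, 5.00), ("small", 0.80, 0.20)]
shares = cost_shares(mix)
driver = max(shares, key=shares.get)  # the dominant cost driver
print(f"{driver}: {shares[driver]:.0%} of cost on 20% of traffic")
```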
Model prices come from the shared FORG model table, verified 2026-06-11 against the Anthropic, OpenAI, Google and DeepSeek pricing pages. The blend math itself is exact for the rates shown; what it cannot know is your true output share or how your router actually splits traffic under load. Treat the result as the clean baseline to compare your invoice against.
If the blended rate looks higher than expected, the playbook is consistent: find the cost driver, check what fraction of its traffic genuinely needs frontier capability, and move the rest down a tier — the Model Downgrade Advisor linked below helps with that triage. For measured per-model spend from real traffic rather than slider estimates, FORG breaks down every session by model automatically. The share link preserves your exact mix.
Frequently asked questions
What is a blended token rate?
It is the weighted average price per million tokens across every model you route traffic to, weighted by each model's share of volume. If 40% of tokens go to a $1.80/M effective-rate model and 60% to a $5.40/M model, your blended rate is $3.96/M. It is the single number that turns 'we use four models' into a budget line finance can work with.
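The arithmetic in that example can be checked with a short sketch. The function name is an assumption, and the 500M-token monthly volume is borrowed from the example figure at the top of the page.

```python
def blended_rate(mix):
    """Weighted average rate for (traffic_share, effective_rate) pairs.

    Shares are fractions that sum to 1; rates are USD per million tokens.
    """
    return sum(share * rate for share, rate in mix)


rate = blended_rate([(0.40, 1.80), (0.60, 5.40)])
monthly = rate * 500  # monthly volume expressed in millions of tokens
print(f"${rate:.2f}/M, ${monthly:,.2f}/month")  # prints $3.96/M, $1,980.00/month
```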
Why does one model dominate cost without dominating traffic?
Because per-token prices span two orders of magnitude. A frontier model at 20% of traffic can easily account for more than 80% of spend when the other 80% of traffic runs on a small model that costs 25× less: at that gap, its cost share is 0.20 ÷ (0.20 + 0.80 ÷ 25) ≈ 86%. The bars in this tool show cost share against traffic share for every row precisely so that mismatch is impossible to miss.
What does the output-token share slider do?
Every provider prices output tokens 4-8× higher than input tokens, so a model's effective rate per million tokens depends on your traffic's input/output mix. The slider sets that mix: effective rate = input rate × (1 − share) + output rate × share. Agentic coding traffic is typically 10-25% output; chatbot traffic with long answers can exceed 40%.
What happens if my traffic shares don't add up to 100%?
The calculator normalizes them: each row's share is divided by the sum of all rows' shares, so the proportions you set are preserved even if the raw numbers total 80 or 130. A warning appears when the raw sum is not 100 so you know normalization happened. Prices come from the same verified model table used across all FORG tools.
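A minimal sketch of that normalization, assuming raw slider values on a 0-100 scale. The function name and the returned warning flag are illustrative and do not describe the calculator's actual code.

```python
def normalize_shares(raw):
    """Scale raw slider values so they sum to 1, and flag a raw sum other than 100."""
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one share must be positive")
    normalized = [value / total for value in raw]
    needs_warning = abs(total - 100.0) > 1e-9
    return normalized, needs_warning


# Raw sliders total 80, so every share is scaled up by the same factor.
shares, warned = normalize_shares([40, 20, 10, 10])
print(shares, warned)  # prints [0.5, 0.25, 0.125, 0.125] True
```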
Turn this analysis into a live rule with the FORG rule engine — route models and enforce limits automatically.