Rate Limit Planner
Check provider tier limits against your agent fleet and find out when you hit 429s.
Representative public tier limits — verify your account's actual limits in the provider console.
Your fleet of 5 exceeds what Anthropic Tier 2 sustains — you hit the TPM limit at 3 agents at this call rate.
Tier upgrade options (Anthropic)
| Tier | RPM / TPM | Supports |
|---|---|---|
| Tier 3 (fits your fleet) | 2,000 / 160k | 6 agents |
| Tier 4 (fits your fleet) | 4,000 / 400k | 16 agents |
How it works
This planner answers a question every team running coding agents eventually asks: how many concurrent agents can my API tier actually sustain before the 429s start? Enter your provider and tier, the size of your fleet, how often each agent calls and how many tokens each call carries, and the utilization bars show how close you are to the requests-per-minute and tokens-per-minute ceilings.
The math is deliberately simple and shown rather than hidden. Request load is agents × calls per minute; token load is request load × tokens per call. Each load is divided by the tier's corresponding RPM or TPM limit to give utilization. The verdict line computes the maximum fleet size your tier supports (the smaller of the RPM-bound and TPM-bound counts) and how many agents over that maximum your fleet currently is. For agentic coding workloads, the answer is almost always TPM-bound: a 6,000-token call is unremarkable for an agent carrying project context, and at four calls per minute a single agent burns 24k TPM before doing anything interesting.
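The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the planner's actual source: the names `Plan` and `plan_fleet` are hypothetical, inputs are assumed to be whole numbers, and the example plugs in the Tier 3 row from the table (2,000 RPM / 160k TPM) with the 5-agent, 4-calls-per-minute, 6,000-token workload used elsewhere on this page.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    rpm_load: int            # requests per minute across the fleet
    tpm_load: int            # tokens per minute across the fleet
    rpm_utilization: float   # rpm_load / rpm_limit
    tpm_utilization: float   # tpm_load / tpm_limit
    max_agents: int          # largest fleet the tier sustains
    binding_limit: str       # which ceiling is reached first
    agents_over: int         # how far the fleet exceeds max_agents


def plan_fleet(agents, calls_per_min, tokens_per_call, rpm_limit, tpm_limit):
    """Hypothetical helper: utilization and maximum fleet size for one tier."""
    rpm_load = agents * calls_per_min
    tpm_load = rpm_load * tokens_per_call

    # Whole agents that fit under each ceiling.
    max_by_rpm = rpm_limit // calls_per_min
    max_by_tpm = tpm_limit // (calls_per_min * tokens_per_call)
    max_agents = min(max_by_rpm, max_by_tpm)

    return Plan(
        rpm_load=rpm_load,
        tpm_load=tpm_load,
        rpm_utilization=rpm_load / rpm_limit,
        tpm_utilization=tpm_load / tpm_limit,
        max_agents=max_agents,
        binding_limit="TPM" if max_by_tpm <= max_by_rpm else "RPM",
        agents_over=max(0, agents - max_agents),
    )


tier3 = plan_fleet(agents=5, calls_per_min=4, tokens_per_call=6_000,
                   rpm_limit=2_000, tpm_limit=160_000)
print(tier3.rpm_load, tier3.tpm_load)          # 20 120000
print(tier3.max_agents, tier3.binding_limit)   # 6 TPM
```

With these inputs the fleet sits at 1% of the RPM ceiling and 75% of the TPM ceiling, which is why the table lists Tier 3 as supporting 6 agents.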
The tier dataset embedded here is representative of public tier limits across Anthropic, OpenAI and Google as of mid-2026 — and it is labeled that way in the tool because real limits vary by account, negotiated agreements, and model. Providers adjust tiers regularly and grant custom limits on request, so treat these numbers as planning defaults and verify the limits page in your own console before committing to an architecture. The structure of the calculation holds regardless of which exact numbers you plug in.
Two practical notes from production fleets. First, plan to about 70% utilization, not 100% — agents burst when tool loops fire several calls back-to-back, and the headroom is what absorbs the burst without a retry storm. Second, when you do hit the ceiling, a tier upgrade is usually faster and cheaper than engineering around it: providers grant upgrades quickly to accounts with payment history, while multi-key sharding does nothing (limits are per-organization) and multi-account evasion violates terms of service. Measure first, then upgrade deliberately.
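To see what the 70% guideline does to fleet size, the sketch below shrinks each ceiling to a target percentage before counting agents. The function name `fleet_at_target` and its parameters are hypothetical, and the limits are the representative Tier 3 and Tier 4 values from the table above, so treat the results as planning figures only.

```python
def fleet_at_target(calls_per_min, tokens_per_call, rpm_limit, tpm_limit,
                    target_pct=70):
    """Hypothetical helper: largest fleet that stays at or under target_pct
    of both the RPM and the TPM ceiling. Integer math avoids float rounding."""
    usable_rpm = rpm_limit * target_pct // 100
    usable_tpm = tpm_limit * target_pct // 100
    max_by_rpm = usable_rpm // calls_per_min
    max_by_tpm = usable_tpm // (calls_per_min * tokens_per_call)
    return min(max_by_rpm, max_by_tpm)


# Same workload as the rest of the page: 4 calls/min, 6,000 tokens per call.
print(fleet_at_target(4, 6_000, rpm_limit=2_000, tpm_limit=160_000))  # 4
print(fleet_at_target(4, 6_000, rpm_limit=4_000, tpm_limit=400_000))  # 11
```

Under these assumptions Tier 3 drops from 6 agents at full utilization to 4 with headroom, and Tier 4 from 16 to 11.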
Frequently asked questions
What is the difference between RPM and TPM limits?
RPM (requests per minute) caps how many API calls you can make regardless of size; TPM (tokens per minute) caps the total tokens processed across all calls. Agentic workloads with big contexts almost always hit TPM first — five agents sending 6k-token calls four times a minute is only 20 RPM but 120k TPM. This planner computes both so you can see which wall you hit first.
How do providers handle bursts above the limit?
Limits are typically enforced with a sliding window or a token bucket, so a short burst above your average rate can pass while sustained traffic above the limit returns 429s. Some providers also enforce concurrent-request caps separately from RPM. Plan for your peak minute, not your average minute: agent fleets are bursty by nature, since tool loops fire several calls back-to-back.
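The difference between a short burst and sustained overload is easiest to see in a toy token bucket. The sketch below is an illustrative model only, not any provider's documented enforcement algorithm: the `TokenBucket` class is hypothetical, the bucket capacity is assumed to equal one minute of a 160k TPM limit, and time is passed in explicitly so the run is deterministic.

```python
class TokenBucket:
    """Illustrative token bucket; real provider enforcement may differ."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.level = capacity  # assumption: the bucket starts full
        self.last = 0.0

    def try_consume(self, amount, now):
        # Refill for the time elapsed since the previous call, up to capacity.
        elapsed = now - self.last
        self.level = min(self.capacity,
                         self.level + elapsed * self.refill_per_sec)
        self.last = now
        if amount > self.level:
            return False  # the provider would answer 429 here
        self.level -= amount
        return True


TPM_LIMIT = 160_000


def fresh_bucket():
    return TokenBucket(capacity=TPM_LIMIT, refill_per_sec=TPM_LIMIT / 60)


# Short burst: ten 6k-token calls in the same instant (60k tokens in total).
burst = fresh_bucket()
burst_results = [burst.try_consume(6_000, now=0.0) for _ in range(10)]

# Sustained overload: one 6k-token call per second for two minutes (360k TPM).
steady = fresh_bucket()
steady_results = [steady.try_consume(6_000, now=float(t)) for t in range(120)]

print(all(burst_results))   # True: the burst fits inside the bucket
print(all(steady_results))  # False: the bucket drains and calls get rejected
```

In this model the burst passes because the bucket absorbs it, while the steady stream at more than twice the refill rate eventually empties the bucket.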
What is the right backoff strategy for 429s?
Respect the Retry-After header when present — it tells you exactly when capacity returns. Otherwise use exponential backoff with jitter, starting around one second and capping near a minute. The critical part is the jitter: a fleet of agents all retrying after exactly two seconds creates a synchronized retry storm that re-triggers the limit. Also cap total retries; an agent that retries forever is a runaway-cost machine.
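A minimal sketch of that retry policy follows. It uses the "full jitter" variant of exponential backoff (a uniformly random wait between zero and the exponential ceiling), which is one common choice rather than the only one. The `send` callable, its `(status, headers, body)` return shape, and the lowercase `retry-after` header key are assumptions made for illustration, and the sketch only handles the seconds form of Retry-After, not the HTTP-date form.

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return float(retry_after)  # the server's hint takes priority
    ceiling = min(cap, base * 2 ** attempt)
    return random.uniform(0, ceiling)  # full jitter de-synchronizes the fleet


def call_with_retries(send, max_retries=5, sleep=time.sleep):
    """`send` is a hypothetical callable returning (status, headers, body),
    where headers is a dict with lowercase keys."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break  # retry budget spent; stop instead of looping forever
        sleep(backoff_delay(attempt, headers.get("retry-after")))
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting, and the hard `max_retries` cap is what stops a stuck agent from retrying indefinitely.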
Is sharding across multiple API keys a legitimate way to scale?
Splitting load across keys on the same account does nothing: the major providers enforce limits at the organization or project level, not per key. Creating multiple accounts to evade limits violates the providers' terms of service and risks a ban. The legitimate paths are requesting a tier upgrade (usually granted quickly with payment history), provisioning dedicated throughput, or routing a slice of traffic to a second provider.