Question 1

What is the difference between RPM and TPM limits?

Accepted Answer

RPM (requests per minute) caps how many API calls you can make regardless of size; TPM (tokens per minute) caps the total tokens processed across all calls. Agentic workloads with big contexts almost always hit TPM first — five agents sending 6k-token calls four times a minute is only 20 RPM but 120k TPM. This planner computes both so you can see which wall you hit first.
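The arithmetic in that example can be checked with a minimal sketch. The function names here are hypothetical, every agent is assumed to carry the same load, and the 2,000 RPM / 160k TPM limits are used purely as sample inputs.

```python
def fleet_demand(agents: int, calls_per_agent_per_min: int, tokens_per_call: int) -> tuple:
    """Return (rpm, tpm) demanded by a fleet where every agent has the same load."""
    rpm = agents * calls_per_agent_per_min
    tpm = rpm * tokens_per_call
    return rpm, tpm


def binding_limit(rpm: int, tpm: int, rpm_limit: int, tpm_limit: int) -> str:
    """Name the limit that is closer to being exhausted (higher utilization)."""
    rpm_utilization = rpm / rpm_limit
    tpm_utilization = tpm / tpm_limit
    return "TPM" if tpm_utilization >= rpm_utilization else "RPM"


# Five agents, four calls a minute each, 6k tokens per call.
rpm, tpm = fleet_demand(agents=5, calls_per_agent_per_min=4, tokens_per_call=6000)
print(rpm, tpm)  # 20 120000

# Against sample limits of 2,000 RPM and 160k TPM, RPM utilization is 1%
# while TPM utilization is 75%, so TPM is the wall this fleet approaches first.
print(binding_limit(rpm, tpm, rpm_limit=2000, tpm_limit=160000))  # TPM
```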

Question 2

How do providers handle bursts above the limit?

Accepted Answer

Limits are typically enforced with a sliding window or a token bucket, so a short burst above your average rate can pass while sustained traffic above the limit returns 429s. Some providers also enforce concurrent-request caps separately from RPM. Plan for your peak minute, not your average minute: agent fleets are bursty by nature, since tool loops fire several calls back-to-back.
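To see why a short burst passes while sustained overage does not, here is a rough token-bucket sketch. It illustrates the general technique only and is not any provider's actual enforcement logic; the class name, parameters, and injectable clock are assumptions made for the example.

```python
import time


class TokenBucket:
    """Illustrative token bucket: refills at a steady rate, holds at most `capacity`."""

    def __init__(self, rate_per_sec: float, capacity: float, now=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # start full, so an initial burst is allowed
        self._now = now         # injectable clock, handy for testing
        self._last = now()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then try to spend `cost` tokens."""
        t = self._now()
        self.tokens = min(self.capacity, self.tokens + (t - self._last) * self.rate)
        self._last = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # a real server would answer 429 here


# A burst of 7 back-to-back requests against a bucket of 5 refilling at 1/sec:
fake_time = [0.0]
bucket = TokenBucket(rate_per_sec=1.0, capacity=5, now=lambda: fake_time[0])
print([bucket.allow() for _ in range(7)])
# [True, True, True, True, True, False, False]
```

The first five requests drain the bucket and pass; the rest are rejected until enough time passes for the bucket to refill.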

Question 3

What is the right backoff strategy for 429s?

Accepted Answer

Respect the Retry-After header when present; it tells you how long the server wants you to wait before retrying. Otherwise use exponential backoff with jitter, starting around one second and capping near a minute. The critical part is the jitter: a fleet of agents all retrying after exactly two seconds creates a synchronized retry storm that re-triggers the limit. Also cap total retries; an agent that retries forever is a runaway-cost machine.
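A minimal sketch of that policy follows, using "full jitter" (a uniformly random delay between zero and the exponential ceiling) as one common way to add jitter. The `RateLimited` exception, the `send` callable, and the assumption that Retry-After has already been parsed into seconds are all hypothetical; real SDKs expose 429s in their own ways, and Retry-After can also arrive as an HTTP date.

```python
import random
import time
from typing import Callable, Optional


class RateLimited(Exception):
    """Hypothetical error raised by `send` on a 429; carries Retry-After in seconds."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return max(0.0, retry_after)           # the server's hint wins
    ceiling = min(cap, base * (2 ** attempt))  # 1s, 2s, 4s, ... capped at 60s
    return rng() * ceiling                     # full jitter desynchronizes the fleet


def call_with_retries(send: Callable[[], object], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep):
    """Call `send`, retrying on RateLimited at most `max_retries` times."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimited as err:
            if attempt == max_retries:
                raise                          # give up rather than retry forever
            sleep(backoff_delay(attempt, err.retry_after))
```

Passing `sleep` and `rng` as parameters keeps the sketch testable without real waiting; in production code the defaults would normally be left alone.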

Question 4

Is sharding across multiple API keys a legitimate way to scale?

Accepted Answer

Splitting load across keys on the same account usually does nothing: major providers generally enforce limits per organization (or per project), not per key. Creating multiple accounts to evade limits typically violates the providers' terms of service and risks a ban. The legitimate paths are requesting a tier upgrade (often granted quickly once you have payment history), provisioning dedicated throughput, or routing a slice of traffic to a second provider.

Tier	RPM / TPM	Supports
Tier 3 (fits your fleet)	2,000 / 160k	6 agents
Tier 4 (fits your fleet)	4,000 / 400k	16 agents
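The "Supports" column can be reproduced with a short sketch, assuming each agent makes four 6k-token calls per minute as in the FAQ example. The planner's actual formula is not shown on the page, so `max_agents` is a hypothetical reconstruction that happens to match the table.

```python
def max_agents(rpm_limit: int, tpm_limit: int,
               calls_per_agent_per_min: int, tokens_per_call: int) -> int:
    """Largest whole number of identical agents that fits under both limits."""
    rpm_per_agent = calls_per_agent_per_min
    tpm_per_agent = calls_per_agent_per_min * tokens_per_call
    # The tighter of the two limits decides how many agents fit.
    return min(rpm_limit // rpm_per_agent, tpm_limit // tpm_per_agent)


# Limits taken from the table; per-agent load assumed from the FAQ example.
tiers = {"Tier 3": (2000, 160000), "Tier 4": (4000, 400000)}
for name, (rpm_limit, tpm_limit) in tiers.items():
    print(name, max_agents(rpm_limit, tpm_limit,
                           calls_per_agent_per_min=4, tokens_per_call=6000))
# Tier 3 6
# Tier 4 16
```

In both tiers the RPM limit would allow hundreds of agents, so TPM is what sets the supported fleet size.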

Rate Limit Planner
