Retry Backoff Designer
Design exponential backoff with jitter — visualize timing, cost and worst-case delay.
Worst-case total wait before final failure across 5 retries: 15.5 s (jitter only shortens this). Retrying all 5 attempts on Claude Sonnet 4.6 costs ≈ $0.83 per incident.
Delay before each retry (bars show the no-jitter value; jitter randomizes within it)
Cost per call: $0.17 — every retry re-bills the full prompt.
/** Exponential backoff: base=500ms, ×2, max 5 retries, cap 30000ms, jitter=full, honors Retry-After. */
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (e: unknown) => boolean,
  getRetryAfterMs?: (e: unknown) => number | null,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (e) {
      if (attempt >= 5 || !isRetryable(e)) throw e;
      let delay = Math.min(30000, 500 * Math.pow(2, attempt));
      delay = Math.random() * delay; // full jitter
      const ra = getRetryAfterMs?.(e);
      if (ra != null) delay = Math.max(delay, ra); // server knows best
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}

How it works
Set a base delay, multiplier, retry count and jitter mode, and watch the policy take shape: a visual timeline of every attempt's delay, the worst-case total wait before final failure, and what the retried calls themselves cost at real model prices. When the policy looks right, export it as a dependency-free TypeScript or Python snippet.
Exponential backoff is the standard answer to transient failure: wait base × multiplier^n before the next try, where n is the number of retries already made. The unmodified version has a flaw that only shows up at scale: every client computes the same delays, so simultaneous failures produce simultaneous retries, and the recovering service gets hit by a synchronized wave exactly when it is weakest. Jitter breaks the synchronization. Full jitter randomizes the whole delay window; equal jitter keeps half as a guaranteed floor. The timeline in this tool renders jittered delays as ranges so you can see the spread you are actually buying.
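The three delay rules described above can be sketched as a single function. This is a minimal illustration, not the tool's exported code: the names `JitterMode`, `BackoffPolicy` and `computeDelayMs` are hypothetical, and the random source is injectable only so the behaviour can be checked deterministically.

```typescript
type JitterMode = "none" | "full" | "equal";

// Hypothetical policy shape, mirroring the knobs the designer exposes.
interface BackoffPolicy {
  baseMs: number;
  multiplier: number;
  capMs: number;
  jitter: JitterMode;
}

// Delay in milliseconds before the next try, where `attempt` is the
// number of retries already made (0 for the first retry).
function computeDelayMs(
  attempt: number,
  policy: BackoffPolicy,
  rand: () => number = Math.random,
): number {
  // Un-jittered exponential backoff, capped at the maximum single delay.
  const backoff = Math.min(
    policy.capMs,
    policy.baseMs * Math.pow(policy.multiplier, attempt),
  );
  switch (policy.jitter) {
    case "full":
      // Uniform in [0, backoff): maximum spread, no guaranteed pause.
      return rand() * backoff;
    case "equal":
      // Half the backoff is a fixed floor, the other half is randomized.
      return backoff / 2 + rand() * (backoff / 2);
    default:
      return backoff;
  }
}
```

With base 500 ms and multiplier 2, the third retry (attempt 2) has a no-jitter delay of 2000 ms; full jitter draws from 0 to 2000 ms, equal jitter from 1000 to 2000 ms.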
The cost panel exists because LLM retries are not like HTTP retries. A failed request to a JSON API costs nothing to repeat; a failed call carrying a 50,000-token agent context re-bills the entire prompt on every attempt. Five retries of a large-context call on a frontier model is real money, multiplied by however many requests hit the failure window. Seeing the per-incident retry cost next to the timing usually changes the answer to "how many retries should we allow?" — typically downward.
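The arithmetic behind a per-incident figure is simple enough to sketch. The function names below are hypothetical and the price of $3 per million input tokens is an illustrative assumption, not a quoted rate; the sketch also ignores output tokens and any prompt caching, and assumes every attempt is billed for the full prompt.

```typescript
// Cost in USD of sending `promptTokens` input tokens once.
function promptCostUsd(
  promptTokens: number,
  usdPerMillionInputTokens: number,
): number {
  return (promptTokens / 1e6) * usdPerMillionInputTokens;
}

// Cost of the retries alone for one incident; the original failed
// call is not counted here.
function retryCostUsd(
  promptTokens: number,
  usdPerMillionInputTokens: number,
  retries: number,
): number {
  return retries * promptCostUsd(promptTokens, usdPerMillionInputTokens);
}

// Illustrative numbers only: a 50,000-token context at an assumed
// $3 per million input tokens.
const perCall = promptCostUsd(50000, 3); // roughly 0.15 USD
const perIncident = retryCostUsd(50000, 3, 5); // roughly 0.75 USD
console.log(perCall.toFixed(2), perIncident.toFixed(2));
```

Multiply the per-incident figure by the number of requests caught in the failure window to estimate what one outage costs in retries.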
Two defaults worth keeping: honor Retry-After whenever the provider sends it, since the server knows its own recovery time better than your exponent does, and cap the maximum single delay so worst-case latency stays bounded. The exported snippets implement both. What they leave to you is error classification: retry 429, 5xx and timeouts; never retry other client errors such as 400 or auth failures (401, 403), which will fail identically forever no matter how politely you wait.
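A classification predicate along those lines might look like the sketch below. The error shape `HttpLikeError` is hypothetical; real SDKs expose status codes and timeout signals differently, so treat the field names as assumptions to adapt.

```typescript
// Hypothetical error shape; adapt to your SDK's exception types.
interface HttpLikeError {
  status?: number; // HTTP status code, when the server responded
  code?: string; // network-level code, e.g. "ETIMEDOUT" in Node.js
}

function isRetryable(e: unknown): boolean {
  if (typeof e !== "object" || e === null) return false;
  const err = e as HttpLikeError;
  if (typeof err.status === "number") {
    if (err.status === 429) return true; // rate limited
    if (err.status >= 500 && err.status <= 599) return true; // server-side, includes 529
    return false; // 400, 401, 403 and other client errors fail identically
  }
  // No HTTP status: only retry if it looks like a timeout.
  return err.code === "ETIMEDOUT";
}
```

Unknown error shapes are deliberately treated as non-retryable here, so an unexpected exception surfaces immediately instead of being retried blindly.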
Frequently asked questions
Why do I need jitter at all?
Without jitter, every client that failed at the same moment retries at exactly the same moment, because base × multiplier^n is deterministic. A transient outage then turns into synchronized retry waves that keep knocking the service over: the classic thundering herd. Full jitter picks a uniform random delay between zero and the computed backoff, which spreads the herd across the whole window. AWS's published analysis found that full jitter substantially reduces total work and contention compared to no jitter.
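A toy simulation makes the herd visible. The sketch below is an illustration under simplifying assumptions (all clients fail at t = 0 and retry once); `retryHistogram` and `peakLoad` are hypothetical names, and it is not a reproduction of the AWS analysis.

```typescript
// Count how many clients retry in each time bucket after a shared
// failure at t = 0. With jitter, each delay is uniform in [0, backoffMs).
function retryHistogram(
  clients: number,
  backoffMs: number,
  bucketMs: number,
  jitter: boolean,
  rand: () => number = Math.random,
): Map<number, number> {
  const buckets = new Map<number, number>();
  for (let i = 0; i < clients; i++) {
    const delay = jitter ? rand() * backoffMs : backoffMs;
    const bucket = Math.floor(delay / bucketMs);
    buckets.set(bucket, (buckets.get(bucket) || 0) + 1);
  }
  return buckets;
}

// Largest number of retries landing in any single bucket.
function peakLoad(buckets: Map<number, number>): number {
  let peak = 0;
  buckets.forEach((count) => {
    if (count > peak) peak = count;
  });
  return peak;
}

// 1000 clients, 1000 ms backoff, 100 ms buckets.
const herd = peakLoad(retryHistogram(1000, 1000, 100, false));
const spread = peakLoad(retryHistogram(1000, 1000, 100, true));
console.log({ herd, spread });
```

Without jitter, all 1000 retries land in the same bucket; with full jitter they spread across ten buckets, so the peak is typically around a tenth of that.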
What is the difference between full and equal jitter?
Full jitter randomizes the entire delay: sleep a uniform random amount between 0 and the exponential backoff value, which gives maximum spread but means some retries fire almost immediately. Equal jitter keeps half the backoff as a guaranteed floor and randomizes the other half — sleep backoff/2 plus a random amount up to backoff/2 — trading a little spread for a guaranteed minimum pause. Full jitter is the usual default; equal jitter suits services that genuinely need breathing room after every failure.
Should I honor the Retry-After header?
Yes, always, when the provider sends one. A 429 or 529 with Retry-After is the server telling you exactly when capacity returns — retrying earlier is guaranteed wasted spend and may extend your rate limiting, while your computed backoff is just a guess. The correct policy is max(your backoff, Retry-After). The designer's toggle reflects this: when honored, the header overrides shorter computed delays.
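As a rough sketch of that policy: the HTTP specification allows Retry-After to be either a number of seconds or an HTTP date, so a parser needs to handle both. The helper names `parseRetryAfterMs` and `effectiveDelayMs` are hypothetical, and how you obtain the header value depends on your SDK.

```typescript
// Parse a Retry-After header value into milliseconds.
// Returns null when the header is absent or cannot be parsed.
function parseRetryAfterMs(
  header: string | null | undefined,
  nowMs: number = Date.now(),
): number | null {
  if (header == null) return null;
  const value = header.trim();
  if (value === "") return null;
  // Form 1: a non-negative integer number of seconds.
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  // Form 2: an HTTP date; wait until that moment, never a negative time.
  const dateMs = Date.parse(value);
  if (isNaN(dateMs)) return null;
  return Math.max(0, dateMs - nowMs);
}

// max(your backoff, Retry-After): never wait less than the server asked.
function effectiveDelayMs(
  backoffMs: number,
  retryAfterMs: number | null,
): number {
  return retryAfterMs == null ? backoffMs : Math.max(backoffMs, retryAfterMs);
}
```

A function like `parseRetryAfterMs` could back the `getRetryAfterMs` callback that the exported TypeScript snippet accepts, once you have pulled the header off your SDK's error object.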
How many retries should an LLM call get?
Fewer than you think. Each retried call re-sends the full prompt, so retries on a 50k-token agent context are expensive — the cost panel in this tool makes that concrete. Three to five attempts with a max delay cap around 30-60 seconds covers virtually all transient 429/529/timeout blips; failures beyond that usually indicate a real outage where retrying burns money without succeeding. Past the cap, fail fast and surface the error to something that can make a smarter decision.
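To see how retry count and cap bound the total wait, here is a small sketch that sums the un-jittered, capped delays. The function name `worstCaseTotalMs` is hypothetical; the figure excludes the duration of the calls themselves, and a long Retry-After from the server can push real waits beyond it.

```typescript
// Sum of the un-jittered, capped delays across all retries.
function worstCaseTotalMs(
  baseMs: number,
  multiplier: number,
  retries: number,
  capMs: number,
): number {
  let total = 0;
  for (let attempt = 0; attempt < retries; attempt++) {
    total += Math.min(capMs, baseMs * Math.pow(multiplier, attempt));
  }
  return total;
}

console.log(worstCaseTotalMs(500, 2, 5, 30000)); // 15500
console.log(worstCaseTotalMs(500, 2, 10, 30000)); // 151500
```

Doubling the retry count from 5 to 10 multiplies the worst-case wait almost tenfold, because the later delays all sit at the 30-second cap.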
What do the exported snippets contain?
A self-contained retry function in TypeScript or Python implementing exactly the policy you configured: base delay, multiplier, max retries, cap, your chosen jitter mode and Retry-After handling. There are no dependencies beyond the standard library — no axios-retry or tenacity required — so you can paste it into any codebase and adapt the error-classification predicate to your SDK's exception types.
Built by FORG — AI cost observability for agentic coding. Free tools, no signup, nothing leaves your browser.