
HTTP 429 Too Many Requests

You exceeded a rate limit — requests per minute, tokens per minute, or concurrent connections.

4xx · Client error · ✓ Retryable with backoff

In AI APIs specifically

The most common AI-API error at scale. Anthropic sends the wait time, in seconds, in the Retry-After header and enforces separate limits for requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM). OpenAI returns Retry-After plus x-ratelimit-remaining-* headers that cover both requests and tokens. Google returns quota errors with the status RESOURCE_EXHAUSTED. Always honor Retry-After rather than guessing.
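One way to act on these provider signals is to collect them from each 429 before deciding how long to wait. This is a minimal sketch, assuming a runtime with a global Headers class (Node 18+ or a browser). It treats any header whose name contains "ratelimit" as a rate-limit header, which covers the x-ratelimit-remaining-* family named above without hardcoding each name, and it assumes Google's error body is JSON with error.status set to RESOURCE_EXHAUSTED. inspectRateLimit and RateLimitInfo are hypothetical names, not part of any SDK.

```typescript
// Hypothetical summary of why a request was rate limited.
interface RateLimitInfo {
  retryAfterSeconds: number | null; // numeric Retry-After, if present
  rateLimitHeaders: Record<string, string>; // every header whose name contains "ratelimit"
  quotaExhausted: boolean; // assumed Google-style body with error.status === "RESOURCE_EXHAUSTED"
}

// Inspect a response's headers plus its already-read body text.
function inspectRateLimit(headers: Headers, bodyText: string): RateLimitInfo {
  const rateLimitHeaders: Record<string, string> = {};
  headers.forEach((value, name) => {
    // Headers iterates with lowercased names, so one substring check suffices.
    if (name.includes("ratelimit")) rateLimitHeaders[name] = value;
  });

  // Only the delay-seconds form is parsed here; an HTTP-date yields null.
  const raw = headers.get("retry-after")?.trim() ?? "";
  const seconds = Number(raw);
  const retryAfterSeconds = raw !== "" && Number.isFinite(seconds) ? seconds : null;

  let quotaExhausted = false;
  try {
    // Assumed body shape: { "error": { "status": "RESOURCE_EXHAUSTED", ... } }
    const parsed = JSON.parse(bodyText);
    quotaExhausted = parsed?.error?.status === "RESOURCE_EXHAUSTED";
  } catch {
    // Non-JSON body: leave quotaExhausted as false.
  }
  return { retryAfterSeconds, rateLimitHeaders, quotaExhausted };
}
```

On a 429 you might call inspectRateLimit(res.headers, await res.text()) and log rateLimitHeaders to see whether the request budget or the token budget ran out.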

Fix checklist

  • Honor the Retry-After header exactly — don't invent your own delay when one is given.
  • Implement exponential backoff with jitter for the no-header case.
  • Spread bursty workloads with a client-side token bucket.
  • Request a tier upgrade if you hit limits at steady state.
  • Cache repeated prompt prefixes to cut token throughput.
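The token-bucket item in the checklist can be sketched in-process as follows, assuming a single client process and a limit expressed per period (per minute for RPM or TPM-style limits). TokenBucket, tryTake, and take are hypothetical names. The cost argument lets one bucket meter requests (cost 1) or an estimated token count per call, and the clock is injectable so the behavior can be checked without waiting.

```typescript
// Token bucket: holds at most `capacity` units and refills continuously at
// `capacity` units per `periodMs` (use 60_000 for a per-minute limit).
class TokenBucket {
  private readonly capacity: number;
  private readonly periodMs: number;
  private readonly now: () => number;
  private available: number;
  private last: number;

  constructor(capacity: number, periodMs: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.periodMs = periodMs;
    this.now = now;
    this.available = capacity; // start full, so an idle client may burst up to capacity
    this.last = now();
  }

  // Credit the units earned since the previous call, capped at capacity.
  private refill(): void {
    const t = this.now();
    this.available = Math.min(
      this.capacity,
      this.available + ((t - this.last) * this.capacity) / this.periodMs,
    );
    this.last = t;
  }

  // Returns 0 and deducts `cost` if enough units are available; otherwise
  // returns the milliseconds to wait and deducts nothing.
  tryTake(cost: number): number {
    if (cost > this.capacity) throw new RangeError("cost exceeds bucket capacity");
    this.refill();
    if (this.available >= cost) {
      this.available -= cost;
      return 0;
    }
    return Math.ceil(((cost - this.available) * this.periodMs) / this.capacity);
  }

  // Sleeps until `cost` units are available, then takes them.
  async take(cost = 1): Promise<void> {
    for (let wait = this.tryTake(cost); wait > 0; wait = this.tryTake(cost)) {
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
  }
}
```

For example, with one bucket for requests and one for input tokens (limits taken from your own account, not from this page), you would await rpm.take() and then itpm.take(estimatedInputTokens) before each call, where estimatedInputTokens is your own estimate. If a provider's limiter turns out to be stricter than this continuous-refill model, setting capacity somewhat below the published limit leaves headroom.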

Retry handler (TypeScript)

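// init.body must be replayable (e.g. a string); a one-shot stream cannot be resent.
async function fetchWithRetry(url: string, init: RequestInit = {}, maxRetries = 5): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);
    // Only 429 is retried; anything else, or the last 429, is returned to the caller.
    if (res.status !== 429 || attempt >= maxRetries) return res;
    await res.body?.cancel(); // discard the unread body so the connection is released
    const delay =
      parseRetryAfter(res.headers.get("retry-after")) ??
      Math.min(60_000, 1000 * 2 ** attempt * (0.5 + Math.random())); // expo backoff + jitter, capped at 60 s
    await new Promise((r) => setTimeout(r, delay));
  }
}

// Retry-After is either delay-seconds or an HTTP-date (RFC 9110, Section 10.2.3).
function parseRetryAfter(value: string | null): number | null {
  if (value === null || value.trim() === "") return null;
  const seconds = Number(value);
  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  const date = Date.parse(value);
  return Number.isNaN(date) ? null : Math.max(0, date - Date.now());
}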
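A retry handler only reacts after a 429 has arrived. When the limit being hit is concurrent connections (one of the three kinds listed at the top), capping in-flight requests on the client can avoid the error in the first place. This is a minimal single-process sketch; limitConcurrency is a hypothetical helper name, and it assumes max is at least 1.

```typescript
// Wrap an async function so that at most `max` calls run at once;
// extra calls wait in FIFO order for a free slot.
function limitConcurrency<A extends unknown[], R>(
  max: number,
  fn: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  if (max < 1) throw new RangeError("max must be at least 1");
  let active = 0;
  const waiting: Array<() => void> = [];

  return async (...args: A): Promise<R> => {
    if (active >= max) {
      // Park until a finishing call hands its slot over.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await fn(...args);
    } finally {
      const next = waiting.shift();
      if (next) next(); // transfer the slot directly; active is unchanged
      else active--;
    }
  };
}
```

Handing the slot straight to the next waiter, instead of decrementing and re-incrementing, keeps a newly arriving call from jumping ahead of one that was already queued. Wrapping the retry handler, as in limitConcurrency(4, fetchWithRetry), combines both behaviors; the value 4 is illustrative.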

Spec: RFC reference