HTTP Status Code Reference
Every HTTP status code explained, with AI-API-specific notes on 429s, 529s and timeouts.
1xx — Informational
100 Continue: The server received the request headers and the client should proceed to send the body. Mostly seen with Expect: 100-continue on large uploads.
101 Switching Protocols: The server agrees to switch protocols as requested via the Upgrade header, most commonly upgrading HTTP to a WebSocket connection.
AI APIs: Realtime voice/streaming APIs (e.g. OpenAI Realtime) use WebSocket upgrades; a missing 101 means the upgrade was refused.
2xx — Success
200 OK: The request succeeded. The response body contains the result.
AI APIs: AI APIs can return 200 with an error object inside a streamed body, so always inspect stream events, not just the status.
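One way to catch an error hidden inside a 200 stream is to scan the server-sent events for an error frame. The sketch below assumes the stream uses SSE framing with an `event: error` frame whose JSON payload carries `error.type` and `error.message`, which matches Anthropic's documented shape; other providers may differ. `findStreamError` and `StreamError` are illustrative names, not part of any SDK.

```typescript
// Assumed shape of an error payload inside a stream event.
interface StreamError {
  type: string;
  message: string;
}

// Scan raw SSE text for an `event: error` frame and return its payload,
// or null if the stream contains no error event.
function findStreamError(sseText: string): StreamError | null {
  // SSE frames are separated by a blank line.
  for (const frame of sseText.split(/\r?\n\r?\n/)) {
    let eventName = "message"; // SSE default when no `event:` field is present
    const dataLines: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith("event:")) eventName = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (eventName !== "error" || dataLines.length === 0) continue;
    try {
      const payload = JSON.parse(dataLines.join("\n"));
      // Accept both { error: { type, message } } and a flat { type, message }.
      const err = payload?.error ?? payload;
      return { type: String(err.type ?? "unknown"), message: String(err.message ?? "") };
    } catch {
      // The frame was marked as an error but its body was not usable JSON.
      return { type: "unparseable_error_event", message: dataLines.join("\n") };
    }
  }
  return null;
}
```

In a real client the same check would run incrementally on each frame as it arrives rather than on the full buffered text.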
201 Created: The request succeeded and a new resource was created. Typical for POSTs that create files, fine-tune jobs or batch jobs.
AI APIs: Batch and fine-tuning endpoints return 201 with the job resource; poll its status URL rather than expecting results inline.
202 Accepted: The request was accepted for asynchronous processing but is not complete. The final outcome is not yet known.
AI APIs: Async batch APIs accept work with 202; completion can take up to 24h on Anthropic and OpenAI batch tiers.
204 No Content: Success with no response body. Common for DELETE operations.
206 Partial Content: The server is delivering only part of the resource, in response to a Range header. Used for resumable downloads.
AI APIs: Seen when resuming large model or file downloads (e.g. fine-tune result files).
3xx — Redirection
301 Moved Permanently: The resource has a new permanent URL given in the Location header. Clients should update their references.
AI APIs: API base-URL typos (http vs https, trailing slash) often surface as 301s that drop POST bodies on redirect.
302 Found: The resource temporarily lives at a different URL. The original URL remains canonical.
304 Not Modified: Response to a conditional GET. The cached copy is still valid, so the server sends no body.
307 Temporary Redirect: Like 302, but the method and body MUST be preserved when following the redirect.
308 Permanent Redirect: Like 301, but the method and body MUST be preserved. Update stored URLs.
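The difference between the redirect codes is easiest to see as a rule for which method a client sends after following the redirect. The sketch below models the behavior of fetch-style clients as we understand it; `methodAfterRedirect` is a hypothetical helper, and other HTTP libraries may behave differently. It also covers 303, which is not listed in this section, because the same rule handles it.

```typescript
// Returns the HTTP method a fetch-style client would use after following
// a redirect with the given status. Illustrative helper, not a library API.
function methodAfterRedirect(status: number, method: string): string {
  const m = method.toUpperCase();
  // 307 and 308 require the client to replay the same method and body.
  if (status === 307 || status === 308) return m;
  // 303 turns everything except GET and HEAD into a GET.
  if (status === 303) return m === "HEAD" ? "HEAD" : "GET";
  // 301 and 302: a POST is rewritten to GET and its body is dropped.
  if ((status === 301 || status === 302) && m === "POST") return "GET";
  return m;
}
```

This is why a POST to a mistyped base URL can arrive at the real endpoint as a body-less GET: the 301 in between rewrote it.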
4xx — Client error
400 Bad Request: The server cannot process the request due to a client error: malformed JSON, invalid parameters, or schema violations.
AI APIs: The most common AI-API 400s: max_tokens exceeding the model limit, invalid model IDs, malformed tool schemas, and context-window overflow (Anthropic returns 400 with an invalid_request_error when the prompt is too long).
401 Unauthorized: Authentication is missing or invalid for this request.
AI APIs: Anthropic expects the x-api-key header; OpenAI expects Authorization: Bearer. Mixing the two conventions is the classic cause. Revoked or rotated keys also land here.
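Keeping the two header conventions in one place makes the mix-up harder to commit. This is a minimal sketch: `authHeaders` and `Provider` are illustrative names, and the `anthropic-version` value shown is an assumption to check against Anthropic's current documentation.

```typescript
type Provider = "anthropic" | "openai";

// Build the auth headers each provider expects. Illustrative helper only.
function authHeaders(provider: Provider, apiKey: string): Record<string, string> {
  const base = { "content-type": "application/json" };
  if (provider === "anthropic") {
    // Anthropic: key goes in x-api-key, plus a version header
    // (the value here is an assumption; confirm it in the provider docs).
    return { ...base, "x-api-key": apiKey, "anthropic-version": "2023-06-01" };
  }
  // OpenAI: key goes in a standard Bearer authorization header.
  return { ...base, authorization: `Bearer ${apiKey}` };
}
```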
402 Payment Required: The request requires payment: billing is not set up or a balance is exhausted.
AI APIs: OpenAI reports exhausted quota as a 429 with an insufficient_quota error code rather than a literal 402; Anthropic surfaces billing problems as 400/403 with billing messages. Check the billing console first.
403 Forbidden: The server understood the request but refuses to authorize it. Valid credentials, insufficient permission.
AI APIs: Common causes: requesting a model your org has no access to, geo-blocked regions, or organization policies restricting an endpoint.
404 Not Found: The requested resource does not exist at this URL.
AI APIs: Usually a typo'd endpoint path, a deprecated/retired model ID, or a v1 path used against a different API version.
405 Method Not Allowed: The URL exists but does not support this HTTP method.
AI APIs: Typically a GET against a POST-only inference endpoint, often caused by a redirect that downgraded the method.
408 Request Timeout: The server timed out waiting for the complete request.
AI APIs: Large request bodies (long contexts, base64 images) over slow links can trip server read timeouts.
409 Conflict: The request conflicts with the current state of the resource, e.g. concurrent modification.
AI APIs: Seen when cancelling an already-completed batch job or double-submitting with the same idempotency key but different bodies.
410 Gone: The resource existed but has been permanently removed and no forwarding address is known.
AI APIs: Deprecated API versions and expired file/batch resources return 410. Unlike 404, the provider is telling you it will never come back.
413 Content Too Large: The request body exceeds the server's size limit.
AI APIs: Hit by oversized image payloads, giant tool results, or batch files over the per-file cap. Distinct from context-window overflow, which is usually a 400.
414 URI Too Long: The request URL exceeds the server's length limit.
AI APIs: Usually caused by stuffing data into query parameters that belongs in a POST body.
415 Unsupported Media Type: The request body format is not supported, usually because of a missing or wrong Content-Type header.
AI APIs: Sending JSON without Content-Type: application/json, or uploading files with the wrong multipart encoding.
422 Unprocessable Content: The request was well-formed syntactically but semantically invalid.
AI APIs: Some providers use 422 instead of 400 for parameter validation failures; the error body carries the field-level details.
429 Too Many Requests: You exceeded a rate limit: requests per minute, tokens per minute, or concurrent connections.
AI APIs: THE most common AI-API error at scale. Anthropic returns retry-after seconds in the Retry-After header and separates RPM/ITPM/OTPM limits. OpenAI returns Retry-After plus x-ratelimit-remaining-* headers for both requests and tokens. Google returns RESOURCE_EXHAUSTED quota errors. Always honor Retry-After rather than guessing.
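Honoring Retry-After starts with parsing it. The HTTP specification allows either a number of seconds or an HTTP date, so a helper should handle both. A minimal sketch, with `parseRetryAfter` as an illustrative name:

```typescript
// Convert a Retry-After header value into a delay in milliseconds.
// Returns null when the header is absent or unparseable, so the caller
// can fall back to its own backoff.
function parseRetryAfter(value: string | null, nowMs: number = Date.now()): number | null {
  if (value === null) return null;
  const trimmed = value.trim();
  // Form 1: delay in whole seconds, e.g. "30".
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  // A date in the past means "retry now".
  return Math.max(0, dateMs - nowMs);
}
```

Passing the current time as a parameter keeps the function deterministic in tests.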
431 Request Header Fields Too Large: Headers exceed the server's size limits, often because of a runaway cookie or oversized auth token.
451 Unavailable For Legal Reasons: Access denied for legal reasons such as censorship, sanctions or court orders.
AI APIs: AI providers return 451-style geo blocks in unsupported regions; a VPN exit node in a sanctioned country triggers this too.
499 Client Closed Request: Non-standard (nginx). The client disconnected before the server finished responding.
AI APIs: Appears in gateway logs when an agent's HTTP timeout is shorter than model generation time. The client gave up mid-stream, but you may still be billed for generated tokens.
5xx — Server error
500 Internal Server Error: The server hit an unexpected condition. The fault is on the provider's side.
AI APIs: All providers occasionally 500 under load or on edge-case inputs. Anthropic returns api_error; OpenAI returns server_error. A reproducible 500 on a specific prompt is worth reporting.
501 Not Implemented: The server does not support the functionality required to fulfil the request.
502 Bad Gateway: A gateway or proxy received an invalid response from the upstream server.
AI APIs: Often transient infrastructure churn at the provider's edge (deploys, scaling events). With self-hosted gateways/proxies in front of AI APIs, check your own proxy first.
503 Service Unavailable: The server is temporarily unable to handle the request because of overload or maintenance. May include Retry-After.
AI APIs: Providers return 503 during capacity crunches; Anthropic distinguishes its overload state with the dedicated 529 code instead.
504 Gateway Timeout: A gateway did not receive a timely response from the upstream server.
AI APIs: Long non-streaming generations are the usual trigger: the proxy's read timeout expires before the model finishes. Streaming avoids this because bytes flow continuously.
505 HTTP Version Not Supported: The server does not support the HTTP protocol version used in the request.
507 Insufficient Storage: The server cannot store the representation needed to complete the request.
520 Web Server Returned an Unknown Error: Cloudflare-specific. The origin returned an empty, malformed or unexpected response.
AI APIs: Several AI providers sit behind Cloudflare; 520s indicate origin-side trouble that Cloudflare can't classify.
522 Connection Timed Out: Cloudflare-specific. The TCP connection to the origin server timed out.
AI APIs: Indicates the provider's origin is unreachable from Cloudflare's edge, a genuine provider-side outage signal.
524 A Timeout Occurred: Cloudflare-specific. The origin accepted the connection but did not reply within Cloudflare's proxy timeout (typically 100 seconds).
AI APIs: The classic killer of long non-streaming requests: a request to a slow model exceeds Cloudflare's 100-second window even though the model is still working. Streaming keeps bytes moving and resets the clock, so switch any request that can take more than 90 seconds to streaming.
529 Overloaded: Non-standard, Anthropic-specific. The API is temporarily overloaded and shedding load.
AI APIs: Anthropic's dedicated overload signal (overloaded_error). Distinct from 429: this is not about YOUR rate limit but about aggregate platform load. Typically resolves within seconds to minutes. The SDKs retry it automatically a limited number of times.
How it works
This is an HTTP status reference written for people who ship against AI APIs. Generic status-code lists tell you that 429 means "too many requests"; this one tells you that Anthropic sends Retry-After seconds and splits limits across input and output tokens, that OpenAI exposes remaining quota in x-ratelimit headers, and that Google wraps the same condition in a RESOURCE_EXHAUSTED error. The grid above is searchable by number or name and grouped by class, with the AI-specific notes inline.
Each code links to a dedicated page with the full treatment: what the code means, the AI-API-specific failure modes it shows up in, a ranked fix checklist, and — for retryable codes — a TypeScript retry helper with exponential backoff and jitter that honors Retry-After. The pages also link the relevant RFC section, because the primary source settles arguments faster than any blog post.
A few codes deserve special attention in AI workloads. 429 dominates at scale and is really three different limits wearing one number: requests per minute, tokens per minute, and concurrency. 529 is Anthropic's platform-overload signal, distinct from your personal rate limit and usually short-lived. 524 is Cloudflare's 100-second proxy timeout, the classic killer of long non-streaming generations — the model keeps working, the proxy gives up, and you may still be billed. And 499 in your gateway logs means your own client hung up first, which usually points at a timeout set lower than worst-case generation time.
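Because 429 covers several limits, it can help to log the remaining-quota headers next to every 429 so you can see which budget was close to empty. The sketch below reads OpenAI-style `x-ratelimit-remaining-requests` and `x-ratelimit-remaining-tokens` headers; `readRateLimitHeaders` and `HeaderReader` are illustrative names, and other providers use different header names. Concurrency limits are not reported by these headers.

```typescript
// Anything with a get(name) method works here, including fetch's Headers.
interface HeaderReader {
  get(name: string): string | null;
}

interface RateLimitSnapshot {
  remainingRequests: number | null;
  remainingTokens: number | null;
}

// Read the remaining request and token budgets, or null where a header
// is missing or not numeric.
function readRateLimitHeaders(headers: HeaderReader): RateLimitSnapshot {
  const numeric = (name: string): number | null => {
    const raw = headers.get(name);
    if (raw === null || raw.trim() === "") return null;
    const n = Number(raw);
    return Number.isFinite(n) ? n : null;
  };
  return {
    remainingRequests: numeric("x-ratelimit-remaining-requests"),
    remainingTokens: numeric("x-ratelimit-remaining-tokens"),
  };
}
```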
The retry rule that falls out of all this: honor explicit Retry-After headers, retry 5xx and the 408/429 special cases with exponential backoff plus jitter, and never blind-retry validation failures. A 400 retried ten times is ten identical failures that cost you rate-limit headroom. When errors persist beyond a few backoff cycles, check the provider status page before debugging your own code; our Provider Status tool keeps recent incident history one click away, alongside the official status-page links for each provider.
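The retry rule can be sketched as a small wrapper. This is an illustration under stated assumptions, not the helper from the detail pages: `withRetry`, `HttpResult` and `RetryOptions` are hypothetical names, the caller is assumed to have already parsed Retry-After into milliseconds, and the jitter strategy is "full jitter" (a random delay between zero and the exponential ceiling).

```typescript
interface HttpResult {
  status: number;
  retryAfterMs: number | null; // parsed Retry-After, if the server sent one
}

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number;
  maxDelayMs: number;
  sleep?: (ms: number) => Promise<void>; // injectable for tests
  random?: () => number; // injectable for tests
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// The rule from the text: 5xx plus the 408/429 special cases.
function shouldRetry(status: number): boolean {
  return status === 408 || status === 429 || (status >= 500 && status <= 599);
}

async function withRetry(send: () => Promise<HttpResult>, opts: RetryOptions): Promise<HttpResult> {
  const sleep = opts.sleep ?? defaultSleep;
  const random = opts.random ?? Math.random;
  let result = await send();
  for (let attempt = 1; attempt < opts.maxAttempts && shouldRetry(result.status); attempt++) {
    // Exponential ceiling: base, 2x base, 4x base, ... capped at maxDelayMs.
    const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** (attempt - 1));
    // Full jitter spreads clients out so retries do not synchronize.
    const backoff = random() * ceiling;
    // An explicit Retry-After always wins over the computed backoff.
    await sleep(result.retryAfterMs ?? backoff);
    result = await send();
  }
  return result;
}
```

The wrapper returns the last result instead of throwing, so a 400 comes back after a single attempt and the caller decides how to surface it.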
Frequently asked questions
Which HTTP status codes are safe to retry?
Retry the transient ones: 408, 429 (after honoring Retry-After), 500, 502, 503, 504, the Cloudflare 52x family and Anthropic's 529. Never blind-retry 4xx validation errors: a 400 or 401 will fail identically every time and just burn rate limit. The grid marks each code retryable or not, and every detail page for a retryable code includes a TypeScript backoff snippet you can lift directly.
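The list above translates directly into an allowlist check. In this sketch `isRetryableStatus` is an illustrative name, and treating "the Cloudflare 52x family" as the range 520 to 527 is our assumption about which codes that phrase covers.

```typescript
// Codes named explicitly in the answer above.
const RETRYABLE_STATUSES = new Set<number>([408, 429, 500, 502, 503, 504, 529]);

function isRetryableStatus(status: number): boolean {
  // Assumption: "Cloudflare 52x family" means 520 through 527.
  const isCloudflare52x = status >= 520 && status <= 527;
  return RETRYABLE_STATUSES.has(status) || isCloudflare52x;
}
```

An allowlist is deliberately stricter than "retry every 5xx": codes such as 501 or 505 will not succeed on a second attempt.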
What does Anthropic's 529 Overloaded error mean?
529 is Anthropic's non-standard signal that the platform as a whole is shedding load — it is about aggregate capacity, not your personal rate limit, which is what distinguishes it from 429. It typically clears within seconds to minutes. The right response is exponential backoff with jitter and reduced concurrency; the official SDKs already retry it a limited number of times before surfacing it to your code.
Why do my long AI requests fail with 524?
524 is Cloudflare's proxy timeout: the origin accepted the connection but produced no response bytes within roughly 100 seconds. A slow non-streaming generation trips it even though the model is still working — and you may be billed for tokens you never received. The fix is streaming: once bytes flow continuously, the idle timer never fires. Any request that can take more than about 90 seconds should stream.
How should I handle 429 rate limits across providers?
First, honor the Retry-After header exactly when present — Anthropic and OpenAI both send it, and guessing your own delay when the server told you the answer is self-sabotage. Second, distinguish request-per-minute from token-per-minute limits; OpenAI's x-ratelimit-remaining headers report both. Third, smooth bursts with a client-side token bucket instead of letting retries synchronize into thundering herds.
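A client-side token bucket for the third step might look like the sketch below. `TokenBucket` and `tryTake` are illustrative names, and the capacity and refill rate are values you would derive from your own provider limits. The bucket can meter requests (cost 1 each) or model tokens (cost equal to the estimated token count).

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = now();
  }

  // Add the tokens earned since the last call, never exceeding capacity.
  private refill(): void {
    const t = this.now();
    const elapsedSec = (t - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = t;
  }

  // Returns 0 if `cost` tokens were taken, otherwise the number of
  // milliseconds to wait before enough tokens will be available.
  tryTake(cost: number = 1): number {
    this.refill();
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return 0;
    }
    return Math.ceil(((cost - this.tokens) / this.refillPerSecond) * 1000);
  }
}
```

A caller sleeps for the returned wait and tries again. A cost larger than the bucket capacity can never be satisfied, so it should be rejected up front.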
Is a 4xx error my fault or the provider's?
By definition 4xx means the client — your request — needs to change: bad auth (401), insufficient permissions (403), invalid parameters (400/422), too large (413) or too fast (429). 5xx means the server failed and your request may be fine as-is. The practical consequence: 5xx codes deserve retries, 4xx codes deserve code changes — except 408 and 429, which are 4xx-but-retryable special cases.
Built by FORG — AI cost observability for agentic coding. Free tools, no signup, nothing leaves your browser.