Question 1

What is the difference between temperature and top_p?

Accepted Answer

They attack randomness from different angles. Temperature reshapes the whole distribution: dividing every logit by the temperature sharpens it below 1 (the favorite gets even more likely) and flattens it above 1 (tail tokens gain ground). Top_p never reshapes anything; it truncates, keeping only the smallest set of top-ranked tokens whose probabilities sum to at least p, then renormalizing so the survivors keep their relative odds. In practice temperature controls how adventurous each pick is, while top_p caps how deep into the tail an adventurous pick is allowed to reach. Most providers recommend tuning one, not both.
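A minimal Python sketch of both knobs, assuming plain lists of logits rather than any provider's API; the logit values and helper names are made up for illustration. `softmax` divides by the temperature before normalizing, and `top_p_filter` truncates and renormalizes without changing the survivors' relative odds.

```python
import math

def softmax(logits, temperature=1.0):
    """Divide every logit by the temperature, then normalize (max-subtracted for stability)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p):
    """Keep the smallest set of top-ranked tokens whose mass reaches p, then renormalize.
    Survivors keep their relative odds; every other token drops to 0."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= p:
            break
    kept_total = sum(probs[i] for i in kept)
    return [probs[i] / kept_total if i in kept else 0.0 for i in range(len(probs))]

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical logits for a four-token vocabulary
for t in (0.5, 1.0, 1.5):
    # Below 1 the favorite's share grows; above 1 the tail gains ground.
    print(f"T={t}:", [round(q, 3) for q in softmax(logits, t)])
print("top_p=0.9 at T=1:", [round(q, 3) for q in top_p_filter(softmax(logits), 0.9)])
```

At temperature 1 these logits need three tokens to reach 90% of the mass, so top_p 0.9 zeroes only the last one and rescales the other three.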

Question 2

What settings should I use for coding versus creative writing?

Accepted Answer

For code, structured output and tool calls, run cold: temperature 0 to 0.3 with top_p left at 1. Wrong-but-plausible tokens are expensive in code, and there is usually one right answer. For brainstorming, naming and prose, temperature 0.8 to 1.2 gives useful variety; pairing it with top_p around 0.9 trims the incoherent tail that high temperature lifts. Near-deterministic defaults also suit agentic coding tools: when an agent loops on a task, sampling noise compounds across turns.
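To see why top_p pairs well with a high temperature, the sketch below uses hypothetical logits: three plausible tokens followed by twenty low-scoring junk tokens. It measures how much probability the junk holds at different temperatures and after a top_p 0.9 cut. The numbers and helper names are illustrative, not taken from any model or library.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax (max-subtracted for numerical stability)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p):
    """Keep the fewest top-ranked tokens whose mass reaches p, renormalize, zero the rest."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

# Hypothetical logits: 3 plausible tokens, then 20 junk tokens.
logits = [5.0, 4.0, 3.0] + [-1.0] * 20
JUNK = slice(3, None)

for t in (0.2, 1.0, 1.2):
    print(f"T={t}: junk mass {sum(softmax(logits, t)[JUNK]):.4f}")

trimmed = top_p_filter(softmax(logits, 1.2), 0.9)
print(f"T=1.2, top_p=0.9: junk mass {sum(trimmed[JUNK]):.4f}")
```

With these made-up numbers the junk mass is effectively zero at T=0.2, about 3% at T=1.0 and about 8% at T=1.2. The top_p cut removes it entirely, because at T=1.2 the three plausible tokens still cover more than 90% of the mass on their own.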

Question 3

Why is temperature 0 not fully deterministic in real APIs?

Accepted Answer

Temperature 0 makes the sampling step deterministic (always take the argmax), but the forward pass that produces the logits is not bit-identical between runs. Batched GPU inference exposes floating-point non-associativity, because the order in which partial sums are added can change with batch size and kernel choice; mixture-of-experts routing can differ with batch composition; and providers spread requests across heterogeneous hardware. When two tokens sit within numerical noise of each other, the argmax itself can flip. So temperature 0 means "as deterministic as the infrastructure allows", not a reproducibility guarantee. Some APIs offer a seed parameter for closer reproducibility, but it is also best-effort.
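A toy illustration of the argmax flip, assuming a token's logit is assembled from three partial sums whose addition order differs between two runs. Real kernels reduce far larger sums, but the mechanism is the same: floating-point addition is not associative, so a near tie can break differently. The values and function names are hypothetical. The explicit `accumulate` loop is used instead of `sum()` because, from Python 3.12, `sum()` uses a more accurate summation algorithm for floats that would hide the effect.

```python
def accumulate(terms):
    """Add terms strictly left to right, like one fixed reduction order."""
    total = 0.0
    for x in terms:
        total += x
    return total

def greedy_pick(logits):
    """Temperature 0: return the argmax token; on an exact tie, max() keeps the first one listed."""
    return max(logits, key=logits.get)

# Hypothetical partial contributions to token A's logit (e.g. from different thread blocks).
parts_a = [0.1, 0.2, 0.3]
order_1 = accumulate(parts_a)            # (0.1 + 0.2) + 0.3
order_2 = accumulate(reversed(parts_a))  # (0.3 + 0.2) + 0.1

print(order_1, order_2, order_1 == order_2)  # 0.6000000000000001 0.6 False
print(greedy_pick({"B": 0.6, "A": order_1}))  # A  (A wins by one ulp)
print(greedy_pick({"B": 0.6, "A": order_2}))  # B  (exact tie, first listed wins)
```

Same inputs, same "temperature 0" rule, different token: the only thing that changed was the order of three additions.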

Question 4

What does top_k do, and when does it matter over top_p?

Accepted Answer

Top_k keeps the k highest-probability tokens regardless of how much probability they hold, whereas top_p keeps however many tokens it takes to cover p of the probability mass. The difference shows at the extremes: in a confident distribution, top_p 0.9 might keep just one token while top_k 40 keeps the favorite plus 39 near-zero stragglers; in a flat distribution, top_k 5 may cut tokens that held real probability. Anthropic and Google expose top_k directly; OpenAI's Chat Completions API does not, which is one of the parameter gaps to watch when porting requests between providers.
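Both extremes can be checked with two hypothetical 64-token distributions: one where a single token holds 95% of the mass, and one that is perfectly flat. The function names are illustrative, not from any library.

```python
def ranked(probs):
    """Token indices from most to least likely (stable for ties)."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

def top_k_survivors(probs, k):
    """Top-k: keep the k most likely tokens, however little mass they hold."""
    return ranked(probs)[:k]

def top_p_survivors(probs, p):
    """Top-p: keep the fewest most likely tokens whose mass reaches p."""
    kept, mass = [], 0.0
    for i in ranked(probs):
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return kept

confident = [0.95] + [0.05 / 63] * 63  # one dominant token, 63 stragglers
flat = [1 / 64] * 64                   # every token equally likely

for name, probs, k in (("confident", confident, 40), ("flat", flat, 5)):
    by_k = top_k_survivors(probs, k)
    by_p = top_p_survivors(probs, 0.9)
    mass_k = sum(probs[i] for i in by_k)
    print(f"{name}: top_k={k} keeps {len(by_k)} tokens ({mass_k:.1%} of mass); "
          f"top_p=0.9 keeps {len(by_p)}")
```

On the confident distribution top_p keeps 1 token while top_k 40 keeps 40. On the flat one, top_k 5 keeps only 5/64 (about 7.8%) of the mass, while top_p 0.9 needs 58 tokens.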

Question 5

In what order are temperature, top_k and top_p applied?

Accepted Answer

A common pipeline, and the one this playground implements, is: scale logits by temperature, softmax into probabilities, apply top_k (rank cutoff), apply top_p (cumulative mass cutoff), renormalize the survivors, then sample. The order matters: because temperature reshapes probabilities before truncation, a high temperature can pull tail tokens above the top_p waterline that would have been cut at temperature 1. Watch the chart: raise temperature with top_p at 0.9 and you can see the survivor set itself grow.
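The pipeline can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the playground's actual code: `top_k=0` disables the rank cutoff, top_p is measured on the distribution renormalized after top_k (implementations differ on this detail), and temperature 0 short-circuits to greedy argmax. The logits and function names are made up.

```python
import math
import random

def survivors(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after steps 1-5. Assumes temperature > 0;
    top_k=0 means no rank cutoff; top_p is measured after top_k renormalization."""
    scaled = [x / temperature for x in logits]       # 1. temperature
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]                    # 2. softmax
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]                        # 3. top_k: rank cutoff
    k_mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:                                  # 4. top_p: cumulative-mass cutoff
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}   # 5. renormalize survivors

def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Greedy argmax at temperature 0; otherwise draw one survivor by its probability."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    dist = survivors(logits, temperature, top_k, top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]  # 6. sample

logits = [3.0, 2.0, 1.0, 0.5, 0.0, -1.0]  # hypothetical six-token vocabulary
for t in (0.5, 1.0, 2.0):
    print(f"T={t}, top_p=0.9 survivors:", sorted(survivors(logits, t, top_p=0.9)))

rng = random.Random(0)  # fixed seed so this demo is repeatable locally
print("10 samples:", [sample_token(logits, 1.0, top_k=40, top_p=0.9, rng=rng) for _ in range(10)])
```

With these logits and top_p 0.9, the survivor set grows from 2 tokens at T=0.5 to 3 at T=1.0 and 5 at T=2.0, the same effect the chart shows when temperature rises.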

Temperature & Top-p Playground
