Question 1

Why do I need both a human-readable policy and a machine config?

Accepted Answer

Because they serve different audiences and fail differently. The markdown policy is for people: managers approving it, developers understanding what happens at the cap, auditors checking that governance exists. The JSON config is for systems: it is what your enforcement tooling actually reads. Generating both from the same inputs keeps them from drifting apart. Drift is the classic failure mode: a policy document promising caps that no system enforces, or enforcement rules nobody documented.
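As a rough illustration of "generated from the same inputs", the sketch below renders both artifacts from one in-memory object. The `BudgetPolicy` dataclass, its field names, and the two render functions are hypothetical and are not the builder's actual schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BudgetPolicy:
    # Hypothetical fields; the real builder's schema may differ.
    monthly_budget_usd: float
    alert_thresholds_pct: tuple
    enforcement_action: str  # "alert", "throttle" or "block"


def render_config(policy: BudgetPolicy) -> str:
    """Machine-readable JSON config for enforcement tooling."""
    return json.dumps(asdict(policy), indent=2, sort_keys=True)


def render_markdown(policy: BudgetPolicy) -> str:
    """Human-readable policy text built from the same object."""
    thresholds = ", ".join(f"{t}%" for t in policy.alert_thresholds_pct)
    return (
        "# Token budget policy\n\n"
        f"- Monthly budget: ${policy.monthly_budget_usd:,.2f}\n"
        f"- Alerts at: {thresholds}\n"
        f"- Action at the cap: {policy.enforcement_action}\n"
    )


policy = BudgetPolicy(5000.0, (50, 80, 100), "throttle")
config_text = render_config(policy)
policy_text = render_markdown(policy)
```

Because neither output is edited by hand, changing a threshold means changing the input object and regenerating both files.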

Question 2

How should I split the budget across teams?

Accepted Answer

Start proportional to headcount, then adjust for workload reality: a platform team running agents in CI burns multiples of what a team doing occasional completions does. The builder validates that shares sum to 100% so you cannot accidentally over-allocate. Resist the temptation to leave slack unallocated as a hidden buffer — make the buffer an explicit line instead, owned by whoever arbitrates mid-month overage requests, so the negotiation has a named owner.
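The split described above can be sketched as follows. The function names, the team names, the 10% buffer, and the 10-point workload adjustment are all illustrative assumptions, not values the builder prescribes.

```python
def headcount_shares(headcount: dict, buffer_pct: float = 10.0) -> dict:
    """Split (100 - buffer_pct) across teams by headcount.

    The buffer is returned as its own explicit line rather than left as slack.
    """
    total = sum(headcount.values())
    if total <= 0:
        raise ValueError("total headcount must be positive")
    allocatable = 100.0 - buffer_pct
    shares = {team: allocatable * n / total for team, n in headcount.items()}
    shares["buffer"] = buffer_pct
    return shares


def validate_shares(shares: dict, tolerance: float = 1e-6) -> None:
    """Raise if the shares do not sum to 100%."""
    total = sum(shares.values())
    if abs(total - 100.0) > tolerance:
        raise ValueError(f"shares sum to {total:.2f}%, expected 100%")


shares = headcount_shares({"platform": 6, "web": 9, "data": 3}, buffer_pct=10.0)

# Workload adjustment (hypothetical): the platform team runs agents in CI,
# so move 10 points from web to platform.
shares["platform"] += 10.0
shares["web"] -= 10.0

validate_shares(shares)
```

Running the validation after every manual adjustment is what catches an accidental over-allocation.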

Question 3

What are sensible alert thresholds and why three of them?

Accepted Answer

The default 50/80/100 pattern maps to three different responses. Fifty percent mid-month is informational: a pace check, no action. Eighty percent is the working alert: the team lead looks at what changed and decides whether to slow down or request more. One hundred percent triggers the enforcement action you chose. Three thresholds work because a single undifferentiated alert becomes noise people unsubscribe from, and continuous alerts are worse; with an escalating pattern, each message carries a different expected response.
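A minimal sketch of the escalating pattern, assuming spend is sampled periodically and each threshold should fire only once, when it is first crossed. The response labels and the `newly_crossed` helper are hypothetical.

```python
# Threshold (percent of budget) -> expected response, in ascending order.
THRESHOLDS = [
    (50, "info"),      # pace check, no action
    (80, "review"),    # team lead reviews what changed
    (100, "enforce"),  # apply the chosen enforcement action
]


def newly_crossed(prev_spent: float, spent: float, budget: float) -> list:
    """Return thresholds crossed between two spend readings.

    A threshold is reported only on the reading where spend first reaches it,
    so each alert is sent once rather than continuously.
    """
    prev_pct = 100.0 * prev_spent / budget
    pct = 100.0 * spent / budget
    return [(t, r) for t, r in THRESHOLDS if prev_pct < t <= pct]
```

For a budget of 1000, moving from 400 to 550 yields only the 50% informational alert, and a later reading of 700 yields nothing new.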

Question 4

Should the enforcement action be alert, throttle or block?

Accepted Answer

Match it to the blast radius of being wrong in each direction. Alert-only is right while you are building trust and your usage baselines are still guesses — getting blocked by a miscalibrated cap teaches developers to route around the system. Throttle is the strong default once baselines are real: work continues at reduced pace and runaway loops get contained. Hard block is for environments where an overage is genuinely worse than stopped work — rare in practice, common in procurement imaginations.
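To make the three options concrete, here is a hedged sketch of what each might do to the next request window once the cap is reached. The `Action` enum, the `decide` function, and the 25% throttle factor are assumptions for illustration only.

```python
from enum import Enum


class Action(Enum):
    ALERT = "alert"
    THROTTLE = "throttle"
    BLOCK = "block"


def decide(action: Action, over_cap: bool, normal_rpm: int,
           throttle_factor: float = 0.25):
    """Return (allowed, requests_per_minute, notify) for the next window."""
    if not over_cap:
        return True, normal_rpm, False
    if action is Action.ALERT:
        # Work continues at full pace; someone is told about the overage.
        return True, normal_rpm, True
    if action is Action.THROTTLE:
        # Work continues at a reduced pace, which contains runaway loops.
        return True, max(1, int(normal_rpm * throttle_factor)), True
    # BLOCK: stopped work is considered cheaper than further overage.
    return False, 0, True
```

Only the block branch stops work outright, which is why it suits the narrow case where an overage is worse than an outage.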

Question 5

What good is a per-developer daily cap on top of team budgets?

Accepted Answer

It is your runaway-agent circuit breaker. Monthly team budgets catch slow drift but are far too coarse for the failure mode that actually hurts: an agent loop or retry storm that burns through hundreds of dollars in an afternoon. A daily per-developer cap bounds the worst case of any single incident to one day of one person's allowance, which turns a potential month-killer into a footnote. Set it at roughly three times a heavy user's normal day so legitimate spikes clear it.
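The circuit-breaker idea can be sketched like this, assuming costs are recorded per request in dollars. The `daily_cap` helper, the `DailyBreaker` class, and the example figures are hypothetical.

```python
from datetime import date


def daily_cap(heavy_user_daily_spend_usd: float, multiplier: float = 3.0) -> float:
    """Cap at roughly three times a heavy user's normal day."""
    return heavy_user_daily_spend_usd * multiplier


class DailyBreaker:
    """Tracks spend per developer per day and reports when the cap is hit."""

    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.spent = {}  # (developer, day) -> dollars spent so far

    def record(self, developer: str, day: date, cost_usd: float) -> bool:
        """Add a cost; return True while the developer is under the cap."""
        key = (developer, day)
        self.spent[key] = self.spent.get(key, 0.0) + cost_usd
        return self.spent[key] < self.cap_usd


breaker = DailyBreaker(daily_cap(20.0))  # heavy user spends about $20/day
```

Because the counter is keyed by day, a tripped breaker resets on its own the next day, and the damage from any one incident stays within a single developer's daily allowance.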

Token Budget Policy Builder
