Question 1

How is the energy per token estimated?

Accepted Answer

From published per-query measurements: Google's 2025 Gemini disclosure (~0.24 Wh per median text prompt), Epoch AI's GPT-4o estimates (~0.3 Wh per typical query), and academic measurements of open models on known hardware. We convert these to Wh per 1,000 tokens by tier, carry them as low/high ranges rather than point values, and include datacenter overhead (PUE). The honest answer is that provider-side numbers are partially disclosed at best.
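As a rough illustration of that conversion, here is a minimal sketch in Python. The 0.24 Wh figure is the one quoted above; the tokens-per-query range (500 to 2,000) and the names `Range` and `wh_per_1k_tokens` are assumptions made for this example, not the estimator's actual code. Whether a published figure already includes datacenter overhead has to be checked against its source, so `pue` defaults to 1.0 (no extra overhead applied).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A low/high pair, carried instead of a single point value."""
    low: float
    high: float


def wh_per_1k_tokens(wh_per_query: float, tokens_per_query: Range, pue: float = 1.0) -> Range:
    """Convert a per-query energy figure into Wh per 1,000 tokens.

    More tokens per query means less energy per token, so the HIGH token
    assumption produces the LOW end of the result, and vice versa.
    """
    if wh_per_query < 0 or pue < 1.0:
        raise ValueError("wh_per_query must be >= 0 and pue must be >= 1.0")
    if tokens_per_query.low <= 0 or tokens_per_query.high < tokens_per_query.low:
        raise ValueError("tokens_per_query must satisfy 0 < low <= high")
    low = wh_per_query * pue / tokens_per_query.high * 1000
    high = wh_per_query * pue / tokens_per_query.low * 1000
    return Range(low, high)


# 0.24 Wh per query is from the text above; 500-2,000 tokens is an assumption.
estimate = wh_per_1k_tokens(0.24, Range(low=500, high=2000))
print(f"{estimate.low:.2f} to {estimate.high:.2f} Wh per 1,000 tokens")
```

The uncertainty about how many tokens a "typical" query contains is what turns a single published number into a range.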

Question 2

Why do frontier and small models differ so much in energy?

Accepted Answer

Energy scales roughly with the compute per token, which scales with active parameter count. A small distilled model activates a fraction of the weights a frontier model does — often 10-50× less compute per token — which is why the tier-mix sliders move the result so dramatically. The same routing decisions that cut your bill (send routine work to small models) cut the footprint nearly proportionally.
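A small sketch of why the tier mix matters so much. The per-tier energy values below are hypothetical placeholders chosen to sit inside the 10-50× gap mentioned above (here 30×); the function name `blended_wh_per_1k` and the tier labels are also assumptions for illustration.

```python
def blended_wh_per_1k(mix: dict[str, float], wh_per_1k: dict[str, float]) -> float:
    """Token-weighted average energy per 1,000 tokens across model tiers.

    `mix` maps tier name -> share of tokens; shares must sum to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    return sum(share * wh_per_1k[tier] for tier, share in mix.items())


# Hypothetical per-tier energy (Wh per 1,000 tokens), a 30x gap.
TIER_WH = {"frontier": 3.0, "small": 0.1}

all_frontier = blended_wh_per_1k({"frontier": 1.0, "small": 0.0}, TIER_WH)
routed = blended_wh_per_1k({"frontier": 0.2, "small": 0.8}, TIER_WH)
reduction = 1 - routed / all_frontier

print(f"all frontier: {all_frontier:.2f} Wh, routed: {routed:.2f} Wh, cut: {reduction:.0%}")
```

Under these assumed numbers, moving 80% of tokens to the small tier cuts energy by about 77%, which is the "nearly proportional" effect described above.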

Question 3

Does this include training emissions?

Accepted Answer

No, deliberately. Training is a one-time cost amortized over every query the model ever serves, and since you cannot know that denominator, any per-query training allocation is fiction. Most lifecycle analyses find that inference dominates total emissions for heavily used models anyway. Embodied hardware carbon and datacenter water use are also excluded, and the methodology block says so rather than hiding it.
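To see why the unknown denominator makes a per-query allocation unreliable, consider this sketch. Both the training energy and the lifetime query counts are made-up placeholders, not measurements of any real model; only the arithmetic is the point.

```python
def training_wh_per_query(training_kwh: float, lifetime_queries: float) -> float:
    """Amortize a one-time training energy cost over an assumed query count."""
    if lifetime_queries <= 0:
        raise ValueError("lifetime_queries must be positive")
    return training_kwh * 1000 / lifetime_queries  # kWh -> Wh


# Placeholder: 1,000,000 kWh of training energy (hypothetical).
TRAINING_KWH = 1_000_000

# Equally defensible guesses at lifetime queries span orders of magnitude.
for queries in (1e9, 1e10, 1e11, 1e12):
    share = training_wh_per_query(TRAINING_KWH, queries)
    print(f"{queries:.0e} queries -> {share:.3f} Wh per query")
```

With these placeholders the allocation ranges from 1 Wh down to 0.001 Wh per query, a 1,000× spread driven entirely by a number nobody outside the provider can know.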

Question 4

What actually reduces an AI workload's footprint?

Accepted Answer

In order of leverage: route work to smaller models (10-50× less energy per token), cut wasted tokens — runaway agent loops, cache misses and bloated context burn energy exactly like they burn money — and prefer providers in low-carbon regions, since a Nordic hydro grid emits ~15× less per kWh than a coal-heavy one. The first two are the same optimizations FORG surfaces for cost: waste is waste in both currencies.
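The three levers map onto the three factors of a simple footprint calculation: tokens, energy per token, and grid carbon intensity. The sketch below uses hypothetical inputs throughout (10 million tokens, 3.0 and 0.1 Wh per 1,000 tokens, and grid intensities of 750 and 50 g CO2e/kWh, chosen only to reflect the ~15× ratio mentioned above); `footprint_g_co2e` is a name invented for this example.

```python
def footprint_g_co2e(tokens: int, wh_per_1k_tokens: float, grid_g_per_kwh: float) -> float:
    """Estimate emissions in grams CO2e: tokens x energy per token x grid intensity."""
    kwh = tokens / 1000 * wh_per_1k_tokens / 1000  # Wh -> kWh
    return kwh * grid_g_per_kwh


TOKENS = 10_000_000        # hypothetical monthly volume
COAL_HEAVY_GRID = 750.0    # g CO2e/kWh, hypothetical
LOW_CARBON_GRID = 50.0     # g CO2e/kWh, hypothetical (15x lower)

baseline = footprint_g_co2e(TOKENS, 3.0, COAL_HEAVY_GRID)
smaller_model = footprint_g_co2e(TOKENS, 0.1, COAL_HEAVY_GRID)
fewer_tokens = footprint_g_co2e(TOKENS // 2, 3.0, COAL_HEAVY_GRID)
cleaner_grid = footprint_g_co2e(TOKENS, 3.0, LOW_CARBON_GRID)

for label, grams in [
    ("baseline", baseline),
    ("smaller model", smaller_model),
    ("half the tokens", fewer_tokens),
    ("low-carbon grid", cleaner_grid),
]:
    print(f"{label:>16}: {grams / 1000:.2f} kg CO2e")
```

Because the factors multiply, the levers compound: a smaller model on a cleaner grid with fewer wasted tokens benefits from all three reductions at once.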

AI Carbon Footprint Estimator

