AI Carbon Footprint Estimator
Estimate the energy and CO₂ of your monthly token volume, with honest uncertainty ranges.
Small-model share: 10% (remainder).
CO₂e per month (± range, not a point estimate) from 50M tokens — using 31.3–125.5 kWh on a United States grid.
Uncertainty range
- Energy (midpoint): 78.4 kWh/mo
- CO₂e (midpoint): 29.8 kg/mo
- ≈ driving: 175 km
- ≈ phone charges: 6,531
Methodology
Inference-only. Energy per 1k tokens by tier: frontier 1–4 Wh, mid 0.3–1.2 Wh, small 0.05–0.3 Wh — ranges drawn from published per-query estimates (Epoch AI, 2025; Google Gemini disclosure, 2025; academic measurements of open models) including datacenter PUE. Grid intensities are IEA/EPA public averages. Equivalences: 0.17 kg/km average petrol car, 0.012 kWh per phone charge. Training emissions, embodied hardware carbon and water use are excluded.
How it works
Most AI carbon calculators commit the cardinal sin of fake precision: they multiply three uncertain numbers and print six significant figures. This estimator refuses. It takes your monthly token volume, your model-tier mix and a grid region, and returns a range — kWh and kg CO₂e per month — with the methodology and sources printed next to the result, because an estimate you cannot audit is marketing, not measurement.
The chain has three links. Tokens to energy: each tier carries a published-estimate range of watt-hours per thousand tokens (frontier 1.0–4.0 Wh, mid 0.3–1.2 Wh, small 0.05–0.3 Wh, including datacenter PUE), blended by your mix sliders. Energy to carbon: the energy range is multiplied by the grid intensity of your region preset, taken from public IEA/EPA averages; a Nordic hydro grid at 0.03 kg/kWh emits ~18× less per kWh than a coal-heavy one at 0.55. Midpoint to equivalences: the equivalence lines (kilometers driven from the CO₂e midpoint, phone charges from the kWh midpoint) apply standard conversion factors, and exist to make the magnitude graspable, not to dramatize it.
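The chain can be sketched in a few lines of Python. This is a minimal illustration, not the estimator's actual source: the tier ranges and conversion factors come from the methodology above, while the 50/40/10 frontier/mid/small split and the 0.38 kg/kWh United States grid intensity are assumptions, chosen because they reproduce the figures displayed for the 50M-token example. All function and variable names are hypothetical.

```python
# Wh per 1,000 tokens (low, high) by tier, as listed in the methodology.
TIER_WH_PER_1K = {
    "frontier": (1.0, 4.0),
    "mid": (0.3, 1.2),
    "small": (0.05, 0.3),
}
KG_CO2E_PER_KM = 0.17         # average petrol car, from the methodology
KWH_PER_PHONE_CHARGE = 0.012  # from the methodology


def estimate(tokens_per_month, mix, grid_kg_per_kwh):
    """Return low/high/midpoint energy and CO2e plus midpoint equivalences."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    thousands = tokens_per_month / 1_000

    def blended_kwh(bound):
        # Link 1: tokens to energy, blended by tier share (Wh converted to kWh).
        wh_per_1k = sum(share * TIER_WH_PER_1K[tier][bound]
                        for tier, share in mix.items())
        return thousands * wh_per_1k / 1_000

    low_kwh, high_kwh = blended_kwh(0), blended_kwh(1)
    mid_kwh = (low_kwh + high_kwh) / 2
    # Link 2: energy to carbon via the grid intensity.
    mid_kg = mid_kwh * grid_kg_per_kwh
    return {
        "low_kwh": low_kwh,
        "high_kwh": high_kwh,
        "mid_kwh": mid_kwh,
        "low_kg": low_kwh * grid_kg_per_kwh,
        "high_kg": high_kwh * grid_kg_per_kwh,
        "mid_kg": mid_kg,
        # Link 3: midpoints to equivalences.
        "km_driven": mid_kg / KG_CO2E_PER_KM,
        "phone_charges": mid_kwh / KWH_PER_PHONE_CHARGE,
    }


# Assumed inputs: 50M tokens, 50/40/10 mix, 0.38 kg/kWh grid (see lead-in).
result = estimate(50_000_000, {"frontier": 0.5, "mid": 0.4, "small": 0.1}, 0.38)
for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

Under those assumed inputs the sketch gives 31.25–125.5 kWh, a midpoint of about 78.4 kWh and 29.8 kg CO₂e, and roughly 175 km and 6,531 phone charges, in line with the example shown at the top of the page.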
What the magnitude usually shows: for a typical team, monthly LLM inference lands in the range of a single tank of petrol — real, worth optimizing, and far smaller than a commute. The leverage is identical to cost leverage. A frontier model burns 10–50× the energy of a small one per token, so the model-routing decisions that cut your bill cut your footprint almost proportionally. Wasted tokens — stuck agent loops, cache misses, bloated context windows — are wasted energy by definition. Efficiency and sustainability are the same project here, which is rare and convenient.
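A rough way to see the routing leverage is to compare two tier mixes at the same token volume. The sketch below uses the midpoint of each tier range, which is a simplification, and both mixes and the helper name are hypothetical illustrations rather than recommended settings.

```python
# Wh per 1,000 tokens (low, high) by tier, as listed in the methodology.
TIER_WH_PER_1K = {
    "frontier": (1.0, 4.0),
    "mid": (0.3, 1.2),
    "small": (0.05, 0.3),
}


def midpoint_kwh(tokens_per_month, mix):
    """Monthly energy in kWh using the midpoint of each tier's range."""
    wh = 0.0
    for tier, share in mix.items():
        low, high = TIER_WH_PER_1K[tier]
        wh += share * (low + high) / 2 * tokens_per_month / 1_000
    return wh / 1_000


# Hypothetical scenario: move 30% of traffic from the frontier tier to the small tier.
before = midpoint_kwh(50_000_000, {"frontier": 0.5, "mid": 0.4, "small": 0.1})
after = midpoint_kwh(50_000_000, {"frontier": 0.2, "mid": 0.4, "small": 0.4})
saving = 1 - after / before

print(f"before: {before:.1f} kWh, after: {after:.1f} kWh, saving: {saving:.0%}")
```

In this illustrative scenario the midpoint drops from about 78.4 kWh to 43.5 kWh, a reduction of roughly 44%, with the token volume unchanged.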
Excluded, and stated plainly: training emissions (a one-time cost with an unknowable per-query denominator), embodied hardware carbon, and water use. Provider-side disclosure remains thin, so treat the range as a sanity-check instrument rather than an ESG-report figure. The input that is fully knowable is your token volume — most teams guess it badly. FORG measures actual per-session token consumption across your team, which makes the one auditable number in this estimate a fact instead of a guess. Share the link to put your scenario in front of whoever asked about the footprint.
Frequently asked questions
How is the energy per token estimated?
From published per-query measurements: Google's 2025 Gemini disclosure (~0.24 Wh per median text prompt), Epoch AI's GPT-4o estimates (~0.3 Wh per typical query), and academic measurements of open models on known hardware. We convert these to Wh per 1,000 tokens by tier, carry them as low/high ranges rather than point values, and include datacenter overhead (PUE). The honest answer is that provider-side numbers are partially disclosed at best.
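Converting a per-query figure to Wh per 1,000 tokens depends on how many tokens a query contains, which the per-query numbers alone do not settle. A minimal sketch of that conversion follows; the function name and the query lengths (250, 500 and 1,000 tokens) are purely illustrative assumptions, not values taken from any of the sources.

```python
def wh_per_1k_tokens(wh_per_query, tokens_per_query):
    """Scale a per-query energy figure to Wh per 1,000 tokens."""
    if tokens_per_query <= 0:
        raise ValueError("tokens_per_query must be positive")
    return wh_per_query / tokens_per_query * 1_000


# ~0.3 Wh per typical query, evaluated at three assumed query lengths.
for assumed_tokens in (250, 500, 1_000):
    value = wh_per_1k_tokens(0.3, assumed_tokens)
    print(f"{assumed_tokens} tokens/query -> {value:.2f} Wh per 1k tokens")
```

Halving or doubling the assumed query length doubles or halves the result, which is one reason to carry a range rather than a single value.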
Why do frontier and small models differ so much in energy?
Energy scales roughly with the compute per token, which scales with active parameter count. A small distilled model activates a fraction of the weights a frontier model does, often 10–50× less compute per token, which is why the tier-mix sliders move the result so dramatically. The same routing decisions that cut your bill (send routine work to small models) cut the footprint nearly proportionally.
Does this include training emissions?
No, deliberately. Training is a one-time cost amortized over every query the model ever serves — and since you cannot know the denominator, any per-query training allocation is fiction. Most lifecycle analyses find inference dominates total emissions for heavily-used models anyway. Embodied hardware carbon and datacenter water use are also excluded, and the methodology block says so rather than hiding it.
What actually reduces an AI workload's footprint?
In order of leverage: route work to smaller models (10–50× less energy per token); cut wasted tokens, since runaway agent loops, cache misses and bloated context burn energy exactly like they burn money; and prefer providers in low-carbon regions, since a Nordic hydro grid at 0.03 kg/kWh emits ~18× less per kWh than a coal-heavy one at 0.55. The first two are the same optimizations FORG surfaces for cost: waste is waste in both currencies.
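The grid effect is a single multiplication, sketched below for the same energy on two grids. The intensities are the two quoted in the How it works section (0.03 and 0.55 kg/kWh), the 78.4 kWh input is the midpoint from the example at the top of the page, and the names are hypothetical.

```python
# Grid intensities in kg CO2e per kWh, as quoted in the How it works section.
GRID_KG_PER_KWH = {"nordic_hydro": 0.03, "coal_heavy": 0.55}


def co2e_kg(energy_kwh, grid):
    """Convert monthly energy to kg CO2e for a named grid preset."""
    return energy_kwh * GRID_KG_PER_KWH[grid]


energy_kwh = 78.4  # midpoint energy from the 50M-token example
nordic = co2e_kg(energy_kwh, "nordic_hydro")
coal = co2e_kg(energy_kwh, "coal_heavy")
ratio = coal / nordic

print(f"nordic: {nordic:.1f} kg, coal-heavy: {coal:.1f} kg, ratio: {ratio:.1f}x")
```

For the same workload this gives about 2.4 kg on the hydro grid against about 43.1 kg on the coal-heavy one, a ratio of roughly 18.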
Built by FORG — AI cost observability for agentic coding. Free tools, no signup, nothing leaves your browser.