Token savings — accuracy spec

How FORG defines each savings tier, the 10 checks an entry must pass to count as measured, and the sensitive-task exclusions that apply to every module.

Savings taxonomy

Every entry in the savings ledger carries a verification_status that maps to one of five tiers. The tiers are never blended in totals. Only measured and reconciled count toward the "actual savings" column.

measured

Original cost is provable (provider request id, exact price at call time) and avoided cost is provable. Reversals net to zero.

reconciled

Carried forward from a measured prior period with no reversal in the current window.

estimated

Projected from observed patterns. safe_to_auto_apply = false. Never counts toward actual totals.

forecast

Hypothetical savings if a proposed policy were enabled. Always dry-run, always read-only.

suppressed

Dropped from totals — abuse-guard rejected, sensitive-task carry, or duplicate.
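The rule that tiers are never blended can be illustrated with a minimal sketch. The row shape (`tier`, `savings_usd`) is a hypothetical stand-in for illustration, not FORG's actual schema.

```python
from collections import defaultdict
from decimal import Decimal

# Tiers that count toward the "actual savings" column, per the taxonomy above.
ACTUAL_TIERS = {"measured", "reconciled"}


def totals_by_tier(rows):
    """Sum savings per tier; each tier keeps its own bucket and is never merged."""
    totals = defaultdict(Decimal)
    for row in rows:
        totals[row["tier"]] += Decimal(row["savings_usd"])
    return dict(totals)


def actual_savings(rows):
    """Only measured and reconciled rows contribute to the actual total."""
    return sum(
        (Decimal(r["savings_usd"]) for r in rows if r["tier"] in ACTUAL_TIERS),
        Decimal("0"),
    )


# Hypothetical rows: field names and amounts are illustrative only.
rows = [
    {"tier": "measured", "savings_usd": "1.25"},
    {"tier": "reconciled", "savings_usd": "0.50"},
    {"tier": "estimated", "savings_usd": "9.00"},
    {"tier": "forecast", "savings_usd": "20.00"},
    {"tier": "suppressed", "savings_usd": "3.00"},
]

print(actual_savings(rows))   # estimated, forecast, and suppressed rows are ignored
print(totals_by_tier(rows))   # one separate total per tier
```

Using `Decimal` rather than floats keeps currency sums exact, which matters when a total has to reconcile against provider invoices.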

The 10-check accuracy gate

An opportunity is promoted to verified (the status behind the measured tier) only after all 10 checks pass. Any failure drops it to estimated or suppressed, so a failed row never inflates a measured total.

Provider request id present on the original call
Model price at call time matches model_pricing_history exactly
No duplicate ledger row for the same evidence_hash
No abuse-guard flag (runaway, fingerprint replay, sensitive downgrade)
Not in a sensitive-task window (security / compliance / production-incident / unknown)
Reversal partner balances to zero (if any reversal exists, the pair must net out)
Cache hit is priced at the model's published cache rate, not the full input rate
Compaction row counts only the diff sent to the model, not the full source
Budget-broker row was emitted before the hard cap, not after
Forecast row came from the simulator (dry_run: true, mutation_allowed: false)
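The gate above can be sketched as an all-or-nothing function. The check names are hypothetical labels for the ten items in the list. The split between suppressed and estimated is an assumption drawn from the suppressed tier's description (duplicate, abuse-guard, sensitive-task); the spec itself only says a failure drops the row to "estimated or suppressed".

```python
from typing import Dict, List, Tuple

# Hypothetical names for the ten checks, in the order listed above.
CHECK_NAMES = (
    "provider_request_id_present",
    "price_matches_pricing_history",
    "no_duplicate_evidence_hash",
    "no_abuse_guard_flag",
    "outside_sensitive_task_window",
    "reversal_nets_to_zero",
    "cache_hit_at_published_cache_rate",
    "compaction_measures_diff_sent",
    "budget_row_emitted_before_hard_cap",
    "forecast_row_from_simulator",
)

# Assumption: these failures suppress the row; any other failure leaves it estimated.
SUPPRESSING = {
    "no_duplicate_evidence_hash",
    "no_abuse_guard_flag",
    "outside_sensitive_task_window",
}


def gate(results: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Return (status, failed_checks). A missing result counts as a failure."""
    failed = [name for name in CHECK_NAMES if not results.get(name, False)]
    if not failed:
        return "verified", failed
    if any(name in SUPPRESSING for name in failed):
        return "suppressed", failed
    return "estimated", failed


all_pass = {name: True for name in CHECK_NAMES}
print(gate(all_pass))
print(gate({**all_pass, "price_matches_pricing_history": False}))
print(gate({**all_pass, "no_abuse_guard_flag": False}))
```

Treating a missing result as a failure keeps the gate conservative: a row is promoted only when every check has explicitly passed.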

Sensitive-task exclusions

The following task keys are excluded from every optimization pass, every projection, and every count. The exclusion is structural — it is enforced before the policy engine sees the row, and it cannot be disabled.

security — code or config that affects authentication, authorization, or secrets
compliance — code or text produced for a regulatory or audit deliverable
production_incident — anything triggered by an active incident response
unknown — sessions that cannot be classified (treated as sensitive by default)
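A minimal sketch of the exclusion as a filter that runs before any policy logic. The session shape and the `task_key` field name are assumptions for illustration; only the four task keys come from the list above.

```python
# The four sensitive task keys listed above.
SENSITIVE_TASK_KEYS = frozenset(
    {"security", "compliance", "production_incident", "unknown"}
)


def task_key_of(session: dict) -> str:
    """Return the session's task key; unclassified sessions become "unknown"."""
    key = session.get("task_key")  # hypothetical field name
    return key if key else "unknown"


def eligible_for_optimization(sessions):
    """Drop sensitive sessions so that downstream policy code never sees them."""
    return [s for s in sessions if task_key_of(s) not in SENSITIVE_TASK_KEYS]


# Hypothetical sessions; "refactor" stands in for any non-sensitive task key.
sessions = [
    {"id": "s1", "task_key": "refactor"},
    {"id": "s2", "task_key": "security"},
    {"id": "s3"},                      # no classification: treated as unknown
    {"id": "s4", "task_key": ""},      # empty classification: also unknown
]

print([s["id"] for s in eligible_for_optimization(sessions)])
```

Defaulting to "unknown" means a classification failure can only shrink the set of optimized sessions, never grow it.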

How savings are written

The token_savings_ledger is append-only. Every entry has:

claim_type — one of measured_cache, measured_compaction, prevented_runaway, model_opportunity, repeat_opportunity, policy_simulation, abuse_suppressed, reversal
verification_status — verified, reconciled, reported, estimated, forecast, or suppressed
evidence_hash — a deterministic hash of the input that produced the row
reversal_pair_id — if present, the row is the counterpart of another row and nets to zero
model_pricing_history_id — foreign key to the exact price at call time
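The entry shape above can be sketched as an immutable record in an append-only container. The hashing scheme (SHA-256 over canonical JSON) and the `Ledger` class are assumptions for illustration; the spec only requires that evidence_hash be deterministic and that the ledger be append-only.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LedgerRow:
    """One ledger entry, using the field names listed above."""
    claim_type: str
    verification_status: str
    evidence_hash: str
    model_pricing_history_id: int
    reversal_pair_id: Optional[str] = None


def evidence_hash(evidence: dict) -> str:
    """Deterministic hash: canonical JSON with sorted keys, then SHA-256 (assumed scheme)."""
    canonical = json.dumps(evidence, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Ledger:
    """Append-only sketch: rows can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._rows: List[LedgerRow] = []

    def append(self, row: LedgerRow) -> None:
        self._rows.append(row)

    @property
    def rows(self) -> Tuple[LedgerRow, ...]:
        # A tuple copy, so callers cannot mutate the underlying list.
        return tuple(self._rows)


# Hypothetical evidence payload; the keys are illustrative only.
h = evidence_hash({"provider_request_id": "req_123", "input_tokens": 4096})
ledger = Ledger()
ledger.append(LedgerRow("measured_cache", "verified", h, model_pricing_history_id=7))
print(len(ledger.rows), h[:12])
```

Sorting keys before hashing makes the hash independent of dictionary ordering, so the same evidence always produces the same evidence_hash and duplicate rows can be detected.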

How the policy simulator works

The simulator answers "what if?" without writing anything. Its return value always carries:

dry_run: true
mutation_allowed: false
read_only: true
estimated_affected_sessions and projected_estimated_savings_usd labeled as projections, not commitments
sensitive_sessions_affected — the number of sensitive sessions that would have been affected (always reported; the simulator never silently excludes them)
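The return value described above can be sketched as a frozen record whose three safety flags cannot be overridden by the caller. The `simulate` function, the session fields, and the choice to keep sensitive sessions out of the projected amount while still counting them are assumptions for illustration.

```python
from dataclasses import dataclass, field

SENSITIVE_TASK_KEYS = {"security", "compliance", "production_incident", "unknown"}


@dataclass(frozen=True)
class SimulationResult:
    estimated_affected_sessions: int
    projected_estimated_savings_usd: float
    sensitive_sessions_affected: int
    # init=False: callers cannot pass different values for these flags.
    dry_run: bool = field(default=True, init=False)
    mutation_allowed: bool = field(default=False, init=False)
    read_only: bool = field(default=True, init=False)


def simulate(sessions, matches_policy) -> SimulationResult:
    """Project the effect of a policy without writing anything."""
    matched = [s for s in sessions if matches_policy(s)]
    sensitive = [
        s for s in matched if s.get("task_key", "unknown") in SENSITIVE_TASK_KEYS
    ]
    eligible = [
        s for s in matched if s.get("task_key", "unknown") not in SENSITIVE_TASK_KEYS
    ]
    return SimulationResult(
        estimated_affected_sessions=len(eligible),
        projected_estimated_savings_usd=sum(
            s["projected_savings_usd"] for s in eligible
        ),
        sensitive_sessions_affected=len(sensitive),  # always reported
    )


# Hypothetical sessions and policy predicate.
sessions = [
    {"task_key": "refactor", "projected_savings_usd": 0.5},
    {"task_key": "docs", "projected_savings_usd": 0.25},
    {"task_key": "security", "projected_savings_usd": 4.0},
]
result = simulate(sessions, matches_policy=lambda s: True)
print(result)
```

Declaring the flags with `init=False` on a frozen dataclass means a result with `mutation_allowed=True` cannot be constructed or patched in afterwards.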

A hard budget gate, a model downgrade, or a context-compaction policy can only be enabled by way of a simulator run. The simulation itself writes nothing; the enable step that follows it writes an audit_log entry recording the user who enabled the policy, the projected blast radius, and the diff against the prior policy.

Endpoints

User dashboard

GET /api/dashboard/savings/ledger — verified + reconciled + estimated rows for the current user
GET /api/dashboard/savings/opportunities — estimated opportunities (safe_to_auto_apply = false)
GET /api/dashboard/usage?include=savings_taxonomy — adds a savings_taxonomy aggregate to usage
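A small sketch of building these request URLs. The host is a placeholder, and authentication is not covered in this spec, so none is shown; only the paths and the `include=savings_taxonomy` parameter come from the list above.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://forg.example"  # placeholder host, not taken from the docs


def dashboard_url(path: str, **params: str) -> str:
    """Join the base URL and an endpoint path, adding query parameters if any."""
    url = urljoin(BASE_URL, path)
    return f"{url}?{urlencode(params)}" if params else url


ledger_url = dashboard_url("/api/dashboard/savings/ledger")
opportunities_url = dashboard_url("/api/dashboard/savings/opportunities")
usage_url = dashboard_url("/api/dashboard/usage", include="savings_taxonomy")

for url in (ledger_url, opportunities_url, usage_url):
    print("GET", url)
```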

Internal health, incident, and audit endpoints are intentionally omitted from public docs.

What we never do

We never count an estimated opportunity toward an actual savings total.
We never reverse a real call to inflate a number — reversal rows only match against a row FORG itself wrote.
We never enable a hard gate silently — every opt-in is audited.
We never measure savings on sensitive-task sessions, even if the math would favor it.
We never blend tiers. If a dashboard column is labeled "measured", it is exactly that.

See the public-facing overview

The Token Savings feature page summarises the same five tiers and the four opt-in modules in a less technical voice.