Token savings — accuracy spec
How FORG defines each savings tier, the 10 checks an entry must pass to count as measured, and the sensitive-task exclusions that apply to every module.
Savings taxonomy
Every entry in the savings ledger carries a verification_status that maps to one of five tiers. The tiers are never blended in totals. Only measured and reconciled count toward the "actual savings" column.
- measured: Original cost is provable (provider request id, exact price at call time) and avoided cost is provable. Reversals net to zero.
- reconciled: Carried forward from a measured prior period with no reversal in the current window.
- estimated: Projected from observed patterns. safe_to_auto_apply = false. Never counts toward actual totals.
- forecast: Hypothetical savings if a proposed policy were enabled. Always dry-run, always read-only.
- suppressed: Dropped from totals because of an abuse-guard rejection, a sensitive-task carry, or a duplicate.
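A minimal sketch of the "never blended" rule, assuming each ledger row exposes a tier name and a dollar amount. The field names `tier` and `savings_usd` are hypothetical and chosen only for illustration; they are not taken from the FORG schema.

```python
from decimal import Decimal

# Only these two tiers count toward the "actual savings" column.
ACTUAL_TIERS = {"measured", "reconciled"}
ALL_TIERS = ACTUAL_TIERS | {"estimated", "forecast", "suppressed"}


def totals_by_tier(rows):
    """Sum savings per tier so that tiers are never blended."""
    totals = {tier: Decimal("0") for tier in ALL_TIERS}
    for row in rows:
        tier = row["tier"]  # hypothetical field name
        if tier not in ALL_TIERS:
            raise ValueError(f"unknown tier: {tier!r}")
        totals[tier] += Decimal(row["savings_usd"])  # hypothetical field name
    return totals


def actual_savings(rows):
    """Total of the measured and reconciled tiers only."""
    totals = totals_by_tier(rows)
    return sum((totals[t] for t in ACTUAL_TIERS), Decimal("0"))


rows = [
    {"tier": "measured", "savings_usd": "1.20"},
    {"tier": "reconciled", "savings_usd": "0.30"},
    {"tier": "estimated", "savings_usd": "5.00"},
    {"tier": "suppressed", "savings_usd": "9.99"},
]
print(actual_savings(rows))  # 1.50
```

Amounts are kept as `Decimal` rather than floats so that per-tier sums stay exact to the cent.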
The 10-check accuracy gate
Before an opportunity is promoted to verified (the status that counts as measured), all 10 checks must pass. Any failure drops it to estimated or suppressed, so a failed row never inflates a measured total.
- Provider request id present on the original call
- Model price at call time matches model_pricing_history exactly
- No duplicate ledger row for the same evidence_hash
- No abuse-guard flag (runaway, fingerprint replay, sensitive downgrade)
- Not in a sensitive-task window (security / compliance / production_incident / unknown)
- Reversal partner balances to zero (if any reversal exists, the pair must net out)
- Cache hit is priced at the model's published cache rate, not the full input rate
- Compaction row measures the diff sent to the model, not the full source
- Budget-broker row was emitted before the hard cap, not after
- Forecast row came from the simulator (dry_run: true, mutation_allowed: false)
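The gate can be sketched as a function over ten precomputed boolean results. The check names below are hypothetical labels for the ten bullets above. The rule for which failures lead to suppressed rather than estimated is an assumption inferred from the definition of the suppressed tier (abuse-guard, sensitive-task, duplicate); the source does not spell it out.

```python
# Hypothetical names for the ten checks, in the order listed above.
CHECKS = [
    "provider_request_id_present",
    "price_matches_history",
    "no_duplicate_evidence_hash",
    "no_abuse_flag",
    "not_sensitive_window",
    "reversal_nets_to_zero",
    "cache_priced_at_cache_rate",
    "compaction_measures_diff",
    "broker_row_before_hard_cap",
    "forecast_from_simulator",
]

# Assumption: these failures suppress the row; all others demote it to estimated.
SUPPRESSING = {"no_duplicate_evidence_hash", "no_abuse_flag", "not_sensitive_window"}


def run_gate(check_results):
    """Return (status, failed_checks) for a dict mapping check name to bool."""
    missing = [c for c in CHECKS if c not in check_results]
    if missing:
        # A check that was never evaluated must not silently pass.
        raise KeyError(f"missing check results: {missing}")
    failed = [c for c in CHECKS if not check_results[c]]
    if not failed:
        return "verified", failed
    if any(c in SUPPRESSING for c in failed):
        return "suppressed", failed
    return "estimated", failed


all_pass = {c: True for c in CHECKS}
print(run_gate(all_pass))  # ('verified', [])
print(run_gate({**all_pass, "price_matches_history": False}))
# ('estimated', ['price_matches_history'])
```

Returning the list of failed checks alongside the status keeps the demotion explainable without changing the outcome.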
Sensitive-task exclusions
The following task keys are excluded from every optimization pass, every projection, and every count. The exclusion is structural — it is enforced before the policy engine sees the row, and it cannot be disabled.
- security — code or config that affects authentication, authorization, or secrets
- compliance — code or text produced for a regulatory or audit deliverable
- production_incident — anything triggered by an active incident response
- unknown — sessions that cannot be classified (treated as sensitive by default)
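A minimal sketch of a structural pre-filter, assuming each row carries a `task_key` field (a hypothetical name). Rows with a missing or empty key are treated as unknown, and therefore as sensitive, before anything downstream sees them.

```python
# The four sensitive task keys listed above.
SENSITIVE_TASK_KEYS = frozenset(
    {"security", "compliance", "production_incident", "unknown"}
)


def partition_rows(rows):
    """Split rows into (eligible, excluded) before any policy logic runs."""
    eligible, excluded = [], []
    for row in rows:
        # A missing or empty key cannot be classified, so it defaults to "unknown".
        key = row.get("task_key") or "unknown"
        if key in SENSITIVE_TASK_KEYS:
            excluded.append(row)
        else:
            eligible.append(row)
    return eligible, excluded


rows = [
    {"id": 1, "task_key": "refactor"},  # "refactor" is a made-up non-sensitive key
    {"id": 2, "task_key": "security"},
    {"id": 3},                          # no key at all
]
eligible, excluded = partition_rows(rows)
print([r["id"] for r in eligible])  # [1]
print([r["id"] for r in excluded])  # [2, 3]
```

The function takes no flag to switch the exclusion off, which mirrors the "cannot be disabled" requirement.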
How savings are written
The token_savings_ledger is append-only. Every entry has:
- claim_type — one of measured_cache, measured_compaction, prevented_runaway, model_opportunity, repeat_opportunity, policy_simulation, abuse_suppressed, reversal
- verification_status — verified, reconciled, reported, estimated, forecast, or suppressed
- evidence_hash — a deterministic hash of the input that produced the row
- reversal_pair_id — if present, the row is the counterpart of another row and nets to zero
- model_pricing_history_id — foreign key to the exact price at call time
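The entry shape above might be modelled as follows. This is a sketch under stated assumptions: the SHA-256 over canonical JSON is one possible way to obtain a deterministic evidence_hash, the `savings_usd` field is a hypothetical amount column, and the in-memory `Ledger` class only illustrates append-only behavior and reversal netting.

```python
import hashlib
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


def evidence_hash(evidence: dict) -> str:
    """Deterministic hash: the same input yields the same digest regardless of key order."""
    canonical = json.dumps(evidence, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)  # frozen: a written row cannot be mutated afterwards
class LedgerEntry:
    claim_type: str
    verification_status: str
    evidence_hash: str
    model_pricing_history_id: int
    savings_usd: Decimal  # hypothetical amount field
    reversal_pair_id: Optional[str] = None


class Ledger:
    """Append-only: exposes append and read, no update or delete."""

    def __init__(self):
        self._rows = []

    def append(self, entry: LedgerEntry) -> None:
        self._rows.append(entry)

    def rows(self):
        return tuple(self._rows)

    def pair_net(self, pair_id: str) -> Decimal:
        """Net amount of all rows sharing a reversal_pair_id; should be zero."""
        return sum(
            (r.savings_usd for r in self._rows if r.reversal_pair_id == pair_id),
            Decimal("0"),
        )


ledger = Ledger()
h = evidence_hash({"request_id": "req-1", "tokens": 1200})  # made-up evidence
ledger.append(LedgerEntry("measured_cache", "verified", h, 7, Decimal("0.42"), "pair-1"))
ledger.append(LedgerEntry("reversal", "verified", h, 7, Decimal("-0.42"), "pair-1"))
print(ledger.pair_net("pair-1") == 0)  # True
```

A correction is expressed by appending a reversal row rather than by editing the original, so the history stays intact.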
How the policy simulator works
The simulator answers "what if?" without writing anything. Its return value always carries:
- dry_run: true
- mutation_allowed: false
- read_only: true
- estimated_affected_sessions and projected_estimated_savings_usd, labeled as projections, not commitments
- sensitive_sessions_affected: the number of sensitive sessions that would have been affected (always reported; the simulator never silently excludes them)
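A sketch of a result with the fields described above. The input shape (`task_key`, `projected_savings_usd` per session) is hypothetical. Counting only non-sensitive sessions in the projection while reporting sensitive ones separately is an assumption based on the exclusion rules; the real simulator may differ.

```python
from decimal import Decimal

SENSITIVE = {"security", "compliance", "production_incident", "unknown"}


def simulate(sessions):
    """Build a read-only 'what if?' result; nothing is written anywhere."""

    def key_of(session):
        # Unclassifiable sessions default to "unknown" (sensitive).
        return session.get("task_key") or "unknown"

    eligible = [s for s in sessions if key_of(s) not in SENSITIVE]
    sensitive = [s for s in sessions if key_of(s) in SENSITIVE]
    projected = sum(
        (Decimal(s["projected_savings_usd"]) for s in eligible), Decimal("0")
    )
    return {
        "dry_run": True,
        "mutation_allowed": False,
        "read_only": True,
        # Projections, not commitments.
        "estimated_affected_sessions": len(eligible),
        "projected_estimated_savings_usd": str(projected),
        # Always reported, never silently dropped.
        "sensitive_sessions_affected": len(sensitive),
    }


result = simulate([
    {"task_key": "refactor", "projected_savings_usd": "0.75"},  # made-up key
    {"task_key": "compliance", "projected_savings_usd": "2.00"},
    {"projected_savings_usd": "1.00"},  # unclassified
])
print(result["projected_estimated_savings_usd"])  # 0.75
print(result["sensitive_sessions_affected"])      # 2
```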
The simulator is the only path to enable a hard budget gate, a model downgrade, or a context-compaction policy. Every such enable writes an audit_log entry with the user who enabled it, the projected blast radius, and the diff against the prior policy.
Endpoints
User dashboard
- GET /api/dashboard/savings/ledger returns verified, reconciled, and estimated rows for the current user
- GET /api/dashboard/savings/opportunities returns estimated opportunities (safe_to_auto_apply = false)
- GET /api/dashboard/usage?include=savings_taxonomy adds a savings_taxonomy aggregate to usage
Internal health, incident, and audit endpoints are intentionally omitted from public docs.
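For illustration, requests to the user dashboard endpoints could be built as below. The host `https://forg.example` is a placeholder. Authentication is omitted because this page does not describe it, and the request is constructed but not sent.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

BASE_URL = "https://forg.example"  # hypothetical host, replace with the real one


def dashboard_request(path, params=None):
    """Build (but do not send) a GET request for a dashboard endpoint."""
    url = urljoin(BASE_URL, path)
    if params:
        url = f"{url}?{urlencode(params)}"
    return Request(url, method="GET", headers={"Accept": "application/json"})


ledger_req = dashboard_request("/api/dashboard/savings/ledger")
usage_req = dashboard_request("/api/dashboard/usage", {"include": "savings_taxonomy"})
print(usage_req.full_url)
# https://forg.example/api/dashboard/usage?include=savings_taxonomy
```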
What we never do
- We never count an estimated opportunity toward an actual savings total.
- We never reverse a real call to inflate a number — reversal rows only match against a row FORG itself wrote.
- We never enable a hard gate silently — every opt-in is audited.
- We never measure savings on sensitive-task sessions, even if the math would favor it.
- We never blend tiers. If a dashboard column is labeled "measured", it is exactly that.
See the public-facing overview
The Token Savings feature page summarises the same five tiers and the four opt-in modules in a less technical voice.