Token Savings

Savings you can audit.
Estimates you can trust.

FORG sorts savings into five tiers (measured, reconciled, estimated, forecast, and suppressed) and never blends them. Actual savings count only what we can prove. Opportunities are labeled as estimates until they are measured.

No silent automation. No "guaranteed" claims. No reversing of a real call just to inflate a number. Every entry is auditable end to end.

Five tiers, never blended

Every number FORG surfaces carries a label. Only measured and reconciled count toward actual totals.

Measured

Counted from real telemetry. Two independent checks must both pass: the original token cost is provable, and the avoided cost is provable. Reversals net to zero.

Examples: cache reuse, compaction on a real call, prevented runaway with provider-request-id proof

Reconciled

Carried forward from a measured prior period with no reversal in the current window. Conservative rolling credit.

Examples: cache hit on a follow-up session that references the original request

Estimated

Projected from observed patterns. Never counts toward actual savings totals. Marked with safe_to_auto_apply = false until measured.

Examples: right-size suggestion, repeat-work fingerprint, off-hours tier recommendation

Forecast

Hypothetical savings if a proposed policy were enabled. Always dry-run. Always read-only.

Examples: policy simulator output, threshold-change blast radius

Suppressed

Dropped from totals: rejected by the abuse guard, carried over from a sensitive task, or a duplicate of an existing entry.

Examples: fingerprint collision with prior window, security-flagged reversal pair
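
As a minimal sketch of the "never blended" rule, the snippet below keeps a running sum per tier and builds the actual total from measured and reconciled entries only. The Tier enum, the summarize function, and the (tier, amount) entry shape are illustrative assumptions, not FORG's schema.

```python
from enum import Enum


class Tier(Enum):
    MEASURED = "measured"
    RECONCILED = "reconciled"
    ESTIMATED = "estimated"
    FORECAST = "forecast"
    SUPPRESSED = "suppressed"


# Only these two tiers count toward actual totals, per the text above.
ACTUAL_TIERS = {Tier.MEASURED, Tier.RECONCILED}


def summarize(entries):
    """Sum amounts per tier and keep the actual total separate.

    `entries` is an iterable of (Tier, amount_usd) pairs; that shape is
    an assumption made for this sketch.
    """
    by_tier = {tier: 0.0 for tier in Tier}
    for tier, amount in entries:
        by_tier[tier] += amount
    actual = sum(by_tier[tier] for tier in ACTUAL_TIERS)
    return {"actual": actual, "by_tier": by_tier}


# Hypothetical ledger entries in USD.
ledger = [
    (Tier.MEASURED, 12.50),
    (Tier.RECONCILED, 3.25),
    (Tier.ESTIMATED, 40.00),
    (Tier.FORECAST, 100.00),
    (Tier.SUPPRESSED, 7.00),
]
summary = summarize(ledger)
```

Here the estimated, forecast, and suppressed amounts stay visible under their own labels but never reach summary["actual"].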

Four modules, each on its own

Each module is opt-in, dry-run by default, and never auto-applies.

Prompt-cache optimizer

Identifies repeated prompt prefixes that benefit from prompt caching. Reports cache-read vs cache-write token deltas.

Counts cache hits against the model's published cache rate, not the full input rate.
Marks each entry with the provider and the request id used to verify the cache.
A reversal pair is recorded automatically if a later call shows the same prefix in non-cached form.
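
The arithmetic behind the cache-rate rule can be sketched as follows. A cache hit is still billed, just at the cache-read rate, so the saving is the difference between the two rates rather than the full input rate. The function names and the rates are hypothetical and are not any provider's published prices.

```python
from decimal import Decimal

TOKENS_PER_MTOK = Decimal(1_000_000)


def cache_hit_savings(cached_tokens, input_rate, cache_read_rate):
    """Savings for one verified cache hit, in USD.

    Rates are USD per million tokens. The hit is billed at the cache-read
    rate, so only the difference from the full input rate is saved.
    """
    return cached_tokens * (input_rate - cache_read_rate) / TOKENS_PER_MTOK


def reversal_for(entry_usd):
    """Offsetting entry recorded when the same prefix later shows up uncached."""
    return -entry_usd


# Hypothetical rates, chosen only to make the arithmetic easy to follow.
hit = cache_hit_savings(200_000, Decimal("3.00"), Decimal("0.30"))
net_after_reversal = hit + reversal_for(hit)
```

With these numbers the hit is worth 0.54 USD, not the 0.60 USD that pricing against the full input rate would suggest, and the pair nets to zero once the reversal is recorded.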

Context-diff compaction

Measures tokens saved when an old context is replaced with a diff before being sent to the model.

Compares the bytes-of-context sent to the model against the bytes-of-source the agent edited.
Net savings = (source bytes − context bytes), converted to tokens, × per-token rate, counted only when context bytes are lower.
Records a reversal pair if the agent re-fetches the full source after compaction.
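
The compaction formula above can be sketched as below. The text does not say how bytes map to tokens, so the bytes_per_token parameter (defaulting to 4) is purely an assumption of this sketch, as are the function name and the sample rate.

```python
from decimal import Decimal


def compaction_savings(source_bytes, context_bytes, rate_per_token,
                       bytes_per_token=Decimal(4)):
    """Net savings in USD for one compacted call.

    Returns zero unless the context actually sent is smaller than the
    source the agent edited. bytes_per_token is an assumed conversion.
    """
    if context_bytes >= source_bytes:
        return Decimal(0)
    tokens_saved = (source_bytes - context_bytes) / bytes_per_token
    return tokens_saved * rate_per_token


# Hypothetical call: a 40 kB source replaced by an 8 kB diff.
saved = compaction_savings(40_000, 8_000, Decimal("0.000003"))
# A diff larger than the source earns nothing.
not_saved = compaction_savings(8_000, 9_500, Decimal("0.000003"))
```

The early return is what enforces "only when context bytes are lower": a diff that grows the context produces zero, never a negative or inflated entry.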

Budget broker

Advisory per-session and per-day cost ceilings. Surfaces a soft warning before crossing a hard cap.

Hard gates require explicit opt-in and an audit note — never on by default.
Sensitive tasks (security, compliance, production incident, unknown) are always excluded.
Reports the projected false-positive rate before you enable a gate.
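
One way to picture the broker's behavior is the sketch below: advisory by default, blocking only with an explicit opt-in plus an audit note, and never touching sensitive tasks. The Budget fields, the evaluate function, and the returned strings are hypothetical names for illustration, not FORG's API.

```python
from dataclasses import dataclass
from typing import Optional

# Task types the text lists as always excluded.
SENSITIVE_TASKS = {"security", "compliance", "production_incident", "unknown"}


@dataclass(frozen=True)
class Budget:
    soft_limit_usd: float
    hard_cap_usd: float
    hard_gate_enabled: bool = False   # never on by default
    audit_note: Optional[str] = None  # required alongside the opt-in


def evaluate(budget, spend_usd, task_type):
    """Return 'excluded', 'ok', 'warn', or 'block' for the current spend."""
    if task_type in SENSITIVE_TASKS:
        return "excluded"
    if spend_usd >= budget.hard_cap_usd:
        # Blocking needs both the explicit opt-in and an audit note.
        if budget.hard_gate_enabled and budget.audit_note:
            return "block"
        return "warn"
    if spend_usd >= budget.soft_limit_usd:
        return "warn"
    return "ok"


advisory = Budget(soft_limit_usd=8.0, hard_cap_usd=10.0)
gated = Budget(8.0, 10.0, hard_gate_enabled=True, audit_note="approved in review")
```

With the advisory budget, spend above the cap still yields only a warning; the same spend under the gated budget returns "block", and a security task returns "excluded" under either.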

Policy simulator

Dry-run preview of a policy change before it goes live. Always read-only, always labeled forecast.

Returns a blast-radius object: orgs, sessions, expected false positives, sensitive sessions affected.
Excludes sensitive sessions from the eligible pool by default regardless of policy flags.
mutation_allowed: false and read_only: true are structural: the API cannot write through the simulator.
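
A blast-radius object along these lines could be modeled as an immutable record. This is a sketch under stated assumptions: the field names, the session dictionary shape, the "legitimate" flag used to count false positives, and the reading of "sensitive sessions affected" as sensitive sessions that matched the policy but were held out of the eligible pool are all hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class BlastRadius:
    orgs: int
    sessions: int
    expected_false_positives: int
    sensitive_sessions_affected: int
    label: str = "forecast"
    mutation_allowed: bool = False
    read_only: bool = True


def simulate(sessions, cost_threshold_usd):
    """Dry-run a hypothetical per-session cost threshold; writes nothing."""
    over = [s for s in sessions if s["cost_usd"] > cost_threshold_usd]
    # Sensitive sessions are removed from the eligible pool by default.
    eligible = [s for s in over if not s["sensitive"]]
    return BlastRadius(
        orgs=len({s["org"] for s in eligible}),
        sessions=len(eligible),
        expected_false_positives=sum(1 for s in eligible if s["legitimate"]),
        sensitive_sessions_affected=len(over) - len(eligible),
    )


history = [
    {"org": "a", "cost_usd": 12.0, "sensitive": False, "legitimate": False},
    {"org": "a", "cost_usd": 15.0, "sensitive": False, "legitimate": True},
    {"org": "b", "cost_usd": 30.0, "sensitive": True, "legitimate": True},
    {"org": "c", "cost_usd": 2.0, "sensitive": False, "legitimate": True},
]
radius = simulate(history, cost_threshold_usd=10.0)

# The frozen dataclass rejects any attempt to flip the flags after the fact.
try:
    radius.read_only = False
    write_blocked = False
except FrozenInstanceError:
    write_blocked = True
```

Freezing the record is one simple way to make the read-only property structural rather than a convention callers are asked to respect.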

Accuracy gate

Ten checks before promotion to verified

An opportunity must pass every gate to count toward the measured total. Any failure drops it to estimated or suppressed — never inflates.

Provider request id present

Links the saved call to a verifiable provider billable record.

Exact model price at call time

Pulls from model_pricing_history; rejects entries priced with a non-canonical rate.

No duplicate ledger row

evidence_hash buckets reversals so an abuse pair nets to zero.

No abuse-guard flag on the row

Drops anything the abuse guard classified as runaway or fingerprint-replay.

Not in a sensitive-task window

Entries that overlap a security, compliance, production-incident, or unknown task window are never counted.

Reversal partner is balanced

If a reversal exists, the pair must net to zero before promotion to verified.
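
The six checks named above can be sketched as a single classification step. The row fields, the integer micro-USD amounts, and the rule for which failures suppress a row rather than demote it to estimated are assumptions of this sketch; the suppression rule is one reading of the Suppressed tier description, not a documented mapping.

```python
# Failures assumed to suppress a row outright rather than demote it.
SUPPRESSING_GATES = {"no_duplicate_row", "no_abuse_flag", "not_sensitive"}


def failed_gates(row, seen_hashes, canonical_price):
    """Return the names of the gates this row fails, in check order."""
    reversal = row.get("reversal_amount")
    checks = {
        "provider_request_id": bool(row.get("provider_request_id")),
        "exact_model_price": row.get("price_per_token") == canonical_price,
        "no_duplicate_row": row.get("evidence_hash") not in seen_hashes,
        "no_abuse_flag": not row.get("abuse_flag"),
        "not_sensitive": not row.get("sensitive_window"),
        # A reversal, if present, must cancel the original amount exactly.
        "reversal_balanced": reversal is None or row["amount"] + reversal == 0,
    }
    return [name for name, passed in checks.items() if not passed]


def classify(row, seen_hashes, canonical_price):
    """Promote only when every gate passes; a failure never adds to a total."""
    failures = failed_gates(row, seen_hashes, canonical_price)
    if not failures:
        return "measured"
    if any(name in SUPPRESSING_GATES for name in failures):
        return "suppressed"
    return "estimated"


# Hypothetical ledger row; amounts and price are integers in micro-USD.
row = {
    "provider_request_id": "req_1",
    "price_per_token": 3,
    "evidence_hash": "h1",
    "abuse_flag": False,
    "sensitive_window": False,
    "amount": 500,
    "reversal_amount": None,
}
result = classify(row, seen_hashes=set(), canonical_price=3)
```

Integer amounts keep the net-to-zero comparison exact, which a floating-point sum would not guarantee.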

What we do not claim

FORG does not guarantee any specific savings outcome for your team.
FORG does not silently enable budget gates, model downgrades, or compaction — every action requires an explicit, audited enable.
FORG does not collect prompt text or completion content — savings come from metadata, which limits what we can measure.
FORG does not count estimated opportunities toward your actual savings total. The two columns are always separate.

See your measured savings on day one

Install FORG, connect one adapter, and watch the measured vs. estimated split in real time.
