FORG — AI Control Plane

The Situation: Baseline Measurement

Our engineering team of 22 developers had been using Claude Code, Cursor, and the Anthropic API for about 8 months. We knew roughly what we were spending — about $1,450/month on average across all tools — but we had no idea where that money was going. No attribution by developer, project, or task type. No visibility into whether we were using the right models for the right tasks. No budgets.

The first thing we did was install FORG and connect all three adapters. Within 48 hours of data collection, the picture was unsettling.

What We Found: The Waste Patterns

FORG's cost intelligence surface surfaced five distinct waste patterns immediately:

1. Idle sessions ($580/month)

The single biggest waste category. Idle sessions are LLM sessions that were opened, made a few calls, and then left active without being explicitly terminated. Because Claude Code holds context across a session, each idle session was continuing to incur costs even when the developer had moved on.

FORG identified 847 sessions in our first week of data where the session was active for more than 2 hours but had a gap of more than 45 minutes between calls. Those sessions were keeping a large context window warm for no reason.

Fix: session timeout rule, 30 minutes of inactivity.

rules:
  - name: "session-timeout"
    type: session_policy
    idle_timeout_minutes: 30
    action: terminate
    notify: false

2. Oversized models for simple tasks ($420/month)

About 31% of our API calls were using claude-opus orgpt-4o for tasks that were clearly simple: short questions, boilerplate generation, docstring writing. FORG's model analysis showed that these calls had median output of 87 tokens — well within the capability of a much cheaper model.

We added a model policy rule that restricts Opus and GPT-4o to specific environments (our CI pipeline and architecture review tasks), and defaults the API to Sonnet/3.5 for everything else.

rules:
  - name: "default-model-policy"
    type: model_policy
    scope: global
    deny_models:
      - "claude-opus*"
      - "gpt-4o"
    except:
      environments:
        - "architecture-review"
        - "ci-pipeline"
    action: redirect_to: "claude-sonnet-4-5"
    message: "Opus reserved for complex tasks. Using Sonnet."

3. No prompt caching ($290/month)

We discovered that none of our Anthropic API calls were using prompt caching. For Claude Code specifically, this is significant — the system prompt is large and nearly identical across sessions. Enabling cache_control on the system prompt alone saved $290/month.

This wasn't a FORG rule — it was a configuration change in our adapter setup. But FORG was the tool that surfaced the gap by showing that ourcache_read token count was zero for 100% of API calls.

4. Duplicate API calls ($180/month)

FORG's session analysis detected 1,200 API calls per week that were semantically identical to a call made in the same session within the last 5 minutes, on the same file, with nearly identical context length. These were likely developer habits: re-running the same prompt, closing and reopening context, triggering the same autocomplete multiple times.

We implemented soft deduplication at the adapter level — a 60-second window where identical calls return cached results — and FORG rules that warn developers when their duplicate call rate exceeds 15%.

5. No budgets = no accountability ($140/month)

This is the soft cost: without budgets, there's no incentive for developers to think about model choice or session hygiene. Once we set per-developer monthly budgets ($75/month, reasonable for a senior dev doing AI-intensive work), spending dropped simply because people became aware of it. This is the Hawthorne effect applied to AI costs.

rules:
  - name: "dev-monthly-budget"
    type: budget
    scope: user
    limit: 75.00
    period: monthly
    warn_at: 60
    action: notify
    over_limit_action: block
    message: "You've reached your monthly AI budget.
              Contact #engineering-ops to request an increase."

Implementation: 3 Days of Work

The actual implementation took about 3 days:

Day 1: FORG setup, all three adapters connected, two weeks of historical data imported from local adapter logs.
Day 2: Data review, waste pattern analysis, rule drafts reviewed with team leads.
Day 3: Rules deployed to warn-only mode first, Slack notifications configured, team briefed.

We ran warn-only mode for 5 days before switching enforcement rules to block mode. This gave developers time to adjust behavior without a sudden productivity impact. The transition was smooth — only 3 developers hit the block action in the first week, and each case was a legitimate exception that got a budget increase.

30-Day Results

Across 8 weeks of data (2 baseline, 6 with rules active):

Average weekly spend dropped from $1,460 to $832 — a 43% reduction
No measurable productivity impact (PR throughput unchanged)
Developer satisfaction with AI tools increased (less context loss from idle session termination)
Zero developer complaints about budget limits (appropriate limits + good messaging)

The annualized savings are around $33,000. FORG Business plan costs $599/month. The ROI calculus isn't complicated.

What We'd Do Differently

Start with warn-only mode for everything. We were impatient on the idle session rule and pushed it to block mode on day 2. One developer lost a long-running session mid-task and was (rightfully) annoyed. Warn first, measure behavior change, then enforce.

Also: segment your budgets by role. Our $75 flat budget was fine for most developers but too low for our three developers who were doing AI-intensive architecture work. Role-based budgets would have been more appropriate from day one.

The tools to do this are all in FORG. We just didn't use them as carefully as we should have on the first pass.

How We Cut Our AI Costs by 43% in 30 Days

Weekly AI Spend: Before vs. After FORG Rules