Product May 20, 2025 8 min read

Introducing FORG v3: The AI Control Plane

After six months of rebuilding from the ground up, FORG v3 is here. Complete architecture rewrite, three new pillars, and a fundamentally different approach to AI observability at team scale.


When we shipped FORG v1.5.2 last September, we had a working product. Engineers could install the Claude Code adapter, signals would flow into the dashboard, and you could see token counts. It was useful. But it wasn't what we set out to build.

The problem with v1 was that it answered the question "how much are we spending?" without answering the more important ones: why are we spending that, who is spending it, what rules should govern it going forward, and what can we learn from six months of usage data. Those are the questions engineering leaders actually have. v3 is built to answer them.

The Three Pillars

FORG v3 is organized around three pillars: Observe, Control, and Optimize. Each pillar has dedicated infrastructure, dedicated UI surface, and dedicated API primitives. They're designed to be used together, but each delivers standalone value.

Pillar 1: Observe

Observation in FORG v3 means capturing a complete, structured record of every LLM interaction your team makes — across every tool, every model, every developer — without storing a single prompt or completion.

Each signal contains: timestamp, session ID, adapter ID (which tool), model, token counts (input/output/cached), latency (TTFT + total), cost in USD, and a set of dimensions you can add (user, project, team, environment). That's it. No payloads. No context. No way to accidentally exfiltrate sensitive code or data.

The v3 rule engine ingests signals at over 10,000/second per worker with sub-5ms median latency added to the critical path. The schema is stable and versioned. You can backfill from local adapter logs if you started collecting before connecting to the cloud.

Pillar 2: Control

The Control pillar is new in v3. It's the enforcement layer: budget rules, model policies, user allowlists, rate limits, and kill switches. Rules evaluate before each LLM call (gateway mode) or asynchronously after (enforcement mode, zero latency impact).

A rule looks like this at the config level:

# .forg/rules.yaml
rules:
  - name: "monthly-dev-budget"
    type: budget
    scope: user
    limit: 100.00
    period: monthly
    action: block
    notify: true

  - name: "no-gpt4-in-prod"
    type: model_policy
    scope: environment
    match:
      environment: production
    deny_models:
      - "gpt-4*"
      - "claude-opus*"
    action: block

  - name: "team-daily-cap"
    type: budget
    scope: team
    limit: 500.00
    period: daily
    action: warn
    warn_at: 80
    notify: slack

Rules are evaluated in priority order. Budget rules are cumulative across window periods. Model policy rules are stateless. Conflict resolution is deterministic: the most restrictive rule wins, and every enforcement decision is written to the audit log with full context.
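To make the "most restrictive rule wins" behavior concrete, here is a minimal Python sketch of how per-rule decisions might be combined. It is an illustration under stated assumptions, not FORG's implementation: the `Decision` ordering, the `Call` type, the `evaluate` function, and the way current spend is passed in are all hypothetical, list order stands in for priority, and `warn_at` is read as a percentage of the limit.

```python
from dataclasses import dataclass
from enum import IntEnum
from fnmatch import fnmatch


class Decision(IntEnum):
    # Higher value = more restrictive (assumed ordering, not FORG's).
    ALLOW = 0
    WARN = 1
    BLOCK = 2


@dataclass
class Call:
    user: str
    environment: str
    model: str
    cost_usd: float


def eval_budget(rule: dict, spent_so_far: float, call: Call) -> Decision:
    """Budget rules are cumulative: compare spend in the window to the limit."""
    projected = spent_so_far + call.cost_usd
    if rule["action"] == "block" and projected > rule["limit"]:
        return Decision.BLOCK
    warn_at = rule.get("warn_at")  # assumed to be a percentage of the limit
    if warn_at is not None and projected >= rule["limit"] * warn_at / 100:
        return Decision.WARN
    return Decision.ALLOW


def eval_model_policy(rule: dict, call: Call) -> Decision:
    """Model policies are stateless: only the current call matters."""
    match = rule.get("match", {})
    if match.get("environment") not in (None, call.environment):
        return Decision.ALLOW  # rule does not apply to this environment
    if any(fnmatch(call.model, pattern) for pattern in rule["deny_models"]):
        return Decision.BLOCK
    return Decision.ALLOW


def evaluate(rules: list[dict], spend: dict[str, float], call: Call):
    """Return the most restrictive decision plus per-rule results for auditing."""
    results = []
    for rule in rules:  # list order stands in for priority order
        if rule["type"] == "budget":
            decision = eval_budget(rule, spend.get(rule["name"], 0.0), call)
        elif rule["type"] == "model_policy":
            decision = eval_model_policy(rule, call)
        else:
            decision = Decision.ALLOW
        results.append((rule["name"], decision))
    final = max((d for _, d in results), default=Decision.ALLOW)
    return final, results


# The three rules from the config above, as plain dictionaries.
rules = [
    {"name": "monthly-dev-budget", "type": "budget", "limit": 100.0, "action": "block"},
    {"name": "no-gpt4-in-prod", "type": "model_policy",
     "match": {"environment": "production"},
     "deny_models": ["gpt-4*", "claude-opus*"], "action": "block"},
    {"name": "team-daily-cap", "type": "budget", "limit": 500.0,
     "action": "warn", "warn_at": 80},
]

prod_call = Call("alice@company.com", "production", "gpt-4o", 0.02)
final, results = evaluate(rules, {"team-daily-cap": 120.0}, prod_call)
print(final.name, results)
```

Returning the per-rule results alongside the final decision is what would let an audit log record which rule caused a block, rather than only the outcome.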

Pillar 3: Optimize

Optimize is the intelligence layer. It's two things: a cost intelligence dashboard that automatically identifies waste patterns, and FORG Atlas — a vector-embedded representation of your usage data that you can query in plain English.

The cost intelligence dashboard surfaces patterns like: developer X is running 50 sessions/day that each use a 128k context but terminate after 3 turns (probably a misconfigured tool); team Y is paying 3x the market rate for a model family that underperforms on their task type; and 40% of your spend happens in the last 2 hours of the business day.
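The last of those patterns can be computed from signal metadata alone. The sketch below shows one way to do it, assuming each signal carries `ts` (epoch milliseconds) and `cost_usd`, and assuming a business day that ends at 17:00 UTC. The function name and the time-zone handling are hypothetical and say nothing about how the dashboard actually works.

```python
from datetime import datetime, timezone


def late_day_spend_share(signals, day_end_hour=17, window_hours=2):
    """Fraction of total cost incurred in the last `window_hours` before `day_end_hour` (UTC)."""
    total = 0.0
    late = 0.0
    for s in signals:
        hour = datetime.fromtimestamp(s["ts"] / 1000, tz=timezone.utc).hour
        total += s["cost_usd"]
        if day_end_hour - window_hours <= hour < day_end_hour:
            late += s["cost_usd"]
    return late / total if total else 0.0


def ms(hour):
    """Helper for the example: epoch milliseconds for a given UTC hour on one day."""
    return int(datetime(2025, 5, 20, hour, tzinfo=timezone.utc).timestamp() * 1000)


sample = [
    {"ts": ms(10), "cost_usd": 3.0},
    {"ts": ms(15), "cost_usd": 1.0},
    {"ts": ms(16), "cost_usd": 1.0},
]
share = late_day_spend_share(sample)
print(f"{share:.0%} of spend falls in the last two hours")
```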

FORG Atlas lets you ask: "Which model has the best cost-per-task ratio for code review tasks in the last 30 days?" and get an answer with citation to actual usage data.

Technical Architecture

FORG v3 is three independent services:

  • Rule Engine Worker — Cloudflare Worker at forg.pro/engine/*. Signal ingestion, classifier, rules evaluation, Supabase profile read/write. No LLM in the real-time path.
  • License Worker — Cloudflare Worker at forg.pro/agent/*. License and identity on D1. Handles activation, verification, machine fingerprinting, release manifest.
  • Dashboard (site/) — Next.js 15 on Vercel. Marketing plus the authenticated dashboard. Supabase auth. Reads from both workers via API routes.

Data is split by design: D1 for license/identity data (License Worker), Supabase + pgvector for all behavioral data (Rule Engine). These two stores never talk to each other directly. The agent binary holds a signed license token (format: lic_<20hex>) and session keys derived per-session using HKDF-SHA256.
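For readers unfamiliar with HKDF, the following Python sketch shows the license token format check and a per-session key derivation using HKDF-SHA256 as specified in RFC 5869. The HKDF routine itself follows the standard, but everything about how it is fed is an assumption: the `device_secret` input, the use of the session ID as salt, the `forg-session-v3` info label, the 32-byte key length, and the lowercase-hex requirement are hypothetical choices for illustration, not FORG's documented scheme.

```python
import hashlib
import hmac
import re

# Format from the post: lic_<20hex>. Lowercase hex is an assumption.
LICENSE_RE = re.compile(r"lic_[0-9a-f]{20}")


def is_license_token(token: str) -> bool:
    return LICENSE_RE.fullmatch(token) is not None


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    if length > 255 * 32:
        raise ValueError("requested length too large for HKDF-SHA256")
    # Extract: an empty salt defaults to a block of zero bytes.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output is produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def session_key(device_secret: bytes, session_id: str) -> bytes:
    # Hypothetical inputs: a device-held secret as key material, session ID as salt.
    return hkdf_sha256(device_secret, session_id.encode(), b"forg-session-v3")


print(is_license_token("lic_a1b2c3d4e5f6a7b8c9d0"))
key_a = session_key(b"example-device-secret", "sess_01hwxyzabc123")
key_b = session_key(b"example-device-secret", "sess_01hwxyzabc124")
print(key_a != key_b)
```

Deriving a fresh key per session means a leaked session key exposes only that session, which is the usual motivation for this pattern.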

The Agent: Signal Collection Only

The forg binary is a Go agent. It is a signal collector. Nothing more. Zero on-device intelligence. It hooks into your tools (Claude Code, Cursor, VS Code) via lightweight adapters, captures metadata from LLM call completions, and emits signals to the Rule Engine over HTTPS. The critical design constraint: the agent never touches prompt content. It reads token counts and timing from the tool's completion event, not from the HTTP body.

# Install and activate
npm install -g forg-agent

# Activate with your license key
forg activate lic_a1b2c3d4e5f6a7b8c9d0

# Check status
forg status

# Tail live signals
forg tail --format=json

The signal payload that flows from your machine to the Rule Engine looks like this:

{
  "v": 3,
  "session_id": "sess_01hwxyzabc123",
  "adapter": "claude-code",
  "model": "claude-sonnet-4-5",
  "ts": 1716840000000,
  "tokens": {
    "input": 2847,
    "output": 412,
    "cache_read": 1200,
    "cache_write": 0
  },
  "cost_usd": 0.00892,
  "latency_ms": {
    "ttft": 312,
    "total": 1847
  },
  "dimensions": {
    "user": "alice@company.com",
    "project": "backend-api",
    "environment": "development"
  }
}

No prompt. No completion. No file paths. No tool calls. Just the metadata you need to understand and govern your AI usage.
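One way a receiver could enforce that metadata-only guarantee is to reject any payload containing fields outside a known set. The Python sketch below assumes the example payload above lists every permitted field, which the post does not state; `validate_signal` and the allowlists are hypothetical names for illustration.

```python
TOP_LEVEL = {
    "v", "session_id", "adapter", "model", "ts",
    "tokens", "cost_usd", "latency_ms", "dimensions",
}
NESTED = {
    "tokens": {"input", "output", "cache_read", "cache_write"},
    "latency_ms": {"ttft", "total"},
}


def validate_signal(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means no unexpected fields."""
    problems = [f"unexpected field: {k}" for k in sorted(set(payload) - TOP_LEVEL)]
    for parent, allowed in NESTED.items():
        child = payload.get(parent)
        if isinstance(child, dict):
            extra = sorted(set(child) - allowed)
            problems += [f"unexpected field: {parent}.{k}" for k in extra]
    return problems


signal = {
    "v": 3,
    "session_id": "sess_01hwxyzabc123",
    "adapter": "claude-code",
    "model": "claude-sonnet-4-5",
    "ts": 1716840000000,
    "tokens": {"input": 2847, "output": 412, "cache_read": 1200, "cache_write": 0},
    "cost_usd": 0.00892,
    "latency_ms": {"ttft": 312, "total": 1847},
    "dimensions": {"user": "alice@company.com", "project": "backend-api"},
}
print(validate_signal(signal))
print(validate_signal({**signal, "prompt": "..."}))
```

An allowlist fails closed: a new adapter that accidentally attaches content under an unfamiliar key is rejected instead of stored. Note that `dimensions` is left free-form here, so it would need its own policy.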

What's in v3.0.0 GA

  • Complete Go agent rewrite — CGO-native keystore on macOS/Linux/Windows
  • Rules engine v2 — budget, model policy, rate limit, and user scope rules
  • Signal ingestion API with versioned schema
  • Unified dashboard with Observe / Control / Optimize tabs
  • FORG Atlas alpha — natural language usage queries
  • Claude Code, Cursor, VS Code, JetBrains adapters
  • Team + Business plans with org hierarchy, SCIM, SAML SSO
  • Data residency: US and EU (Business+)
  • Audit log with cryptographic chain (tamper-evident)
  • Webhooks for rule enforcement events
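A tamper-evident log of the kind mentioned in the list above is commonly built as a hash chain, where each entry commits to the hash of the one before it. The Python sketch below is a generic illustration of that technique, not FORG's log format; the entry fields, the genesis value, and the function names are invented for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical encoding of the event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append(log: list[dict], event: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": entry_hash(prev, event)})


def verify(log: list[dict]) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append(log, {"rule": "no-gpt4-in-prod", "decision": "block", "user": "alice@company.com"})
append(log, {"rule": "team-daily-cap", "decision": "warn", "user": "bob@company.com"})
print(verify(log))
```

A chain like this detects modification but does not prevent it, and truncating entries from the end goes unnoticed unless the latest hash is also anchored somewhere outside the log.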

Roadmap

v3 is the foundation. Here's what we're building next:

  • Q3 2025 — Gateway mode for zero-latency enforcement, pre-call rule evaluation via sidecar proxy
  • Q3 2025 — Terraform provider for rules-as-code
  • Q4 2025 — Anomaly detection on usage patterns
  • Q4 2025 — Cost forecasting with 30/60/90 day projections
  • Q1 2026 — Model recommendation engine (optimize for cost-per-task by category)

Getting Started

FORG v3 is available today on all plans. If you're on v1 or v2, the migration is straightforward — the adapter protocol is backwards-compatible, and we have an automated migration guide in the docs.

New to FORG? Start with the Solo plan — one user, full access to the Observe pillar, and pricing that scales cleanly into Professional, Team, and Business as your usage grows.

The goal was never to build a billing dashboard for AI. The goal was to build the control plane that gives engineering leaders the same visibility into their AI toolchain that they have into their cloud infrastructure. v3 is the first version that actually does that.

If you have questions, the best place is Discord. We're in there daily and the community has grown to over 500 engineers who are all dealing with the same problems you are.

Download FORG v3 and read the docs. We can't wait to see what you build.