Product · January 14, 2025 · 8 min read

FORG Atlas: Teaching AI to Understand Your AI Usage

FORG Atlas is a vector-embedded representation of your team's AI usage data that you can query in plain English. It's the intelligence layer on top of the observability layer — turning raw signals into answers you can act on.


Example Queries

Q:

Which model has the best cost-per-output-token ratio for our code review tasks?

A:

claude-sonnet-4-5 at $0.0021/output token (30 days, n=1,847 code review sessions). Opus costs 4.3× more with no measurable output length difference for this task type.


Q:

What percentage of our AI spend is happening outside business hours?

A:

23% of spend occurs between 8pm–8am local time. Backend team: 31%. Frontend team: 12%. Highest after-hours user: bob@company.com ($187 in the last 30 days).


Q:

Are our prompt caching settings working?

A:

Cache hit rate: 34% (last 7 days). Expected for your context patterns: 55-65%. Likely cause: short session durations (median 4.2 min) not amortizing cache write cost. Recommendation: increase idle timeout to 45 min.

Why We Built It

The observability layer in FORG collects a complete record of your team's AI usage: every signal, every session, every cost. After a few months of data collection, you have a rich dataset. The problem is that most of the questions you want to ask of that dataset are not well-served by traditional dashboard charts.

"Which model is best for our code review tasks?" requires grouping signals by task type (which requires inference from session context), by model, by output quality proxy, and by cost — and then ranking. That's a hard query to write in SQL, and it's even harder to visualize. Natural language is a much better interface.

FORG Atlas is our answer: a retrieval-augmented generation (RAG) system that uses your signal data as its knowledge base and exposes it through a natural language query interface.

How It Works

Step 1: Signal Ingestion and Chunking

As signals arrive in the Rule Engine, they're written to the signal store (Supabase) in raw format. Every night, a background job aggregates those signals into semantic chunks: session summaries, daily developer summaries, team weekly summaries, model performance summaries, and cost anomaly records.

A session summary looks like:

{
  "type": "session_summary",
  "session_id": "sess_01hwx",
  "date": "2025-01-14",
  "user": "alice@company.com",
  "team": "backend",
  "project": "payment-service",
  "duration_min": 28,
  "call_count": 14,
  "model_primary": "claude-sonnet-4-5",
  "tokens_total": 42847,
  "cost_usd": 0.847,
  "cache_hit_rate": 0.34,
  "avg_ttft_ms": 287,
  "task_type_inferred": "code_review",  // from session pattern
  "tools_used": ["Bash", "Edit", "Read"]
}
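
To make the aggregation step concrete, here is a minimal Python sketch of how the raw signals from one session could be collapsed into a summary like the one above. The raw signal fields (ts, input_tokens, cached_input_tokens, output_tokens, ttft_ms) and the cache hit rate definition are assumptions made for illustration; the real signal schema and the task-type inference are not shown here.

```python
from collections import Counter
from datetime import datetime


def summarize_session(signals):
    """Collapse the raw signals of one session into a session_summary chunk.

    Assumed (hypothetical) signal fields: session_id, ts (ISO 8601), model,
    input_tokens, cached_input_tokens, output_tokens, cost_usd, ttft_ms.
    """
    if not signals:
        raise ValueError("a session needs at least one signal")

    ordered = sorted(signals, key=lambda s: s["ts"])
    start = datetime.fromisoformat(ordered[0]["ts"])
    end = datetime.fromisoformat(ordered[-1]["ts"])

    input_tokens = sum(s["input_tokens"] for s in ordered)
    cached_tokens = sum(s["cached_input_tokens"] for s in ordered)
    models = Counter(s["model"] for s in ordered)

    return {
        "type": "session_summary",
        "session_id": ordered[0]["session_id"],
        "date": start.date().isoformat(),
        "duration_min": round((end - start).total_seconds() / 60),
        "call_count": len(ordered),
        # The model used for the most calls in the session.
        "model_primary": models.most_common(1)[0][0],
        "tokens_total": sum(s["input_tokens"] + s["output_tokens"] for s in ordered),
        "cost_usd": round(sum(s["cost_usd"] for s in ordered), 3),
        # Assumed definition: share of input tokens served from cache.
        "cache_hit_rate": round(cached_tokens / input_tokens, 2) if input_tokens else 0.0,
        "avg_ttft_ms": round(sum(s["ttft_ms"] for s in ordered) / len(ordered)),
    }
```

Fields such as user, team, project, and task_type_inferred would come from session metadata and the inference step, so this sketch leaves them out.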

Step 2: Embedding

Each chunk is converted to a natural-language description and embedded using OpenAI's text-embedding-3-small model (1536 dimensions). The embedding is stored in Supabase with pgvector.

For example, the session summary above becomes:

"Backend developer alice@company.com ran a 28-minute code review session on the payment-service project on January 14, 2025. She made 14 API calls to claude-sonnet-4-5, using 42,847 tokens at a cost of $0.85. Cache hit rate was 34%. Average time to first token was 287ms."

This natural language representation embeds better than the raw JSON for semantic retrieval — it captures the conceptual structure of the session, not just the field values.
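
As a rough illustration of the chunk-to-text step, the Python sketch below renders a session summary into a sentence like the one quoted above. The function name describe_session and the exact template are hypothetical; the sketch also uses "They" instead of a gendered pronoun, because the summary carries no such field.

```python
from datetime import date


def describe_session(s):
    """Render a session_summary chunk as an embeddable natural-language description."""
    day = date.fromisoformat(s["date"])
    when = f"{day.strftime('%B')} {day.day}, {day.year}"
    task = s["task_type_inferred"].replace("_", " ")
    return (
        f"{s['team'].capitalize()} developer {s['user']} ran a "
        f"{s['duration_min']}-minute {task} session on the {s['project']} "
        f"project on {when}. They made {s['call_count']} API calls to "
        f"{s['model_primary']}, using {s['tokens_total']:,} tokens at a cost of "
        f"${s['cost_usd']:.2f}. Cache hit rate was {s['cache_hit_rate']:.0%}. "
        f"Average time to first token was {s['avg_ttft_ms']}ms."
    )
```

The resulting string is what would be sent to the embedding model, with the original chunk stored alongside the vector so answers can cite it.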

Step 3: Query Execution

When you submit a query, FORG Atlas:

  1. Embeds your query using the same model
  2. Performs a cosine similarity search over the vector store to retrieve the top-N most relevant chunks (typical N: 20-40)
  3. Passes the retrieved chunks plus your query to Claude Sonnet with a structured prompt that asks for a specific, quantified answer
  4. Returns the answer with citations to the source data

// FORG Atlas query API
POST /engine/v1/atlas/query
{
  "query": "Which developer has the highest cost per session this month?",
  "date_range": { "start": "2025-01-01", "end": "2025-01-14" },
  "include_sources": true
}

// Response
{
  "answer": "bob@company.com has the highest cost per session this month at $1.24/session
             (34 sessions, $42.16 total). This is 2.1× the team average of $0.59/session.
             The primary driver is large context size (avg 8,400 input tokens/session vs.
             team avg of 3,200), consistent with the database optimization work he's been
             doing on the payment-service project.",
  "confidence": 0.87,
  "sources": [
    { "type": "session_summary", "session_id": "sess_...", "relevance": 0.94 },
    // ... 12 more source chunks
  ]
}
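
The retrieval step (step 2 of the query flow) happens inside pgvector in production, but the underlying idea fits in a few lines of plain Python. The sketch below is an in-memory stand-in, not the actual implementation: the chunk shape (chunk_id, embedding) and the function names are assumptions, and the toy vectors are far shorter than real 1536-dimension embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n_chunks(query_vec, chunks, n=20):
    """Return the n chunks most similar to the query, highest relevance first."""
    scored = [(cosine_similarity(query_vec, c["embedding"]), c) for c in chunks]
    # Sort on the score only, so chunk dicts are never compared to each other.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"chunk_id": c["chunk_id"], "relevance": round(score, 2)}
        for score, c in scored[:n]
    ]


# Toy example with 2-dimensional vectors.
chunks = [
    {"chunk_id": "a", "embedding": [1.0, 0.0]},
    {"chunk_id": "b", "embedding": [0.0, 1.0]},
    {"chunk_id": "c", "embedding": [1.0, 1.0]},
]
results = top_n_chunks([1.0, 0.0], chunks, n=2)
```

The relevance values play the same role as the relevance field in the sources array of the response above.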

Privacy Constraints

FORG Atlas respects the same k-anonymity constraints as the rest of FORG. Individual developer data is only surfaced if the team has ≥5 members. Below that threshold, queries that would return individual-level data return team-level aggregates instead, with a note explaining the privacy constraint.
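
One way to picture the fallback is as a filter applied to result rows before they reach the answer. The Python sketch below assumes a hypothetical row shape (user, team, cost_usd) and a team_sizes lookup; only the threshold of 5 comes from the rule described above.

```python
MIN_TEAM_SIZE = 5  # threshold from the privacy rule above


def apply_privacy_constraint(rows, team_sizes, k=MIN_TEAM_SIZE):
    """Keep individual rows for teams with at least k members; aggregate the rest.

    rows: list of {"user", "team", "cost_usd"} dicts (hypothetical shape).
    team_sizes: mapping of team name to member count.
    """
    visible, aggregates = [], {}
    for row in rows:
        if team_sizes.get(row["team"], 0) >= k:
            visible.append({"level": "individual", **row})
        else:
            # Fold the row into a single team-level record with no user field.
            agg = aggregates.setdefault(row["team"], {
                "level": "team",
                "team": row["team"],
                "cost_usd": 0.0,
                "note": f"Individual data hidden: team has fewer than {k} members.",
            })
            agg["cost_usd"] = round(agg["cost_usd"] + row["cost_usd"], 2)
    return visible + list(aggregates.values())
```

Unknown teams default to a size of 0 here, so they are aggregated rather than exposed; that default is a design choice of this sketch.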

The retrieval step is also tenant-isolated at the row level (Supabase RLS). Your queries never retrieve signal data from other organizations.

What You Can Ask

FORG Atlas is best at questions that require aggregation, comparison, or inference across your signal data:

  • Model performance comparisons by task type
  • Cost attribution by developer, team, project, environment
  • Cache efficiency analysis and recommendations
  • Usage pattern anomalies ("has anything changed this week?")
  • Budget utilization forecasts ("at this rate, when do we hit our limit?")
  • Peak usage analysis ("when does our team use the most AI?")
  • Model substitution analysis ("what would we save if we switched X model to Y?")

FORG Atlas is weaker on:

  • Questions about prompt content (we don't have it — by design)
  • Very recent data (chunks are updated nightly, not real-time)
  • Highly specific drill-downs better served by the dashboard UI
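
To show the kind of arithmetic behind a budget question like "at this rate, when do we hit our limit?", here is a minimal Python sketch of a linear run-rate projection. It is an illustration only: the function name, its inputs, and the choice of a simple average daily spend are assumptions, not a description of how FORG Atlas computes forecasts.

```python
import math
from datetime import date, timedelta


def forecast_limit_date(daily_spend, start, monthly_limit):
    """Project the date on which cumulative spend reaches monthly_limit.

    daily_spend: spend in USD for consecutive days beginning at `start`.
    Returns None if there is no data or no spend to extrapolate from.
    """
    if not daily_spend:
        return None
    spent = sum(daily_spend)
    last_day = start + timedelta(days=len(daily_spend) - 1)
    if spent >= monthly_limit:
        return last_day  # limit already reached within the observed window
    run_rate = spent / len(daily_spend)  # average spend per day so far
    if run_rate <= 0:
        return None
    days_left = math.ceil((monthly_limit - spent) / run_rate)
    return last_day + timedelta(days=days_left)


# 14 days at $10/day against a $300 limit: $160 remaining, 16 more days.
hit = forecast_limit_date([10.0] * 14, date(2025, 1, 1), 300.0)
```

A flat average ignores weekday and after-hours patterns, so a real forecast would likely weight recent days or model seasonality.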

Current Status and Roadmap

FORG Atlas is in alpha as of v2.8.0. It's available on Business+ plans. Known limitations:

  • Response latency: 3-8 seconds per query (embedding + retrieval + generation)
  • Data freshness: nightly batch, not real-time
  • Query complexity: single-turn only (no follow-up questions yet)

Roadmap for FORG Atlas through v3.x:

  • Multi-turn conversations with context retention
  • Scheduled reports: "Send me a weekly answer to this query"
  • Recommendation engine: proactive suggestions based on usage patterns
  • Streaming responses for lower perceived latency
  • Custom chunk types for teams that want to add their own metadata

If you're on a Business plan and want early access to FORG Atlas, you'll find it in the dashboard under Optimize → AI Insights. Send us feedback at hello@forg.pro. We're actively iterating based on the queries teams are actually asking.