Rules Engine Deep Dive: Budget Enforcement at Scale — FORG Blog

Architecture Overview

The FORG Rules Engine is a Cloudflare Worker deployed at forg.pro/engine/*. It's stateless by design: every signal evaluation is self-contained. State (budget accumulators, rule versions) lives in Supabase, which the Worker reaches through a pooled connection.

The evaluation path for a single signal:

Signal arrives at the Worker over HTTPS (mutual TLS)
HMAC signature verification (session key derived from license)
Signal parsed and normalized to internal schema (v3)
Rule set fetched (in-memory cache, 5s TTL)
Rules evaluated in priority order
Budget accumulators updated atomically (Supabase RPC)
Enforcement actions executed (block/warn/notify)
Signal written to signal store
Audit log entry written
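
The signature check in step two can be sketched as follows. This is a minimal illustration, assuming HMAC-SHA256 over the raw request body and a hex-encoded signature; the derive_session_key helper and its inputs are hypothetical, since the post does not describe how the session key is derived from the license.

```python
import hashlib
import hmac


def derive_session_key(license_key: str, session_id: str) -> bytes:
    """Hypothetical derivation: HMAC the session id with the license key."""
    return hmac.new(license_key.encode(), session_id.encode(), hashlib.sha256).digest()


def sign(body: bytes, session_key: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw signal body."""
    return hmac.new(session_key, body, hashlib.sha256).hexdigest()


def verify(body: bytes, signature_hex: str, session_key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign(body, session_key)
    return hmac.compare_digest(expected, signature_hex)


key = derive_session_key("license-123", "session-abc")
body = b'{"model": "claude-sonnet-4-5", "cost_usd": 0.05}'
sig = sign(body, key)

print(verify(body, sig, key))         # True
print(verify(body + b" ", sig, key))  # False: body was altered
```

Using hmac.compare_digest rather than == avoids leaking, through timing, how much of the signature matched.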

Total P50 latency: 3.2ms. P99: 11ms. This is the overhead added to your AI toolchain when using synchronous enforcement mode. In async mode (default), the signal is queued and the adapter gets an immediate 200 response — zero latency impact.

The Four Rule Types

1. Budget Rules

Budget rules enforce spending limits over a time window. They're the most commonly used rule type and the most complex to implement correctly because budget windows must be maintained accurately across distributed signal sources.

# Budget rule schema
rules:
  - name: string             # Unique identifier
    type: budget
    scope: user | team | org # What entity the limit applies to
    limit: number            # USD limit
    period: daily | weekly | monthly | rolling_30d
    warn_at: number          # Percentage of limit at which a warning is sent (0-100)
    action: notify | block   # What happens at limit
    notify: boolean | slack | email  # Notification channel

Budget windows are derived from the rule's period. For period: monthly, the window is the current calendar month. For period: rolling_30d, it's the trailing 30 days rather than a calendar boundary. Budget accumulators are updated atomically using Postgres advisory locks on the scope+period combination to prevent race conditions.
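
To make the window definitions concrete, here is a rough sketch of computing where a window starts for each period value. It assumes UTC boundaries, Monday as the first day of the week, and a timestamp as the reference point; none of these are stated in the rule schema, and window_start is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def window_start(period: str, ts: datetime) -> datetime:
    """Return the start of the budget window that contains ts (UTC assumed)."""
    ts = ts.astimezone(timezone.utc)
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        return midnight
    if period == "weekly":
        # Assumption: weeks start on Monday (weekday() == 0).
        return midnight - timedelta(days=ts.weekday())
    if period == "monthly":
        return midnight.replace(day=1)
    if period == "rolling_30d":
        # Not aligned to a calendar boundary: simply 30 days back.
        return ts - timedelta(days=30)
    raise ValueError(f"unknown period: {period}")


late = datetime(2025, 3, 31, 23, 59, 59, tzinfo=timezone.utc)
print(window_start("daily", late))        # 2025-03-31 00:00:00+00:00
print(window_start("monthly", late))      # 2025-03-01 00:00:00+00:00
print(window_start("rolling_30d", late))  # 2025-03-01 23:59:59+00:00
```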

# Example: per-developer monthly budget with Slack notify
- name: "dev-monthly-75"
  type: budget
  scope: user
  limit: 75.00
  period: monthly
  warn_at: 80
  action: block
  notify: slack

# Example: team-level daily cap with warning only
- name: "backend-daily-500"
  type: budget
  scope: team
  match:
    team: "backend"
  limit: 500.00
  period: daily
  warn_at: 70
  action: notify
  notify: true
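
The way warn_at and action interact in the two examples above can be sketched like this. It is an illustration under assumptions: the limit counts as reached when the running total is greater than or equal to it, and the result labels ("ok", "warn") are made up for the example rather than taken from the engine.

```python
from decimal import Decimal


def budget_decision(accumulated, cost, limit, warn_at, action):
    """Classify one signal against a budget rule.

    Amounts are passed as strings and handled as Decimal to avoid float drift.
    Returns the rule's action at the limit, 'warn' past warn_at, else 'ok'.
    """
    total = Decimal(accumulated) + Decimal(cost)
    limit = Decimal(limit)
    if total >= limit:
        return action  # 'block' or 'notify', as configured on the rule
    if total >= limit * Decimal(warn_at) / 100:
        return "warn"
    return "ok"


# dev-monthly-75: limit 75.00, warn_at 80 (warning threshold is 60.00)
print(budget_decision("50.00", "0.05", "75.00", 80, "block"))  # ok
print(budget_decision("59.98", "0.05", "75.00", 80, "block"))  # warn
print(budget_decision("74.96", "0.05", "75.00", 80, "block"))  # block

# backend-daily-500: action is notify, so the limit never blocks
print(budget_decision("499.99", "0.05", "500.00", 70, "notify"))  # notify
```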

2. Model Policy Rules

Model policy rules control which models can be used in which contexts. They're stateless (no accumulators needed), so evaluation cost depends only on the rule's own pattern lists and adds negligible overhead.

# Model policy rule schema
rules:
  - name: string
    type: model_policy
    scope: global | user | team | environment
    match:                    # Optional: conditions
      team: string
      environment: string
      user: string
    allow_models: [string]    # Allowlist (OR)
    deny_models: [string]     # Denylist (takes precedence)
    redirect_to: string       # Optional: redirect to this model
    action: allow | block | redirect | warn

# Example: restrict expensive models globally,
# allow for specific environments
- name: "global-model-policy"
  type: model_policy
  scope: global
  deny_models:
    - "claude-opus*"
    - "gpt-4o"
    - "gemini-ultra*"
  action: redirect
  redirect_to: "claude-sonnet-4-5"

- name: "arch-review-allow-opus"
  type: model_policy
  scope: environment
  match:
    environment: "architecture-review"
  allow_models:
    - "claude-opus*"
  action: allow
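
A single model policy rule could be checked roughly as shown here. The sketch assumes the trailing * in patterns behaves like a shell-style glob and that a rule which matches neither list simply does not apply; check_model_policy and the "no_match" label are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(model, patterns):
    """True if the model name matches any glob pattern in the list."""
    return any(fnmatchcase(model, p) for p in patterns)


def check_model_policy(rule, model):
    """Return (decision, model_to_use) for one model_policy rule."""
    # The denylist takes precedence, so it is checked first.
    if matches(model, rule.get("deny_models", [])):
        if rule["action"] == "redirect":
            return "redirect", rule["redirect_to"]
        return rule["action"], model
    if matches(model, rule.get("allow_models", [])):
        return "allow", model
    return "no_match", model


global_policy = {
    "name": "global-model-policy",
    "deny_models": ["claude-opus*", "gpt-4o", "gemini-ultra*"],
    "action": "redirect",
    "redirect_to": "claude-sonnet-4-5",
}
arch_review = {
    "name": "arch-review-allow-opus",
    "allow_models": ["claude-opus*"],
    "action": "allow",
}

print(check_model_policy(global_policy, "claude-opus-4"))  # ('redirect', 'claude-sonnet-4-5')
print(check_model_policy(global_policy, "gpt-4o-mini"))    # ('no_match', 'gpt-4o-mini')
print(check_model_policy(arch_review, "claude-opus-4"))    # ('allow', 'claude-opus-4')
```

Note that "gpt-4o" has no wildcard, so under this reading it matches only that exact name.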

3. Rate Limit Rules

Rate limit rules cap the number of API calls in a time window, independent of cost. Useful for preventing runaway automated processes or controlling peak load.

- name: "api-rate-limit"
  type: rate_limit
  scope: user
  limit: 100           # calls
  period: per_hour
  action: block
  burst: 120           # Allow short bursts above limit
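
The post does not spell out how burst is applied. One common interpretation is a token bucket whose capacity is the burst value and whose refill rate is the limit per period; the sketch below assumes exactly that and should not be read as the engine's actual algorithm.

```python
class TokenBucket:
    """Token bucket: capacity = burst, refill = limit tokens per hour (assumed)."""

    def __init__(self, limit_per_hour, burst, now=0.0):
        self.rate = limit_per_hour / 3600.0  # tokens added per second
        self.capacity = float(burst)
        self.tokens = float(burst)           # start with a full bucket
        self.updated = now

    def allow(self, now):
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(limit_per_hour=100, burst=120)
allowed = sum(bucket.allow(now=0.0) for _ in range(130))
print(allowed)                 # 120: the burst ceiling, then calls are blocked
print(bucket.allow(now=40.0))  # True: 40 s refills slightly more than one token
```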

4. Session Policy Rules

Session policy rules govern session lifecycle. The most common use case is idle session termination — closing sessions that haven't had activity for a configurable period.

- name: "session-idle-timeout"
  type: session_policy
  idle_timeout_minutes: 30
  action: terminate
  notify: false

# Advanced: max session duration
- name: "max-session-duration"
  type: session_policy
  max_duration_minutes: 240  # 4 hours
  action: warn
  warn_message: "Session approaching maximum duration. Consider starting fresh."
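
Both session checks reduce to comparing timestamps. A minimal sketch, assuming the engine tracks a start time and a last-activity time per session (the field and function names here are illustrative):

```python
from datetime import datetime, timedelta, timezone


def session_actions(started_at, last_activity_at, now,
                    idle_timeout_minutes=30, max_duration_minutes=240):
    """Return the session policy actions that fire at time `now`."""
    actions = []
    if now - last_activity_at >= timedelta(minutes=idle_timeout_minutes):
        actions.append("terminate")  # session-idle-timeout
    if now - started_at >= timedelta(minutes=max_duration_minutes):
        actions.append("warn")       # max-session-duration
    return actions


start = datetime(2025, 3, 1, 9, 0, tzinfo=timezone.utc)

# Active 5 minutes ago, 1 hour in: nothing fires.
print(session_actions(start, start + timedelta(minutes=55), start + timedelta(hours=1)))  # []

# Idle for 45 minutes: terminate.
print(session_actions(start, start + timedelta(minutes=15), start + timedelta(hours=1)))  # ['terminate']

# Still active, but 4 hours in: warn.
print(session_actions(start, start + timedelta(minutes=239), start + timedelta(hours=4)))  # ['warn']
```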

Evaluation Order and Conflict Resolution

Rules are evaluated in priority order (lower number = higher priority, default 100). The evaluation stops at the first rule that takes a blocking action. For non-blocking actions (warn, notify), evaluation continues through all matching rules.

# Explicit priority
- name: "emergency-kill-switch"
  type: budget
  scope: org
  limit: 10000.00
  period: monthly
  action: block
  priority: 1          # Evaluated first

- name: "dev-monthly-budget"
  type: budget
  scope: user
  limit: 75.00
  period: monthly
  action: block
  priority: 100        # Default, evaluated after priority-1
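
The ordering behavior described above (sort by priority, stop at the first block, keep going past warn and notify) can be sketched as a short loop. Rules are modeled here as dicts with a hypothetical check callable; the real engine's internal representation is not described in this post.

```python
def evaluate(rules, signal):
    """Evaluate rules in priority order; stop at the first blocking rule.

    Each rule's 'check' returns 'block', 'warn', 'notify', or None (did not fire).
    Returns (outcome, list of (rule name, action) that fired).
    """
    outcome, fired = "pass", []
    # Lower number = higher priority; default is 100. sorted() is stable,
    # so rules with equal priority keep their original order.
    for rule in sorted(rules, key=lambda r: r.get("priority", 100)):
        action = rule["check"](signal)
        if action is None:
            continue
        fired.append((rule["name"], action))
        if action == "block":
            outcome = "block"
            break  # later rules are not evaluated
    return outcome, fired


rules = [
    {"name": "dev-monthly-budget",  # no explicit priority, so 100
     "check": lambda s: "block" if s["user_usd"] >= 75 else None},
    {"name": "emergency-kill-switch", "priority": 1,
     "check": lambda s: "block" if s["org_usd"] >= 10000 else None},
]

print(evaluate(rules, {"user_usd": 80, "org_usd": 10500}))
# ('block', [('emergency-kill-switch', 'block')])
print(evaluate(rules, {"user_usd": 80, "org_usd": 900}))
# ('block', [('dev-monthly-budget', 'block')])
print(evaluate(rules, {"user_usd": 10, "org_usd": 900}))
# ('pass', [])
```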

Conflict resolution rules:

Block beats allow: If any rule blocks, the action is blocked regardless of other rules
More specific beats less specific: A user-scope rule overrides a team-scope rule for the same user
Allow rules can override deny rules: If you have both a deny and an allow rule matching the same signal, the higher-priority rule wins
All matching rules log: Even if a signal passes, all rules that matched (including passed rules) are written to the audit log

Budget Windowing: The Hard Part

Budget enforcement sounds simple but has subtle correctness requirements. The challenges:

Distributed signal sources: Multiple machines per developer, multiple adapters, signals arriving out of order. We use a Postgres UPDATE ... RETURNING statement under a transaction-scoped advisory lock for atomic accumulator updates.
Period boundaries: A signal timestamped 11:59:59 PM but processed after midnight shouldn't consume the new period's budget. Period evaluation uses the signal timestamp, not the processing timestamp.
Retroactive signals: Signals from offline agents catch up when reconnected. These are processed in chronological order and accumulators are rebuilt if needed.
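
The period-boundary and retroactive-signal points both come down to bucketing by signal timestamp rather than arrival time. A simplified in-memory sketch, assuming monthly periods keyed as YYYY-MM in UTC (period_key and rebuild_accumulators are hypothetical names, and the real rebuild runs against the database):

```python
from collections import defaultdict
from datetime import datetime, timezone
from decimal import Decimal


def period_key(signal_ts):
    """Monthly period key taken from the signal timestamp, e.g. '2025-03'."""
    return signal_ts.astimezone(timezone.utc).strftime("%Y-%m")


def rebuild_accumulators(signals):
    """signals: (user, signal_ts, cost_usd string) tuples in any arrival order."""
    totals = defaultdict(Decimal)  # Decimal() is 0
    # Replay in chronological order of the signal timestamps.
    for user, ts, cost in sorted(signals, key=lambda s: s[1]):
        totals[(user, period_key(ts))] += Decimal(cost)
    return dict(totals)


# Arrival order: the April signal is processed first; the March one
# shows up later from an agent that had been offline.
arrived = [
    ("alice", datetime(2025, 4, 1, 0, 0, 3, tzinfo=timezone.utc), "0.10"),
    ("alice", datetime(2025, 3, 31, 23, 59, 59, tzinfo=timezone.utc), "0.25"),
]
print(rebuild_accumulators(arrived))
# {('alice', '2025-03'): Decimal('0.25'), ('alice', '2025-04'): Decimal('0.10')}
```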

-- Budget accumulator update (simplified)
-- $1 = scope key, $2 = period key, $3 = signal cost (USD), $4 = signal timestamp
WITH lock AS (
  -- Transaction-scoped advisory lock on the scope+period pair.
  -- The separator keeps ('ab', 'c') and ('a', 'bc') from hashing alike.
  SELECT pg_advisory_xact_lock(
    hashtext($1::text || ':' || $2::text)
  )
)
UPDATE budget_accumulators
SET accumulated_usd = accumulated_usd + $3,
    last_signal_ts = $4
FROM lock  -- an unreferenced SELECT CTE is never executed, so join to it
WHERE scope_key = $1
  AND period_key = $2
RETURNING accumulated_usd, limit_usd,
  -- RETURNING sees the updated row, so $3 is already included
  accumulated_usd >= limit_usd AS over_limit;

Enforcement Modes

FORG supports two enforcement modes:

Async mode (default): The agent emits signals asynchronously. Rule evaluation happens after the LLM call completes. Zero latency impact. Enforcement actions (block/warn) apply to subsequent calls, not the current one.
Sync/Gateway mode: The agent queries the Rules Engine before each LLM call. If a blocking rule matches, the call is prevented. Adds ~3ms P50 to the critical path. Requires gateway adapter configuration.

Most teams start with async mode and move to gateway mode for hard budget limits after their rules are tuned. The transition is a config change in the adapter:

# ~/.forg/config.yaml
enforcement_mode: gateway   # or: async (default)
gateway_timeout_ms: 500     # fail-open if gateway unreachable
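
The fail-open comment on gateway_timeout_ms implies that a slow or unreachable gateway must never block an LLM call. A sketch of that behavior on the adapter side, assuming the gateway query is a blocking call that returns "allow" or "block" (check_with_fail_open and the stand-in gateway functions are hypothetical):

```python
import concurrent.futures
import time


def check_with_fail_open(query_gateway, timeout_ms=500):
    """Ask the gateway for a decision; allow the call if no answer arrives in time."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(query_gateway)
    try:
        return future.result(timeout=timeout_ms / 1000.0)
    except Exception:
        # Timeout or connection failure: fail open rather than block work.
        return "allow"
    finally:
        pool.shutdown(wait=False)  # do not wait for a stuck request


def fast_gateway():
    return "block"


def slow_gateway():
    time.sleep(0.2)  # slower than the 50 ms timeout used below
    return "block"


def down_gateway():
    raise ConnectionError("gateway unreachable")


print(check_with_fail_open(fast_gateway))                 # block
print(check_with_fail_open(slow_gateway, timeout_ms=50))  # allow
print(check_with_fail_open(down_gateway))                 # allow
```

Failing open trades strictness for availability: a gateway outage degrades enforcement to best-effort instead of halting every developer.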

Testing Rules

Use the FORG CLI to test rules against synthetic signals before deploying:

# Test a signal against your current rules
forg rules test \
  --model claude-opus-4 \
  --user alice@company.com \
  --team backend \
  --cost 0.05

# Output:
# Rule evaluation for synthetic signal
# ✓ dev-monthly-75: PASS (accumulated: $62.40 / $75.00)
# ✗ global-model-policy: BLOCK (claude-opus not allowed)
#   → redirect to claude-sonnet-4-5
# Total: BLOCKED (1 blocking rule)

Rules can also be deployed in warn-only mode first — enforcement actions are logged but not executed — so you can observe behavior before committing to enforcement.
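
Conceptually, warn-only mode leaves evaluation untouched and only changes what happens to the resulting actions. A small sketch of that split, with a made-up audit record shape:

```python
def enforce(decisions, warn_only=False):
    """decisions: list of (rule_name, action) produced by rule evaluation.

    Returns (actions to execute, audit entries). In warn-only mode every
    decision is still logged, but nothing is executed.
    """
    executed, audit = [], []
    for rule_name, action in decisions:
        audit.append({"rule": rule_name, "action": action, "executed": not warn_only})
        if not warn_only:
            executed.append((rule_name, action))
    return executed, audit


decisions = [("global-model-policy", "redirect"), ("dev-monthly-75", "warn")]

executed, audit = enforce(decisions, warn_only=True)
print(executed)    # []
print(len(audit))  # 2: both decisions are still logged
```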

The full rule reference is in the documentation.