Rules Engine Deep Dive: Budget Enforcement at Scale
A technical walkthrough of FORG's rules engine: the four rule types, evaluation order, conflict resolution, budget windowing, and how we keep enforcement overhead to a few milliseconds at 10,000+ signals/second.
Architecture Overview
The FORG Rules Engine is a Cloudflare Worker deployed at forg.pro/engine/*. It's stateless by design: every signal evaluation is self-contained. State (budget accumulators, rule versions) lives in Supabase, which the Worker reaches through pooled connections.
The evaluation path for a single signal:
- Signal arrives at the Worker over HTTPS (mutual TLS)
- HMAC signature verification (session key derived from license)
- Signal parsed and normalized to internal schema (v3)
- Rule set fetched (in-memory cache, 5s TTL)
- Rules evaluated in priority order
- Budget accumulators updated atomically (Supabase RPC)
- Enforcement actions executed (block/warn/notify)
- Signal written to signal store
- Audit log entry written
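Two of the steps above (signature verification and the 5s rule-set cache) can be sketched in a few lines. This is an illustrative sketch, not FORG's implementation: the HMAC-SHA256 algorithm, the hex encoding, and the names `verify_signature` and `TTLCache` are assumptions made for the example.

```python
import hashlib
import hmac
import time


def verify_signature(body: bytes, signature_hex: str, session_key: bytes) -> bool:
    """Return True if signature_hex is the HMAC-SHA256 of body under session_key."""
    expected = hmac.new(session_key, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so a mismatch leaks no position info.
    return hmac.compare_digest(expected, signature_hex)


class TTLCache:
    """Single-value cache that calls `loader` again once the TTL has elapsed."""

    def __init__(self, loader, ttl_seconds=5.0, clock=time.monotonic):
        self._loader = loader        # hypothetical function that fetches the rule set
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the cache can be tested
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value
```

With a 5s TTL, a rule change can take up to five seconds to reach a given Worker instance, which is the trade-off for skipping a database round trip on most signals.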
Total P50 latency: 3.2ms. P99: 11ms. This is the overhead added to your AI toolchain when using synchronous enforcement mode. In async mode (default), the signal is queued and the adapter gets an immediate 200 response — zero latency impact.
The Four Rule Types
1. Budget Rules
Budget rules enforce spending limits over a time window. They're the most commonly used rule type and the most complex to implement correctly because budget windows must be maintained accurately across distributed signal sources.
# Budget rule schema
rules:
  - name: string                      # Unique identifier
    type: budget
    scope: user | team | org          # What entity the limit applies to
    limit: number                     # USD limit
    period: daily | weekly | monthly | rolling_30d
    warn_at: number                   # Percentage to send warning (0-100)
    action: notify | block            # What happens at limit
    notify: boolean | slack | email   # Notification channel
Budget windows are evaluated from the start of the current period. For period: monthly, the window is the calendar month. For period: rolling_30d, it's the last 30 days from now. Budget accumulators are updated atomically using Postgres advisory locks on the scope+period combination to prevent race conditions.
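To make the window definitions concrete, here is a minimal sketch of how the start of each window and the warn_at threshold might be computed. The function names are hypothetical, and two details are assumptions the text does not state: timestamps are in UTC, and weeks start on Monday.

```python
from datetime import datetime, timedelta, timezone


def window_start(period: str, now: datetime) -> datetime:
    """Return the start of the budget window that contains `now`."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        return midnight
    if period == "weekly":
        # Assumption: weeks start on Monday (weekday() == 0).
        return midnight - timedelta(days=now.weekday())
    if period == "monthly":
        return midnight.replace(day=1)
    if period == "rolling_30d":
        return now - timedelta(days=30)
    raise ValueError(f"unknown period: {period}")


def budget_status(accumulated_usd: float, limit_usd: float, warn_at: float) -> str:
    """Classify spend as "ok", "warn" (past warn_at percent), or "limit"."""
    if accumulated_usd >= limit_usd:
        return "limit"
    if accumulated_usd >= limit_usd * warn_at / 100:
        return "warn"
    return "ok"
```

For a $75.00 monthly limit with warn_at: 80, the warning threshold is $60.00, so `budget_status(62.40, 75.00, 80)` returns `"warn"`.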
# Example: per-developer monthly budget with Slack notify
- name: "dev-monthly-75"
  type: budget
  scope: user
  limit: 75.00
  period: monthly
  warn_at: 80
  action: block
  notify: slack
# Example: team-level daily cap with warning only
- name: "backend-daily-500"
  type: budget
  scope: team
  match:
    team: "backend"
  limit: 500.00
  period: daily
  warn_at: 70
  action: notify
  notify: true
2. Model Policy Rules
Model policy rules control which models can be used in which contexts. They're stateless — no accumulators needed — so evaluation is O(1) and adds negligible overhead.
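A minimal sketch of how a single model policy rule might be evaluated, assuming glob-style patterns (as in "claude-opus*") and a denylist that is checked before the allowlist. The function name and the return shape are hypothetical, and treating a non-empty allowlist as exclusive is an assumption.

```python
from fnmatch import fnmatchcase


def evaluate_model_policy(model, allow_models=(), deny_models=(), redirect_to=None):
    """Return (decision, model_to_use) for one rule; the denylist is checked first."""
    if any(fnmatchcase(model, pattern) for pattern in deny_models):
        if redirect_to is not None:
            return ("redirect", redirect_to)
        return ("block", None)
    if allow_models and not any(fnmatchcase(model, p) for p in allow_models):
        # Assumption: a non-empty allowlist rejects anything it does not list.
        return ("block", None)
    return ("allow", model)
```

Because a pattern without a wildcard only matches exactly, a deny entry of "gpt-4o" would not catch "gpt-4o-mini" under this interpretation.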
# Model policy rule schema
rules:
  - name: string
    type: model_policy
    scope: global | user | team | environment
    match:                    # Optional: conditions
      team: string
      environment: string
      user: string
    allow_models: [string]    # Allowlist (OR)
    deny_models: [string]     # Denylist (takes precedence)
    redirect_to: string       # Optional: redirect to this model
    action: block | redirect | warn | allow
# Example: restrict expensive models globally,
# allow for specific environments
- name: "global-model-policy"
  type: model_policy
  scope: global
  deny_models:
    - "claude-opus*"
    - "gpt-4o"
    - "gemini-ultra*"
  action: redirect
  redirect_to: "claude-sonnet-4-5"
- name: "arch-review-allow-opus"
  type: model_policy
  scope: environment
  match:
    environment: "architecture-review"
  allow_models:
    - "claude-opus*"
  action: allow
3. Rate Limit Rules
Rate limit rules cap the number of API calls in a time window, independent of cost. Useful for preventing runaway automated processes or controlling peak load.
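The article does not say how the limiter is implemented. One common way to combine a sustained limit with a burst allowance is a token bucket, sketched here as an assumption: the bucket refills at `limit` calls per period and holds at most `burst` tokens. The class name and parameters are hypothetical.

```python
class TokenBucket:
    """Token bucket: refills at `limit` tokens per period, holds at most `burst`."""

    def __init__(self, limit: int, period_seconds: float, burst: int, now: float = 0.0):
        self.rate = limit / period_seconds   # tokens added per second
        self.capacity = float(burst)
        self.tokens = float(burst)           # start with a full bucket
        self.updated_at = now

    def allow(self, now: float) -> bool:
        """Consume one token if available; `now` is a timestamp in seconds."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = max(self.updated_at, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With limit 100 per hour and burst 120, a caller can make 120 calls back to back, after which calls are admitted at roughly one every 36 seconds.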
- name: "api-rate-limit"
  type: rate_limit
  scope: user
  limit: 100        # calls
  period: per_hour
  action: block
  burst: 120        # Allow short bursts above limit
4. Session Policy Rules
Session policy rules govern session lifecycle. The most common use case is idle session termination — closing sessions that haven't had activity for a configurable period.
- name: "session-idle-timeout"
  type: session_policy
  idle_timeout_minutes: 30
  action: terminate
  notify: false
# Advanced: max session duration
- name: "max-session-duration"
  type: session_policy
  max_duration_minutes: 240   # 4 hours
  action: warn
  warn_message: "Session approaching maximum duration. Consider starting fresh."
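The two session rules above reduce to a pair of timestamp comparisons. A minimal sketch, assuming the engine tracks a start time and a last-activity time per session; the function name and argument names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def session_action(started_at, last_activity_at, now,
                   idle_timeout_minutes=30, max_duration_minutes=240):
    """Return "terminate", "warn", or None for one session."""
    if now - last_activity_at >= timedelta(minutes=idle_timeout_minutes):
        return "terminate"   # idle for too long
    if now - started_at >= timedelta(minutes=max_duration_minutes):
        return "warn"        # still active, but past the maximum duration
    return None
```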
Evaluation Order and Conflict Resolution
Rules are evaluated in priority order (lower number = higher priority, default 100). The evaluation stops at the first rule that takes a blocking action. For non-blocking actions (warn, notify), evaluation continues through all matching rules.
# Explicit priority
- name: "emergency-kill-switch"
type: budget
scope: org
limit: 10000.00
period: monthly
action: block
priority: 1 # Evaluated first
- name: "dev-monthly-budget"
type: budget
scope: user
limit: 75.00
period: monthly
action: block
priority: 100 # Default, evaluated after priority-1Conflict resolution rules:
- Block beats allow: If any rule blocks, the action is blocked regardless of other rules
- More specific beats less specific: A user-scope rule overrides a team-scope rule for the same user
- Allow rules can override deny rules: If you have both a deny and an allow rule matching the same signal, the higher-priority rule wins
- All matching rules log: Even if a signal passes, all rules that matched (including passed rules) are written to the audit log
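The ordering and logging behavior can be sketched as a short loop: sort by priority, record every rule checked, and stop at the first rule that blocks. This is a simplified illustration that leaves out the scope-specificity and allow/deny tie-breaking; the `Rule` class and its `matches` predicate are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    action: str                       # "block", "warn", or "notify"
    matches: Callable[[dict], bool]   # hypothetical predicate over a signal
    priority: int = 100               # lower number = evaluated earlier


def evaluate(rules, signal):
    """Return (blocked, audit); audit lists every rule checked and its outcome."""
    audit = []
    # sorted() is stable, so rules with equal priority keep their declared order.
    for rule in sorted(rules, key=lambda r: r.priority):
        matched = rule.matches(signal)
        audit.append((rule.name, rule.action if matched else "pass"))
        if matched and rule.action == "block":
            return True, audit        # stop at the first blocking rule
    return False, audit
```

Rules that only warn or notify are recorded in the audit list and evaluation carries on, which matches the non-blocking behavior described earlier in this section.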
Budget Windowing: The Hard Part
Budget enforcement sounds simple but has subtle correctness requirements. The challenges:
- Distributed signal sources: Multiple machines per developer, multiple adapters, signals arriving out of order. We use Postgres UPDATE ... RETURNING under a transaction-level advisory lock for atomic accumulator updates.
- Period boundaries: A signal arriving at 11:59:59 PM shouldn't consume the new period's budget. Period evaluation uses the signal timestamp, not the processing timestamp.
- Retroactive signals: Signals from offline agents catch up when reconnected. These are processed in chronological order and accumulators are rebuilt if needed.
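The period-boundary and retroactive-signal points can be illustrated with a small sketch: the period key is derived from the signal's own timestamp, and late signals are replayed in chronological order. The key format, the tuple layout, and the function names are assumptions for illustration; timestamps are expected to be timezone-aware and are normalized to UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone


def period_key(period: str, signal_ts: datetime) -> str:
    """Derive the calendar period from the signal's own timestamp."""
    ts = signal_ts.astimezone(timezone.utc)
    if period == "daily":
        return ts.strftime("%Y-%m-%d")
    if period == "monthly":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unsupported period: {period}")


def rebuild_accumulators(signals, period="monthly"):
    """Replay (timestamp, scope_key, cost_usd) tuples in chronological order."""
    totals = defaultdict(float)
    for ts, scope_key, cost_usd in sorted(signals, key=lambda s: s[0]):
        totals[(scope_key, period_key(period, ts))] += cost_usd
    return dict(totals)
```

A signal stamped 23:59:59 on May 31 lands in the "2024-05" accumulator even if it is processed in June, because the processing time never enters the key.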
-- Budget accumulator update (simplified)
-- $1 = scope_key, $2 = period_key, $3 = signal cost in USD, $4 = signal timestamp
WITH lock AS (
  -- Serialize writers for this scope+period until the transaction ends
  SELECT pg_advisory_xact_lock(hashtext($1::text || $2::text))
)
UPDATE budget_accumulators
SET accumulated_usd = accumulated_usd + $3,
    last_signal_ts = $4
FROM lock -- referencing the CTE makes Postgres evaluate it
WHERE scope_key = $1
  AND period_key = $2
RETURNING accumulated_usd, limit_usd,
  -- RETURNING sees the updated row, so $3 is already included
  accumulated_usd >= limit_usd AS over_limit;
Enforcement Modes
FORG supports two enforcement modes:
- Async mode (default): The agent emits signals asynchronously. Rule evaluation happens after the LLM call completes. Zero latency impact. Enforcement actions (block/warn) apply to subsequent calls, not the current one.
- Sync/Gateway mode: The agent queries the Rule Engine before each LLM call. If a blocking rule matches, the call is prevented. Adds ~3ms P50 to the critical path. Requires gateway adapter configuration.
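In gateway mode the adapter has to decide what to do when the engine does not answer in time. The adapter config in this section describes fail-open behavior, which might look like the following sketch. The function name, the `query_engine` callable, and the "allow"/"block" return values are hypothetical.

```python
import concurrent.futures


def check_with_fail_open(query_engine, signal, timeout_ms=500):
    """Return True if the LLM call may proceed; fail open on timeout or outage."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(query_engine, signal)
    try:
        decision = future.result(timeout=timeout_ms / 1000)
    except (concurrent.futures.TimeoutError, OSError):
        # Engine too slow or unreachable: let the call through rather than stall.
        decision = "allow"
    finally:
        pool.shutdown(wait=False)   # do not wait for a slow request to finish
    return decision != "block"
```

Failing open favors developer availability over strict enforcement: a hard budget limit can be exceeded while the engine is unreachable.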
Most teams start with async mode and move to gateway mode for hard budget limits after their rules are tuned. The transition is a config change in the adapter:
# ~/.forg/config.yaml
enforcement_mode: gateway   # or: async (default)
gateway_timeout_ms: 500     # fail-open if gateway unreachable
Testing Rules
Use the FORG CLI to test rules against synthetic signals before deploying:
# Test a signal against your current rules
forg rules test \
  --model claude-opus-4 \
  --user alice@company.com \
  --team backend \
  --cost 0.05
# Output:
# Rule evaluation for synthetic signal
# ✓ dev-monthly-75: PASS (accumulated: $62.40 / $75.00)
# ✗ global-model-policy: BLOCK (claude-opus not allowed)
#   → redirect to claude-sonnet-4-5
# Total: BLOCKED (1 blocking rule)
Rules can also be deployed in warn-only mode first, where enforcement actions are logged but not executed, so you can observe behavior before committing to enforcement.
The full rule reference is in the documentation.