
AGENTS.md Linter

Grade your existing CLAUDE.md or AGENTS.md A-F: contradictions, bloat and dead rules.

100% client-side · data verified 2026-06-11 · zero network calls · never leaves your browser

103 tokens (chars ÷ 4) · 15 lines · loaded every session

Grade: F

10 findings, 5 high-severity. Add the missing commands and resolve contradictions first; they cost whole sessions.

  • No test command · high · No test command found, so the agent will guess your toolchain or skip verification entirely. Add a copy-pasteable command.
  • No build command · high · No build or type-check command found, so the agent cannot verify its work compiles before claiming completion.
  • Vague directive · medium · L4: "be careful" is not actionable; rewrite it as a concrete, checkable rule the agent can follow.
  • Vague directive · medium · L8: "best practices" is not actionable; rewrite it as a concrete, checkable rule the agent can follow.
  • Vague directive · medium · L10: "be thoughtful" is not actionable; rewrite it as a concrete, checkable rule the agent can follow.
  • Vague directive · medium · L15: "appropriately" is not actionable; rewrite it as a concrete, checkable rule the agent can follow.
  • Contradiction · high · L9: Line 7 says "always" and line 9 says "never" about "commit"; the agent resolves this unpredictably.
  • Contradiction · high · L9: Line 11 says "always" and line 9 says "never" about "commit"; the agent resolves this unpredictably.
  • Duplicate rule · low · L11: Identical to line 7; duplicated rules waste the per-session token budget and signal copy-paste drift.
  • Stale TODO · high · L14: Leftover TODO/placeholder; the agent reads it every session as an unfulfilled instruction.
18 models in the dataset · reference data verified 2026-06-11 · 100% of logic runs in your browser · 0 network requests per keystroke

How it works

Paste your CLAUDE.md or AGENTS.md and get an A-to-F grade plus line-referenced findings. The linter runs seven heuristics built specifically for agent config files, not generic prompt rules, and runs them entirely in your browser. A deliberately flawed example is prefilled so you can watch the rules fire before pasting your own file.

The rules target what actually breaks agent sessions. Missing test and build commands top the list: a config without them forces the agent to guess your toolchain, and a wrong guess wastes a whole session. Vague directives are next — "be careful", "use best practices" and friends give the model nothing checkable, so the same instruction produces different behaviour every run. Contradictions are the most corrosive defect: when one rule says "always commit after each change" and another says "never commit without asking", the agent resolves the conflict unpredictably, and you experience it as flaky, unreproducible behaviour.
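To make the contradiction rule concrete, here is a minimal sketch of an always/never check. It assumes the subject of a rule is simply the first word after "always" or "never"; `findContradictions` and that keyword heuristic are hypothetical illustrations, not the linter's actual code.

```typescript
// Hypothetical sketch of an always/never contradiction check.
// Assumption: a rule's subject is the first word after "always"/"never".
interface Contradiction {
  keyword: string;
  alwaysLine: number; // 1-based line of the "always" rule
  neverLine: number; // 1-based line of the conflicting "never" rule
}

function findContradictions(text: string): Contradiction[] {
  const always = new Map<string, number[]>();
  const never = new Map<string, number[]>();

  text.split("\n").forEach((line, i) => {
    // Only the first always/never on each line is considered.
    const m = /\b(always|never)\s+(\w+)/i.exec(line);
    if (!m) return;
    const bucket = m[1].toLowerCase() === "always" ? always : never;
    const keyword = m[2].toLowerCase();
    const lines = bucket.get(keyword) ?? [];
    lines.push(i + 1);
    bucket.set(keyword, lines);
  });

  // Pair every "always X" with every "never X" that shares the keyword.
  const found: Contradiction[] = [];
  always.forEach((alwaysLines, keyword) => {
    for (const neverLine of never.get(keyword) ?? []) {
      for (const alwaysLine of alwaysLines) {
        found.push({ keyword, alwaysLine, neverLine });
      }
    }
  });
  return found;
}

const sample = ["Always commit after each change.", "Run pnpm test.", "Never commit without asking."].join("\n");
console.log(findContradictions(sample)); // one finding: keyword "commit", alwaysLine 1, neverLine 3
```

Matching on a single keyword is deliberately crude: "always run tests" and "never run destructive SQL" would also pair up. A real checker needs a better notion of what each rule is about, which is why findings like these are worth reading rather than auto-fixing.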

The bloat check estimates tokens at one per four characters and flags files past 4,000 tokens. That threshold reflects how these files are consumed: the entire config is injected into context at session start, every session, so a bloated file is a recurring tax — and long-context research consistently shows mid-file rules receiving less attention than rules near the start and end. Duplicate-rule detection catches the residue of months of incremental edits, the missing forbidden-patterns check flags configs that never tell the agent what NOT to do, and the placeholder rule finds the TODO fragments that quietly shipped.
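The duplicate and placeholder checks can share a single pass over the lines. A minimal sketch, assuming duplicates are exact matches after normalising case, whitespace and list markers, and that placeholders are the uppercase markers TODO, TBD, FIXME and XXX; `lineChecks` and both heuristics are hypothetical.

```typescript
// Hypothetical sketch of duplicate-rule and placeholder detection.
// Assumptions: duplicates are exact matches after normalisation, and only
// uppercase TODO/TBD/FIXME/XXX count as placeholders.
interface Finding {
  rule: "duplicate" | "placeholder";
  line: number; // 1-based line number
  message: string;
}

const PLACEHOLDER = /\b(?:TODO|TBD|FIXME|XXX)\b/; // no "g" flag, so test() is stateless

function lineChecks(text: string): Finding[] {
  const findings: Finding[] = [];
  const firstSeen = new Map<string, number>(); // normalised rule -> first line

  text.split("\n").forEach((raw, i) => {
    const lineNo = i + 1;
    // Normalise: drop a leading list marker, collapse whitespace, lowercase.
    const norm = raw
      .replace(/^\s*(?:[-*+]|\d+\.)\s+/, "")
      .replace(/\s+/g, " ")
      .trim()
      .toLowerCase();

    // Blank lines and headings are allowed to repeat.
    if (norm.length > 0 && !norm.startsWith("#")) {
      const first = firstSeen.get(norm);
      if (first === undefined) {
        firstSeen.set(norm, lineNo);
      } else {
        findings.push({ rule: "duplicate", line: lineNo, message: `Identical to line ${first}` });
      }
    }

    if (PLACEHOLDER.test(raw)) {
      findings.push({ rule: "placeholder", line: lineNo, message: "Leftover TODO/placeholder" });
    }
  });
  return findings;
}

const sample = [
  "- Always commit after each change.",
  "- Run pnpm test.",
  "- always commit after  each change.",
  "TODO: add deploy steps",
].join("\n");
for (const f of lineChecks(sample)) console.log(`L${f.line}: ${f.message}`);
// L3: Identical to line 1
// L4: Leftover TODO/placeholder
```

The placeholder pattern will also flag legitimate rules such as "never leave a TODO in committed code", so a production version would need an allow-list or context check.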

Treat the grade as a structural pre-flight, not a verdict on your engineering culture. A clean config can still encode bad advice, and only watching real agent sessions tells you whether your rules land. But the inverse is reliable: a config with contradictions, no commands and shipped TODOs will degrade every session it touches. Lint it, fix the mechanical defects, then iterate on substance with the token budget you just reclaimed.

Frequently asked questions

What rules does the linter check?

Seven heuristics tuned specifically to agent config files: missing test and build commands (the single most impactful omission), vague directives like 'be careful' that an agent cannot act on, contradictory always/never pairs, token bloat past roughly 4,000 tokens (estimated at one token per four characters), duplicate rules from copy-paste drift, a missing forbidden-patterns section, and stale TODO or placeholder fragments that were never meant to ship.
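Of the seven, the forbidden-patterns check is the least obvious to implement. A minimal sketch, assuming any heading, list item or line that opens with a prohibition word counts as telling the agent what NOT to do; `hasForbiddenPatterns`, `checkForbiddenPatterns` and the trigger words are hypothetical.

```typescript
// Hypothetical sketch of the "missing forbidden-patterns" check.
// Assumption: a line that opens with a prohibition word (optionally after a
// heading or list marker) counts as a forbidden-pattern rule.
const PROHIBITION =
  /^\s*(?:#+\s*|[-*+]\s+|\d+\.\s+)?(?:never|do not|don't|avoid|forbidden)\b/i;

function hasForbiddenPatterns(text: string): boolean {
  return text.split("\n").some((line) => PROHIBITION.test(line));
}

// Returns a finding message, or null when the config already has prohibitions.
function checkForbiddenPatterns(text: string): string | null {
  return hasForbiddenPatterns(text)
    ? null
    : "No forbidden-patterns section: tell the agent what it must NOT do.";
}

console.log(hasForbiddenPatterns("## Commands\n- pnpm test")); // false
console.log(hasForbiddenPatterns("## Never\n- edit files under dist/")); // true
console.log(checkForbiddenPatterns("- Do not edit generated files.")); // null
```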

Why do missing test and build commands matter so much?

An agent that doesn't know your test command either guesses one — running npm test in a pnpm repo, for example — or skips verification entirely and declares unverified work complete. Explicit, copy-pasteable commands are the highest-leverage lines in any agent config because they turn 'I think this works' into 'the suite passed'. Every config should have at minimum a test command and a build or type-check command.
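A presence check for these commands can be as simple as two regexes over the whole file. In this minimal sketch the command patterns are illustrative assumptions that will miss less common toolchains; `missingCommands` is a hypothetical name.

```typescript
// Hypothetical sketch of the missing-command checks. The command patterns
// are illustrative assumptions and will miss less common toolchains.
const TEST_CMD =
  /\b(?:(?:npm|pnpm|yarn|bun)\s+(?:run\s+)?test|pytest|cargo\s+test|go\s+test|make\s+test)\b/;
const BUILD_CMD =
  /\b(?:(?:npm|pnpm|yarn|bun)\s+(?:run\s+)?(?:build|typecheck)|tsc|cargo\s+(?:build|check)|go\s+(?:build|vet)|make\s+build)\b/;

interface CommandFinding {
  rule: "no-test-command" | "no-build-command";
  severity: "high";
  message: string;
}

function missingCommands(text: string): CommandFinding[] {
  const findings: CommandFinding[] = [];
  if (!TEST_CMD.test(text)) {
    findings.push({ rule: "no-test-command", severity: "high", message: "No test command found" });
  }
  if (!BUILD_CMD.test(text)) {
    findings.push({ rule: "no-build-command", severity: "high", message: "No build or type-check command found" });
  }
  return findings;
}

console.log(missingCommands("Be careful with the database.").length); // 2
console.log(missingCommands("Run `pnpm test`, then `pnpm run typecheck`.").length); // 0
```

Detecting that a command is present says nothing about whether it is the right one: a config that says `npm test` in a pnpm repo passes this check, so the commands still need a manual sanity run.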

How big should a CLAUDE.md or AGENTS.md be?

The whole file is loaded into context at the start of every session, so you pay its token cost on every single agent run. Most effective configs land between 500 and 2,000 tokens. Past about 4,000 tokens, instruction-following measurably degrades — agents attend less to mid-file rules — and you are paying for the privilege. If yours is bigger, move reference material into docs the agent can read on demand.
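The size guidance reduces to simple arithmetic. This sketch uses the chars-divided-by-four estimate, the 500 to 2,000 token typical range and the 4,000-token flag from the text; rounding up and the band names are assumptions, and `estimateTokens` and `sizeBand` are hypothetical.

```typescript
// Hypothetical sketch of the size check. Rounding up and the band names
// are assumptions; the thresholds come from the guidance above.
type SizeBand = "short" | "typical" | "large" | "bloated";

// text.length counts UTF-16 code units, a close proxy for characters.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function sizeBand(tokens: number): SizeBand {
  if (tokens < 500) return "short";
  if (tokens <= 2000) return "typical";
  if (tokens <= 4000) return "large";
  return "bloated"; // past ~4,000 tokens: flagged
}

const config = "x".repeat(16400); // stand-in for a 16,400-character config
const tokens = estimateTokens(config);
console.log(tokens, sizeBand(tokens)); // 4100 bloated
```

Because the file is loaded every session, the cost compounds: 20 sessions with a 4,100-token config spend 82,000 input tokens on the config alone.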

What counts as a vague directive?

Anything a reasonable agent cannot turn into a checkable behaviour: 'be careful with the database', 'use best practices', 'write clean code', 'be thoughtful'. These read fine to humans but give the model nothing testable, so behaviour varies run to run. Rewrite them as concrete rules: 'never run destructive SQL without a WHERE clause', 'all exported functions need explicit return types'. The linter flags the common vague phrases so you can find and tighten them.
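Flagging common vague phrases needs little more than a case-insensitive substring scan. A minimal sketch, assuming a phrase list drawn from the examples in this answer and the sample findings; the real linter's list is not published here, and `findVague` is a hypothetical name.

```typescript
// Hypothetical phrase list drawn from the examples on this page; the
// linter's real list is not published here.
const VAGUE_PHRASES = ["be careful", "best practices", "clean code", "be thoughtful", "appropriately"];

interface VagueHit {
  line: number; // 1-based line number
  phrase: string;
}

function findVague(text: string): VagueHit[] {
  const hits: VagueHit[] = [];
  text.split("\n").forEach((raw, i) => {
    const line = raw.toLowerCase(); // case-insensitive substring match
    for (const phrase of VAGUE_PHRASES) {
      if (line.includes(phrase)) hits.push({ line: i + 1, phrase });
    }
  });
  return hits;
}

const rules = ["Be careful with the database.", "Never run destructive SQL without a WHERE clause."].join("\n");
for (const hit of findVague(rules)) {
  console.log(`L${hit.line}: "${hit.phrase}" is not actionable`); // L1: "be careful" is not actionable
}
```

The concrete rewrite on the second line passes untouched, which is the point: a rule the agent can check against contains no phrase from the list.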

Does my config file leave the browser?

No. Every rule is a regex or heuristic check implemented in client-side JavaScript — there is no network request, no logging and no storage of any kind. You can confirm this in your browser's network tab while linting. Config files often encode internal architecture details and operational gotchas, so they should not transit a third-party server just to get a quality check.
