Question 1

What rules does the linter check?

Accepted Answer

Six heuristic rules, all running locally in your browser: contradictory absolutes (an 'always X' and a 'never X' targeting the same verb), vague verbs that give the model no testable instruction (handle, ensure, deal with, be smart about), a missing output-format section, token bloat (prompts over roughly 2,000 tokens get penalised), duplicate lines that waste context, and leftover TODO/FIXME/placeholder text that was never meant to ship.

Question 2

What does the grade actually mean?

Accepted Answer

It is a weighted score, not a judgment of your product. An A means no findings worth acting on; B-C means a handful of fixable issues like a vague verb or one duplicate; D-F means structural problems — contradictions or shipped placeholders — that measurably degrade model behaviour. Contradictions weigh the most because the model resolves them unpredictably, which shows up as inconsistent output between runs.

Question 3

Is there a sweet spot for system prompt length?

Accepted Answer

For most production prompts, 200-800 tokens covers role, constraints and output format with room to spare. Beyond about 2,000 tokens, instruction-following degrades: the model attends less to mid-prompt rules and you pay the full token cost on every single call. If you genuinely need more, structure it with headings and put the non-negotiable rules at the start and end, where attention is strongest.

Question 4

How should I test a system prompt beyond linting?

Accepted Answer

Linting catches structural defects; behaviour needs evals. Build a small set of representative inputs — including adversarial ones — and check outputs against expectations every time the prompt changes, exactly like a regression test suite. Even ten well-chosen cases catch most prompt regressions. Pin the model version while iterating, since provider model updates change behaviour underneath an unchanged prompt.

Question 5

Does my prompt leave the browser?

Accepted Answer

No. All six rules are regex and heuristic checks implemented in client-side JavaScript — there is no network call, no logging and no storage. You can verify this in your browser's network tab while linting. This matters because production system prompts often contain proprietary product logic that should not transit a third-party server just to get a quality check.

System Prompt Linter

How it works

Frequently asked questions

What rules does the linter check?

What does the grade actually mean?

Is there a sweet spot for system prompt length?

How should I test a system prompt beyond linting?

Does my prompt leave the browser?

Related tools

Prompt Compressor

CLAUDE.md Generator

Temperature & Top-p Playground

Token Counter