Question 1

What patterns does the scanner detect?

Accepted Answer

Six heuristic families: instruction-override phrases ('ignore previous instructions' and its many paraphrases), role-switch attempts ('you are now', 'act as', 'pretend you are'), encoded payloads such as long base64-looking spans that hide instructions from human reviewers, markdown-link exfiltration patterns where data is smuggled into image or link URLs, invisible unicode like zero-width spaces and direction overrides that hide text from humans but not from models, and system-prompt extraction asks ('repeat your instructions verbatim').

Question 2

Why scan tool outputs, not just user input?

Accepted Answer

Because indirect injection is the dangerous case. Direct injection — a user typing 'ignore your instructions' — is visible and bounded by what that user could do anyway. Indirect injection arrives through content the agent fetches: a web page, a README in a dependency, an issue comment, an email. The agent treats that text as data, but the model reads it as potential instructions. Anything that flows from an untrusted source into your context window deserves scanning before it gets there.

Question 3

Will this catch every injection?

Accepted Answer

No, and any tool claiming otherwise is misleading you. These are heuristic pattern matchers: they catch the known, common attack phrasings and encodings cheaply and instantly, which is genuinely useful as a first filter. A motivated attacker can paraphrase around any fixed pattern list. Treat the scanner as one layer — combine it with capability restrictions via a permission matrix, least-privilege credentials, and human review gates on consequential actions, so a missed injection has bounded blast radius.

Question 4

What should I do when something is flagged?

Accepted Answer

Each finding ships with a specific mitigation, but the general playbook is: for templates you control, simply remove or rewrite the flagged text. For external content, do not pass it to the model verbatim — strip invisible unicode, neutralize markdown links, and wrap untrusted text in clearly delimited blocks with an instruction that the content is data, not commands. For repeated attack patterns from one source, stop ingesting that source rather than playing whack-a-mole.

Question 5

Is my pasted content uploaded anywhere?

Accepted Answer

No. Every check is a regex or character-class scan running in client-side JavaScript — there is no network request, no logging and no storage, which you can verify in your browser's network tab. This matters more here than for most tools: the things you scan are often suspected-malicious content or proprietary prompt templates, and neither should transit a third-party server as the price of a security check.

Prompt Injection Scanner

How it works

Frequently asked questions

What patterns does the scanner detect?

Why scan tool outputs, not just user input?

Will this catch every injection?

What should I do when something is flagged?

Is my pasted content uploaded anywhere?

Related tools

Agent Permission Matrix

Secret Leak Scanner

System Prompt Linter

Structured Output Validator