How do I prevent MCP prompt injection?

Short answer

Three layers: sanitize tool outputs before they enter the model context, scope-bounded egress (only operator-allowed hosts), and Llama Guard 4 classification of every tool response. Microsoft's April 2026 advisory and Unit42's MCP attack-vector taxonomy converged on this defense.

Indirect prompt injection happens at the boundary between tool output and model context. An MCP tool that returns raw HTML or text from an arbitrary URL is the canonical attack vector: instructions embedded in the fetched content get parsed by the model as if they were part of the conversation.

The defense ships in three layers: (1) sanitize tool outputs before the model sees them (strip invisible Unicode and instruction-shaped phrases), (2) bound the agent's egress to operator-allowed hosts (no arbitrary URL fetch), (3) classify every tool output with Llama Guard 4 and block anything flagged as adversarial.

Securie's llm-safety crate bundles SafetyFilter, InferenceProxy, and LlamaGuard4Classifier. A production-tier boot refuses to start unless LLAMA_GUARD_URL is set.
