How do I prevent MCP prompt injection?
Three layers: tool-output sanitization before anything reaches model context, scope-bounded egress (fetches only to operator-allowed hosts), and Llama Guard 4 classification of every tool response. Microsoft's April 2026 advisory and Unit 42's MCP attack-vector taxonomy converge on this defense.
Indirect prompt injection happens at the boundary between tool output and model context. An MCP tool that returns raw HTML or text from an arbitrary URL is the canonical attack vector: instructions embedded in that content get parsed by the model as if they were trusted input.
The defense ships in three layers: (1) sanitize tool outputs before the model sees them (strip invisible Unicode and instruction-shaped phrases), (2) bound the agent's egress to operator-allowed hosts (no arbitrary URL fetch), and (3) classify every tool output with Llama Guard 4, blocking anything flagged as adversarial.
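Layers 1 and 2 can be sketched in a few lines. This is an illustrative sketch, not the crate's actual API: `sanitize_tool_output`, `egress_allowed`, and the specific phrase/code-point lists are assumptions for demonstration.

```rust
// Invisible code points commonly used to smuggle hidden instructions.
const INVISIBLE: [char; 5] = ['\u{200B}', '\u{200C}', '\u{200D}', '\u{2060}', '\u{FEFF}'];

// Phrases that look like injected instructions rather than data.
// A real deployment would use a larger, maintained pattern set.
const INSTRUCTION_SHAPED: [&str; 3] = [
    "ignore previous instructions",
    "disregard the above",
    "you are now",
];

/// Layer 1: strip invisible Unicode, then reject instruction-shaped text.
fn sanitize_tool_output(raw: &str) -> Result<String, String> {
    let cleaned: String = raw.chars().filter(|c| !INVISIBLE.contains(c)).collect();
    let lower = cleaned.to_lowercase();
    for phrase in INSTRUCTION_SHAPED {
        if lower.contains(phrase) {
            return Err(format!("blocked: contains {:?}", phrase));
        }
    }
    Ok(cleaned)
}

/// Layer 2: allow egress only to operator-approved hosts over HTTPS.
/// Naive host extraction for illustration; production code should use
/// a real URL parser.
fn egress_allowed(url: &str, allowed_hosts: &[&str]) -> bool {
    let Some(rest) = url.strip_prefix("https://") else {
        return false;
    };
    let host = rest.split(&['/', ':', '?'][..]).next().unwrap_or("");
    allowed_hosts.contains(&host)
}
```

The fail-closed shape matters: sanitization returns `Err` on a phrase hit instead of silently scrubbing, and any URL that isn't HTTPS to an allowed host is refused outright.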
Securie's llm-safety crate bundles SafetyFilter, InferenceProxy, and LlamaGuard4Classifier; a production-tier boot refuses to start unless LLAMA_GUARD_URL is set.
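The fail-closed boot check could look like the sketch below. Only LLAMA_GUARD_URL comes from the source; `resolve_llama_guard_url`, the `tier` parameter, and the tier names are hypothetical.

```rust
/// Hypothetical boot-time guard: in the production tier, a missing or
/// empty LLAMA_GUARD_URL is a hard error; other tiers may run without
/// the classifier.
fn resolve_llama_guard_url(
    tier: &str,
    env_value: Option<String>,
) -> Result<Option<String>, String> {
    match (tier, env_value) {
        (_, Some(url)) if !url.is_empty() => Ok(Some(url)),
        ("production", _) => Err("LLAMA_GUARD_URL must be set for production boot".into()),
        _ => Ok(None),
    }
}
```

At startup the caller would pass `std::env::var("LLAMA_GUARD_URL").ok()` and abort the process on `Err`, so a misconfigured production deployment never serves traffic unclassified.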