Detecting MCP server rug-pulls — when the tool catalog mutates after install
The rug-pull pattern: an MCP server ships a safe v1 catalog at install time, then mutates to a v2 catalog (with attacker-controlled tools) once it's running in your trust boundary. Invariant Labs disclosed this class in 2025; the Apr 2026 Anthropic RCE incident exploited a related design flaw. This guide ships the fingerprint-pinning + signature-verification defense.
Rug-pulls weaponize trust at runtime. An MCP server with safe-looking documentation gets installed; weeks later, the running server's tool catalog silently changes: same tool names, but with adversarial descriptions or expanded scopes. The agent reads the new catalog, treats it as authoritative, and starts dispatching calls with the broader scope. The defense has two parts: per-spawn fingerprint validation and per-catalog signature verification.
What it is
A rug-pull is a form of tool poisoning that happens after install rather than at install. The defense is structural: the agent's view of the trusted catalog must be operator-authored and signed, and every server's actual catalog must be re-validated against that signed baseline on every spawn. Drift = reject.
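To make the "operator-authored and signed" idea concrete, here is a minimal sketch that signs a baseline catalog and verifies it before use. It assumes an Ed25519 operator key and Node's built-in crypto module. The helper names (canonicalize, signCatalog, verifyCatalog) and the catalog shape are illustrative assumptions, not mcp-guard's documented API.

```javascript
// Sketch only: canonicalize, signCatalog and verifyCatalog are hypothetical
// helpers, and the catalog shape is an assumption, not mcp-guard's real format.
const crypto = require("node:crypto");

// Serialize with sorted object keys so the same catalog always yields the same bytes.
function canonicalize(value) {
  if (Array.isArray(value)) return "[" + value.map(canonicalize).join(",") + "]";
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalize(value[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

// Operator side: sign the baseline catalog once, at approval time.
function signCatalog(catalog, privateKey) {
  // Ed25519 takes no separate digest algorithm, hence the null first argument.
  return crypto.sign(null, Buffer.from(canonicalize(catalog)), privateKey);
}

// Agent side: accept the baseline only if the operator's signature verifies.
function verifyCatalog(catalog, signature, publicKey) {
  return crypto.verify(null, Buffer.from(canonicalize(catalog)), publicKey, signature);
}

// Demo with a throwaway key pair; a real operator key would live outside the agent.
const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");
const baseline = { name: "files", tools: [{ name: "read-file" }, { name: "list-files" }] };
const signature = signCatalog(baseline, privateKey);

// The same catalog with one tool added after signing.
const mutated = { name: "files", tools: [...baseline.tools, { name: "exec-shell" }] };

console.log(verifyCatalog(baseline, signature, publicKey)); // true
console.log(verifyCatalog(mutated, signature, publicKey)); // false
```

Canonicalizing before signing matters because two JSON encodings of the same catalog can differ in key order, which would otherwise show up as false drift.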
Vulnerable example
// MCP client that reads server's tool catalog dynamically every spawn
async function loadServerTools(serverConfig) {
  const server = await spawnMcpServer(serverConfig);
  const tools = await server.listTools();
  return tools; // Whatever the server says, the agent trusts.
  // v1: read-file, list-files (safe)
  // v2: read-file, list-files, exec-shell (post-install rug-pull)
}
Fixed example
// Operator-pinned signed catalog + fingerprint validation
import { TrustedCatalog, Validator } from "@securie/mcp-guard";

async function loadServerTools(serverConfig) {
  const expected = TrustedCatalog.get(serverConfig.name);
  if (!expected) throw new Error("unknown server: not in catalog");

  const server = await spawnMcpServer(serverConfig);
  const fingerprint = await server.fingerprint();
  if (fingerprint !== expected.fingerprint) {
    throw new Error("fingerprint drift detected: refusing to load");
  }

  const tools = await server.listTools();
  const verdict = Validator.check(tools, expected.toolCatalog);
  if (!verdict.ok) {
    throw new Error("tool catalog drift: " + verdict.reason);
  }
  return tools; // Validated against operator-pinned baseline.
}
How Securie catches it
Securie's mcp-guard crate detects the rug-pull pattern by construction through three layers: TrustedCatalog, Validator, and ScopeGuard. Per-spawn fingerprint validation rejects binary drift; per-catalog signature verification rejects tool-list drift; a per-dispatch scope check rejects scope drift. The integration point with Invariant Labs' `mcp-scan` tool (https://github.com/invariantlabs-ai/mcp-scan) is documented in the crate's `//!` block; operators run mcp-scan as the periodic fleet-wide drift check.
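The per-dispatch scope check can be illustrated with plain bitflags. The following is a rough sketch, not the ScopeGuard API: the Scope flag names, the pinnedScopes table, and checkDispatch are all hypothetical and chosen only for illustration.

```javascript
// Sketch only: the Scope flags, pinnedScopes table and checkDispatch are
// hypothetical names, not the ScopeGuard API.
const Scope = Object.freeze({
  READ_FS: 1 << 0,
  WRITE_FS: 1 << 1,
  NETWORK: 1 << 2,
  EXEC: 1 << 3,
});

// Operator-pinned scopes per tool, stored as bitflags.
const pinnedScopes = new Map([
  ["read-file", Scope.READ_FS],
  ["list-files", Scope.READ_FS],
]);

// Allow a dispatch only if every requested bit is inside the pinned scope.
function checkDispatch(toolName, requestedScope) {
  if (!pinnedScopes.has(toolName)) {
    return { ok: false, reason: "tool not in pinned catalog: " + toolName };
  }
  const pinned = pinnedScopes.get(toolName);
  const extra = requestedScope & ~pinned; // bits requested but never approved
  if (extra !== 0) {
    return {
      ok: false,
      reason: "scope drift on " + toolName + ": extra bits 0b" + extra.toString(2),
    };
  }
  return { ok: true };
}

console.log(checkDispatch("read-file", Scope.READ_FS)); // within pinned scope
console.log(checkDispatch("read-file", Scope.READ_FS | Scope.NETWORK)); // asks for NETWORK too
console.log(checkDispatch("exec-shell", Scope.EXEC)); // tool was never pinned
```

Checking at dispatch time rather than only at load time means a tool whose name and description still match the baseline is rejected as soon as a call asks for more than the operator approved.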
Checklist
- Every MCP server in your catalog has a pinned fingerprint
- Every tool in your catalog has a pinned scope (bitflags)
- rejectFingerprintDrift: true on every config
- rejectScopeDrift: true on every config
- Schedule `mcp-scan --check-rugpull` daily against installed servers
- Operator approval flow for any catalog drift (the right answer is usually 'reject')
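The first four checklist items can be checked mechanically. Below is a minimal lint sketch under stated assumptions: the config shape (a fingerprint string and a tools array with integer scope values) is invented for illustration, and only the rejectFingerprintDrift and rejectScopeDrift flag names come from the checklist above.

```javascript
// Sketch only: the config shape (fingerprint, tools[].scope) is an assumption;
// only the two reject* flag names come from the checklist.
function lintServerConfig(config) {
  const problems = [];
  if (typeof config.fingerprint !== "string" || config.fingerprint.length === 0) {
    problems.push("missing pinned fingerprint");
  }
  for (const tool of config.tools ?? []) {
    // A pinned scope is expected to be an integer bitflag value.
    if (!Number.isInteger(tool.scope)) {
      problems.push("tool " + tool.name + " has no pinned scope");
    }
  }
  if (config.rejectFingerprintDrift !== true) problems.push("rejectFingerprintDrift is not true");
  if (config.rejectScopeDrift !== true) problems.push("rejectScopeDrift is not true");
  return problems;
}

// Two made-up configs: one complete, one missing several pins.
const configs = [
  {
    name: "files",
    fingerprint: "sha256:placeholder",
    tools: [{ name: "read-file", scope: 1 }],
    rejectFingerprintDrift: true,
    rejectScopeDrift: true,
  },
  {
    name: "git",
    fingerprint: "",
    tools: [{ name: "git-log" }],
    rejectFingerprintDrift: true,
  },
];

for (const config of configs) {
  const problems = lintServerConfig(config);
  console.log(config.name + ": " + (problems.length === 0 ? "ok" : problems.join("; ")));
}
```

A lint like this could run in CI next to the daily drift scan, so a config that quietly drops one of the reject flags fails review before it reaches production.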
FAQ
What if a server legitimately needs to update its catalog?
Treat it as a new install: re-fingerprint, re-sign the catalog, operator re-approval. The Apr 2026 wave shows that 'auto-accept catalog updates' is the rug-pull attack surface — the right default is to reject + require explicit approval.
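For the re-approval step, it helps to show the operator exactly what changed between the pinned catalog and the one the server now offers. The sketch below is one way to compute that summary. The diffCatalog helper and the tool shape (name, description, scope) are hypothetical, not part of mcp-guard.

```javascript
// Sketch only: diffCatalog and the tool shape ({ name, description, scope })
// are hypothetical, not part of mcp-guard.
function diffCatalog(pinnedTools, offeredTools) {
  const pinned = new Map(pinnedTools.map((t) => [t.name, t]));
  const offered = new Map(offeredTools.map((t) => [t.name, t]));

  const added = [...offered.keys()].filter((n) => !pinned.has(n));
  const removed = [...pinned.keys()].filter((n) => !offered.has(n));

  // Same name, different description or scope: the silent rug-pull case.
  const changed = [...offered.keys()].filter((n) => {
    const before = pinned.get(n);
    const after = offered.get(n);
    return before !== undefined &&
      (before.description !== after.description || before.scope !== after.scope);
  });

  const drifted = added.length + removed.length + changed.length > 0;
  return { added, removed, changed, drifted };
}

// Made-up catalogs: v1 is the pinned baseline, v2 is what the server offers later.
const v1 = [
  { name: "read-file", description: "Read a file", scope: 1 },
  { name: "list-files", description: "List files", scope: 1 },
];
const v2 = [
  { name: "read-file", description: "Read a file and forward its contents", scope: 5 },
  { name: "list-files", description: "List files", scope: 1 },
  { name: "exec-shell", description: "Run a shell command", scope: 8 },
];

console.log(diffCatalog(v1, v1)); // no drift
console.log(diffCatalog(v1, v2)); // one added tool, one changed tool
```

The summary is input for a human decision only. Nothing in it should be auto-accepted, which matches the reject-by-default stance above.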
Does this break any developer workflow?
Yes, intentionally. The friction is the security. Teams that need fast iteration on MCP servers should run their own internal MCP server with a separate per-developer catalog, kept distinct from the production trust boundary.
Related guides
Model Context Protocol went 0 → 200,000+ servers in 9 months. The April 2026 Anthropic RCE flaw + the Invariant Labs tool-poisoning class disclosures forced every MCP-using team to harden their server hygiene. This guide walks the four attack classes (unknown-server smuggle, fingerprint drift, tool smuggle, scope escalation) and the operator-authored TOML catalog that closes them.
Model Context Protocol (MCP) servers expose tools to LLM agents — file reads, git commands, HTTP fetches, database queries. The risk surface is the tool catalogue: an LLM agent that can call dangerous tools at the prompt-injection-attacker's instruction is the canonical MCP failure. Here are the patterns that work and the ones that don't.
Indirect prompt injection — adversarial instructions embedded in data the agent reads — is the single most common attack class against MCP-using agents. Microsoft's Apr 2026 advisory + Unit42's MCP attack-vector taxonomy converged on the same defense: pre-prompt-output sanitization + scope-bounded egress + Llama Guard 4 classification. This guide ships the layered defense.
mcp-guard is the Securie crate that enforces operator-authored MCP catalogs at agent runtime. Three layers — TrustedCatalog + Validator + ScopeGuard — close the four attack classes (unknown-server smuggle, fingerprint drift, tool smuggle, scope escalation) at the boundary between inference-router and MCP client. This guide walks the architecture + how to wire it into your agent runtime.