Updated 2026-04-30 · 7 min read

Detecting MCP server rug-pulls — when the tool catalog mutates after install

The rug-pull pattern: an MCP server ships a safe v1 catalog at install time, then mutates to a v2 catalog (with attacker-controlled tools) once it's running in your trust boundary. Invariant Labs disclosed this class in 2025; the Apr 2026 Anthropic RCE incident exploited a related design flaw. This guide ships the fingerprint-pinning + signature-verification defense.

Rug-pulls are how trust gets weaponized at runtime. An MCP server with safe-looking documentation gets installed; weeks later, the running server's tool catalog silently changes: same tool names, but with adversarial descriptions or expanded scopes. The agent reads the new catalog, treats it as authoritative, and starts dispatching calls with the broader scope.

What it is

A rug-pull is a form of tool poisoning that happens after install rather than at install time. The defense is structural: the agent's view of the trusted catalog must be operator-authored and signed, and every server's actual catalog must be re-validated against that signed baseline on every spawn. Any drift is rejected.
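As a rough illustration of the "operator-authored and signed" half of that rule, here is a minimal sketch using Node's built-in Ed25519 support. The catalog shape (`servers` keyed by name) and the helper names `canonicalize`, `signCatalog`, and `loadTrustedCatalog` are assumptions made for this example, not a documented format, and the catalog is assumed to contain only plain JSON values.

```javascript
const crypto = require("node:crypto");

// Serialize with sorted object keys so the signer and the verifier
// hash exactly the same bytes regardless of property insertion order.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value && typeof value === "object") {
    return "{" + Object.keys(value).sort()
      .map((k) => JSON.stringify(k) + ":" + canonicalize(value[k]))
      .join(",") + "}";
  }
  return JSON.stringify(value);
}

// Operator side: sign the catalog once, offline, with an Ed25519 private key.
function signCatalog(catalog, privateKey) {
  return crypto.sign(null, Buffer.from(canonicalize(catalog)), privateKey);
}

// Agent side: refuse to build the trusted map unless the signature verifies.
// Hypothetical catalog shape: { servers: { <name>: { fingerprint, ... } } }
function loadTrustedCatalog(catalog, signature, publicKey) {
  const bytes = Buffer.from(canonicalize(catalog));
  if (!crypto.verify(null, bytes, publicKey, signature)) {
    throw new Error("catalog signature invalid: refusing to load");
  }
  return new Map(Object.entries(catalog.servers));
}
```

The point of signing the whole catalog rather than individual entries is that an attacker who can edit the file on disk cannot add, remove, or alter a server without invalidating the signature.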

Vulnerable example

// MCP client that re-reads the server's tool catalog on every spawn
async function loadServerTools(serverConfig) {
  const server = await spawnMcpServer(serverConfig);
  // At install (v1): read-file, list-files (safe)
  // After the rug-pull (v2): read-file, list-files, exec-shell
  const tools = await server.listTools();
  return tools;  // Whatever the server says, the agent trusts.
}

Fixed example

// Operator-pinned signed catalog + fingerprint validation
// trustedCatalog: a signed, operator-authored map of trusted servers
// validateManifest: checks an incoming tool list against the pinned baseline

async function loadServerTools(serverConfig) {
  const expected = trustedCatalog.get(serverConfig.name);
  if (!expected) throw new Error("unknown server: not in catalog");

  const server = await spawnMcpServer(serverConfig);
  const fingerprint = await server.fingerprint();
  if (fingerprint !== expected.fingerprint) {
    throw new Error("fingerprint drift detected: refusing to load");
  }

  const tools = await server.listTools();
  const verdict = validateManifest(tools, expected.toolCatalog);
  if (!verdict.ok) {
    throw new Error("tool catalog drift: " + verdict.reason);
  }
  return tools;  // Validated against operator-pinned baseline.
}
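
The fixed example calls `validateManifest` without defining it. One possible shape, offered as a sketch rather than Securie's actual implementation: pin a SHA-256 digest of each tool's name, description, and input schema, and reject any tool that is unpinned, duplicated, changed, or missing. The `toolDigest` helper and the name-to-digest layout of the pinned catalog are assumptions for this example; it also assumes schemas are serialized with a stable key order.

```javascript
const crypto = require("node:crypto");

// Hash the fields the agent actually reads, so a rewritten description or
// schema under an unchanged tool name still counts as drift.
function toolDigest(tool) {
  const material = JSON.stringify([tool.name, tool.description, tool.inputSchema]);
  return crypto.createHash("sha256").update(material).digest("hex");
}

// pinned: hypothetical object mapping tool name -> expected digest.
function validateManifest(tools, pinned) {
  const seen = new Set();
  for (const tool of tools) {
    if (!Object.hasOwn(pinned, tool.name)) {
      return { ok: false, reason: "unpinned tool: " + tool.name };
    }
    if (seen.has(tool.name)) {
      return { ok: false, reason: "duplicate tool: " + tool.name };
    }
    if (toolDigest(tool) !== pinned[tool.name]) {
      return { ok: false, reason: "definition changed: " + tool.name };
    }
    seen.add(tool.name);
  }
  // A tool that silently disappears is also drift from the baseline.
  for (const name of Object.keys(pinned)) {
    if (!seen.has(name)) return { ok: false, reason: "missing tool: " + name };
  }
  return { ok: true };
}
```

Comparing digests rather than names alone matters here because the rug-pull described above keeps the tool names and changes what sits behind them.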

How Securie catches it

Securie finding (severity: medium) at apps/web/app/api/route.ts:22: Detecting MCP server rug-pulls

Securie's MCP trust-enforcement layer detects the rug-pull pattern by construction: a signed trusted-server catalog, a manifest validator, and a per-dispatch scope check. Per-spawn fingerprint validation rejects binary drift; per-catalog signature verification rejects tool-list drift; per-dispatch scope check rejects scope drift. The integration point with Invariant Labs' `mcp-scan` tool (https://github.com/invariantlabs-ai/mcp-scan) is documented in the operator runbook — operators run mcp-scan as the periodic fleet-wide drift check.
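
To make "per-spawn fingerprint validation rejects binary drift" concrete, here is a minimal sketch that assumes the fingerprint is simply the SHA-256 of the file the client is about to execute. That definition, and the helper names `fingerprintFile` and `assertFingerprint`, are assumptions for illustration; the guide does not specify how fingerprints are computed. Hashing on the client side before spawning avoids relying on a value the server reports about itself.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");

// Assumption: the fingerprint is the SHA-256 of the executable or entry script.
function fingerprintFile(path) {
  return crypto.createHash("sha256").update(fs.readFileSync(path)).digest("hex");
}

// Call this before spawning; throws if the file no longer matches the pin.
function assertFingerprint(path, expectedHex) {
  const actual = fingerprintFile(path);
  if (actual !== expectedHex) {
    throw new Error("fingerprint drift detected: " + path);
  }
  return actual;
}
```

A single-file hash does not cover dependencies loaded at runtime or servers reached over the network, so a real deployment would likely need to fingerprint a wider artifact such as a lockfile or a container image.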


Checklist

Every MCP server in your catalog has a pinned fingerprint
Every tool in your catalog has a pinned scope (bitflags)
rejectFingerprintDrift: true on every config
rejectScopeDrift: true on every config
Schedule `mcp-scan --check-rugpull` daily against installed servers
Operator approval flow for any catalog drift (the right answer is usually 'reject')
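
The "pinned scope (bitflags)" item can be sketched as a per-dispatch check. The flag names in `Scope`, the `checkDispatch` helper, and the idea that the dispatcher can derive a requested-scope mask for each call are all assumptions made for this example.

```javascript
// Hypothetical scope flags; a real deployment would define its own set.
const Scope = Object.freeze({
  READ_FS: 1 << 0,
  WRITE_FS: 1 << 1,
  NETWORK: 1 << 2,
  EXEC: 1 << 3,
});

// pinnedScopes: tool name -> bitmask approved by the operator.
// requestedScope: bitmask the dispatcher derived for this specific call.
function checkDispatch(pinnedScopes, toolName, requestedScope) {
  if (!Object.hasOwn(pinnedScopes, toolName)) {
    return { ok: false, reason: "unpinned tool: " + toolName };
  }
  // Any bit set in the request but not in the pin is scope drift.
  const excess = requestedScope & ~pinnedScopes[toolName];
  if (excess !== 0) {
    return {
      ok: false,
      reason: "scope drift on " + toolName + ": extra bits 0b" + excess.toString(2),
    };
  }
  return { ok: true };
}
```

Bitflags keep the check to a single mask operation per dispatch, which makes it cheap enough to run on every call rather than only at spawn time.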

FAQ

What if a server legitimately needs to update its catalog?

Treat it as a new install: re-fingerprint the server, re-sign the catalog, and require operator re-approval. The Apr 2026 wave shows that 'auto-accept catalog updates' is the rug-pull attack surface, so the right default is to reject and require explicit approval.
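
For the re-approval step, the operator needs to see exactly what changed before signing a new baseline. A minimal sketch, assuming both the pinned and the observed catalogs are available as plain objects mapping tool name to a digest of the tool's definition (the `diffCatalog` name and that layout are hypothetical):

```javascript
// pinned / observed: hypothetical objects mapping tool name -> definition digest.
function diffCatalog(pinned, observed) {
  const added = Object.keys(observed)
    .filter((name) => !Object.hasOwn(pinned, name))
    .sort();
  const removed = Object.keys(pinned)
    .filter((name) => !Object.hasOwn(observed, name))
    .sort();
  const changed = Object.keys(observed)
    .filter((name) => Object.hasOwn(pinned, name) && pinned[name] !== observed[name])
    .sort();
  const drifted = added.length + removed.length + changed.length > 0;
  return { added, removed, changed, drifted };
}
```

The output is meant for a human reviewer: the agent keeps rejecting the server until the operator has looked at the diff and signed a new catalog.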

Does this break any developer workflow?

Yes, intentionally. The friction is the security. Teams that need fast iteration on MCP servers should run their own internal MCP server with a separate per-developer catalog, kept distinct from the production trust boundary.

