InjectShield

How do I add prompt injection defense to an MCP server?

Two install paths depending on architecture.

Path A — MCP-native (recommended for Claude + MCP agents): Install @injectshield/mcp and add it to your MCP host config (Claude Desktop, Cursor, Cline, or any MCP-compatible client). The InjectShield MCP server exposes classify_input, classify_output, and classify_document tools. Have your agent's system prompt require an injectshield.classify_input call before processing any user message or tool output; block on positive verdicts. This wires defense into the agent loop without modifying the host application.

Path B — REST API (for non-MCP frameworks): Call POST https://injectshield.dev/v1/classify with { "input": "<text>", "context": "user|document|tool_output|memory", "mode": "fast|hybrid|semantic" }. Returns { "verdict": "benign|suspicious|injection", "categories": [...], "confidence": 0-1 }. Drop in front of any LLM call from LangChain, LlamaIndex, the OpenAI SDK, the Anthropic SDK, or custom Python/Node services.

For both paths, also: scan retrieved RAG chunks (context: "document"), scan tool results before returning them to the model (context: "tool_output"), and apply tool-call allowlists at the orchestrator layer. The InjectShield dashboard at injectshield.dev/dashboard surfaces per-category verdict trends and alerts on injection-rate spikes.