How do I add prompt injection defense to an MCP server?

Question

Accepted Answer

There are two install paths, depending on your architecture.

**Path A: MCP-native (recommended for Claude + MCP agents).** Install `@injectshield/mcp` and add it to your MCP host config (Claude Desktop, Cursor, Cline, or any MCP-compatible client). The InjectShield MCP server exposes `classify_input`, `classify_output`, and `classify_document` tools. Have your agent's system prompt require an `injectshield.classify_input` call before processing any user message or tool output, and block on positive verdicts. This wires defense into the agent loop without modifying the host application.

**Path B: REST API (for non-MCP frameworks).** Call `POST https://injectshield.dev/v1/classify` with `{ "input": "<text>", "context": "user|document|tool_output|memory", "mode": "fast|hybrid|semantic" }`. It returns `{ "verdict": "benign|suspicious|injection", "categories": [...], "confidence": 0-1 }`. Drop it in front of any LLM call from LangChain, LlamaIndex, the OpenAI SDK, the Anthropic SDK, or custom Python/Node services.

For both paths, also scan retrieved RAG chunks (`context: "document"`), scan tool results before returning them to the model (`context: "tool_output"`), and apply tool-call allowlists at the orchestrator layer. The InjectShield dashboard at injectshield.dev/dashboard surfaces per-category verdict trends and alerts on injection-rate spikes.
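As a rough illustration of Path B, here is a minimal Python sketch that uses only the endpoint, request fields, and verdict values quoted above. Everything else is an assumption made for the example: the helper names (`classify`, `should_block`, `filter_chunks`), the injectable `transport` parameter, the `"fast"` default mode, the fail-closed blocking policy, and the absence of an auth header (the answer does not say how requests are authenticated). It uses only the standard library.

```python
import json
import urllib.request
from typing import Callable, Iterable, List

# Endpoint and allowed field values as quoted in the answer.
CLASSIFY_URL = "https://injectshield.dev/v1/classify"
CONTEXTS = {"user", "document", "tool_output", "memory"}
MODES = {"fast", "hybrid", "semantic"}

# A transport takes (url, payload) and returns the decoded JSON response.
Transport = Callable[[str, dict], dict]


def http_post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON reply.

    Assumption: no authentication header is sent, because the answer does
    not describe one. Add whatever your account requires.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def classify(text: str, context: str, mode: str = "fast",
             transport: Transport = http_post_json) -> dict:
    """Send one piece of text to the classify endpoint."""
    if context not in CONTEXTS:
        raise ValueError(f"unknown context: {context!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return transport(CLASSIFY_URL, {"input": text, "context": context, "mode": mode})


def should_block(result: dict, block_suspicious: bool = False) -> bool:
    """Hypothetical policy: pass 'benign', optionally block 'suspicious',
    and block 'injection' or any unexpected verdict (fail closed)."""
    verdict = result.get("verdict")
    if verdict == "benign":
        return False
    if verdict == "suspicious":
        return block_suspicious
    return True


def filter_chunks(chunks: Iterable[str], transport: Transport = http_post_json) -> List[str]:
    """Keep only the retrieved RAG chunks that the classifier does not block."""
    return [
        chunk for chunk in chunks
        if not should_block(classify(chunk, "document", transport=transport))
    ]
```

In a real pipeline you would call `classify(user_message, "user")` before the LLM call and `classify(tool_result, "tool_output")` before handing a tool result back to the model, skipping the call or the result whenever `should_block` returns `True`. Passing `transport` explicitly makes it possible to test the gate without network access.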