InjectShield

How do you protect RAG pipelines from prompt injection?

RAG pipelines are a primary indirect-injection vector: any document in the corpus can carry a payload that fires when retrieved. A six-step hardening playbook for 2026:

1. Scan at ingest: run an injection classifier (InjectShield, LLM Guard) over every document being added to the vector store. Quarantine positive verdicts for review.
2. Scan at retrieval: re-scan retrieved chunks before they enter the model's context. Ingest-time scanning alone is not enough, because corpora drift and classifiers improve.
3. Structural separation: pass retrieved documents in a clearly demarcated channel (a separate documents field, explicit XML tags like <retrieved_document>, or a user-role wrapper). Write your system prompt so that the model treats that channel as data, not instructions.
4. Least-privilege tools: the RAG-answering model should not have write access to the corpus, the ability to send email, or any other side-effect tool unless absolutely required.
5. Output validation: schema-check the model's response; refuse or sanitize it if it contains commands, URLs to unexpected domains, or instructions to the user.
6. Provenance and audit: log which document chunks fed each answer, so that if exfiltration or other anomalous behavior occurs you can trace it back to the poisoned document.
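Steps 2, 3, 5, and 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not InjectShield's API: `looks_like_injection` is a placeholder regex heuristic standing in for a real classifier call, and `Chunk`, `build_context`, `unexpected_urls`, and the `docs.example.com` allowlist are hypothetical names chosen for the example.

```python
import html
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Chunk:
    doc_id: str
    chunk_id: int
    text: str


# Placeholder heuristic (assumption): swap in a real injection classifier here.
_SUSPICIOUS = re.compile(
    r"ignore (all|any|previous|prior) .*instructions|disregard the system prompt",
    re.IGNORECASE,
)


def looks_like_injection(text: str) -> bool:
    return bool(_SUSPICIOUS.search(text))


def build_context(chunks):
    """Re-scan retrieved chunks, wrap clean ones in tags, and record provenance."""
    blocks, provenance, quarantined = [], [], []
    for c in chunks:
        ref = (c.doc_id, c.chunk_id)
        if looks_like_injection(c.text):
            quarantined.append(ref)  # step 2: drop flagged chunks before prompting
            continue
        # Step 3: escape the text so a chunk cannot close the tag itself
        # and break out of the data channel.
        body = html.escape(c.text, quote=False)
        tag_id = f"{html.escape(c.doc_id)}#{c.chunk_id}"
        blocks.append(
            f'<retrieved_document id="{tag_id}">\n{body}\n</retrieved_document>'
        )
        provenance.append(ref)  # step 6: persist this list alongside the answer
    return "\n".join(blocks), provenance, quarantined


# Hypothetical allowlist for step 5.
ALLOWED_DOMAINS = {"docs.example.com"}
_URL = re.compile(r"https?://[^\s<>()]+")


def unexpected_urls(answer: str):
    """Return URLs in the model's answer whose host is not on the allowlist."""
    return [
        u for u in _URL.findall(answer)
        if urlparse(u).hostname not in ALLOWED_DOMAINS
    ]
```

In use, the caller would send only the returned context string to the model, store the provenance list with the answer for later audit, and refuse or sanitize any answer for which `unexpected_urls` returns a non-empty list.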

InjectShield exposes both batch (ingest) and per-request (retrieval) scanning endpoints, with chunk-level verdicts so you can quarantine only the offending chunks instead of the whole corpus.
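To show what chunk-level quarantine looks like on the consuming side, here is a small sketch. The verdict shape (`doc_id`, `chunk_id`, `score`) and the 0.5 threshold are assumptions made for illustration; they are not taken from InjectShield's actual response format, which may differ.

```python
from collections import defaultdict


def partition_by_verdict(verdicts, threshold=0.5):
    """Split chunk ids into indexable and quarantined sets, grouped by document.

    `verdicts` is a list of dicts with hypothetical keys:
    doc_id (str), chunk_id (int), score (float, higher means more suspicious).
    """
    keep, quarantine = defaultdict(list), defaultdict(list)
    for v in verdicts:
        target = quarantine if v["score"] >= threshold else keep
        target[v["doc_id"]].append(v["chunk_id"])
    return dict(keep), dict(quarantine)
```

Only the chunk ids in the second mapping would be held back for review; the remaining chunks of the same document stay indexable.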