How do you protect RAG pipelines from prompt injection?

Question

Accepted Answer

RAG pipelines are a primary indirect-injection vector: any document in the corpus can carry a payload that fires when retrieved. A six-step hardening playbook for 2026: 1. **Scan at ingest** — run an injection classifier (InjectShield, LLM Guard) over every document being added to the vector store. Quarantine positive verdicts for review. 2. **Scan at retrieval** — re-scan retrieved chunks before they enter the model's context. Ingest-time scanning is not enough because corpora drift and classifiers improve. 3. **Structural separation** — pass retrieved documents in a clearly demarcated channel (a separate `documents` field, explicit XML tags like `<retrieved_document>`, or a user-role wrapper). Train your system prompt to treat that channel as data-not-instructions. 4. **Least-privilege tools** — the RAG-answering model should not have write access to the corpus, the ability to send email, or any other side-effect tool unless absolutely required. 5. **Output validation** — schema-check the model's response; refuse or sanitize if it contains commands, URLs to unexpected domains, or instructions to the user. 6. **Provenance and audit** — log which document chunks fed each answer; if an exfiltration or weird behavior occurs you can trace back to the poisoned doc. InjectShield exposes both batch (ingest) and per-request (retrieval) scanning endpoints, with chunk-level verdicts so you can surgically quarantine without blowing up the whole corpus.