How do you protect RAG pipelines from prompt injection?
RAG pipelines are a primary indirect-injection vector: any document in the corpus can carry a payload that fires when retrieved. A six-step hardening playbook for 2026:
1. Scan at ingest — run an injection classifier (InjectShield, LLM Guard) over every document being added to the vector store. Quarantine positive verdicts for review.
2. Scan at retrieval — re-scan retrieved chunks before they enter the model's context. Ingest-time scanning alone is not enough: corpora drift, and an improved classifier may catch payloads that an earlier version missed.
3. Structural separation — pass retrieved documents in a clearly demarcated channel (a separate documents field, explicit XML tags like <retrieved_document>, or a user-role wrapper). Write your system prompt so the model treats that channel as data, not instructions.
4. Least-privilege tools — the RAG-answering model should not have write access to the corpus, the ability to send email, or any other side-effect tool unless absolutely required.
5. Output validation — schema-check the model's response; refuse or sanitize if it contains commands, URLs to unexpected domains, or instructions to the user.
6. Provenance and audit — log which document chunks fed each answer; if exfiltration or other anomalous behavior occurs, you can trace it back to the poisoned document.
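Steps 3, 5, and 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a complete defense: the Chunk type, the docs.example.com allowlist, and all function names are hypothetical, and escaping angle brackets is only one possible way to stop a chunk from closing its own wrapper tag.

```python
import html
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Chunk:  # hypothetical shape of a retrieved chunk
    doc_id: str
    chunk_id: int
    text: str


ALLOWED_DOMAINS = {"docs.example.com"}  # hypothetical allowlist
URL_RE = re.compile(r"https?://[^\s<>'\")]+")


def wrap_chunks(chunks):
    """Step 3: place each chunk in its own <retrieved_document> block."""
    parts = []
    for c in chunks:
        # Escape &, <, > so chunk text cannot close the wrapper tag early.
        body = html.escape(c.text, quote=False)
        ref = html.escape(f"{c.doc_id}#{c.chunk_id}", quote=True)
        parts.append(
            f'<retrieved_document id="{ref}">\n{body}\n</retrieved_document>'
        )
    return "\n".join(parts)


def unexpected_urls(answer, allowed=ALLOWED_DOMAINS):
    """Step 5: return URLs in the answer whose host is not allowlisted."""
    bad = []
    for url in URL_RE.findall(answer):
        host = (urlparse(url).hostname or "").lower().rstrip(".")
        if host not in allowed:
            bad.append(url)
    return bad


def provenance_record(request_id, chunks):
    """Step 6: minimal audit entry linking one answer to its source chunks."""
    return {
        "request_id": request_id,
        "chunks": [f"{c.doc_id}#{c.chunk_id}" for c in chunks],
    }
```

A caller would put the output of wrap_chunks into the model's context, refuse or sanitize the answer when unexpected_urls returns anything, and write provenance_record to its audit log for every request.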
InjectShield exposes both batch (ingest) and per-request (retrieval) scanning endpoints, with chunk-level verdicts so you can quarantine individual chunks without discarding the whole corpus.
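To make chunk-level quarantine concrete, here is a rough sketch of partitioning chunks by per-chunk verdict. It does not use the InjectShield or LLM Guard APIs: the Scanner type, toy_scanner, and partition_chunks are hypothetical names, and the regular expression is a stand-in heuristic where a real injection classifier would go.

```python
import re
from typing import Callable

# A scanner returns True when the chunk text looks like an injection attempt.
Scanner = Callable[[str], bool]

# Stand-in heuristic only; a real classifier would replace this pattern.
_SUSPICIOUS = re.compile(
    r"ignore (all |any )?(previous|prior) instructions"
    r"|disregard the system prompt",
    re.IGNORECASE,
)


def toy_scanner(text: str) -> bool:
    return bool(_SUSPICIOUS.search(text))


def partition_chunks(chunks: dict[str, str], scan: Scanner):
    """Split chunks into (clean, quarantined) using one verdict per chunk."""
    clean: dict[str, str] = {}
    quarantined: dict[str, str] = {}
    for chunk_id, text in chunks.items():
        target = quarantined if scan(text) else clean
        target[chunk_id] = text
    return clean, quarantined
```

The same function could run at ingest over every chunk of a new document and again at retrieval over the top-k results, so a single flagged chunk is held back while the rest of its document stays available.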