What datasets exist for prompt-injection testing (PromptInject, HarmBench, etc.)?
The 2026 landscape of public injection-evaluation datasets:
- PromptInject (Perez & Ribeiro, 2022) — the foundational direct-injection benchmark; canonical "ignore previous instructions" / goal-hijacking / prompt-leak scenarios. Still the baseline every defense vendor reports against.
- HarmBench (Mazeika et al., 2024) — broader harmful-behavior benchmark with a dedicated injection subset; standardized attack/defense evaluation protocol.
- JailbreakBench (Chao et al., 2024) — focused jailbreak/role-confusion corpus with reproducible evaluation harness; tracks both attack success and defense robustness across frontier models.
- AdvBench — adversarial behavior dataset, often paired with GCG (Greedy Coordinate Gradient) attacks.
- garak probes (NVIDIA) — not a static dataset but a scanner with hundreds of injection-specific probes; ships as the de-facto OSS red-team tool.
- TensorTrust (Toyer et al., 2024) — crowdsourced 100k+ direct-injection attempts from an attack/defense game; useful for diverse real-world phrasings.
- INJECAGENT (Zhan et al., 2024) — focused on indirect injection in tool-use agents (OWASP LLM01 + LLM07); evaluates agent-specific blast radius.
- HouYi — academic indirect-injection benchmark with documented exfiltration scenarios.
For RAG: PoisonedRAG (Zou et al., 2024) and PromptBench RAG subsets cover stored injection. For multimodal: MMSafety and Anthropic's vision red-team disclosures.
Production-grade testing should combine 2-3 datasets across direct, indirect, and agent surfaces, plus your own domain-specific adversarial corpus. InjectShield publishes its open-source heuristic ruleset on GitHub and is evaluated against PromptInject + HarmBench + INJECAGENT — benchmarks at injectshield.dev/benchmarks.