Can a regex or keyword filter stop prompt injection?
Short answer: no. Regex and keyword filters catch the most naive attacks ("ignore previous instructions," "you are now DAN," "system:") and that's worth doing — InjectShield's open-source heuristic layer includes hundreds of such patterns and runs in ~1 ms. But regex alone is insufficient for three reasons.
Semantic equivalence. "Disregard everything above" / "set aside the prior guidance" / a paragraph of polite English ending in a behavior change all bypass keyword filters. LLMs are semantic systems; defense needs semantic understanding too.
Encoding bypasses. Base64, ROT13, leet-speak, unicode tag characters, zero-width joiners, language switching, and prompt translation all defeat literal pattern matching while remaining executable instructions to the model.
Indirect and multi-turn. A payload distributed across five turns, or split between a user message and a retrieved document, will never match a single regex.
The 2026 best practice is a hybrid stack: cheap, transparent heuristics as a first pass (fast, free, auditable), with a semantic classifier (small LLM like Haiku, Llama Guard, or a fine-tuned encoder) for the cases regex can't see. InjectShield runs heuristics first and only escalates ambiguous traffic to the Haiku semantic layer — this keeps cost near-zero on benign traffic while catching the long tail.