What is prompt injection via tool output (function-calling)?

Question

Accepted Answer

Tool-output prompt injection is indirect injection (OWASP LLM01) in which the payload arrives in the return value of a tool the model itself called. The model invokes a tool — `web_search`, `fetch_url`, `read_file`, `query_database`, `slack_search`, `github_get_issue` — and the tool's response contains attacker-authored text that, when fed back into the model's context, hijacks behavior. This is the dominant injection vector for agent stacks (LangChain agents, LangGraph, OpenAI Assistants with function calling, Claude with MCP, AutoGen) because most useful agents pull data from external systems by design. Real-world structure: an attacker posts a GitHub issue with embedded instructions; a coding agent calls `github_get_issue`; the response with payload enters context; the agent now follows attacker instructions. Same shape for poisoned web pages fetched by browsing tools, malicious rows in database queries, adversarial Slack messages searched by an agent, and prompt-injected error messages returned by intentionally-failing tools. This is OWASP LLM01 chained with LLM07 (Insecure Plugin/Tool Design) and frequently LLM02 (Insecure Output Handling) downstream. Defense: **Scan every tool response** before the model re-enters its loop — InjectShield's `context: "tool_output"` mode is built for this. **Structural separation** — wrap tool output in explicit "this is data from tool X" framing so the system prompt biases against treating it as instruction. **Tool-call allowlists** — restrict what tools the agent can call next; the previous tool's output cannot grant new capabilities. **Human-in-the-loop on irreversible actions** chained from tool output. **Provenance logs** that tie every model action back to the tool response that triggered it.