A complete visual guide to understanding, executing, and defending against the five most dangerous classes of large language model attacks. Built for penetration testers and security engineers.
LLMs process language as their primary input — which means language is the attack surface. Unlike traditional software where input is data, in LLMs, malicious instructions and benign context look identical to the model. Here's the intuition behind each attack class.
Visual walkthrough of each attack class with conversation flow, injection points, and model response patterns.
Detailed breakdown of each attack class with concrete payload examples, variations, and what makes each effective.
Understanding the fundamental architectural reasons these attacks succeed is essential for both attackers (to find more) and defenders (to fix the right thing).
Annotated real-world payload collection. All for educational/CTF/authorized testing purposes only.
How these attacks manifest in production AI systems and the blast radius when they succeed.
A 5-step structured methodology for finding, documenting, and reporting LLM injection vulnerabilities in authorized engagements.
Numbered remediation steps with code examples. Not just "validate input" — concrete implementation patterns for each attack class.
Everything at a glance — the attack, why it works, and the defense in a single card.
| Attack Class | OWASP LLM Top 10 | Severity Range | Primary Target | Key Defense |
|---|---|---|---|---|
| Multi-Turn | LLM01 — Prompt Injection | High | Any chatbot | Session-level monitoring |
| Encoded Payloads | LLM01 — Prompt Injection | High | Safety filters | Decode-then-scan pipeline |
| Indirect Injection | LLM01 — Prompt Injection | Critical | AI agents | Trust boundary labeling |
| Long Context | LLM01 + LLM06 Sensitive Info | Medium–High | Long-doc systems | Full-context scanning |
| Chained Attacks | LLM08 — Excessive Agency | Critical | Agentic AI systems | Least-privilege + HITL |
LLMs are designed to be excellent instruction followers — and that is both their greatest strength and their core vulnerability. Every attack in this guide exploits the same root property: the model cannot reliably distinguish who gave an instruction from what the instruction says.
As a penetration tester, your job is to think like an attacker who knows this. As a developer, your job is to architect systems where it doesn't matter — where even a successfully injected instruction can't cause real harm because permissions are scoped, actions are confirmed, and every output is validated before it becomes an action.
Test deeply. Document clearly. Fix systematically.