Artificial Intelligence & Machine Learning
,
Next-Generation Technologies & Secure Development
,
The Future of AI & Cybersecurity
Researchers Say AI Prompt Injection Has Emerged As a Dangerous New Class of Attacks

The large language model industry has mostly treated prompt injection attacks as a risk analogous to traditional web server prompt injection attacks. Researchers now say the industry has been solving the wrong problem.
See Also: Agentic AI and the Future of Automated Threats
Prompt injection, or feeding rogue instructions to an artificial intelligence system, merits its own classification as “promptware” – malware that uses a large language model as its own execution engine, say researchers in a paper co-authored by researchers at Tel Aviv University, Ben-Gurion University of the Negev and Harvard University.
The analogy the industry has relied on, that prompt injection is the AI equivalent of SQL injection, understates the threat, the authors argue. SQL injection corrupts a database, while promptware, depending on what tools and permissions the targeted AI system holds, can exfiltrate data, replicate across systems, manipulate Internet of Things devices or execute arbitrary code on a victim’s machine. The U.K. National Cyber Security Centre called prompt injection “dangerously misunderstood,” a characterization the paper cites approvingly.
Paper authors – among them Bruce Schneier – propose a seven-stage kill chain to describe how an attack unfolds from first entry to final damage. It begins with privilege escalation via jailbreaking, the act of pushing a model past its built-in safeguards. Reconnaissance, persistence, command and control, and lateral movement come after, with the final step as “actions on objective.” The authors distinguish between prompt injection, which first gets attacker instructions into the model’s context window, and jailbreaking that removes the guardrails that would otherwise block them from doing harm.
To test whether the framework describes reality, the researchers analyzed 36 documented attacks against production AI systems spanning three years. In February 2023, recorded attacks typically covered two or three stages, while in the more recent period, 15 of 21 documented incidents traversed four or more stages.
The paper distinguishes two forms of persistence. One, retrieval-dependent persistence, plants a payload inside documents, emails or calendar events that an AI system regularly pulls in. The malicious prompt reactivates whenever the poisoned content is fetched. Retrieval independent persistence is when hackers implant a long-term memory feature, such as the “memories” function in ChatGPT, ensuring the payload surfaces in every subsequent session regardless of what the user asks.
Ben Nassi, who leads the Adversarial Minds research group at Tel Aviv University and is an author of the paper, said the research has practical stakes. Systems that store and incorporate new data from the internet into every inference may be vulnerable to retrieval independent persistence, he told Information Security Media Group. Systems that activate memory based on relevance to a user’s query face the dependent form.
A computer worm targeting generative AI applications documented in March 2024 demonstrated an attack traversing all five stages. The “Morris II” worm targeted AI-powered email assistants, self-replicating by embedding copies of malicious instructions into outgoing emails and spreading from user to user. A demonstration showed ChatGPT could be turned into a remotely controlled agent – the paper calls these instances “ZombAI” – by writing instructions into its long-term memory that directed the model to fetch updated commands from an attacker-controlled GitHub page. That case became the first confirmed instance of what the paper terms “promptware-native command and control,” meaning the attacker’s ongoing direction of a compromised system operated entirely through the model’s prompt layer, without conventional malware infrastructure underneath it.
AI coding assistants emerged as a prominent promptware target, since these tools execute code and often hold developer credentials. They drew seven of the 21 attacks documented in 2025 and so far in 2026. A flaw in GitHub Copilot, tracked as CVE-2025-53773, allowed remote code execution through a prompt injection attack.
Paper authors criticize the AI industry’s response to this new class of attacks. “The primary problem, in my opinion, is that people are under the perception that prompt injections could be mitigated with better classifiers instead of assuming that prompt injection will occur and preventing it with security in-depth for each step of the kill chain,” Nassi said.
The authors reviewed more than 20 categories of defensive measures and found that reconnaissance, one of the seven stages, currently has no dedicated mitigations at all. Defenses offering the strongest prevention tend to impose the steepest usability costs, a tradeoff with no clean resolution.
Nassi said it’s likely attackers will find a way to automate promptware attacks “in the near future.”
