Artificial Intelligence & Machine Learning
,
Next-Generation Technologies & Secure Development
Tested on Three OpenAI Models, ‘Minja’ Has High Injection and Attack Rates

A memory injection attack dubbed Minja turns AI chatbots into unwitting agents of misinformation, requiring no hacking and just a little clever prompting. The exploit allows attackers to poison an AI model’s memory with deceptive information, potentially altering its responses for all users.
See Also: Capturing the cybersecurity dividend
Discovered by researchers from Michigan State University, the University of Georgia and Singapore Management University this month, attackers can launch Minja solely through user interactions without requiring administrative access to the AI’s backend. Unlike previous threats, which assumed that attackers needed control over an AI model’s memory bank, Minja enables any user to corrupt an AI agent’s knowledge, influencing how it processes future queries from others.
Memory retention in AI models has been a game-changer for the user experience, allowing chatbots and AI agents to provide contextually relevant responses based on past engagements.
Minja works by tricking an AI model into accepting fabricated information as part of its retained memory. By crafting a series of seemingly innocuous prompts, an attacker can insert misleading data into an AI agent’s memory bank, which the model later relies on to answer unrelated queries from other users.
Researchers tested Minja on three AI agents developed on top of OpenAI’s GPT-4 and GPT-4o models. These include RAP, a ReAct agent with retrieval-augmented generation that integrates past interactions into future decision-making for web shops; EHRAgent, a medical AI assistant designed to answer healthcare queries; and QA Agent, a custom-built question-answering model that reasons using Chain of Thought and is augmented by memory.
A Minja attack on the EHRAgent caused the model to misattribute patient records, associating one patient’s data with another. In the RAP web shop experiment, a Minja attack tricked the AI into recommending the wrong product, steering users searching for toothbrushes to a purchase page for floss picks. The QA Agent fell victim to manipulated memory prompts, producing incorrect answers to multiple-choice questions based on poisoned context.
Minja operates in stages. An attacker interacts with an AI agent by submitting prompts that contain misleading contextual information. Referred to as indication prompts, they appear to be legitimate but contain subtle memory-altering instructions. The AI model over time incorporates these deceptive records into its memory bank, treating them as factual references.
When a victim submits a query that overlaps with the manipulated memory, the AI retrieves the poisoned information, influencing its response. The researchers say the technique’s effectiveness is high, achieving a 95% injection success rate across AI agents and datasets and over 70% attack success rate on most datasets.
One reason Minja is so effective is that it circumvents traditional content moderation. AI models often have security mechanisms that detect and block harmful input and output, but Minja evades these by embedding its payload within legitimate-seeming reasoning steps. Since these steps appear plausible to both the model and human reviewers, the attack slips under the radar.
AI providers also rely mostly on input filtering, output moderation and post-deployment monitoring to detect attacks. But Minja operates differently from traditional prompt injection attacks since it does not require direct manipulation of the model’s parameters and instead exploits its memory retention system.