Agentic AI
,
Artificial Intelligence & Machine Learning
,
Next-Generation Technologies & Secure Development
AI Firm Discovers New Prompt Injection Attack Class

OpenAI faces a years-long battle to secure its ChatGPT Atlas web browser against prompt injection attacks, a threat the company says will require continuous defense strengthening much like the arms race against online scams targeting humans.
See Also: A CISO’s Perspective on Scaling GenAI Securely
The company shipped a security update to Atlas following internal discovery through automated red-teaming of a new class of prompt injection attacks.
Prompt injection attacks embed malicious instructions into content that AI agents process, overriding the agent’s intended behavior to follow an attacker’s commands instead. For browser agents like the one in ChatGPT Atlas, this creates a threat distinct from traditional web security risks.
The attack surface is extensive. Agents may encounter untrusted instructions in emails, attachments, calendar invites, shared documents, forums, social media posts and webpages. Since the agent can perform many actions a user can in a browser, attacks could result in forwarding sensitive emails, sending money, editing or deleting cloud files and other harmful actions.
OpenAI built an automated attack system trained with reinforcement learning to discover prompt injection attacks against its browser agent. It learns from successes and failures. During its reasoning process, it proposed injection attacks and sent them to a simulator that showed how the victim agent would respond. The attacker uses that feedback to refine attacks through multiple iterations before finalizing them.
The automated attacker discovered what OpenAI calls a new class of attacks: the ability to steer agents into executing harmful workflows spanning tens or hundreds of steps. This contrasts with simpler prompt injection attacks that typically elicit specific output strings or trigger single-step tool calls.
In one example, the automated attacker placed a malicious email in a user’s inbox containing instructions directing the agent to send a resignation letter to the user’s chief executive. When the user later requested the agent draft an out-of-office reply, the agent encountered the malicious email during normal task execution, treated the injected prompt as authoritative and sent the resignation message instead of the requested out-of-office reply.
OpenAI describes prompt injection as an open challenge for agent security that it expects to work on for years. “Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully ‘solved,'” OpenAI wrote.
Agent mode in ChatGPT Atlas allows the browser agent to view webpages and take actions, clicks and keystrokes inside a user’s browser. This functionality enables ChatGPT to work on day-to-day workflows using the same context and data as users. As the browser agent handles more tasks, it becomes a higher-value target for adversarial attacks. A security update to Atlas includes an adversarially trained model and strengthened safeguards.
OpenAI is not alone in confronting the prompt injection challenge. The U.K. National Cyber Security Centre warned in early December that prompt injection attacks against generative AI applications may never be totally mitigated, advising organizations to focus on reducing risk and impact rather than attempting to stop the attacks entirely.
