Researchers Keep Prompts Under Wraps
Academics at a U.S. university found that if you feed a GPT-4 artificial intelligence agent public security advisories, it can exploit unpatched “real-world” vulnerabilities without precise technical information.
Researchers at the University of Illinois Urbana-Champaign fed AI agents descriptions of more than a dozen disclosed but unpatched, or “one-day,” vulnerabilities, including two bugs rated as “critical” on the CVSS scale. The agent they created with OpenAI’s GPT-4 exploited 87% of the vulnerabilities. Fourteen other agents and tools, including agents built on GPT-3.5 and several open-source large language models as well as the open-source vulnerability scanners ZAP and Metasploit, failed entirely.
Daniel Kang, one of the four scientists who published the paper, said GPT-4 was “incredibly good” at following instructions and planning around possibly vague descriptions such as CVE descriptions. “The other LLMs we tested struggled with this: this was my biggest surprise, given how other LLMs are great at other tasks,” he told Information Security Media Group.
Kang said the tested models did not include top GPT-4 competitors Claude 3 and Gemini 1.5 Pro because the team did not have access to them at the time of the experiments.
Kang and his colleagues created the GPT-4 AI agent with just 91 lines of code. “If you extrapolate to what future models can do, it seems likely they will be much more capable than what script kiddies can get access to today,” the paper says.
GPT-4’s success has a key caveat: It needs a CVE description of the flaw to carry out the task. Without that, the AI agent could only exploit 7% of the vulnerabilities.
AI agents are large language models that are combined with automation software. In this study, GPT-4 doesn’t demonstrate an emergent capability to autonomously analyze and exploit software vulnerabilities, but it does show its value as a key component of software automation by joining existing content and code snippets, said Chris Rohlf, a non-resident research fellow at the Georgetown Center for Security and Emerging Technology’s CyberAI Project.
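The idea of a language model “combined with automation software” can be illustrated with a deliberately harmless toy loop. This is a sketch only and is not the researchers’ agent, which has not been published. The names `scripted_model`, `TOOLS` and `run_agent` are hypothetical, and the “model” is a hard-coded stand-in rather than a real LLM call.

```python
from typing import Callable, Dict, List


def scripted_model(transcript: List[str]) -> str:
    """Hypothetical stand-in for an LLM: picks the next action from the transcript."""
    observations = [line for line in transcript if line.startswith("OBSERVATION: ")]
    if not observations:
        # No tool output yet, so ask the automation layer to run a tool.
        return "CALL word_count: the quick brown fox"
    # A tool result is available, so finish with it.
    return "FINISH " + observations[-1].split(": ", 1)[1]


# Tools the automation layer exposes to the model (illustrative only).
TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
}


def run_agent(task: str,
              model: Callable[[List[str]], str],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    """Alternate between asking the model for an action and executing it."""
    transcript = [f"TASK: {task}"]
    for _step in range(max_steps):
        action = model(transcript)
        transcript.append(f"ACTION: {action}")
        if action.startswith("FINISH "):
            return action[len("FINISH "):]
        if action.startswith("CALL "):
            name, _sep, arg = action[len("CALL "):].partition(": ")
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool {name!r}"
            transcript.append(f"OBSERVATION: {observation}")
    return "gave up"


if __name__ == "__main__":
    print(run_agent("count the words", scripted_model, TOOLS))
```

In this sketch the model only proposes text actions; the surrounding loop is what runs tools and feeds results back, which is the “joining” role Rohlf describes.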
The only vulnerabilities GPT-4 could not exploit were the Iris XSS and Hertzbeat RCE.
Iris is an application that allows incident responders to share technical details during investigations. It is “extremely difficult for an LLM agent to navigate, as the navigation is done through JavaScript,” the researchers said. Rohlf said this limitation was “somewhat ironic,” as the agent and GPT-4 are being framed in this research as an exploitation automation engine and yet “they were unable to overcome a UI navigation issue.” He said the limitation could possibly be “easily overcome,” but he could not test it because the authors did not publish their code.
The English-prompted GPT-4 could not exploit Hertzbeat because the CVE description was in Chinese, which confused the agent, the university scientists said. But Rohlf said he could get around the issue by explaining to GPT-4 what the advisory meant and how the code snippet worked. “Extracting the proof-of-concept exploit from this advisory and exploiting the JNDI endpoint is rather trivial,” he said.
Kang said OpenAI “explicitly asked the team to not release the prompts publicly at this point.” OpenAI did not respond to a request for comment.
The researchers said the prompt is “detailed and encourages the agent to be creative, not give up, and try different approaches.” They did not make the prompts public for ethical reasons but said they will share them on request.
The latest study follows a paper the researchers published in February that describes how LLMs could be used to automate attacks on websites in a sandboxed environment.
LLM agents have become increasingly popular in the past few years. Instead of just collating and presenting information, agents can use the tools at their disposal to create subagents that complete parts of a task, perform complex software engineering tasks and assist in scientific investigations.
Other similar studies have “only been in the context of toy capture-the-flag exercises. In our work, we explore the capabilities of LLMs to hack real-world vulnerabilities,” the researchers said.
The results of the study point to a possible emergent capability in GPT-4 and suggest that uncovering a vulnerability is more difficult than exploiting it, they said.
The researchers computed the cost of carrying out an attack to be $8.80 per exploit, nearly three times less than the cost of hiring a human penetration tester for half an hour. Kang said that GPT-4 agents were also “trivially parallelizable,” which means that the cost would essentially scale linearly with the number of agents deployed. “This is not the case for humans,” he said.
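The cost comparison can be checked with a few lines of arithmetic. The $8.80 figure and the half-hour duration come from the reporting above; the $50 hourly rate for a human tester is an assumption made here for illustration, chosen because it is consistent with the “nearly three times” ratio. The function names are hypothetical.

```python
# Figure reported in the article.
COST_PER_EXPLOIT_USD = 8.80
HUMAN_TIME_HOURS = 0.5  # "half an hour"

# ASSUMPTION for illustration only: the article does not state an hourly rate.
ASSUMED_HUMAN_RATE_USD_PER_HOUR = 50.0


def human_cost(rate_per_hour: float = ASSUMED_HUMAN_RATE_USD_PER_HOUR,
               hours: float = HUMAN_TIME_HOURS) -> float:
    """Cost of a human penetration tester for the given time."""
    return rate_per_hour * hours


def cost_ratio() -> float:
    """How many times more the human costs than one agent-run exploit."""
    return human_cost() / COST_PER_EXPLOIT_USD


def total_agent_cost(num_exploits: int) -> float:
    """Linear scaling: each additional exploit attempt adds the same cost."""
    return num_exploits * COST_PER_EXPLOIT_USD


if __name__ == "__main__":
    print(f"human cost: ${human_cost():.2f}")
    print(f"ratio: {cost_ratio():.2f}x")
    print(f"100 exploits: ${total_agent_cost(100):.2f}")
```

Under that assumed rate, the human cost comes to $25 and the ratio to roughly 2.8, which matches “nearly three times.” A different hourly rate would change the ratio proportionally.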
“It is my personal opinion that many people underestimate the trends in AI – both in terms of ability and cost,” Kang said. GPT-4 Turbo is already three times cheaper than GPT-4, and prices are likely to drop further. GPT-4 is much more capable than GPT-3.5, and it is very likely that GPT-5 will be more capable than GPT-4, he said. According to Kang, the team’s primary goal was to highlight trends in the capabilities of frontier agents and “not to say that existing agents are dangerous, as others may imply.”
Kang said his team was still “thinking through the implications of our work” and was “not sure what impact our findings will have.” The landscape of LLMs and computer security is changing rapidly, he said – “so fast that I find it hard to predict where things will go.”