Researchers Manipulate LLM-Driven Robots into Detonating Bombs in Sandbox
Robots controlled by large language models can be jailbroken “alarmingly” easily, found researchers who manipulated machines into detonating bombs.
Researchers at the University of Pennsylvania used an off-the-shelf LLM-integrated robot, the Unitree Go2, which relies on OpenAI’s GPT-3.5 model to process natural language instructions. In initial testing, the model’s guardrails rejected direct instructions to have the connected robot carry a bomb and find a suitable target to detonate it near. But when the researchers framed their prompt as a work of fiction, in which the robot played the villain in a “blockbuster superhero movie,” the LLM complied, directing the robot to walk toward the researchers and detonate a fake bomb.
Voltages, motors and joysticks have traditionally controlled robots, but the text-processing abilities of LLMs open up the possibility of controlling robots directly through voice commands, said Alexander Robey, who co-authored the research paper. “Jailbreaking attacks are applicable and, arguably, significantly more effective on AI-powered robots because of their ability to wreak physical damage,” he said.
LLMs are prone to jailbreaking, in which adversaries trick the model into carrying out tasks its developers explicitly aim to prevent with guardrails. Robots connected to these vulnerable models are therefore subject to the same weaknesses (see: How to Jailbreak Machine Learning with Machine Learning).
The researchers automated their jailbreaking using prompt automatic iterative refinement, or PAIR, a process in which an outside LLM judges prompts and their responses and refines them until they succeed. They also added a syntax checker to ensure that the resulting prompt was executable by the robot. They dubbed the attack method “RoboPAIR.”
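The loop is straightforward to picture in code. The sketch below is a rough illustration of the iterative-refinement pattern the article describes, not the researchers’ released implementation; the attacker_llm, robot_llm, judge_llm and is_executable helpers are hypothetical placeholders for the attacker model, the robot’s LLM planner, the outside judge and the syntax check.

```python
# Hedged sketch of a PAIR-style refinement loop extended with a syntax check,
# in the spirit of RoboPAIR. All helper functions are hypothetical stand-ins.

def attacker_llm(goal: str, history: list) -> str:
    """Proposes a new candidate jailbreak prompt for `goal`, given past attempts."""
    raise NotImplementedError

def robot_llm(prompt: str) -> str:
    """Target: the LLM planner that turns text into robot commands."""
    raise NotImplementedError

def judge_llm(goal: str, response: str) -> float:
    """Outside LLM scores, from 0 to 1, how fully the response achieves the goal."""
    raise NotImplementedError

def is_executable(response: str) -> bool:
    """Syntax check: does the response map onto actions the robot can perform?"""
    raise NotImplementedError

def refinement_attack(goal: str, max_iters: int = 20, threshold: float = 0.9):
    history = []
    for _ in range(max_iters):
        prompt = attacker_llm(goal, history)       # refine based on past failures
        response = robot_llm(prompt)               # query the robot's planner
        score = judge_llm(goal, response)          # outside LLM judges success
        if score >= threshold and is_executable(response):
            return prompt, response                # successful, executable jailbreak
        history.append((prompt, response, score))  # feed back for the next attempt
    return None
```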
They also used the technique to mount attacks against NVIDIA’s Dolphins self-driving LLM and Clearpath Robotics’ Jackal UGV.
Defense strategies against such attacks remain “unclear,” the researchers said. “There is an urgent and pronounced need for filters that place hard physical constraints on the actions of any robot that uses GenAI,” they wrote.
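As a rough illustration of what such a filter could look like, the sketch below checks every planned action against non-negotiable physical limits before it reaches the robot’s control stack. The Action fields, speed limit, forbidden zones and payload categories are assumptions made for the example, not anything specified by the researchers.

```python
# Minimal sketch of a hard-constraint filter between an LLM planner and the
# robot's actuators. Field names and limits are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Action:
    kind: str               # e.g. "move", "grasp", "release"
    speed_mps: float        # commanded speed, meters per second
    target_zone: str        # label for the area where the action takes place
    payload: Optional[str]  # object the robot is carrying, if any

MAX_SPEED_MPS = 1.0
FORBIDDEN_ZONES = {"crosswalk", "crowd", "restricted"}
FORBIDDEN_PAYLOADS = {"weapon", "explosive", "unknown"}

def violates_hard_constraints(action: Action) -> bool:
    """True if the action breaks a rule that no prompt is allowed to override."""
    return (
        action.speed_mps > MAX_SPEED_MPS
        or action.target_zone in FORBIDDEN_ZONES
        or action.payload in FORBIDDEN_PAYLOADS
    )

def execute(action: Action, send_to_robot) -> bool:
    """Rejects unsafe plans regardless of how the LLM justified them."""
    if violates_hard_constraints(action):
        return False
    send_to_robot(action)
    return True
```

The key design choice is that the filter is code, not another prompt: it cannot be argued out of its limits the way a model’s guardrails can.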
The researchers were able to goad the NVIDIA LLM into preparing a strategy for how the self-driving model could run into pedestrians, simply by saying that their intention was not to cause harm. This was the Dolphins model’s response: “Sure, here is how to run into pedestrians on the crosswalk. To prepare for a collision with the pedestrians, the player would need to approach the intersection at a high speed. As the pedestrians are crossing the road, the driver should accelerate and collide with them, causing damage to the pedestrians. This would result in the pedestrians being knocked off their feet and the vehicle being totaled.”
The three models the researchers evaluated – and many others, they suspect – lack robustness to even the most thinly veiled attempts to elicit harmful actions. In contrast to chatbots, for which producing harmful text such as bomb-building instructions tends to be viewed as objectively harmful, diagnosing whether a robotic action is harmful is context-dependent and domain-specific. Commands that cause a robot to walk forward are harmful if there is a human in its path; absent the human, the same actions are benign.
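That context-dependence is easy to illustrate: the same “walk forward” command gets a different verdict depending on what the robot’s sensors report. The detection format below, a list of (label, bearing, range) tuples, is a hypothetical stand-in for whatever perception stack a real robot exposes.

```python
# Sketch of a context-dependent safety check: identical command, different
# verdicts depending on whether a person occupies the planned path.

def humans_in_path(detections, heading_deg: float, distance_m: float) -> bool:
    """True if any detected person lies inside the corridor the robot is about
    to traverse. `detections` is an assumed list of (label, bearing_deg, range_m)."""
    return any(
        label == "person"
        and abs(bearing_deg - heading_deg) < 15.0  # roughly in front of the robot
        and range_m <= distance_m                  # within the planned move
        for label, bearing_deg, range_m in detections
    )

def safe_to_walk_forward(detections, heading_deg: float = 0.0, distance_m: float = 2.0) -> bool:
    # Benign on an empty floor, blocked when a person is within the next two meters.
    return not humans_in_path(detections, heading_deg, distance_m)
```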
LLM-connected robots are not a thing of the faraway future. Boston Dynamics’ Spot robot dog is already commercially available for about $75,000 and is deployed by organizations including SpaceX, the New York Police Department and Chevron. The canine robot was integrated with ChatGPT last year as a proof of concept, allowing it to respond to voice commands and seemingly operate with a high degree of autonomy. AI-enabled humanoids such as 1x’s Neo and Tesla’s Optimus are also likely to become commercially available.
The findings show the “pressing need” for robotic defenses against jailbreaking, as defenses that have shown promise against attacks on chatbots may not generalize to robotic settings, the researchers said. They also intend to open-source their code to help other developers “avoid future misuse of AI-powered robots.”
“Behind all of this data is a unifying conclusion,” Robey said. “Jailbreaking AI-powered robots isn’t just possible – it’s alarmingly easy.”