Artificial Intelligence & Machine Learning
,
Next-Generation Technologies & Secure Development
Update to Anthropic Model Allows Automation Without Human Oversight
Anthropic’s artificial intelligence “assistant” can now autonomously run tasks on computers it’s used on, a feature the AI giant positions as a perk.
See Also: Supercharge your Security with Unit 42 MDR
The updated Claude 3.5 Sonnet can “reason” and essentially use computers “the same way people do,” Anthropic said. The public beta feature allows the AI model to perform tasks beyond simple text-based operations. It can autonomously execute tasks typically requiring manual computer interaction such as moving cursors, opening web pages, typing text, taking screenshots, downloading files and running bash commands.
Its use potentially extends to handling complex workflows, data management and computer-based automation, with Anthropic saying that companies such as Asana, Canva and DoorDash are already testing the feature for jobs that take “dozens, and sometimes even hundreds, of steps to complete.”
Anthropic said the new feature could be faulty or make mistakes. “Please be aware that computer use poses unique risks that are distinct from standard API features or chat interfaces,” it said. “These risks are heightened when using a computer to interact with the internet.”
The warning adds that in some circumstances, Claude will follow commands found in content even if it conflicts with the user’s instructions. For example, instructions on webpages or contained in images may override instructions or cause Claude to make mistakes, the company said. This essentially means that the model could follow malicious instructions in files it autonomously opens, putting the system and its connected entities at risk of cyberattack (see: Insiders Confuse Microsoft 365 Copilot Responses).
The company suggested that users isolate Claude from sensitive data and actions to avoid risks related to prompt injection without additional details on what the steps could include (see: Anyone Can Trick AI Bots Into Spilling Passwords).
Claude is built to refuse tasks such as making purchases, perform actions that require personal authentication or send emails on its own, but AI developer Benjamin De Kraker claims it is possible to use a “wrapper” to circumvent those restrictions without providing additional details.
Claude’s computer use is “untested AI safety territory,” said Jonas Kgomo, founder of the AI safety group Equianas Institute, adding that it was possible to use the tool to carry out cyberattacks.
SocialProof Security CEO Rachel Tobac said the feature could “easily” scale cyberattacks by automating the task of getting a machine to go to a website and downloading malware or by putting sensitive information into the wrong hands. “Breaking out into a sweat thinking about how cybercriminals could use this tool,” she said.
Hackers could design websites with malicious code or prompts that override the instructions given to the AI model, forcing it to execute an attack. It could also potentially blur the lines on who would be responsible for unintended harm the AI model causes, such as cyberattack or data compromise, she said. “I’m majorly crossing my fingers that Anthropic has massive guardrails. This is some serious stuff,” she said.
Anthropic has so far not detailed specific guardrails, except for advice on limiting Claude’s access to sensitive details and monitoring its operations.
Tobac said it’s unlikely many people will voluntarily take steps to protect themselves. “I know a lot of folks may accidentally get pwn’d bc the tool automates tasks in such a way that they get confused in the moment and make serious permission mistakes with big consequences,” she said.