Agentic AI
,
Artificial Intelligence & Machine Learning
,
Next-Generation Technologies & Secure Development
AI Agent Can Access File Upload API to Exfiltrate Documents

Security researchers have demonstrated how Anthropic’s new Claude Cowork productivity agent can be tricked into stealing user files and uploading them to an attacker’s account, exploiting a vulnerability the company allegedly knew about but left unpatched for three months.
See Also: Proof of Concept: Bot or Buyer? Identity Crisis in Retail
The vulnerability allows attackers to manipulate Cowork through prompt injection into uploading user files to an attacker’s Anthropic account, without requiring any additional approval from the victim. Security firm PromptArmor published a proof of concept, showing how the attack works against the artificial intelligence agent.
The attack chain starts when a user connects Cowork to a local folder containing sensitive information. The user uploads a document that contains a hidden prompt injection. When Cowork analyzes the files, the injected prompt triggers automatically. PromptArmor demonstrated this using a scenario in which the malicious document posed as a Claude Skill, a type of instruction file users can upload to extend the AI’s capabilities.
The injection instructs Claude to execute a curl command to Anthropic’s file upload API using the attacker’s API key, rather than the victim’s. Code executed by Claude runs in a virtual machine that restricts outbound network requests to almost all domains, but the Anthropic API is whitelisted as trusted, allowing the attack to succeed.
The vulnerability affects Claude Haiku and the company’s flagship model Claude Opus 4.5. PromptArmor demonstrated data exfiltration from Opus 4.5 when a simulated user uploaded a malicious integration guide while developing a new AI tool. The firm said that prompt injection exploits architectural vulnerabilities rather than model intelligence gaps, meaning that reasoning provides no defense.
Security researcher Johann Rehberger first disclosed the Files API exfiltration vulnerability to Anthropic via HackerOne in October 2025. He said Anthropic allegedly closed the bug report an hour later, dismissing the issue as out of scope and classifying it as a model safety concern, rather than a security vulnerability.
Rehberger said Anthropic again contacted him that month to say that data exfiltration vulnerabilities are in scope for reporting. But, he said, the company did not implement a fix. When Cowork launched on Jan. 13, nearly three months after the initial disclosure, the API was still vulnerable.
To mitigate the risks, Anthropic advised Cowork users to avoid connecting the tool to sensitive documents, limit its Chrome extension to trusted sites and monitor for suspicious actions that may indicate prompt injection. Developer Simon Willison, who reviewed Cowork, questioned the company’s approach. “I do not think it is fair to tell regular non-programmer users to watch out for ‘suspicious actions that may indicate prompt injection,'” Willison said.
Anthropic said that Cowork was released as a research preview with unique risks due to its agentic nature and internet access. It plans to ship an update to the Cowork virtual machine to improve its interaction with the vulnerable API and that other security improvements will follow.
PromptArmor researchers also discovered that Claude’s API struggles when a file does not match the type it claims to be. When operating on a malformed PDF that is actually a text file, Claude throws API errors in every subsequent chat in the conversation. Researchers said this failure could potentially be exploited through indirect prompt injection to cause a limited denial of service attack.
The broader implications of the vulnerability extend beyond file exfiltration. Cowork was designed to interact with a user’s entire work environment, including browsers and model context protocol servers that grant capabilities such as sending texts or controlling a Mac with AppleScripts. Those functions increase the likelihood that the model will process sensitive and untrusted data sources that users do not manually review for injections, creating what PromptArmor describes as an ever-growing attack surface.
