Agentic AI, Artificial Intelligence & Machine Learning, Governance & Risk Management
Fallout Continues Over Leaked Claude Source Code Incident

The tension between artificial intelligence developers and cybersecurity vendors is becoming increasingly apparent as new models show sudden leaps in capability – and Anthropic, for better or for worse, finds itself at the center of the drama.
On Tuesday, the company accidentally leaked the source code for its agentic harness, which tells its Claude Code agents how to – and how not to – interact with other software. The embarrassing episode stemmed from Anthropic erroneously including a source map file in a new version of the Claude Code npm package. “It was human error,” said Claude Code creator Boris Cherny.
“Our deploy process has a few manual steps, and we didn’t do one of the steps correctly. We have landed a few improvements and are digging in to add more sanity checks,” Cherny said in a late-Tuesday X post, adding that “more automation and Claude checking the results” should improve the process in the future.
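Anthropic hasn’t described its deploy pipeline, but a sanity check of the kind Cherny alludes to could be as simple as inspecting the tarball npm is about to publish and failing the release if any source map files slipped in. Below is a minimal, purely illustrative sketch – not Anthropic’s actual tooling – assuming a Node.js release script named check-publish.ts:

```typescript
// check-publish.ts - illustrative pre-publish sanity check (not Anthropic's actual tooling).
// Lists the files npm would include in the published tarball and aborts the
// release if any source maps are present.
import { execSync } from "node:child_process";

// `npm pack --dry-run --json` reports the tarball contents without creating it.
const report = JSON.parse(
  execSync("npm pack --dry-run --json", { encoding: "utf8" })
);

const files: { path: string }[] = report[0].files;
const leaked = files.filter((f) => f.path.endsWith(".map"));

if (leaked.length > 0) {
  console.error("Refusing to publish: source maps found in package:");
  for (const f of leaked) console.error(`  ${f.path}`);
  process.exit(1);
}

console.log(`OK: ${files.length} files, no source maps.`);
```

Wired into a package’s prepublish step, a check along these lines could flag a stray .map file before the package ever reaches the registry.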
But this data leak probably wasn’t the most significant security-related incident in Anthropic’s rather messy March, as it came just days after the firm’s unwitting revelation of an upcoming model that supposedly has unprecedented bug-finding powers.
Last Thursday, Fortune found an unpublished blog post in a data cache that Anthropic had left publicly exposed. The post detailed a new model that’s codenamed either Mythos or Capybara – multiple archived versions of the text diverge on this detail – and that is apparently “far ahead of any other AI model in cyber capabilities,” presaging “an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders.”
For that reason, Anthropic said, it would provide the first access to the model to “cyber defenders” within organizations, “giving them a head start in improving the robustness of their codebases against the impending wave of AI-driven exploits.”
The Mythos miscue prompted a quick and now-familiar reaction in the markets, briefly pummeling cybersecurity stocks including Palo Alto Networks and CrowdStrike on Friday. Much the same scenario played out at the end of February after Anthropic unveiled the vulnerability-scanning, patch-proposing Claude Code Security. Just as in that incident, the markets soon realized the reaction was overblown – vulnerability scanning is only one piece of what the big cybersecurity players provide – and the losses were largely reversed (see: After the Panic, the Reality of Claude Code Security).
But the meat of Anthropic’s message – that the newest AI models could be a boon for both defenders and attackers – could scarcely be more timely. As repeatedly demonstrated at last week’s RSA Conference, cybersecurity professionals are deeply concerned about the risks being introduced by the technology, such as companies deploying AI agents that potentially undermine their security, and hackers using AI to enhance their capabilities.
Alessandro Pignati, a security researcher at agentic governance company NeuralTrust, praised Anthropic’s apparent decision to release Mythos/Capybara to the security community before regular users.
“[They] have to think about it and not just release everything,” he told ISMG on Tuesday. “It’s not just a matter of safeguards because we know that safeguards can be bypassed sometimes.”
Anthropic research scientist Nicholas Carlini delved deeply into that subject in an early March talk at the [un]prompted security conference in San Francisco. In his address, Carlini revealed that he had used Claude to find multiple heap buffer overflow vulnerabilities in the venerable Linux kernel – with one dating all the way back to 2003.
“Language models can autonomously and without fancy scaffolding find and exploit zero-day vulnerabilities in very important pieces of software,” Carlini said at the conference. “This is not something that was true even, let’s say, three or four months ago.”
AI models, he said, are “getting really, really good, really fast. And this means that the nice balance we had between attackers and defenders over the last 20 years or so seems like it’s coming to an end. It really seems to me like the language models that we have now are probably the most significant thing to happen to security since we got the internet.”
The Anthropic researcher said it isn’t yet feasible to use his company’s model to perform this kind of vulnerability scanning at large scale: “If I take a piece of software and I ask Claude to find a bunch of vulnerabilities and run it multiple times, it will probably find the same bug each time. Also it’s not very thorough. It will review some of the code but not all of the code.”
However, he pointed out, this ability only became apparent in Claude Opus 4.5 and 4.6, released in the last couple months, and there’s every reason to believe upcoming models will keep getting much better at finding vulnerabilities.
“The rate of progress is very large, so you should expect that the best models can do this today [and] the average model you have on your laptop probably can do this in a year,” Carlini said. “If we continue on this trend for even just another year, they’ll probably be better vulnerability researchers than all of you and I don’t know what that world looks like. It’s quite scary to live in a world where you can automatically find bugs that previously only the top one or two people in the world could have found.”
In the long run, he added, defenders will probably win out as AI helps them to harden security. But until then, “things probably are very bad,” and AI companies like Anthropic will need the security industry’s ideas for how to better manage risky releases.
“I do want to make sure people can’t use these things for harm, and indeed Anthropic’s models and OpenAI’s models and DeepMind’s models will generally refuse if you’re very explicitly doing nasty things,” Carlini said. “Clearly, they’re going to need to get better if they’re going to be able to refuse everything.”
It’s hard to find the right balance for locking down models, he said, as overly weak safeguards will allow bad actors to jailbreak the model, while excessive safeguards will stop “good people” from getting the most out of it. “I think we’re doing an OK job, but I think this is one of the areas where we need a lot more help to figure out how to do this better,” Carlini said, appealing to his audience for ideas.
But it’s not enough to trust the AI vendors to contain the cybersecurity spillover of their releases, warned NeuralTrust’s Pignati, who favors more AI regulation – something that’s unlikely in the U.S. anytime soon, certainly at the federal level. “Of course we cannot rely on the companies, because every time it’s a matter of personal interests,” he said.
