Agentic AI
,
Artificial Intelligence & Machine Learning
,
Governance & Risk Management
Anthropic’s Mythos Leak Points to Pattern of Failures, Sloppy Practices at AI Labs

Anthropic spent years positioning itself as the company that thinks hardest about what could go wrong with artificial intelligence. But in the past two weeks, the AI company accidentally announced its own new product – and then handed competitors a detailed blueprint of its most widely deployed tool.
See Also: AI Impersonation Is the New Arms Race—Is Your Workforce Ready?
Neither incident involved an adversary. The first was a misconfigured content management system and the second was a debugging file that was not meant to be shipped. Both failures were, in Anthropic’s own words, human error.
What makes this a bad stretch for the broader AI sector is that it fits a pattern spanning the last three years. Meta’s Llama model escaped a restricted research release within a week of being shared with approved academics. Microsoft’s AI research team exposed 38 terabytes of internal data through a cloud credential set with the wrong permissions. OpenAI has acknowledged that a class of attack that manipulates AI agents through their inputs may never be fully contained. The mechanisms differ each time, but the structure doesn’t: The failure originates inside the organization, not at the perimeter, and organizations depending on these tools get no advance notice.
It raises the standard vendor due diligence question: “Has this vendor been breached?” But it doesn’t capture the entire perspective. None of these incidents required a breach. The data left the companies on their own, through gaps that routine operational controls are supposed to catch. The more useful question for a security or technology leader evaluating AI platforms is whether the vendor’s internal development and data-handling processes are mature enough to match what they are building – and at the scale enterprises depend on.
The Anthropic incidents arrived in quick succession. Security researchers Roy Paz of LayerX Security and Alexandre Pauwels of the University of Cambridge last Thursday found a draft blog post announcing an unreleased model, along with internal PDFs and an itinerary for an invite-only executive retreat, in a publicly searchable data cache. Anthropic’s content management system published uploaded files publicly by default and only an explicit manual change would have kept them private.
Anthropic reportedly restricted access and acknowledged that “an issue with one of our external CMS tools led to draft content being accessible,” attributing it to “human error.” Close to 3,000 unpublished assets were exposed. Anthropic separately confirmed to Fortune that the model, internally called Mythos, was “a step change” in AI performance and “the most capable we’ve built to date.”
Security researcher Chaofan Shou on Tuesday spotted an anomaly in a routine update to Claude Code, distributed through npm, the platform developers use to share and update software packages. The update included a source map file, a debugging artifact that links compressed production code back to the original, readable source. That file pointed to a zip archive on Anthropic’s own cloud storage containing the complete Claude Code codebase: roughly 512,000 lines across 1,900 files. Shou posted the link on X. Within hours, the code was mirrored across GitHub and forked tens of thousands of times. Anthropic called it “a release packaging issue caused by human error, not a security breach.”
Software engineer Gabriel Anhaia, who analyzed the leak, said: “A single misconfigured .npmignore or files field in package.json can expose everything.” The .npmignore file is a list that tells the packaging tool what to leave out of a public release. A missing entry means the debug file ships with the product. It’s the kind of check that is part of pre-release audit checklists, but it doesn’t appear to have happened in this instance.
Paz, whose firm first located the Mythos documents, assessed the source code as well. “Usually, large companies have strict processes and multiple checks before code reaches production, like a vault requiring several keys to open,” he told Fortune. “At Anthropic, it seems that the process wasn’t in place and a single misconfiguration or mis-click suddenly exposed the full source code.” His concern was that beyond competitors now having the code, the leaked files also surfaced non-public details about how Anthropic’s systems work – information that could help threat actors probe for weaknesses. For organizations that have integrated Claude Code into their development pipelines, it’s a supply-chain trust question.
Alex Kim, a developer who published a breakdown of the Claude Code leak the day it happened, addressed the argument that the exposure was minor because rival tools from Google and OpenAI are already open source. “Those companies open-sourced their agent SDK, not the full internal wiring of their flagship product,” he said. “The real damage isn’t the code. It’s the feature flags. KAIROS, the anti-distillation mechanisms: These are product road map details that competitors can now see and react to. The code can be refactored. The strategic surprise can’t be un-leaked.” Essentially, the internal permission enforcement logic, agent orchestration design and system prompts that govern how Claude Code behaves in their environments are now public.
Not Just an Outlier
Anthropic’s week is the most recent entry in the pattern, not an outlier. In February 2023, after Meta released its Llama language model to a restricted group of approved researchers under a noncommercial license, a torrent of the model appeared on 4chan within a week, posted by someone who had been granted access. Meta filed takedown requests, but the model spread anyway. As Vice reported at the time, it was the first occasion a major technology company’s proprietary AI model had leaked to the public – and not because anyone broke in, but because an approved participant shared a file. The access control model had failed.
Six months later, cloud security firm Wiz disclosed that Microsoft’s AI research team had exposed 38TBs of internal data while posting open-source training data to GitHub. The team shared access using an Azure cloud storage credential, which is a URL-based token that when configured correctly, restricts access to a specific folder. This miscue granted full-control permissions across the entire storage account and was set to expire in 2051. Exposed data included employee workstation backups, internal passwords, private keys and more than 30,000 internal Teams messages. Wiz reported it to Microsoft in June 2023. Microsoft revoked the credential two days later and said no customer data was exposed. The exposure had been live and publicly accessible for months before an external researcher found it. Microsoft’s own monitoring had not caught it.
OpenAI, for its part, acknowledged in December 2025 that attacks embedding malicious instructions in content that AI agents process, overriding what the agent was configured to do, represent a threat it can’t fully eliminate. “Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully solved,” the company wrote. For enterprises running AI agents that interact with emails, documents or web pages, it’s a live exposure in production environments today, acknowledged by the vendor as permanent.
Each of these incidents has a clean technical explanation: a CMS default, a packaging oversight, an overly permissive credential and an architectural property of how language models process instructions. Taken individually, each looks like a one-time lapse. But taken together across three years, four of the most prominent AI companies in the world demonstrate a consistent gap between the speed at which AI products are being built and shipped and the maturity of the processes surrounding the development work.
For a security or technology leader deciding which AI vendors to trust with core enterprise workflows, the key question isn’t whether or not the vendor has suffered a breach. It’s whether the vendor’s own pipelines reflect the same standard of care it asks of its customers.
