Evaluations Show Claude Mythos 5 Elevates Offensive Cyber, But Isn’t Fully Autonomous

Anthropic’s new frontier artificial intelligence model can substantially automate meaningful offensive cyber operations but hasn’t yet crossed into a fully autonomous cyber offense tool, according to the company.
The company’s Mythos 5 model, introduced Tuesday, can meaningfully contribute to offensive cyber work, raising questions about how much autonomy these systems should be granted and how effectively safeguards can limit harmful use. Mythos 5 isn’t subject to the safeguards placed around Fable 5, but access will initially be limited to the 200 organizations vetted through Anthropic’s Project Glasswing.
“Claude Mythos 5 demonstrates the strongest overall cyber capabilities of any model we have ever evaluated,” Anthropic wrote Tuesday. “Across our internal evaluation suite, it meets or exceeds the performance of Claude Mythos Preview, whose step-change in autonomous vulnerability discovery and exploitation led us to restrict access to a limited set of partners for defensive cybersecurity purposes.”
Earlier large language models could explain vulnerabilities, generate proof-of-concept code and assist with penetration testing tasks, but Anthropic said Mythos 5 appears to have moved beyond that. The model demonstrated the ability to discover vulnerabilities, triage them, develop exploit chains and ultimately achieve arbitrary code execution with previously unseen consistency, Anthropic said.
“Although Mythos 5 is in Tier 1, its performance was strong enough on our evaluations that we have chosen to deploy additional mitigations that block potentially harmful offensive cyber uses,” Anthropic wrote in a 319-page system card for Claude Fable 5 and Claude Mythos 5.
Exploit development traditionally required a combination of deep reverse-engineering expertise, understanding of memory corruption, knowledge of mitigations such as ASLR and sandboxing, and substantial experimentation, Anthropic said. What makes Mythos 5 noteworthy is not merely that it occasionally succeeds, but that it succeeds consistently, producing working exploits 90% of the time (see: Anthropic Unveils Claude Fable 5, Keeps Mythos Restricted).
“Claude Opus 4.8 frequently achieves register control, but rarely converts it into full code execution,” Anthropic wrote. “By contrast, Mythos 5, like Claude Mythos Preview before it, converts usable corruption primitives into working exploits at a very high rate.”
Mythos 5 Succeeds at Attacking IT Systems, But Struggles With OT Systems
Mythos 5 not only finds vulnerabilities but frequently develops meaningful exploit primitives after doing so, Anthropic said, meaning it can dramatically accelerate both defensive vulnerability research and offensive discovery. But Anthropic said Mythos 5 remains below the threshold for independently conducting large-scale offensive campaigns.
“In this evaluation, the model is tasked with finding a vulnerability in a fully patched build and developing an exploit primitive for that vulnerability,” Anthropic wrote. “To set this up, the model is given a fuzzing entry point. It does not receive any target-specific vulnerability clues.”
The U.K. AI Security Institute concluded that Mythos 5 is capable of attacking small enterprise networks with weak security once initial access has been obtained, allowing it to function as a force multiplier for attackers. The near-term risk is not fully autonomous cyber warfare but highly capable AI copilots that make human attackers more productive, Anthropic said.
“We judge that [Claude Mythos 5], like Mythos Preview, is capable of attacking small enterprise networks with weak security where it has already gained access to the network,” the U.K. AI Security Institute shared. “Our results indicate that [Claude Mythos 5] is more proficient at this than any other publicly available model we have tested.”
But in an evaluation simulating industrial control systems, Mythos 5 achieved only limited success and failed to complete the overall objective; Mythos Preview succeeded only occasionally. The gap likely reflects differences between the environments: enterprise IT environments are increasingly standardized and software-centric, while industrial environments are far more heterogeneous and rely on proprietary protocols, specialized hardware and decades-old systems.
“[Claude Mythos 5] made only limited progress on the ‘Cooling Tower’ industrial control system range, but we draw no strong conclusion about [Claude Mythos 5]’s autonomous capability against operational technology environments at this stage,” the U.K. AI Security Institute wrote.
How Effective Fable 5’s Fallback Cyber Strategy Is in Practice
Anthropic’s deployment strategy for Claude Fable 5 revolves around restricting access to cyber expertise during potentially harmful interactions through activation monitoring, classifier systems and model fallbacks. Safety in frontier AI development increasingly depends not on reducing model capability but on controlling access to that capability, Anthropic said.
“On most interfaces, Fable 5 falls back to the most recent Opus model (Opus 4.8) for requests that are flagged by our classifier system,” Anthropic wrote. “Since our classifiers consistently fire across all tested cyber capability evaluations, Fable 5’s performance on cyber tasks is nearly identical to Opus 4.8. For this reason, we conclude that Fable 5 does not provide an uplift on cyber tasks relative to Opus 4.8.”
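To illustrate why a consistently firing classifier makes Fable 5’s cyber scores match Opus 4.8’s, here is a minimal sketch of classifier-gated fallback routing. It is not Anthropic’s implementation. The model labels, the 0-to-1 classifier score, the 0.5 threshold and the keyword-based toy classifier are all hypothetical stand-ins chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical labels for illustration only; these are not real API model names.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"


@dataclass
class RoutingDecision:
    model: str      # which model would handle the request
    flagged: bool   # whether the classifier flagged the request
    score: float    # raw classifier score


def route_request(
    prompt: str,
    classifier: Callable[[str], float],
    threshold: float = 0.5,
) -> RoutingDecision:
    """Route a prompt to the fallback model when the classifier flags it.

    `classifier` is a hypothetical stand-in for a harmful-cyber-use classifier
    returning a score in [0, 1]; `threshold` is an assumed cutoff, not a
    published value. Scores at or above the threshold count as flagged.
    """
    score = classifier(prompt)
    flagged = score >= threshold
    model = FALLBACK_MODEL if flagged else PRIMARY_MODEL
    return RoutingDecision(model=model, flagged=flagged, score=score)


def toy_classifier(prompt: str) -> float:
    # Purely illustrative keyword heuristic; production classifiers are learned models.
    keywords = ("exploit", "shellcode", "privilege escalation")
    return 1.0 if any(k in prompt.lower() for k in keywords) else 0.0


if __name__ == "__main__":
    for p in ["Summarize this changelog", "Write an exploit for this heap overflow"]:
        d = route_request(p, toy_classifier)
        print(f"{p!r} -> {d.model} (flagged={d.flagged})")
```

In this sketch, if every task in a cyber evaluation suite trips the classifier, every task is answered by the fallback model, so the measured cyber performance is simply the fallback model’s performance. That is the logic behind Anthropic’s conclusion that Fable 5 provides no cyber uplift over Opus 4.8.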
Cyber-specific safeguards have become much more sophisticated, Anthropic said. Researchers have struggled to find universal jailbreaks that broadly bypass protections, and the attacks that do succeed tend to be highly task-specific and difficult to generalize. That marks a shift from earlier generations of AI systems, where simple prompt-engineering tricks could often bypass restrictions.
“The public bug bounty has received approximately 100,000 attempts on the challenge, which we believe corresponds to on the order of 1,000 hours of effort,” Anthropic wrote. “This process has not resulted in a single universal jailbreak, and there have only been two successful task-specific jailbreaks.”
Mythos 5 achieved the strongest prompt-injection resistance among the company’s models, particularly in coding and computer-use environments, but browser-based environments were initially vulnerable until additional safeguards were developed, Anthropic said. Defending autonomous agents against prompt injection may become as important as defending against phishing or malware is today, researchers said.
“The Mythos models are our most resilient models against prompt injection to date,” Anthropic wrote. “Since Claude Fable 5 shares the same core model as Claude Mythos 5, it inherits these gains, making it our most robust generally available model. We continue to improve safeguards in our agentic products to further protect our users against prompt injection.”
