Researchers Turn to AI to Fix a Zombie Flaw that AI Helped Propagate

Artificial intelligence tools that inadvertently perpetuated a decade-old bug may now also help eliminate it.
A developer in 2010 published a small code snippet as a GitHub Gist to show how to create a static file server in Node.js. It included a subtle path traversal vulnerability allowing attackers to navigate outside a designated directory. Over time, the insecure pattern propagated through Stack Overflow answers, blog posts, university tutorials and even production repositories of major companies.
Over the years, it became so embedded in developer culture that it found its way into training data for today’s AI models.
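The paper does not reproduce the original snippet, but the insecure pattern it describes generally looks like the following sketch: the requested URL is joined to the public directory without checking where the resolved path lands, so encoded "../" segments can walk out of the web root (the file and directory names here are illustrative).

```javascript
// Minimal sketch of the insecure pattern (illustrative, not the 2010 Gist verbatim).
const http = require('http');
const fs = require('fs');
const path = require('path');

const ROOT = path.join(__dirname, 'public');

http.createServer((req, res) => {
  // Joining the decoded request path directly onto ROOT lets a request such as
  // "/..%2f..%2fetc%2fpasswd" resolve to a file outside the public directory.
  const filePath = path.join(ROOT, decodeURIComponent(req.url));
  fs.readFile(filePath, (err, data) => {
    if (err) {
      res.writeHead(404);
      return res.end('Not found');
    }
    res.writeHead(200);
    res.end(data);
  });
}).listen(8080);
```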
“We are not 100% sure that the 2010 Gist is the original source, but it’s the earliest instance we could trace,” said Jafar Akhoundali, a doctoral candidate at Leiden University and lead author of a pre-publication paper analyzing the issue. “It’s likely the code snippet spread because it solved a common problem – serving static content – and could be easily reused with copy-paste.”
Tutorials and platforms like Stack Overflow are likely major contributors to propagation, as they are more visible, more trusted by developers seeking quick solutions and designed for code reuse. Direct copy-pasting between projects happens less frequently, as it often requires more context and effort to extract reusable code from business logic, said Akhoundali.
To measure how deeply rooted the bug has become, Akhoundali and his team asked large language models, including GPT-3.5, GPT-4, Claude, Gemini and Copilot, to write code for a static file server. They tested both general prompts and prompts explicitly requesting secure code. Even when the models were instructed to prioritize security, 70% of the responses contained vulnerable logic. GPT-3.5 and Copilot in balanced mode failed to generate secure code in any of the tested scenarios.
“Many LLMs are trained on large public codebases like GitHub,” Akhoundali said. If the data includes insecure patterns, the models will inevitably reproduce them, he said.
It is a case of "garbage in, garbage out," he said. "We can't simply expect the models to perform better when the data itself lacks quality… Neither humans nor AI are to blame here." A potential solution, he said, is self-improving systems such as AI agents that can form hypotheses, experiment and reason rather than rely solely on untrusted data from the internet.
The researchers found no significant difference in vulnerability reproduction across providers, including OpenAI, Google and Anthropic. The team also built an automated pipeline to find and fix the flaw at scale. It scans public GitHub repositories for path traversal vulnerability patterns and attempts to confirm exploitability using sandboxed tests. If an exploit succeeds, the tool uses GPT-4 to generate a patch. Validating that the vulnerable code can actually be triggered in practice keeps false positives down, a distinction from pure static analysis.
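The paper's exact harness is not detailed here, but a dynamic check of that kind can be sketched roughly as follows: run the candidate server in an isolated environment, plant a marker file just outside the served directory, and see whether a crafted traversal request leaks it (the payload, port and marker names below are assumptions for illustration).

```javascript
// Rough sketch of a dynamic exploitability check (not the authors' actual harness).
const http = require('http');

const MARKER = 'TRAVERSAL_MARKER_1f3a';   // contents of a file planted outside the web root
const PAYLOAD = '/..%2f..%2fmarker.txt';  // encoded "../" segments aimed at that file

function probe(port) {
  return new Promise((resolve) => {
    // Send the raw path so the client does not normalize the "../" segments away.
    const req = http.request({ host: '127.0.0.1', port, path: PAYLOAD }, (res) => {
      let body = '';
      res.on('data', (chunk) => (body += chunk));
      // Only count the flaw as confirmed if the marker actually comes back.
      res.on('end', () => resolve(body.includes(MARKER)));
    });
    req.on('error', () => resolve(false)); // connection failure: not confirmed
    req.end();
  });
}

// Usage: assumes the candidate server is already running in a sandbox on port 8080.
probe(8080).then((ok) => console.log(ok ? 'confirmed exploitable' : 'not confirmed'));
```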
Designing the patching step proved technically challenging. Early attempts asked the model to rewrite entire files, which led to syntax errors and unintended changes. The team refined the approach by annotating the vulnerable code with line numbers and asking the model to return a diff instead. Even then, ensuring patch correctness required additional heuristics and programmatic checks.
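The generated patches themselves are not reproduced here, but the canonical fix for this bug class is small, which is what makes a diff-style patch feasible: resolve the requested path against the web root and refuse anything that escapes it. Dropped into the vulnerable handler sketched earlier, the change might look like this (an assumption for illustration, not a patch from the paper):

```javascript
// Resolve the requested path first, then refuse anything that lands outside ROOT.
const filePath = path.normalize(path.join(ROOT, decodeURIComponent(req.url)));
if (filePath !== ROOT && !filePath.startsWith(ROOT + path.sep)) {
  res.writeHead(403);
  return res.end('Forbidden');
}
```

Keeping the change that small is the point of the diff-based prompting: the rest of the file stays untouched, avoiding the syntax errors and unintended edits the team saw with whole-file rewrites.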
“There are tons of programs, each requires different functionality and logic. Even if we go for customized per-program tests, it’s still not possible to cover all corner cases, which requires a final check from maintainers,” Akhoundali said.
From an initial scan of 40,546 repositories, the tool confirmed exploitable path traversal flaws in 1,756 of them and generated 1,600 valid patches. Only 63 projects accepted the fix.
Akhoundali attributes the low uptake to several factors: some repositories were abandoned, others used the code only in test or development environments, and in a few cases the researchers could not find a secure channel to reach the maintainers.
To avoid tipping off malicious actors, the team withheld full technical details. They reached out through private channels when possible. “We tried to balance the benefits of notifying the community about the vulnerability against the risks that potential attackers find this vulnerability and exploit it,” Akhoundali said.
The team's research focused on the path traversal vulnerability in JavaScript projects, but the pipeline is extensible. In principle, the approach could be applied to other vulnerability classes and other programming languages, he said.
As for the role of AI vendors, Akhoundali believes secure-by-default training and model alignment can help, but it will be challenging, he said. Secure code usually means more complex code, and not every developer will want that over bare functionality. Performance is another tradeoff: the extra checks secure code requires can make it noticeably slower, he said.
“Ultimate design choices lie with developers, but vendors play a key enabling role. The design choices and priorities vary with different developers and companies,” he said.