Vulnerability Tool Detected Flaws in OpenAI and Nvidia APIs Used in GitHub Projects
Security researchers have developed an autonomous artificial intelligence tool that can detect remote code execution flaws and other zero-day vulnerabilities in software. The AI tool still gives some inconsistent results, but researchers said it identifies fewer false positives than other static analyzers.
Security firm Protect AI developed Vulnhuntr, a Python static code analyzer built on Anthropic's Claude 3.5 Sonnet large language model, to identify vulnerabilities in code and develop proof-of-concept exploits for them.
The researchers found vulnerabilities in GitHub projects using OpenAI, Nvidia and YandexGPT APIs. For example, the get_api_provider_stream_iter function in api_provider.py, a file in one of the OpenAI-based projects, contained a server-side request forgery flaw that could enable attackers to control API requests and redirect them to arbitrary endpoints.
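The snippet below is a minimal, hypothetical sketch, not the actual api_provider.py code, of how that class of flaw arises: if the API base URL is taken from user input without validation, an attacker can steer the server's outbound requests to an arbitrary endpoint.

    import requests
    from urllib.parse import urlparse

    # Hypothetical sketch, not the real api_provider.py: an unvalidated,
    # user-controlled API base URL lets an attacker redirect server requests.
    def get_provider_stream(api_base: str, payload: dict):
        # If api_base comes straight from user input, the request can be sent
        # to arbitrary endpoints, including internal services.
        return requests.post(f"{api_base}/v1/chat/completions", json=payload, stream=True)

    # A safer variant validates the host against an allowlist first.
    ALLOWED_HOSTS = {"api.openai.com"}

    def safe_provider_stream(api_base: str, payload: dict):
        host = urlparse(api_base).hostname
        if host not in ALLOWED_HOSTS:
            raise ValueError(f"API host {host!r} is not allowed")
        return requests.post(f"{api_base}/v1/chat/completions", json=payload, stream=True)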
“Generally, a Vulnhuntr confidence score of 7 means that it’s likely a valid vulnerability but it may require some tweaking of the proof of concept. Confidence scores of 8, 9 or 10 are extremely likely to be valid vulnerabilities, and confidence scores 1 to 6 are unlikely to be valid vulnerabilities,” the researchers said.
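As a rough illustration of how that scale could be applied in practice, a simple triage step (the field names here are assumptions, not Vulnhuntr's actual output format) might keep scores of 8 and above as near-certain findings, flag 7 for manual proof-of-concept tweaking, and drop everything below.

    # Illustrative triage of findings using the quoted confidence scale;
    # the dictionary keys are assumed, not the tool's real output schema.
    def triage(findings: list[dict]) -> tuple[list[dict], list[dict]]:
        confirmed = [f for f in findings if f["confidence"] >= 8]     # extremely likely valid
        needs_review = [f for f in findings if f["confidence"] == 7]  # likely valid, PoC may need tweaking
        return confirmed, needs_review                                # scores 1-6 are dropped as unlikely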
To develop the solution, Protect AI researchers had to overcome the context window limitations typical of large language models, which cap the amount of information an LLM can parse when processing a prompt or question.
To work around those limits, the researchers used retrieval-augmented generation to parse large amounts of text directly into tokens, fine-tuned the tool on pre-patch and post-patch code, and combined it with vulnerability databases such as CVEFixes. The researchers then isolated sections of code into smaller units.
“Instead of overwhelming the LLM with multiple whole files, it requests only the relevant portions of the code,” Protect AI said. “It automatically searches the project files for files that are likely to be the first to handle user input. Then it ingests that entire file and responds with all the potential vulnerabilities.”
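A minimal sketch of that idea, assuming a helper built on Python's ast module rather than Vulnhuntr's actual implementation, would return only the source of the function the model asks about instead of the whole project:

    import ast

    # Hedged sketch, not Vulnhuntr's implementation: return only the source of
    # one named function from a file, so the LLM sees a snippet, not the project.
    def extract_function_source(path: str, func_name: str) -> str | None:
        with open(path, encoding="utf-8") as fh:
            source = fh.read()
        tree = ast.parse(source)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == func_name:
                return ast.get_source_segment(source, node)
        return None

    # Example: fetch just the snippet the model asked about.
    snippet = extract_function_source("api_provider.py", "get_api_provider_stream_iter")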
The tool uses four prompts designed to guide the LLM, shape its responses for complex reasoning and filter outputs to identify flaws. Vulnhuntr analyzes data such as functions, classes and other related snippets to build a full picture of the code and to confirm or rule out the presence of vulnerabilities.
“Once the full picture is clear, it returns a detailed final analysis, pointing out trouble spots, providing a proof-of-concept exploit, and attaching a confidence rating for each vulnerability,” the researchers added.
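Put together, the workflow the researchers describe resembles the loop sketched below; this is a hedged illustration, not the Vulnhuntr source, and the llm and fetch_context callables are assumed stand-ins for the model call and the code-retrieval step.

    from dataclasses import dataclass

    # Sketch of the iterative flow described above: the model keeps requesting
    # extra context until it can confirm or rule out a flaw, then returns
    # structured findings with a proof of concept and a confidence rating.
    @dataclass
    class Finding:
        vuln_type: str    # e.g. "SSRF"
        location: str     # file and function
        poc: str          # proof-of-concept exploit
        confidence: int   # 1-10 rating, per the scale quoted earlier

    def analyze_entry_file(llm, entry_source: str, fetch_context) -> list[Finding]:
        context = {"entry_file": entry_source}
        while True:
            # `llm` is an assumed callable wrapping the model call; it returns
            # either a request for another snippet or a final list of findings.
            response = llm(context)
            if response["done"]:
                return [Finding(**f) for f in response["findings"]]
            name = response["request"]            # e.g. a function or class name
            context[name] = fetch_context(name)   # pull in only that snippet and iterate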
Accuracy Challenges
Like most AI applications still in an early stage of development, the tool is subject to accuracy problems and other limitations rooted in its training data, Protect AI said.
Since the application is trained to identify only seven types of flaws, it cannot detect other classes of vulnerability, the researchers said.
Although it can be trained with additional prompts to recognize more flaws, the researchers said this would increase the application's run time. And because the tool only supports Python, it generates less accurate results for code written in other programming languages, the researchers added.
“Last, because LLMs aren’t deterministic, one can run the tool multiple times on the exact same project and get different results,” Protect AI said.
Despite the application's limitations, the researchers said Vulnhuntr is an improvement over other static code analyzers at finding complex vulnerabilities and limiting false positives. Protect AI researchers said they plan to add support for more tokens so the tool can parse entire codebases rather than smaller units.