A Fake Software Library Made Up by a Chatbot Was Downloaded More Than 35,000 Times
Generative artificial intelligence is good at sounding authoritative – even when it’s making stuff up. Hapless attorneys have submitted court briefs referencing legal precedents invented out of thin air. News outlet CNET issued a correction after publishing wildly inaccurate personal finance advice generated by AI.
One community that thinks so-called AI hallucinations are actually a good thing: hackers. Especially when developers use AI tools that hallucinate entire software libraries. Threat actors can ensure those libraries come into existence – with extra malicious functions.
One security researcher probing AI-hallucinated libraries said late last month that he found chatbots recommending a nonexistent Python package dubbed “huggingface-cli.” Bar Lanyado of Lasso Security uploaded an empty package with that name, wondering what would happen. The result? More than 35,000 authentic downloads in three months, he told Information Security Media Group.
AI coding tools including ChatGPT are popular among programmers for automating tasks, understanding code logic, identifying errors and even writing the code itself. The results of a summer 2023 GitHub programmer survey show that “92% of U.S.-based developers are already using AI coding tools both in and outside of work,” and 70% of them say that AI provides “significant benefits” to their code. “Many times when users get a code example or a recommendation for a package from these tools they trust them and copy paste many times without checking the answers,” Lanyado said.
Several large companies, including Alibaba, used or recommended the fake package in their repositories, Lanyado said. Alibaba did not respond to a request for comment. Lasso Security contacted all of the companies that used the fake package, he said.
Lanyado said he predicted something like this would happen, but the scale of adoption and its integration into the development environments of known enterprises surprised him. “Following my previous research, I anticipated that OpenAI and similar models would address the challenge of hallucinated answers, yet this research unequivocally reveals they did not,” he said.
He conducted his previous research in mid-2023 while employed by cybersecurity company Vulcan, where he coined the term “AI package hallucination.”
For this round of research, in addition to uploading an empty software package, Lanyado queried GPT-3.5-Turbo, GPT-4, Gemini Pro (Bard) and Coral (Cohere) via their APIs to find hallucinated packages. He found that the two GPT models and Cohere produced hallucinated output about 20% of the time, while Gemini did so 64.5% of the time.
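As a rough illustration of how such a rate could be tallied, the sketch below counts the share of model answers that recommend at least one package not found in a registry. It is not Lanyado's method, which is not detailed here. The function name `hallucination_rate`, the sample answers and the set of known packages are all hypothetical, and the sketch assumes each answer has already been reduced to a list of package names.

```python
def hallucination_rate(answers, known_packages):
    """Return the fraction of answers that recommend at least one unknown package."""
    if not answers:
        return 0.0
    known = {name.lower() for name in known_packages}
    hallucinated = sum(
        1
        for recommended in answers
        if any(name.lower() not in known for name in recommended)
    )
    return hallucinated / len(answers)


# Hypothetical data: package names assumed to exist in the registry.
known = {"requests", "numpy", "huggingface-hub"}

# Hypothetical data: packages recommended in each of five model answers.
answers = [
    ["requests"],
    ["huggingface-cli"],       # not in the known set
    ["numpy", "requests"],
    ["numpy", "made-up-pkg"],  # one unknown name is enough to count
    [],                        # answer that recommended nothing
]

rate = hallucination_rate(answers, known)
print(f"{rate:.0%}")  # prints 40%
```

An answer counts as hallucinated if any single recommended name is unknown, which is one of several reasonable ways to define the rate.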
Still, not all hallucinated packages can be exploited by hackers. The number of hallucinated libraries for .NET and the Go programming language was high, but there is no centralized package repository for Go. “When we checked the hallucinated packages we received in Go, we found that many of them were pointed to repositories that don’t exist but the username in path does or pointed to domains that were already taken,” Lanyado said.
There is a centralized .NET repository, but many of the hallucinated packages begin with reserved prefixes controlled by companies such as Google and Microsoft, “which means an attacker that finds these hallucinations will not be able to upload packages to these paths as well.”
Lanyado said he conducted this test to show how easy it is to carry out and how dangerous it is, but he has yet to find an attacker who has used this technique for malicious purposes. Maybe they’ve just been good at hiding their activity. “It’s complicated to identify such an attack, as it doesn’t leave a lot of footsteps,” he told ISMG.
It is imperative that developers cross-verify information while using LLMs and open-source software, Lanyado said.
“On the one hand it is hard to detect a malicious package if it is obfuscated well or if the attacker performed it in a tricky way – look what happened right now with XZ package,” he said (see: Backdoor Found and Defused in Widely Used Linux Utility XZ).
Developers should look at the details in the package repository; check the publish date, the commits, the maintainers, the community and how many downloads it has; and look for suspicious signs. “If you see something suspicious, think twice before you download it,” Lanyado said.
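For Python packages, part of that review could be scripted. The sketch below is one possible approach under stated assumptions, not a vetted detection tool. It reads a project's metadata in the shape returned by PyPI's JSON API (`https://pypi.org/pypi/<name>/json`) and lists a few warning signs. The function names, the 90-day threshold and the sample data are illustrative assumptions, and download counts, commit history and maintainer reputation are not covered and still need a manual look.

```python
import json
import urllib.request
from datetime import datetime, timezone


def fetch_pypi_metadata(name):
    """Download a project's metadata from PyPI's JSON API (needs network access)."""
    url = f"https://pypi.org/pypi/{name}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def suspicious_signs(metadata, now=None, min_age_days=90):
    """Return a list of warnings; the checks and threshold are illustrative only."""
    now = now or datetime.now(timezone.utc)
    info = metadata.get("info", {})
    releases = metadata.get("releases", {})
    warnings = []

    # Gather the upload time of every file across all releases.
    uploads = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in releases.values()
        for f in files
    ]
    if not uploads:
        warnings.append("no uploaded files")
    elif (now - min(uploads)).days < min_age_days:
        warnings.append(f"first published less than {min_age_days} days ago")

    if len(releases) <= 1:
        warnings.append("only one release or none")
    if not (info.get("description") or "").strip():
        warnings.append("empty description")
    if not info.get("project_urls") and not info.get("home_page"):
        warnings.append("no homepage or repository link")
    return warnings


# Hypothetical metadata resembling a freshly uploaded, empty package.
sample = {
    "info": {"description": "", "project_urls": None, "home_page": ""},
    "releases": {
        "0.0.1": [{"upload_time_iso_8601": "2024-01-10T00:00:00.000000Z"}]
    },
}
fixed_now = datetime(2024, 2, 1, tzinfo=timezone.utc)
for warning in suspicious_signs(sample, now=fixed_now):
    print(warning)
```

To check a real project, pass the result of `fetch_pypi_metadata("some-package")` to `suspicious_signs`. A clean result does not prove that a package is safe, and a warning does not prove malice.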