So-Called Mallas Are Easily Bought or Rented
The underground market for illicit large language models is a lucrative one, said academic researchers who called for better safeguards against artificial intelligence misuse.
Academics at Indiana University Bloomington said they identified 212 malicious LLMs on underground marketplaces from April through September. The threat actor behind one of them, WormGPT, is estimated to have made $28,000 in just two months. That figure underscores both the allure for bad actors of breaking artificial intelligence guardrails and the raw demand driving them to do so.
Several of the illicit LLMs on sale were uncensored models built on open-source models, and some were jailbroken commercial models. The academics behind the paper call the malicious LLMs "Mallas."
Hackers can use Mallas to write targeted phishing emails at scale and at a fraction of the usual cost, to develop malware, and to automatically scope and exploit zero-days.
Tech giants developing artificial intelligence models have mechanisms in place to prevent jailbreaking and are working on methods to automatically detect jailbreaking prompts. But hackers have also found ways to bypass these guardrails.
Microsoft recently detailed how hackers use a "skeleton key" technique to force LLMs from OpenAI, Meta, Google and Anthropic to respond to illicit requests and reveal harmful information. Researchers from Robust Intelligence and Yale University also identified an automated method for jailbreaking OpenAI, Meta and Google LLMs that doesn't require specialized knowledge, such as the model's parameters.
The Indiana University researchers found two uncensored LLMs: DarkGPT, sold for 78 cents for every 50 messages, and Escape GPT, a subscription service that costs $64.98 a month. Both models produced accurate, malicious code that went undetected by antivirus tools about two-thirds of the time. WolfGPT, available for a $150 flat fee, allowed users to write phishing emails that could evade a majority of spam detectors.
Nearly all of the malicious LLMs the researchers examined were capable of generating malware, and 41.5% could produce phishing emails.
The malicious products and services were primarily built on OpenAI's GPT-3.5 and GPT-4, Pygmalion-13B, Claude Instant and Claude-2-100k. OpenAI was the LLM vendor whose models the Malla builders targeted most frequently.
To help prevent and defend against the attacks they discovered, the researchers made available to other researchers their dataset of prompts used to create malware through the uncensored LLMs and to bypass the safety features of public LLM APIs. They also urged AI companies to release models with censorship settings enabled by default and to restrict access to uncensored models to the scientific community, under safety protocols.
Hosting platforms such as FlowGPT and Poe should do more to ensure that Mallas aren't available through them, the researchers said, adding, "This laissez-faire approach essentially provides a fertile ground for miscreants to misuse the LLMs."