AI-Based Attacks
,
Artificial Intelligence & Machine Learning
,
Fraud Management & Cybercrime
Data-Poisoning Attacks Are Critical Threat to Machine Learning Security, NIST Warns
Machine learning systems are vulnerable to cyberattacks that could allow hackers to evade security and prompt data leaks, scientists at the National Institute of Standards and Technology warned. There is “no foolproof defense” against some of these attacks, researchers said.
See Also: How to Put Observability to Work in Support of Mission Outcomes
In a new paper on adversarial machine learning threats, NIST researchers detail emerging cyberthreats to predictive and generative AI models that stem from data set training, vulnerabilities in the software components and supply chain weaknesses.
One of the most critical threats is data-poisoning attacks, in which hackers use corrupted data to alter system functions. These threats include evasion tactics to compromise already-deployed AI systems, attacks that can result in information leaks, and abuse tactics in which hackers feed information from a compromised source to induce changes to the AI models.
“Most of these attacks are fairly easy to mount and require minimum knowledge of the AI system and limited adversarial capabilities,” said Alina Oprea, a report co-author and professor at Northeastern University. “Poisoning attacks, for example, can be mounted by controlling a few dozen training samples, which would be a very small percentage of the entire training set.”
NIST researchers said the sophistication and execution of these attacks vary depending on hackers’ level of knowledge about an AI system. NIST categorizes the attackers into three main subsets: white-box hackers with full knowledge about an AI system, sandbox hackers with minimal access, and gray-box hackers who may have information about the AI models but lack access to training data.
When white-box hackers execute data-poisoning attacks, they can mount corrupted data during the training stage to misclassify spam filters and evade security detection, perform prompt injection, and mimic traffic signals to carry out denial-of-service attacks.
Gray-box hackers may use data poisoning to conduct transferability attacks, in which hackers generate malicious code on different systems and transfer it to the target model, allowing attackers to fool the models, the NIST report says.
Using evasion tactics, hackers can get full access to training data and can query APIs to obtain model predictions or to induce misclassification by changing the AI model’s predictions. Such tactics can be used to alter an AI model’s malware classification and detection capabilities, the researchers said.
The NIST researchers said prompt injection to leak sensitive data remains the primary threat in privacy attacks. These include hackers querying chatbots to disclose users’ third-party data and tricking AI models into disclosing sensitive information such as credentials.
Researchers said vulnerabilities emerging from AI supply chains, such as open-source libraries, are another possible attack vector, allowing attackers to jailbreak large language models for remote code execution and data exfiltration. Examples include the remote code execution flaw found in the popular open-source machine learning framework TensorFlow and the arbitrary code flaw in NumPy, an open-source Python library, the researchers said.
“Because the datasets used to train an AI are far too large for people to successfully monitor and filter, there is no foolproof way as yet to protect AI from misdirection,” they said.
The NIST researchers recommend following basic hygiene practices to avoid potential misuse by hackers, including ensuring appropriate human oversight while fine-tuning models, filtering training data to input clean data and removing poisoned samples before the machine learning training is performed.