A Preparedness Team Will Warn of Current, Future Dangers in the Firm’s AI Models
OpenAI on Monday released a framework it says will help assess and protect against the “catastrophic risks” posed by the “increasingly powerful” artificial intelligence models it develops.
The ChatGPT maker’s preparedness team will monitor how the technology is used and issue warnings if it sees danger signs in its AI models’ capabilities, such as enabling bad actors to build chemical and biological weapons, spread malware or carry out social engineering attacks. In the 27-page preparedness framework, the company said it will also track emerging risks beyond current dangers, moving from “hypothetical scenarios to concrete measurements and data-driven predictions.”
“We believe the scientific study of catastrophic risks from AI has fallen far short of where we need to be,” OpenAI said. The company defines as catastrophic any risk that could result in “hundreds of billions of dollars in economic damage or lead to the severe harm or death of many individuals – this includes, but is not limited to, existential risk.”
There has been much discussion of the potential dangers of AI, including from prominent technology leaders such as those who in October called for limits before future AI systems might “learn to feign obedience” to human directives or “exploit weaknesses in our safety objectives and shutdown mechanisms.” Such AI systems could evade human intervention by spreading their algorithms through wormlike infections that insert and exploit cybersecurity vulnerabilities to control the computer systems underpinning communications, media, government and supply chains, wrote 24 academics and experts, including Yoshua Bengio and Geoffrey Hinton, widely known as the “godfathers of AI” (see: Experts Urge Safeguards Before AI Can ‘Feign Obedience’).
The framework takes a matrix approach covering four risk categories: cybersecurity; persuasion; model autonomy; and chemical, biological, radiological and nuclear threats. Each AI model will be scored low, medium, high or critical in each category, both before and after mitigations are applied. Only models whose post-mitigation scores are medium or lower will be deployed; models with higher scores will not be.
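The framework itself is a policy document rather than software, but the gating rule it describes can be illustrated with a minimal sketch. The code below is an assumption-laden illustration: the RiskLevel enum, category names and may_deploy function are names invented here for exposition, not anything OpenAI has published.

```python
from enum import IntEnum

# Illustrative sketch only: not OpenAI code. It models the article's description
# of per-category low/medium/high/critical scoring and the deployment gate.

class RiskLevel(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

# The four risk categories named in the framework (CBRN = chemical, biological,
# radiological and nuclear threats).
CATEGORIES = ("cybersecurity", "persuasion", "model autonomy", "CBRN")

def may_deploy(post_mitigation_scores: dict) -> bool:
    """A model may be deployed only if every post-mitigation score is medium or lower."""
    return all(score <= RiskLevel.MEDIUM for score in post_mitigation_scores.values())

# Hypothetical scorecard for one model after mitigations are applied.
scorecard = {
    "cybersecurity": RiskLevel.MEDIUM,
    "persuasion": RiskLevel.LOW,
    "model autonomy": RiskLevel.LOW,
    "CBRN": RiskLevel.MEDIUM,
}
print(may_deploy(scorecard))  # True: no category exceeds medium after mitigations
```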
OpenAI’s CEO will make the day-to-day decisions, but the company’s board will have access to the risk findings and the power to veto the chief executive’s decisions.
The team will also coordinate with OpenAI’s Trustworthy AI team to carry out third-party auditing and manage safety drills.
Led by MIT AI professor Aleksander Mądry, the growing preparedness team was set up in October and is one of the company’s three teams working on security issues. The others are the safety team, which addresses risks such as racist bias in tools already on the market, and the superalignment team, which works on mitigating the risks of future systems whose capabilities may surpass those of humans.
The preparedness framework is part of the company’s overall approach to safety, which includes investments in mitigating bias, hallucinations and misuse. It is also a way to meet the voluntary commitment OpenAI made to the Biden administration, alongside 14 other technology companies, to build a safe and trustworthy AI ecosystem (see: IBM, Nvidia, Others Commit to Develop ‘Trustworthy’ AI).
The company also partnered in July with other tech giants, including Google and Microsoft, to form an industry watchdog group to help regulate AI development, but it is not part of a recent alliance of more than 50 companies that aims to drive open innovation in the industry.