NYC’s New Chatbot, Hallucinating LLMs Just Can’t Be Fixed, Says Linguistics Expert
Employers can now fire a staff member who complains about sexual harassment, take a cut of their workers’ tips and serve customers cheese nibbled on by rats – at least according to the advice doled out by New York City’s artificial intelligence-powered chatbot meant to help small business owners navigate the city’s bureaucratic maze.
Hallucinations such as those produced by MyCity, which runs on Microsoft Azure, are by no means rare. By some estimates, AI chatbots present misleading or false information as fact about 20% of the time. It’s not even a first for Microsoft: The company shut down its chatbot Tay in 2016, hours after it was rolled out, when Tay began to spew harmful stereotypes it had picked up from users on Twitter, now X. More recently, a federal judge sanctioned two lawyers and fined them $5,000 for submitting legal research riddled with nonexistent cases generated by ChatGPT.
New York City Mayor Eric Adams defended the administration’s implementation of the chatbot, saying the city aims to identify and fix the problems and have “the best chatbot system on the globe.” The city chose to leave the tool up and running on the government website, as the chatbot was only “wrong in some areas,” Adams said.
“Any time you use technology, you need to put it into the real environment to iron out the kinks. You can’t live in the lab. You can’t stay in the lab forever. You must be willing to say, ‘I’m going to put it out among the real universe to iron out the next level of perfection,'” Adams told reporters last week.
Emily Bender, a linguistics professor and director of the University of Washington’s Computational Linguistics Laboratory, told Information Security Media Group that a fix is not possible. “Unless they move away from LLM-driven chatbots, these are not bugs that can be fixed, but rather a fundamental mismatch between tech and task. The kinks can’t be worked out, but NYC should clearly have tested the system much more rigorously before piloting it, and then decided not to,” she said.
Generative AI, of which chatbots are only one part, is expected to explode in the coming years. The McKinsey Global Institute estimated that it could add up to $4.4 trillion annually to the global economy. But ironing out kinks such as hallucinations does not involve an easy – or even a likely – fix.
That’s because language model-driven chatbots are designed to make things up. If we want something else from the system, such as accurate answers, we need a different type of system altogether, Bender said. “A language model is a system for modeling the distribution of word forms in text. When it’s used to create text, it is only ever answering the question: ‘What’s a plausible word to use next?’ When those strings of words add up to something that we interpret as true and relevant, that is only by chance,” she told ISMG.
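To make Bender’s description concrete, here is a deliberately tiny sketch, assuming a bigram word model. It is far simpler than a real LLM, which uses a neural network over tokens, but it shares the same basic job of predicting a plausible next item. The corpus sentences and the function names `train_bigrams` and `generate` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the training text."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def generate(follows: dict[str, Counter], start: str,
             max_words: int = 10, seed: int = 0) -> str:
    """Repeatedly answer 'what is a plausible next word?' by sampling from counts.

    Nothing here checks whether the resulting sentence is true.
    """
    rng = random.Random(seed)
    words = [start]
    for _ in range(max_words - 1):
        options = follows.get(words[-1])
        if not options:
            break  # no observed continuation for this word
        candidates = list(options)
        weights = [options[w] for w in candidates]
        words.append(rng.choices(candidates, weights=weights)[0])
    return " ".join(words)


# Hypothetical training sentences (placeholders, not actual regulations).
corpus = [
    "employers must not take workers tips",
    "employers must keep records of workers hours",
    "landlords must not refuse tenants with vouchers",
]

model = train_bigrams(corpus)
print(generate(model, "employers", seed=1))
```

Because each step only asks which word plausibly follows the previous one, the model can stitch fragments into a fluent sentence such as "employers must not refuse tenants with vouchers," which appears nowhere in its training text. Whether a generated sentence happens to be accurate is, as Bender puts it, only by chance.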
The NYC chatbot includes a disclaimer that it may “occasionally produce incorrect, harmful or biased” information and that its responses must not be considered legal advice. Despite the legal disclaimer, the city is still responsible for the information it imparts to citizens, experts said.
If AI systems power critical, public-facing applications, the impact can be drastic, Bender said. In the U.S. last year, the National Eating Disorders Association replaced its helpline with a chatbot, which then advised people seeking help to engage in further disordered eating behaviors.
“Enterprises and government entities should only use language-model driven chatbots in cases where they would be happy to stand behind whatever the chatbot invents, which is basically no cases,” Bender said. The only effective choice is to build systems that actually represent the information they are supposed to be sharing and then generate responses based on those semantic representations. Such systems are necessarily purpose-built and tailored to the task at hand – and this is a strength, not a weakness, she said.
The New York City administration has been “very clear” about the project being a pilot program and about its intent to “learn from it,” said Maria Torres Springer, the deputy mayor for economic development, housing, and workforce development.
“This isn’t the first chatbot in the history of chatbots or technological deployments where we have to improve. We’re seeing this across all forms of artificial intelligence,” she said. The city cannot “wag [its] finger at technology and say, ‘Oh, it’s too hard,’ because that would be retrograde and a disservice to New Yorkers who expect [the city] to come into the modern age and help them by getting smarter and better at technology,” she added.
But Bender said if the city’s goal is to provide reliable pointers into the 2,000 web pages of regulations, it just needed a search interface to point to the relevant collection of documents. “The chatbot they set up, even though it provided links, also provided misleading ‘summaries’ of what was in those links – effectively misinforming the public and encouraging people to do illegal things,” she said, citing the investigation by The Markup, which first reported the hallucination issues.
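The alternative Bender describes, a search interface that only points to relevant documents, can be illustrated with a minimal keyword search that returns page identifiers and never writes a summary. This sketch assumes a simple inverted index ranked by the number of matching query words; the page IDs and texts are hypothetical placeholders, not actual NYC regulations. A production system would more likely use a dedicated search engine with stemming and relevance scoring.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page IDs containing it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index: dict[str, set[str]], query: str, limit: int = 5) -> list[str]:
    """Return page IDs ranked by how many distinct query words they contain.

    Results are pointers only; no generated text is produced.
    """
    scores: dict[str, int] = defaultdict(int)
    for word in set(tokenize(query)):
        for page_id in index.get(word, ()):
            scores[page_id] += 1
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:limit]


# Hypothetical pages standing in for the city's regulation web pages.
pages = {
    "tip-rules": "Employers may not keep any portion of workers' tips.",
    "food-safety": "Food exposed to rodents must be discarded.",
    "harassment": "Employers may not retaliate against workers who report harassment.",
}

index = build_index(pages)
print(search(index, "can employers keep tips"))  # ['tip-rules', 'harassment']
```

The design choice is the point: the user is sent to the source text itself, so the system cannot misstate what the regulation says.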
DeepMind co-founder Mustafa Suleyman reportedly said that AI hallucinations would “largely be eliminated” by next year. “That estimate is completely unrealistic, because LLMs are designed to make stuff up,” Bender said.