Artificial Intelligence & Machine Learning
,
Next-Generation Technologies & Secure Development
Experts Say AI Model Makers Are Prioritizing Profit Over Security

Hackers don’t need the deep pockets of a nation-state to break GPT-5, OpenAI’s new flagship model. Analysis from artificial intelligence security researchers finds a few well-placed hyphens are enough to convince the large language model into breaking safeguards against adversarial prompts.
See Also: AI Agents Demand Scalable Identity Security Frameworks
Researchers at SPLX inserted hyphens between every character to frame a malicious prompt as an “encryption challenge.” The model complied, bypassing GPT-5’s safety layers.
The SPLX team tested GPT-5 with more than a thousand adversarial prompts. “GPT-5’s raw model is nearly unusable for enterprise out of the box,” was its conclusion.
OpenAI has framed GPT-5 as its most advanced model yet, equipped with auto-routing between fast-response and deep-reasoning modes, and an internal self-checking system designed to validate multiple reasoning paths before answering. It also debuted a new training approach called “safe completions,” meant to keep the model helpful without reflexively refusing questions. SPLX’s findings show that none of these architectural refinements prevent common attack vectors. GPT-5’s raw, unguarded version fell for 89% of SPLX’s attacks.
OpenAI’s prior model GPT-4o was less susceptible to falling for attacks. A basic model without any additional layers of safeguards known as prompt hardening fell for attacks 71% of the time. With advanced layers of prompt hardening, it succumbed to attacks only 3% of the time. Similar safeguards placed on GPT-5 produced a fail rate of 45%.
Dorian Granosa, SPLX’s lead red team data scientist, said most attacks against GPT-5 involve obfuscation or roleplaying. Simple tricks like base64 encoding, leetspeak substitutions and multilingual prompts in low-resource languages all bypassed GPT-5’s guardrails. “We can easily close some of the attack surface by limiting the model to answer those types of messages,” Granosa said. More novel multi-step attacks sailed through, including SPLX’s own “Opposite Red Teamer” technique, which pieces together dangerous outputs across multiple turns.
Red teamers from NeuralTrust found GPT-5 could be manipulated using an “Echo Chamber” strategy combined with storytelling. Instead of disguising a malicious request with symbols, this method embeds a subtly dangerous context into the conversation and reinforces it turn after turn. Since GPT-5 is designed to maintain narrative consistency, it echoes and expands on the poisoned context until it produces harmful output.
In one example, NeuralTrust seeded a story using words like “cocktail,” “survival,” “Molotov,” and “safe.” GPT-5 initially produced harmless sentences, but as the story developed, the researchers asked for elaboration in ways that preserved continuity. Within a few turns, GPT-5 was generating instructions for making Molotov cocktails. The prompt never contained explicit instructions, which meant traditional keyword-based filters were ineffective.
Safety filters often evaluate prompts one message at a time, while attackers play the long game. “These gaps appear when safety checks judge prompts one-by-one while attackers work the whole conversation, nudging the model to keep a story consistent until it outputs something it shouldn’t,” said J Stephen Kowski, Field CTO at AI security SlashNext. He said hardening is the initial policy, enforcing live input and output inspections, monitoring links and tools, and deploying kill-switches when dialogue drifts into risky territory are key for protection.
Part of the challenge is structural. Testing a frontier model is more complex than testing traditional software, and release cycles are accelerating. Maor Volokh, vice president of product at Noma Security, described the pace as a “race to the bottom,” with new models arriving every one or two months. OpenAI alone has launched around seven models this year. The speed prioritizes performance and innovation over safety, he said. “This breakneck speed typically prioritizes performance and innovation over security considerations, leading to an expectation that more model vulnerabilities will emerge as competition intensifies.”
SPLX’s Granosa advocated for dynamic classification, which analyzes user intent. The user experience “will be a bit worse,” he said, “but in exchange for keeping users and their data more secure.”