OpenAI’s New Cost-Efficient AI Reasoning Model Excels in Math, Coding, and Science

OpenAI released a new reasoning model Friday that provides faster response times, enhancing reasoning capabilities and improved safety features.
See Also: Uncovering Risk With Social Due Diligence
The San Francisco-based generative AI behemoth said the o3-mini is optimized for STEM, coding and structured problem solving with new developer tools, customizable reasoning efforts and integrated search, making it a cost-effective alternative for technical and problem-solving tasks. The news comes just a week after DeepSeek made its R1 model widely available after spending little on development.
“Evaluations by expert testers showed that o3-mini produces more accurate and clearer answers, with stronger reasoning abilities, than OpenAI o1-mini,” OpenAI wrote in an announcement Friday. “Testers preferred o3-mini’s responses to o1-mini 56% of the time and observed a 39% reduction in major errors on difficult real-world questions.”
o3-mini is now available for ChatGPT Plus, Team, and Pro users, and will roll out to Azure OpenAI Service & Enterprise users in February 2025, according to OpenAI. The model offers more flexible reasoning, structured outputs, and better developer controls, OpenAI said (see: Microsoft CEO: AI Scaling Laws Drive Efficiency, Lower Costs).
“OpenAI o3-mini is our first small reasoning model that supports highly requested developer features including function calling, Structured Outputs, and developer messages, making it production-ready out of the gate,” OpenAI wrote in its announcement. “Like OpenAI o1-mini and OpenAI o1-preview, o3-mini will support streaming.”
O3-mini is highly optimized for math, science and coding, outperforming early models while maintaining lower costs and faster response times, according to OpenAI. It outperforms OpenAI o1-mini – which was unveiled in September – and matches or exceeds OpenAI o1 at higher reasoning levels, OpenAI found.
“OpenAI reasoning models are trained with reinforcement learning to perform complex reasoning,” the company wrote in a research paper released Friday. “Models in this family think before they answer – they can produce a long chain of thought before responding to the user. Through training, the models learn to refine their thinking process, try different strategies, and recognize their mistakes.”
How OpenAI o3 Approaches Safety, Security
OpenAI o3-mini delivers responses 24% faster than o1-mini, making it more efficient while maintaining intelligence levels close to OpenAI o1, according to the company. O3-mini is faster and more efficient than o1-mini, with lower latency and higher throughput, according to OpenAI.
“Developers can choose between three reasoning effort options—low, medium, and high—to optimize for their specific use cases,” OpenAI wrote in its announcement. “This flexibility allows o3-mini to ‘think harder’ when tackling complex challenges or prioritize speed when latency is a concern.”
O3-mini surpasses previous models in safety, jailbreak resistance, and refusal behavior, but still carries medium risk in some areas, OpenAI found. The reasoning model incorporates deliberative alignment – meaning it’s trained to reason about human-written safety specifications before answering user prompts – resulted in improved safety, robustness, and refusal consistency in handling sensitive content.
“Reasoning allows these models to follow specific guidelines and model policies we’ve set, helping them act in line with our safety expectations,” OpenAI wrote in its research paper. “This means they are better at providing helpful answers and resisting attempts to bypass safety rules, to avoid producing unsafe or inappropriate content.”
OpenAI classifies o3-mini as a medium risk for persuasion, autonomy, and chemical, biological and nuclear threats since it can produce human-level arguments, demonstrate strong coding and reasoning, and help experts in threat planning. Cybersecurity, meanwhile, was classified as low risk under OpenAI’s preparedness framework since the o3-mini does not advance real-world exploitation capabilities.
“OpenAI o3-mini performs chain-of-thought reasoning in context, which leads to strong performance across both capabilities and safety benchmarks,” OpenAI wrote in its research paper. “These increased capabilities come with significantly improved performance on safety benchmarks, but also increase certain types of risk.”