Benchmark Results From Google Show Gemini 2.5 Outperforming Rivals

On Tuesday, Google introduced its “most intelligent” AI reasoning model yet, designed to pause and “think” before responding.
Reasoning models like Gemini 2.5 have become a key focus in the AI arms race, with companies including OpenAI, Anthropic, DeepSeek and xAI pushing to refine AI systems capable of more thoughtful decision-making. Google said that all its future AI models will incorporate enhanced reasoning capabilities (see: How Test-Time Compute Can Help Scale AI).
Gemini 2.5’s debut follows OpenAI’s launch of its reasoning AI model o1 in September. Competition has intensified since then, with tech firms aiming to develop AI systems capable of complex tasks such as coding and math. Google claims Gemini 2.5 represents its strongest effort to date in this competitive space.
Benchmark results from Google show the model outperforming rivals. On the Aider Polyglot evaluation, which measures code-editing capabilities, Gemini 2.5 Pro scored 68.6%, besting models from OpenAI, Anthropic and DeepSeek. But in software development assessments using the SWE-bench Verified test, it lagged behind Anthropic’s Claude 3.7 Sonnet, scoring 63.8% compared to Claude’s 70.3%.
Gemini 2.5 Pro also performed well on Humanity’s Last Exam, a multimodal evaluation covering subjects from mathematics to the humanities. With a score of 18.8%, it outpaced many competing flagship models, Google said.
The model ships with a 1 million-token context window, capable of processing about 750,000 words in a single input – more than the entire Lord of the Rings series. Google plans to double this capacity to 2 million tokens soon.
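The token-to-word figures above imply a simple conversion. The sketch below is a rough illustration only: it assumes the ratio implied by the article's own numbers (750,000 words per 1 million tokens) holds at any context size, and the function name approx_words is hypothetical. Actual words per token vary by language and tokenizer.

```python
# Ratio implied by the article's figures: 750,000 words per 1,000,000 tokens.
# This is an assumption for illustration, not a property of any tokenizer.
WORDS_PER_TOKEN = 750_000 / 1_000_000


def approx_words(context_tokens: int) -> int:
    """Estimate how many words fit in a context window of the given size."""
    return round(context_tokens * WORDS_PER_TOKEN)


if __name__ == "__main__":
    for tokens in (1_000_000, 2_000_000):
        print(f"{tokens:,} tokens is roughly {approx_words(tokens):,} words")
```

Under that assumption, the planned 2 million-token window would hold about 1.5 million words.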
Google has yet to disclose API pricing for Gemini 2.5 Pro, stating that details will be available in the coming weeks. The model is currently available through Google AI Studio and the Gemini app for subscribers of the company’s $20-a-month AI plan.
Despite their promise, reasoning models like Gemini 2.5 are expensive to operate because they rely on additional computational resources (see: New Benchmarks Challenge Brute Force Approach to AI).
OpenAI also introduced “Images in ChatGPT” on Tuesday, following Google’s rollout earlier this month of native image generation in Gemini AI Studio. The feature lets users generate images with GPT-4o inside the chatbot and is available across all subscription tiers, with usage limits for free users.
OpenAI claims the new model improves text rendering and attribute accuracy in generated images. Unlike previous versions, it uses an autoregressive process that generates images in stages, which slows creation but aims to improve quality.
Images generated with the feature will include digital markers indicating that they are AI-generated.