Google’s new Gemini Pro model has record benchmark scores—again
Summary
Google has released Gemini 3.1 Pro, a powerful new AI model praised for significant performance gains on benchmarks and on real professional tasks, amid intense competition among AI developers.
Google releases Gemini 3.1 Pro in preview
Google has released a new preview of its most powerful large language model, Gemini 3.1 Pro. The company announced the model on Thursday, stating it will be generally available soon.
This release marks a significant step in the ongoing competition among AI giants. It follows recent model launches from rivals like OpenAI and Anthropic, all focused on improving capabilities for complex, multi-step reasoning tasks.
New model shows major performance leap
Early benchmarks suggest Gemini 3.1 Pro is a substantial upgrade over its predecessor, Gemini 3 Pro. Google shared results from third-party benchmarks, including one called "Humanity's Last Exam," that show significantly improved performance.
The model is already topping new leaderboards. Brendan Foody, CEO of AI startup Mercor, posted that Gemini 3.1 Pro now leads Mercor's APEX-Agents benchmark, which measures how well AI agents perform real professional knowledge work.
"The model's impressive results show how quickly agents are improving at real knowledge work," Foody said in a social media post.
The AI model wars intensify
The launch underscores the rapid pace of development in the foundation model space. Companies are racing to release models capable of more autonomous "agentic" work, where AI can plan and execute a series of tasks.
Key competitors in this high-stakes race include:
- OpenAI, maker of the GPT and o-series models.
- Anthropic, developer of the Claude model family.
- Google, now pushing forward with the Gemini family.
Each new release aims to claim a temporary performance crown, driving the entire field forward at breakneck speed.
What comes next for Gemini
For now, developers and enterprise customers can access Gemini 3.1 Pro through a preview. Google has not announced a specific date for its full, general release.
The model's performance suggests Google is keeping pace with, and on some measures pulling ahead of, other top-tier models. Its strong showing on benchmarks designed to mimic real-world professional tasks could make it a compelling option for businesses building AI applications.
The focus now shifts to how the model performs in widespread, real-world testing and what new applications it will enable as the AI arms race continues.
