
OpenAI's New 'o5' Model Crushes Coding Benchmarks – And It's Dropping Soon


OpenAI’s o5 just scored 92% on HumanEval – higher than any rival – and devs get early access next week.

Imagine submitting a PR that passes every test on the first try. That’s the reality OpenAI is teasing with o5, its boldest coding leap since GPT-4.

Today OpenAI unveiled o5, a frontier model that obliterates coding benchmarks: 92% on HumanEval and 89% on LeetCode Hard, plus one-shot debugging of entire repositories. Early API access rolls out to developers next Tuesday, with full release by March. This isn’t incremental – it’s a 25% jump over o1-preview’s already-insane scores.
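
If you want to be ready on day one, the call will likely look familiar. Here’s a minimal sketch using the OpenAI Python SDK’s Chat Completions endpoint – note that the model id "o5" is an assumption, since OpenAI hasn’t published the identifier yet:

```python
# Minimal sketch: querying o5 via the Chat Completions API.
# Assumes the model ships under the id "o5" (unconfirmed) and that
# OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

buggy_source = '''
def mean(xs):
    return sum(xs) / len(xs)  # crashes on an empty list
'''

response = client.chat.completions.create(
    model="o5",  # hypothetical model id
    messages=[
        {"role": "user", "content": "Find and fix the bug in this function:\n" + buggy_source},
    ],
)
print(response.choices[0].message.content)
```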

For developers, this all but eliminates the ‘AI hallucination in code’ problem. o5 reasons through multi-file refactors, catches edge cases humans miss, and even suggests optimizations backed by runtime benchmarks. Think Cursor or GitHub Copilot, but 2x faster with 40% fewer errors on production workloads.
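
What might a repo-scale prompt look like in practice? A rough sketch: bundle the relevant files into a single request and let the model reason across file boundaries. The file paths and prompt wording below are placeholders; only the SDK calls are standard:

```python
# Rough sketch: bundling several files into one prompt for repo-level debugging.
# Paths and prompt wording are hypothetical; the SDK calls are standard.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

# Concatenate the relevant sources so the model sees them in one context.
paths = ["app/models.py", "app/views.py", "tests/test_views.py"]  # placeholder paths
bundle = "\n\n".join(f"# FILE: {p}\n{Path(p).read_text()}" for p in paths)

response = client.chat.completions.create(
    model="o5",  # hypothetical model id, as above
    messages=[{
        "role": "user",
        "content": "These tests fail. Trace the bug across the files below "
                   "and propose a minimal fix:\n\n" + bundle,
    }],
)
print(response.choices[0].message.content)
```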

o5 pulls decisively ahead of Anthropic’s Claude 3.5 Sonnet (87% on HumanEval) and Google’s Gemini 2.0 (84%), while xAI’s Grok-3 lags at 81%. But the real game-changer? OpenAI is committing to daily fine-tunes based on developer feedback, something competitors can’t match at scale.

Sign up for early access via the OpenAI Playground and test it on your toughest bugs – will it finally replace that senior dev you can’t hire? The coding wars just escalated.

Source: OpenAI Blog

