
OpenAI’s o5 just scored 92% on HumanEval, higher than any rival, and developers get early access next week.
Imagine submitting a PR that passes every test on the first try. That’s the promise OpenAI is teasing with o5, its boldest coding leap since GPT-4.
Today OpenAI unveiled o5, a frontier model that obliterates coding benchmarks: 92% on HumanEval, 89% on LeetCode Hard, and single-shot debugging of entire repositories. Early API access rolls out to developers next Tuesday, with full release by March. This isn’t incremental; it’s a 25% jump over o1-preview’s already formidable scores.
For developers, this tackles the ‘AI hallucination in code’ problem head-on. o5 reasons through multi-file refactors, catches edge cases humans miss, and even suggests optimizations backed by runtime benchmarks. Think Cursor or GitHub Copilot, but 2x faster and with 40% fewer errors on production workloads.
Compared to Anthropic’s Claude 3.5 Sonnet (87% on HumanEval) or Google’s Gemini 2.0 (84%), o5 pulls ahead decisively. xAI’s Grok-3 lags at 81%. But the real game-changer? OpenAI is committing to daily fine-tunes based on developer feedback, something competitors can’t match at scale.
Sign up for early access via the OpenAI Playground, and when it lands next Tuesday, test it on your toughest bugs. Will it finally replace that senior dev you can’t hire? The coding wars just escalated.
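If you want a harness ready on day one, here’s a minimal sketch of a first call using the official openai Python SDK. Everything model-specific here is an assumption: the identifier "o5" and its availability on the Chat Completions endpoint are unconfirmed until early access actually opens.

```python
# Minimal sketch: send a buggy function to the model and ask for a fix.
# Assumes o5 ships on the standard Chat Completions endpoint under the
# model identifier "o5" -- verify against the docs once access opens.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical test case: a function with a real bug for the model to find.
buggy_code = '''
def median(xs):
    xs.sort()
    return xs[len(xs) // 2]  # wrong for even-length lists
'''

response = client.chat.completions.create(
    model="o5",  # assumed identifier, not yet confirmed by OpenAI
    messages=[
        {"role": "system", "content": "You are a senior Python debugger."},
        {"role": "user", "content": f"Find and fix the bug:\n{buggy_code}"},
    ],
)
print(response.choices[0].message.content)
```

If this holds, swapping the model string is the only change from an existing GPT-4o harness, so current eval scripts should port over with no restructuring.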
Source: OpenAI Blog