
Open-Source Just Crushed GPT and Claude on PhD-Level Science Reviews

An open model beat human PhDs 51% of the time at literature reviews—now with a free API devs can build on today.

Imagine feeding your RAG pipeline a model that outsmarts PhD experts on scientific papers. That’s not sci-fi—it’s happening now with OpenScholar.

Researchers from the University of Washington and the Allen Institute for AI dropped OpenScholar, an open-source model specialized for scientific literature reviews. In blind tests, its responses were preferred over GPT's, Claude's, and even human PhDs' answers 51% of the time. Pair it with a bigger base model and preference jumps to 70%. This isn't hype; it's benchmarked superiority on real academic tasks.[1]

For developers, this is gold. OpenScholar taps into Semantic Scholar's massive full-text index via a public API, making it a natural base for research tools, automated reviewers, or knowledge bases. No more proprietary black boxes; you get frontier performance on specialized tasks without vendor lock-in. It's part of a 2026 open-source surge, with MoE architectures slashing costs (DeepSeek-V3 trained for $6M vs. hundreds of millions for GPT-5).[1]
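To kick the tires, here's a minimal sketch of a paper search against the Semantic Scholar Graph API in Python. The endpoint is the documented /graph/v1/paper/search route; the query string and field list are just illustrative, and unauthenticated requests share a modest rate limit, so grab a free API key for anything heavier.

```python
# Minimal sketch: search Semantic Scholar's public Graph API.
# The query and fields below are illustrative, not OpenScholar's own calls.
import requests

resp = requests.get(
    "https://api.semanticscholar.org/graph/v1/paper/search",
    params={
        "query": "retrieval-augmented generation for scientific literature",
        "fields": "title,abstract,year,url",  # fetch only what your pipeline needs
        "limit": 5,
    },
    timeout=30,
)
resp.raise_for_status()

for paper in resp.json().get("data", []):
    print(paper.get("year"), "-", paper["title"])
```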

Compare that to the closed giants: GPT and Claude lag here despite their scale. Open models like Gemma 3 (27B) already beat 400B+ models on arena leaderboards, and Qwen3-235B matches them on reasoning. OpenScholar carves out a niche where domain expertise trumps raw size, which is huge for vertical AI apps.[1]

Grab the Semantic Scholar API today and prototype. Watch DeepSeek-R1 and Kimi-K2 for the next wave. Will open-source own specialized intelligence by year’s end?
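If you want a starting point for that prototype, the sketch below wires retrieval into a prompt: fetch abstracts, pack them into a numbered context, and hand the result to whatever open model you serve. To be clear, this is not OpenScholar's actual pipeline (the system layers a trained retriever, reranking, and iterative self-feedback on top); fetch_abstracts and the prompt template are hypothetical scaffolding.

```python
# Hedged sketch of a retrieve-then-prompt loop over Semantic Scholar.
# fetch_abstracts() and the prompt template are illustrative, not OpenScholar's code.
import requests

def fetch_abstracts(query: str, limit: int = 5) -> list[dict]:
    """Return papers that actually have abstracts for the given query."""
    resp = requests.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params={"query": query, "fields": "title,abstract", "limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    return [p for p in resp.json().get("data", []) if p.get("abstract")]

question = "What limits current retrieval-augmented literature review systems?"
papers = fetch_abstracts(question)

# Number the abstracts so the model can cite them as [1], [2], ...
context = "\n\n".join(
    f"[{i}] {p['title']}\n{p['abstract']}" for i, p in enumerate(papers, start=1)
)
prompt = (
    "Answer the question using only the numbered abstracts below, "
    f"citing them as [n].\n\n{context}\n\nQuestion: {question}"
)
print(prompt)  # hand this to your model of choice (local Qwen, DeepSeek, etc.)
```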

Source: Serenities AI

