
Feed stories to Llama, measure prediction failures as ‘surprise’, and it mirrors human brain responses. A game-changer for narrative AI?
What if LLMs aren’t just word predictors, but surprise detectors syncing with human cognition? A fresh study from Monica Rosenberg’s lab put exactly that to the test, and the results are wild.[2]
They scanned the brains of people listening to stories, capturing real-time ‘surprise’ (prediction mismatches). The same stories went to Llama chunk by chunk: the model predicted each upcoming stretch of text, and the gap between prediction and the actual text quantified the AI’s surprise. Compared side by side, the places where humans and the model diverged revealed gaps between machine and human cognition.[2]
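Here’s a rough sketch of the AI side of that pipeline, assuming a Hugging Face causal LM and defining a chunk’s ‘surprise’ as its mean per-token negative log-likelihood given the story so far. The model name, chunk size, and surprise definition are illustrative choices, not the study’s exact setup:

```python
# Sketch: chunk-level surprise as mean per-token negative log-likelihood.
# Assumes a Hugging Face causal LM; tokenizer boundary effects between
# context and chunk are ignored for simplicity.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Meta-Llama-3-8B"  # gated repo; any causal LM (e.g. "gpt2") also works

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def chunk_surprise(story: str, chunk_words: int = 20) -> list[float]:
    """Mean per-token surprisal of each chunk, conditioned on all prior text."""
    words = story.split()
    chunks = [" ".join(words[i:i + chunk_words])
              for i in range(0, len(words), chunk_words)]
    surprises, context = [], ""
    for chunk in chunks:
        full_text = (context + " " + chunk).strip()
        full_ids = tokenizer(full_text, return_tensors="pt").input_ids
        n_ctx = tokenizer(context, return_tensors="pt").input_ids.size(1) if context else 0
        with torch.no_grad():
            logits = model(full_ids).logits
        # token i+1 is predicted from position i: shift logits and targets by one
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
        targets = full_ids[0, 1:]
        nll = -log_probs[torch.arange(targets.size(0)), targets]
        # keep only the tokens that belong to the new chunk
        start = max(n_ctx - 1, 0)
        surprises.append(nll[start:].mean().item())
        context = full_text
    return surprises
```

Each value is the log of the chunk’s perplexity under the model, so higher means the text deviated more from what the model expected.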
This matters for devs building story agents, games, or therapy bots: surprise drives engagement and narrative flow. Current LLMs fake it via surface patterns; grounding them in brain-like prediction errors could boost realism in chatbots and procedural content generation.[2]
Compared with black-box benchmarks, this is novel: no standard ‘LLM surprise’ metric existed before, so the study pioneers human-AI alignment on a core cognitive process. It also builds toward the ‘alien intelligence’ view of LLMs seen in current eval trends.[2][4]
Get hands-on: replicate with Llama 3 on Hugging Face. Script story chunks, compute per-chunk perplexity (or negative log-likelihood) as surprise, and fine-tune for brain-aligned narratives. A hypothetical comparison step is sketched below. Could this crack true story comprehension?
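For instance, you could correlate the model’s surprise series with a human surprise time series resampled to the same chunk boundaries. The `human` array below is placeholder data standing in for real brain-derived measurements:

```python
# Hypothetical alignment check between AI and human surprise signals.
import numpy as np

story = open("story.txt").read()      # any narrative transcript
ai = np.array(chunk_surprise(story))  # from the sketch above
human = np.random.rand(len(ai))       # placeholder for brain-derived surprise

r = np.corrcoef(ai, human)[0, 1]
print(f"human-AI surprise alignment (Pearson r): {r:.3f}")
```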
Source: Medical Xpress