Microsoft’s Agent Lightning: Add Reinforcement Learning to Agents Without Rewriting Code

12 Dec, 2025

Agent Lightning lets existing LLM-based agents learn via RL with minimal code changes, with consistent gains reported on text-to-SQL, multi-hop QA, and math tool-use tasks.

Microsoft Research Asia introduced Agent Lightning, an open-source framework that enables reinforcement learning (RL) for existing LLM-based agents without major code rewrites. It does this by separating task execution from model training and by assigning credit to individual LLM calls after a task completes[2]. The core LightningRL algorithm performs hierarchical credit assignment: it estimates how much each agent step contributed to the final outcome, assigns a reward to each step, and then applies a single-step RL algorithm (e.g., PPO or GRPO) to improve behavior. This design lets teams add online learning to agent stacks built with popular frameworks such as LangChain, AutoGen, and the OpenAI Agents SDK with minimal integration friction[2].
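The credit-assignment idea can be illustrated with a minimal Python sketch. It is not the Agent Lightning API: the names `LLMCall`, `Transition`, `assign_credit`, and `group_normalized_advantages` are hypothetical, and the uniform credit rule (every call in an episode receives the episode's final reward) is an assumption chosen for simplicity, not a claim about how LightningRL weights steps. The second function shows a GRPO-style group normalization of episode rewards, which a single-step RL update could then consume.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class LLMCall:
    """One LLM invocation recorded while the agent ran (hypothetical trace format)."""
    prompt: str
    response: str


@dataclass
class Transition:
    """A single-step training example: one call plus the reward credited to it."""
    prompt: str
    response: str
    reward: float


def assign_credit(calls: list[LLMCall], final_reward: float) -> list[Transition]:
    """Assumed rule: credit every call in the episode with the final reward."""
    return [Transition(c.prompt, c.response, final_reward) for c in calls]


def group_normalized_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantage: standardize rewards across rollouts of the same task."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two rollouts of the same made-up text-to-SQL task: one succeeded, one failed.
episode_a = [
    LLMCall("Write SQL for: list user names", "SELECT name FROM users"),
    LLMCall("Check the query", "The query is valid"),
]
episode_b = [LLMCall("Write SQL for: list user names", "SELEC name FROM users")]

# Training only needs the recorded calls and rewards, not the agent's own code.
transitions = assign_credit(episode_a, 1.0) + assign_credit(episode_b, 0.0)
advantages = group_normalized_advantages([1.0, 0.0])
```

Because the sketch works only on recorded calls and a final reward, the agent that produced them is left untouched, which mirrors the separation of task execution from training described above.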

Agent Lightning was evaluated on three real-world scenarios: text-to-SQL, retrieval-augmented multi-hop QA, and mathematical question answering with tool use. In each, it produced consistent performance gains by optimizing when and how agents call tools, compose queries, and revise outputs[2]. By making RL modular and compatible with common agent frameworks, Microsoft aims to enable continuous improvement of deployed agents and to provide a shared platform for experimenting with automatic prompt optimization and additional RL algorithms in future releases. The project targets practitioners who need agents to learn from experience without rebuilding their entire stack, which addresses a key operational barrier to deploying adaptive agentic systems in production[2].

Source: microsoft.com
