Showing archive results for 2026

Feb 27, 2026

Engineering and algorithmic interventions for multimodal post-training at Microsoft scale

Aditya Challapally

Aditya Challapally leads post-training research and infrastructure for Copilot agent capabilities that process millions of multimodal interactions. This post builds on the diagnostics from "Diagnosing instability in production-scale agent reinforcement learning" with the engineering and algorithmic interventions we developed to get the best results ...

Feb 11, 2026

How we built the Microsoft Learn MCP Server

Tianqi, Eric, Pieter

When we launched the Microsoft Learn Model Context Protocol (MCP) Server in June 2025, our goal was simple: make it effortless for AI agents to use trusted, up-to-date Microsoft Learn documentation. GitHub Copilot and other agents are increasingly common, and they need to be able to ground responses just like humans with browsers do. Learn MCP Serv...

Jan 28, 2026

Diagnosing instability in production-scale agent reinforcement learning

Aditya Challapally

On January 28, 2026, Hugging Face announced that they have upstreamed the Post-Training Toolkit into TRL as a first-party integration, making these diagnostics directly usable in production RL and agent post-training pipelines. This enables closed-loop monitoring and control patterns that are increasingly necessary for long-running and continuously...