Engineering@Microsoft
How Microsoft empowers its developers to deliver at massive scale
Latest posts
Engineering and algorithmic interventions for multimodal post-training at Microsoft scale
Aditya Challapally leads post-training research and infrastructure for Copilot agent capabilities that process millions of multimodal interactions. This post builds on the diagnostics from Diagnosing instability in production-scale agent reinforcement learning with the engineering and algorithmic interventions we developed to get the best results out of post-training at scale. Post-training multimodal agents at scale breaks in ways the literature doesn't prepare you for. Not because the algorithms are wrong (they work as described), but because the failure modes only become visible at production scale, under rea...
How we built the Microsoft Learn MCP Server
When we launched the Microsoft Learn Model Context Protocol (MCP) Server in June 2025, our goal was simple: make it effortless for AI agents to use trusted, up-to-date Microsoft Learn documentation. GitHub Copilot and other agents are increasingly common, and they need to be able to ground responses just like humans with browsers do. Learn MCP Server is a remote server that exposes agent-friendly tools over Streamable HTTP Transport, backed by the same Learn knowledge service described in How we built “Ask Learn”. Why MCP and why Learn MCP Server? Modern AI agents can discover and use tools dynamically through ...
Diagnosing instability in production-scale agent reinforcement learning
On January 28, 2026, Hugging Face announced that they have upstreamed the Post-Training Toolkit into TRL as a first-party integration, making these diagnostics directly usable in production RL and agent post-training pipelines. This enables closed-loop monitoring and control patterns that are increasingly necessary for long-running and continuously adapted agent systems. Documentation @ https://huggingface.co/docs/trl/main/en/ptt_integration. Overview In production-scale agent reinforcement learning systems, training runs increasingly operate over long horizons, incorporate external tools, and adapt continuousl...
The Interaction Changes Everything: Treating AI Agents as Collaborators, Not Automation
Discover how treating AI agents as collaborators, not automation, transforms engineering workflows and accelerates complex projects.
Enhancing Code Quality at Scale with AI-Powered Code Reviews
Microsoft’s AI-powered code review assistant has transformed pull request workflows by automating routine checks, suggesting improvements, and enabling conversational Q&A, leading to faster PR completion, improved code quality, and enhanced developer onboarding. Its seamless integration and customizability have driven widespread adoption within Microsoft.
How Microsoft Engineers Build AI: Learn about scalable RAG-enabled AI Apps
For developers, the emphasis on building intelligence into apps has never been clearer. Over the next three years, 92% of companies plan on investing in AI to achieve business outcomes like enhancing productivity and delivering better customer service. At Microsoft, developers and engineers are pushing the boundaries of AI at scale, crafting applications that harness the power of cutting-edge machine learning models and advanced AI techniques. To help both newcomers and seasoned AI developers understand these methodologies, we are thrilled to introduce a new video series – How Microsoft engineers build AI. We’l...
Dev Box Ready-To-Code Dev Box images template
Microsoft One Engineering System (1ES) team shares a sample for building Ready-To-Code Dev Box environments pre-configured with the necessary tools, repositories, and settings, ensuring consistency and reliability across teams.
Common annotated security keys
In April 2021, GitHub announced changes to their security token format that significantly enhanced security. The improvement leveraged two straightforward techniques: a fixed signature in the generated token and a checksum - both of which are highly effective in eliminating false positives (noise) and false negatives (missed findings). Microsoft also implements these techniques widely in our service providers. Internally, we refer to any key format that incorporates both techniques as 'identifiable' (a term also used in GitHub’s blog post). Identifiable secrets super-power open-source scan tools and more sophist...
Managed DevOps Pools – The Origin Story
Learn about how Microsoft's 1ES organization developed an internal service called "1ES Hosted Pools" to manage Microsoft's diverse Engineering system infrastructure and how it helped make significant improvements to productivity, cost savings, and security. This solution will soon be available as a third-party offering named "Managed DevOps Pools".
Developing with Accessibility in Mind at Microsoft
Celebrate Global Accessibility Awareness Day (GAAD) by taking easy, actionable steps to build accessibility into your development life-cycle! Learn how tools like Accessibility Insights and Visual Studio can help find accessibility issues in development.
Copy-on-Write performance and debugging
This is a follow-up to our previous coverage of Dev Drive and copy-on-write (CoW) linking. See our previous articles from May 24, 2023, October 13, 2023, and November 2, 2023. Dev Drive was released in Windows 11 in October 2023 and will be part of Windows Server 2025 this fall. Server 2025 and Windows 11 24H2 ship with an enhancement to automatically use copy-on-write linking (CoW-in-Win32). Here, we'll cover the results of several months of repo build performance testing for several large internal codebases, provide some information on determining whether a file is a CoW link, and share a few tips we found f...
How we built “Ask Learn”, the RAG-based knowledge service
My name is Bob Tabor and I’m a member of Microsoft’s Skilling organization. We create documentation and training content about Azure, developer tooling and languages, AI, Windows and much more hosted at Microsoft Learn. Our organization also develops and maintains the content publishing platform, the content hosting platform, the interactivity, and popular sites like Microsoft Q&A. One of the most ambitious and impactful projects our engineers have built recently is Ask Learn, an API that provides generative AI capabilities to Microsoft Q&A and the ground truth necessary to power the new Microsoft Copilo...
Enhancing reliability in Microsoft Fabric and Azure Synapse through load testing
Microsoft has employed Azure Load Testing to enhance the reliability of Microsoft Fabric and Azure Synapse, ensuring they can handle high loads. Azure Synapse integrates various data analytics technologies, while Microsoft Fabric offers a full enterprise analytics solution. Through rigorous daily and weekly load testing, involving complex scenarios and extensive data sizes, Microsoft aims to identify and rectify potential issues, ensuring optimal performance. This testing, integrated within their development pipelines, supports continuous improvement, leverages Azure's scalability, and utilizes Power BI for detai...
Accessibility Insights now supports WCAG 2.2 AA
To celebrate the International Day for Persons with Disabilities on December 3rd we have some exciting new announcements for Accessibility Insights, Microsoft’s open-source suite of tools to help developers deliver accessible software! Technology plays a huge role in empowering everyone, including people with disabilities around the globe. Developers can now build with more accessibility in mind using Accessibility Insights for Web: This updated version includes testing support and guidance for WCAG 2.2 within the Assessment feature. We are constantly striving to improve user experience and added features that ma...
Building Paved Paths: The Journey to Platform Engineering
Over the past year, AI has taken the world by storm. Our industry is innovating at an unprecedented rate, bringing incredible products to market that make life and work easier and more efficient for real people across a wide range of sectors and job functions. Like previous industry shifts—the introduction of the PC, internet, and search—it’s a pretty safe bet that we’ll look back at this moment and see the world before AI, and a world powered by AI. It’s an all-up mindset shift that fundamentally changes how we interact with technology. In the developer space, GitHub Copilot has become the most widely-adopted AI...
Copy-on-Write in Win32 API Early Access
(Updated Apr 4 and 26, 2024 with some release news. Also see the next post) On October 25, 2023, the Windows filesystem team released an early preview of copy-on-write (CoW) linking in the Windows 11 Insider Canary channel. This builds automatic CoW linking into the Win32 APIs when using Dev Drive or ReFS. If released next year, this will eliminate the need to update build engines, tools, and runtime frameworks to support CoW. Related release information is here. We released some early benchmarks showing the automatic gain for .NET without the need for explicitly updating tools to use CoW. In testing a large ...
Dev Drive is Now Available
(Edited Oct 31, 2023 to add info about later patch for InTune, Nov 6 and 8, 2023 to add Win11 23H2 image info, Apr 4, 2024 to add info about Server. Also see the next post and the one after.) In a previous post, Dev Drive and Copy-on-Write for Developer Performance, we published early performance numbers for the new Dev Drive feature of Windows 11 and Windows Server. This week’s Windows Update for Windows 11 22H2 includes Dev Drive and you can check by running the command and seeing if the parameter is listed in the help text. If Dev Drive doesn’t appear in the help text, you can explicitly enable it by insta...
Your Most Important Git Repos
What do you keep in your Git repos? Source code for your production applications certainly, but you probably also keep a fair amount of experimental and “hackathon” code. Maybe you keep your documentation in Git. Maybe, like the District of Columbia does, you even keep legal documents there. So which of these are the most important to protect? From the perspectives of access control and change management, clearly, they’re all vital. You might not want prying eyes seeing your internal documentation, and you certainly wouldn’t want them tampering with it. But what about from an application security perspective? Th...