Engineering@Microsoft
How Microsoft empowers its developers to deliver at massive scale
Latest posts
Engineering and algorithmic interventions for multimodal post-training at Microsoft scale
Aditya Challapally leads post-training research and infrastructure for Copilot agent capabilities that process millions of multimodal interactions. This post builds on the diagnostics from “Diagnosing instability in production-scale agent reinforcement learning” and describes the engineering and algorithmic interventions we developed to get the best results out of post-training at scale. Post-training multimodal agents at scale breaks in ways the literature doesn't prepare you for. This is not because the algorithms are wrong (they work as described), but because the failure modes only become visible at production scale, under rea...
How we built the Microsoft Learn MCP Server
When we launched the Microsoft Learn Model Context Protocol (MCP) Server in June 2025, our goal was simple: make it effortless for AI agents to use trusted, up-to-date Microsoft Learn documentation. GitHub Copilot and other agents are increasingly common, and they need to be able to ground responses just like humans with browsers do. Learn MCP Server is a remote server that exposes agent-friendly tools over Streamable HTTP Transport, backed by the same Learn knowledge service described in How we built “Ask Learn”. Why MCP and why Learn MCP Server? Modern AI agents can discover and use tools dynamically through ...
Diagnosing instability in production-scale agent reinforcement learning
On January 28, 2026, Hugging Face announced that it has upstreamed the Post-Training Toolkit into TRL as a first-party integration, making these diagnostics directly usable in production RL and agent post-training pipelines. This enables closed-loop monitoring and control patterns that are increasingly necessary for long-running and continuously adapted agent systems. Documentation: https://huggingface.co/docs/trl/main/en/ptt_integration. Overview: In production-scale agent reinforcement learning systems, training runs increasingly operate over long horizons, incorporate external tools, and adapt continuousl...
The Interaction Changes Everything: Treating AI Agents as Collaborators, Not Automation
Discover how treating AI agents as collaborators, not automation, transforms engineering workflows and accelerates complex projects.
Enhancing Code Quality at Scale with AI-Powered Code Reviews
Microsoft’s AI-powered code review assistant has transformed pull request workflows by automating routine checks, suggesting improvements, and enabling conversational Q&A, leading to faster PR completion, improved code quality, and enhanced developer onboarding. Its seamless integration and customizability have driven widespread adoption within Microsoft.
How Microsoft Engineers Build AI: Learn about scalable RAG-enabled AI Apps
For developers, the emphasis on building intelligence into apps has never been clearer. Over the next three years, 92% of companies plan to invest in AI to achieve business outcomes like enhancing productivity and delivering better customer service. At Microsoft, developers and engineers are pushing the boundaries of AI at scale, crafting applications that harness the power of cutting-edge machine learning models and advanced AI techniques. To help both newcomers and seasoned AI developers understand these methodologies, we are thrilled to introduce a new video series: How Microsoft engineers build AI. We’l...
Dev Box Ready-To-Code Dev Box images template
Microsoft One Engineering System (1ES) team shares a sample for building Ready-To-Code Dev Box environments pre-configured with the necessary tools, repositories, and settings, ensuring consistency and reliability across teams.
Common annotated security keys
In April 2021, GitHub announced changes to its security token format that significantly enhanced security. The improvement leveraged two straightforward techniques, a fixed signature in the generated token and a checksum, both of which are highly effective in eliminating false positives (noise) and false negatives (missed findings). We also implement these techniques widely across Microsoft's service providers. Internally, we refer to any key format that incorporates both techniques as 'identifiable' (a term also used in GitHub’s blog post). Identifiable secrets supercharge open-source scan tools and more sophist...
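To make the two techniques concrete, here is a minimal Python sketch of an "identifiable" key. It assumes a hypothetical `demo_` signature, a base62 alphabet, and a CRC32 checksum over the token body stored as the final six characters; it is an illustration only, not GitHub's or Microsoft's actual key format.

```python
import secrets
import string
import zlib

# Hypothetical fixed signature; real key formats use their own prefixes.
SIGNATURE = "demo_"
ALPHABET = string.digits + string.ascii_letters  # base62 alphabet
CHECKSUM_LEN = 6  # 62**6 > 2**32, so any CRC32 value fits in six base62 digits


def _base62(n: int, width: int) -> str:
    """Encode a non-negative integer as a fixed-width base62 string."""
    chars = []
    for _ in range(width):
        n, r = divmod(n, 62)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


def make_token(random_len: int = 30) -> str:
    """Build signature + random body, then append a checksum of that body."""
    body = SIGNATURE + "".join(secrets.choice(ALPHABET) for _ in range(random_len))
    return body + _base62(zlib.crc32(body.encode()), CHECKSUM_LEN)


def is_identifiable(candidate: str) -> bool:
    """A scanner-side check: fixed signature first, then recompute the checksum."""
    if not candidate.startswith(SIGNATURE):
        return False
    if len(candidate) <= len(SIGNATURE) + CHECKSUM_LEN:
        return False
    body, checksum = candidate[:-CHECKSUM_LEN], candidate[-CHECKSUM_LEN:]
    return _base62(zlib.crc32(body.encode()), CHECKSUM_LEN) == checksum
```

The signature lets a scanner find candidates cheaply with a simple prefix match, and the checksum rejects strings that merely resemble a key, since a random string sharing the prefix passes the CRC32 check only about once in 2^32 tries. That combination is what cuts noise without missing real keys.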
Managed DevOps Pools – The Origin Story
Learn how Microsoft's 1ES organization developed an internal service called "1ES Hosted Pools" to manage Microsoft's diverse engineering system infrastructure, and how it delivered significant improvements in productivity, cost savings, and security. This solution will soon be available as a third-party offering named "Managed DevOps Pools".
Developing with Accessibility in Mind at Microsoft
Celebrate Global Accessibility Awareness Day (GAAD) by taking actionable and easy steps to build accessibility into your development lifecycle! Learn how tools like Accessibility Insights and Visual Studio can help find accessibility issues during development.
Copy-on-Write performance and debugging
This is a follow-up to our previous coverage of Dev Drive and copy-on-write (CoW) linking. See our previous articles from May 24, 2023, October 13, 2023, and November 2, 2023. Dev Drive was released in Windows 11 in October 2023 and will be part of Windows Server 2025 this fall. Windows Server 2025 and Windows 11 24H2 ship with an enhancement that automatically uses copy-on-write linking (CoW-in-Win32). Here, we'll cover the results of several months of build performance testing on several large internal codebases, provide some information on determining whether a file is a CoW link, and share a few tips we found f...
How we built “Ask Learn”, the RAG-based knowledge service
My name is Bob Tabor and I’m a member of Microsoft’s Skilling organization. We create documentation and training content about Azure, developer tooling and languages, AI, Windows and much more hosted at Microsoft Learn. Our organization also develops and maintains the content publishing platform, the content hosting platform, the interactivity, and popular sites like Microsoft Q&A. One of the most ambitious and impactful projects our engineers have built recently is Ask Learn, an API that provides generative AI capabilities to Microsoft Q&A and the ground truth necessary to power the new Microsoft Copilo...
Enhancing reliability in Microsoft Fabric and Azure Synapse through load testing
Microsoft has employed Azure Load Testing to enhance the reliability of Microsoft Fabric and Azure Synapse, ensuring they can handle high loads. Azure Synapse integrates various data analytics technologies, while Microsoft Fabric offers a full enterprise analytics solution. Through rigorous daily and weekly load testing, involving complex scenarios and extensive data sizes, Microsoft aims to identify and rectify potential issues, ensuring optimal performance. This testing, integrated within their development pipelines, supports continuous improvement, leverages Azure's scalability, and utilizes Power BI for detai...
Accessibility Insights now supports WCAG 2.2 AA
To celebrate the International Day of Persons with Disabilities on December 3rd, we have some exciting new announcements for Accessibility Insights, Microsoft’s open-source suite of tools to help developers deliver accessible software! Technology plays a huge role in empowering everyone, including people with disabilities around the globe. Developers can now build with more accessibility in mind using Accessibility Insights for Web: this updated version includes testing support and guidance for WCAG 2.2 within the Assessment feature. We are constantly striving to improve the user experience and have added features that ma...
Building Paved Paths: The Journey to Platform Engineering
Over the past year, AI has taken the world by storm. Our industry is innovating at an unprecedented rate, bringing incredible products to market that make life and work easier and more efficient for real people across a wide range of sectors and job functions. Like previous industry shifts (the introduction of the PC, the internet, and search), it’s a pretty safe bet that we’ll look back at this moment and see the world before AI and a world powered by AI. It’s an all-up mindset shift that fundamentally changes how we interact with technology. In the developer space, GitHub Copilot has become the most widely adopted AI...
Copy-on-Write in Win32 API Early Access
(Updated Apr 4 and 26, 2024 with some release news. Also see the next post) On October 25, 2023, the Windows filesystem team released an early preview of copy-on-write (CoW) linking in the Windows 11 Insider Canary channel. This builds automatic CoW linking into the Win32 APIs when using Dev Drive or ReFS. If released next year, this will eliminate the need to update build engines, tools, and runtime frameworks to support CoW. Related release information is here. We released some early benchmarks showing the automatic gain for .NET without the need for explicitly updating tools to use CoW. In testing a large ...
Dev Drive is Now Available
(Edited Oct 31, 2023 to add info about later patch for Intune, Nov 6 and 8, 2023 to add Win11 23H2 image info, Apr 4, 2024 to add info about Server. Also see the next post and the one after.) In a previous post, Dev Drive and Copy-on-Write for Developer Performance, we published early performance numbers for the new Dev Drive feature of Windows 11 and Windows Server. This week’s Windows Update for Windows 11 22H2 includes Dev Drive, and you can check by running the command and seeing if the parameter is listed in the help text. If Dev Drive doesn’t appear in the help text, you can explicitly enable it by insta...
Your Most Important Git Repos
What do you keep in your Git repos? Source code for your production applications certainly, but you probably also keep a fair amount of experimental and “hackathon” code. Maybe you keep your documentation in Git. Maybe, like the District of Columbia does, you even keep legal documents there. So which of these are the most important to protect? From the perspectives of access control and change management, clearly, they’re all vital. You might not want prying eyes seeing your internal documentation, and you certainly wouldn’t want them tampering with it. But what about from an application security perspective? Th...
Load testing AAD-based authentication for Azure Cache for Redis
At Microsoft, we continue working on modernizing our services to make them faster, more reliable, and up to date with the latest technologies. In this blog post, we’ll cover how Azure Load Testing helped ensure that the Azure Active Directory (AAD) based authentication mechanism for Azure Cache for Redis met the performance criteria. Azure Cache for Redis is a fully managed, in-memory cache that enables high-performance and scalable architecture. In May 2023, Azure Cache for Redis launched a password-free authentication mechanism by integrating with AAD. This integration also included role-based access control f...
Dev Drive and Copy-on-Write for Developer Performance
At Microsoft Build 2023 the Windows team announced Dev Drive, a new evolution of the Windows ReFS filesystem retuned for developer workloads like Git and builds. This new functionality will ship later this year in the Windows 11 23H2 refresh and is available now for early testing via the Windows Insider program.
Microsoft Dev Box for Microsoft engineers
We’re in an exciting time for technology. But to take advantage of the opportunities, it’s critical for developers to have access to the tools and resources that can help them stay productive and do their best work. At Microsoft, we’re migrating many of our developers to highly productive…
The Journey to Secure the Software Supply Chain at Microsoft
A secure software supply chain represents another facet of Microsoft's built-in security to enhance and maintain trust in our products. It’s a continuation of the journey we began with the launch of the Security Development Lifecycle (SDL) in 2004 and represents our commitment to continually enhance Microsoft’s foundational security.
Implementing an accessible, checkable WPF Tree View
The Accessibility Insights team recently fixed a bug in our Windows Presentation Foundation (WPF) app where checkboxes in a WPF tree view did not properly report their checked or unchecked state to adaptive technologies such as screen readers. This longstanding issue created a subpar accessible experience in Accessibility Insights for Windows, our Windows app introduced in John’s December 13, 2021, post. Previously, we worked around the problem to make our app usable for adaptive technologies, but with this fix we have improved the experience to meet industry-standard accessibility patterns. As a team workin...
Learnings from migrating Accessibility Insights for Web to Chrome’s Manifest V3
Since February 2022, the Accessibility Insights team has been migrating Accessibility Insights for Web (our Chrome and Edge extension introduced in Jacqueline's February 14, 2022, post) from Manifest V2 (MV2) to Manifest V3 (MV3). We wanted to share learnings and takeaways from our migration journey with a walkthrough…
Microsoft open sources its software bill of materials (SBOM) generation tool
We are excited and proud to open source our software bill of materials (SBOM) generation tool. SBOMs, a key requirement of the Executive Order on Improving the Nation's Cybersecurity, are lists of ingredients that make up software components; they provide software transparency so organizations have insight into their supply chain dependencies. Our SBOM tool is a general-purpose, enterprise-proven, build-time SBOM generator. It works across platforms including Windows, Linux, and Mac, and uses the standard Software Package Data Exchange (SPDX) format. (To see the previous announcement about our SBOM tool, please re...
The pursuit of an autonomic scale and efficiency system for Microsoft 365: Making it as easy as breathing
Through automated profiling and data collection of performance behavior, Microsoft’s M365 Core team can derive the context with which to inform the engineer about the impact of their code, as they write it. Randy Lehner likens it to the autonomic nervous system in this post on their Cloud Profiling and Reporting Pipeline.
Accessibility Insights for Web
In this post, Jacqueline Gibson goes over Accessibility Insights for Web, Microsoft's open-sourced Chrome and Edge extension that helps users find and fix web accessibility issues.