Engineering@Microsoft

How Microsoft empowers its developers to deliver at massive scale

Latest posts

Feb 27, 2026

Engineering and algorithmic interventions for multimodal post-training at Microsoft scale

Aditya Challapally

Aditya Challapally leads post-training research and infrastructure for Copilot agent capabilities that process millions of multimodal interactions. This post builds on the diagnostics from “Diagnosing instability in production-scale agent reinforcement learning” with the engineering and algorithmic interventions we developed to get the best results out of post-training at scale. Post-training multimodal agents at scale breaks in ways the literature doesn't prepare you for. This is not because the algorithms are wrong (they work as described) but because the failure modes only become visible at production scale, under rea...

Feb 11, 2026

How we built the Microsoft Learn MCP Server

Tianqi, Eric, Pieter

When we launched the Microsoft Learn Model Context Protocol (MCP) Server in June 2025, our goal was simple: make it effortless for AI agents to use trusted, up-to-date Microsoft Learn documentation. GitHub Copilot and other agents are increasingly common, and they need to be able to ground responses just like humans with browsers do. Learn MCP Server is a remote server that exposes agent-friendly tools over Streamable HTTP Transport, backed by the same Learn knowledge service described in How we built “Ask Learn”. Why MCP and why Learn MCP Server? Modern AI agents can discover and use tools dynamically through ...
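As a rough illustration of what "discovering and using tools" looks like on the wire, MCP messages are JSON-RPC 2.0 requests such as `tools/list` and `tools/call`. The sketch below only builds the request payloads and sends nothing over the network. The tool name `example_docs_search` and its `query` argument are hypothetical placeholders, not the tools the Learn MCP Server actually exposes.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects each request to carry an id.
_ids = count(1)

def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request dict for an MCP method."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message

# Step 1: ask the server which tools it offers.
list_tools = jsonrpc_request("tools/list")

# Step 2: invoke one of them. The name and arguments here are hypothetical.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "example_docs_search", "arguments": {"query": "Azure Load Testing"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```

A real client would send these bodies to the server's HTTP endpoint and read the tool descriptions from the `tools/list` response before deciding what to call.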

Jan 28, 2026

Diagnosing instability in production-scale agent reinforcement learning

Aditya Challapally

On January 28, 2026, Hugging Face announced that they have upstreamed the Post-Training Toolkit into TRL as a first-party integration, making these diagnostics directly usable in production RL and agent post-training pipelines. This enables closed-loop monitoring and control patterns that are increasingly necessary for long-running and continuously adapted agent systems. Documentation: https://huggingface.co/docs/trl/main/en/ptt_integration. Overview: In production-scale agent reinforcement learning systems, training runs increasingly operate over long horizons, incorporate external tools, and adapt continuousl...

Dec 2, 2025

The Interaction Changes Everything: Treating AI Agents as Collaborators, Not Automation

Jenny Ferries

Discover how treating AI agents as collaborators, not automation, transforms engineering workflows and accelerates complex projects.

Jul 14, 2025

Enhancing Code Quality at Scale with AI-Powered Code Reviews

Sneha Tuli

Microsoft’s AI-powered code review assistant has transformed pull request workflows by automating routine checks, suggesting improvements, and enabling conversational Q&A, leading to faster PR completion, improved code quality, and enhanced developer onboarding. Its seamless integration and customizability have driven widespread adoption within Microsoft.

Mar 3, 2025

How Microsoft Engineers Build AI: Learn about scalable RAG-enabled AI Apps

Krezzia, Samit

For developers, the emphasis on building intelligence into apps has never been clearer. Over the next three years, 92% of companies plan on investing in AI to achieve business outcomes like enhancing productivity and delivering better customer service. At Microsoft, developers and engineers are pushing the boundaries of AI at scale, crafting applications that harness the power of cutting-edge machine learning models and advanced AI techniques. To help both newcomers and seasoned AI developers understand these methodologies, we are thrilled to introduce a new video series – How Microsoft engineers build AI. We’l...

Dec 11, 2024

Dev Box Ready-To-Code Dev Box images template

Dmitry Goncharenko

The Microsoft One Engineering System (1ES) team shares a sample for building Ready-To-Code Dev Box environments pre-configured with the necessary tools, repositories, and settings, ensuring consistency and reliability across teams.

Sep 25, 2024

Common annotated security keys

Michael C. Fanning

In April 2021, GitHub announced changes to their security token format that significantly enhanced security. The improvement leveraged two straightforward techniques: a fixed signature in the generated token and a checksum, both of which are highly effective in eliminating false positives (noise) and false negatives (missed findings). Microsoft also implements these techniques widely in our service providers. Internally, we refer to any key format that incorporates both techniques as 'identifiable' (a term also used in GitHub’s blog post). Identifiable secrets super-power open-source scan tools and more sophist...
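To make the two techniques concrete, here is a minimal sketch of a key format with a fixed signature and a checksum. Everything specific in it is an assumption for illustration: the `xmpl_` signature, the 30-character random body, and the base62-encoded CRC32 checksum are hypothetical choices, not the format Microsoft or GitHub actually uses.

```python
import secrets
import string
import zlib

ALPHABET = string.digits + string.ascii_letters  # 62 symbols (base62)
SIGNATURE = "xmpl_"   # hypothetical fixed signature a scanner can search for
RANDOM_LEN = 30       # hypothetical length of the random body
CHECKSUM_LEN = 6      # 62**6 > 2**32, so six digits hold any CRC32 value

def _base62(value, width):
    """Encode a non-negative integer as a fixed-width base62 string."""
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))

def make_key():
    """Generate a key: fixed signature + random body + checksum of the body."""
    body = "".join(secrets.choice(ALPHABET) for _ in range(RANDOM_LEN))
    checksum = _base62(zlib.crc32(body.encode("ascii")), CHECKSUM_LEN)
    return SIGNATURE + body + checksum

def is_identifiable_key(candidate):
    """Return True only if the signature, shape, and checksum all match."""
    if not candidate.startswith(SIGNATURE):
        return False
    rest = candidate[len(SIGNATURE):]
    if len(rest) != RANDOM_LEN + CHECKSUM_LEN:
        return False
    if any(ch not in ALPHABET for ch in rest):
        return False
    body, checksum = rest[:RANDOM_LEN], rest[RANDOM_LEN:]
    return _base62(zlib.crc32(body.encode("ascii")), CHECKSUM_LEN) == checksum

key = make_key()
print(key, is_identifiable_key(key))
```

The signature lets a scanner find candidates with a cheap string search, and the checksum lets it discard look-alike strings without calling the issuing service, which is where the reduction in false positives comes from.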

Jul 18, 2024

Managed DevOps Pools – The Origin Story

Suraj, Eliza

Learn how Microsoft's 1ES organization developed an internal service called "1ES Hosted Pools" to manage Microsoft's diverse engineering system infrastructure and how it helped make significant improvements to productivity, cost savings, and security. This solution will soon be available as a third-party offering named "Managed DevOps Pools".

May 16, 2024

Developing with Accessibility in Mind at Microsoft

Nandita Gupta

Celebrate Global Accessibility Awareness Day (GAAD) by taking easy, actionable steps to build accessibility into your development life-cycle! Learn how tools like Accessibility Insights and Visual Studio can help find accessibility issues during development.

May 14, 2024

Copy-on-Write performance and debugging

Erik Mavrinac

This is a follow-up to our previous coverage of Dev Drive and copy-on-write (CoW) linking. See our previous articles from May 24, 2023, October 13, 2023, and November 2, 2023. Dev Drive was released in Windows 11 in October 2023, and will be part of Windows Server 2025 this fall. Server 2025 and Windows 11 24H2 ship with an enhancement to automatically use copy-on-write linking (CoW-in-Win32). Here, we'll cover the results of several months of repo build performance testing for several large internal codebases, provide some information on determining whether a file is a CoW link, and share a few tips we found f...

Apr 22, 2024

How we built “Ask Learn”, the RAG-based knowledge service

Sarah, Bob

My name is Bob Tabor and I’m a member of Microsoft’s Skilling organization. We create documentation and training content about Azure, developer tooling and languages, AI, Windows and much more hosted at Microsoft Learn. Our organization also develops and maintains the content publishing platform, the content hosting platform, the interactivity, and popular sites like Microsoft Q&A. One of the most ambitious and impactful projects our engineers have built recently is Ask Learn, an API that provides generative AI capabilities to Microsoft Q&A and the ground truth necessary to power the new Microsoft Copilo...

Mar 4, 2024

Enhancing reliability in Microsoft Fabric and Azure Synapse through load testing

Predrag Vlatković

Microsoft has employed Azure Load Testing to enhance the reliability of Microsoft Fabric and Azure Synapse, ensuring they can handle high loads. Azure Synapse integrates various data analytics technologies, while Microsoft Fabric offers a full enterprise analytics solution. Through rigorous daily and weekly load testing, involving complex scenarios and extensive data sizes, Microsoft aims to identify and rectify potential issues, ensuring optimal performance. This testing, integrated within their development pipelines, supports continuous improvement, leverages Azure's scalability, and utilizes Power BI for detai...

Dec 3, 2023

Accessibility Insights now supports WCAG 2.2 AA

Nandita, Jacqueline, Mark

To celebrate the International Day of Persons with Disabilities on December 3rd, we have some exciting new announcements for Accessibility Insights, Microsoft’s open-source suite of tools to help developers deliver accessible software! Technology plays a huge role in empowering everyone, including people with disabilities around the globe. Developers can now build with more accessibility in mind using Accessibility Insights for Web: This updated version includes testing support and guidance for WCAG 2.2 within the Assessment feature. We are constantly striving to improve user experience and added features that ma...

Nov 15, 2023

Building Paved Paths: The Journey to Platform Engineering

Amanda Silver

Over the past year, AI has taken the world by storm. Our industry is innovating at an unprecedented rate, bringing incredible products to market that make life and work easier and more efficient for real people across a wide range of sectors and job functions. Like previous industry shifts—the introduction of the PC, internet, and search—it’s a pretty safe bet that we’ll look back at this moment and see the world before AI, and a world powered by AI. It’s an all-up mindset shift that fundamentally changes how we interact with technology. In the developer space, GitHub Copilot has become the most widely-adopted AI...

Nov 2, 2023

Copy-on-Write in Win32 API Early Access

Erik Mavrinac

(Updated Apr 4 and 26, 2024 with some release news. Also see the next post.) On October 25, 2023, the Windows filesystem team released an early preview of copy-on-write (CoW) linking in the Windows 11 Insider Canary channel. This builds automatic CoW linking into the Win32 APIs when using Dev Drive or ReFS. If released next year, this will eliminate the need to update build engines, tools, and runtime frameworks to support CoW. Related release information is here. We released some early benchmarks showing the automatic gain for .NET without the need for explicitly updating tools to use CoW. In testing a large ...

Oct 13, 2023

Dev Drive is Now Available

Erik Mavrinac

(Edited Oct 31, 2023 to add info about later patch for InTune, Nov 6 and 8, 2023 to add Win11 23H2 image info, Apr 4, 2024 to add info about Server. Also see the next post and the one after.) In a previous post, Dev Drive and Copy-on-Write for Developer Performance, we published early performance numbers for the new Dev Drive feature of Windows 11 and Windows Server. This week’s Windows Update for Windows 11 22H2 includes Dev Drive and you can check by running the command and seeing if the parameter is listed in the help text. If Dev Drive doesn’t appear in the help text, you can explicitly enable it by insta...

Aug 24, 2023

Your Most Important Git Repos

Bryan Sullivan

What do you keep in your Git repos? Source code for your production applications certainly, but you probably also keep a fair amount of experimental and “hackathon” code. Maybe you keep your documentation in Git. Maybe, like the District of Columbia does, you even keep legal documents there. So which of these are the most important to protect? From the perspectives of access control and change management, clearly, they’re all vital. You might not want prying eyes seeing your internal documentation, and you certainly wouldn’t want them tampering with it. But what about from an application security perspective? Th...

Jul 31, 2023

Load testing AAD-based authentication for Azure Cache for Redis

Rohit Anand

At Microsoft, we continue working on modernizing our services to make them faster, more reliable, and up to date with the latest technologies. In this blog post, we’ll cover how Azure Load Testing helped ensure that the Azure Active Directory (AAD) based authentication mechanism for Azure Cache for Redis met the performance criteria. Azure Cache for Redis is a fully managed, in-memory cache that enables high-performance and scalable architecture. In May 2023, Azure Cache for Redis launched a password-free authentication mechanism by integrating with AAD. This integration also included role-based access control f...

May 24, 2023

Dev Drive and Copy-on-Write for Developer Performance

Erik Mavrinac

At Microsoft Build 2023 the Windows team announced Dev Drive, a new evolution of the Windows ReFS filesystem retuned for developer workloads like Git and builds. This new functionality will ship later this year in the Windows 11 23H2 refresh and is available now for early testing via the Windows Insider program.

May 23, 2023

Microsoft Dev Box for Microsoft engineers

Josh Zimmerman

We’re in an exciting time for technology. But to take advantage of the opportunities, it’s critical for developers to have access to the tools and resources that can help them stay productive and do their best work. At Microsoft, we’re migrating many of our developers to highly productive…

May 22, 2023

The Journey to Secure the Software Supply Chain at Microsoft

Adrian Diglio

A secure software supply chain represents another facet of Microsoft's built-in security to enhance and maintain trust in our products. It’s a continuation of the journey we embarked upon with the launch of the Security Development Lifecycle (SDL) in 2004 and represents our commitment to continually enhance Microsoft’s foundational security.

Mar 15, 2023

Implementing an accessible, checkable WPF Tree View

Sarah Oslund

The Accessibility Insights team recently fixed a bug in our Windows Presentation Foundation (WPF) app where checkboxes in a WPF tree view were not properly reporting their checked or unchecked state to adaptive technologies such as screen readers. This longstanding issue created a sub-par accessible experience in Accessibility Insights for Windows, our Windows app introduced in John’s December 13, 2021, post. Previously, we worked around the problem to make our app usable for adaptive technologies, but with this fix we have improved the experience to meet industry standard accessibility patterns. As a team workin...

Dec 15, 2022

Learnings from migrating Accessibility Insights for Web to Chrome’s Manifest V3

Sarah Oslund

Since February 2022, the Accessibility Insights team has been migrating Accessibility Insights for Web, our Chrome and Edge extension introduced in Jacqueline's February 14, 2022, post, from Manifest V2 (MV2) to Manifest V3 (MV3). We wanted to share learnings and takeaways from our migration journey with a walkthrough…

Jul 12, 2022

Microsoft open sources its software bill of materials (SBOM) generation tool

Danesh, Adrian

We are excited and proud to open source our software bill of materials (SBOM) generation tool. A key requirement of the Executive Order on Improving the Nation's Cybersecurity, SBOMs are lists of ingredients that make up software components, providing software transparency so organizations have insight into their supply chain dependencies. Our SBOM tool is a general purpose, enterprise-proven, build-time SBOM generator. It works across platforms including Windows, Linux, and Mac, and uses the standard Software Package Data Exchange (SPDX) format. (To see the previous announcement about our SBOM tool, please re...

Mar 29, 2022

The pursuit of an autonomic scale and efficiency system for Microsoft 365: Making it as easy as breathing

Randy Lehner

Through automated profiling and data collection of performance behavior, Microsoft’s M365 Core team can derive the context with which to inform the engineer about the impact of their code, as they write it. Randy Lehner likens it to the autonomic nervous system in this post on their Cloud Profiling and Reporting Pipeline.

Feb 14, 2022

Accessibility Insights for Web

Jacqueline Gibson

In this post, Jacqueline Gibson goes over Accessibility Insights for Web, Microsoft's open-sourced Chrome and Edge extension that helps users find and fix web accessibility issues.

Feb 1, 2022

Improving developer productivity via flaky test management

Suresh Thummalapenta

Flaky tests are a well-known problem across the industry and Microsoft is no exception. In this post, Suresh Thummalapenta walks us through the team's comprehensive flaky test management system that helps to infer, triage, and quarantine those tests.
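One common way to infer flakiness, shown here only as a sketch and not as the system described in the post, is to flag any test that both passed and failed against the same commit, since the code under test did not change between those runs. The `RunRecord` type, the `infer_flaky` function, and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set

class RunRecord(NamedTuple):
    """One execution of one test against one commit (hypothetical schema)."""
    test: str
    commit: str
    passed: bool

def infer_flaky(runs: Iterable[RunRecord]) -> Set[str]:
    """Return names of tests with both a pass and a fail on the same commit."""
    outcomes = defaultdict(set)
    for run in runs:
        outcomes[(run.test, run.commit)].add(run.passed)
    return {test for (test, _commit), seen in outcomes.items() if seen == {True, False}}

history = [
    RunRecord("test_upload", "abc123", True),
    RunRecord("test_upload", "abc123", False),   # same commit, different result
    RunRecord("test_login", "abc123", False),
    RunRecord("test_login", "def456", True),     # changed only after a new commit
]
print(infer_flaky(history))  # {'test_upload'}
```

Tests flagged this way are candidates for triage and quarantine; `test_login` is not flagged because its outcome changed only across commits, which is consistent with a genuine fix.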

Dec 13, 2021

Accessibility Insights for Windows

John Alkire

In this post, John Alkire walks through the features of Accessibility Insights for Windows, which enables users to inspect and test Windows applications to find and fix accessibility issues.

Oct 25, 2021

CloudTest: A multi-tenant, scalable, performant and extensible verification service

Sina Jafari

In this post, Sina Jafari discusses key characteristics of the CloudTest infrastructure used at Microsoft and why similar characteristics should be considered in all large-scale test infrastructures to improve engineers’ productivity and help them ship high-quality software.

Oct 13, 2021

Generating Software Bills of Materials (SBOMs) with SPDX at Microsoft

Adrian Diglio

In this post, Adrian Diglio walks us through how Microsoft is planning to generate SBOMs not just to meet the U.S. Presidential Executive Order on Improving the Nation's Cybersecurity, but for all software that Microsoft produces.

Sep 27, 2021

Caesar, standards, and SAST: The road to SARIF

Michael C. Fanning

In this post, Michael Fanning gives us a short history on standards (think Julius Caesar), how consensus on something very small can enable something very large, and how all of it relates to the design of the ‘Static Analysis Results Interchange Format’ (SARIF).

Sep 16, 2021

You can’t have security for DevOps until you have DevOps for security

Bryan Sullivan

The faster we iterate on refining secure development practices, the faster our developers can address security pain points, and the better we protect our customers. In this post, Bryan Sullivan walks through key learnings from the 1ES Security team.

Aug 18, 2021

Large-scale distributed builds with Microsoft Build Accelerator

Michael Pysson

Learn how Microsoft evolved its build caching algorithm with BuildXL to support large-scale distributed builds.

Jul 19, 2021

Shifting accessibility left with Accessibility Insights

Mark Reay

We believe that we can only solve the problem of inaccessible software by shifting accessibility left into the software design and development cycle. In this post, Mark Reay describes how our open-source offering, Accessibility Insights, can help.

Jul 6, 2021

Separating the signal from the noise

Bryan Sullivan

If a security tool catches a critical vulnerability, but also reports 99 other findings that turn out to be false positives, developers are going to ignore everything that the tool reports and then miss the important issues. Bryan Sullivan talks through how you can hone your tooling to separate the signal from the noise.
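The scenario above can be expressed as precision, the share of reported findings that are real. A small worked example follows, with the helper name `precision` chosen purely for illustration:

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of reported findings that are genuine issues."""
    reported = true_positives + false_positives
    if reported == 0:
        raise ValueError("the tool reported no findings")
    return true_positives / reported

# One critical vulnerability reported alongside 99 false positives.
p = precision(true_positives=1, false_positives=99)
print(f"{p:.0%}")  # 1%
```

At 1% precision, a developer has to review about 100 findings to reach the one that matters, which is why the noise drowns out the signal.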