Engineering@Microsoft

How Microsoft empowers its developers to deliver at massive scale

Latest posts

Engineering and algorithmic interventions for multimodal post-training at Microsoft scale
Feb 27, 2026

Aditya Challapally

Aditya Challapally leads post-training research and infrastructure for Copilot agent capabilities that process millions of multimodal interactions. This post builds on the diagnostics from Diagnosing instability in production-scale agent reinforcement learning, adding the engineering and algorithmic interventions we developed to get the best results out of post-training at scale. Post-training multimodal agents at scale breaks in ways the literature doesn't prepare you for. Not because the algorithms are wrong (they work as described), but because the failure modes only become visible at production scale, under rea...

How we built the Microsoft Learn MCP Server
Feb 11, 2026

Tianqi, Eric, Pieter

When we launched the Microsoft Learn Model Context Protocol (MCP) Server in June 2025, our goal was simple: make it effortless for AI agents to use trusted, up-to-date Microsoft Learn documentation. GitHub Copilot and other agents are increasingly common, and they need to be able to ground responses just like humans with browsers do. Learn MCP Server is a remote server that exposes agent-friendly tools over Streamable HTTP Transport, backed by the same Learn knowledge service described in How we built “Ask Learn”. Why MCP and why Learn MCP Server? Modern AI agents can discover and use tools dynamically through ...

Diagnosing instability in production-scale agent reinforcement learning
Jan 28, 2026

Aditya Challapally

On January 28, 2026, Hugging Face announced that they have upstreamed the Post-Training Toolkit into TRL as a first-party integration, making these diagnostics directly usable in production RL and agent post-training pipelines. This enables closed-loop monitoring and control patterns that are increasingly necessary for long-running and continuously adapted agent systems. Documentation @ https://huggingface.co/docs/trl/main/en/ptt_integration. Overview In production-scale agent reinforcement learning systems, training runs increasingly operate over long horizons, incorporate external tools, and adapt continuousl...

The Interaction Changes Everything: Treating AI Agents as Collaborators, Not Automation
Dec 2, 2025

Jenny Ferries

Discover how treating AI agents as collaborators, not automation, transforms engineering workflows and accelerates complex projects.

Enhancing Code Quality at Scale with AI-Powered Code Reviews
Jul 14, 2025

Sneha Tuli

Microsoft’s AI-powered code review assistant has transformed pull request workflows by automating routine checks, suggesting improvements, and enabling conversational Q&A, leading to faster PR completion, improved code quality, and enhanced developer onboarding. Its seamless integration and customizability have driven widespread adoption within Microsoft.

How Microsoft Engineers Build AI: Learn about scalable RAG-enabled AI Apps
Mar 3, 2025

Krezzia, Samit

For developers, the emphasis on building intelligence into apps has never been clearer. Over the next three years, 92% of companies plan on investing in AI to achieve business outcomes like enhancing productivity and delivering better customer service. At Microsoft, developers and engineers are pushing the boundaries of AI at scale, crafting applications that harness the power of cutting-edge machine learning models and advanced AI techniques. To help both newcomers and seasoned AI developers understand these methodologies, we are thrilled to introduce a new video series – How Microsoft engineers build AI. We’l...

Dev Box Ready-To-Code Dev Box images template
Dec 11, 2024

Dmitry Goncharenko

The Microsoft One Engineering System (1ES) team shares a sample for building Ready-To-Code Dev Box environments that come pre-configured with the necessary tools, repositories, and settings, ensuring consistency and reliability across teams.

Common annotated security keys
Sep 25, 2024

Michael C. Fanning

In April 2021, GitHub announced changes to their security token format that significantly enhanced security. The improvement leveraged two straightforward techniques: a fixed signature in the generated token and a checksum, both of which are highly effective in eliminating false positives (noise) and false negatives (missed findings). Microsoft also implements these techniques widely in our service providers. Internally, we refer to any key format that incorporates both techniques as 'identifiable' (a term also used in GitHub’s blog post). Identifiable secrets super-power open-source scan tools and more sophist...

Managed DevOps Pools – The Origin Story
Jul 18, 2024

Suraj, Eliza

Learn how Microsoft's 1ES organization developed an internal service called "1ES Hosted Pools" to manage Microsoft's diverse engineering system infrastructure and how it helped make significant improvements to productivity, cost savings, and security. This solution will soon be available as a third-party offering named "Managed DevOps Pools".

Developing with Accessibility in Mind at Microsoft
May 16, 2024

Nandita Gupta

Celebrate Global Accessibility Awareness Day (GAAD) by taking easy, actionable steps to build accessibility into your development life cycle! Learn how tools like Accessibility Insights and Visual Studio can help find accessibility issues during development.

Copy-on-Write performance and debugging
May 14, 2024

Erik Mavrinac

This is a follow-up to our previous coverage of Dev Drive and copy-on-write (CoW) linking. See our previous articles from May 24, 2023, October 13, 2023, and November 2, 2023. Dev Drive was released in Windows 11 in October 2023 and will be part of Windows Server 2025 this fall. Server 2025 and Windows 11 24H2 ship with an enhancement to automatically use copy-on-write linking (CoW-in-Win32). Here, we'll cover the results of several months of repo build performance testing for several large internal codebases, provide some information on determining whether a file is a CoW link, and share a few tips we found f...

How we built “Ask Learn”, the RAG-based knowledge service
Apr 22, 2024

Sarah, Bob

My name is Bob Tabor and I’m a member of Microsoft’s Skilling organization. We create documentation and training content about Azure, developer tooling and languages, AI, Windows and much more hosted at Microsoft Learn. Our organization also develops and maintains the content publishing platform, the content hosting platform, the interactivity, and popular sites like Microsoft Q&A. One of the most ambitious and impactful projects our engineers have built recently is Ask Learn, an API that provides generative AI capabilities to Microsoft Q&A and the ground truth necessary to power the new Microsoft Copilo...

Enhancing reliability in Microsoft Fabric and Azure Synapse through load testing
Mar 4, 2024

Predrag Vlatković

Microsoft has employed Azure Load Testing to enhance the reliability of Microsoft Fabric and Azure Synapse, ensuring they can handle high loads. Azure Synapse integrates various data analytics technologies, while Microsoft Fabric offers a full enterprise analytics solution. Through rigorous daily and weekly load testing, involving complex scenarios and extensive data sizes, Microsoft aims to identify and rectify potential issues, ensuring optimal performance. This testing, integrated within their development pipelines, supports continuous improvement, leverages Azure's scalability, and utilizes Power BI for detai...

Accessibility Insights now supports WCAG 2.2 AA
Dec 3, 2023

Nandita, Jacqueline, Mark

To celebrate the International Day for Persons with Disabilities on December 3rd we have some exciting new announcements for Accessibility Insights, Microsoft’s open-source suite of tools to help developers deliver accessible software! Technology plays a huge role in empowering everyone, including people with disabilities around the globe. Developers can now build with more accessibility in mind using Accessibility Insights for Web: This updated version includes testing support and guidance for WCAG 2.2 within the Assessment feature. We are constantly striving to improve user experience and added features that ma...

Building Paved Paths: The Journey to Platform Engineering
Nov 15, 2023

Amanda Silver

Over the past year, AI has taken the world by storm. Our industry is innovating at an unprecedented rate, bringing incredible products to market that make life and work easier and more efficient for real people across a wide range of sectors and job functions. Like previous industry shifts—the introduction of the PC, internet, and search—it’s a pretty safe bet that we’ll look back at this moment and see the world before AI, and a world powered by AI. It’s an all-up mindset shift that fundamentally changes how we interact with technology. In the developer space, GitHub Copilot has become the most widely-adopted AI...

Copy-on-Write in Win32 API Early Access
Nov 2, 2023

Erik Mavrinac

(Updated Apr 4 and 26, 2024 with some release news. Also see the next post) On October 25, 2023, the Windows filesystem team released an early preview of copy-on-write (CoW) linking in the Windows 11 Insider Canary channel. This builds automatic CoW linking into the Win32 APIs when using Dev Drive or ReFS. If released next year, this will eliminate the need to update build engines, tools, and runtime frameworks to support CoW. Related release information is here. We released some early benchmarks showing the automatic gain for .NET without the need for explicitly updating tools to use CoW. In testing a large ...

Dev Drive is Now Available
Oct 13, 2023

Erik Mavrinac

(Edited Oct 31, 2023 to add info about later patch for InTune, Nov 6 and 8, 2023 to add Win11 23H2 image info, Apr 4, 2024 to add info about Server. Also see the next post and the one after.) In a previous post, Dev Drive and Copy-on-Write for Developer Performance, we published early performance numbers for the new Dev Drive feature of Windows 11 and Windows Server. This week’s Windows Update for Windows 11 22H2 includes Dev Drive and you can check by running the command and seeing if the parameter is listed in the help text. If Dev Drive doesn’t appear in the help text, you can explicitly enable it by insta...

Your Most Important Git Repos
Aug 24, 2023

Bryan Sullivan

What do you keep in your Git repos? Source code for your production applications certainly, but you probably also keep a fair amount of experimental and “hackathon” code. Maybe you keep your documentation in Git. Maybe, like the District of Columbia does, you even keep legal documents there. So which of these are the most important to protect? From the perspectives of access control and change management, clearly, they’re all vital. You might not want prying eyes seeing your internal documentation, and you certainly wouldn’t want them tampering with it. But what about from an application security perspective? Th...