July 31st, 2025

Smarter AI Edits in Visual Studio Copilot

When we first set out to deliver smarter AI edits in Visual Studio Copilot, we knew we were tackling a deeply complex problem. It wasn't just about generating great suggestions; it was about seamlessly applying those suggestions to your code. While the idea seemed simple at first glance, the reality was anything but.

The Complexity of Implementing AI-Generated Edits 

Let’s rewind to our early attempts. Copilot might give you a fantastic suggestion—a new method, a helpful refactor, or maybe even a corrected logic block. But the real challenge began when we tried to insert those changes into the existing file without breaking anything. Where does the edit go? What happens if the file has been updated since the suggestion was generated? What if the model’s output inadvertently introduces conflicts, overlaps existing code, or even forgets a required bracket? These questions made the process tricky, and the resulting red squiggles made it difficult to trust AI suggestions. 

In the early days, we approached this problem in the most straightforward way possible: brute force. We used heuristics and rule-based techniques like string matching and pattern recognition to identify where edits might belong. It worked… kind of. The results were often inconsistent, especially for complex edits spanning multiple lines. And as Copilot’s capabilities expanded to support more languages and scenarios, maintaining these rules became a moving target—always shifting, always growing harder to manage. Success rates hovered around 50%, which was far from ideal. In hindsight, it became increasingly clear that we couldn’t outpace the ever-evolving models with static rules.   
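For a flavor of why the rule-based approach was brittle, consider a deliberately naive sketch (hypothetical names, not our production code): it anchors the edit with exact string matching and fails the moment the file drifts from what the model saw.

```python
# Hypothetical sketch of a rule-based "apply edit" heuristic, for
# illustration only; the names and logic are assumptions, not Copilot's
# production code.

def apply_edit_naive(file_text: str, anchor: str, replacement: str) -> str:
    """Replace the first exact occurrence of `anchor` with `replacement`.

    Breaks down whenever the file has drifted from what the model saw:
    reformatted whitespace, renamed symbols, or multiple matches.
    """
    index = file_text.find(anchor)
    if index == -1:
        # No exact match: the heuristic has nowhere safe to put the edit.
        raise ValueError("anchor not found; edit cannot be applied")
    return file_text[:index] + replacement + file_text[index + len(anchor):]
```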

Better Models Meet Smarter Techniques 

As AI technology advanced, we saw an opportunity to revisit this problem with a fresh perspective. Two major developments in AI made it possible to rethink how we could get smarter edits in Visual Studio Copilot: modern models with larger context windows and a groundbreaking technique called speculative decoding.   

Speculative decoding became a game changer for speeding up AI-assisted edits. It works by pairing a fast draft model with a more sophisticated one: the fast model predicts tokens for the edit ahead of time, while the smarter model verifies those predictions and steps in only when they need correcting. This collaboration improved average token generation speed by 2-3x and made it practical to apply the generated edits with a model, which significantly improved accuracy when integrating changes into your files.
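For readers curious how this works mechanically, here is a minimal greedy sketch, assuming two hypothetical callables `draft_next` and `target_next` that each return the next token for a given prefix. It is illustrative only: a production implementation verifies all drafted tokens in a single batched pass through the large model, which is where the speedup comes from.

```python
# Minimal greedy sketch of speculative decoding. `draft_next` and
# `target_next` are hypothetical stand-ins for a small, fast model and a
# larger verifier; real systems score the whole draft in one batched pass.

def speculative_decode(prompt, draft_next, target_next, k=8, max_tokens=256):
    tokens = list(prompt)
    while len(tokens) < max_tokens:
        # 1. The fast draft model proposes k tokens ahead of time.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2. The stronger model checks the draft, accepting the longest
        #    prefix it agrees with and correcting the first mismatch.
        accepted = []
        for tok in draft:
            expected = target_next(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # the target model's correction
                break
            accepted.append(tok)
        tokens.extend(accepted)
    return tokens
```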

Speculative decoding let us adopt a model-based approach to applying edits, one that fills the gaps the rule-based approach could never reach. Instead of relying on rule sets, we employ an AI model to generate an "ideal" version of your file with the chosen suggestion seamlessly integrated. A smart diff algorithm then compares this ideal version with your actual file to pinpoint and precisely map the edits. This allows the entire process to handle edge cases, like overlapping code or missing syntax, more intelligently than ever before.
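As a rough illustration of the diff step, the sketch below uses Python's standard difflib to compare a model-generated "ideal" file against the current file and extract discrete edit operations. Treat it as an approximation under stated assumptions; the diff algorithm Visual Studio actually uses is more sophisticated.

```python
import difflib

# Sketch of mapping an "ideal" file back onto the real file as discrete
# edits. `compute_edits` is a hypothetical helper, not a Visual Studio API.

def compute_edits(actual: str, ideal: str):
    """Yield (start_line, end_line, replacement_lines) operations that
    transform `actual` into `ideal`, using 0-based line ranges."""
    actual_lines = actual.splitlines()
    ideal_lines = ideal.splitlines()
    matcher = difflib.SequenceMatcher(a=actual_lines, b=ideal_lines)
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete', or 'insert'
            yield a1, a2, ideal_lines[b1:b2]
```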

Balancing Accuracy and Speed 

While speculative decoding dramatically improved performance, applying AI edits still came with a natural trade-off: using models introduces latency. Previously, all string computation was done locally, so edits appeared almost instantly. Now, every edit involves network calls and token generation, a process that unfolds incrementally as a token stream rather than as a single, instant response.

To make this wait feel meaningful, we implemented a streaming animation in the editor. As edits are detected on the token stream, users see their document updating line by line in real time. This animation serves two purposes: it shows progress, and it gives visibility into exactly how the changes are being applied. Although this approach trades some of the raw speed we had before for greater accuracy, feedback has shown that users value precision over speed, especially when it comes to maintaining code quality and reducing disruption.
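Conceptually, the animation consumes the token stream and flushes each completed line to the editor, along the lines of the sketch below. The `token_stream` iterable and `editor.replace_line` method are hypothetical stand-ins, not Visual Studio APIs.

```python
# Sketch of turning an incremental token stream into line-by-line editor
# updates; all names here are illustrative assumptions.

def stream_edits(token_stream, editor, start_line: int):
    buffer = ""
    line_no = start_line
    for token in token_stream:
        buffer += token
        # Flush every completed line so the document animates in real time
        # instead of waiting for the full response to arrive.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            editor.replace_line(line_no, line)
            line_no += 1
    if buffer:  # flush any final partial line once the stream ends
        editor.replace_line(line_no, buffer)
```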

Looking Ahead

This work laid the foundation for Agent Mode in Visual Studio Copilot—a feature that enables the AI to not only suggest improvements but actively assist in executing them. For Agent Mode to function effectively, the edits needed to be precise, seamless, and reliable enough for the agent to build, debug, and test the code. Speculative decoding has already made significant strides in balancing speed and accuracy, but we’re not stopping there.

A faster implementation of our speculative decoding technique is on the horizon. This updated version will leverage advancements in token generation and model pairing to further reduce latency while maintaining the accuracy users have come to expect. Early tests show a promising 2-4x speed boost compared to the original rollout, moving us closer to the ideal experience where precision meets near-instant response times.

Authors

Jessie Houghton
Product Manager II

Jessie is a Product Manager on the Visual Studio version control team, focusing on Git tooling and GitHub integration in the IDE.

Rhea Patel
Product Manager
