How we use your PerfWatson data to identify Unresponsive areas
At the end of April, we released a telemetry system to monitor and report the performance issues that our customers face during their everyday use of the product. To start off, I would like to thank everyone who installed Visual Studio PerfWatson! These reports have provided valuable insight into where you are encountering issues, and have helped us prioritize our performance investments for the next version of Visual Studio (“vNext”).
How are we using PerfWatson Data?
As many of you have seen, PerfWatson monitors delays on the UI thread and, with the user’s consent, submits error reports on these delays. Each report includes the duration of the delay and a mini heap dump that captures the root cause of the delay.
When we receive these error reports, we aggregate the data collected, extract the stacks from the mini dump, and analyze the results to identify the largest and most frequently seen delays. We also combine this data with the delays we have seen while using PerfWatson internally. Periodically throughout each milestone, bugs are generated for each team in the division to track the specific subsets of frames (or methods) that are responsible for these performance issues.
Bugs are prioritized based on a combination of the total number of delays we have seen with that frame and the average length of the delay. This ensures that both very long delays and frequently encountered delays receive our attention. Because we have live data associated with each bug, we can track progress across the entire product, within specific teams, and also across internal “dogfood” builds, enabling teams to easily verify the progress they are making.
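To make the prioritization idea concrete, here is a minimal sketch in Python. It is not the actual PerfWatson pipeline: the `DelayReport` type, the frame names, and the choice of score (delay count multiplied by average delay) are assumptions made purely for illustration, since the exact formula is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DelayReport:
    frame: str          # hypothetical name of the frame blamed for the delay
    duration_ms: float  # how long the UI thread was blocked


def prioritize(reports):
    """Group reports by frame and rank frames by count * average delay."""
    durations = defaultdict(list)
    for report in reports:
        durations[report.frame].append(report.duration_ms)

    ranked = []
    for frame, values in durations.items():
        count = len(values)
        average = sum(values) / count
        # Assumed score: frequent delays and long delays both raise it.
        ranked.append((frame, count, average, count * average))
    ranked.sort(key=lambda row: row[3], reverse=True)
    return ranked


# Made-up sample data: one very long delay and many shorter ones.
reports = (
    [DelayReport("ToolboxInit", 4000.0)]
    + [DelayReport("ErrorListRefresh", 300.0)] * 20
    + [DelayReport("MenuUpdate", 250.0)] * 2
)
for frame, count, average, score in prioritize(reports):
    print(frame, count, average, score)
```

With this sample data, the frequently hit `ErrorListRefresh` frame (20 delays of 300 ms) ranks ahead of the single 4000 ms `ToolboxInit` delay, and both rank ahead of `MenuUpdate`. A different weighting would change the order, which is why the score is labeled as an assumption.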
Below is an example of a bug we have fixed for vNext. The PerfWatson data shows that this bug was reported 16588 times in total for the released build of Visual Studio 2010 SP1. This data is an aggregation of all results we have collected both internally and from our customers.
As the chart below shows, after we fixed the bug, the number of delays seen for this issue dropped significantly or disappeared completely in every new build of the product that contains the code fix.
If you are interested in learning more about this process, or seeing some of the tools in action, check out this Channel9 video, where Cameron McColl from our product team discusses these mechanics in more detail.
So with all the data we have collected so far, what trends have we seen? Below are three areas that have stood out:
Solution Load
While we had received feedback that loading a solution can be slow, PerfWatson has given us a much clearer picture of where, specifically, delays are most likely to occur. A number of steps take place when a solution is opening, each contributing a certain amount of cost to the overall experience. However, when looking at the data, we’ve found that several tasks were more expensive than we expected. For example, reopening files that were open in the last session, or retrieving solution settings such as breakpoints and window layout from the SUO (Solution User Options) file, can take more than a few milliseconds, extending the overall cost of loading the solution.
One way we have found to improve this experience is to reduce the overall document clutter. Files get opened in many ways besides clicking on them in Solution Explorer: Go To Definition, debugging, Find, and so on. This experience has now been simplified with the introduction of a “preview tab” for opening files that are not yet needed by the user. This new IDE feature was demoed during the TechEd conference held mid-May (see Channel 9 coverage at the 00:27:20 marker).
In addition to simplifying certain scenarios like opening files, the Visual Studio team is investigating several ways to optimize the solution open scenario for vNext. One method we are investigating, for example, is moving operations away from solution load time to a point when there is less competition for system resources. The PerfWatson data is helping us understand which areas would benefit from these improvements, and it helps us make sure there are no regressions.
Idle Loop Processing
Visual Studio has a mechanism for components to queue up non-blocking work for a time when CPU cycles are less constrained (that is, when the system is “idle”). These idle operations take place on the UI thread when the message queue is empty. For information on Idle Loop Processing, please refer to the MSDN documentation here.
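The following Python sketch shows the general shape of such an idle loop. It is a toy model, not the Visual Studio implementation: the `IdleLoop` class and its method names are hypothetical, and the only behavior it takes from the description above is that idle work runs on the same thread as the UI messages, and only when no message is pending.

```python
from collections import deque


class IdleLoop:
    """Toy UI loop: messages always win; idle tasks run only when none are pending."""

    def __init__(self):
        self.messages = deque()    # pending UI messages (callables)
        self.idle_tasks = deque()  # work queued for idle time (callables)

    def post_message(self, handler):
        self.messages.append(handler)

    def queue_idle(self, task):
        self.idle_tasks.append(task)

    def step(self):
        """Run one unit of work; return False when nothing is left."""
        if self.messages:
            self.messages.popleft()()
            return True
        if self.idle_tasks:
            # The message queue is empty, so the loop is "idle".
            self.idle_tasks.popleft()()
            return True
        return False

    def run(self):
        while self.step():
            pass


log = []
loop = IdleLoop()
loop.queue_idle(lambda: log.append("idle work"))
loop.post_message(lambda: log.append("message 1"))
loop.post_message(lambda: log.append("message 2"))
loop.run()
print(log)
```

Even though the idle task was queued first, both messages are handled before it runs. The sketch also shows the weakness of the approach: once an idle task starts, no message can be processed until it returns, so a slow idle task blocks the UI.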
This mechanism worked well when the product was smaller, processors were single core, and only a few operations utilized this mechanism. However, as the product has grown, this mechanism has hit its scalability limits as more tasks have started to schedule recurring computations and expensive IO operations during the “idle loop”.
PerfWatson has enabled us to quickly identify which of these scheduled tasks can become potentially expensive. The following are a few examples that we are tracking as bugs:
- Refreshing the Task and Error Lists.
- Initializing the Toolbox.
- Updating the outlining for VC++ code windows.
- Updating command bars and menus.
- Saving backup copies of all open files for auto recovery.
For each task, we look at a variety of ways to improve the performance. For tasks that iterate over large data sets, we look for ways to break the assumption that the work needs to be completed as a single operation. This allows the task to be partitioned into much finer increments, quickly yielding back to the UI thread. For example, our Auto-recover mechanism has been updated to do this. Rather than saving all changed documents at once, we instead save one at a time, yielding back to the UI thread after each save. For others, we’ve rewritten the tasks to execute asynchronously off of the UI thread, allowing us to take better advantage of multi-core hardware and remove expensive IO operations from the UI thread.
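The one-document-at-a-time pattern described above can be sketched with a Python generator. This is only an illustration of the partitioning idea, not the Auto-recover code itself: `save_backups_incrementally`, `run_idle_slice`, and the file names are hypothetical.

```python
def save_backups_incrementally(dirty_documents, save_backup):
    """Save one document per increment, pausing after each save."""
    for document in dirty_documents:
        save_backup(document)
        yield document  # hand control back to the caller (the UI loop)


def run_idle_slice(pending_work):
    """Advance the task by one increment; return False once it is finished."""
    try:
        next(pending_work)
        return True
    except StopIteration:
        return False


saved = []
work = save_backups_incrementally(["Program.cs", "Form1.cs"], saved.append)

# Each idle slice saves exactly one document, so the UI thread
# regains control between saves instead of waiting for all of them.
while run_idle_slice(work):
    print("saved so far:", saved)
```

The design choice is that the task itself decides where it is safe to pause, while the loop decides when to resume it. That keeps each increment short without requiring any threads.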
Idle issues are not easy to reproduce; however, with the use of PerfWatson data collected both internally and from you, our customers, we have been able to identify and open bugs for over 30 idle-specific issues.
Build
PerfWatson data indicates that one of the more frequent areas that cause the IDE to be unresponsive is solution build. For vNext, we have made significant investments to improve the overall build responsiveness by moving the build off the UI thread for VB and C# builds (C++ already does this). This feature was demoed at the TechEd conference (Channel 9 coverage at the 00:45:00 marker).
As part of doing this work, we found that not only could we make the UI more responsive during build but, for C#, we were even able to parallelize the build, improving both build throughput and memory utilization. (VB was best optimized by moving it off the UI thread.)
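As a rough sketch of the general technique (running builds on worker threads while the UI thread stays free), consider the Python example below. It is not how the VB or C# build is implemented: `build_project` is a stand-in that just sleeps, and `build_in_background` and `pump_ui` are hypothetical names. Python threads are used here only to show the structure, not to demonstrate real build throughput.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build_project(name):
    """Stand-in for a real build; sleeps to simulate work."""
    time.sleep(0.05)
    return f"{name}: built"


def build_in_background(projects, pump_ui):
    """Build projects on worker threads while the caller keeps pumping the UI."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(build_project, p) for p in projects]
        while not all(f.done() for f in futures):
            pump_ui()          # the calling thread stays free to process input
            time.sleep(0.005)  # avoid spinning while waiting
        # Results come back in the same order the projects were submitted.
        return [f.result() for f in futures]


ui_ticks = []
results = build_in_background(["ProjectA", "ProjectB"], lambda: ui_ticks.append(1))
print(results)
```

Because the projects are submitted to a pool, independent ones can build at the same time, and the calling thread never blocks on any single build.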
We hope this blog post has given you some insight into how we’re using the PerfWatson data to make improvements to the product. Thanks again to everybody who has already downloaded and installed the extension. For those of you who have not yet done so, the extension can be downloaded from here. As always, we welcome your comments and feedback!
Software Development Engineer in Test, VS Team