Improving Visual Studio performance with the new Instrumentation Tool
With the release of Visual Studio 2022 version 17.6, we are shipping our new and improved Instrumentation tool in the Performance Profiler. Unlike the CPU Usage tool, the Instrumentation tool gives exact timing and call counts, which can be super useful for spotting blocked time and average function time. To show off the tool, let’s use it to improve performance in Visual Studio.
Wait, I thought Visual Studio already had an Instrumentation tool?
If you were thinking, “Doesn’t Visual Studio already have an Instrumentation tool?” then you were thinking right! So, what’s… new? Well, a whole list of things:
- Faster and fewer resources: The tool is significantly faster and uses less disk space; clone the repo and check out the measurements for yourself. Sample App: ScabbleFinderDotNet
- Enhanced targeting for .NET: The tool has better targeting for .NET scenarios so you can scope your instrumentation all the way down to specific functions to lower overhead and get better data.
- Flame graph: The flame graph helps you graphically see where the most time is spent in your application and quickly narrow in on areas to improve.
- Better error handling: The tool helps with common issues such as missing the “/profiler” linker flag for C++ projects. If you do hit issues that you can’t figure out, we are ready to help over at Developer Community.
Let’s go mine for performance!
To start, I have a console application called AnalyzerBench which takes a diagsession from the Performance Profiler, runs our analysis backend on it, and then exits. This gives me a repeatable benchmark I can measure to see the effect of my changes. The diagsession I have is from tracing every allocation of Visual Studio startup (over 4.3 million allocations) with the .NET Object Allocation tool. Running the Instrumentation tool from the Performance Profiler (Alt+F2), I get the following dialog:
This lets me select which projects I want to instrument, allowing me to target my instrumentation and reduce overhead. In my case I want to see our analysis for the .NET Allocation tool, so I select our DataWarehouse and DotNetAllocAnalyzer projects but not AnalyzerBench, since I don’t care about that. If you are unsure what you want to profile, the CPU Usage tool is great for getting a general idea of where time is being spent; you can then investigate deeper with the Instrumentation tool targeting specific areas. After running the tool, I see the following:
The top functions view shows which functions are taking the most time, and the hot path shows the most expensive code path. My go-to is to open the details panel and switch over to the flame graph, which shows the following:
Looking at the flame graph, I see System.Threading.Monitor.Enter taking ~20% of the time, which is interesting. By right-clicking the node, I can cross-reference to where this is happening in the call tree.
It turns out our Monitor.Enter call is showing as a hot function, and its parent ImportDataSource is taking ~17% of the total time. In the call tree you’ll notice a few new columns, with more hidden in the context menu of the header. Since the Instrumentation tool gives us exact call counts, we can calculate stats like min, max, and average function time. Not only does instrumentation give us exact call counts, it also measures wall clock time, which lets us see non-CPU-bound issues like contention. In this case, based on the call count we can see that there are three calls to ImportDataSource, and they spend about 5s on average waiting on the lock. What is really happening is that the first call takes the lock and the other two wait ~8s for the first data source to complete before they can import (0s + ~8s + ~8s across three calls averages out to roughly 5s). This means two thread pool threads are synchronously blocked and can’t do any work, which could lead to thread pool starvation.
To fix this, we could investigate using concurrent data structures so that the lock isn’t needed, or change the method signature to be async and use SemaphoreSlim.WaitAsync so that we at least don’t block thread pool threads. As both changes are a bit more involved, we can add a TODO to the code and come back later.
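The second option can be sketched roughly like this. This is a simplified illustration, not the actual product code: the `DataSourceImporter` type and `ImportDataSourceAsync` method are hypothetical stand-ins for `ImportDataSource`, and the point is only that `SemaphoreSlim.WaitAsync` lets a waiting caller yield its thread pool thread back to the pool instead of blocking it the way `Monitor.Enter` does.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: serialize imports with SemaphoreSlim.WaitAsync so
// waiting callers don't block thread pool threads.
internal sealed class DataSourceImporter
{
    // Capacity 1 makes the semaphore behave like an async-friendly lock.
    private readonly SemaphoreSlim importLock = new SemaphoreSlim(1, 1);

    public int CompletedImports { get; private set; }

    public async Task ImportDataSourceAsync()
    {
        // Unlike Monitor.Enter, this await yields the thread while waiting.
        await this.importLock.WaitAsync().ConfigureAwait(false);
        try
        {
            // ... import the data source; only one caller runs here at a time ...
            await Task.Delay(10).ConfigureAwait(false);
            this.CompletedImports++;
        }
        finally
        {
            this.importLock.Release();
        }
    }
}
```

With this shape, the second and third callers still wait their turn, but the threads they started on are free to run other work in the meantime.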
Switching back to the flame graph, the next thing that jumps out at me is List.Sort; at a glance it looks like we are spending ~20% of our time sorting data. Again, right-clicking the node and cross-referencing to the call tree, we can see the detailed stats: we are calling sort over 24K times, spending 20 seconds sorting data!
In this code we are sorting the allocations for each unique type, which is needed to quickly filter when selecting time ranges on the graph in the profiler. For the most part, though, these allocations should already be sorted, since we get a callback for each allocation and write it to our diagsession file as it happens. If many allocations occur at the same time on different threads they might land out of order, but that is something we can check for when adding to the list, and then only sort the lists that actually need it. Making this change and rerunning the Instrumentation tool, we can see that we have removed nearly all of the time spent here; in the case of this trace we didn’t have to sort anything, as all of the allocations were already sorted in the file.
/// Add an allocation instance to the current type
/// <param name="allocationObject">allocation object</param>
internal void AddAllocation(AllocationObject allocationObject)
{
    if (this.Allocations.Count > 0)
    {
        // Stays true only while allocations arrive in timestamp order.
        this.allocationsSorted &= allocationObject.AllocTimeNs >= this.Allocations[this.Allocations.Count - 1].AllocTimeNs;
    }

    this.Allocations.Add(allocationObject);
}

/// Finalizes the data for fast retrieval
public void FinalizeData()
{
    // Only pay for the sort when an out-of-order allocation was seen.
    if (!this.allocationsSorted)
    {
        this.Allocations.Sort((a, b) => a.AllocTimeNs.CompareTo(b.AllocTimeNs));
    }

    this.allocationsSorted = true;
}
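Stripped of the product types, the same sort-on-demand pattern can be sketched and exercised in isolation. The `SortedOnDemandList` type below is purely illustrative: a flag tracks whether every append arrived in order, and the O(n log n) sort is skipped for the common already-sorted case.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch of the sort-on-demand pattern (not the product code):
// track sortedness at append time, sort only the lists that need it.
internal sealed class SortedOnDemandList
{
    private readonly List<long> items = new List<long>();
    private bool sorted = true;

    // Exposed so callers (and tests) can see whether a sort actually ran.
    public bool SortWasNeeded { get; private set; }

    public void Add(long value)
    {
        if (this.items.Count > 0)
        {
            // Stays true only while each value is >= the previous one.
            this.sorted &= value >= this.items[this.items.Count - 1];
        }

        this.items.Add(value);
    }

    public IReadOnlyList<long> FinalizeData()
    {
        // Skip the O(n log n) sort when the data arrived in order.
        if (!this.sorted)
        {
            this.items.Sort();
            this.SortWasNeeded = true;
        }

        this.sorted = true;
        return this.items;
    }
}
```

For a trace like the one above, where every list happens to arrive in order, `FinalizeData` never sorts at all, which is exactly why the time disappeared from the profile.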
A small change to track the sort state of the lists trimmed 20 seconds off a 110-second trace, a ~20% performance improvement.
The new Instrumentation tool is awesome (at least I think so 😊), and a little performance investigation can go a long way. In less than an hour of profiling and poking at the code, I improved the load performance of the .NET Allocation tool by ~20%. Let me know what you found while profiling your code and what improvements you were able to achieve with the new Instrumentation tool!