Summary
With the release of Visual Studio 2022 version 17.6 we are shipping our new and improved Instrumentation Tool in the Performance Profiler. Unlike the CPU Usage tool, the Instrumentation tool gives exact timing and call counts which can be super useful in spotting blocked time and average function time. To show off the tool let’s use it to improve performance in Visual Studio.
Wait, I thought Visual Studio already had an Instrumentation tool?
If you were thinking, “Doesn’t Visual Studio already have an Instrumentation Tool” then you were thinking right! So, what’s… new? Well, a whole list of things
- Faster and less resources: The tool is significantly faster and uses less disk space, clone the repo and check out the measurements for yourself. Sample App: ScabbleFinderDotNet
- Enhanced targeting for .NET: The tool has better targeting for .NET scenarios so you can scope your instrumentation all the way down to specific functions to lower overhead and get better data.
- Flame Graph: The flame graph helps you graphically see where the most amount of time is spent in your application and quickly narrow on areas to improve.
- Better error handling: The tool helps with common issues such as missing the “/profiler” linker flag for C++ projects. If you do hit issues that you can’t figure out, we are ready to help over at Developer Community.
Let’s go mine for performance!
To start, I have a console application called AnalyzerBench which takes in a diagsession from the Performance Profiler and runs our analysis backend on it then exits. This lets me have a repeatable benchmark that I can measure and see the effects of my changes. The diagsession I have is from tracing every allocation of Visual Studio startup, over 4.3 million allocations, with the .NET Object Allocation tool. Running the Instrumentation tool from the Performance Profiler (Alt+F2) I get the following dialog:
This lets me select what projects I want to instrument, allowing me to target my instrumentation to reduce overhead. In my case I want to see our analysis for the .NET Allocation tool, so I select our DataWarehouse and the DotNetAllocAnalyzer projects but not AnalyzerBench since I don’t care about that. If you are unsure of what you want to profile, the CPU Usage tool is great for getting a general idea of where time is being spent and then you can investigate deeper with the Instrumentation tool targeting specific areas. After running the tool, I see the following:
The top functions show us which functions are taking the most amount of time and the hot path shows us the most expensive code path. My go-to is to open the details panel and switch over to the flame graph which shows the following.
Looking at the flame graph I see a System.Threading.Monitor.Enter
taking ~20% of the time which is interesting. By right clicking on the node, I can cross reference to where this is happening in the call tree.
Turns out our Monitor.Enter
function is showing as a hot function and its parent ImportDataSource
is taking ~17% of the total time. In the call tree you’ll notice there are a few new columns with more hidden in the context menu of the header. Since the instrumentation tool gives us exact call counts, we can calculate stats like min, max, and average function time. Not only does Instrumentation give us exact call counts, but it also measures wall clock time. This lets us see non-CPU bound issues like contention. In this case, based on the call count we can see that there are three calls to ImportDataSource
and they spend about 5s on average waiting on the lock. What is really happening is that the first call gets the lock and the other two wait ~8s for the first data source to complete before they can import. This means 2 thread pool threads are synchronously blocked and can’t do any work which could lead to thread pool starvation.
To fix this we could investigate using some parallel data structures so that the lock isn’t needed or change the method signature so that it is async and use SemaphoreSlim.WaitAsync so that we at least don’t block thread pool threads. As both changes are a bit more involved, we can add a TODO to the code and come back later.
Switching back to the flame graph, the next thing jumping out at me is List.Sort
, from a glance it looks like we are spending ~20% of our time sorting some data. Again, right-clicking on the node and cross referencing over to the call tree we can see our details stats. This shows we are calling sort over 24K times, spending 20 seconds sorting data!
In this code we are sorting the allocations for each unique type which is needed to quickly filter when selecting time ranges on the graph in the profiler. For the most part, though these allocations should be sorted as we get a callback for each allocation and write it to our diagsession file. If there are many allocations at the same time on different threads they might be out of order, but this is probably something we can check for when adding to the list and then only sort the unsorted lists here. Making this change and rerunning the instrumentation tool we can see that now we have nearly removed all the time spent here, and in the case of this trace we didn’t have to sort anything as all of the allocations were already sorted in the file.
/// <summary>
/// Add an allocation instance to the current type
/// </summary>
/// <param name="allocationObject">allocation object</param>
internal void AddAllocation(AllocationObject allocationObject)
{
if (this.Allocations.Count > 0)
{
this.allocationsSorted &= allocationObject.AllocTimeNs >= this.Allocations[this.Allocations.Count - 1].AllocTimeNs;
}
this.Allocations.Add(allocationObject);
}
/// <summary>
/// Finalizes the data for fast retrieval
/// </summary>
public void FinalizeData()
{
if (!this.allocationsSorted)
{
this.Allocations.Sort(TypeObject.comparer);
this.allocationsSorted = true;
}
}
A small change to track the sort state of the lists here has trimmed 20 seconds off a 110 second trace resulting in ~20% performance improvement.
Conclusion
The new Instrumentation tool is awesome (at least I think so 😊) and a little performance investigation can go a long way. In less than an hour of profiling and poking at the code I improved load performance of the .NET Allocation tool by ~20%. Let me know what you found while profiling your code and what factor of improvements you were able to achieve with the new Instrumentation tool!
The old instrumentation didn’t seem to be compatible with ASP.Net solutions. Is this one able to support them?
Is it able to cope with microservice based solutions where multiple services run in Linux docker containers?
Yep, the new Instrumentation tool supports ASP.NET solutions. It doesn’t support Linux docker containers yet, though we are looking into ways of supporting that.
Hopefully, you will be able to support that soon.
It does often seem as though support for features in .Net / Visual Studio, such as instrumentation and hot reload, are always catching up with where I was 3 years ago.
Nik,
Working well, but I would love to be able to save classes/methods/projects that I’ve selected as it is painful navigating thru large solutions, also maybe a search feature with filters, especially with Razor views.
Thanks for making this product great!
Yep, we have been dealing with the same pain 🙂 We just didn’t have enough time to squeeze that feature in but is something we are trying to do for Update 7.
Howdy Nik,
I switched over to the preview channel and got 17.6 pre-release 4. I hit alt-f2 I don’t get the ability to select projects, just the same options as the stable channel. I just get Target Type (start up project, attach to executable, etc, etc)
What am I missing?
Andy
Nik,
Disregard, I just didn’t select the Instrumentation, but once I did I got the prompt. I haven’t used these tools in a while, found so much low hanging fruit, reduced startup time by 35 seconds because of some poorly written code for registering services. Looking forward to seeing how Instrumentation works against a lot of poorly written LINQ!
Now I have another issue, the .NET Core Kestrel is automatically exiting after attempting to serve first page when I use instrumentation.
That’s awesome, glad to hear you got it working. Nice job trimming the low hanging fruit! It’s always so enlightening to see what is actually happening and how a few small tweaks can make such a big difference.
Regarding the app existing after serving the first page can you file a ticket through the VS report an issue. We can then dig into it and see what is happening.
FYI, filed and found a workaround
https://developercommunity.visualstudio.com/t/Performance-Profiler—Instrumentation-c/10348261
We had an ENV variable set
“ASPNETCORE_HOSTINGSTARTUPASSEMBLIES”: “Microsoft.AspNetCore.Mvc.Razor.RuntimeCompilation”,
Thanks Andy! Nice job tracking down the issue 🙂
Does not work on the ARM64 version of Visual Studio 2022 version 17.6! VSInstr.exe is completely missing.
Hey Ferdinand,
Yeah, sorry about that. ARM64 support for C++ is still in progress. I’m hoping we can have a fix out soon for it.
I hope sooner than later. Because sample based performance diagnostics are not supported on Apple M1 with Parallels and Visual Studio 2022 arm64 because of absence of support of the apple PMU hardware on the m1.