{"id":19725,"date":"2018-10-23T14:04:49","date_gmt":"2018-10-23T21:04:49","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/dotnet\/?p=19725"},"modified":"2019-02-19T19:18:43","modified_gmt":"2019-02-20T02:18:43","slug":"net-core-source-code-analysis-with-intel-vtune-amplifier","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/dotnet\/net-core-source-code-analysis-with-intel-vtune-amplifier\/","title":{"rendered":".NET Core Source Code Analysis with Intel\u00ae VTune\u2122 Amplifier"},"content":{"rendered":"<blockquote><p>This post was written by\u00a0<a href=\"https:\/\/github.com\/vkvenkat\">Varun Venkatesan<\/a>, <a href=\"https:\/\/github.com\/litian2025\">Li Tian<\/a>, and <a href=\"https:\/\/github.com\/dp7\">Denis Pravdin<\/a>, who are engineers at Intel.\u00a0They are excited to share .NET Core-specific enhancements that Intel has made to VTune Amplifier 2019.\u00a0<span>You can use this tool to make .NET Core applications faster on Intel processors.<\/span><\/p><\/blockquote>\n<p><b><span style=\"font-size: 12.0pt;color: #1f497d\">Update (2019.01.14)<\/span><\/b><span style=\"font-size: 12.0pt;color: #1f497d\">: <\/span><span style=\"font-size: 12.0pt;color: black\"><a href=\"https:\/\/software.intel.com\/en-us\/vtune-amplifier-help-whats-new\">VTune\u2122 Amplifier 2019 Update 2<\/a><\/span><span style=\"font-size: 12.0pt;color: #1f497d\"> is now available and includes support for Tiered Compilation on Windows and Linux. <\/span><span style=\"font-size: 12.0pt;color: black\"><a href=\"https:\/\/blogs.msdn.microsoft.com\/dotnet\/2018\/08\/02\/tiered-compilation-preview-in-net-core-2-1\/\">Tiered Compilation<\/a><\/span><span style=\"font-size: 12.0pt;color: #1f497d\"> is expected to be turned on by default in future .NET Core releases. It can be turned on by setting the COMPlus_TieredCompilation environment variable from .NET Core 2.1 onwards. 
We recommend that .NET Core developers move to the <\/span><span style=\"font-size: 12.0pt;color: black\"><a href=\"https:\/\/software.intel.com\/en-us\/vtune-amplifier-help-whats-new\">latest version<\/a><\/span><span style=\"font-size: 12.0pt;color: #1f497d\"> of VTune\u2122 Amplifier for Tiered Compilation profiling to avoid seeing unresolved managed modules and functions in the profile. <\/span><span style=\"font-size: 12.0pt;color: black\"><\/span><\/p>\n<p><span style=\"font-size: 12.0pt;color: #1f497d\">\u00a0<\/span><span style=\"font-size: 12.0pt;color: black\"><\/span>Last year in the <span>.NET blog<\/span>, we discussed .NET Core Performance Profiling with Intel\u00ae VTune\u2122 Amplifier 2018 including profiling Just-In-Time (JIT) compiled .NET Core code on Microsoft Windows* and Linux* operating systems. This year <span><a href=\"https:\/\/software.intel.com\/en-us\/intel-vtune-amplifier-2019-release-notes-what-s-new\">Intel VTune\u2122 Amplifier 2019<\/a><\/span> was launched on September 12th, 2018 with improved source code analysis for .NET Core applications. It includes .NET Core support for profiling a remote Linux target and analyzing the results on a Windows host. We will walk you through a few scenarios to see how these new VTune Amplifier features can be used to optimize .NET Core applications.<\/p>\n<p>Note that VTune Amplifier is a commercial product. In some cases, you may be eligible to obtain a free copy of VTune Amplifier under specific terms. 
To see if you qualify, please refer to <span><a href=\"https:\/\/software.intel.com\/en-us\/qualify-for-free-software\">https:\/\/software.intel.com\/en-us\/qualify-for-free-software<\/a><\/span> and choose download options at <span><a href=\"https:\/\/software.intel.com\/en-us\/vtune\/choose-download\">https:\/\/software.intel.com\/en-us\/vtune\/choose-download<\/a><\/span>.<\/p>\n<h2>Background<\/h2>\n<p>Before this release, source code analysis of VTune Amplifier hotspots for JIT compiled .NET Core code was not supported on Linux and had only limited support on Windows. Hotspot functions were available only at the assembly level and not at the source level, as shown in the figure below.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/hotspot1.png\" alt=\"\" width=\"882\" height=\"748\" class=\"aligncenter size-full wp-image-19735\" \/><\/p>\n<p>VTune Amplifier 2019 addresses this issue and provides full source code analysis for JIT compiled code on both Windows and Linux. It also supports remote profiling of a Linux target from a Windows host. Let\u2019s see how these features work using sample .NET Core applications on a local Linux host, on a local Windows host, and on a remote Linux target with analysis on a Windows host.<\/p>\n<p>Here is the hardware\/software configuration for the test system:<\/p>\n<ul>\n<li>Processor: Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz<\/li>\n<li>Memory: 32 GB<\/li>\n<li>Ubuntu* 16.04 LTS (64-bit)<\/li>\n<li>Microsoft Windows 10 Pro Version 1803 (64-bit)<\/li>\n<li>.NET Core SDK 2.1.401<\/li>\n<\/ul>\n<h2>Profiling .NET Core applications on a local Linux host<\/h2>\n<p>Let\u2019s create a sample .NET Core application on Linux that multiplies two matrices using the code available <span><a href=\"https:\/\/gist.github.com\/vkvenkat\/f4beadb3fb178a70010002d7753980b2\">here<\/a><\/span>. 
Following is the C# source code snippet of interest:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/sample1.png\" alt=\"\" width=\"617\" height=\"350\" class=\"aligncenter size-full wp-image-19745\" \/><\/p>\n<p>Now let\u2019s refer to the instructions from our earlier <span><a href=\"https:\/\/blogs.msdn.microsoft.com\/dotnet\/2017\/10\/23\/net-core-performance-profiling-with-intel-vtune-amplifier-2018\/\">.NET blog<\/a><\/span> to build and run this application using the .NET Core command-line interface (CLI). Next let\u2019s use VTune Amplifier to profile this application using the Launch Application target type and the Hardware Event-Based Sampling mode as detailed in the following picture.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/launch_application.png\" alt=\"\" width=\"1000\" height=\"729\" class=\"aligncenter size-full wp-image-19755\" \/><\/p>\n<p>Here are the hotspots under the Process\/Module\/Function\/Thread\/Call Stack grouping:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/hotspot2.png\" alt=\"\" width=\"764\" height=\"261\" class=\"aligncenter size-full wp-image-19765\" \/><\/p>\n<p>Now let\u2019s take a look at the source-level hotspots for the Program::Multiply function, which is a major contributor to overall CPU time.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/hotspot3.png\" alt=\"\" width=\"961\" height=\"683\" class=\"aligncenter size-full wp-image-19775\" \/><\/p>\n<p>The above figure shows that most of the time is being spent in line 62 which performs matrix arithmetic operations. 
This source-assembly mapping helps both .NET Core application and compiler developers to identify their source-level hotspots and determine optimization opportunities.<\/p>\n<p>Now, let\u2019s use the new source code analysis feature to examine the assembly snippets corresponding to the highlighted source line.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/hotspot4.png\" alt=\"\" width=\"1008\" height=\"784\" class=\"aligncenter size-full wp-image-19785\" \/><\/p>\n<p>From the above profile, it is clear that reducing the time spent in matrix arithmetic operations would help lower overall application time. One possible optimization here is to replace the rectangular array data structure used to represent individual matrices with jagged arrays. The C# source code snippet below shows how to do this (complete code is available <span><a href=\"https:\/\/gist.github.com\/vkvenkat\/f35b8ff9552e33c6a11453bcea7d25fa\">here<\/a><\/span>).<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/sample2.png\" alt=\"\" width=\"632\" height=\"354\" class=\"aligncenter size-full wp-image-19795\" \/><\/p>\n<p>Here is the updated list of hotspot functions from VTune Amplifier:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/hotspot5.png\" alt=\"\" width=\"782\" height=\"243\" class=\"aligncenter size-full wp-image-19805\" \/><\/p>\n<p>We can see that the overall application time has dropped by about 21%<sup>1<\/sup> (from 16.660 s to 13.175 s).<\/p>\n<p>The following figure shows the source-assembly mapping for the Program::Multiply function. We see that there is a corresponding reduction in CPU time for the highlighted source line which performs matrix arithmetic operations. 
Note that the size of the JIT generated code has been reduced too.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/hotspot6.png\" alt=\"\" width=\"1002\" height=\"718\" class=\"aligncenter size-full wp-image-19815\" \/><\/p>\n<p>That is a brief look at the feature on Linux. A similar analysis with the matrix multiplication samples above can be done on Windows; we leave that as an exercise for you to try. Now, let\u2019s use a different example to see how source code analysis works on Windows.<\/p>\n<h2>Profiling .NET Core applications on a local Windows host<\/h2>\n<p>Let\u2019s create a sample .NET Core application on Windows that reverses an integer array using the code available <span><a href=\"https:\/\/gist.github.com\/vkvenkat\/960b9c4704f8e63066833bb4c6e65db8\">here<\/a><\/span>. Following is the C# source code snippet of interest:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/sample3.png\" alt=\"\" width=\"529\" height=\"229\" class=\"aligncenter size-full wp-image-19825\" \/><\/p>\n<p>Now let\u2019s refer to the instructions from our earlier <span><a href=\"https:\/\/blogs.msdn.microsoft.com\/dotnet\/2017\/10\/23\/net-core-performance-profiling-with-intel-vtune-amplifier-2018\/\">.NET blog<\/a><\/span> to build and run this application using the .NET Core command-line interface (CLI). Next let\u2019s use VTune Amplifier to profile this application using the Launch Application target type and the Hardware Event-Based Sampling mode as detailed in the following picture. 
Additionally, we need to provide the source file location on Windows using the Search Sources\/Binaries button before profiling.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/hotspot7.png\" alt=\"\" width=\"977\" height=\"709\" class=\"aligncenter size-full wp-image-19835\" \/><\/p>\n<p>Here are the hotspots under the Process\/Module\/Function\/Thread\/Call Stack grouping:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/hotspot8.png\" alt=\"\" width=\"797\" height=\"245\" class=\"aligncenter size-full wp-image-19845\" \/><\/p>\n<p>Now let\u2019s take a look at the source-level hotspots for the Program::IterativeReverse function, which is a major contributor to overall CPU time.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/hotspot9.png\" alt=\"\" width=\"908\" height=\"935\" class=\"aligncenter size-full wp-image-19855\" \/><\/p>\n<p>The above figure shows that most of the time is being spent in line 48 which performs array element re-assignment. Now, let\u2019s use the new source code analysis feature to examine the assembly snippets corresponding to the highlighted source line.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/hotspot10.png\" alt=\"\" width=\"1093\" height=\"933\" class=\"aligncenter size-full wp-image-19865\" \/><\/p>\n<p>One of the possible optimizations here would be to reverse the integer array by using recursion, rather than iterating over the array contents. 
The C# source code snippet below shows how to do this (complete code is available <span><a href=\"https:\/\/gist.github.com\/vkvenkat\/02a7541e095b1cc5775f88ce9f69e931\">here<\/a><\/span>).<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/sample4.png\" alt=\"\" width=\"592\" height=\"277\" class=\"aligncenter size-full wp-image-19875\" \/><\/p>\n<p>Here is the updated list of hotspot functions from VTune Amplifier:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/hotspot11.png\" alt=\"\" width=\"801\" height=\"244\" class=\"aligncenter size-full wp-image-19885\" \/><\/p>\n<p>We can see that the overall application time has dropped by about 42%<sup>2<\/sup> (from 13.095 s to 7.600 s).<\/p>\n<p>The following figure shows the source-assembly mapping for the Program::RecursiveReverse function.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/hotspot151.png\" alt=\"\" width=\"1056\" height=\"933\" class=\"aligncenter size-full wp-image-19955\" \/><\/p>\n<p>As we can see, the reduction in time is reflected in the source lines above, giving developers a clear picture of how their application performs.<\/p>\n<h2>Profiling .NET Core applications on a remote Linux target and analyzing the results on a Windows host<\/h2>\n<p>Sometimes .NET Core developers may need to collect performance data on remote target systems and later finalize the data on a different machine in order to work around resource constraints on the target system or to reduce overhead when finalizing the collected data. VTune Amplifier 2019 has added .NET Core support to collect profiling data from a remote Linux target system and analyze the results on a Windows host system. 
This section illustrates how to leverage this capability using the matrix multiplication .NET Core application discussed earlier (source code is available <span><a href=\"https:\/\/gist.github.com\/vkvenkat\/f4beadb3fb178a70010002d7753980b2\">here<\/a><\/span>).<\/p>\n<p>First let\u2019s publish the sample application for an x64 target type on either the host or the target with: dotnet publish -c Release -r linux-x64. Then we need to copy the entire folder with sources and binaries to the other machine. Next let\u2019s set up password-less SSH access to the target with PuTTY, using instructions <span><a href=\"https:\/\/software.intel.com\/en-us\/vtune-amplifier-help-configuring-ssh-access-for-remote-collection\">here<\/a><\/span>. We also need to set \/proc\/sys\/kernel\/perf_event_paranoid and \/proc\/sys\/kernel\/kptr_restrict to 0 on the target system to enable driverless profiling, so that the user does not need to install target packages; VTune Amplifier automatically installs the appropriate collectors on the target system.<\/p>\n<p>echo 0 | sudo tee \/proc\/sys\/kernel\/perf_event_paranoid<\/p>\n<p>echo 0 | sudo tee \/proc\/sys\/kernel\/kptr_restrict<\/p>\n<p>&nbsp;<\/p>\n<p>Now let\u2019s use VTune Amplifier on the host machine to start remote profiling of the application running on the target. First we need to set the profiling target to Remote Linux (SSH) and provide the necessary details to establish an SSH connection with the target. VTune Amplifier automatically installs the appropriate collectors on the target system in the \/tmp\/vtune_amplifier_&lt;version&gt;.&lt;package_num&gt; directory.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/config.png\" alt=\"\" width=\"350\" height=\"104\" class=\"aligncenter size-full wp-image-19905\" \/><\/p>\n<p>Then let\u2019s select the Launch Application target type and the Hardware Event-Based Sampling mode. 
Additionally, we need to provide the binary and source file locations on Windows using the Search Sources\/Binaries button before profiling.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/config_analysis.png\" alt=\"\" width=\"955\" height=\"876\" class=\"aligncenter size-full wp-image-19915\" \/><\/p>\n<p>Here are the hotspots under the Process\/Module\/Function\/Thread\/Call Stack grouping:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/hotspot13.png\" alt=\"\" width=\"789\" height=\"286\" class=\"aligncenter size-full wp-image-19925\" \/><\/p>\n<p>Let\u2019s look at source code analysis in action by selecting one of the hotspot functions.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/hotspot14.png\" alt=\"\" width=\"1159\" height=\"732\" class=\"aligncenter size-full wp-image-19935\" \/><\/p>\n<p>Support for remote profiling enables developers to collect low-overhead profiling data on resource-constrained target platforms and then analyze this information on the host.<\/p>\n<h2><strong>Summary<\/strong><\/h2>\n<p>The Source Code Analysis feature can be a useful addition for the .NET Core community, especially for developers interested in performance optimization, as they can get insights into hotspots at the source code and assembly levels and then work on targeted optimizations. We continue to look for additional .NET Core scenarios that could benefit from feature enhancements of VTune Amplifier. 
Let us know in the comments below if you have any suggestions in mind.<\/p>\n<h2><strong>References<\/strong><\/h2>\n<p>VTune Amplifier Product page: <span><a href=\"https:\/\/software.intel.com\/en-us\/intel-vtune-amplifier-xe\">https:\/\/software.intel.com\/en-us\/intel-vtune-amplifier-xe<\/a><\/span><\/p>\n<p>For more details on using the VTune Amplifier, see the product <span><a href=\"https:\/\/software.intel.com\/en-us\/vtune-amplifier-help\">online help<\/a><\/span>.<\/p>\n<p>For more complete information about compiler optimizations, see our\u00a0<span><a href=\"https:\/\/software.intel.com\/en-us\/articles\/optimization-notice#opt-en\">Optimization Notice<\/a><\/span>.<\/p>\n<p>&nbsp;<\/p>\n<p>No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.<\/p>\n<p>Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.<\/p>\n<p>This document contains information on products, services and\/or processes in development.\u00a0 All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.<\/p>\n<p>The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. 
Current characterized errata are available on request.<\/p>\n<p>Copies of documents which have an order number and are referenced in this document may be obtained by calling\u00a0<span>1-800-548-4725<\/span>\u00a0or by visiting\u00a0<span><strong><a href=\"http:\/\/www.intel.com\/design\/literature.htm\">www.intel.com\/design\/literature.htm<\/a><\/strong><\/span>.<\/p>\n<p>Intel, the Intel logo, Intel Core, VTune are trademarks of Intel Corporation in the U.S. and\/or other countries.<\/p>\n<p>*Other names and brands may be claimed as the property of others<\/p>\n<p>\u00a9 Intel Corporation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post was written by\u00a0Varun Venkatesan, Li Tian, and Denis Pravdin, who are engineers at Intel.\u00a0They are excited to share .NET Core-specific enhancements that Intel has made to VTune Amplifier 2019.\u00a0You can use this tool to make .NET Core applications faster on Intel processors. Update (2019.01.14): VTune\u2122 Amplifier 2019 Update 2 is now available [&hellip;]<\/p>\n","protected":false},"author":336,"featured_media":21771,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[685,196],"tags":[30,31,34,66,117,121],"class_list":["post-19725","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dotnet","category-dotnet-core","tag-announcement","tag-asp-net","tag-asp-net-web-api","tag-dotnetnative","tag-releases","tag-ryujit"],"acf":[],"blog_post_summary":"<p>This post was written by\u00a0Varun Venkatesan, Li Tian, and Denis Pravdin, who are engineers at Intel.\u00a0They are excited to share .NET Core-specific enhancements that Intel has made to VTune Amplifier 2019.\u00a0You can use this tool to make .NET Core applications faster on Intel processors. 
Update (2019.01.14): VTune\u2122 Amplifier 2019 Update 2 is now available [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/19725","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/users\/336"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/comments?post=19725"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/19725\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media\/21771"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media?parent=19725"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/categories?post=19725"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/tags?post=19725"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}