{"id":22201,"date":"2019-03-05T12:31:18","date_gmt":"2019-03-05T19:31:18","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/dotnet\/?p=22201"},"modified":"2019-03-06T19:05:29","modified_gmt":"2019-03-07T02:05:29","slug":"collecting-net-core-linux-container-cpu-traces-from-a-sidecar-container","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/dotnet\/collecting-net-core-linux-container-cpu-traces-from-a-sidecar-container\/","title":{"rendered":"Collecting .NET Core Linux Container CPU Traces from a Sidecar Container"},"content":{"rendered":"<h3>Introduction<\/h3>\n<p>In recent years, containerization has gained popularity in DevOps due to its valuable capacities, including more efficient resource utilization and better agility. Microsoft and Docker have been working together to create a great experience for running .NET applications inside containers. See the following blog posts for more information:<\/p>\n<ul>\n<li><a href=\"https:\/\/blogs.msdn.microsoft.com\/dotnet\/2017\/05\/25\/using-net-and-docker-together\/\">Using .NET and Docker Together<\/a><\/li>\n<li><a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/using-net-and-docker-together-dockercon-2018-update\/\">Using .NET and Docker Together \u2013 DockerCon 2018 Update<\/a><\/li>\n<\/ul>\n<p>When there\u2019s a performance problem, analyzing the problem often requires detailed information about what was happening at the time. <a href=\"https:\/\/github.com\/dotnet\/coreclr\/blob\/master\/Documentation\/project-docs\/linux-performance-tracing.md\">Perfcollect<\/a> is the recommended tool for gathering .NET Core performance data on Linux. The .NET Team introduced <code>EventPipe<\/code> feature in .NET Core 2.0 and has been continuously improving the usability of the feature for the end users. 
The goal of <code>EventPipe<\/code> is to make it very easy to profile .NET Core applications.<\/p>\n<p>However, <code>EventPipe<\/code> currently has limitations:<\/p>\n<ul>\n<li>Only the managed parts of call stacks are collected. If a performance issue is in native code or in the .NET Core runtime, <code>EventPipe<\/code> can only trace it to the boundary.<\/li>\n<li>It does not work for earlier versions of .NET Core.<\/li>\n<\/ul>\n<p>In these cases <code>perfcollect<\/code> is still the preferred tool. Containers bring additional challenges to using <code>perfcollect<\/code>. There are several ways to use <code>perfcollect<\/code> to gather performance traces of a .NET Core application running in a Linux container, each with its own drawbacks:<\/p>\n<ul>\n<li>Collecting from the host\n<ul>\n<li>Process IDs and the file system in the host don&#8217;t match those in the containers.<\/li>\n<li><code>perfcollect<\/code> cannot find the container\u2019s files under host paths (what is <code>\/usr\/share\/dotnet\/<\/code>?)<\/li>\n<li>Some container operating systems (for example, CoreOS) don&#8217;t support installing common packages\/tools, for example, <code>linux-tools<\/code> and <code>lttng<\/code>, which are required by the <code>perfcollect<\/code> tool.<\/li>\n<\/ul>\n<\/li>\n<li>Collecting from the container running the application.\n<ul>\n<li>Installing profiling tools bloats the container and increases the attack surface.<\/li>\n<li>Profiling affects the application performance in the same container (for example, its resource consumption is counted against the quota).<\/li>\n<li>The <code>perf<\/code> tool needs extra capabilities to run from a container, which defeats the security features of containers.<\/li>\n<\/ul>\n<\/li>\n<li>Collecting from another &#8220;sidecar&#8221; container running on the same host.\n<ul>\n<li>Possible environment mismatches between the sidecar container and the application container.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Tim Gross published <a 
href=\"http:\/\/blog.0x74696d.com\/posts\/debugging-python-containers-in-production\/\">a blog post on debugging python containers in production<\/a>. His approach is to run tools inside another (sidecar) container on the same host as the application container. The idea can be applied to profiling\/debugging .NET Core Linux containers. This approach has the following benefits:<\/p>\n<ul>\n<li>Application containers don\u2019t need elevated privileges.<\/li>\n<li>Application container images remain mostly unchanged. They are not bloated by tool packages that are not required to run applications.<\/li>\n<li>Profiling doesn&#8217;t consume application container resources, which are usually throttled by a quota.<\/li>\n<li>Sidecar container can be built as close to the application container as possible, so tools used by <code>perfcollect<\/code>, such as <code>crossgen<\/code> and <code>objcopy<\/code>, could operate on files of the same versions at the same paths, even they are in different containers.<\/li>\n<\/ul>\n<p>This article only describes the manual\/one-off performance investigation scenario. However, with additional effort the approach could work in an automated way and\/or under an orchestrator. 
See <a href=\"https:\/\/developer.ibm.com\/recipes\/tutorials\/profiling-applications-deployed-on-kubernetes-with-sidecar-injector\/\">this tutorial<\/a> for an example of profiling with a sidecar container in a Kubernetes environment.<\/p>\n<p><strong>Note<\/strong>: tracing .NET Core events using <code>LTTng<\/code> is not supported in this sidecar approach due to how <code>LTTng<\/code> works (it uses shared memory and per-CPU buffers), so this approach cannot be used to collect events from the .NET Core runtime.<\/p>\n<p>The rest of this doc gives a step-by-step guide to using a sidecar container to collect a CPU trace of an ASP.NET application running in a Linux container.<\/p>\n<h3>Building Container Images<\/h3>\n<ol>\n<li>We use <a href=\"https:\/\/gist.github.com\/jeremymeng\/a9a610bc108ae3fe57c90fa973187082#file-dockerfile\">a single Dockerfile<\/a> and the multi-stage builds feature introduced in Docker 17.05 to build the application and sidecar container images.\n<script src=\"https:\/\/gist.github.com\/jeremymeng\/a9a610bc108ae3fe57c90fa973187082.js\"><\/script><a href=\"https:\/\/github.com\/jeremymeng\/\">The sample project<\/a> used is the output of <code>dotnet new webapi<\/code>. A <code>dotnet restore<\/code> step is used to download a matching version of <code>crossgen<\/code> from nuget.org. This step is just a convenient way to download a matching <code>crossgen<\/code>. It does add time to the docker build process. If this becomes a concern, there are other ways to add <code>crossgen<\/code>, for example, copying a pre-downloaded version from a cached location. However, we must ensure that the cached <code>crossgen<\/code> is from the same version of the .NET Core runtime because <code>crossgen<\/code> doesn&#8217;t always work properly across versions. 
In the future, the .NET team might make improvements in this area to make the experience better, for example, shipping a stable <code>crossgen<\/code> tool that works across different versions. In the example, the most important packages are:<\/p>\n<ul>\n<li><code>linux-tools<\/code>,<\/li>\n<li><code>lttng-tools<\/code>,<\/li>\n<li><code>liblttng-ust-dev<\/code>,<\/li>\n<li><code>zip<\/code>,<\/li>\n<li><code>curl<\/code>,<\/li>\n<li><code>binutils<\/code> (for the <code>objcopy<\/code>\/<code>objdump<\/code> commands),<\/li>\n<li><code>procps<\/code> (for the <code>ps<\/code> command).<\/li>\n<\/ul>\n<p>The <code>perfcollect<\/code> script is downloaded and saved to the <code>\/tools<\/code> directory. Other tools (<code>gdb<\/code>\/<code>vim<\/code>\/<code>emacs-nox<\/code>\/etc.) can be installed as needed for diagnosing and debugging purposes.<\/li>\n<li>Build the application image with the following command:\n<pre>user@host ~\/project\/webapi $ docker build . --target application -f Dockerfile -t application\r\n<\/pre>\n<\/li>\n<li>Build the sidecar image with the following command:\n<pre>user@host ~\/project\/webapi $ docker build . --target sidecar -f Dockerfile -t sidecar\r\n<\/pre>\n<\/li>\n<\/ol>\n<h3>Running Docker Containers<\/h3>\n<ol start=\"4\">\n<li>Use a shared Docker volume for <code>\/tmp<\/code>. The Linux <code>perf<\/code> tool needs to access the <code>perf*.map<\/code> files that are generated by the .NET Core application. By default, containers are isolated, so the <code>*.map<\/code> files generated inside the application container are not visible to the <code>perf<\/code> tool running inside of the sidecar container. We need to make these <code>*.map<\/code> files available to the <code>perf<\/code> tool running inside the sidecar. In this example, a shared Docker volume is mapped to the <code>\/tmp<\/code> directory of both the application container and the sidecar container. 
Since both of their <code>\/tmp<\/code> directories are backed by the same volume, the sidecar container can access files written into the <code>\/tmp<\/code> directory by the application container. Run the application container with a name (<code>application<\/code> in this example) since it\u2019s easier to refer to the container using its name. Map the <code>\/tmp<\/code> folder to a volume named <code>shared-tmp<\/code>. Docker will create the volume if it does not exist yet.\n<pre>user@host ~\/project\/webapi $ docker run -p 80:80 -v shared-tmp:\/tmp --name application application\r\n<\/pre>\n<p>A volume mount might not be desirable in some cases. Another option is to run the application container without the <code>-v<\/code> option and then use <code>docker cp<\/code> commands to copy the <code>\/tmp\/perf*.map<\/code> files from the running application container to the running sidecar container\u2019s <code>\/tmp<\/code> folder before starting the <code>perfcollect<\/code> tool.<\/li>\n<li>Run the sidecar using the <code>pid<\/code> and <code>net<\/code> namespaces of the application container, and with <code>\/tmp<\/code> mapped to the same host folder for tmp. Give this container a name (<code>sidecar<\/code> in this example). Linux namespaces isolate containers and make the resources they are using invisible to other containers by default; however, we can make Docker containers share namespaces using options like <code>--pid<\/code> and <code>--net<\/code>. Here\u2019s <a href=\"https:\/\/en.wikipedia.org\/wiki\/Linux_namespaces\">a wiki link<\/a> to read more about Linux namespaces. The following command lets the <code>sidecar<\/code> container share the same <code>pid<\/code> and <code>net<\/code> namespaces with the application container so that we can debug or profile processes in the application container from the sidecar container. 
The <code>--cap-add ALL --privileged<\/code> switches grant the sidecar container permissions to collect performance traces.\n<pre>user@host ~\/project\/webapi $ docker run -it --pid=container:application --net=container:application -v shared-tmp:\/tmp --cap-add ALL --privileged --name sidecar sidecar<\/pre>\n<\/li>\n<\/ol>\n<h3>Collecting CPU Performance Traces<\/h3>\n<ol start=\"6\">\n<li>Inside the sidecar container, collect CPU traces for the <code>dotnet<\/code> process (or your .NET Core application process if it is published as self-contained), which usually has a PID of 1, but may vary depending on what else you are running in the <code>application<\/code> container before running the application.\n<pre>root@7eb78f190ed7:\/tools# ps -aux\r\n<\/pre>\n<p>Output should be similar to the following:<\/p>\n<pre>USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND\r\nroot          1  1.1  0.5 7511164 82576 pts\/0   SLsl+ 18:25   0:03 dotnet webapi.dll\r\nroot        104  0.0  0.0  18304  3332 pts\/0    Ss   18:28   0:00 bash\r\nroot        198  0.0  0.0  34424  2796 pts\/0    R+   18:31   0:00 ps -aux\r\n<\/pre>\n<p>In this example, the <code>dotnet<\/code> process has a PID of 1, so when running the <code>perfcollect<\/code> script, pass 1 to the <code>-pid<\/code> option.<\/p>\n<pre>root@7eb78f190ed7:\/tools# .\/perfcollect collect sample -nolttng -pid 1\r\n<\/pre>\n<p>With the <code>-pid 1<\/code> option, <code>perfcollect<\/code> captures performance data only for the <code>dotnet<\/code> process. Remove the <code>-pid<\/code> option to collect performance data for all processes.<\/p>\n<p>Now generate some requests to the webapi service so that it is consuming CPU. 
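One minimal sketch of a load generator, run from another machine, is a shell loop around curl; the address 10.1.0.4 and the \/api\/values endpoint follow the example used in this post, so adjust both for your setup:

```shell
# Send 200 sequential requests to keep the service busy while perfcollect runs.
# The URL is an assumption matching the example environment in this post.
URL="http://10.1.0.4/api/values"
for i in $(seq 1 200); do
  curl -s -o /dev/null "$URL"
done
```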
This can be done manually using <code>curl<\/code>, or with a load testing tool like ApacheBench (<code>ab<\/code>) from another machine.<\/p>\n<pre>user@another-host ~\/test $ ab -n 200 -c 10 http:\/\/10.1.0.4\/api\/values\r\n<\/pre>\n<p>Press <code>Ctrl+C<\/code> to stop collecting after the service has processed some requests.<\/li>\n<li>After collection is stopped, view the report using the following command:\n<pre>root@7eb78f190ed7:\/tools# .\/perfcollect view sample.trace.zip\r\n<\/pre>\n<\/li>\n<li>Verify that the trace includes the map files by listing the contents of the zip file.\n<pre>root@7eb78f190ed7:\/tools# unzip -l sample.trace.zip\r\n<\/pre>\n<p>You should see <code>perf-1.map<\/code> and <code>perfinfo-1.map<\/code> in the zip, along with other <code>*.map<\/code> files.<\/p>\n<p>If anything went wrong during the collection, check the <code>perfcollect.log<\/code> file inside the zip for more details.<\/p>\n<pre>root@7eb78f190ed7:\/tools# unzip sample.trace.zip sample.trace\/perfcollect.log\r\nroot@7eb78f190ed7:\/tools# tail -100 sample.trace\/perfcollect.log\r\n<\/pre>\n<p>Messages like the following near the end of the log file indicate that you hit <a href=\"https:\/\/github.com\/dotnet\/corefx-tools\/issues\/84\">a known issue<\/a>. Please check out the <a href=\"https:\/\/github.com\/jeremymeng\/#potential-issues\">Potential Issues<\/a> section for a workaround.<\/p>\n<pre>Running \/usr\/bin\/perf_4.9 script -i perf.data.merged -F comm,pid,tid,cpu,time,period,event,ip,sym,dso,trace &gt; perf.data.txt\r\n'trace' not valid for hardware events. Ignoring.\r\n'trace' not valid for software events. Ignoring.\r\n'trace' not valid for unknown events. Ignoring.\r\n'trace' not valid for unknown events. Ignoring.\r\nSamples for 'cpu-clock' event do not have CPU attribute set. 
Cannot print 'cpu' field.\r\n\r\nRunning \/usr\/bin\/perf_4.9 script -i perf.data.merged -f comm,pid,tid,cpu,time,event,ip,sym,dso,trace &gt; perf.data.txt\r\n  Error: Couldn't find script `comm,pid,tid,cpu,time,event,ip,sym,dso,trace'\r\n\r\n See perf script -l for available scripts.\r\n<\/pre>\n<\/li>\n<li>On the host, retrieve the trace from the running sidecar container.\n<pre>user@host ~\/project\/webapi $ docker cp sidecar:\/tools\/sample.trace.zip .\/\r\n<\/pre>\n<\/li>\n<li>Transfer the trace from the host machine to a Windows machine for further investigation using <a href=\"https:\/\/github.com\/Microsoft\/perfview\">PerfView<\/a>. PerfView supports analyzing <code>perfcollect<\/code> traces from Linux. Open <code>sample.trace.zip<\/code>, then follow the usual workflow of working with PerfView.<img decoding=\"async\" class=\"alignnone size-full wp-image-22259\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/03\/perfview-linux-trace.png\" alt=\"screenshot showing collected cpu traces from Linux being analyzed in PerfView\" width=\"1448\" height=\"695\" srcset=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/03\/perfview-linux-trace.png 1448w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/03\/perfview-linux-trace-300x144.png 300w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/03\/perfview-linux-trace-768x369.png 768w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/03\/perfview-linux-trace-1024x491.png 1024w\" sizes=\"(max-width: 1448px) 100vw, 1448px\" \/>\nFor more information on analyzing CPU traces from Linux using PerfView, see <a href=\"https:\/\/blogs.msdn.microsoft.com\/vancem\/2016\/02\/20\/analyzing-cpu-traces-from-linux-with-perfview\/\">this blog post<\/a> and the <a href=\"https:\/\/channel9.msdn.com\/Series\/PerfView-Tutorial\">Channel 9 series<\/a> by Vance 
Morrison.<\/li>\n<\/ol>\n<h3>Potential Issues<\/h3>\n<ul>\n<li>In some configurations, the collected <code>cpu-clock<\/code> events don&#8217;t have the <code>cpu<\/code> field. This causes a failure at the <code>.\/perfcollect collect<\/code> step when the script tries to merge trace data. Here is a workaround: open <code>perfcollect<\/code> in an editor, find the line that contains &#8220;<code>-F<\/code>&#8221; (capital F), then remove &#8220;<code>cpu<\/code>&#8221; from the <code>$perfcmd<\/code> lines:\n<pre class=\"lang:default decode:true \">LogAppend \"Running $perfcmd script -i $mergedFile -F comm,pid,tid,time,period,event,ip,sym,dso,trace &gt; $outputDumpFile\"\r\n$perfcmd script -i $mergedFile -F comm,pid,tid,time,period,event,ip,sym,dso,trace &gt; $outputDumpFile 2&gt;&gt;$logFile\r\nLogAppend<\/pre>\n<p>After applying the workaround and collecting the traces, be aware of <a href=\"https:\/\/github.com\/Microsoft\/perfview\/issues\/806\">a known PerfView issue<\/a> when viewing traces whose cpu field is missing. This issue has already been fixed and the fix will be available in future releases of PerfView.<\/li>\n<li>If there are problems resolving .NET symbols, you can also use two additional settings. Note that this affects the application start-up performance.\n<pre>COMPlus_ZapDisable=1\r\nCOMPlus_ReadyToRun=0\r\n<\/pre>\n<p>Setting <code>COMPlus_ZapDisable=1<\/code> tells the .NET Core runtime to not use the precompiled framework code. All the code will be Just-in-Time compiled, so <code>crossgen<\/code> is no longer needed, which means the steps to run <code>dotnet restore -r linux-x64<\/code> and copy <code>crossgen<\/code> in the Dockerfile can be removed. 
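If you go this route, the settings can be applied when starting the application container, for example via docker's -e switches. This is a sketch reusing the container, image, and volume names from this post, not the only way to set them:

```shell
# Start the application container with precompiled framework code disabled
# so perf can resolve framework symbols from JIT-generated map files.
# Names (shared-tmp, application) follow the example in this post.
docker run -p 80:80 -v shared-tmp:/tmp \
  -e COMPlus_ZapDisable=1 \
  -e COMPlus_ReadyToRun=0 \
  --name application application
```

Remember that both settings disable precompiled code, so the application will start up more slowly.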
For more details, check out the relevant section at <a href=\"https:\/\/github.com\/dotnet\/coreclr\/blob\/master\/Documentation\/project-docs\/linux-performance-tracing.md#resolving-framework-symbols\">Performance Tracing on Linux<\/a>.<\/li>\n<\/ul>\n<h3>Conclusion<\/h3>\n<p>This document describes a sidecar approach to collect CPU performance traces for a .NET Core application running inside a container. The step-by-step guide here describes a manual\/on-demand investigation. However, most of the steps above may be automated by a container orchestrator or infrastructure.<\/p>\n<h3>References and Useful Links<\/h3>\n<ol>\n<li><a href=\"https:\/\/www.usenix.org\/conference\/lisa17\/conference-program\/presentation\/gregg\">Linux Container Performance Analysis<\/a>, talk by Brendan Gregg, inventor of FlameGraph<\/li>\n<li>Examples and hands-on labs for Linux tracing tools workshops by Sasha Goldshtein <a href=\"https:\/\/github.com\/goldshtn\/linux-tracing-workshop\">goldshtn\/linux-tracing-workshop<\/a>.<\/li>\n<li>Debugging and Profiling .NET Core Apps on Linux, slides from Sasha Goldshtein <a href=\"https:\/\/assets.ctfassets.net\/9n3x4rtjlya6\/1qV39g0tAEC2OSgok0QsQ6\/fbfface3edac8da65fd380cc05a1a028\/Sasha-Goldshtein_Debugging-and-profiling-NET-Core-apps-on-Linux.pdf\">assets.ctfassets.net\/9n3x4rtjlya6\/1qV39g0tAEC2OSgok0QsQ6\/fbfface3edac8da65fd380cc05a1a028\/Sasha-Goldshtein_Debugging-and-profiling-NET-Core-apps-on-Linux.pdf<\/a><\/li>\n<li>Debugging Python Containers in Production <a href=\"http:\/\/blog.0x74696d.com\/posts\/debugging-python-containers-in-production\/\">http:\/\/blog.0x74696d.com\/posts\/debugging-python-containers-in-production<\/a><\/li>\n<li>perfcollect source code <a href=\"https:\/\/github.com\/dotnet\/corefx-tools\/blob\/master\/src\/performance\/perfcollect\/perfcollect\">dotnet\/corefx-tools:src\/performance\/perfcollect\/perfcollect@<code>master<\/code><\/a><\/li>\n<li>Documentation on Performance Tracing on Linux for .NET Core. 
<a href=\"https:\/\/github.com\/dotnet\/coreclr\/blob\/master\/Documentation\/project-docs\/linux-performance-tracing.md\">dotnet\/coreclr:Documentation\/project-docs\/linux-performance-tracing.md@<code>master<\/code><\/a><\/li>\n<li>PerfView tutorials on Channel9 <a href=\"https:\/\/channel9.msdn.com\/Series\/PerfView-Tutorial\">channel9.msdn.com\/Series\/PerfView-Tutorial<\/a><\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>It is challenging to collect performance traces of ASP.NET Core applications running inside Linux containers.  This blog post describes an approach that use a sidecar container to collect CPU trace of an ASP.NET application running in a Linux container.<\/p>\n","protected":false},"author":1622,"featured_media":22296,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[196,197],"tags":[9,32,1696,60,92,1697],"class_list":["post-22201","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dotnet-core","category-aspnet","tag-net-core","tag-asp-net-core","tag-container","tag-docker","tag-linux","tag-perfcollect"],"acf":[],"blog_post_summary":"<p>It is challenging to collect performance traces of ASP.NET Core applications running inside Linux containers.  
This blog post describes an approach that uses a sidecar container to collect a CPU trace of an ASP.NET application running in a Linux container.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/22201","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/users\/1622"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/comments?post=22201"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/22201\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media\/22296"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media?parent=22201"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/categories?post=22201"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/tags?post=22201"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}