Introduction
In recent years, containerization has gained popularity in DevOps due to its benefits, including more efficient resource utilization and better agility. Microsoft and Docker have been working together to create a great experience for running .NET applications inside containers. See the following blog posts for more information:
When there’s a performance problem, analyzing the problem often requires detailed information about what was happening at the time. Perfcollect is the recommended tool for gathering .NET Core performance data on Linux. The .NET Team introduced the EventPipe feature in .NET Core 2.0 and has been continuously improving the usability of the feature for end users. The goal of EventPipe is to make it very easy to profile .NET Core applications.
However, EventPipe currently has limitations:
- Only the managed parts of call stacks are collected. If a performance issue is in native code or in the .NET Core runtime, EventPipe can only trace it to the boundary.
- It does not work for earlier versions of .NET Core.
In these cases perfcollect is still the preferred tool. Containers bring challenges in using perfcollect. There are several ways to use perfcollect to gather performance traces for a .NET Core application running in a Linux container, and each has its cons:
- Collecting from the host.
  - Process IDs and file system in the host don’t match those in the containers.
  - perfcollect cannot find the container’s files under host paths (what is /usr/share/dotnet/?).
  - Some container operating systems (for example, CoreOS) don’t support installing common packages/tools, for example, linux-tools and lttng, which are required by the perfcollect tool.
- Collecting from the container running the application.
  - Installation of profiling tools bloats the container and increases the attack surface.
  - Profiling affects the application performance in the same container (for example, its resource consumption is counted against quota).
  - The perf tool needs capabilities to run from a container, which defeats the security features of containers.
- Collecting from another “sidecar” container running on the same host.
  - Possible environment mismatches between sidecar container and application container.
Tim Gross published a blog post on debugging python containers in production. His approach is to run tools inside another (sidecar) container on the same host as the application container. The idea can be applied to profiling/debugging .NET Core Linux containers. This approach has the following benefits:
- Application containers don’t need elevated privileges.
- Application container images remain mostly unchanged. They are not bloated by tool packages that are not required to run applications.
- Profiling doesn’t consume application container resources, which are usually throttled by a quota.
- The sidecar container can be built as close to the application container as possible, so tools used by perfcollect, such as crossgen and objcopy, can operate on files of the same versions at the same paths, even though they are in different containers.
This article only describes the manual/one-off performance investigation scenario. However, with additional effort the approach could work in an automated way and/or under an orchestrator. See this tutorial for an example on profiling with a sidecar container in a Kubernetes environment.
Note: tracing .NET Core events using LTTng is not supported in this sidecar approach due to how LTTng works (using shared memory and CPU buffers), so this approach cannot be used to collect events from the .NET Core runtime.
The rest of this doc gives a step-by-step guide to using a sidecar container to collect a CPU trace of an ASP.NET application running in a Linux container.
Building Container Images
- We use a single Dockerfile and the multi-stage builds feature introduced in Docker 17.05 to build the application and sidecar container images.
The sample project used is the output of dotnet new webapi. A dotnet restore step is used to download a matching version of crossgen from nuget.org. This step is just a convenient way to download a matching crossgen. It does add time to the docker build process. If this becomes a concern, there are other ways to add crossgen too, for example, copying a pre-downloaded version from a cached location. However, we must ensure that the cached crossgen is from the same version of the .NET Core runtime because crossgen doesn’t always work properly across versions. In the future, the .NET team might make improvements in this area to make the experience better, for example, shipping a stable crossgen tool that works across different versions.
In the example, the most important packages are: linux-tools, lttng-tools, liblttng-ust-dev, zip, curl, binutils (for the objcopy/objdump commands), and procps (for the ps command).
The perfcollect script is downloaded and saved to the /tools directory. Other tools (gdb, vim, emacs-nox, etc.) can be installed as needed for diagnosing and debugging purposes.
- Build the application image with the following command:
user@host ~/project/webapi $ docker build . --target application -f Dockerfile -t application
- Build the sidecar image with the following command:
user@host ~/project/webapi $ docker build . --target sidecar -f Dockerfile -t sidecar
Running Docker Containers
- Use a shared docker volume for /tmp.
The Linux perf tool needs to access the perf*.map files that are generated by the .NET Core application. By default, containers are isolated, so the *.map files generated inside the application container are not visible to the perf tool running inside the sidecar container. We need to make these *.map files available to the perf tool running inside the sidecar.
In this example, a shared docker volume is mapped to the /tmp directory of both the application container and the sidecar container. Since both of their /tmp directories are backed by the same volume, the sidecar container can access files written into the /tmp directory by the application container.
Run the application container with a name (application in this example) since it’s easier to refer to the container using its name. Map the /tmp folder to a volume named shared-tmp. Docker will create the volume if it does not exist yet.
user@host ~/project/webapi $ docker run -p 80:80 -v shared-tmp:/tmp --name application application
A volume mount might not be desirable in some cases. Another option is to run the application container without the -v option and then use docker cp commands to copy the /tmp/perf*.map files from the running application container to the running sidecar container’s /tmp folder before starting the perfcollect tool.
- Run the sidecar using the pid and net namespaces of the application container, and with /tmp mapped to the same shared volume. Give this container a name (sidecar in this example).
Linux namespaces isolate containers and make the resources they use invisible to other containers by default. However, we can make docker containers share namespaces using options like --pid, --net, etc. Here’s a wiki link to read more about Linux namespaces.
The following command lets the sidecar container share the same pid and net namespaces with the application container so that it is allowed to debug or profile processes in the application container from the sidecar container. The --cap-add ALL --privileged switches grant the sidecar container permissions to collect performance traces.
user@host ~/project/webapi $ docker run -it --pid=container:application --net=container:application -v shared-tmp:/tmp --cap-add ALL --privileged --name sidecar sidecar
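For the docker cp alternative to the shared volume, note that docker cp copies between a container and the host, not directly between two containers, and it does not expand wildcards in container paths. The sketch below shows one possible way to stage the map files through the host. The function name copy_perf_maps and the DOCKER override are hypothetical helpers introduced here for illustration, and copying all of /tmp may be slow if the application writes a lot of data there.

```shell
# Hypothetical helper: copy perf map files from one container's /tmp to
# another container's /tmp via a temporary directory on the host.
# Set DOCKER=echo for a dry run that only prints the docker commands.
copy_perf_maps() {
    local src="$1" dst="$2" staging f
    staging="$(mktemp -d)" || return 1

    # Copy the contents of the source container's /tmp to the host.
    "${DOCKER:-docker}" cp "$src:/tmp/." "$staging" || { rm -rf "$staging"; return 1; }

    # Forward only the perf*.map files (this glob also matches perfinfo-*.map).
    for f in "$staging"/perf*.map; do
        [ -e "$f" ] || continue
        "${DOCKER:-docker}" cp "$f" "$dst:/tmp/" || { rm -rf "$staging"; return 1; }
    done

    rm -rf "$staging"
}
```

On the host, this could be called as copy_perf_maps application sidecar once both containers are running and before starting perfcollect.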
Collecting CPU Performance Traces
- Inside the sidecar container, collect CPU traces for the dotnet process (or your .NET Core application process if it is published as self-contained). It usually has a PID of 1, but this may vary depending on what else you run in the application container before starting the application.
root@7eb78f190ed7:/tools# ps -aux
Output should be similar to the following:
USER  PID %CPU %MEM     VSZ   RSS TTY   STAT  START TIME COMMAND
root    1  1.1  0.5 7511164 82576 pts/0 SLsl+ 18:25 0:03 dotnet webapi.dll
root  104  0.0  0.0   18304  3332 pts/0 Ss    18:28 0:00 bash
root  198  0.0  0.0   34424  2796 pts/0 R+    18:31 0:00 ps -aux
In this example, the dotnet process has a PID of 1, so when running the perfcollect script, pass 1 to the -pid option.
root@7eb78f190ed7:/tools# ./perfcollect collect sample -nolttng -pid 1
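If the PID is not known in advance, the lookup can be scripted. The sketch below is one way to do it, assuming the application runs under the dotnet host (a self-contained application would have a different command name). find_dotnet_pid is a hypothetical helper name and is not part of perfcollect.

```shell
# Hypothetical helper: print the PID of the first process whose command
# name is exactly "dotnet". Expects "PID COMMAND" lines on stdin, such as
# the output of: ps -eo pid=,comm=
find_dotnet_pid() {
    awk '$2 == "dotnet" { print $1; exit }'
}
```

Inside the sidecar, the result could be passed to perfcollect, for example: ./perfcollect collect sample -nolttng -pid "$(ps -eo pid=,comm= | find_dotnet_pid)"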
By using the -pid 1 option, perfcollect only captures performance data for the dotnet process. Remove the -pid option to collect performance data for all processes.
Now generate some requests to the webapi service so that it consumes CPU. This can be done manually using curl, or with a load testing tool like ApacheBench (ab) from another machine.
user@another-host ~/test $ ab -n 200 -c 10 http://10.1.0.4/api/values
Press Ctrl+C to stop collecting after the service has processed some requests.
- After collection is stopped, view the report using the following command:
root@7eb78f190ed7:/tools# ./perfcollect view sample.trace.zip
- Verify that the trace includes the map files by listing contents in the zip file.
root@7eb78f190ed7:/tools# unzip -l sample.trace.zip
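This check can also be scripted, for example when traces are collected repeatedly. The following is a minimal sketch, assuming the map files are named perf-<pid>.map and perfinfo-<pid>.map as in this example. has_map_files is a hypothetical helper that reads a list of archive entry names (one per line) from stdin.

```shell
# Hypothetical helper: succeed only if both perf-<pid>.map and
# perfinfo-<pid>.map appear in a list of archive entry names on stdin.
has_map_files() {
    local pid="$1" listing
    listing="$(cat)"
    printf '%s\n' "$listing" | grep -Eq "(^|/)perf-${pid}\.map\$" &&
        printf '%s\n' "$listing" | grep -Eq "(^|/)perfinfo-${pid}\.map\$"
}
```

With Info-ZIP's unzip, the entry names can be produced with unzip -Z1, for example: unzip -Z1 sample.trace.zip | has_map_files 1 && echo "map files present"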
You should see perf-1.map and perfinfo-1.map in the zip, along with other *.map files.
If anything went wrong during the collection, check the perfcollect.log file inside the zip for more details.
root@7eb78f190ed7:/tools# unzip sample.trace.zip sample.trace/perfcollect.log
root@7eb78f190ed7:/tools# tail -100 sample.trace/perfcollect.log
Messages like the following near the end of the log file indicate that you hit a known issue. Please check out the Potential Issues section for a workaround.
Running /usr/bin/perf_4.9 script -i perf.data.merged -F comm,pid,tid,cpu,time,period,event,ip,sym,dso,trace > perf.data.txt
'trace' not valid for hardware events. Ignoring.
'trace' not valid for software events. Ignoring.
'trace' not valid for unknown events. Ignoring.
'trace' not valid for unknown events. Ignoring.
Samples for 'cpu-clock' event do not have CPU attribute set. Cannot print 'cpu' field.
Running /usr/bin/perf_4.9 script -i perf.data.merged -f comm,pid,tid,cpu,time,event,ip,sym,dso,trace > perf.data.txt
Error: Couldn't find script `comm,pid,tid,cpu,time,event,ip,sym,dso,trace'
See perf script -l for available scripts.
- On the host, retrieve the trace from the running sidecar container.
user@host ~/project/webapi $ docker cp sidecar:/tools/sample.trace.zip ./
- Transfer the trace from the host machine to a Windows machine for further investigation using PerfView.
PerfView supports analyzing perfcollect traces from Linux. Open sample.trace.zip, then follow the usual workflow of working with PerfView. For more information on analyzing CPU traces from Linux using PerfView, see this blog post and Channel 9 series by Vance Morrison.
Potential Issues
- In some configurations, the collected cpu-clock events don’t have the cpu field. This causes a failure at the ./perfcollect collect step when the script tries to merge trace data. Here is a workaround:
Open perfcollect in an editor, find the line that contains “-F” (capital F), then remove “cpu” from the $perfcmd lines:
LogAppend "Running $perfcmd script -i $mergedFile -F comm,pid,tid,time,period,event,ip,sym,dso,trace > $outputDumpFile"
$perfcmd script -i $mergedFile -F comm,pid,tid,time,period,event,ip,sym,dso,trace > $outputDumpFile 2>>$logFile
LogAppend
After applying the workaround and collecting the traces, be aware of a known PerfView issue when viewing traces whose cpu field is missing. This issue has already been fixed, and the fix will be available in future releases of PerfView.
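The manual edit described in the workaround can also be applied with sed. This is a sketch that assumes the field list in your copy of perfcollect starts with comm,pid,tid,cpu,time, as it does in the log output shown in this article. strip_cpu_field is a hypothetical helper name, and the original script is kept next to the edited one with a .bak suffix.

```shell
# Hypothetical helper: remove the "cpu" field from the "-F" field list in a
# perfcollect script, keeping the original as <file>.bak.
strip_cpu_field() {
    sed -i.bak 's/-F comm,pid,tid,cpu,time/-F comm,pid,tid,time/g' "$1"
}
```

In the sidecar from this article it could be run as strip_cpu_field /tools/perfcollect before collecting again.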
- If there are problems resolving .NET symbols, you can also use two additional settings. Note that this affects the application start-up performance.
COMPlus_ZapDisable=1
COMPlus_ReadyToRun=0
Setting COMPlus_ZapDisable=1 tells the .NET Core runtime not to use the precompiled framework code. All the code will be Just-in-Time compiled, so crossgen is no longer needed, which means the steps to run dotnet restore -r linux-x64 and copy crossgen in the Dockerfile can be removed. For more details, check out the relevant section at Performance Tracing on Linux.
Conclusion
This document describes a sidecar approach to collecting a CPU performance trace for a .NET Core application running inside a container. The step-by-step guide here describes a manual/on-demand investigation. However, most of the steps above may be automated by a container orchestrator or infrastructure.
References and Useful Links
- Linux Container Performance Analysis, talk by Brendan Gregg, inventor of FlameGraph
- Examples and hands-on labs for Linux tracing tools workshops by Sasha Goldshtein goldshtn/linux-tracing-workshop.
- Debugging and Profiling .NET Core Apps on Linux, slides from Sasha Goldshtein assets.ctfassets.net/9n3x4rtjlya6/1qV39g0tAEC2OSgok0QsQ6/fbfface3edac8da65fd380cc05a1a028/Sasha-Goldshtein_Debugging-and-profiling-NET-Core-apps-on-Linux.pdf
- Debugging Python Containers in Production http://blog.0x74696d.com/posts/debugging-python-containers-in-production
- perfcollect source code dotnet/corefx-tools:src/performance/perfcollect/perfcollect@master
- Documentation on Performance Tracing on Linux for .NET Core. dotnet/coreclr:Documentation/project-docs/linux-performance-tracing.md@master
- PerfView tutorials on Channel9 channel9.msdn.com/Series/PerfView-Tutorial
I’m no expert and am not sure it would work in this case, but looking at the lttng docs, isn’t it possible to set up a remote relayd target to collect trace information?
https://lttng.org/blog/2016/03/07/tutorial-remote-tracing/
It’s possible to collect the lttng traces remotely but we would be missing the nice managed call stacks that `perfcollect` script helps put together.
Thank you for this blog post. I’ve been trying to profile a container for a few days and until now hadn’t found working instructions.