Collecting .NET Core Linux Container CPU Traces from a Sidecar Container
In recent years, containerization has gained popularity in DevOps because of its benefits, including more efficient resource utilization and better agility. Microsoft and Docker have been working together to create a great experience for running .NET applications inside containers. See the following blog posts for more information:
When there’s a performance problem, analyzing the problem often requires detailed information about what was happening at the time. Perfcollect is the recommended tool for gathering .NET Core performance data on Linux. The .NET team introduced the EventPipe feature in .NET Core 2.0 and has been continuously improving its usability for end users. The goal of EventPipe is to make it very easy to profile .NET Core applications.
EventPipe has limitations:
- Only the managed parts of call stacks are collected. If a performance issue is in native code or in the .NET Core runtime, EventPipe can only trace it to the boundary.
- It does not work for earlier versions of .NET Core.
In these cases perfcollect is still the preferred tool. Containers bring challenges in using perfcollect. There are several ways to use perfcollect to gather performance traces for a .NET Core application running in a Linux container, and each has its drawbacks:
- Collecting from the host
  - Process IDs and the file system on the host don’t match those in the containers.
  - perfcollect cannot find the container’s files under host paths.
  - Some container operating systems (for example, CoreOS) don’t support installing common packages/tools, for example lttng, which are required by perfcollect.
- Collecting from the container running the application
  - Installing profiling tools bloats the container and increases the attack surface.
  - Profiling affects the application performance in the same container (for example, its resource consumption is counted against quota).
  - The perf tool needs capabilities to run from a container, which defeats the security features of containers.
- Collecting from another “sidecar” container running on the same host
  - Possible environment mismatches between the sidecar container and the application container.
Tim Gross published a blog post on debugging Python containers in production. His approach is to run tools inside another (sidecar) container on the same host as the application container. The same idea can be applied to profiling and debugging .NET Core Linux containers. This approach has the following benefits:
- Application containers don’t need elevated privileges.
- Application container images remain mostly unchanged. They are not bloated by tool packages that are not required to run applications.
- Profiling doesn’t consume application container resources, which are usually throttled by a quota.
- The sidecar container can be built as close to the application container as possible, so tools used by perfcollect, such as objcopy, can operate on files of the same versions at the same paths, even though they are in different containers.
This article only describes the manual/one-off performance investigation scenario. However, with additional effort the approach could work in an automated way and/or under an orchestrator. See this tutorial for an example on profiling with a sidecar container in a Kubernetes environment.
Note: tracing .NET Core events using LTTng is not supported in this sidecar approach because of how LTTng works (using shared memory and CPU buffer), so this approach cannot be used to collect events from the .NET Core runtime.
The rest of this doc gives a step-by-step guide to using a sidecar container to collect a CPU trace of an ASP.NET application running in a Linux container.
Building Container Images
- We use a single Dockerfile and the multi-stage builds feature introduced in Docker 17.05 to build the application and sidecar container images.
The sample project used is the output of dotnet new webapi. A dotnet restore step is used to download a matching version of crossgen from nuget.org. This step is just a convenient way to download a matching crossgen. It does add time to the docker build process. If this becomes a concern, there are other ways to add crossgen too, for example, copying a pre-downloaded version from a cached location. However, we must ensure that the cached crossgen is from the same version of the .NET Core runtime because crossgen doesn’t always work properly across versions. In the future, the .NET team might make improvements in this area to make the experience better, for example, shipping a stable crossgen tool that works across different versions.
In the example, the most important packages are:
The perfcollect script is downloaded and saved to the /tools directory. Other tools (emacs-nox/etc.) can be installed as needed for diagnosing and debugging purposes.
- Build the application image with the following command:
    user@host ~/project/webapi $ docker build . --target application -f Dockerfile -t application
- Build the sidecar image with the following command:
    user@host ~/project/webapi $ docker build . --target sidecar -f Dockerfile -t sidecar
Running Docker Containers
- Use a shared docker volume for /tmp.
The perf tool needs to access the perf*.map files that are generated by the .NET Core application. By default, containers are isolated, so the *.map files generated inside the application container are not visible to the perf tool running inside the sidecar container. We need to make these *.map files available to the perf tool running inside the sidecar.
In this example, a shared docker volume is mapped to the /tmp directory of both the application container and the sidecar container. Since both of their /tmp directories are backed by the same volume, the sidecar container can access files written into the /tmp directory by the application container.
Run the application container with a name (application in this example) since it’s easier to refer to the container using its name. Map the /tmp folder to a volume named shared-tmp. Docker will create the volume if it does not exist yet.
    user@host ~/project/webapi $ docker run -p 80:80 -v shared-tmp:/tmp --name application application
A volume mount might not be desirable in some cases. Another option is to run the application container without the -v option and then use docker cp commands to copy the /tmp/perf*.map files from the running application container to the running sidecar container’s /tmp folder before starting the perfcollect tool.
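As a rough sketch of that docker cp alternative: docker cp does not expand wildcards in container paths and does not copy directly between two containers, so the helper below stages the application container’s /tmp on the host and then copies only the perf*.map files into the sidecar. The function name copy_perf_maps and the default container names are assumptions taken from this walkthrough; they are not part of perfcollect or Docker.

```shell
# Hypothetical helper (not part of perfcollect or Docker).
# Usage: copy_perf_maps [application-container] [sidecar-container]
copy_perf_maps() {
  cpm_app="${1:-application}"
  cpm_sidecar="${2:-sidecar}"

  # Host-side staging directory for the files pulled out of the application.
  cpm_staging="$(mktemp -d)" || return 1

  # "SRC/." copies the contents of /tmp rather than the directory itself.
  docker cp "${cpm_app}:/tmp/." "$cpm_staging" || return 1

  # Push only the perf map files (perf-*.map, perfinfo-*.map) to the sidecar.
  for cpm_file in "$cpm_staging"/perf*.map; do
    [ -e "$cpm_file" ] || continue   # glob matched nothing
    docker cp "$cpm_file" "${cpm_sidecar}:/tmp/" || return 1
  done

  rm -rf "$cpm_staging"
}
```

On the host, this would be called as copy_perf_maps application sidecar once both containers are running and before perfcollect is started in the sidecar.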
- Run the sidecar using the pid and net namespaces of the application container, and with /tmp mapped to the same shared-tmp volume. Give this container a name (sidecar in this example).
Linux namespaces isolate containers and make the resources they are using invisible to other containers by default. However, we can make docker containers share namespaces using options like --pid, --net, etc. Here’s a wiki link to read more about Linux namespaces.
The following command lets the sidecar container share the same pid and net namespaces with the application container so that it is allowed to debug or profile processes in the application container from the sidecar container. The --cap-add ALL --privileged switches grant the sidecar container permissions to collect performance traces.
    user@host ~/project/webapi $ docker run -it --pid=container:application --net=container:application -v shared-tmp:/tmp --cap-add ALL --privileged --name sidecar sidecar
Collecting CPU Performance Traces
- Inside the sidecar container, collect CPU traces for the dotnet process (or your .NET Core application process if it is published as self-contained). This process usually has a PID of 1, but the PID may vary depending on what else you are running in the application container before running the application.
    root@7eb78f190ed7:/tools# ps -aux
Output should be similar to the following:
    USER       PID %CPU %MEM     VSZ   RSS TTY      STAT  START   TIME COMMAND
    root         1  1.1  0.5 7511164 82576 pts/0    SLsl+ 18:25   0:03 dotnet webapi.dll
    root       104  0.0  0.0   18304  3332 pts/0    Ss    18:28   0:00 bash
    root       198  0.0  0.0   34424  2796 pts/0    R+    18:31   0:00 ps -aux
In this example, the dotnet process has a PID of 1, so when running the perfcollect script, pass PID 1 to the -pid option.
    root@7eb78f190ed7:/tools# ./perfcollect collect sample -nolttng -pid 1
By using the -pid option, perfcollect only captures performance data for the dotnet process. Remove the -pid option to collect performance data for all processes.
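Reading the PID off the ps listing by eye works for a one-off session. For a scripted run, a small filter along these lines could pick it out instead. This is only a sketch: find_dotnet_pid is a hypothetical helper name, and it assumes the ps -aux column layout shown above, where the command name is the 11th field and the process is literally called dotnet (a self-contained application would need a different pattern).

```shell
# Hypothetical helper: read `ps -aux` style output on stdin and print the PID
# (field 2) of the first process whose command (field 11) is `dotnet` or a
# path ending in `/dotnet`. The header line is skipped.
find_dotnet_pid() {
  awk 'NR > 1 && $11 ~ /(^|\/)dotnet$/ { print $2; exit }'
}
```

Inside the sidecar this could be combined with the collection step, for example by passing "$(ps -aux | find_dotnet_pid)" as the value of the -pid option.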
Now generate some requests to the webapi service so that it is consuming CPU. This can be done manually using curl, or with a load testing tool like ApacheBench (ab) from another machine.
    user@another-host ~/test $ ab -n 200 -c 10 http://10.1.0.4/api/values
Press Ctrl+C to stop collecting after the service has processed some requests.
- After collection is stopped, view the report using the following command:
    root@7eb78f190ed7:/tools# ./perfcollect view sample.trace.zip
- Verify that the trace includes the map files by listing the contents of the zip file.
    root@7eb78f190ed7:/tools# unzip -l sample.trace.zip
You should see perfinfo-1.map in the zip, along with other *.map files.
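That check can also be expressed as a filter over the unzip -l listing. A minimal sketch, assuming the map files follow the perf-<pid>.map and perfinfo-<pid>.map naming (perfinfo-1.map above is one instance); has_perf_maps is a hypothetical name.

```shell
# Hypothetical helper: succeed (exit status 0) if the `unzip -l` listing on
# stdin contains at least one perf-<pid>.map or perfinfo-<pid>.map entry.
has_perf_maps() {
  grep -Eq '(^|[/[:space:]])perf(info)?-[0-9]+\.map[[:space:]]*$'
}
```

For example, unzip -l sample.trace.zip | has_perf_maps || echo "no map files in trace" would flag a trace that was collected without access to the application’s /tmp.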
If anything went wrong during the collection, check out the perfcollect.log file inside the zip for more details.
    root@7eb78f190ed7:/tools# unzip sample.trace.zip sample.trace/perfcollect.log
    root@7eb78f190ed7:/tools# tail -100 sample.trace/perfcollect.log
    Running /usr/bin/perf_4.9 script -i perf.data.merged -F comm,pid,tid,cpu,time,period,event,ip,sym,dso,trace > perf.data.txt
    'trace' not valid for hardware events. Ignoring.
    'trace' not valid for software events. Ignoring.
    'trace' not valid for unknown events. Ignoring.
    'trace' not valid for unknown events. Ignoring.
    Samples for 'cpu-clock' event do not have CPU attribute set. Cannot print 'cpu' field.
    Running /usr/bin/perf_4.9 script -i perf.data.merged -f comm,pid,tid,cpu,time,event,ip,sym,dso,trace > perf.data.txt
    Error: Couldn't find script `comm,pid,tid,cpu,time,event,ip,sym,dso,trace'
    See perf script -l for available scripts.
- On the host, retrieve the trace from the running sidecar container.
    user@host ~/project/webapi $ docker cp sidecar:/tools/sample.trace.zip ./
- Transfer the trace from the host machine to a Windows machine for further investigation using PerfView.
PerfView supports analyzing perfcollect traces from Linux. Open sample.trace.zip, then follow the usual workflow of working with PerfView.
For more information on analyzing CPU traces from Linux using PerfView, see this blog post and Channel 9 series by Vance Morrison.
- In some configurations, the collected cpu-clock events don’t have the cpu field. This causes a failure at the ./perfcollect collect step when the script tries to merge trace data. Here is a workaround:
Open perfcollect in an editor, find the line that contains “-F” (capital F), then remove “cpu” from the $perfcmd lines:
    LogAppend "Running $perfcmd script -i $mergedFile -F comm,pid,tid,time,period,event,ip,sym,dso,trace > $outputDumpFile"
    $perfcmd script -i $mergedFile -F comm,pid,tid,time,period,event,ip,sym,dso,trace > $outputDumpFile 2>>$logFile
    LogAppend
After applying the workaround and collecting the traces, be aware of a known PerfView issue when viewing the traces whose cpu field is missing. This issue has been fixed already and will be available in the future releases of PerfView.
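The manual edit to perfcollect can also be scripted. The sketch below is a shortcut under stated assumptions rather than an official patch: strip_cpu_field is a hypothetical helper, and it assumes the field list in the script reads comm,pid,tid,cpu,... exactly as it appears in the perfcollect.log output earlier. It only touches lines that contain the capital -F option and leaves the lowercase -f fallback alone.

```shell
# Hypothetical helper: read the perfcollect script on stdin and write a copy
# to stdout with the `cpu` field removed from every `-F comm,pid,tid,cpu,...`
# field list. Lines without " -F " pass through unchanged.
strip_cpu_field() {
  sed -E '/ -F /s/(comm,pid,tid,)cpu,/\1/g'
}
```

One way to use it is strip_cpu_field < perfcollect > perfcollect.patched, followed by a diff of the two files to confirm that only the intended lines changed before swapping the patched script in.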
- If there are problems resolving .NET symbols, you can also use two additional settings. Note that this affects the application start-up performance.
    COMPlus_ZapDisable=1
    COMPlus_ReadyToRun=0
COMPlus_ZapDisable=1 tells the .NET Core runtime to not use the precompiled framework code. All the code will be Just-in-Time compiled, so crossgen is no longer needed, which means the steps to run dotnet restore -r linux-x64 and copy crossgen in the Dockerfile can be removed. For more details, check out the relevant section at Performance Tracing on Linux.
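These settings are environment variables read by the runtime, so one way to apply them is with -e on the application container’s docker run command. A sketch, reusing the run command from earlier in this guide; the wrapper function name is hypothetical, and the port, volume, and container names are simply the ones used in this walkthrough.

```shell
# Hypothetical wrapper: start the application container with precompiled
# framework code disabled, so that all managed code is JIT-compiled and
# shows up in the perf map files.
run_application_jit_only() {
  docker run -p 80:80 -v shared-tmp:/tmp \
    -e COMPlus_ZapDisable=1 \
    -e COMPlus_ReadyToRun=0 \
    --name application application
}
```

Setting the variables on the command line keeps the application image unchanged, which fits the goal of leaving the application container as close to production as possible; the start-up cost mentioned above still applies while they are set.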
This document describes a sidecar approach to collecting CPU performance traces for a .NET Core application running inside a container. The step-by-step guide here describes a manual/on-demand investigation. However, most of the steps above may be automated by a container orchestrator or infrastructure.
References and Useful Links
- Linux Container Performance Analysis, talk by Brendan Gregg, inventor of FlameGraph
- Examples and hands-on labs for Linux tracing tools workshops by Sasha Goldshtein goldshtn/linux-tracing-workshop.
- Debugging and Profiling .NET Core Apps on Linux, slides from Sasha Goldshtein assets.ctfassets.net/9n3x4rtjlya6/1qV39g0tAEC2OSgok0QsQ6/fbfface3edac8da65fd380cc05a1a028/Sasha-Goldshtein_Debugging-and-profiling-NET-Core-apps-on-Linux.pdf
- Debugging Python Containers in Production http://blog.0x74696d.com/posts/debugging-python-containers-in-production
- perfcollect source code dotnet/corefx-tools:src/performance/perfcollect/perfcollect@
- Documentation on Performance Tracing on Linux for .NET Core. dotnet/coreclr:Documentation/project-docs/linux-performance-tracing.md@
- PerfView tutorials on Channel9 channel9.msdn.com/Series/PerfView-Tutorial