{"id":25661,"date":"2019-12-09T02:33:32","date_gmt":"2019-12-09T09:33:32","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/dotnet\/?p=25661"},"modified":"2021-09-29T12:16:04","modified_gmt":"2021-09-29T19:16:04","slug":"gc-perf-infrastructure-part-1","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/dotnet\/gc-perf-infrastructure-part-1\/","title":{"rendered":"GC Perf Infrastructure &#8211; Part 1"},"content":{"rendered":"<p>We <a href=\"https:\/\/github.com\/dotnet\/performance\/tree\/master\/src\/benchmarks\/gc\" rel=\"noopener noreferrer\" target=\"_blank\">open sourced<\/a> our new GC Perf Infrastructure! It\u2019s now part of the dotnet performance repo. I\u2019ve been meaning to write about it \u2018cause some curious minds had been asking when they could use it after I <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/gc-perf-infrastructure-part-0\/\" rel=\"noopener noreferrer\" target=\"_blank\">blogged about it last time<\/a>, but I didn\u2019t get around to it till now.<\/p>\n<p>First of all, let me point out that the target audience of this infra, aside from the obvious (ie, those who make performance changes to the GC), is folks who need to do in-depth analysis of GC\/managed memory performance and\/or build automation around it. So it assumes you already have a fair amount of knowledge of what to look for in the analysis.<\/p>\n<p>Secondly, there are a lot of moving parts in the infra, and since it\u2019s still under development I wouldn\u2019t be surprised if you hit problems when you try to use it. Please be patient with us as we work through the issues! We don\u2019t have a whole lot of resources, so we may not be able to get to them right away. And of course, if you want to contribute, it would be most appreciated. I know many people who are reading this are passionate about perf analysis and have done a ton of work to build\/improve perf analysis for .NET, whether in your own tooling or other people\u2019s. 
And contributing to perf analysis is a fantastic way to learn about GC tuning if you are looking to start somewhere. So I would strongly encourage you to contribute!<\/p>\n<p><strong>Topology<\/strong><\/p>\n<p>We discussed whether we wanted to open source this in its own repo and concluded that we wouldn\u2019t, mostly for logistical reasons, so this became part of the perf repo under the \u201c<a href=\"https:\/\/github.com\/dotnet\/performance\/tree\/master\/src\/benchmarks\/gc\" rel=\"noopener noreferrer\" target=\"_blank\">src\/benchmarks\/gc<\/a>\u201d directory (which I\u2019ll refer to as the root directory). It doesn&#8217;t depend on anything outside of this directory, which means you don\u2019t need to build anything outside of it if you just want to use the GC perf infra part.<\/p>\n<p>The <a href=\"https:\/\/github.com\/dotnet\/performance\/blob\/master\/src\/benchmarks\/gc\/README.md\" rel=\"noopener noreferrer\" target=\"_blank\">readme.md in the root directory<\/a> describes the general workflow and basic usage. More documentation can be found in the <a href=\"https:\/\/github.com\/dotnet\/performance\/tree\/master\/src\/benchmarks\/gc\/docs\" rel=\"noopener noreferrer\" target=\"_blank\">docs<\/a> directory.<\/p>\n<p>There are 2 major components of the infra \u2013<\/p>\n<p><font color=\"blue\">Running perf benchmarks<\/font><\/p>\n<p>This runs our own perf benchmarks \u2013 this is for folks who need to actually make perf changes to the GC. 
It provides the following functionalities \u2013<\/p>\n<ul style=\"list-style-type: square;\">\n<li>\n    Specifying different commandline args to generate different perf characteristics in the tests, eg, different survival ratios for SOH\/LOH and different pinning ratios;\n  <\/li>\n<\/ul>\n<ul style=\"list-style-type: square;\">\n<li>\n    Specifying builds to compare against;\n  <\/li>\n<\/ul>\n<ul style=\"list-style-type: square;\">\n<li>\n    Specifying different environments, eg, different env vars to specify GC configs, running in containers or high memory load situations;\n  <\/li>\n<\/ul>\n<ul style=\"list-style-type: square;\">\n<li>\n    Specifying different options to collect traces with, eg, GCCollectOnly or ThreadTime.\n  <\/li>\n<\/ul>\n<p>You specify all these in what we call a bench file (it\u2019s a .yaml file but really could be anything \u2013 we just chose .yaml). We also provide configurations for the basic perf scenarios, so when you make changes, those should be run to make sure things don\u2019t regress.<\/p>\n<p>You don\u2019t have to run our tests \u2013 you could run whatever you like as long as you can specify it as a commandline program, and still take advantage of the rest of what we provide, like running in a container.<\/p>\n<p>This is documented in the readme, and I will be talking about this in more detail in one of the future blog entries.<\/p>\n<p>Source for this is in the <a href=\"https:\/\/github.com\/dotnet\/performance\/tree\/master\/src\/benchmarks\/gc\/src\/exec\" rel=\"noopener noreferrer\" target=\"_blank\">exec<\/a> dir.<\/p>\n<p><font color=\"blue\">Analyzing perf<\/font><\/p>\n<p>This can be used without the running part at all. If you already collected perf traces, you can use this to analyze them. I\u2019d imagine more folks would be interested in this than the running part, so I\u2019ll devote more content to analysis. 
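<\/p>
<p>To make the bench file idea above concrete, here is a minimal sketch of the kind of thing one might contain: builds to compare, benchmark arguments, environment settings and a trace collection option. The field names below are illustrative assumptions only, not the actual schema \u2013 see the readme and the docs directory for the real format.<\/p>

```yaml
# Hypothetical bench file sketch; field names are illustrative, not the real schema.
coreclrs:                      # builds to compare against
  baseline:
    core_root: C:\\coreclr-base\\core_root
  my_change:
    core_root: C:\\coreclr-change\\core_root
common_config:
  server_gc: true              # env vars \/ GC configs applied to every run
  memory_load_percent: 90      # eg, simulate a high memory load situation
benchmarks:
  low_survival:
    arguments:
      soh_survival_ratio: 10   # eg, survival ratio for SOH
      pinning_ratio: 5
collect: gccollectonly         # which kind of trace to collect
```

<p>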
In the last GC perf infra post I already talked about things you could do using Jupyter Notebook (I&#8217;ll be showing more examples with the actual code in the upcoming blog entries). This time I\u2019ll focus on actually setting things up and using the commands we provide. Feel free to try it out now that it\u2019s out there.<\/p>\n<p>Source for this is in the <a href=\"https:\/\/github.com\/dotnet\/performance\/tree\/master\/src\/benchmarks\/gc\/src\/analysis\" rel=\"noopener noreferrer\" target=\"_blank\">analysis<\/a> dir.<\/p>\n<p><strong>Analysis setup<\/strong><\/p>\n<p>After you clone the dotnet performance repo, you\u2019ll see the readme in the gc infra root dir. Setup is detailed in that doc. If you just want the analysis piece, you don\u2019t need to do all of the setup steps there. The only steps you need are &#8211;<\/p>\n<ul style=\"list-style-type: square;\">\n<li>\n    Install python. 3.7 is the minimum required version and the recommended one. 3.8 has <a href=\"https:\/\/github.com\/jupyter\/notebook\/issues\/4613\" rel=\"noopener noreferrer\" target=\"_blank\">problems<\/a> with Jupyter Notebook. I wanted to point this out because 3.8 is the latest release version on python\u2019s page.\n  <\/li>\n<\/ul>\n<ul style=\"list-style-type: square;\">\n<li>\n    Install the python libraries needed \u2013 you can install these via \u201cpy -m pip install -r src\/requirements.txt\u201d as the readme says, and if no errors occur, great; but you might get errors with pythonnet, which is mandatory for analysis. In fact, installing pythonnet can be so troublesome that we devoted <a href=\"https:\/\/github.com\/dotnet\/performance\/blob\/master\/src\/benchmarks\/gc\/docs\/pythonnet.md\" rel=\"noopener noreferrer\" target=\"_blank\">a whole doc<\/a> just for it. 
I hope one day there are enough good C# charting libraries, and C# works in Jupyter Notebook inside VSCode, so we no longer need pythonnet.\n  <\/li>\n<\/ul>\n<ul style=\"list-style-type: square;\">\n<li>\n    Build the C# analysis library by running \u201cdotnet publish\u201d in the src\\analysis\\managed-lib dir.\n  <\/li>\n<\/ul>\n<p><strong>Specify what to analyze<\/strong><\/p>\n<p>Let\u2019s say you\u2019ve collected an ETW trace (this can be from .NET or .NET Core) and want to analyze it. You\u2019ll need to tell the infra which process is of interest to you (on Linux you collect the events for the process of interest with dotnet-trace, but since the infra works on both Windows and Linux this step is the same). Specifying the process to analyze means simply writing a .yaml file that we call the \u201ctest status file\u201d. From the readme, the test status file you write just for analysis only needs these 3 lines \u2013 <font face=\"courier\"><\/p>\n<p><span>success: true <\/span><\/p>\n<p><span>trace_file_name: x.etl # A relative path. Should generally match the name of this file. 
<\/span><\/p>\n<p><span>process_id: 1234 # If you don\u2019t know this, use the <\/span><code>print-processes<\/code><span> command for a list <\/span><\/p>\n<p><\/font><\/p>\n<p>You might wonder why you need to specify the \u201csuccess: true\u201d line at all \u2013 this is simply because the infra can also be used to analyze the results of tests it ran; when you run lots of tests and analyze their results in automation, we look for this line and only analyze the ones that succeeded.<\/p>\n<p>You may already know the PID of the process you want to analyze via other tools like PerfView, but we aim to have the infra used standalone without having to run other tools, so there\u2019s a command that prints out the PIDs of processes a trace contains.<\/p>\n<p>We really wanted to have the infra provide meaningful built-in help, so when you wonder how to do something you can generally find it there. To get the list of all commands, simply ask for the top level help in the root dir \u2013 <font face=\"courier\"><\/font><\/p>\n<p><span>C:\\perf\\src\\benchmarks\\gc>py . help<\/span><\/p>\n<p><span>Read <\/span><code>README.md<\/code><span> first. For help with an individual command, use <\/span><code>py . command-name --help<\/code><span>. (You can also pass <\/span><code>--help --hidden<\/code><span> to see hidden arguments.)<\/span><\/p>\n<p><span>run commands<\/span><\/p>\n<p><span>[omitted]<\/span><\/p>\n<p><span>analysis commands<\/span><\/p>\n<p><span>Commands for analyzing test results (trace files). To compare a small number of configs, use <\/span><code>diff<\/code><span>. To compare many, use <\/span><code>chart-configs<\/code><span>. 
For detailed analysis of a single trace, use <\/span><code>analyze-single<\/code><span> or <\/span><code>chart-individual-gcs<\/code><span>.<\/span><\/p>\n<p><span>analyze-single: Given a single trace, print run metrics and optionally metrics for individual GCs.<\/span><\/p>\n<p><span>analyze-single-gc: Print detailed info about a single GC within a single trace.<\/span><\/p>\n<p><span>[more output omitted; I also did some formatting of the output]<\/span><\/p>\n<p>(I apologize for the formatting &#8211; it amazes me that we don&#8217;t seem to have a decent html editing program for blogging, and writing a blog mostly consists of manually writing html ourselves, which is really painful)<\/p>\n<p>As the top level help says, you can get help with specific commands. So we\u2019ll follow that suggestion and do <font face=\"courier\"><\/p>\n<p><span>C:\\perf\\src\\benchmarks\\gc&gt;py . help print-processes <\/span><\/p>\n<p><span>Print all process PIDs and names from a trace file. 
<\/span><\/p>\n<table border=\"1\">\n<thead>\n<tr>\n<th>\n        <span> arg name<\/span>\n      <\/th>\n<th>\n        <span> arg type<\/span>\n      <\/th>\n<th>\n        <span> description<\/span>\n      <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>\n        <span> --name-regex<\/span>\n      <\/td>\n<td>\n        <span> any string<\/span>\n      <\/td>\n<td>\n        <span> Regular expression used to filter processes by their name<\/span>\n      <\/td>\n<\/tr>\n<tr>\n<td>\n        <span> --hide-threads<\/span>\n      <\/td>\n<td>\n        <span> true or false<\/span>\n      <\/td>\n<td>\n        <span> Don\u2019t show threads for each process<\/span>\n      <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span>[more output omitted; I also did some formatting to get rid of some columns so the lines are not too long]<\/span>\n<\/font><\/p>\n<p>As an example, I purposefully chose a test that I know is unsuitable to be run with Server GC \u2018cause it only has one thread, so I\u2019m expecting to see some heap imbalance. I know the imbalance will occur when we mark young gen objects held onto by older generation objects, so I\u2019ll use the chart-individual-gcs command to show me how long each heap took to mark those. <font face=\"courier\"><\/font><\/p>\n<p><span>C:\\perf\\src\\benchmarks\\gc>py . chart-individual-gcs C:\\traces\\fragment\\fragment.yaml --x-single-gc-metric Index --y-single-heap-metrics MarkOlderMSec <\/span><\/p>\n<p><span>This will show 8 heaps. Consider passing <\/span><code>--show-n-heaps<\/code><span>. 
<\/span><\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/12\/markold-time.jpg\" alt=\"markold-time\" width=\"1293\" height=\"474\" class=\"alignnone size-full wp-image-25707\" srcset=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/12\/markold-time.jpg 1293w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/12\/markold-time-300x110.jpg 300w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/12\/markold-time-1024x375.jpg 1024w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/12\/markold-time-768x282.jpg 768w\" sizes=\"(max-width: 1293px) 100vw, 1293px\" \/><\/p>\n<p><span>Sure enough, one of the heaps always takes significantly longer to mark young gen objects referenced by older gen objects, and to make sure it\u2019s not because of some other factor, I also looked at how much is promoted per heap &#8211;<\/span><\/p>\n<p><font face=\"courier\"><\/font><\/p>\n<p><span>C:\\perf\\src\\benchmarks\\gc>py . chart-individual-gcs C:\\traces\\fragment\\fragment.yaml --x-single-gc-metric Index --y-single-heap-metrics MarkOlderPromotedMB <\/span><\/p>\n<p><span>This will show 8 heaps. Consider passing <\/span><code>--show-n-heaps<\/code><span>. 
<\/span><\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/12\/markold-promoted.jpg\" alt=\"markold-promoted\" width=\"1312\" height=\"518\" class=\"alignnone size-full wp-image-25709\" srcset=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/12\/markold-promoted.jpg 1312w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/12\/markold-promoted-300x118.jpg 300w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/12\/markold-promoted-1024x404.jpg 1024w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/12\/markold-promoted-768x303.jpg 768w\" sizes=\"(max-width: 1312px) 100vw, 1312px\" \/> This confirms the theory \u2013 we marked significantly more on one heap, which caused that heap to spend significantly longer in marking.<\/p>\n<p>This trace was taken with the latest version of the desktop CLR. In the current version of coreclr we are able to handle this situation better, but I\u2019ll save that for another day since today I wanted to focus on tooling.<\/p>\n<p>There\u2019s an <a href=\"https:\/\/github.com\/dotnet\/performance\/blob\/master\/src\/benchmarks\/gc\/docs\/example.md\" rel=\"noopener noreferrer\" target=\"_blank\">example.md<\/a> that shows examples of using some of the commands. Note that the join analysis is not checked in just yet \u2013 the PR is out, and I wanted to spend more time on the CR before merging it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We open sourced our new GC Perf Infrastructure! It\u2019s now part of the dotnet performance repo. I\u2019ve been meaning to write about it \u2018cause some curious minds had been asking when they could use it after I blogged about it last time but didn\u2019t get around to it till now. 
First of all, let me [&hellip;]<\/p>\n","protected":false},"author":3542,"featured_media":58792,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[685,196,195,3008,3009],"tags":[4],"class_list":["post-25661","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dotnet","category-dotnet-core","category-dotnet-framework","category-maoni","category-performance","tag-net"],"acf":[],"blog_post_summary":"<p>We open sourced our new GC Perf Infrastructure! It\u2019s now part of the dotnet performance repo. I\u2019ve been meaning to write about it \u2018cause some curious minds had been asking when they could use it after I blogged about it last time but didn\u2019t get around to it till now. First of all, let me [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/25661","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/users\/3542"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/comments?post=25661"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/25661\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media\/58792"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media?parent=25661"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/categories?post=25661"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.c
om\/dotnet\/wp-json\/wp\/v2\/tags?post=25661"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}