{"id":705,"date":"2019-04-03T00:27:22","date_gmt":"2019-04-03T00:27:22","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/maoni\/?p=705"},"modified":"2019-09-23T15:48:35","modified_gmt":"2019-09-23T22:48:35","slug":"making-cpu-configuration-better-for-gc-on-machines-with-64-cpus","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/dotnet\/making-cpu-configuration-better-for-gc-on-machines-with-64-cpus\/","title":{"rendered":"Making CPU configuration better for GC on machines with &gt; 64 CPUs"},"content":{"rendered":"<p>If you are running Windows on a machine with > 64 CPUs, you\u2019ll need to use a feature called CPU groups for your process to be able to use more than 64 CPUs. At some point in the far distant past, people thought having more than 64 processors on a machine was inconceivable so they used a 64-bit number for the processor mask. When machines with more than 64 procs became available, Windows invented the <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/desktop\/ProcThread\/processor-groups\">CPU group<\/a> concept for such machines, under which processors belong to different CPU groups and each group has no more than 64 procs. E.g., on a machine with 96 procs you will see 2 groups with 48 procs each. When a process starts, it always starts in a single CPU group. The only way to use processors from other groups is for the process to set thread affinity so that its threads run on processors in those groups. To support machines with > 64 procs, we added a config called <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/framework\/configure-apps\/file-schema\/runtime\/gccpugroup-element\">GCCpuGroup<\/a>. The default for it is 0 (i.e., enabled=false), meaning that without this config, if you are using Server GC, the process will create at most N Server GC threads\/heaps, where N is the number of processors in the single group it started in. 
When this is set to 1, GC will create Server GC threads that span all active processors in all available CPU groups on the machine. Then our runtime started to run on Linux and with that we introduced an OS layer that GC calls via GCToOSInterface, which abstracts away the OS functionality GC needs, e.g., VirtualAlloc and processor affinity. And we had our Linux OS layer simulate the Windows behavior. For the most part this was desired, but one thing became more and more of a thorny point \u2013 the CPU group concept. Linux does not have this concept \u2013 if you have > 64 processors you will get to use > 64 procs without doing anything special. And we had to write all this code in our Linux OS layer to group processors into CPU groups. Recently we decided to pull the plug on this and no longer have Linux simulate the Windows behavior because we like the Linux behavior better for this particular case. I\u2019ve been working with <a href=\"https:\/\/github.com\/janvorli\">Jan Vorlicek<\/a> on this (he\u2019s doing all the work). The main pieces of this work are the following \u2013 1) GC will no longer have the concept of CPU groups. Checks like this: <script src=\"https:\/\/gist.github.com\/Maoni0\/e204385057ec57e7aee5bf2879f6dd99.js\"><\/script> will be removed. If the process needs to use > 64 procs it will be handled by the OS layer automatically. In fact, we are even thinking about changing the default of GCCpuGroup from 0 to 1 so on coreclr on Windows you will no longer need to specify this config to have your process use more than 64 processors. As always, we welcome your feedback on this. 2) Previously, I explained the <font color=\"blue\">GCHeapAffinitizeMask<\/font> config in my blog. Since that\u2019s also a 64-bit number (on 64-bit OSs), it was designed for the single CPU group case. 
We are adding a new config, <font color=\"blue\">GCHeapAffinitizeRanges<\/font>, that allows you to specify processor ranges instead of a mask and it allows you to specify more than 64 procs if you wish. From Jan\u2019s <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/23537\">PR<\/a>:<\/p>\n<pre>\/\/ Unix:\n\/\/  The cpu index ranges is a comma separated list of indices or ranges of indices (e.g. 1-5).\n\/\/  Example 1,3,5,7-9,12\n\/\/ Windows:\n\/\/  The cpu index ranges is a comma separated list of group-annotated indices or ranges of indices.\n\/\/  The group number always prefixes index or range and is followed by colon.\n\/\/  Example 0:1,0:3,0:5,1:7-9,1:12\n<\/pre>\n<p>We need different formats for Windows and Linux because Windows simply does not expose global indices of processors &#8211; they have to be relative to the group they belong to. Previously on Windows, if you specified an affinity mask it would simply be ignored when you also enabled CPU groups. With the new<\/p>\n<p><font color=\"blue\">GCHeapAffinitizeRanges<\/font> config you will be able to specify any processors on the machine, whether it has more than 64 procs or not, and whether you want to have your process use more than 64 procs or not. On Windows, if we do change the default of <font color=\"blue\">GCCpuGroup<\/font> to 1, it means you will automatically be using processors in all CPU groups. And if you want the previous behavior you can just set <font color=\"blue\">GCCpuGroup<\/font> to 0. So our proposed new behavior would be \u2013 \n*   When <font color=\"blue\">GCCpuGroup<\/font> is 0, we read the <font color=\"blue\">GCHeapAffinitizeMask<\/font> config; \n*   When <font color=\"blue\">GCCpuGroup<\/font> is 1, we read the <font color=\"blue\">GCHeapAffinitizeRanges<\/font> config. \n*   <font color=\"blue\">GCCpuGroup<\/font> will default to 1 and only be applicable on Windows.  
Specifically &#8211;<\/p>\n<p><font color=\"blue\">On Windows<\/font> <font face=\"Lucida Console\"><\/p>\n<ul>\n<li>\n    When <font color=\"blue\">GCCpuGroup<\/font> is 1, <font color=\"blue\">GCHeapAffinitizeRanges<\/font> will be used to pick the processors to run on and <font color=\"blue\">GCHeapAffinitizeMask<\/font> is ignored.\n  <\/li>\n<li>\n    When <font color=\"blue\">GCCpuGroup<\/font> is 0 (default), <font color=\"blue\">GCHeapAffinitizeMask<\/font> will be used to pick the processors within the CPU group the process runs in, and <font color=\"blue\">GCHeapAffinitizeRanges<\/font> is ignored.\n  <\/li>\n<\/ul>\n<p><\/font><\/p>\n<p><font color=\"blue\">On Linux<\/font> <font face=\"Lucida Console\"><\/p>\n<ul>\n  Our Linux OS layer will not have the CPU group concept anymore (all code there will be removed). So the <font color=\"blue\">GCCpuGroup<\/font> config is ignored on Linux, which *also* means the <font color=\"blue\">GCHeapAffinitizeMask<\/font> config is ignored. Note that on 2.2 and prior, our Linux OS layer only supported running on the first 64 procs that your process is allowed to run on. So essentially it always ran as if <font color=\"blue\">GCCpuGroup<\/font> were false. The new behavior will always automatically use > 64 procs and you can use the <font color=\"blue\">GCHeapAffinitizeRanges<\/font> config to specify <= 64 procs if you wish.\n<\/ul>\n<p><\/font> Curious minds will also notice in the vicinity of the<\/p>\n<p><font color=\"blue\">GCCpuGroup<\/font> config in src\\clrconfigvalues.h there\u2019s a config called <font color=\"blue\">GCNumaAware<\/font>. This was added at the same time as the <font color=\"blue\">GCCpuGroup<\/font> config. By default we always enable NUMA awareness. This means we allocate memory on the proper NUMA node and when we do heap balancing we will try to balance allocations to heaps that live on the same NUMA node first before we look at heaps on remote nodes. 
I\u2019m thinking of just getting rid of this config altogether \u2013 we had it for testing when we made GC NUMA aware years ago but I don\u2019t see any reason why anyone would want to be NUMA unaware so there\u2019s no need for it anymore.<\/p>\n<p>EDIT on 04\/04\/2019 &#8211; we needed 2 different formats for Windows and Linux for the GCHeapAffinitizeRanges config.\nEDIT on 09\/23\/2019 &#8211; on Windows we kept the default as disabled for GCCpuGroup.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you are running Windows on a machine with > 64 CPUs, you\u2019ll need to use this feature called the CPU groups for your process to be able to use more than 64 CPUs. At some point in the far distant past, people thought having more than 64 processors on a machine was inconceivable so [&hellip;]<\/p>\n","protected":false},"author":3542,"featured_media":58792,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[3009],"tags":[3011],"class_list":["post-705","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-performance","tag-maoniposts"],"acf":[],"blog_post_summary":"<p>If you are running Windows on a machine with > 64 CPUs, you\u2019ll need to use this feature called the CPU groups for your process to be able to use more than 64 CPUs. 
At some point in the far distant past, people thought having more than 64 processors on a machine was inconceivable so [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/705","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/users\/3542"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/comments?post=705"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/705\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media\/58792"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media?parent=705"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/categories?post=705"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/tags?post=705"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}