{"id":406,"date":"2017-01-05T13:27:46","date_gmt":"2017-01-05T05:27:46","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/seteplia\/?p=406"},"modified":"2017-01-05T13:27:46","modified_gmt":"2017-01-05T05:27:46","slug":"understanding-different-gc-modes-with-concurrency-visualizer","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/premier-developer\/understanding-different-gc-modes-with-concurrency-visualizer\/","title":{"rendered":"Understanding different GC modes with Concurrency Visualizer"},"content":{"rendered":"<p align=\"justify\">In this post I\u2019m going to visualize what exactly happens during Garbage Collection (GC) and how different GC modes can significantly affect application performance. <\/p>\n<p align=\"justify\">I assume that the reader is familiar with garbage collection basics. If this isn\u2019t the case, I encourage you to spend 15 minutes filling this gap, for instance with the article <a href=\"https:\/\/msdn.microsoft.com\/library\/ee787088(v=vs.110).aspx\">\u201cFundamentals of Garbage Collection\u201d<\/a> or with a chapter in your favorite book on C#\/.NET (*). <\/p>\n<p align=\"justify\">The Garbage Collector in the CLR is a very complicated, configurable and self-tuning creature that may change its behavior based on application needs. To satisfy different memory usage requirements, the GC offers several options that control how it operates. There are two main modes: Workstation mode (designed to minimize delays) and Server mode (designed for maximum application throughput). Each mode also comes in one of two \u201csub-modes\u201d: concurrent or non-concurrent (**). <\/p>\n<h4 align=\"justify\">Workstation GC vs. Server GC<\/h4>\n<p align=\"justify\">Workstation GC is designed for desktop applications and aims to keep individual GC pauses short. Collections happen more frequently, but each one pauses application threads for a shorter time. 
Server GC is optimized for application throughput at the cost of longer GC pauses. Memory consumption is higher, but the application can process a greater volume of data between garbage collections. <\/p>\n<p align=\"justify\">All managed objects are stored in segments. There is one segment for the young generations (called the ephemeral segment) and many segments for generation 2 and the large object heap. When the ephemeral segment is full, a GC happens first, and only then does the CLR allocate a new segment. The segment size depends on whether the system is 32- or 64-bit and on the type of garbage collector: Workstation GC uses smaller segments, while Server GC uses bigger ones whose size also depends on the number of CPU cores. The smaller the segments, the more frequently GC occurs. Workstation GC is used by default in all managed apps and is best suited for UI applications. Server GC is intended for server applications; it can be turned on by the CLR host or with the <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/ms229357(v=vs.110).aspx\">&lt;gcServer&gt;<\/a> element in the application configuration file. <\/p>\n<p align=\"justify\">GC flavors such as \u2018concurrent\u2019 and \u2018non-concurrent\u2019 help fine-tune garbage collection for maximum performance and\/or responsiveness. Concurrent mode reduces the time application threads spend paused, because the mark phase for the 2<sup>nd<\/sup> generation runs on a dedicated thread in parallel with application threads. In this mode, the GC suspends user threads for shorter periods but uses slightly more memory. <\/p>\n<p align=\"justify\">Concurrent Workstation GC is best suited for UI applications, while non-concurrent Workstation GC should be used for lightweight server processes or for server apps on single-core machines. 
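<\/p>\n<p align=\"justify\">Here is a minimal sketch for a .NET Framework application. Both switches live in the <code>&lt;runtime&gt;<\/code> section of the application configuration file (app.config in the project, copied next to the executable as <i>.exe.config<\/i>). <code>&lt;gcServer&gt;<\/code> chooses between Workstation and Server GC, and <code>&lt;gcConcurrent&gt;<\/code> chooses between the concurrent (background) and non-concurrent flavors. The values below request Background Server GC:<\/p>\n<pre><code>&lt;configuration&gt;\n  &lt;runtime&gt;\n    &lt;!-- false (default): Workstation GC; true: Server GC --&gt;\n    &lt;gcServer enabled=\"true\" \/&gt;\n    &lt;!-- true (default): concurrent (background) GC; false: non-concurrent GC --&gt;\n    &lt;gcConcurrent enabled=\"true\" \/&gt;\n  &lt;\/runtime&gt;\n&lt;\/configuration&gt;<\/code><\/pre>\n<p align=\"justify\">Setting both attributes to <code>false<\/code> gives non-concurrent Workstation GC, which is the mode examined first below. As noted above, a CLR host can still choose a different mode. 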
<\/p>\n<p align=\"justify\">To visualize the GC, I\u2019ll be using a tool called Concurrency Visualizer to observe a simple console application. Concurrency Visualizer is a <a href=\"https:\/\/marketplace.visualstudio.com\/items?itemName=VisualStudioProductTeam.ConcurrencyVisualizerforVisualStudio2015\">Visual Studio extension<\/a> that shows various threading aspects of an application, like lock contention, thread synchronization, I\/O operations, GC pauses and more. The app simply allocates byte arrays: some arrays are kept in internal lists, and others become eligible for garbage collection immediately. <\/p>\n<p align=\"justify\">Now, let\u2019s take a look at each mode in more detail using <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/dd537632.aspx\">Concurrency Visualizer<\/a>. <\/p>\n<h4 align=\"justify\">Workstation GC: non-concurrent mode<\/h4>\n<p align=\"justify\">There are a few reasons for a GC to happen: generation 0 is full (its allocation budget is exhausted), <b>GC.Collect<\/b> was called, or the system is low on memory. We are only interested in the first one. <\/p>\n<p align=\"justify\">Here is a very rough algorithm for non-concurrent Workstation GC: <\/p>\n<ol>\n<li>\n<div align=\"justify\">An application thread allocates an object, and the GC can\u2019t fulfill the request. A GC is started.<\/div>\n<\/li>\n<li>\n<div align=\"justify\">The CLR suspends all managed threads.<\/div>\n<\/li>\n<li>\n<div align=\"justify\">The CLR collects the garbage on the thread that triggered the GC.<\/div>\n<\/li>\n<li>\n<div align=\"justify\">The CLR resumes all application threads once the GC is done.<\/div>\n<\/li>\n<\/ol>\n<p align=\"justify\">For testing purposes, I\u2019m using a laptop with a Core i7 processor. The sample application uses 8 threads to do its job, but I will show fewer threads for the sake of simplicity. 
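<\/p>\n<p align=\"justify\">The source of the sample app isn\u2019t part of this post. The C# sketch below is a rough, hypothetical approximation of the workload described above, not the original code. The class name, array size, survival ratio and iteration count are all assumptions. It allocates 1 KB byte arrays on 8 threads, keeps every tenth array in a per-thread list and lets the rest become garbage immediately. At the end it prints <code>GC.CollectionCount<\/code> for each generation:<\/p>\n<pre><code>using System;\nusing System.Collections.Generic;\nusing System.Threading;\n\n\/\/ Hypothetical approximation of the sample app, not the original code.\nstatic class AllocationDemo\n{\n    public static void Run(int threadCount, int iterations)\n    {\n        var threads = new Thread[threadCount];\n        for (int t = 0; t &lt; threadCount; t++)\n        {\n            threads[t] = new Thread(() =&gt;\n            {\n                var survivors = new List&lt;byte[]&gt;();\n                for (int i = 0; i &lt; iterations; i++)\n                {\n                    var buffer = new byte[1024];\n                    \/\/ Keep every 10th array alive; the rest become garbage at once.\n                    if (i % 10 == 0)\n                    {\n                        survivors.Add(buffer);\n                    }\n                    \/\/ Drop the survivors periodically so memory stays bounded.\n                    if (survivors.Count &gt;= 10000)\n                    {\n                        survivors.Clear();\n                    }\n                }\n            });\n            threads[t].Start();\n        }\n        foreach (var thread in threads)\n        {\n            thread.Join();\n        }\n    }\n\n    static void Main()\n    {\n        Run(threadCount: 8, iterations: 1000000);\n        \/\/ Counters are cumulative: a Gen1 GC also increments the Gen0 counter.\n        Console.WriteLine(\"Gen0: \" + GC.CollectionCount(0));\n        Console.WriteLine(\"Gen1: \" + GC.CollectionCount(1));\n        Console.WriteLine(\"Gen2: \" + GC.CollectionCount(2));\n    }\n}<\/code><\/pre>\n<p align=\"justify\">Because these counters are cumulative, they may not match the per-generation counts reported by tools such as PerfView. They are only a quick way to compare GC modes from inside the process. 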
<\/p>\n<p align=\"justify\">Steps #1 and #2: the CLR suspends all managed threads: <\/p>\n<p align=\"justify\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0028.jpg\"><img decoding=\"async\" title=\"clip_image002[8]\" style=\"border-top: 0px;border-right: 0px;border-bottom: 0px;padding-top: 0px;padding-left: 0px;border-left: 0px;padding-right: 0px\" border=\"0\" alt=\"clip_image002[8]\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0028_thumb.jpg\" width=\"640\" height=\"233\"><\/a><\/p>\n<p align=\"justify\">Above, we see that the GC was triggered by thread <b>2948<\/b>, which waits for the CLR to suspend all managed threads. After that, the thread collects the garbage and (as we will see in a moment) compacts the heap. Note that heap compaction doesn\u2019t happen on every GC. To maximize GC performance, the CLR compacts the heap only when the garbage\/survivor ratio is high and compaction is worthwhile. 
<\/p>\n<p align=\"justify\">Step #3: garbage collection: <\/p>\n<p align=\"justify\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0048.jpg\"><img decoding=\"async\" title=\"clip_image004[8]\" style=\"border-top: 0px;border-right: 0px;border-bottom: 0px;padding-top: 0px;padding-left: 0px;border-left: 0px;padding-right: 0px\" border=\"0\" alt=\"clip_image004[8]\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0048_thumb.jpg\" width=\"640\" height=\"244\"><\/a><\/p>\n<p align=\"justify\">While the GC is in progress, all other managed threads are suspended, waiting for it to finish: <\/p>\n<p align=\"justify\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0068.jpg\"><img decoding=\"async\" title=\"clip_image006[8]\" style=\"border-top: 0px;border-right: 0px;border-bottom: 0px;padding-top: 0px;padding-left: 0px;border-left: 0px;padding-right: 0px\" border=\"0\" alt=\"clip_image006[8]\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0068_thumb.jpg\" width=\"640\" height=\"221\"><\/a><\/p>\n<p align=\"justify\">This example shows a Gen0 GC, but in non-concurrent Workstation GC the process is the same for older generations as well; it just takes more time. <\/p>\n<p align=\"justify\">Now let\u2019s look at a more sophisticated mode: concurrent Workstation GC. <\/p>\n<h4 align=\"justify\">Workstation GC: concurrent mode<\/h4>\n<p align=\"justify\">In concurrent (or background) mode, the CLR creates a dedicated high-priority thread <b>for Gen2 collection<\/b>. In this case, the first phase of the garbage collection, the mark phase, runs in parallel with application threads. During this phase the application is still running, so user threads can allocate new objects and even trigger GCs for the young generations. 
<\/p>\n<p align=\"justify\">This is the main difference between the old Concurrent GC, available before .NET 4.0, and the new Background GC. Concurrent GC also had a dedicated worker thread for Gen2 collection. Unlike Background GC, however, if a user thread triggered a GC while a concurrent collection was in progress, that thread was blocked until the current GC finished. Background GC allows ephemeral collections in the middle of a background one. Background GC supersedes Concurrent GC, and the same configuration key turns it on. Starting with .NET 4.0, there is no way to use the old Concurrent GC anymore. <\/p>\n<p align=\"justify\">Here is what background Workstation GC looks like: <\/p>\n<ol>\n<li>\n<div align=\"justify\">An application thread allocates an object, and the GC can\u2019t fulfill the request. A GC is started.<\/div>\n<\/li>\n<li>\n<div align=\"justify\">The CLR suspends all managed threads.<\/div>\n<\/li>\n<li>\n<div align=\"justify\">The CLR collects Gen0 and Gen1.<\/div>\n<\/li>\n<li>\n<div align=\"justify\">The CLR starts the background collection and resumes all managed threads.<\/div>\n<\/li>\n<li>\n<div align=\"justify\">The background thread marks all reachable objects in memory and suspends application threads for the sweep or compact phase.<\/div>\n<\/li>\n<li>\n<div align=\"justify\">The CLR resumes all application threads once the GC is done.<\/div>\n<\/li>\n<\/ol>\n<p align=\"justify\">This is a very basic description, and the actual set of steps can differ based on heuristics such as the degree of heap fragmentation or whether any GC requests arrive during the background collection. 
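<\/p>\n<p align=\"justify\">Because Concurrent GC and Background GC share one configuration switch, it can help to confirm at run time which flavor a process actually got. The minimal C# sketch below assumes .NET Framework 4.5 or later. The <code>GcModeReport<\/code> class and its <code>Describe<\/code> method are illustrative names only. <code>GCSettings.IsServerGC<\/code> and <code>GCSettings.LatencyMode<\/code> are existing properties in the <code>System.Runtime<\/code> namespace:<\/p>\n<pre><code>using System;\nusing System.Runtime;\n\nstatic class GcModeReport\n{\n    \/\/ Reports the GC flavor the runtime selected for this process.\n    public static string Describe()\n    {\n        \/\/ IsServerGC: true for Server GC, false for Workstation GC.\n        \/\/ LatencyMode: typically Interactive when concurrent (background)\n        \/\/ GC is enabled and Batch when it is disabled.\n        return \"Server GC: \" + GCSettings.IsServerGC\n            + \", latency mode: \" + GCSettings.LatencyMode;\n    }\n\n    static void Main()\n    {\n        Console.WriteLine(Describe());\n    }\n}<\/code><\/pre>\n<p align=\"justify\">Running it with different <code>&lt;gcServer&gt;<\/code> and <code>&lt;gcConcurrent&gt;<\/code> settings is a quick sanity check that the configuration is actually picked up. 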
<\/p>\n<p align=\"justify\">In the following case GC was triggered by thread <b>12600<\/b>, and the thread waits till all the threads are suspended: <\/p>\n<p align=\"justify\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0086.jpg\"><img decoding=\"async\" title=\"clip_image008[6]\" style=\"border-top: 0px;border-right: 0px;border-bottom: 0px;padding-top: 0px;padding-left: 0px;border-left: 0px;padding-right: 0px\" border=\"0\" alt=\"clip_image008[6]\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0086_thumb.jpg\" width=\"640\" height=\"284\"><\/a><\/p>\n<p align=\"justify\">Then the thread <b>12600<\/b> collects Gen0 and Gen1: <\/p>\n<p align=\"justify\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0106.jpg\"><img decoding=\"async\" title=\"clip_image010[6]\" style=\"border-top: 0px;border-right: 0px;border-bottom: 0px;padding-top: 0px;padding-left: 0px;border-left: 0px;padding-right: 0px\" border=\"0\" alt=\"clip_image010[6]\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0106_thumb.jpg\" width=\"640\" height=\"267\"><\/a><\/p>\n<p align=\"justify\">Then GC starts background collection for Gen2: <\/p>\n<p align=\"justify\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0126.jpg\"><img decoding=\"async\" title=\"clip_image012[6]\" style=\"border-top: 0px;border-right: 0px;border-bottom: 0px;padding-top: 0px;padding-left: 0px;border-left: 0px;padding-right: 0px\" border=\"0\" alt=\"clip_image012[6]\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0126_thumb.jpg\" width=\"640\" height=\"284\"><\/a><\/p>\n<p align=\"justify\">And thread <b>15972<\/b> starts background collection: <\/p>\n<p align=\"justify\"><a 
href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0146.jpg\"><img decoding=\"async\" title=\"clip_image014[6]\" style=\"border-top: 0px;border-right: 0px;border-bottom: 0px;padding-top: 0px;padding-left: 0px;border-left: 0px;padding-right: 0px\" border=\"0\" alt=\"clip_image014[6]\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0146_thumb.jpg\" width=\"640\" height=\"249\"><\/a><\/p>\n<p align=\"justify\">After the mark phase, the background thread suspends the worker threads until GC is done: <\/p>\n<p align=\"justify\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0166.jpg\"><img decoding=\"async\" title=\"clip_image016[6]\" style=\"border-top: 0px;border-right: 0px;border-bottom: 0px;padding-top: 0px;padding-left: 0px;border-left: 0px;padding-right: 0px\" border=\"0\" alt=\"clip_image016[6]\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0166_thumb.jpg\" width=\"640\" height=\"266\"><\/a><\/p>\n<p align=\"justify\">The background thread sweeps the heap: <\/p>\n<p align=\"justify\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0186.jpg\"><img decoding=\"async\" title=\"clip_image018[6]\" style=\"border-top: 0px;border-right: 0px;border-bottom: 0px;padding-top: 0px;padding-left: 0px;border-left: 0px;padding-right: 0px\" border=\"0\" alt=\"clip_image018[6]\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0186_thumb.jpg\" width=\"640\" height=\"256\"><\/a><\/p>\n<p align=\"justify\">And releases free segments while application threads are running: <\/p>\n<p align=\"justify\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0206.jpg\"><img decoding=\"async\" title=\"clip_image020[6]\" style=\"border-top: 0px;border-right: 0px;border-bottom: 
0px;padding-top: 0px;padding-left: 0px;border-left: 0px;padding-right: 0px\" border=\"0\" alt=\"clip_image020[6]\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0206_thumb.jpg\" width=\"640\" height=\"267\"><\/a><\/p>\n<h4 align=\"justify\">Server GC<\/h4>\n<p align=\"justify\">Server GC has a few very important aspects that affect garbage collection: <\/p>\n<ol>\n<li>\n<div align=\"justify\">Server GC uses bigger segments (a few times bigger than for Workstation GC).<\/div>\n<\/li>\n<li>\n<div align=\"justify\">The CLR creates one managed heap per core. This means that on an 8-core machine, the CLR allocates 8 <b>distinct<\/b> managed heaps.<\/div>\n<\/li>\n<li>\n<div align=\"justify\">GC happens on dedicated threads: one thread per managed heap.<\/div>\n<\/li>\n<\/ol>\n<p align=\"justify\">Server GC trades memory for throughput. Larger heaps mean that memory saturation happens less frequently, but once it happens, the CLR needs to do more work to traverse the heap. As a result, the application consumes more memory and GC happens less frequently, but every GC takes longer, even when only Gen0 and Gen1 are collected. <\/p>\n<p align=\"justify\">To speed up the GC, the CLR uses dedicated high-priority threads even for ephemeral collections. With background GC, the CLR creates yet another set of threads (one per core) for the background work. This means that a managed application using background Server GC on an 8-core machine gets 16 additional threads! <\/p>\n<p align=\"justify\">Now let\u2019s take a look at Server GC in background mode. (<b>Note:<\/b> Background Server GC is available only starting with .NET Framework 4.5.) Because the number of threads is so high, I\u2019ll show only some of them. <\/p>\n<p align=\"justify\">The basic workflow for Background Server GC is as follows: <\/p>\n<ol>\n<li>\n<div align=\"justify\">An application thread allocates an object, and the GC can\u2019t fulfill the request. 
GC is started.<\/div>\n<\/li>\n<li>\n<div align=\"justify\">The CLR suspends all managed threads.<\/div>\n<\/li>\n<li>\n<div align=\"justify\">The CLR collects Gen0 and Gen1 on dedicated GC worker threads.<\/div>\n<\/li>\n<li>\n<div align=\"justify\">The CLR suspends the GC worker threads and starts the background collection. All managed threads are resumed.<\/div>\n<\/li>\n<li>\n<div align=\"justify\">Background threads mark all reachable objects in memory and suspend application threads for the sweep or compact phase.<\/div>\n<\/li>\n<li>\n<div align=\"justify\">The CLR resumes the GC worker threads to sweep the heap.<\/div>\n<\/li>\n<li>\n<div align=\"justify\">Application threads wait for the GC to finish.<\/div>\n<\/li>\n<\/ol>\n<p align=\"justify\">The following screenshots show 3 groups of threads: <\/p>\n<p align=\"justify\">\u00b7 The first 4 threads are foreground GC threads, each responsible for collecting its own heap. <\/p>\n<p align=\"justify\">\u00b7 The second 4 threads are application worker threads. <\/p>\n<p align=\"justify\">\u00b7 The last 4 threads are dedicated to background GC. 
<\/p>\n<p align=\"justify\">The screenshot below shows that application threads are suspended waiting for foreground GC to finish: <\/p>\n<p align=\"justify\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0226.jpg\"><img decoding=\"async\" title=\"clip_image022[6]\" style=\"border-top: 0px;border-right: 0px;border-bottom: 0px;padding-top: 0px;padding-left: 0px;border-left: 0px;padding-right: 0px\" border=\"0\" alt=\"clip_image022[6]\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0226_thumb.jpg\" width=\"640\" height=\"298\"><\/a><\/p>\n<p align=\"justify\">GC Worker threads are doing Gen0\/Gen1 collection: <\/p>\n<p align=\"justify\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0246.jpg\"><img decoding=\"async\" title=\"clip_image024[6]\" style=\"border-top: 0px;border-right: 0px;border-bottom: 0px;padding-top: 0px;padding-left: 0px;border-left: 0px;padding-right: 0px\" border=\"0\" alt=\"clip_image024[6]\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0246_thumb.jpg\" width=\"640\" height=\"295\"><\/a><\/p>\n<p align=\"justify\">GC triggers a background collection: <\/p>\n<p align=\"justify\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0266.jpg\"><img decoding=\"async\" title=\"clip_image026[6]\" style=\"border-top: 0px;border-right: 0px;border-bottom: 0px;padding-top: 0px;padding-left: 0px;border-left: 0px;padding-right: 0px\" border=\"0\" alt=\"clip_image026[6]\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0266_thumb.jpg\" width=\"640\" height=\"307\"><\/a><\/p>\n<p align=\"justify\">Then the CLR resumes GC worker threads to compact the heap:<br><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0286.jpg\"><img decoding=\"async\" 
title=\"clip_image028[6]\" style=\"border-top: 0px;border-right: 0px;border-bottom: 0px;padding-top: 0px;padding-left: 0px;border-left: 0px;padding-right: 0px\" border=\"0\" alt=\"clip_image028[6]\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0286_thumb.jpg\" width=\"640\" height=\"312\"><\/a><\/p>\n<p align=\"justify\">Meanwhile, application threads are blocked waiting for the GC to finish: <\/p>\n<p align=\"justify\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0306.jpg\"><img decoding=\"async\" title=\"clip_image030[6]\" style=\"border-top: 0px;border-right: 0px;border-bottom: 0px;padding-top: 0px;padding-left: 0px;border-left: 0px;padding-right: 0px\" border=\"0\" alt=\"clip_image030[6]\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image0306_thumb.jpg\" width=\"640\" height=\"302\"><\/a><\/p>\n<p align=\"justify\">As you can see, background Server GC is more complicated than Workstation GC: it requires more resources and more complex cross-thread coordination. <\/p>\n<h4 align=\"justify\">Server GC vs. Workstation GC in a real application<\/h4>\n<p align=\"justify\">The GC has a significant effect on any performance-critical managed application. Allocations are very cheap, but garbage collection is not. Different GC modes suit different kinds of apps, and a basic understanding of how the GC works can help you pick the right mode. Just switching from one GC mode to another can improve end-to-end application performance significantly. <\/p>\n<p align=\"justify\">In my spare time I work on a Roslyn analyzer project called <a href=\"https:\/\/github.com\/SergeyTeplyakov\/ErrorProne.NET\/\">ErrorProne.NET<\/a>. The tool helps find some common errors in C# programs, like invalid format strings or suspicious\/invalid exception handling. 
Like every analyzer, ErrorProne.NET can be integrated into Visual Studio, but in some cases console mode (CLI, Command Line Interface) is preferable. <\/p>\n<p align=\"justify\">To validate newly created rules and to check performance, I constantly run ErrorProne.NET on different open-source projects, like <a href=\"https:\/\/github.com\/DotNetAnalyzers\/StyleCopAnalyzers\">StyleCopAnalyzers<\/a> or the <a href=\"https:\/\/github.com\/dotnet\/roslyn\/\">Roslyn<\/a> codebase itself. To do that, I use a console application that opens the solution, runs all the analyzers and prints a report in a human-readable form. <\/p>\n<p align=\"justify\">By default, every console application uses Background Workstation GC. Recently I decided to check what would happen if I switched to Server GC. Here is what I got by running my app with different GC modes and the \u2018Prefer 32-bit\u2019 flag enabled. I used <a href=\"https:\/\/channel9.msdn.com\/Series\/PerfView-Tutorial\">PerfView<\/a> to collect this information:<\/p>\n<table cellspacing=\"0\" cellpadding=\"0\" border=\"1\">\n<tbody>\n<tr>\n<td valign=\"top\" width=\"124\">\n<p><b>GC Mode<\/b><\/p>\n<\/td>\n<td valign=\"top\" width=\"64\">\n<p><b>E2E time (ms)<\/b><\/p>\n<\/td>\n<td valign=\"top\" width=\"84\">\n<p><b>Total GC Pause (ms)<\/b><\/p>\n<\/td>\n<td valign=\"top\" width=\"59\">\n<p><b>% Time paused for GC<\/b><\/p>\n<\/td>\n<td valign=\"top\" width=\"51\">\n<p><b>Gen0 Count<\/b><\/p>\n<\/td>\n<td valign=\"top\" width=\"46\">\n<p><b>Gen1 Count<\/b><\/p>\n<\/td>\n<td valign=\"top\" width=\"46\">\n<p><b>Gen2 Count<\/b><\/p>\n<\/td>\n<td valign=\"top\" width=\"88\">\n<p><b>Total Allocations (MB)<\/b><\/p>\n<\/td>\n<td valign=\"top\" width=\"61\">\n<p><b>Max GC Heap Size (MB)<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\" width=\"124\">\n<p>Workstation GC<\/p>\n<\/td>\n<td valign=\"top\" width=\"64\">\n<p>132 765<\/p>\n<\/td>\n<td valign=\"top\" 
width=\"84\">\n<p>46 118<\/p>\n<\/td>\n<td valign=\"top\" width=\"59\">\n<p>35.1%<\/p>\n<\/td>\n<td valign=\"top\" width=\"51\">\n<p>1674<\/p>\n<\/td>\n<td valign=\"top\" width=\"46\">\n<p>1439<\/p>\n<\/td>\n<td valign=\"top\" width=\"46\">\n<p>35<\/p>\n<\/td>\n<td valign=\"top\" width=\"88\">\n<p>174 691<\/p>\n<\/td>\n<td valign=\"top\" width=\"61\">\n<p>1 561<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\" width=\"124\">\n<p>Background Workstation GC<\/p>\n<\/td>\n<td valign=\"top\" width=\"64\">\n<p>132 008<\/p>\n<\/td>\n<td valign=\"top\" width=\"84\">\n<p>39 798<\/p>\n<\/td>\n<td valign=\"top\" width=\"59\">\n<p>30.4%<\/p>\n<\/td>\n<td valign=\"top\" width=\"51\">\n<p>2109<\/p>\n<\/td>\n<td valign=\"top\" width=\"46\">\n<p>1451<\/p>\n<\/td>\n<td valign=\"top\" width=\"46\">\n<p>65<\/p>\n<\/td>\n<td valign=\"top\" width=\"88\">\n<p>225 554<\/p>\n<\/td>\n<td valign=\"top\" width=\"61\">\n<p>1 676<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\" width=\"124\">\n<p>Server GC<\/p>\n<\/td>\n<td valign=\"top\" width=\"64\">\n<p>102 553<\/p>\n<\/td>\n<td valign=\"top\" width=\"84\">\n<p>9 026<\/p>\n<\/td>\n<td valign=\"top\" width=\"59\">\n<p>9.1%<\/p>\n<\/td>\n<td valign=\"top\" width=\"51\">\n<p>28<\/p>\n<\/td>\n<td valign=\"top\" width=\"46\">\n<p>130<\/p>\n<\/td>\n<td valign=\"top\" width=\"46\">\n<p>8<\/p>\n<\/td>\n<td valign=\"top\" width=\"88\">\n<p>17 959<\/p>\n<\/td>\n<td valign=\"top\" width=\"61\">\n<p>1 667<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\" width=\"124\">\n<p>Background Server GC<\/p>\n<\/td>\n<td valign=\"top\" width=\"64\">\n<p><b>99 867<\/b><\/p>\n<\/td>\n<td valign=\"top\" width=\"84\">\n<p><b>8 040<\/b><\/p>\n<\/td>\n<td valign=\"top\" width=\"59\">\n<p><b>8.5%<\/b><\/p>\n<\/td>\n<td valign=\"top\" width=\"51\">\n<p>23<\/p>\n<\/td>\n<td valign=\"top\" width=\"46\">\n<p>148<\/p>\n<\/td>\n<td valign=\"top\" width=\"46\">\n<p>9<\/p>\n<\/td>\n<td valign=\"top\" width=\"88\">\n<p>16 610<\/p>\n<\/td>\n<td valign=\"top\" width=\"61\">\n<p>1 
724<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p align=\"justify\">This table clearly shows the huge difference between Server GC and Workstation GC for this application. Just by switching from the default Background Workstation GC to Background Server GC, the end-to-end time dropped by roughly 24% (from about 132 to about 100 seconds). Put the other way, the Workstation GC runs took about 30% longer. There are two reasons for this: the number of managed heaps and the segment size. Bigger segments allow more objects to be allocated in the ephemeral segment and drastically reduce the number of Gen0 and Gen1 collections, from more than 3 000 to fewer than 200. The lower number of garbage collections also comes with a much smaller total allocation size reported by PerfView (from more than 170 GB to about 17 GB). <\/p>\n<p align=\"justify\">Another interesting data point is the number of GC pauses longer than 200 ms and the mean GC duration for the two Workstation GC flavors:<\/p>\n<table cellspacing=\"0\" cellpadding=\"0\" border=\"1\">\n<tbody>\n<tr>\n<td valign=\"top\" width=\"208\">\n<p><b>GC Mode<\/b><\/p>\n<\/td>\n<td valign=\"top\" width=\"208\">\n<p><b>Number of GC pauses &gt; 200 ms<\/b><\/p>\n<\/td>\n<td valign=\"top\" width=\"52\">\n<p><b>Mean Gen0 (ms)<\/b><\/p>\n<\/td>\n<td valign=\"top\" width=\"52\">\n<p><b>Mean Gen1 (ms)<\/b><\/p>\n<\/td>\n<td valign=\"top\" width=\"52\">\n<p><b>Mean Gen2 (ms)<\/b><\/p>\n<\/td>\n<td valign=\"top\" width=\"52\">\n<p><b>Mean, all GCs (ms)<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\" width=\"208\">\n<p>Workstation GC<\/p>\n<\/td>\n<td valign=\"top\" width=\"208\">\n<p>11<\/p>\n<\/td>\n<td valign=\"top\" width=\"52\">\n<p>6.8<\/p>\n<\/td>\n<td valign=\"top\" width=\"52\">\n<p>16.2<\/p>\n<\/td>\n<td valign=\"top\" width=\"52\">\n<p>322.6<\/p>\n<\/td>\n<td valign=\"top\" width=\"52\">\n<p>14.7<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\" width=\"208\">\n<p>Background Workstation GC<\/p>\n<\/td>\n<td valign=\"top\" width=\"208\">\n<p>2<\/p>\n<\/td>\n<td valign=\"top\" width=\"52\">\n<p>7.9<\/p>\n<\/td>\n<td valign=\"top\" width=\"52\">\n<p>15.4<\/p>\n<\/td>\n<td valign=\"top\" width=\"52\">\n<p>12.7<\/p>\n<\/td>\n<td 
valign=\"top\" width=\"52\">\n<p>11.0<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p align=\"justify\">The table shows that Background mode substantially reduces the number of long GC pauses (from 11 to 2), mainly by cutting the mean Gen2 time from 322.6 ms to 12.7 ms. <\/p>\n<h4 align=\"justify\">Conclusion<\/h4>\n<p align=\"justify\">Not everyone works on high-performance managed applications. In many cases the GC does its job efficiently without any human intervention: application performance is unaffected, and the developer\u2019s main concerns lie elsewhere. <\/p>\n<p align=\"justify\">But this isn\u2019t always the case. <\/p>\n<p align=\"justify\">Many of us work on system-level or high-performance software written in C#, like games, database servers or heavily loaded web servers. In this case, a good understanding of what the GC does is crucial. It is very important to understand how the different GC modes behave, what a memory segment is, and why a GC pause can be much longer in one case than in another. <\/p>\n<p align=\"justify\">I hope this post helped you build a mental model of the various GC modes and gave you enough information to take GC seriously. 
<\/p>\n<h4 align=\"justify\">Additional resources<\/h4>\n<ul>\n<li>\n<div align=\"justify\"><a href=\"https:\/\/msdn.microsoft.com\/library\/ee787088(v=vs.110).aspx\">Fundamentals of Garbage Collection<\/a><\/div>\n<\/li>\n<li>\n<div align=\"justify\"><a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/dd537632.aspx\">Concurrency Visualizer<\/a><\/div>\n<\/li>\n<li>\n<div align=\"justify\"><a href=\"https:\/\/raw.githubusercontent.com\/dotnet\/coreclr\/master\/src\/gc\/gc.cpp\">gc.cpp<\/a> in the coreclr repo<\/div>\n<\/li>\n<li>\n<div align=\"justify\"><a href=\"https:\/\/blogs.msdn.microsoft.com\/dotnet\/2012\/07\/20\/the-net-framework-4-5-includes-new-garbage-collector-enhancements-for-client-and-server-apps\/\">The .NET Framework 4.5 includes new garbage collector enhancements for client and server apps<\/a><\/div>\n<\/li>\n<li>\n<div align=\"justify\">Using GC Efficiently <a href=\"https:\/\/blogs.msdn.microsoft.com\/maoni\">by Maoni Stephens<\/a>: <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/using-gc-efficiently-part-1\/\">Part 1<\/a>, <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/using-gc-efficiently-part-2\/\">Part 2<\/a>, <a href=\"https:\/\/blogs.msdn.microsoft.com\/maoni\/2004\/12\/19\/using-gc-efficiently-part-3\/\">Part 3<\/a>, <a href=\"https:\/\/blogs.msdn.microsoft.com\/maoni\/2005\/05\/06\/using-gc-efficiently-part-4\/\">Part 4<\/a>.<\/div>\n<\/li>\n<li>\n<div align=\"justify\"><a href=\"http:\/\/mattwarren.org\/2016\/06\/20\/Visualising-the-dotNET-Garbage-Collector\/\">Visualising the .NET Garbage Collector<\/a> by Matt Warren<\/div>\n<\/li>\n<\/ul>\n<p align=\"justify\">&#8212;&#8211; <\/p>\n<p align=\"justify\">(*) If you want to look behind the curtain and understand CLR internals, I suggest the amazing <a href=\"http:\/\/www.amazon.com\/CLR-via-Pro-Developer-Jeffrey-Richter\/dp\/0735627045\/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1268296852&amp;sr=1-1\">\u201cCLR via C#\u201d<\/a> by Jeffrey Richter. 
But you can use other good books, such as <a href=\"http:\/\/www.amazon.com\/5-0-Nutshell-The-Definitive-Reference\/dp\/1449320104\/ref=dp_ob_title_bk\">\u201cC# in a Nutshell\u201d<\/a> by Joe Albahari. If you need an even deeper dive into this topic, I suggest <a href=\"http:\/\/www.amazon.com\/Pro-NET-Performance-Optimize-Applications\/dp\/1430244585\/ref=pd_sim_b_8?ie=UTF8&amp;refRID=1VGXFT5MV4SZ9H0PVBF8\">\u201cPro .NET Performance\u201d<\/a> by Sasha Goldshtein or <a href=\"http:\/\/www.amazon.com\/Under-Hood-NET-Memory-Management\/dp\/1906434751\/ref=pd_sim_b_26?ie=UTF8&amp;refRID=049QWSYG3ENNKTJBFB5N\">\u201cUnder the Hood of .NET Memory Management\u201d<\/a> by Chris Farrell. <\/p>\n<p align=\"justify\">(**) The terminology is a bit unfortunate here. From the beginning, the CLR had a Concurrent mode that allowed Gen2 collection on a separate thread. Later (in .NET 4.0), Concurrent GC was superseded by Background GC, which has a slightly different implementation. There is just one configuration setting that turns \u201cconcurrent\u201d GC on, but the actual behavior differs depending on the .NET Framework version.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post I\u2019m going to visualize what exactly happens during Garbage Collection (GC) and how different GC modes can significantly affect application performance. I assume that the reader is familiar with garbage collection basics. 
If this isn\u2019t the case I encourage you to spend 15 minutes to fill this gap, for instance from the [&hellip;]<\/p>\n","protected":false},"author":4004,"featured_media":37840,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[6696,6698],"tags":[6695],"class_list":["post-406","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-concurrency","category-gc","tag-seteplia"],"acf":[],"blog_post_summary":"<p>In this post I\u2019m going to visualize what exactly happens during Garbage Collection (GC) and how different GC modes can significantly affect application performance. I assume that the reader is familiar with garbage collection basics. If this isn\u2019t the case I encourage you to spend 15 minutes to fill this gap, for instance from the [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/406","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/users\/4004"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/comments?post=406"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/406\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media\/37840"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media?parent=406"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-dev
eloper\/wp-json\/wp\/v2\/categories?post=406"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/tags?post=406"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}