Difference Between Perf Data Reported by Different Tools – 4

maoni

Maoni

.NET CLR Memory\% Time in GC counter and !runaway on thread(s) doing GC.


The 2 common ways people use to look at the time spent in GC are the % Time in GC performance counter under .NET CLR Memory, and the CPU time displayed by the !runaway debugger command in cdb/windbg. What do they mean exactly? % Time in GC is calculated like this:


When the nthGC starts (ie, after the managed threads are suspended and before the GC work starts), we record the timestamp at that time. Let’s call this TA(n);


When nth GC ends (ie, after the GC work is done and before we resume the managed threads), we record another timestamp. Let’s call this TB(n);


So Time spent in this GC is TB(n) – TA(n). And the time since the last GC ended is TB(n) – TB(n-1). So % Time in GC is (TB(n) – TA(n)) / (TB(n) – TB(n-1)).


Since we only record the timestamps we don’t actually discount the time when the thread was switched out – for example, if you are on a single proc machine and another process has a thread of the same priority (as the thread that’s doing GC) that’s also ready to run it may take away some time from the thread that’s doing the GC. None the less it’s a good approximation.


One common scenario where it’s not a good approximation is when paging occurs. In this case you will see a very high % Time in GC but really the time that’s actually spent doing GC work is low ‘cause most of the time is spent doing IO. To verify if you hit this case you can look at the Memory\Pages/sec to see how much it’s paging.


!runaway is more accurate in the sense that it does record the actual time spent on the threads. However I did observe 2 common mistakes when using !runaway to look at time spent in GC.


1) mistake “time spent on the GC thread(s)” as “time spent in GC”.


Let’s take Server GC as an example. When a GC is needed the GC threads first perform the suspension work. Obviously this takes time. Sometimes it can take quite a bit time if you have many threads or some threads get stuck and it takes a long time to suspend them.


2) using the current !runaway output to judge how much time GC has taken without taking into count that some user threads have died. Besides looking at the User/Kernel Mode time you may want to also look at Elapsed Time. The following output is from one of the issues I looked at:


Elapsed Time


  Thread       Time


   0:2094      0 days 11:57:10.406


   1:2098      0 days 11:57:10.152


   2:2044      0 days 11:57:09.898


   3:27cc      0 days 11:57:09.882


   4:20c0      0 days 11:57:09.597


  13:810       0 days 11:57:09.565


  12:21d4      0 days 11:57:09.565


  11:2308      0 days 11:57:09.565


  10:1d24      0 days 11:57:09.565


   9:23e0      0 days 11:57:09.565


   8:24d0      0 days 11:57:09.565


   7:2168      0 days 11:57:09.565


   6:2134      0 days 11:57:09.565


   5:2124      0 days 11:57:09.565


  14:570       0 days 11:57:08.582


  15:10fc      0 days 11:57:00.000


  16:1ac4      0 days 11:56:58.256


  17:1900      0 days 11:56:58.176


  18:1624      0 days 11:56:57.494


  19:11f8      0 days 10:35:01.881


  21:1620      0 days 0:41:45.017


  22:1cc8      0 days 0:37:30.496


  23:2de0      0 days 0:29:01.820


  24:2e10      0 days 0:29:01.711


  25:2d88      0 days 0:22:26.988


  26:1668      0 days 0:19:26.175


  27:2cb0      0 days 0:16:16.814


  29:2d1c      0 days 0:11:53.779


  28:1de8      0 days 0:11:53.779


  30:2ed4      0 days 0:11:35.466


  32:2030      0 days 0:11:22.544


  31:2084      0 days 0:11:22.544


  33:12b0      0 days 0:07:53.510


  35:28e4      0 days 0:03:13.801


  34:2654      0 days 0:03:13.801


  36:2d68      0 days 0:02:31.988


 


The orange lines are GC threads. They were created about 12 hours ago. The green lines are user threads and they were created 2 mins to 40mins ago. So naturally at the current state these user threads couldn’t’ve spent much time.

maoni
Maoni Stephens

Follow Maoni   

No Comments.