Finalization implementation details

maoni

Maoni

Years ago I wrote a document on making finalization scanning concurrent. At the time there was an internal team that was using finalization as a way to resurrect objects and putting them back in their cache. While we’ve always advised to folks that finalization was for releasing native resources I couldn’t fault this team for using it the way they did. And of course finalization scanning was taking quite some time because well, there were so many finalizable objects to scan so I wanted to make this part concurrent. As part of the on-going latency reduction effort we’ve finally put this on the agenda. Of course between the time I wrote that doc and now, things have changed quite a bit and there are new considerations I have to make so I’m still thinking about the design. While I’m thinking about it I thought it would be interesting to explain some of the finalization implementation details so that’s what I’ll do in this blog entry. I’m in the process of working on a design doc for concurrent finalization scanning that I will be checking into the coreclr repo.

First of all I wanted to clarify some terminology. I’ve seen some confusing terms in various articles and books. So instead of using something that’s interpreted differently by different people I will just use the terms used in the GC code.

Internally we have a CFinalize class that manages the finalization. The only consumers of this class are the GC itself and of course the finalizer thread. And there’s only one finalizer thread (over the years there were talks about having multiple finalizer threads but there wasn’t enough justification; I have heard that there’s usage that actually depends on the fact there’s only one finalizer thread. While we have no plans to make this multiple I would strong suggest against doing that). The finalizer thread runs at THREAD_PRIORITY_HIGHEST. And this is an important detail as it will have performance implications that I will mention below.

The CFinalize class has an array which we use to track all finalizable objects. And we divide this array into a few parts that I will call segs (as in segments, not to be confused with the GC heap segments). The array looks like this – from lower addresses to higher addresses:

Gen2 seg | Gen1 seg | Gen0 seg | Critical Finalizer seg | Finalizer seg | Free seg

(I’m just calling each portion a “seg” so it’s easier to refer to them)

The genX segs are to record finalizable objects that are still live, ie, either in the generation that GC hasn’t collected so all objects in those generations are live by definition, or is held live by user code. When GC discovers an object to be dead, it then promotes the object which means the object is now tracked by either the Critical Finalizer seg or the Finalizer seg which I will refer to as CF/F segs (this is what’s usually referred to as f-reachable).

When the finalizer thread actually runs finalizers they would pick them off from the CF/F segs. And the ones whose finalizers have been run are now tracked in the Free seg (although saying tracked can be misleading as currently we just treat this seg as “we no longer care about the objects in this seg” and the seg is really just to indicate the free slots we have in the array).

Within GC, each heap has its own CFinalize instance which is called finalize_queue. For lifetime tracking, GC does the following with finalize_queue:

  • The CF/F segs are considered as a source of roots so they will be scanned to indicate which objects should be live. This is what CFinalize::GcScanRoots does. Because the finalizer thread runs at high priority, it’s very likely that there’s nothing here to scan because when we were done with the GC that discovered these objects and moved them to these segs, the finalizer thread was allowed to run and would’ve quickly finished running the finalizers (unless of course the finalizers were blocked for some reason – so there you go, another reason why it’s bad to have your finalizers block which means GC will need to scan the CF/F segs again).

Of course there are other sources of roots like the stack or GC handles. After we are done marking all the objects held live, directly or indirectly by all those roots, we now have the full knowledge of object lifetime.

  • Then we scan the genX segs to see which objects tracked there are dead, in other words, not promoted (!IsPromoted(obj)), and for those objects we need to promote them so the finalizers can run. So these newly promoted objects are promoted due to finalization. This is what CFinalize::ScanForFinalization does. There are exceptions to this – if it’s a WeakReference or `WeakReference<T>`, we just run what the finalizer is supposed to do (remember finalizers are written in managed code and run on the finalizer thread – we wouldn’t be running the finalizer the normal way when the EE is suspended) and *not* promote those objects ’cause now we have no reason to. This is a nice perf shortcut. The other exception is when the finalizer is suppressed.

There’s another relevant thing to mention which is the short and long weak handles. These are the handle types we use in the runtime. They are exposed as GCHandleType.Weak and GCHandleType.WeakTrackResurrection respectively. Before we do ScanForFinalization, we null the target of short weak handles if the target object is not promoted. After ScanForFinalization, we do that for long weak handles. So what distinguishes these 2 handle types is the promotion due to finalization.

maoni
Maoni Stephens

Follow Maoni   

3 comments

  • Avatar
    Robert Haken

    Hi Maoni,
    thank you for a great post!

    Question: What is the difference in between Critical Finalizer seg and Finalizer seg? Under which circumstances go objects to which seg?

    Looking forward to seeing you in Prague on Wed. 😉

  • Avatar
    Serge Baltic

    In this scheme, where is the perf impact of making an object finalizable concentrated? Supposing we’re not doing Suppress Finalize for it, and that its finalizer method isn’t complex. An extra set of roots in CF/F? But they’re not supposed to have huge unique graphs under them… Managing the lists? How bad after all?

    Same for having a weak ref, what’s the overhead? No CF/F membership and no finalizer queue and no finalizer thread, if we compare to finalizable objects, so how bad is it to have a bunch of weak refs?

Leave a comment