Finalization implementation details
Years ago I wrote a document on making finalization scanning concurrent. At the time there was an internal team that was using finalization as a way to resurrect objects and putting them back in their cache. While we’ve always advised to folks that finalization was for releasing native resources I couldn’t fault this team for using it the way they did. And of course finalization scanning was taking quite some time because well, there were so many finalizable objects to scan so I wanted to make this part concurrent. As part of the on-going latency reduction effort we’ve finally put this on the agenda. Of course between the time I wrote that doc and now, things have changed quite a bit and there are new considerations I have to make so I’m still thinking about the design. While I’m thinking about it I thought it would be interesting to explain some of the finalization implementation details so that’s what I’ll do in this blog entry. I’m in the process of working on a design doc for concurrent finalization scanning that I will be checking into the coreclr repo.
First of all I wanted to clarify some terminology. I’ve seen some confusing terms in various articles and books. So instead of using something that’s interpreted differently by different people I will just use the terms used in the GC code.
Internally we have a CFinalize class that manages the finalization. The only consumers of this class are the GC itself and of course the finalizer thread. And there’s only one finalizer thread (over the years there were talks about having multiple finalizer threads but there wasn’t enough justification; I have heard that there’s usage that actually depends on the fact there’s only one finalizer thread. While we have no plans to make this multiple I would strong suggest against doing that). The finalizer thread runs at
THREAD_PRIORITY_HIGHEST. And this is an important detail as it will have performance implications that I will mention below.
CFinalize class has an array which we use to track all finalizable objects. And we divide this array into a few parts that I will call segs (as in segments, not to be confused with the GC heap segments). The array looks like this – from lower addresses to higher addresses:
Gen2 seg | Gen1 seg | Gen0 seg | Critical Finalizer seg | Finalizer seg | Free seg
(I’m just calling each portion a “seg” so it’s easier to refer to them)
The genX segs are to record finalizable objects that are still live, ie, either in the generation that GC hasn’t collected so all objects in those generations are live by definition, or is held live by user code. When GC discovers an object to be dead, it then promotes the object which means the object is now tracked by either the Critical Finalizer seg or the Finalizer seg which I will refer to as CF/F segs (this is what’s usually referred to as f-reachable).
When the finalizer thread actually runs finalizers they would pick them off from the CF/F segs. And the ones whose finalizers have been run are now tracked in the Free seg (although saying tracked can be misleading as currently we just treat this seg as “we no longer care about the objects in this seg” and the seg is really just to indicate the free slots we have in the array).
Within GC, each heap has its own
CFinalize instance which is called
finalize_queue. For lifetime tracking, GC does the following with
- The CF/F segs are considered as a source of roots so they will be scanned to indicate which objects should be live. This is what
CFinalize::GcScanRootsdoes. Because the finalizer thread runs at high priority, it’s very likely that there’s nothing here to scan because when we were done with the GC that discovered these objects and moved them to these segs, the finalizer thread was allowed to run and would’ve quickly finished running the finalizers (unless of course the finalizers were blocked for some reason – so there you go, another reason why it’s bad to have your finalizers block which means GC will need to scan the CF/F segs again).
Of course there are other sources of roots like the stack or GC handles. After we are done marking all the objects held live, directly or indirectly by all those roots, we now have the full knowledge of object lifetime.
- Then we scan the genX segs to see which objects tracked there are dead, in other words, not promoted (
!IsPromoted(obj)), and for those objects we need to promote them so the finalizers can run. So these newly promoted objects are promoted due to finalization. This is what
CFinalize::ScanForFinalizationdoes. There are exceptions to this – if it’s a
WeakReferenceor `WeakReference<T>`, we just run what the finalizer is supposed to do (remember finalizers are written in managed code and run on the finalizer thread – we wouldn’t be running the finalizer the normal way when the EE is suspended) and *not* promote those objects ’cause now we have no reason to. This is a nice perf shortcut. The other exception is when the finalizer is suppressed.
There’s another relevant thing to mention which is the short and long weak handles. These are the handle types we use in the runtime. They are exposed as
GCHandleType.WeakTrackResurrection respectively. Before we do
ScanForFinalization, we null the target of short weak handles if the target object is not promoted. After
ScanForFinalization, we do that for long weak handles. So what distinguishes these 2 handle types is the promotion due to finalization.