{"id":228296,"date":"2021-08-05T11:01:08","date_gmt":"2021-08-05T18:01:08","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/java\/?p=228296"},"modified":"2021-08-26T17:00:37","modified_gmt":"2021-08-27T00:00:37","slug":"introducing-microsoft-gctoolkit","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/java\/introducing-microsoft-gctoolkit\/","title":{"rendered":"Introducing Microsoft GCToolkit"},"content":{"rendered":"<p><span data-contrast=\"none\">Microsoft\u2019s Java Engineering Group is excited to announce we have open-sourced the <\/span><a href=\"https:\/\/github.com\/microsoft\/gctoolkit\"><span data-contrast=\"none\">Microsoft GCToolkit<\/span><\/a><span data-contrast=\"none\"> on GitHub. GCToolkit is a set of libraries for analyzing Java garbage collection (GC) log files. The toolkit parses GC log files into discrete events and provides an API for aggregating data from those events. This allows the user to create arbitrary and complex analyses of the state of managed memory in the Java Virtual Machine (JVM) represented by the garbage collection log. In this blog post, I will introduce some of the key features to help you get the most from this project.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">Managed memory in the Java Virtual Machine (JVM) is comprised of 3 main pieces,\u00a0memory\u00a0buffers known as Java heap, allocators which perform the work of getting data into Java heap, and garbage collection (GC). 
While GC is responsible for recovering\u00a0memory in Java heap that is no longer in use, the term is often used as shorthand for memory management as a whole, and the phrases \u201ctuning GC\u201d and \u201ctuning the collector\u201d are generally understood to mean tuning the JVM\u2019s memory management subsystem.\u00a0<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">More importantly, it has long been known that a suboptimally configured collector will cause your application to require more CPU and memory while, at the same time, degrading your end users\u2019 experience. In other words, a poorly tuned collector often equates to a more expensive runtime and unhappy users. The challenge is that to tune GC optimally, one needs to strike a delicate balance between several concerns, none of which are easily seen without the assistance of tooling. GCToolkit has been helpful in making this easier. So what is GCToolkit? Let&#8217;s take a tour to find out.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<h3>GCToolkit Modules\u00a0<\/h3>\n<p><span data-contrast=\"none\">GCToolkit is made up of 3 Java modules that cover the API, GC log file parsers, and a messaging backplane based on Vert.x. The API module is the entry point into GCToolkit. It reduces the details of using the parser and Vert.x to analyze a GC log file to a few method calls. The parser module is a collection of regular expressions and code that has been developed over many years to be the most robust GC log parser available. The Vert.x-based messaging backplane makes use of 2 message buses. The first message bus streams from a DataSource. The current implementation streams log lines from the GC log file. The listeners on this bus are the parsers that convert the data from the data source into events that represent either a GC cycle or a safe point. These events are then published on the second bus, the event bus. 
Listeners on the event bus are then able to receive and process events that are of interest to them. <\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<h3>Aggregators and Aggregations\u00a0<\/h3>\n<p><span data-contrast=\"none\">The parser emits discrete JVM events (GC cycle events or safe point events) which makes it possible to write code to capture and analyze the data from those events. What data you want to analyze and what kind of analysis you want to perform are up to you. GCToolkit provides a simple Aggregator\/Aggregation framework for capturing and analyzing GC log file data.\u00a0<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">The code that\u00a0captures\u00a0an event\u00a0is called\u00a0an Aggregator, and the code that analyzes the data\u00a0is called\u00a0an Aggregation.\u00a0An Aggregator can capture several different events for the purpose of feeding the\u00a0analysis.\u00a0For example,\u00a0one may want to capture pause events for the purpose of\u00a0analyzing heap\u00a0occupancy. The Aggregator captures the event,\u00a0extracts the relevant data, and passes\u00a0the data to the Aggregation. 
The Aggregation\u00a0collates the data into meaningful analyses, for example,\u00a0total heap occupancy after GC.\u00a0\u00a0\u00a0\u00a0<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<h3>Example<\/h3>\n<p><span data-contrast=\"none\">Let us\u00a0make this real by looking at an example that reports on total heap occupancy after a GC cycle has\u00a0been completed.\u00a0The following code\u00a0is\u00a0a minimal implementation that makes use of the key elements of the API.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<pre class=\"prettyprint\">public class Main {\u00a0\r\n\u00a0\u00a0\u00a0 public static void\u00a0main(String[]\u00a0args) throws Exception {\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0var\u00a0path =\u00a0Path.of(args[0]);\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0var\u00a0logFile\u00a0= new\u00a0SingleGCLogFile(path);\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0var\u00a0gcToolKit\u00a0= new\u00a0GCToolKit();\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0var\u00a0jvm\u00a0=\u00a0gcToolKit.analyze(logFile);\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0var\u00a0results =\u00a0jvm.getAggregation(HeapOccupancyAfterCollectionSummary.class);\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0System.out.println(results.toString());\u00a0\r\n\u00a0\u00a0\u00a0 }\u00a0<br \/>}<\/pre>\n<p><span data-contrast=\"none\">The flow is to first create a DataSource. In this case, the DataSource is a GCLogFile and, more specifically, a SingleGCLogFile, since all the data is contained in a single file. The next step is to create an instance of GCToolkit. This starts the process of constructing all the scaffolding needed to support the processing of the DataSource. Once we have an instance of GCToolkit, we call its analyze method with the DataSource as a parameter. What is returned to us is a JavaVirtualMachine. This is the API that we can interrogate for the state and configuration of the JVM. 
In this case, we are asking for the HeapOccupancyAfterCollectionSummary Aggregation. Finally, we can process the results. For this simple example, the results are printed to the terminal, but the data could be rendered as a graph, a table, or some other more human-friendly format.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">There is a little bit of magic here in that neither the Aggregator nor the HeapOccupancyAfterCollectionSummary Aggregation is instantiated or registered in the sample. These classes are provided to the API via the service discovery mechanism of Java\u2019s module system. Let us start by first looking at the implementations before moving on to understand how GCToolkit discovers and makes use of them.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<pre class=\"prettyprint\">@Aggregates({EventSource.G1GC,EventSource.GENERATIONAL,EventSource.ZGC})\u00a0\r\npublic class HeapOccupancyAfterCollection extends Aggregator&lt;HeapOccupancyAfterCollectionAggregation&gt; {\u00a0\r\n\r\n\u00a0\u00a0\u00a0 public HeapOccupancyAfterCollection(HeapOccupancyAfterCollectionAggregation aggregation) {\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 super(aggregation);\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 register(GenerationalGCPauseEvent.class, this::extractHeapOccupancy);\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0register(G1GCPauseEvent.class, this::extractHeapOccupancy);\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 register(ZGCCycle.class,this::extractHeapOccupancy);\u00a0\r\n\u00a0\u00a0\u00a0 }\u00a0\r\n\r\n\u00a0\u00a0\u00a0 private void extractHeapOccupancy(GenerationalGCPauseEvent event) {\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 aggregation()<br \/>                .addDataPoint(event.getGarbageCollectionType(), <br \/>                              event.getDateTimeStamp(), <br \/>                              
event.getHeap().getOccupancyAfterCollection()); <br \/>\u00a0\u00a0\u00a0 } <br \/><br \/> \u00a0\u00a0 private void extractHeapOccupancy(G1GCPauseEvent event) {\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 aggregation()<br \/>                .addDataPoint(event.getGarbageCollectionType(), <br \/>                              event.getDateTimeStamp(), <br \/>                              event.getHeap().getOccupancyAfterCollection());\u00a0\r\n\u00a0\u00a0\u00a0\u00a0}\u00a0\r\n\r\n\u00a0\u00a0\u00a0 private void extractHeapOccupancy(ZGCCycle event) {\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 aggregation()<br \/>                .addDataPoint(event.getGarbageCollectionType(), <br \/>                              event.getDateTimeStamp(), <br \/>                              event.getLive().getReclaimEnd());\u00a0\r\n\u00a0\u00a0\u00a0 }\u00a0\r\n}<\/pre>\n<p><span data-contrast=\"none\">The listing above starts with the\u00a0@Aggregates annotation which indicates the event sources this aggregator will work with. As can be seen here, this aggregation is designed to work with G1GC, the older generational collectors,\u00a0and ZGC.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">In the constructor, the specific events that this Aggregator will work with are registered with the corresponding\u00a0consumer\u00a0method. All GC pause events report on heap occupancy before and after the collection phase.\u00a0Thus,\u00a0we can\u00a0register for\u00a0the super class instead of each individual event. Finally, the individual methods harvest the data of interest and then pass it along to\u00a0the\u00a0Aggregation, which\u00a0acts as a view on the incoming events. 
In this\u00a0case,\u00a0the Aggregation is a\u00a0HeapOccupancyAfterCollectionAggregation, which\u00a0is an interface that defines a single method,\u00a0addDataPoint.\u00a0Notice that the parameter to the\u00a0HeapOccupancyAfterCollection\u00a0 aggregator\u00a0constructor is an interface. This allows the Aggregator to populate an Aggregation that is specific to your particular use case.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">The following is an implementation of\u00a0HeapOccupancyAfterCollectionSummary.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<pre class=\"prettyprint\">@Collates(HeapOccupancyAfterCollection.class)\u00a0\r\npublic class HeapOccupancyAfterCollectionSummary implements HeapOccupancyAfterCollectionAggregation {\u00a0\r\n\r\n\u00a0\u00a0\u00a0 private HashMap&lt;GarbageCollectionTypes, XYDataSet&gt; aggregations = new HashMap&lt;&gt;();\u00a0\r\n\r\n\u00a0\u00a0\u00a0 public void addDataPoint(GarbageCollectionTypes gcType, DateTimeStamp timeStamp, long heapOccupancy) {\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0var dataSet = aggregations.computeIfAbsent(gcType, k -&gt; new XYDataSet());\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0dataSet.add(timeStamp.getTimeStamp(),heapOccupancy);\u00a0\r\n\u00a0\u00a0\u00a0 }\u00a0\r\n\r\n\u00a0\u00a0\u00a0 public HashMap&lt;GarbageCollectionTypes, XYDataSet&gt; get() {\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 return aggregations;\u00a0\r\n\u00a0\u00a0\u00a0 }\u00a0\r\n}<\/pre>\n<p><span data-contrast=\"none\">The implementation starts with\u00a0the\u00a0@Collates annotation. This tells the API that this implementation is intended to work with\u00a0HeapOccupancyAfterCollection. The rest of the implementation collects the data in a form that is suitable for its intended use. 
For\u00a0example,\u00a0XYDataSet\u00a0is intended to support the rendering of an X-Y scatter plot.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">The final magic is the \u2018provides Aggregation with HeapOccupancyAfterCollectionSummary&#8217; in the sample\u2019s module-info. This makes the sample a service provider. When GCToolkit is instantiated, it looks for any module that provides the Aggregation service. Thus, the Aggregator and Aggregation are automatically loaded and used when called for by the GCToolkit analyze method. GCToolkit also provides an API to programmatically register Aggregation classes if you choose not to use the service provider paradigm. <\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<h3>Making it a Module\u00a0<\/h3>\n<p><span data-contrast=\"none\">Finally, the module-info.java provides the HeapOccupancyAfterCollectionSummary implementation for Aggregation.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<pre class=\"prettyprint\">module com.microsoft.gctoolkit.sample {\u00a0\r\n\u00a0\u00a0\u00a0 requires gctoolkit.api;\u00a0\r\n\u00a0\u00a0\u00a0 requires gctoolkit.vertx;\u00a0\r\n\u00a0\u00a0\u00a0 requires java.logging;\r\n\r\n\u00a0\u00a0\u00a0 exports com.microsoft.gctoolkit.sample.aggregation to gctoolkit.vertx;\r\n\r\n\u00a0\u00a0\u00a0 provides Aggregation with HeapOccupancyAfterCollectionSummary;\u00a0\r\n}\u00a0<\/pre>\n<p><span data-contrast=\"none\">As can be seen here, the sample module requires the gctoolkit.api and gctoolkit.vertx modules, along with java.logging. The module exports the aggregation package to the gctoolkit.vertx module. The dependency is a work-around for a known bug that has been reported and is scheduled to be fixed.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">Finally, let us get this app to run. A sample shell script is included with the project to demonstrate how to run the app from the command line. 
The command line sets the module path with the -p option and specifies the main class with the -m option. For this GC log, the output will look like this:<\/span><\/p>\n<blockquote>\n<pre class=\"prettyprint\">$ .\/sample.sh\u00a0\r\nCollected 3 different collection types.\u00a0<\/pre>\n<\/blockquote>\n<h3><span style=\"color: inherit; font-family: inherit; font-size: 1.75rem;\">Contribute!<\/span><\/h3>\n<p><span data-contrast=\"auto\">If you\u2019re interested in contributing, or you just want to follow\u00a0along, do join us\u00a0at\u00a0\u00a0<\/span><a href=\"https:\/\/github.com\/microsoft\/gctoolkit\/discussions\"><span data-contrast=\"none\">github.com\/microsoft\/gctoolkit\/discussions<\/span><\/a><span data-contrast=\"auto\">.<\/span><span data-ccp-props=\"{&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><em>Microsoft Java Engineering Group\u2019s Tooling Team\u00a0<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Microsoft\u2019s Java Engineering Group is excited to announce we have open-sourced the Microsoft GCToolkit on GitHub. GCToolkit is a set of libraries for analyzing Java garbage collection (GC) log files. The toolkit parses GC log files into discrete events and provides an API for aggregating data from those events. This allows the user to create [&hellip;]<\/p>\n","protected":false},"author":9458,"featured_media":227205,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,8],"tags":[],"class_list":["post-228296","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-java","category-open-source"],"acf":[],"blog_post_summary":"<p>Microsoft\u2019s Java Engineering Group is excited to announce we have open-sourced the Microsoft GCToolkit on GitHub. GCToolkit is a set of libraries for analyzing Java garbage collection (GC) log files. 
The toolkit parses GC log files into discrete events and provides an API for aggregating data from those events. This allows the user to create [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/java\/wp-json\/wp\/v2\/posts\/228296","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/java\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/java\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/java\/wp-json\/wp\/v2\/users\/9458"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/java\/wp-json\/wp\/v2\/comments?post=228296"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/java\/wp-json\/wp\/v2\/posts\/228296\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/java\/wp-json\/wp\/v2\/media\/227205"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/java\/wp-json\/wp\/v2\/media?parent=228296"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/java\/wp-json\/wp\/v2\/categories?post=228296"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/java\/wp-json\/wp\/v2\/tags?post=228296"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}