{"id":1503,"date":"2013-05-27T23:59:00","date_gmt":"2013-05-27T23:59:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/vcblog\/2013\/05\/27\/profile-guided-optimization-pgo-under-the-hood\/"},"modified":"2021-09-29T14:26:39","modified_gmt":"2021-09-29T14:26:39","slug":"profile-guided-optimization-pgo-under-the-hood","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/profile-guided-optimization-pgo-under-the-hood\/","title":{"rendered":"Profile Guided Optimization (PGO) \u2013 Under the Hood"},"content":{"rendered":"<p><span style=\"font-size: small;\"><span style=\"font-family: times new roman,times;\">To introduce myself, I am <a title=\"mailto:aasthan@microsoft.com\" href=\"http:\/\/blogs.msdn.com\/controlpanel\/blogs\/posteditor.aspx\/Ankit Asthana\">Ankit Asthana<\/a>, the program manager for the backend C++ compiler. In my <a href=\"http:\/\/blogs.msdn.com\/b\/vcblog\/archive\/2013\/04\/08\/profile-guided-optimization-pgo.aspx\">last few blogs<\/a>, I introduced PGO and presented case studies showing how <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/vstudio\/e7k32f4k.aspx\">Profile Guided Optimization (PGO)<\/a> is used to make real-world applications such as <a href=\"http:\/\/blogs.msdn.com\/b\/vcblog\/archive\/2013\/05\/08\/speeding-up-php-performance-for-your-application-using-profile-guided-optimization-pgo.aspx\">SAP NetWeaver<\/a> and <a href=\"http:\/\/blogs.msdn.com\/b\/vcblog\/archive\/2013\/05\/08\/speeding-up-php-performance-for-your-application-using-profile-guided-optimization-pgo.aspx\">Windows PHP faster<\/a>. <\/span><span style=\"font-family: times new roman,times;\">In this blog I would like to talk about how PGO works under the hood, that is, how the profile data helps the compiler backend build an optimized version of an application. 
Once we know how the individual PGO optimizations work, it becomes clear how PGO makes applications faster. So let&#8217;s get started!<\/span><\/span><\/p>\n<h2>How does PGO help build leaner and faster native applications?<\/h2>\n<p><span style=\"font-family: Times New Roman; font-size: small;\">Carrying on from my <a href=\"http:\/\/blogs.msdn.com\/b\/vcblog\/archive\/2013\/04\/04\/how-to-build-faster-and-high-performing-native-applications-using-pgo.aspx\">first<\/a> blog, let us look at the following code snippets again:<\/span><\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/2766.052813_0703_ProfileGuid1.png\"><img decoding=\"async\" class=\"size-full wp-image-28748 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/2766.052813_0703_ProfileGuid1.png\" alt=\"Image 2766 052813 0703 ProfileGuid1\" width=\"447\" height=\"338\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/2766.052813_0703_ProfileGuid1.png 447w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/2766.052813_0703_ProfileGuid1-300x227.png 300w\" sizes=\"(max-width: 447px) 100vw, 447px\" \/><\/a><\/p>\n<p><span style=\"font-family: Times New Roman; font-size: small;\">To recall, PGO helps optimize an application by leveraging profile data collected while running performance-centric user scenarios. PGO&#8217;izing an application requires three basic steps <strong>(Instrument, Train and Optimize)<\/strong>. Instrumenting an application means building it with special compiler and linker flags that insert probes into the generated code. When these probes are hit during the training phase, they collect data such as which branch was taken in the <strong>whichBranchIsTaken<\/strong> code snippet, or the usual value of <strong>*p<\/strong> in the devirtualization code snippet. 
The data collected by the probes during the training phase is saved to a profile database file (<strong>*.pgd<\/strong>), which is then supplied to the compiler as input for the optimize phase of PGO. <\/span><\/p>\n<p><span style=\"font-family: Times New Roman; font-size: small;\">During the optimize phase, the data in the profile database is used as input for a number of optimizations (table 1, below). Although many optimizations use this data, inlining and layout-based PGO optimizations provide the majority of the performance gains, so let us look at how the training data leads to better inlining and layout decisions.<\/span><\/p>\n<table style=\"border-collapse: collapse;\" border=\"0\">\n<colgroup>\n<col style=\"width: 208px;\" \/>\n<col style=\"width: 208px;\" \/>\n<col style=\"width: 208px;\" \/><\/colgroup>\n<tbody valign=\"top\">\n<tr style=\"background: white;\">\n<td style=\"padding-left: 7px; padding-right: 7px; border-top: none; border-bottom: solid #c9c9c9 1.5pt; border-right: none;\"><\/td>\n<td style=\"padding-left: 7px; padding-right: 7px; border-top: none; border-left: none; border-bottom: solid #c9c9c9 1.5pt; border-right: none;\"><\/td>\n<td style=\"padding-left: 7px; padding-right: 7px; border-top: none; border-left: none; border-bottom: solid #c9c9c9 1.5pt;\"><\/td>\n<\/tr>\n<tr style=\"height: 25px; background: #ededed;\">\n<td style=\"padding-left: 7px; padding-right: 7px; border-top: none; border-bottom: solid #c9c9c9 0.25pt; border-right: solid #c9c9c9 0.25pt;\"><span style=\"font-family: Times New Roman; font-size: 10pt;\">Full and Partial Inlining<\/span><\/td>\n<td style=\"padding-left: 7px; padding-right: 7px; border-top: none; border-left: none; border-bottom: solid #c9c9c9 0.25pt; border-right: solid #c9c9c9 0.25pt;\"><span style=\"font-family: Times New Roman; font-size: 10pt;\">Function Layout<\/span><\/td>\n<td style=\"padding-left: 7px; padding-right: 7px; border-top: none; 
border-left: none; border-bottom: solid #c9c9c9 0.25pt;\"><span style=\"font-family: Times New Roman; font-size: 10pt;\">Speed and Size Decisions<\/span><\/td>\n<\/tr>\n<tr style=\"height: 27px;\">\n<td style=\"padding-left: 7px; padding-right: 7px; border-top: none; border-bottom: solid #c9c9c9 0.25pt; border-right: solid #c9c9c9 0.25pt;\"><span style=\"font-family: Times New Roman; font-size: 10pt;\">Basic Block Layout<\/span><\/td>\n<td style=\"padding-left: 7px; padding-right: 7px; border-top: none; border-left: none; border-bottom: solid #c9c9c9 0.25pt; border-right: solid #c9c9c9 0.25pt;\"><span style=\"font-family: Times New Roman; font-size: 10pt;\">Code Separation<\/span><\/td>\n<td style=\"padding-left: 7px; padding-right: 7px; border-top: none; border-left: none; border-bottom: solid #c9c9c9 0.25pt;\"><span style=\"font-family: Times New Roman; font-size: 10pt;\">Virtual Call Expansion<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px; background: #ededed;\">\n<td style=\"padding-left: 7px; padding-right: 7px; border-top: none; border-bottom: solid #c9c9c9 0.25pt; border-right: solid #c9c9c9 0.25pt;\"><span style=\"font-family: Times New Roman; font-size: 10pt;\">Switch Expansion<\/span><\/td>\n<td style=\"padding-left: 7px; padding-right: 7px; border-top: none; border-left: none; border-bottom: solid #c9c9c9 0.25pt; border-right: solid #c9c9c9 0.25pt;\"><span style=\"font-family: Times New Roman; font-size: 10pt;\">Data Separation<\/span><\/td>\n<td style=\"padding-left: 7px; padding-right: 7px; border-top: none; border-left: none; border-bottom: solid #c9c9c9 0.25pt;\"><span style=\"font-family: Times New Roman; font-size: 10pt;\">Loop Unrolling<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p style=\"margin-left: 72pt;\"><strong>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Table 1:<\/strong>\u00a0Some optimizations that benefit from PGO<\/p>\n<p><span style=\"font-family: times new roman,times; font-size: small;\">Inlining decisions for PGO are 
based on call graph path profiling. Simply put, the idea behind using call graph path profiling for inlining decisions is to understand how a function behaves when it is called along a specific call path. This matters because a function called along one call path may behave very differently than when it is called along another. Knowing which call paths are hot leads to better inlining decisions: the optimizer inlines only along frequently executed call paths, which minimizes the code bloat caused by inlining while still gaining performance on the hot paths. Take a look at the example given below:<\/span><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 8pt;\"><a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/4403.052813_0703_ProfileGuid2.png\"><img decoding=\"async\" class=\"size-full wp-image-28750 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/4403.052813_0703_ProfileGuid2.png\" alt=\"Image 4403 052813 0703 ProfileGuid2\" width=\"328\" height=\"189\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/4403.052813_0703_ProfileGuid2.png 328w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/4403.052813_0703_ProfileGuid2-300x173.png 300w\" sizes=\"(max-width: 328px) 100vw, 328px\" \/><\/a>\n<\/span><span style=\"font-size: 8pt;\">Figure 1a: Sequence of calls to function &#8216;bar&#8217; from &#8216;goo&#8217;, &#8216;foo&#8217; and &#8216;bat&#8217;<\/span><\/p>\n<p><span style=\"font-size: 10pt;\"><span 
style=\"font-family: times new roman,times; font-size: small;\">Figure 1a above shows the calls made to function &#8216;bar&#8217; from functions &#8216;goo&#8217;, &#8216;foo&#8217; and &#8216;bat&#8217;. The number on each edge is the number of times &#8216;bar&#8217; was called from that caller during a given PGO training session; for example, the edge from &#8216;goo&#8217; to &#8216;bar&#8217; means that &#8216;bar&#8217; was called 10 times from &#8216;goo&#8217;. Because call graph path profiling is about how a function behaves along a specific call path, figure 1a is further broken down into figure 1b.<\/span>\n<\/span><\/p>\n<p><span style=\"font-family: Times New Roman; font-size: 8pt;\"><a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/5482.052813_0703_ProfileGuid3.png\"><img decoding=\"async\" class=\"aligncenter wp-image-28751 size-full\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/5482.052813_0703_ProfileGuid3.png\" alt=\"Image 5482 052813 0703 ProfileGuid3\" width=\"310\" height=\"132\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/5482.052813_0703_ProfileGuid3.png 310w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/5482.052813_0703_ProfileGuid3-300x128.png 300w\" sizes=\"(max-width: 310px) 100vw, 310px\" \/><\/a><\/span><\/p>\n<p style=\"text-align: center;\"><span style=\"font-family: Times New Roman; font-size: 8pt;\">Figure 1b: Sequence of calls to function &#8216;bar&#8217; from &#8216;goo&#8217;, &#8216;foo&#8217; and &#8216;bat&#8217;<br \/>Sequence of calls to function &#8216;baz&#8217; from &#8216;bar&#8217;<\/span><\/p>\n<p><span style=\"font-family: Times New Roman; font-size: 
small;\">Figure 1b makes it clear that the biggest benefit will come from inlining function &#8216;bar&#8217; into &#8216;bat&#8217;, given the high number of calls (100) made from &#8216;bat&#8217; to &#8216;bar&#8217;. The other major benefit will come from inlining &#8216;baz&#8217; into &#8216;bar&#8217;, given the high number of calls made to &#8216;baz&#8217; along the call paths &#8216;goo-bar&#8217; and &#8216;foo-bar&#8217;. Figure 1c below shows the impact of PGO inlining for this scenario. <\/span><\/p>\n<p style=\"text-align: center;\"><span style=\"font-family: Times New Roman; font-size: 10pt;\"><a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/2287.052813_0703_ProfileGuid4.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-28756\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/2287.052813_0703_ProfileGuid4.png\" alt=\"Image 2287 052813 0703 ProfileGuid4\" width=\"406\" height=\"165\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/2287.052813_0703_ProfileGuid4.png 406w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/2287.052813_0703_ProfileGuid4-300x122.png 300w\" sizes=\"(max-width: 406px) 100vw, 406px\" \/><\/a><\/span><span style=\"font-family: Times New Roman; font-size: 8pt;\">\nFigure 1c: Impact of PGO inlining: &#8216;bar&#8217; is inlined into &#8216;bat&#8217;, and &#8216;baz&#8217; is inlined into &#8216;bar&#8217;.<\/span><\/p>\n<p><span style=\"font-family: Times New Roman; font-size: 10pt;\">Inlining decisions are made before layout decisions, speed vs. size decisions and all other optimizations. 
Now let us recall, from my <a href=\"http:\/\/blogs.msdn.com\/b\/vcblog\/archive\/2013\/04\/08\/profile-guided-optimization-pgo.aspx\">last blog<\/a>, the output of an optimized PGO build (figure 2).<\/span><\/p>\n<p><span style=\"font-family: times new roman,times; font-size: small;\"><a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/7266.pgo2_.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-28753\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/7266.pgo2_.jpg\" alt=\"Image 7266 pgo2\" width=\"1223\" height=\"329\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/7266.pgo2_.jpg 1223w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/7266.pgo2_-300x81.jpg 300w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/7266.pgo2_-1024x275.jpg 1024w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/7266.pgo2_-768x207.jpg 768w\" sizes=\"(max-width: 1223px) 100vw, 1223px\" \/><\/a><\/span><\/p>\n<p><span style=\"font-family: times new roman,times; font-size: small;\">&#8216;Speed vs. Size&#8217; decisions are based on the post-inliner dynamic instruction count. Code segments (i.e. functions) with a high dynamic instruction count are optimized for speed, whereas code segments with a low dynamic instruction count are optimized for size. In the build output shown in figure 2 above, the line <span style=\"color: #1e1e1e;\"><span style=\"background-color: white;\"><strong>6 of 3619 (0.17%) profiled functions will be compiled for speed, and the rest of the functions will be compiled for size<\/strong> is the result of this decision-making process.<\/span> 
The speed and size decisions are followed by the <strong>&#8216;Block Layout&#8217;<\/strong> and <strong>&#8216;Live and Scenario dead code separation&#8217;<\/strong> optimizations. Basic blocks are ordered so that the most frequent paths fall through (figure 3). Live and scenario dead code separation is performed to minimize the working set and improve code locality: code (functions or blocks) that is scenario dead (i.e. not exercised in the training scenario) is placed in a special section (figure 4). <\/span><\/span><\/p>\n<p style=\"padding-left: 30px;\"><a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/1651.052813_0703_ProfileGuid7.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-28754\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/1651.052813_0703_ProfileGuid7.png\" alt=\"Image 1651 052813 0703 ProfileGuid7\" width=\"668\" height=\"322\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/1651.052813_0703_ProfileGuid7.png 668w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/1651.052813_0703_ProfileGuid7-300x145.png 300w\" sizes=\"(max-width: 668px) 100vw, 668px\" \/><\/a><a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/1541.052813_0703_ProfileGuid8.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-28758\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/1541.052813_0703_ProfileGuid8.png\" alt=\"Image 1541 052813 0703 ProfileGuid8\" width=\"709\" height=\"26\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/1541.052813_0703_ProfileGuid8.png 709w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/1541.052813_0703_ProfileGuid8-300x11.png 300w\" sizes=\"(max-width: 709px) 100vw, 709px\" \/><\/a><\/p>\n<p 
style=\"padding-left: 30px;\"><a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/2146.052813_0703_ProfileGuid9.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-28755\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/2146.052813_0703_ProfileGuid9.png\" alt=\"Image 2146 052813 0703 ProfileGuid9\" width=\"620\" height=\"324\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/2146.052813_0703_ProfileGuid9.png 620w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/2146.052813_0703_ProfileGuid9-300x157.png 300w\" sizes=\"(max-width: 620px) 100vw, 620px\" \/><\/a><\/p>\n<p><span style=\"color: #1e1e1e; font-family: Times New Roman; font-size: 10pt;\"><span style=\"font-size: small;\"><span style=\"background-color: white;\">Finally, function layout is performed based on the post-inliner, post-code-separation call graph profile data. Only functions in the live sections are laid out; dead blocks are not included. The overall strategy behind function layout is to place strongly connected functions (functions that call each other with high frequency) together. A call is said to achieve page locality if the callee is located in the same page as the caller. 
Consider the example in figure 6 below:<\/span><\/span>\n<\/span><\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/6862.052813_0703_ProfileGuid10.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-28752\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/6862.052813_0703_ProfileGuid10.png\" alt=\"Image 6862 052813 0703 ProfileGuid10\" width=\"625\" height=\"199\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/6862.052813_0703_ProfileGuid10.png 625w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/6862.052813_0703_ProfileGuid10-300x96.png 300w\" sizes=\"(max-width: 625px) 100vw, 625px\" \/><\/a>\n<strong>Figure 6:<\/strong> Function layout based on the call graph and profile data<\/p>\n<p><span style=\"font-family: times new roman,times; font-size: small;\">Other optimizations driven by the training data include switch case expansion and virtual call speculation. Switch case expansion takes the most common value of the switch expression, as collected by PGO, and pulls it out of the switch construct so that it is tested first. 
The image below shows how the code snippet presented at the start of this blog is optimized using the PGO data when switch case expansion is performed.\n<\/span><\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/4353.switch.png\"><img decoding=\"async\" class=\"size-full wp-image-28749 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/4353.switch.png\" alt=\"Image 4353 switch\" width=\"1069\" height=\"414\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/4353.switch.png 1069w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/4353.switch-300x116.png 300w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/4353.switch-1024x397.png 1024w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/05\/4353.switch-768x297.png 768w\" sizes=\"(max-width: 1069px) 100vw, 1069px\" \/><\/a><\/p>\n<p>Similarly, if a virtual call, <span style=\"font-family: 'Times New Roman','serif'; font-size: 10pt;\">or other call through a function pointer, frequently targets a certain function, profile-guided optimization can insert a conditionally executed direct call to the frequently targeted function, and that direct call can then be inlined.<\/span><\/p>\n<h2>Wrap up<\/h2>\n<p>Although Profile Guided Optimization (PGO) is a complex technology, this blog should give you an idea of how useful profile guided optimization is and how PGO works under the hood to make a plethora of products faster. In future blogs I will try to cover a best practices guide that describes common pitfalls and highlights some tips and tricks for PGO users. So stay tuned! 
Additionally, if you would like us to blog about other PGO-related scenarios, please let us know.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>To introduce myself I am Ankit Asthana and I am the program manager for the backend C++ compiler. In my last few blogs I provided an introduction to what PGO is all about along with case studies which covered how Profile Guided Optimization (PGO) is used to make real world applications such as SAP NetWeaver [&hellip;]<\/p>\n","protected":false},"author":265,"featured_media":35994,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1503","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cplusplus"],"acf":[],"blog_post_summary":"<p>To introduce myself I am Ankit Asthana and I am the program manager for the backend C++ compiler. 
In my last few blogs I provided an introduction to what PGO is all about along with case studies which covered how Profile Guided Optimization (PGO) is used to make real world applications such as SAP NetWeaver [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/1503","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/265"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=1503"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/1503\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/35994"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=1503"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=1503"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=1503"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}