{"id":1473,"date":"2013-06-12T13:00:00","date_gmt":"2013-06-12T13:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/vcblog\/2013\/06\/12\/optimizing-c-code-overview\/"},"modified":"2024-09-10T07:57:54","modified_gmt":"2024-09-10T07:57:54","slug":"optimizing-c-code-overview","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/optimizing-c-code-overview\/","title":{"rendered":"Optimizing C++ Code : Overview"},"content":{"rendered":"<p>If you have arrived in the middle of this blog series, you might want instead to begin at the <a href=\"http:\/\/blogs.msdn.com\/b\/vcblog\/archive\/2013\/05\/29\/optimizing-c-code.aspx\">beginning<\/a>.<\/p>\n<p>This post explains the flow of data within the Visual C++ compiler &ndash; starting with our C++ source program, and ending with a corresponding binary program. This post is&nbsp;an easy one &ndash; dipping our toes into the shallow end of ocean.<\/p>\n<p>Let&#8217;s examine what happens when we compile a single-file program, stored in <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">App.cpp<\/span>, from the command-line. (If you were to launch the compilation from within Visual Studio, the diagram below would have to include higher layers of software; however, these end up emitting the very commands I&#8217;m about to describe).<\/p>\n<p>So let&#8217;s imagine we just typed: <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">CL \/O2 App.cpp<\/span><\/p>\n<p>The <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">CL<\/span> stands from &#8220;Compile and Link&#8221;. <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">\/O2<\/span> tells the compiler to optimize for speed &ndash; to generate machine code that runs as fast as possible. This command launches a process to execute the <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">CL.EXE<\/span> program &ndash; a &#8220;driver&#8221; that calls several pieces of software: when joined together, they process the text in our <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">App.cpp<\/span> and eventually generate a binary file, called <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">App.exe<\/span>. When executed, this binary will carry out the operations we specified in our original source file.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/06\/1727.061213_1742_OptimizingC1.png\"><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/06\/1727.061213_1742_OptimizingC1.png\" alt=\"Image 1727 061213 1742 OptimizingC1\" width=\"604\" height=\"369\" class=\"aligncenter size-full wp-image-28883\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/06\/1727.061213_1742_OptimizingC1.png 604w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2013\/06\/1727.061213_1742_OptimizingC1-300x183.png 300w\" sizes=\"(max-width: 604px) 100vw, 604px\" \/><\/a><\/p>\n<p>Let&#8217;s take a stroll around the diagram above and explain what&#8217;s going on.<\/p>\n<p><span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">CL.EXE<\/span> parses our command-line, and checks that it makes sense. It then calls the C++ &#8220;front-end&#8221;, located in <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">C1XX.DLL<\/span> (the &#8220;CXX&#8221; is meant to suggest &#8220;C++&#8221;, but dates back to a time when &#8220;+&#8221; was not a legal character in file names). The <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">FrontEnd<\/span> is the part of the chain that <em>understands <\/em>the C++ language. It scans, parses and transforms your <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">App.cpp<\/span> into an equivalent tree, passed to the next component via 5 temporary files. The &#8220;language&#8221; represented by these 5 files is called &#8220;CIL&#8221;, for &#8220;C Intermediate Language&#8221;. Don&#8217;t confuse this with the intermediate code, generated by managed languages such as C#, sometimes called &#8220;MSIL&#8221; but also, unfortunately, named &#8220;CIL&#8221; in the <a href=\"http:\/\/www.ecma-international.org\/publications\/standards\/Ecma-335.htm\">ECMA-335<\/a> standard.<\/p>\n<p>Next, <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">CL.EXE<\/span> calls the &#8220;back-end&#8221;, located in <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">C2.DLL<\/span>. We call the BackEnd &#8220;UTC&#8221;, which stands for &#8220;Universal Tuple Compiler&#8221;, although this name doesn&#8217;t show up in any of the binaries included into Visual Studio. The BackEnd starts by converting the information from the FrontEnd into tuples &ndash; a binary stream of instructions. If we were to display them, they would look like a kind of high-level assembly language. It&#8217;s high-level in the senses that:<\/p>\n<ul style=\"margin-left: 38pt\">\n<li>Operations are generic. For example, a <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">BRANCH(LE)<\/span> instruction versus how it will eventually be translated, into a <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">cmp<\/span> instruction in x64 machine code<\/li>\n<li>Operands are symbolic. For example, <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">t66<\/span> denoting a temporary variable generated by the compiler, versus <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">eax<\/span>, the x64 register that will hold its value at runtime<\/li>\n<\/ul>\n<p>Because we asked the compiler to optimize for speed, via the <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">\/O2<\/span> switch, the Optimize part of the BackEnd analyses the tuples and transforms them into a form that will execute faster but that is semantically equivalent &ndash; that yields the same results as would the original tuples. After this step, the tuples are passed to the CodeGen part of the BackEnd, which decides the final binary code to emit.<\/p>\n<p>The CodeGen module emits <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">App.obj<\/span> as a file on disk. Finally, the Linker consumes that file, resolves any references to libraries, and produces the final <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">App.exe<\/span> binary.<\/p>\n<p>In the diagram, black arrows show flow of data &ndash; text or binary. <span style=\"color: red\">Red<\/span> arrows show flow of control.<\/p>\n<p>(We shall return to this diagram later in the series, when we hit the topic of Whole Program Optimization, specified to the compiler with the <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">\/GL<\/span> switch, and to the Linker with the <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">\/LTCG<\/span> switch. We will see the same set of boxes, but connected in different ways)<\/p>\n<p>In summary:<\/p>\n<ul>\n<li>The <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">FrontEnd<\/span> <em>understands<\/em> C++ source code. Other parts of the chain &ndash; BackEnd and Linker &ndash; are (mostly) isolated from the details of the original source language. They work upon the tuples, mentioned above, that form a kind of high-level, binary assembly language. The original source program might have been any imperative language, such as FORTRAN or Pascal. The BackEnd really doesn&#8217;t care. Well, not much.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<ul>\n<li>The <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">Optimize<\/span> part of the BackEnd transforms tuples into more efficient forms that will run faster. Such transformations, we call optimizations. (We should really call them &#8220;improvements&#8221;, because there may be <em>other<\/em> improvements that produce even faster-running code &ndash; we just aim to get close to that ideal. However, someone coined the term &#8220;optimization&#8221; a few decades ago, and we&#8217;re stuck with it) There are dozens of such optimizations, with names like &#8220;constant folding&#8221;, &#8220;common sub-expression elimination&#8221;, &#8220;hoisting&#8221;, &#8220;invariant code motion&#8221;, &#8220;dead code elimination&#8221;, &#8220;function inlining&#8221;, &#8220;auto vectorization&#8221;, etc. For the most part, these optimizations are independent of the eventual processor on which the program will execute &ndash; they are &#8220;machine-independent&#8221; optimizations.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<ul>\n<li>The <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">CodeGen<\/span> part of the Backend decides how to lay out the runtime stack (used to implement function &#8220;activation frames&#8221;); how to make best use of the available machine registers; adds details that implement function calling conventions; and transforms the code to make it run even faster, using its detailed knowledge of the target machine. (As one tiny example, if you ever look at assembly code &ndash; for example, using the Disassembly window in Visual Studio (Alt+8) whilst debugging code &ndash; you may notice instructions like <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">xor eax, eax<\/span> used to set EAX to zero, in preference to the more obvious <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">move eax,0<\/span>. Why? Because the XOR form is smaller (2 bytes instead of 5) and executes faster. We might well call this a &#8220;micro-optimization&#8221;, and wonder whether it&#8217;s worth the bother. Recall the proverb: &#8220;Look after the pennies and the pounds will look after themselves&#8221;). In contrast to the <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">Optimize<\/span> part, <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">CodeGen<\/span> is aware of the processor architecture on which the code is going to run.&nbsp; In some cases, it will even change the order in which machine instructions are laid out &ndash; a process called &#8220;scheduling&#8221; &ndash; based upon its understanding of the target processor.&nbsp;&nbsp; Oh, I&#8217;d better explain that a bit more: <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">CodeGen<\/span> <em>knows<\/em> whether it is targeting x86 or x64 or ARM-32; but it&#8217;s rare that is can&nbsp;<em>know<\/em> the specific micro-architecture the code will run on &#8211; Nehalem versus Sandy Bridge, for example. (see the \/favor:ATOM switch for a case where it knows more detail)<\/li>\n<\/ul>\n<p>This blog will focus on the <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">Optimize<\/span> part of the compiler, with scant mention of the other components that make up the chain &ndash; <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">FrontEnd<\/span>, <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">CodeGen<\/span> or <span style=\"font-family: Consolas;font-size: 9pt;background-color: #f2f2f2\">Linker<\/span>.<\/p>\n<p>This post has introduced lots of jargon. I&#8217;m not intending that you make total sense of it all at this point: this post is an overview, sprinkled with ideas that hopefully pique your interest, and ensure you&#8217;ll come back next time, where I will start to explain all of that jargon.<\/p>\n<p>Next time, we will look at perhaps the simplest optimization and how it works &ndash; the one called &#8220;constant folding&#8221;.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you have arrived in the middle of this blog series, you might want instead to begin at the beginning. This post explains the flow of data within the Visual C++ compiler &ndash; starting with our C++ source program, and ending with a corresponding binary program. This post is&nbsp;an easy one &ndash; dipping our toes [&hellip;]<\/p>\n","protected":false},"author":271,"featured_media":35994,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[3946,1],"tags":[66,36],"class_list":["post-1473","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-backend","category-cplusplus","tag-performance","tag-vc"],"acf":[],"blog_post_summary":"<p>If you have arrived in the middle of this blog series, you might want instead to begin at the beginning. This post explains the flow of data within the Visual C++ compiler &ndash; starting with our C++ source program, and ending with a corresponding binary program. This post is&nbsp;an easy one &ndash; dipping our toes [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/1473","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/271"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=1473"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/1473\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/35994"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=1473"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=1473"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=1473"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}