{"id":1363,"date":"2013-07-04T13:00:00","date_gmt":"2013-07-04T13:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/vcblog\/2013\/07\/04\/optimizing-c-code-constant-folding\/"},"modified":"2024-09-10T07:57:49","modified_gmt":"2024-09-10T07:57:49","slug":"optimizing-c-code-constant-folding","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/optimizing-c-code-constant-folding\/","title":{"rendered":"Optimizing C++ Code : Constant-Folding"},"content":{"rendered":"<p><span style=\"font-family: Calibri;font-size: small\">If you have arrived in the middle of this blog series, you might want instead to begin at the <\/span><a href=\"http:\/\/blogs.msdn.com\/b\/vcblog\/archive\/2013\/05\/29\/optimizing-c-code.aspx\"><span style=\"color: #0563c1;font-family: Calibri;font-size: small\">beginning<\/span><\/a><span style=\"font-size: small\"><span style=\"font-family: Calibri\">.<\/span><\/span><\/p>\n<p><span style=\"font-size: small\"><span style=\"font-family: Calibri\">This post examines Constant-Folding &ndash; one of the simplest optimizations performed by the VC++ compiler.&nbsp; In this optimization, the compiler works out the result of an expression while it is compiling (at &ldquo;compile-time&rdquo;), and inserts the answer directly into the generated code.&nbsp; This avoids the cost of performing those same calculations when the program runs (at &ldquo;runtime&rdquo;).<\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;font-size: small\">Here is an example.&nbsp; A tiny <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">main<\/span><span style=\"font-family: Calibri;font-size: small\"> function, stored in the file <\/span><span style=\"font-family: Consolas\">App.cpp<\/span><span style=\"font-size: small\"><span style=\"font-family: Calibri\">:<\/span><\/span><\/p>\n<p class=\"Code\" style=\"padding-left: 30px\"><span style=\"font-family: Consolas;background-color: #ccffcc\">int main() { return 7 + 8; }<\/span><\/p>\n<p><span style=\"font-size: small\"><span style=\"font-family: Calibri\">First, some <em>admin<\/em> about this blog:&nbsp; <\/span><\/span><\/p>\n<ul>\n<li><span style=\"font-size: small\"><span style=\"font-family: Calibri\">We will build programs from the command-line (rather than from within Visual Studio)<\/span><\/span><\/li>\n<li><span style=\"font-size: small\"><span style=\"font-family: Calibri\">We will use Visual Studio 2012.&nbsp; In particular, the version of the compiler that generates x64 code (rather than code for the aging x86 architecture), and that compiles on an x64 computer (i.e., &ldquo;x64 on x64&rdquo;)<\/span><\/span><\/li>\n<\/ul>\n<p><span style=\"font-family: Calibri;font-size: small\">If you want to follow along, please follow the instructions <\/span><a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/x4d2c09s.aspx\"><span style=\"color: #0563c1;font-family: Calibri;font-size: small\">found here<\/span><\/a><span style=\"font-size: small\"><span style=\"font-family: Calibri\">:&nbsp; essentially, you just need to choose the right variant from a list of possible &ldquo;Visual Studio Tools&rdquo;.<\/span><\/span><\/p>\n<p><span style=\"font-size: small\"><span style=\"font-family: Calibri\">(Note that if you are using the free compiler from Visual Studio Express, it runs only on x86, but will happily generate code for x64: &ldquo;x64 on x86&rdquo;.&nbsp; This is equally good for experiments)<\/span><\/span><\/p>\n<p><span style=\"font-size: small\"><span style=\"font-family: Calibri\">We can build our sample program with the command:&nbsp; <\/span><\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">CL \/FA App.cpp<\/span><span style=\"font-family: Calibri;font-size: small\">.&nbsp; The <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">\/FA<\/span><span style=\"font-family: Calibri;font-size: small\"> switch creates an output file holding the assembly code that the compiler generates for us.&nbsp; You can display it with: <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">type App.asm<\/span><span style=\"font-size: small\"><span style=\"font-family: Calibri\"> to show:<\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p style=\"padding-left: 30px\"><span style=\"background-color: #ccffcc\"><span style=\"font-family: courier new,courier;font-size: small\">PUBLIC&nbsp; main<br \/><\/span><span style=\"font-family: courier new,courier;font-size: small\">_TEXT&nbsp;&nbsp; SEGMENT<br \/><\/span><span style=\"font-family: courier new,courier;font-size: small\">main&nbsp;&nbsp;&nbsp; PROC<br \/><\/span><span style=\"font-family: courier new,courier;font-size: small\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; mov&nbsp;&nbsp;&nbsp;&nbsp; eax, 15<br \/><\/span><span style=\"font-family: courier new,courier;font-size: small\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ret&nbsp;&nbsp;&nbsp;&nbsp; 0<br \/><\/span><span style=\"font-family: courier new,courier;font-size: small\">main&nbsp;&nbsp;&nbsp; ENDP<br \/><\/span><span style=\"font-family: courier new,courier;font-size: small\">_TEXT&nbsp;&nbsp; ENDS<br \/><\/span><span style=\"font-family: courier new,courier;font-size: small\">END<\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;font-size: small\">The point of interest is the <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">mov eax, 15<\/span><span style=\"font-family: Calibri;font-size: small\"> instruction &ndash; it just insert the value 15 into the EAX register (which, by definition of the x64 calling standard, is the way that an x64 function sets the <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">int<\/span><span style=\"font-size: small\"><span style=\"font-family: Calibri\"> value it will return, as the function&rsquo;s result, to its caller).&nbsp; The compiler does <em>not <\/em>emit instructions that would add 7 to 8 at runtime.&nbsp; They would have gone something like:<\/span><\/span><\/p>\n<p class=\"Code\" style=\"padding-left: 30px\"><span style=\"background-color: #ccffcc\"><span style=\"font-family: Consolas\">PUBLIC&nbsp; main<br \/><\/span><span style=\"font-family: Consolas\">_TEXT&nbsp;&nbsp; SEGMENT<br \/><\/span><span style=\"font-family: Consolas\">main&nbsp;&nbsp;&nbsp; PROC<br \/><\/span><span style=\"font-family: Consolas\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; mov&nbsp;&nbsp;&nbsp;&nbsp; eax, 7<br \/><\/span><span style=\"font-family: Consolas\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; add&nbsp;&nbsp;&nbsp;&nbsp; eax, 8<br \/><\/span><span style=\"font-family: Consolas\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ret&nbsp;&nbsp;&nbsp;&nbsp; 0<br \/><\/span><span style=\"font-family: Consolas\">main&nbsp;&nbsp;&nbsp; ENDP<br \/><\/span><span style=\"font-family: Consolas\">_TEXT&nbsp;&nbsp; ENDS<br \/><\/span><span style=\"font-family: Consolas\">END<\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;font-size: small\">(Note carefully, the last instruction, in both snippets, <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">ret 0<\/span><span style=\"font-size: small\"><span style=\"font-family: Calibri\">.&nbsp; This means: return control to the caller, and pop zero bytes from the stack.&nbsp; Do not be misled into thinking this means return the value 0 to the caller!)<\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;font-size: small\">Let me guess: you are probably thinking &ldquo;this is all very well, but really!&nbsp; What kind of idiot would ever write code that includes arithmetic like <\/span><span style=\"font-family: Consolas\">7 + 8<\/span><span style=\"font-size: small\"><span style=\"font-family: Calibri\">&rdquo;?&nbsp; Of course, you are right.&nbsp; But the compiler <em>does <\/em>see such constructs, often as a side-effect of macros.&nbsp; Here is an example to persuade you that Constant-Folding is a worthwhile optimization:<\/span><\/span><\/p>\n<p class=\"Code\" style=\"padding-left: 30px\"><span style=\"background-color: #ccffcc\"><span style=\"font-family: Consolas\">#define SECS_PER_MINUTE&nbsp; 60<br \/><\/span><span style=\"font-family: Consolas\">#define MINUTES_PER_HOUR 60<br \/><\/span><span style=\"font-family: Consolas\">#define HOURS_PER_DAY&nbsp;&nbsp;&nbsp; 24<br \/><\/span><span style=\"font-family: Consolas\"><br \/>enum Event { Started, Stopped, LostData, ParityError };<\/span><\/span><\/p>\n<p class=\"Code\" style=\"padding-left: 30px\"><span style=\"background-color: #ccffcc\"><span style=\"font-family: Consolas\">struct {<br \/><\/span><span style=\"font-family: Consolas\">&nbsp;&nbsp;&nbsp; int&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; clock_time;<br \/><\/span><span style=\"font-family: Consolas\">&nbsp;&nbsp;&nbsp; enum Event ev;<br \/><\/span><span style=\"font-family: Consolas\">&nbsp;&nbsp;&nbsp; char*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; reason;<br \/><\/span><span style=\"font-family: Consolas\">}&nbsp;&nbsp; Record;<\/span><\/span><\/p>\n<p class=\"Code\" style=\"padding-left: 30px\"><span style=\"background-color: #ccffcc\"><span style=\"font-family: Consolas\">int main() {<br \/><\/span><span style=\"font-family: Consolas\">&nbsp;&nbsp;&nbsp; const int table_size = SECS_PER_MINUTE * MINUTES_PER_HOUR * HOURS_PER_DAY * sizeof Record;<br \/><\/span><span style=\"font-family: Consolas\">&nbsp;&nbsp;&nbsp; \/\/ rest of program<br \/><\/span><span style=\"font-family: Consolas\">}<\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;font-size: small\">Here we are going to create a table big enough to hold a <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">Record<\/span><span style=\"font-family: Calibri;font-size: small\"> for each second of an entire day.&nbsp; So <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">table_size<\/span><span style=\"font-family: Calibri;font-size: small\"> would be the size, in bytes, of that table.&nbsp; It&rsquo;s easy to check the assembly instruction that sets the variable <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">table_size<\/span><span style=\"font-size: small\"><span style=\"font-family: Calibri\"> :<\/span><\/span><\/p>\n<p class=\"Code\"><span><span style=\"font-family: Consolas\"><span><span><span><span><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><span style=\"background-color: #ccffcc\">mov<\/span><\/span>&nbsp;&nbsp;&nbsp;&nbsp; <\/span><span style=\"background-color: #ccffcc\">DWORD PTR table_size$[rsp], 1382400<\/span><\/span>&nbsp;&nbsp;&nbsp;&nbsp; <\/span><span style=\"background-color: #ccffcc\">; 00151800H<\/span><\/span><\/span><\/p>\n<p><span style=\"font-size: small\"><span style=\"font-family: Calibri\">No multiply instructions here! &ndash; it&rsquo;s all calculated at compile-time: 60 * 60 * 24 * 16 = 1382400.<\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;font-size: small\">In fact, if we could peek inside the compiler, we would find that this level of Constant-Folding is so simple, it&rsquo;s performed by the FrontEnd.&nbsp; It does not require the <em>hea\nvy lifting power<\/em> of the BackEnd Optimizer. &nbsp;And therefore, it&rsquo;s always <em>on<\/em>.&nbsp; It makes no difference whether you request optimization (with <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">\/O2<\/span><span style=\"font-family: Calibri;font-size: small\">) or disable optimization (with <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">\/Od<\/span><span style=\"font-size: small\"><span style=\"font-family: Calibri\">) &ndash; this optimization always cuts in.<\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;font-size: small\">How complicated can the expression be, and yet we still fold those constants together at compile-time?&nbsp; In fact, the FrontEnd will cope with pretty much any arithmetic expression involving constants (even values such as <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">sizeof<\/span><span style=\"font-family: Calibri;font-size: small\">, as above, so long as they can be evaluated at compile-time) and operators <\/span><span style=\"font-family: Consolas;background-color: #ffffff\"><span style=\"background-color: #ccffcc\">+<\/span> <span style=\"background-color: #ccffcc\">&#8211;<\/span> <span style=\"background-color: #ccffcc\">*<\/span> <span style=\"background-color: #ccffcc\">\/<\/span> <span style=\"background-color: #ccffcc\">%<\/span> <span style=\"background-color: #ccffcc\">&lt;&lt;<\/span> <span style=\"background-color: #ccffcc\">&gt;&gt;<\/span> <span style=\"background-color: #ccffcc\">++<\/span><\/span><span style=\"font-family: Calibri;font-size: small\"> and <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">&#8212;<\/span><span style=\"font-family: Calibri;font-size: small\">.&nbsp; You can even throw in <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">bool<\/span><span style=\"font-family: Calibri;font-size: small\">s, logical operators, <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">if<\/span><span style=\"font-family: Calibri;font-size: small\">s and <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">?:<\/span><span style=\"font-size: small\"><span style=\"font-family: Calibri\"> &nbsp;<\/span><\/span><\/p>\n<p><span style=\"font-size: small\"><span style=\"font-family: Calibri\">Are there any cases of Constant-Folding, then, that <em>do <\/em>require the power of the BackEnd Optimizer?&nbsp; Yes.&nbsp; Consider this example:<\/span><\/span><\/p>\n<p class=\"Code\" style=\"padding-left: 30px\"><span style=\"background-color: #ccffcc\"><span style=\"font-family: Consolas\">int bump(int n) { return n + 1; }<\/p>\n<p><\/span><span style=\"font-family: Consolas\">int main() { return 3 + bump(6); } <\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;font-size: small\">With the commands, <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">cl \/FA \/Od App.cpp<\/span><span style=\"font-family: Calibri;font-size: small\"> which says &#8220;no optimizations, thank you&#8221;&nbsp;and <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">type App.asm<\/span><span style=\"font-size: small\"><span style=\"font-family: Calibri\">, we get:<\/span><\/span><\/p>\n<p class=\"Code\" style=\"padding-left: 30px\"><span style=\"background-color: #ccffcc\"><span style=\"font-family: Consolas\">mov&nbsp;&nbsp;&nbsp;&nbsp; ecx, 6<br \/><\/span><span style=\"font-family: Consolas\">call&nbsp;&nbsp;&nbsp; ?bump@@YAHH@Z&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ; bump<br \/><\/span><span style=\"font-family: Consolas\">add&nbsp;&nbsp;&nbsp;&nbsp; eax, 3 <\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;font-size: small\">Just as we would expect: load 6 into ECX &ndash; which holds the first argument, in the x64 calling convention, to our function <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">bump<\/span><span style=\"font-family: Calibri;font-size: small\">.&nbsp; Then call <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">bump<\/span><span style=\"font-size: small\"><span style=\"font-family: Calibri\">.&nbsp; Its result is returned in EAX.&nbsp; Finally, add 3 into EAX.<\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;font-size: small\">Let&rsquo;s see what happens if we request optimization, with: <\/span><span style=\"background-color: #ccffcc\"><span style=\"font-family: Consolas\">cl \/FA \/O2 App.cpp<\/span><\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"background-color: #ccffcc\"><span style=\"font-family: Consolas\">mov&nbsp;&nbsp;&nbsp;&nbsp; eax, 10 <\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;font-size: small\">Here the BackEnd Optimizer has recognized that the <\/span><span style=\"font-family: Consolas;background-color: #ccffcc\">bump<\/span><span style=\"font-size: small\"><span style=\"font-family: Calibri\"> function is so small that its body should simply be included into its caller (a sophisticated optimization called &ldquo;function inlining&rdquo; that we will examine later in the series).&nbsp; It then realizes it can evaluated the entire expression at compile time, and so ends up with a single instruction.&nbsp; Quite impressive, right?<\/span><\/span><\/p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you have arrived in the middle of this blog series, you might want instead to begin at the beginning. This post examines Constant-Folding &ndash; one of the simplest optimizations performed by the VC++ compiler.&nbsp; In this optimization, the compiler works out the result of an expression while it is compiling (at &ldquo;compile-time&rdquo;), and inserts [&hellip;]<\/p>\n","protected":false},"author":271,"featured_media":35994,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[3946,1],"tags":[],"class_list":["post-1363","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-backend","category-cplusplus"],"acf":[],"blog_post_summary":"<p>If you have arrived in the middle of this blog series, you might want instead to begin at the beginning. This post examines Constant-Folding &ndash; one of the simplest optimizations performed by the VC++ compiler.&nbsp; In this optimization, the compiler works out the result of an expression while it is compiling (at &ldquo;compile-time&rdquo;), and inserts [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/1363","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/271"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=1363"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/1363\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/35994"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=1363"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=1363"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=1363"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}