{"id":103472,"date":"2020-02-24T07:00:00","date_gmt":"2020-02-24T15:00:00","guid":{"rendered":"http:\/\/devblogs.microsoft.com\/oldnewthing\/?p=103472"},"modified":"2020-02-24T21:14:10","modified_gmt":"2020-02-25T05:14:10","slug":"20200224-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20200224-00\/?p=103472","title":{"rendered":"Why are there trivial functions like <CODE>Copy&shy;Rect<\/CODE> and <CODE>Equal&shy;Rect<\/CODE>?"},"content":{"rendered":"<p>If you dig into the bag of tricks inside <code>user32<\/code>, you&#8217;ll see some seemingly-trivial functions like <code>Copy\u00adRect<\/code> and <code>Equal\u00adRect<\/code>. Why do we even need functions for things that could be done with the <code>=<\/code> and <code>==<\/code> operators?<\/p>\n<p>Because those operators generate a lot of code.<\/p>\n<p>Copying a rectangle would go like this:<\/p>\n<pre>c4 5e f0        les  bx, [bp-10]    ; es:bx -&gt; source rect\r\n26 8b 07        mov  ax, es:[bx]    ; ax = source.left\r\nc4 5e ec        les  bx, [bp-14]    ; es:bx -&gt; destination rect\r\n26 89 07        mov  es:[bx], ax    ; dest.left = ax\r\n\r\nc4 5e f0        les  bx, [bp-10]    ; es:bx -&gt; source rect\r\n26 8b 47 02     mov  ax, es:[bx+2]  ; ax = source.top\r\nc4 5e ec        les  bx, [bp-14]    ; es:bx -&gt; destination rect\r\n26 89 47 02     mov  es:[bx+2], ax  ; dest.top = ax\r\n\r\nc4 5e f0        les  bx, [bp-10]    ; es:bx -&gt; source rect\r\n26 8b 47 04     mov  ax, es:[bx+4]  ; ax = source.right\r\nc4 5e ec        les  bx, [bp-14]    ; es:bx -&gt; destination rect\r\n26 89 47 04     mov  es:[bx+4], ax  ; dest.right = ax\r\n\r\nc4 5e f0        les  bx, [bp-10]    ; es:bx -&gt; source rect\r\n26 8b 47 06     mov  ax, es:[bx+6]  ; ax = source.bottom\r\nc4 5e ec        les  bx, [bp-14]    ; es:bx -&gt; destination rect\r\n26 89 47 06     mov  es:[bx+6], ax  ; dest.bottom = ax\r\n<\/pre>\n<p>This takes 54 bytes of code. 
It&#8217;s rather inefficient because the 8086 processor could indirect only through the <code>bx<\/code>, <code>bp<\/code>, <code>si<\/code>, and <code>di<\/code> registers. The <code>bp<\/code> register was reserved for use as the frame pointer, so that was off the table. The <code>si<\/code> and <code>di<\/code> registers were used as register variables, so they were busy holding something important. That left <code>bx<\/code> as the only register that could be used to dereference pointers.<\/p>\n<p>Since this is a 16:16 pointer, we also need a segment register, and the 8086 has only four segment registers: <code>cs<\/code> (code segment), <code>ds<\/code> (data segment), <code>ss<\/code> (stack segment), and <code>es<\/code> (extra segment). Three of them have dedicated purposes, so the only one left is <code>es<\/code>. Even if we could borrow <code>si<\/code> or <code>di<\/code> temporarily, we would still be bottlenecked on <code>es<\/code>.<\/p>\n<p>If we move <code>Copy\u00adRect<\/code> to a function, then we can save a bunch of code:<\/p>\n<pre>c4 5e f0        les  bx, [bp-10]    ; es:bx -&gt; source rect\r\n53              push bx\r\n06              push es\r\nc4 5e ec        les  bx, [bp-14]    ; es:bx -&gt; destination rect\r\n53              push bx\r\n06              push es\r\n9a xx xx xx xx  call CopyRect\r\n<\/pre>\n<p>Only 15 bytes. 
Less than a third the size.<\/p>\n<p>This was the era in which <a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20190827-00\/?p=102809\"> developers counted bytes<\/a>, and any trick to save a few bytes was worth considering, especially since you had &#8220;only&#8221; 256KB of memory.\u00b9<\/p>\n<p>And since copying and comparing rectangles were common operations, factoring the code into a function saved a lot of bytes.<\/p>\n<p>Of course, nowadays, it&#8217;s not a lot of code to copy a rectangle manually: An entire rectangle fits into a single 128-bit register.<\/p>\n<pre>    mov    eax, [sourcerect]\r\n    movups xmm0, [eax]\r\n    mov    eax, [destrect]\r\n    movups [eax], xmm0\r\n<\/pre>\n<p><b>Bonus code golf<\/b>: We could have squeezed out a few instructions by moving two integers at a time. This requires that the two rectangles be non-overlapping in memory (to avoid data aliasing), but that&#8217;s probably a safe assumption because the original code didn&#8217;t work anyway in that case.<\/p>\n<pre>int v[5];\r\n*(RECT*)&amp;v[0] = *(RECT*)&amp;v[1]; \/\/ bad idea\r\n<\/pre>\n<p>Switching to moving two integers at a time doesn&#8217;t break anything that wasn&#8217;t already broken, so let&#8217;s do it:<\/p>\n<pre>c4 5e f0        les  bx, [bp-10]    ; es:bx -&gt; source rect\r\n26 8b 07        mov  ax, es:[bx]    ; ax = source.left\r\n26 8b 57 02     mov  dx, es:[bx+2]  ; dx = source.top\r\nc4 5e ec        les  bx, [bp-14]    ; es:bx -&gt; destination rect\r\n26 89 07        mov  es:[bx], ax    ; dest.left = ax\r\n26 89 57 02     mov  es:[bx+2], dx  ; dest.top = dx\r\n\r\nc4 5e f0        les  bx, [bp-10]    ; es:bx -&gt; source rect\r\n26 8b 47 04     mov  ax, es:[bx+4]  ; ax = source.right\r\n26 8b 57 06     mov  dx, es:[bx+6]  ; dx = source.bottom\r\nc4 5e ec        les  bx, [bp-14]    ; es:bx -&gt; destination rect\r\n26 89 47 04     mov  es:[bx+4], ax  ; dest.right = ax\r\n26 89 57 06     mov  es:[bx+6], dx  ; dest.bottom = 
dx\r\n<\/pre>\n<p>That dropped us down to 42 bytes. It helps, but it&#8217;s still a lot of code.<\/p>\n<p>If we&#8217;re willing to spill one of our other register variables, say, <code>si<\/code>, then we can squeeze it even further.<\/p>\n<pre>c4 5e f0        les  bx, [bp-10]    ; es:bx -&gt; source rect\r\n26 8b 07        mov  ax, es:[bx]    ; ax = source.left\r\n26 8b 57 02     mov  dx, es:[bx+2]  ; dx = source.top\r\n26 8b 4f 04     mov  cx, es:[bx+4]  ; cx = source.right\r\n26 8b 77 06     mov  si, es:[bx+6]  ; si = source.bottom\r\nc4 5e ec        les  bx, [bp-14]    ; es:bx -&gt; destination rect\r\n26 89 07        mov  es:[bx], ax    ; dest.left = ax\r\n26 89 57 02     mov  es:[bx+2], dx  ; dest.top = dx\r\n26 89 4f 04     mov  es:[bx+4], cx  ; dest.right = cx\r\n26 89 77 06     mov  es:[bx+6], si  ; dest.bottom = si\r\n<\/pre>\n<p>Only 36 bytes. Getting better. But it&#8217;s still more than twice as big as calling <code>CopyRect<\/code>, and it cost us a register.<\/p>\n<p>Another trick: Copy the rectangle through the stack.<\/p>\n<pre>c4 5e f0        les  bx, [bp-10]    ; es:bx -&gt; source rect\r\n26 ff 37        push es:[bx]        ; push source.left\r\n26 ff 77 02     push es:[bx+2]      ; push source.top\r\n26 ff 77 04     push es:[bx+4]      ; push source.right\r\n26 ff 77 06     push es:[bx+6]      ; push source.bottom\r\nc4 5e ec        les  bx, [bp-14]    ; es:bx -&gt; destination rect\r\n26 8f 47 06     pop  es:[bx+6]      ; pop dest.bottom\r\n26 8f 47 04     pop  es:[bx+4]      ; pop dest.right\r\n26 8f 47 02     pop  es:[bx+2]      ; pop dest.top\r\n26 8f 07        pop  es:[bx]        ; pop dest.left\r\n<\/pre>\n<p>Hm, same code size as using registers.<\/p>\n<p>Okay, how about borrowing the <code>ds<\/code> register as well as the <code>si<\/code> and <code>di<\/code> registers?<\/p>\n<pre>1e              push ds\r\nc5 76 f0        lds  si, [bp-10]    ; ds:si -&gt; source rect\r\nc4 7e ec        les  di, [bp-14]    ; es:di -&gt; destination rect\r\nfc              cld\r\na5              movsw\r\na5              movsw\r\na5    
          movsw\r\na5              movsw\r\n1f              pop  ds\r\n<\/pre>\n<p>Thirteen bytes, yay, though it did cost us register spills that are not immediately visible.<\/p>\n<p>This version is a tightrope walk because any operation that yields the processor risks discarding the former <code>ds<\/code> segment, which will cause problems because we will restore it to an invalid value and corrupt memory!<\/p>\n<p>\u00b9 The word &#8220;only&#8221; is in quotation marks because 256KB seems like a tiny amount of memory today, but at the time, that was the maximum amount of memory you could get for an IBM PC XT, at least without resorting to expansion cards.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You could call them to save a dozen bytes!<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-103472","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>You could call them to save a dozen 
bytes!<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/103472","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=103472"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/103472\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=103472"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=103472"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=103472"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}