{"id":106331,"date":"2022-03-09T07:00:00","date_gmt":"2022-03-09T15:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=106331"},"modified":"2022-03-09T06:21:21","modified_gmt":"2022-03-09T14:21:21","slug":"20220309-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20220309-00\/?p=106331","title":{"rendered":"Optimizing code to darken a bitmap, part 3"},"content":{"rendered":"<p>Our investigation into a simple function to darken a bitmap is still trying to beat this function:<\/p>\n<pre>union Pixel\r\n{\r\n    uint8_t c[4]; \/\/ four channels: red, green, blue, alpha\r\n    uint32_t v;   \/\/ full pixel value as a 32-bit integer\r\n};\r\n\r\nvoid darken(Pixel* first, Pixel* last, int darkness)\r\n{\r\n  int lightness = 256 - darkness;\r\n  for (; first &lt; last; ++first) {\r\n    first-&gt;c[0] = (uint8_t)(first-&gt;c[0] * lightness \/ 256);\r\n    first-&gt;c[1] = (uint8_t)(first-&gt;c[1] * lightness \/ 256);\r\n    first-&gt;c[2] = (uint8_t)(first-&gt;c[2] * lightness \/ 256);\r\n  }\r\n}\r\n<\/pre>\n<p>We transformed the problem into subtracting off the darkness rather than preserving the lightness, specialized the function so it supported only darkness values 8, 16, and 24 (which when rescaled to become 1, 2, and 3), and used a bitfield trick so that we had to perform only one expensive multiply instruction per iteration, rather than three, but that didn&#8217;t seem to be helping. We&#8217;ll now take advantage of the fact that the darkness factor is known to be 1, 2, or 3.<\/p>\n<pre>void darken(Pixel* first, Pixel* last, int darkness)\r\n{\r\n  int factor = darkness \/ 8;\r\n  <span style=\"color: blue;\">uint32_t mask2 = factor &gt;= 2 ? 0xFFFFFFFF : 0;\r\n  uint32_t mask3 = factor &gt;= 3 ? 0xFFFFFFFF : 0;<\/span>\r\n  for (; first &lt; last; ++first) {\r\n    uint32_t v = first-&gt;v;\r\n    uint32_t fields = (v &amp; 0xFF) |\r\n                     ((v &amp; 0xFF00) &lt;&lt; 2) |\r\n                     ((v &amp; 0xFF0000) &lt;&lt; 4);\r\n    <span style=\"color: blue;\">fields += (fields &amp; mask2) + (fields &amp; mask3);<\/span>\r\n    fields += pack_fields(31, 31, 31);\r\n    v -= (fields &gt;&gt; 5) &amp; 0x1F;\r\n    v -= (fields &gt;&gt; 7) &amp; 0x1F00;\r\n    v -= (fields &gt;&gt; 9) &amp; 0x1F0000;\r\n    first-&gt;v = v;\r\n  }\r\n}\r\n<\/pre>\n<p>The usage pattern for this function is that the darkness factor is in practice always 1, 2, or 3. So we calculate some masks that keep track of the actual value of the factor.<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"0\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th style=\"border: solid 1px black;\">factor<\/th>\n<th style=\"border: solid 1px black;\"><code>mask2<\/code><\/th>\n<th style=\"border: solid 1px black;\"><code>mask3<\/code><\/th>\n<th style=\"border: solid 1px black; border-right: none;\"><code>(fields &amp; mask2)<\/code><\/th>\n<th style=\"border: 1px black; border-style: solid none;\"><code>+<\/code><\/th>\n<th style=\"border: solid 1px black; border-left: none;\"><code>(fields &amp; mask3)<\/code><\/th>\n<\/tr>\n<tr>\n<td style=\"text-align: center; border: solid 1px black;\">1<\/td>\n<td style=\"text-align: center; border: solid 1px black;\"><code>0x00000000<\/code><\/td>\n<td style=\"text-align: center; border: solid 1px black;\"><code>0x00000000<\/code><\/td>\n<td style=\"text-align: center; border: solid 1px black; border-right: none;\"><code>0<\/code><\/td>\n<td style=\"text-align: center; border: 1px black; border-style: solid none;\"><code>+<\/code><\/td>\n<td style=\"text-align: center; border: solid 1px black; border-left: none;\"><code>0<\/code><\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center; border: solid 1px black;\">2<\/td>\n<td style=\"text-align: center; border: solid 1px black;\"><code>0xFFFFFFFF<\/code><\/td>\n<td style=\"text-align: center; border: solid 1px black;\"><code>0x00000000<\/code><\/td>\n<td style=\"text-align: center; border: solid 1px black; border-right: none;\"><code>fields<\/code><\/td>\n<td style=\"text-align: center; border: 1px black; border-style: solid none;\"><code>+<\/code><\/td>\n<td style=\"text-align: center; border: solid 1px black; border-left: none;\"><code>0<\/code><\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center; border: solid 1px black;\">3<\/td>\n<td style=\"text-align: center; border: solid 1px black;\"><code>0xFFFFFFFF<\/code><\/td>\n<td style=\"text-align: center; border: solid 1px black;\"><code>0xFFFFFFFF<\/code><\/td>\n<td style=\"text-align: center; border: solid 1px black; border-right: none;\"><code>fields<\/code><\/td>\n<td style=\"text-align: center; border: 1px black; border-style: solid none;\"><code>+<\/code><\/td>\n<td style=\"text-align: center; border: solid 1px black; border-left: none;\"><code>fields<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The masks let us zero out one or both of the <code>fields<\/code> terms when calculating the product.<\/p>\n<p>Alas, this is 2.2\u00d7 slower than the previous version. It seems that performing two bitwise <i>and<\/i> operations and two additions is slower than a single multiply. My guess is that it&#8217;s because the factor is so small, and the CPU has an early-out for small factors.<\/p>\n<p>Okay, it&#8217;s time to bring out the big guns. Time for the SIMD registers. We&#8217;ll do that next time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Jumpless conditionals via masking.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-106331","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Jumpless conditionals via masking.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106331","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=106331"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106331\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=106331"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=106331"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=106331"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}