{"id":43223,"date":"2015-01-05T07:00:00","date_gmt":"2015-01-05T22:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2015\/01\/05\/more-notes-on-calculating-constants-in-sse-registers\/"},"modified":"2019-03-13T12:11:45","modified_gmt":"2019-03-13T19:11:45","slug":"20150105-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20150105-00\/?p=43223","title":{"rendered":"More notes on calculating constants in SSE registers"},"content":{"rendered":"<p>A few weeks ago <a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2014\/12\/15\/10580665.aspx\">I noted some tricks for creating special bit patterns in all lanes<\/a>, but I forgot to cover the case where you treat the 128-bit register as one giant lane: Setting all of the least significant <var>N<\/var> bits or all of the most significant <var>N<\/var> bits. <\/p>\n<p>This is a variation of the trick for setting a bit pattern in all lanes, but the catch is that the <code>pslldq<\/code> instruction shifts by bytes, not bits. <\/p>\n<p>We&#8217;ll assume that <var>N<\/var> is not a multiple of eight, because if it were a multiple of eight, then the <code>pslldq<\/code> or <code>psrldq<\/code> instruction does the trick (after using <code>pcmpeqd<\/code> to fill the register with ones). <\/p>\n<p>One case is if <var>N<\/var> &le; 64. This is relatively easy because we can build the value by first building the desired value in both 64-bit lanes, and then finishing with a big <code>pslldq<\/code> or <code>psrldq<\/code> to clear the lane we don&#8217;t like. 
<\/p>\n<table BORDER=\"0\" STYLE=\"border-collapse: collapse\">\n<tr>\n<td COLSPAN=\"11\"><code>;<\/code> set the bottom <var>N<\/var> bits, where <var>N<\/var> &le; 64<\/td>\n<\/tr>\n<tr>\n<td><code>pcmpeqd xmm0, xmm0<\/code><\/td>\n<td><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\"><\/td>\n<td COLSPAN=\"4\" ALIGN=\"center\" STYLE=\"border: solid black;border-width: 0px 1px\">unsigned shift right<br>64 &minus; <var>N<\/var> bits<\/td>\n<td COLSPAN=\"4\" ALIGN=\"center\" STYLE=\"border: solid black;border-width: 0px 1px\">unsigned shift right<br>64 &minus; <var>N<\/var> bits<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td><code>psrlq &nbsp; xmm0, 64 - N<\/code><\/td>\n<td><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0FFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0FFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\"><\/td>\n<td COLSPAN=\"8\" ALIGN=\"center\" STYLE=\"border: solid black;border-width: 0px 1px\">unsigned shift 
right 64 bits<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td><code>psrldq &nbsp;xmm0, 8<\/code><\/td>\n<td><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0FFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\">&nbsp;<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\"><code>;<\/code> set the top <var>N<\/var> bits, where <var>N<\/var> &le; 64<\/td>\n<\/tr>\n<tr>\n<td><code>pcmpeqd xmm0, xmm0<\/code><\/td>\n<td><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\"><\/td>\n<td COLSPAN=\"4\" ALIGN=\"center\" STYLE=\"border: solid black;border-width: 0px 1px\">unsigned shift left<br>64 &minus; <var>N<\/var> bits<\/td>\n<td COLSPAN=\"4\" ALIGN=\"center\" STYLE=\"border: solid black;border-width: 0px 1px\">unsigned shift left<br>64 &minus; <var>N<\/var> bits<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td><code>psllq &nbsp; xmm0, 64 - N<\/code><\/td>\n<td><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td 
STYLE=\"border: solid 1px black\"><tt>FFF0<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFF0<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\"><\/td>\n<td COLSPAN=\"8\" ALIGN=\"center\" STYLE=\"border: solid black;border-width: 0px 1px\">unsigned shift left 64 bits<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td><code>pslldq &nbsp;xmm0, 8<\/code><\/td>\n<td><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFF0<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<\/tr>\n<\/table>\n<p>If <var>N<\/var> &ge; 80, then we shift in zeroes into the top and bottom half, but then use a shuffle to patch up the half that needs to stay all-ones. 
<\/p>\n<table BORDER=\"0\" STYLE=\"border-collapse: collapse;text-align: center\">\n<tr>\n<td COLSPAN=\"11\" ALIGN=\"left\"><code>;<\/code> set the bottom <var>N<\/var> bits, where <var>N<\/var> &ge; 80<\/td>\n<\/tr>\n<tr>\n<td ALIGN=\"left\"><code>pcmpeqd xmm0, xmm0<\/code><\/td>\n<td ALIGN=\"left\"><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\"><\/td>\n<td COLSPAN=\"4\" STYLE=\"border: solid black;border-width: 0px 1px\">unsigned shift right<br>128 &minus; <var>N<\/var> bits<\/td>\n<td COLSPAN=\"4\" STYLE=\"border: solid black;border-width: 0px 1px\">unsigned shift right<br>128 &minus; <var>N<\/var> bits<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td ALIGN=\"left\"><code>psrlq &nbsp; xmm0, 128 - N<\/code><\/td>\n<td ALIGN=\"left\"><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0FFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0FFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\"><\/td>\n<td COLSPAN=\"4\" STYLE=\"border: solid 
black;border-width: 0px 1px\">copy<\/td>\n<td COLSPAN=\"3\">shuffle<\/td>\n<td STYLE=\"border: solid black;border-width: 0px 1px\">&darr;<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\"><\/td>\n<td STYLE=\"border-left: solid 1px black\">&darr;<\/td>\n<td>&darr;<\/td>\n<td>&darr;<\/td>\n<td STYLE=\"border-right: solid 1px black\">&darr;<\/td>\n<td STYLE=\"border-right: solid 1px black\">&#x2199;<\/td>\n<td STYLE=\"border-right: solid 1px black\">&#x2199;<\/td>\n<td STYLE=\"border-right: solid 1px black\">&#x2199;<\/td>\n<td STYLE=\"border-right: solid 1px black\">&darr;<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td ALIGN=\"left\"><code>pshuflw xmm0, _MM_SHUFFLE(0, 0, 0, 0)<\/code><\/td>\n<td ALIGN=\"left\"><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0FFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\">&nbsp;<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" ALIGN=\"left\"><code>;<\/code> set the top <var>N<\/var> bits, where <var>N<\/var> &ge; 80<\/td>\n<\/tr>\n<tr>\n<td ALIGN=\"left\"><code>pcmpeqd xmm0, xmm0<\/code><\/td>\n<td ALIGN=\"left\"><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px 
black\"><tt>FFFF<\/tt><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\"><\/td>\n<td COLSPAN=\"4\" STYLE=\"border: solid black;border-width: 0px 1px\">unsigned shift left<br>128 &minus; <var>N<\/var> bits<\/td>\n<td COLSPAN=\"4\" STYLE=\"border: solid black;border-width: 0px 1px\">unsigned shift left<br>128 &minus; <var>N<\/var> bits<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td ALIGN=\"left\"><code>psllq &nbsp; xmm0, 128 - N<\/code><\/td>\n<td><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFF0<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFF0<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\"><\/td>\n<td STYLE=\"border: solid black;border-width: 0px 1px\">&darr;<\/td>\n<td COLSPAN=\"3\">shuffle<\/td>\n<td COLSPAN=\"4\" STYLE=\"border: solid black;border-width: 0px 1px\">copy<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\"><\/td>\n<td STYLE=\"border-left: solid 1px black\">&darr;<\/td>\n<td STYLE=\"border-left: solid 1px black\">&#x2198;<\/td>\n<td STYLE=\"border-left: solid 1px black\">&#x2198;<\/td>\n<td STYLE=\"border-left: solid 1px black\">&#x2198;<\/td>\n<td STYLE=\"border-left: solid 1px black\">&darr;<\/td>\n<td>&darr;<\/td>\n<td>&darr;<\/td>\n<td STYLE=\"border-right: solid 1px black\">&darr;<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td ALIGN=\"left\"><code>pshufhw xmm0, _MM_SHUFFLE(3, 3, 3, 3)<\/code><\/td>\n<td><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px 
black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFF0<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<\/tr>\n<\/table>\n<p>We have <var>N<\/var> &ge; 80, which means that 128 &minus; <var>N<\/var> &le; 48, which means that at least 16 one bits remain in the low-order bits of each 64-bit lane after we shift right. We then use a 4&times;16-bit shuffle to copy those known-all-ones 16 bits into the other lanes of the lower half. (A similar argument applies to setting the top bits.) <\/p>\n<p>This leaves 64 &lt; <var>N<\/var> &lt; 80. That uses a different trick, shown here with a one-byte <code>psrldq<\/code>, which handles 96 &le; <var>N<\/var> &le; 120. For 64 &lt; <var>N<\/var> &lt; 80, shift right by five bytes instead (<code>psrldq xmm0, 5<\/code>, leaving 88 ones) and follow with <code>psrad xmm0, 88 - N<\/code>: <\/p>\n<table BORDER=\"0\" STYLE=\"border-collapse: collapse;text-align: center\">\n<tr>\n<td COLSPAN=\"11\" ALIGN=\"left\"><code>;<\/code> set the bottom <var>N<\/var> bits, where 96 &le; <var>N<\/var> &le; 120<\/td>\n<\/tr>\n<tr>\n<td ALIGN=\"left\"><code>pcmpeqd xmm0, xmm0<\/code><\/td>\n<td><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\"><\/td>\n<td COLSPAN=\"8\" STYLE=\"border: solid black;border-width: 0px 1px\">unsigned shift right 8 bits<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td 
ALIGN=\"left\"><code>psrldq &nbsp;xmm0, 1<\/code><\/td>\n<td><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>00FF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\"><\/td>\n<td COLSPAN=\"4\" STYLE=\"border: solid black;border-width: 0px 1px\">signed shift right<br>120 &minus; <var>N<\/var> bits<\/td>\n<td COLSPAN=\"4\" STYLE=\"border: solid black;border-width: 0px 1px\">signed shift right<br>120 &minus; <var>N<\/var> bits<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td ALIGN=\"left\"><code>psrad &nbsp;xmm0, 120 - N<\/code><\/td>\n<td><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>00FF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<\/tr>\n<\/table>\n<p>The sneaky trick here is that we use a <i>signed<\/i> shift in order to preserve the bottom half. 
Unfortunately, there is no corresponding left shift that shifts in ones, so the best I can come up with is four instructions: <\/p>\n<table BORDER=\"0\" STYLE=\"border-collapse: collapse;text-align: center\">\n<tr>\n<td COLSPAN=\"11\" ALIGN=\"left\"><code>;<\/code> set the top <var>N<\/var> bits, where 64 &le; <var>N<\/var> &le; 96<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td ALIGN=\"left\"><code>pcmpeqd xmm0, xmm0<\/code><\/td>\n<td><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\"><\/td>\n<td COLSPAN=\"4\" STYLE=\"border: solid black;border-width: 0px 1px\">unsigned shift left<br>96 &minus; <var>N<\/var> bits<\/td>\n<td COLSPAN=\"4\" STYLE=\"border: solid black;border-width: 0px 1px\">unsigned shift left<br>96 &minus; <var>N<\/var> bits<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td ALIGN=\"left\"><code>psllq &nbsp; xmm0, 96 - N<\/code><\/td>\n<td><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFF0<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFF0<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 
5px\"><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\"><\/td>\n<td COLSPAN=\"8\" STYLE=\"border: solid black;border-width: 0px 1px\">shuffle<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\"><\/td>\n<td COLSPAN=\"2\" STYLE=\"border: solid black;border-width: 0px 1px\">&darr;<\/td>\n<td COLSPAN=\"2\">&#x2198;<\/td>\n<td COLSPAN=\"2\" STYLE=\"border-left: solid 1px black\">&darr;<\/td>\n<td COLSPAN=\"2\" STYLE=\"border: solid black;border-width: 0px 1px\">&darr;<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td ALIGN=\"left\"><code>pshufd &nbsp;xmm0, _MM_SHUFFLE(3, 3, 1, 0)<\/code><\/td>\n<td><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFF0<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\"><\/td>\n<td COLSPAN=\"8\" STYLE=\"border: solid black;border-width: 0px 1px\">unsigned shift left 32 bits<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"11\" STYLE=\"height: 5px\"><\/td>\n<\/tr>\n<tr>\n<td ALIGN=\"left\"><code>pslldq &nbsp;xmm0, 4<\/code><\/td>\n<td><code>;<\/code><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFFF<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>FFF0<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<td STYLE=\"border: solid 1px black\"><tt>0000<\/tt><\/td>\n<\/tr>\n<\/table>\n<p>We 
view the 128-bit register as four 32-bit lanes and split the shift into two steps. First, we fill Lane&nbsp;0 with the value we ultimately want in Lane&nbsp;1 and patch up the damage we did to Lane&nbsp;2. Then we shift the 128-bit value left 32 places to slide the value into position and zero-fill Lane&nbsp;0. <\/p>\n<p>Note that a lot of the ranges of <var>N<\/var> overlap, so you often have a choice of solutions. There are other three-instruction solutions I didn&#8217;t bother presenting here. The only one I couldn&#8217;t find a three-instruction solution for was setting the top <var>N<\/var> bits where 64 &lt; <var>N<\/var> &lt; 80. <\/p>\n<p>If you find a three-instruction solution for this last case, share it in the comments. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>All at the top or bottom.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-43223","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>All at the top or 
bottom.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/43223","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=43223"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/43223\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=43223"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=43223"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=43223"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}