{"id":43283,"date":"2014-12-29T07:00:00","date_gmt":"2014-12-29T07:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2014\/12\/29\/integer-signum-in-sse\/"},"modified":"2014-12-29T07:00:00","modified_gmt":"2014-12-29T07:00:00","slug":"integer-signum-in-sse","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20141229-00\/?p=43283","title":{"rendered":"Integer signum in SSE"},"content":{"rendered":"<p>\nThe signum function is defined as follows:\n<\/p>\n<table BORDER=\"0\">\n<tr>\n<td>signum(<var>x<\/var>) =&nbsp;<\/td>\n<td ALIGN=\"right\">&minus;1&nbsp;<\/td>\n<td>if <var>x<\/var> &lt; 0<\/td>\n<\/tr>\n<tr>\n<td>signum(<var>x<\/var>) =&nbsp;<\/td>\n<td ALIGN=\"right\">0&nbsp;<\/td>\n<td>if <var>x<\/var> = 0<\/td>\n<\/tr>\n<tr>\n<td>signum(<var>x<\/var>) =&nbsp;<\/td>\n<td ALIGN=\"right\">+1&nbsp;<\/td>\n<td>if <var>x<\/var> &gt; 0<\/td>\n<\/tr>\n<\/table>\n<p>\nThere are a couple of ways of calculating this in SSE integers.\n<\/p>\n<p>\nOne way is to convert the C idiom\n<\/p>\n<pre>\nint signum(int x) { return (x &gt; 0) - (x &lt; 0); }\n<\/pre>\n<p>\nThe SSE translation of this is mostly straightforward.\nThe quirk is that the SSE comparison functions return &minus;1\nto indicate <code>true<\/code>,\nwhereas C uses +1 to represent <code>true<\/code>.\nBut this is easy to take into account:\n<\/p>\n<table BORDER=\"0\">\n<tr>\n<td><var>x<\/var> &gt; 0<\/td>\n<td>&nbsp;&hArr;&nbsp;<\/td>\n<td> &minus; pcmpgt(<var>x<\/var>, 0)<\/td>\n<\/tr>\n<tr>\n<td><var>x<\/var> &lt; 0<\/td>\n<td>&nbsp;&hArr;&nbsp;<\/td>\n<td> &minus; pcmpgt(0, <var>x<\/var>)<\/td>\n<\/tr>\n<\/table>\n<p>\nSubstituting this into the original <code>signum<\/code> function,\nwe get\n<\/p>\n<table BORDER=\"0\">\n<tr>\n<td ALIGN=\"right\">signum(<var>x<\/var>) =&nbsp;<\/td>\n<td ALIGN=\"center\">(<var>x<\/var> &gt; 0)<\/td>\n<td>&nbsp;&minus;&nbsp;<\/td>\n<td ALIGN=\"center\">(<var>x<\/var> &lt; 0)<\/td>\n<\/tr>\n<tr>\n<td 
ALIGN=\"right\">=&nbsp;<\/td>\n<td ALIGN=\"center\">&minus;&thinsp;pcmpgt(<var>x<\/var>, 0)<\/td>\n<td>&nbsp;&minus;&nbsp;<\/td>\n<td ALIGN=\"center\">&minus;&thinsp;pcmpgt(0, <var>x<\/var>)<\/td>\n<\/tr>\n<tr>\n<td ALIGN=\"right\">=&nbsp;<\/td>\n<td ALIGN=\"center\">&minus;&thinsp;pcmpgt(<var>x<\/var>, 0)<\/td>\n<td>&nbsp;+&nbsp;<\/td>\n<td ALIGN=\"center\">pcmpgt(0, <var>x<\/var>)<\/td>\n<\/tr>\n<tr>\n<td ALIGN=\"right\">=&nbsp;<\/td>\n<td ALIGN=\"center\">pcmpgt(0, <var>x<\/var>)<\/td>\n<td>&nbsp;&minus;&nbsp;<\/td>\n<td ALIGN=\"center\">pcmpgt(<var>x<\/var>, 0)<\/td>\n<\/tr>\n<\/table>\n<p>\nIn assembly:\n<\/p>\n<pre>\n        ; assume x is in xmm0\n        pxor    xmm1, xmm1\n        pxor    xmm2, xmm2\n        pcmpgtw xmm1, xmm0 ; xmm1 = pcmpgt(0, x)\n        pcmpgtw xmm0, xmm2 ; xmm0 = pcmpgt(x, 0)\n        psubw   xmm1, xmm0 ; xmm1 = pcmpgt(0, x) - pcmpgt(x, 0) = signum\n        ; answer is in xmm1\n<\/pre>\n<p>\nWith intrinsics:\n<\/p>\n<pre>\n__m128i signum16(__m128i x)\n{\n    return _mm_sub_epi16(_mm_cmpgt_epi16(_mm_setzero_si128(), x),\n                         _mm_cmpgt_epi16(x, _mm_setzero_si128()));\n}\n<\/pre>\n<p>\nThis pattern extends <i>mutatis mutandis<\/i> to\n<code>signum8<\/code>,\n<code>signum32<\/code>,\nand\n<code>signum64<\/code>\n(although the 64-bit comparison <code>pcmpgtq<\/code> requires SSE4.2).\n<\/p>\n<p>\nAnother solution is to clamp <var>x<\/var> with the signed minimum and maximum opcodes,\nusing the formula\n<\/p>\n<table BORDER=\"0\">\n<tr>\n<td>signum(<var>x<\/var>) = min(max(<var>x<\/var>, &minus;1), +1)<\/td>\n<\/tr>\n<\/table>\n<p>\nIn assembly:\n<\/p>\n<pre>\n        ; assume x is in xmm0\n        <a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2014\/12\/15\/10580665.aspx\">pcmpeqw<\/a> xmm1, xmm1 ; xmm1 = -1 in all lanes\n        pmaxsw  xmm0, xmm1\n        psrlw   xmm1, 15   ; xmm1 = +1 in all lanes\n        pminsw  xmm0, xmm1\n        ; answer is in xmm0\n<\/pre>\n<p>\nWith intrinsics:\n<\/p>\n<pre>\n__m128i signum16(__m128i x)\n{\n    \/\/ alternatively: minusones = _mm_set1_epi16(-1);\n    __m128i minusones = 
_mm_cmpeq_epi16(_mm_setzero_si128(),\n                                        _mm_setzero_si128());\n    x = _mm_max_epi16(x, minusones);\n    \/\/ alternatively: ones = _mm_set1_epi16(1);\n    __m128i ones = _mm_srli_epi16(minusones, 15);\n    x = _mm_min_epi16(x, ones);\n    return x;\n}\n<\/pre>\n<p>\nThe catch here is that\nSSE2 supports signed minimum and maximum only for 16-bit lanes;\nthe 8-bit and 32-bit versions require SSE4.1.\nBut if you&#8217;re going to do that, you may as well use the\n<code>psign<\/code> instruction, which has been available since SSSE3.\nIn assembly:\n<\/p>\n<pre>\n        ; assume x is in xmm0\n        <a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2014\/12\/15\/10580665.aspx\">pcmpeqw<\/a> xmm1, xmm1 ; xmm1 = -1 in all lanes\n        psrlw   xmm1, 15   ; xmm1 = +1 in all lanes\n        psignw  xmm1, xmm0 ; apply sign of x to xmm1\n        ; answer is in xmm1\n<\/pre>\n<p>\nWith intrinsics:\n<\/p>\n<pre>\n__m128i signum16(__m128i x)\n{\n    \/\/ alternatively: ones = _mm_set1_epi16(1);\n    __m128i minusones = _mm_cmpeq_epi16(_mm_setzero_si128(),\n                                        _mm_setzero_si128());\n    __m128i ones = _mm_srli_epi16(minusones, 15);\n    return _mm_sign_epi16(ones, x);\n}\n<\/pre>\n<p>\nThe <code>psign<\/code> instruction applies the sign of its second\nargument to its first argument.\nWe load up the first argument\nwith the value <code>+1<\/code> in all lanes,\nthen apply the sign of <var>x<\/var>,\nwhich negates the value if the corresponding lane of <var>x<\/var>\nis negative,\nsets the value to zero if the lane is zero,\nand leaves it alone if the corresponding lane is positive.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The signum function is defined as follows: signum(x) =&nbsp; &minus;1&nbsp; if x &lt; 0 signum(x) =&nbsp; 0&nbsp; if x = 0 signum(x) =&nbsp; +1&nbsp; if x &gt; 0 There are a couple of ways of calculating this in SSE integers. 
One way is to convert the C idiom int signum(int x) { return (x &gt; [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-43283","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>The signum function is defined as follows: signum(x) =&nbsp; &minus;1&nbsp; if x &lt; 0 signum(x) =&nbsp; 0&nbsp; if x = 0 signum(x) =&nbsp; +1&nbsp; if x &gt; 0 There are a couple of ways of calculating this in SSE integers. One way is to convert the C idiom int signum(int x) { return (x &gt; [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/43283","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=43283"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/43283\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=43283"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=43283"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-jso
n\/wp\/v2\/tags?post=43283"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}