Integer signum in SSE

Raymond Chen

The signum function is defined as follows:

signum(x) = −1 if x < 0
signum(x) =  0 if x = 0
signum(x) = +1 if x > 0

There are a couple of ways of calculating this with SSE integer instructions.

One way is to convert the C idiom

int signum(int x) { return (x > 0) - (x < 0); }

The SSE translation of this is mostly straightforward.
The quirk is that the SSE comparison functions return −1
to indicate true,
whereas C uses +1 to represent true.
But this is easy to take into account:

x > 0 ⇔  − pcmpgt(x, 0)
x < 0 ⇔  − pcmpgt(0, x)

Substituting this into the original signum function,
we get

signum(x) = (x > 0) − (x < 0)
          = − pcmpgt(x, 0) − (− pcmpgt(0, x))
          = − pcmpgt(x, 0) + pcmpgt(0, x)
          = pcmpgt(0, x) − pcmpgt(x, 0)
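
As a sanity check on the rearranged formula, here is a minimal scalar sketch in plain C. The helper names (lane_cmpgt16, lane_signum16, signum16_formula_matches_idiom) are made up for illustration; lane_cmpgt16 models a single 16-bit lane of the comparison, on the assumption stated above that true is reported as −1.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane of pcmpgtw:
   -1 (all bits set) when a > b, otherwise 0. */
int16_t lane_cmpgt16(int16_t a, int16_t b)
{
    return (a > b) ? (int16_t)-1 : (int16_t)0;
}

/* signum via the rearranged formula: pcmpgt(0, x) - pcmpgt(x, 0). */
int16_t lane_signum16(int16_t x)
{
    return (int16_t)(lane_cmpgt16(0, x) - lane_cmpgt16(x, 0));
}

/* Returns 1 if the formula agrees with the C idiom
   (x > 0) - (x < 0) for every 16-bit value, else 0. */
int signum16_formula_matches_idiom(void)
{
    for (int32_t v = INT16_MIN; v <= INT16_MAX; v++) {
        int16_t x = (int16_t)v;
        if (lane_signum16(x) != (x > 0) - (x < 0))
            return 0;
    }
    return 1;
}
```

The order of subtraction is the part that is easy to get backward: the "less than" mask comes first, the "greater than" mask second.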

In assembly:

        ; assume x is in xmm0
        pxor    xmm1, xmm1
        pxor    xmm2, xmm2
        pcmpgtw xmm1, xmm0 ; xmm1 = pcmpgt(0, x)
        pcmpgtw xmm0, xmm2 ; xmm0 = pcmpgt(x, 0)
        psubw   xmm1, xmm0 ; xmm1 = pcmpgt(0, x) - pcmpgt(x, 0) = signum
        ; answer is in xmm1

With intrinsics:

__m128i signum16(__m128i x)
{
    return _mm_sub_epi16(_mm_cmpgt_epi16(_mm_setzero_si128(), x),
                         _mm_cmpgt_epi16(x, _mm_setzero_si128()));
}

This pattern extends mutatis mutandis to
signum8,
signum32,
and
signum64,
although the 64-bit comparison pcmpgtq is not available until SSE4.2.
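
For instance, a 32-bit version might look like the sketch below, which swaps in the 32-bit forms of the same two SSE2 intrinsics. The wrapper signum32_x4 is a hypothetical helper added only so the result is easy to inspect from ordinary arrays. The 8-bit version would use _mm_cmpgt_epi8 and _mm_sub_epi8 in the same way.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Sketch: signum for four signed 32-bit lanes, using only SSE2. */
__m128i signum32(__m128i x)
{
    __m128i zero = _mm_setzero_si128();
    /* (0 > x) is -1 in negative lanes; (x > 0) is -1 in positive lanes. */
    return _mm_sub_epi32(_mm_cmpgt_epi32(zero, x),
                         _mm_cmpgt_epi32(x, zero));
}

/* Hypothetical convenience wrapper: applies signum32 to four ints
   using unaligned loads and stores. */
void signum32_x4(const int32_t in[4], int32_t out[4])
{
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, signum32(x));
}
```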

Another solution is to use the signed minimum and maximum opcodes,
using the formula

signum(x) = min(max(x, −1), +1)
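
The formula is just a clamp to the range [−1, +1]: anything below −1 is pulled up to −1, anything above +1 is pulled down to +1, and the three values already in range pass through unchanged. A scalar sketch of one lane, with the made-up name clamp_signum16, might look like this:

```c
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane:
   clamp x into the range [-1, +1]. */
int16_t clamp_signum16(int16_t x)
{
    int16_t lo = (x > -1) ? x : (int16_t)-1;  /* max(x, -1) */
    return (lo < 1) ? lo : (int16_t)1;        /* min(max(x, -1), +1) */
}
```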

In assembly:

        ; assume x is in xmm0
        pcmpeqw xmm1, xmm1 ; xmm1 = -1 in all lanes
        pmaxsw  xmm0, xmm1
        psrlw   xmm1, 15   ; xmm1 = +1 in all lanes
        pminsw  xmm0, xmm1
        ; answer is in xmm0

With intrinsics:

__m128i signum16(__m128i x)
{
    // alternatively: minusones = _mm_set1_epi16(-1);
    __m128i minusones = _mm_cmpeq_epi16(_mm_setzero_si128(),
                                        _mm_setzero_si128());
    x = _mm_max_epi16(x, minusones);
    // alternatively: ones = _mm_set1_epi16(1);
    __m128i ones = _mm_srli_epi16(minusones, 15);
    x = _mm_min_epi16(x, ones);
    return x;
}

The catch here is that
SSE2 supports only 16-bit signed minimum and maximum;
to get the 8-bit and 32-bit versions, you need to bump up to SSE4.1.
But if you’re going to do that, you may as well use the
psign instruction, which has been available since SSSE3.
In assembly:

        ; assume x is in xmm0
        pcmpeqw xmm1, xmm1 ; xmm1 = -1 in all lanes
        psrlw   xmm1, 15   ; xmm1 = +1 in all lanes
        psignw  xmm1, xmm0 ; apply sign of x to xmm1
        ; answer is in xmm1

With intrinsics:

__m128i signum16(__m128i x)
{
    // alternatively: ones = _mm_set1_epi16(1);
    __m128i minusones = _mm_cmpeq_epi16(_mm_setzero_si128(),
                                        _mm_setzero_si128());
    __m128i ones = _mm_srli_epi16(minusones, 15);
    return _mm_sign_epi16(ones, x);
}

The psign instruction applies the sign of its second
argument to its first argument.
We load up the first argument
with the value +1 in all lanes,
then apply the sign of x,
which negates the value if the corresponding lane of x
is negative,
sets the value to zero if the lane is zero,
and leaves it alone if the corresponding lane is positive.
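
To make the three cases concrete, here is a scalar sketch of what happens in a single 16-bit lane. Both function names are hypothetical; lane_sign16 is only a model of the per-lane behavior described above, not the instruction itself.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane of psignw(a, b):
   apply the sign of b to a. */
int16_t lane_sign16(int16_t a, int16_t b)
{
    if (b < 0)  return (int16_t)-a;  /* b negative: negate a */
    if (b == 0) return 0;            /* b zero: result is zero */
    return a;                        /* b positive: a unchanged */
}

/* signum falls out by applying the sign of x to the constant +1. */
int16_t psign_signum16(int16_t x)
{
    return lane_sign16(1, x);
}
```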
