The signum function is defined as follows:
signum(x) = −1   if x < 0
signum(x) =  0   if x = 0
signum(x) = +1   if x > 0
There are a couple of ways of calculating this in SSE integers.
One way is to convert the C idiom
int signum(int x) { return (x > 0) - (x < 0); }
The SSE translation of this is mostly straightforward.
The quirk is that the SSE comparison functions return −1
to indicate true,
whereas C uses +1 to represent true.
But this is easy to take into account:
(x > 0)  ⇔  −pcmpgt(x, 0)
(x < 0)  ⇔  −pcmpgt(0, x)
Substituting this into the original signum function,
we get
signum(x) = (x > 0) − (x < 0)
          = −pcmpgt(x, 0) − (−pcmpgt(0, x))
          = −pcmpgt(x, 0) + pcmpgt(0, x)
          = pcmpgt(0, x) − pcmpgt(x, 0)
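The derivation can be checked one lane at a time with a scalar model. This is only a sketch: pcmpgt_model, signum_idiom, signum_model and derivation_holds are hypothetical names invented for illustration, and pcmpgt_model assumes the 16-bit signed comparison, which yields −1 for true and 0 for false.

```c
#include <stdint.h>

/* Scalar model of one pcmpgtw lane: -1 (all bits set) if a > b, else 0. */
static int16_t pcmpgt_model(int16_t a, int16_t b)
{
    return (a > b) ? -1 : 0;
}

/* The C idiom from the text. */
static int signum_idiom(int x)
{
    return (x > 0) - (x < 0);
}

/* Last line of the derivation: pcmpgt(0, x) - pcmpgt(x, 0). */
static int16_t signum_model(int16_t x)
{
    return (int16_t)(pcmpgt_model(0, x) - pcmpgt_model(x, 0));
}

/* Returns 1 if the model agrees with the idiom for every 16-bit value. */
static int derivation_holds(void)
{
    for (int x = INT16_MIN; x <= INT16_MAX; x++) {
        if (signum_model((int16_t)x) != signum_idiom(x))
            return 0;
    }
    return 1;
}
```

If derivation_holds() returns 1, the rearranged formula matches the idiom on every lane value, and the SSE version follows directly.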
In assembly:
; assume x is in xmm0
pxor xmm1, xmm1
pxor xmm2, xmm2
pcmpgtw xmm1, xmm0 ; xmm1 = pcmpgt(0, x)
pcmpgtw xmm0, xmm2 ; xmm0 = pcmpgt(x, 0)
psubw xmm0, xmm1 ; xmm0 = signum
; answer is in xmm0
With intrinsics:
__m128i signum16(__m128i x)
{
    return _mm_sub_epi16(_mm_cmpgt_epi16(_mm_setzero_si128(), x),
                         _mm_cmpgt_epi16(x, _mm_setzero_si128()));
}
This pattern extends mutatis mutandis to signum8, signum32, and signum64.
Note that the 64-bit comparison pcmpgtq is not part of SSE2;
it was added in SSE4.2.
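As a sketch of what that extension might look like, here are 8-bit and 32-bit versions that need nothing beyond SSE2. The helpers signum8_array and signum32_array are hypothetical, added only so the result can be inspected through ordinary arrays.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* 8-bit lanes: same pattern with the epi8 compare and subtract. */
static __m128i signum8(__m128i x)
{
    __m128i zero = _mm_setzero_si128();
    return _mm_sub_epi8(_mm_cmpgt_epi8(zero, x),
                        _mm_cmpgt_epi8(x, zero));
}

/* 32-bit lanes: same pattern with the epi32 compare and subtract. */
static __m128i signum32(__m128i x)
{
    __m128i zero = _mm_setzero_si128();
    return _mm_sub_epi32(_mm_cmpgt_epi32(zero, x),
                         _mm_cmpgt_epi32(x, zero));
}

/* Hypothetical helper: signum of sixteen int8_t values via unaligned load/store. */
static void signum8_array(const int8_t in[16], int8_t out[16])
{
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, signum8(x));
}

/* Hypothetical helper: signum of four int32_t values via unaligned load/store. */
static void signum32_array(const int32_t in[4], int32_t out[4])
{
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, signum32(x));
}
```

Only the lane-width suffix changes. The extreme values need no special handling, because the comparison against zero never overflows.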
Another solution is to use the signed minimum and maximum opcodes, using the formula
signum(x) = min(max(x, −1), +1)
In assembly:
; assume x is in xmm0
pcmpeqw xmm1, xmm1 ; xmm1 = -1 in all lanes
pmaxsw xmm0, xmm1
psrlw xmm1, 15 ; xmm1 = +1 in all lanes
pminsw xmm0, xmm1
; answer is in xmm0
With intrinsics:
__m128i signum16(__m128i x)
{
    // alternatively: minusones = _mm_set1_epi16(-1);
    __m128i minusones = _mm_cmpeq_epi16(_mm_setzero_si128(),
                                        _mm_setzero_si128());
    x = _mm_max_epi16(x, minusones);
    // alternatively: ones = _mm_set1_epi16(1);
    __m128i ones = _mm_srli_epi16(minusones, 15);
    x = _mm_min_epi16(x, ones);
    return x;
}
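As a sanity check, the comparison-based and min/max-based versions can both be tested against the C idiom over every 16-bit value. This is a sketch under stated assumptions: the names signum16_cmp, signum16_minmax and signum16_versions_agree are hypothetical, and the constants are built with _mm_set1_epi16 rather than the compare-and-shift trick.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Comparison-based version: pcmpgt(0, x) - pcmpgt(x, 0). */
static __m128i signum16_cmp(__m128i x)
{
    __m128i zero = _mm_setzero_si128();
    return _mm_sub_epi16(_mm_cmpgt_epi16(zero, x),
                         _mm_cmpgt_epi16(x, zero));
}

/* Clamp-based version: min(max(x, -1), +1). */
static __m128i signum16_minmax(__m128i x)
{
    x = _mm_max_epi16(x, _mm_set1_epi16(-1));
    return _mm_min_epi16(x, _mm_set1_epi16(1));
}

/* Returns 1 if both versions match (v > 0) - (v < 0) for every int16_t v. */
static int signum16_versions_agree(void)
{
    for (int v = INT16_MIN; v <= INT16_MAX; v++) {
        int16_t a[8], b[8];
        int expected = (v > 0) - (v < 0);
        __m128i x = _mm_set1_epi16((short)v);  /* same value in all lanes */
        _mm_storeu_si128((__m128i *)a, signum16_cmp(x));
        _mm_storeu_si128((__m128i *)b, signum16_minmax(x));
        if (a[0] != expected || b[0] != expected)
            return 0;
    }
    return 1;
}
```

Both versions use only SSE2 instructions at this lane width.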
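The catch with the minimum/maximum approach is that
SSE2 supports only 16-bit signed minimum and maximum;
to get the 8-bit and 32-bit signed versions, you need to bump up to SSE4.1.
But if you’re going to do that, you may as well use the
psign instruction, which was introduced in SSSE3
and comes in 8-bit, 16-bit, and 32-bit forms.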
In assembly:
; assume x is in xmm0
pcmpeqw xmm1, xmm1 ; xmm1 = -1 in all lanes
psrlw xmm1, 15 ; xmm1 = +1 in all lanes
psignw xmm1, xmm0 ; apply sign of x to xmm1
; answer is in xmm1
With intrinsics:
__m128i signum16(__m128i x)
{
    // alternatively: ones = _mm_set1_epi16(1);
    __m128i minusones = _mm_cmpeq_epi16(_mm_setzero_si128(),
                                        _mm_setzero_si128());
    __m128i ones = _mm_srli_epi16(minusones, 15);
    return _mm_sign_epi16(ones, x);
}
The psign instruction applies the sign of its second
argument to its first argument.
We load up the first argument with the value +1 in all lanes,
then apply the sign of x:
the value is negated if the corresponding lane of x is negative,
set to zero if the lane is zero,
and left alone if the lane is positive.
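The per-lane behavior described above can be written out as a scalar model. This is an illustrative sketch, not the instruction itself: psignw_model and signum16_model are hypothetical names, and the model covers a single 16-bit lane.

```c
#include <stdint.h>

/* Scalar model of one psignw lane: apply the sign of b to a. */
static int16_t psignw_model(int16_t a, int16_t b)
{
    if (b < 0)
        return (int16_t)-a;  /* b negative: negate a */
    if (b == 0)
        return 0;            /* b zero: result is zero */
    return a;                /* b positive: a unchanged */
}

/* signum as used in the text: start from +1 and apply the sign of x. */
static int16_t signum16_model(int16_t x)
{
    return psignw_model(1, x);
}
```

Because the first argument is always +1, the negation cannot overflow, so the model also gives the expected result when x is the most negative 16-bit value.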