I am trying to convert fp32 values to int32 values with x86 SIMD instructions. Unfortunately, the result is not what I expect. Consider the following program (available on godbolt):
#include "immintrin.h"
#include "smmintrin.h"
#include <iostream>
int main() {
    __m256 inp = {151.175064, 215.287735, 218.776123, 216.049164,
                  159.008453, 98.0167694, 75.8706512, 14.4608536};
    auto converted = _mm256_cvtps_epi32(inp);
    std::cout << " The converted results are: " <<
        (int)converted[0] << ", " <<
        (int)converted[1] << ", " <<
        (int)converted[2] << ", " <<
        (int)converted[3] << ", " <<
        (int)converted[4] << ", " <<
        (int)converted[5] << ", " <<
        (int)converted[6] << ", " <<
        (int)converted[7] << std::endl;
}
The code prints:

The converted results are: 151, 219, 159, 76, 8236, 8236, 8236, 8236

I expected the result to be 151, 215, 218, 216, 159, 98, 75, 14 (with floor truncation). The manual says about the _mm256_cvtps_epi32 intrinsic:
Convert packed single-precision (32-bit) floating-point elements in […] to packed 32-bit integers
What do I need to do to cast fp32 floats to int32 and obtain the result I expect? Why are the printed results so funny?
I haven't checked, but are you sure that applying operator[] to a __m256i does what you think it does?

It's the current rounding mode, the default being the IEEE default of round to nearest, with even as a tie-break. So it's like lrint(x) or (int)nearbyint(x). To match C casts, use _mm256_cvttps_epi32 (note the extra t) for truncation toward 0 (not floor, which is toward -Inf, so it's different for negative numbers).
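To illustrate the difference between the two conversions, here is a minimal sketch (not from the original post) that assumes AVX is enabled (e.g. -mavx) and that the default round-to-nearest-even mode is in effect:

#include <immintrin.h>
#include <iostream>

int main() {
    // Values chosen to show where rounding and truncation differ.
    __m256 v = _mm256_setr_ps(2.5f, 2.7f, -1.5f, -1.7f, 0.5f, 1.5f, 215.29f, 218.78f);

    __m256i rounded   = _mm256_cvtps_epi32(v);   // uses the current rounding mode (nearest-even by default)
    __m256i truncated = _mm256_cvttps_epi32(v);  // truncates toward zero, like a C cast

    alignas(32) int r[8], t[8];
    _mm256_store_si256((__m256i*)r, rounded);
    _mm256_store_si256((__m256i*)t, truncated);

    for (int i = 0; i < 8; ++i)
        std::cout << r[i] << " vs " << t[i] << "\n";
    // Expected pairs: 2 vs 2, 3 vs 2, -2 vs -1, -2 vs -1, 0 vs 0, 2 vs 1, 215 vs 215, 219 vs 218
}

The half-way cases (2.5, -1.5, 0.5, 1.5) and the negative fractional value are exactly where the two intrinsics disagree.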
But also, as Nate noticed, you're indexing wrong. In GNU C, __m256i is a vector of long long elements. Casting them to (int) discards the high half so you don't have huge numbers, but you're indexing off the end. Compile with warnings! Wait, clang doesn't warn for this? That's a nasty compiler bug; it should definitely warn you about this obvious (to the compiler, which has the type information) UB. GCC warns as expected, although only with -Wall enabled. godbolt.org/z/rq6KPsvdz
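A hedged sketch of how the converted lanes could be read out without relying on GNU C vector indexing (variable names are illustrative, not from the original post): store the vector to a plain int array and index that instead.

#include <immintrin.h>
#include <iostream>

int main() {
    __m256 inp = _mm256_setr_ps(151.175064f, 215.287735f, 218.776123f, 216.049164f,
                                159.008453f, 98.0167694f, 75.8706512f, 14.4608536f);
    __m256i converted = _mm256_cvttps_epi32(inp);  // truncate toward zero, like (int)x

    // Store all 8 int32 lanes to memory instead of indexing the __m256i directly;
    // in GNU C a __m256i is a vector of 4 long long, so converted[4..7] is out of bounds.
    int out[8];
    _mm256_storeu_si256((__m256i*)out, converted);

    std::cout << "The converted results are: ";
    for (int i = 0; i < 8; ++i)
        std::cout << out[i] << (i < 7 ? ", " : "\n");
    // Should print: 151, 215, 218, 216, 159, 98, 75, 14
}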
GCC defines __m256 as a GNU C native vector of 8 floats, so indexing with indexes from 0 to 7 is well-defined. Unlike with a __m256i, which GCC defines as 4x long long. But don't use loops like that to horizontal sum, especially not without -ffast-math; without it, the compiler isn't allowed to reassociate the additions into efficient asm. See Fastest way to do horizontal SSE vector sum (or other reduction). Using a variable to index a simd vector with _mm256_extract_epi32() intrinsic mentions a bit about how [] works on __m256 / __m256i the way GCC defines them (which is not portable to compilers other than GCC/Clang, and maybe ICX). Added that to the duplicate list.
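For the horizontal-sum point, a common shuffle-based reduction looks roughly like the sketch below (assuming AVX; hsum256_ps is an illustrative helper name, not from the original post or the linked answer):

#include <immintrin.h>
#include <iostream>

// Sum the 8 floats of a __m256 with shuffles instead of indexing in a scalar loop.
static inline float hsum256_ps(__m256 v) {
    __m128 lo = _mm256_castps256_ps128(v);              // lower 4 floats
    __m128 hi = _mm256_extractf128_ps(v, 1);            // upper 4 floats
    __m128 s4 = _mm_add_ps(lo, hi);                     // 4 partial sums
    __m128 s2 = _mm_add_ps(s4, _mm_movehl_ps(s4, s4));  // add the upper pair onto the lower pair
    __m128 s1 = _mm_add_ss(s2, _mm_movehdup_ps(s2));    // add element 1 onto element 0
    return _mm_cvtss_f32(s1);
}

int main() {
    __m256 v = _mm256_setr_ps(1, 2, 3, 4, 5, 6, 7, 8);
    std::cout << hsum256_ps(v) << "\n";  // 36
}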