What conversion is the SIMD packed fp32 to packed int32 doing? [duplicate]

I am trying to convert fp32 values to int32 values with x86 SIMD instructions. Unfortunately, the result is not what I expect. Consider the following program (available on godbolt):

#include "immintrin.h"
#include "smmintrin.h" 
#include <iostream>
int main() {
    __m256 inp = {151.175064, 215.287735, 218.776123, 216.049164, 159.008453, 98.0167694, 75.8706512, 14.4608536};
    auto converted = _mm256_cvtps_epi32(inp);
    std::cout << " The converted results are: " <<
        (int)converted[0] << ", " <<
        (int)converted[1] << ", " <<
        (int)converted[2] << ", " <<
        (int)converted[3] << ", " <<
        (int)converted[4] << ", " <<
        (int)converted[5] << ", " <<
        (int)converted[6] << ", " <<
        (int)converted[7] << std::endl;
        
}

The code prints: The converted results are: 151, 219, 159, 76, 8236, 8236, 8236, 8236. I expected the result to be 151, 215, 218, 216, 159, 98, 75, 14 (with floor truncation). The manual says about the _mm256_cvtps_epi32 instruction:

Convert packed single-precision (32-bit) floating-point elements in […] to packed 32-bit integers

What do I need to do to cast fp32 floats to int32 and obtain the result I expect? Why are the printed results so funny?

  • 3

    I haven’t checked, but are you sure that applying operator[] to a __m256i does what you think it does?

  • 1

    _mm256_cvtps_epi32 uses the current rounding mode, which by default is the IEEE default: round to nearest, with ties going to even. So it’s like lrint(x) or (int)nearbyint(x). To match C casts, use _mm256_cvttps_epi32 (note the extra t), which truncates toward 0. That is not the same as floor, which rounds toward -Inf and so differs for negative numbers.

  • 1

    But also, as Nate noticed, you’re indexing wrong. In GNU C, __m256i is a vector of long long elements. Casting them to (int) discards the high half so you don’t have huge numbers, but you’re indexing off the end. Compile with warnings! Wait, clang doesn’t warn for this? That’s a nasty compiler bug, it should definitely warn you about this obvious (to the compiler which has the type information) UB. GCC warns as expected, although only with -Wall enabled. godbolt.org/z/rq6KPsvdz

  • 1

    GCC defines __m256 as a GNU C native vector of 8 floats, so indexing it with indexes from 0 to 7 is well-defined, unlike a __m256i, which GCC defines as 4x long long. But don’t use loops like that for a horizontal sum, especially without -ffast-math: strict FP semantics then forbid the compiler from making efficient asm. See “Fastest way to do horizontal SSE vector sum (or other reduction)”.

  • 2

    “Using a variable to index a simd vector with _mm256_extract_epi32() intrinsic” mentions a bit about how [] works on __m256 / __m256i the way GCC defines them (which is not portable to compilers other than GCC/Clang, and maybe ICX). Added that to the duplicate list.
