Where is Clang’s ‘_mm256_pow_ps’ intrinsic?

I can’t seem to find the intrinsics for either _mm_pow_ps or _mm256_pow_ps, both of which are supposed to be included with ‘immintrin.h’.

Does Clang not define these or are they in a header I’m not including?

  • You may want to look at the SLEEF project (SLEEF: Vectorized Math Library). Its performance rivals Intel SVML, and it offers several accuracy levels for the user to choose from. The only issue I found is that it only supports MSVC on Windows (I’d like it to support Clang-CL as well).

That’s not an intrinsic; it’s an Intel SVML library function name that confusingly uses the same naming scheme as actual intrinsics. There’s no vpowps instruction. (AVX512ER on Xeon Phi does have the semi-related vexp2ps instruction…)

IDK if this naming scheme is to trick people into depending on Intel tools when writing SIMD code with their compiler (which comes with SVML), or because their compiler does treat it like an intrinsic/builtin for doing constant propagation if inputs are known, or some other reason.

For functions like that and _mm_sin_ps to be usable, you need Intel’s Short Vector Math Library (SVML). Most people just avoid using them. If it has an implementation of something you want, though, it’s worth looking into. IDK what other vector pow implementations exist.


In the intrinsics finder, you can avoid seeing these non-portable functions in your search results if you leave the SVML box unchecked.

There are some “composite” intrinsics like _mm_set_epi8() that typically compile to multiple loads and shuffles. These are portable across compilers, and they inline instead of being calls to library functions.

Also note that sqrtps is a native machine instruction, so _mm_sqrt_ps() is a real intrinsic. IEEE 754 specifies mul, div, add, sub, and sqrt as “basic” operations that are required to produce correctly-rounded results (error <= 0.5ulp), so sqrt() is special and does have direct hardware support, unlike most other “math library” functions.
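
For contrast, here is a small sketch that uses a real intrinsic. The wrapper name sqrt4 is made up for illustration; the intrinsics themselves (_mm_loadu_ps, _mm_sqrt_ps, _mm_storeu_ps) are plain SSE and should build with GCC, Clang, MSVC, or ICC on x86 without any extra library.

```c
#include <immintrin.h>  /* _mm_loadu_ps, _mm_sqrt_ps, _mm_storeu_ps */

/* Hypothetical wrapper: takes the square root of four floats at once.
   _mm_sqrt_ps maps to the sqrtps instruction, so no library call is involved. */
static void sqrt4(const float in[4], float out[4])
{
    __m128 v = _mm_loadu_ps(in);         /* unaligned load of 4 floats */
    _mm_storeu_ps(out, _mm_sqrt_ps(v));  /* per-lane square root, then store */
}
```

Because sqrt is correctly rounded, inputs that are perfect squares (1, 4, 9, 16) should come back as exactly 1, 2, 3, 4.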


There are various libraries of SIMD math functions. Some of them come with C++ wrapper libraries that allow a+b instead of _mm_add_ps(a,b).

  • glibc libmvec – since glibc 2.22, to support OpenMP 4.0 vector math functions. GCC knows how to auto-vectorize some functions like cos(), sin(), and probably pow() using it. This answer shows one inconvenient way of using it explicitly for manual vectorization. (Hopefully better ways are possible that don’t have mangled names in the source code).

  • Agner Fog’s VCL has some math functions like exp and log. (Formerly GPL licensed, now Apache).

  • https://github.com/microsoft/DirectXMath (MIT license) – I think portable to non-Windows, and doesn’t require DirectX.
  • https://sleef.org/ – apparently great performance, with variable accuracy you can choose. Formerly only supported on MSVC on Windows, the support matrix on its web site now includes GCC and Clang for x86-64 GNU/Linux and AArch64.

  • Intel’s own SVML (comes with ICC; ICC auto-vectorizes with SVML by default). Confusingly has its prototypes in immintrin.h along with actual intrinsics. Maybe they want to trick people into writing code that’s dependent on Intel tools/libraries. Or maybe they think fewer includes are better and that everyone should use their compiler…

    Also related: Intel MKL (Math Kernel Library), with matrix BLAS functions.

  • AMD ACML – end-of-life closed-source freeware. I think it just has functions that loop over arrays/matrices (like Intel MKL), not functions for single SIMD vectors.

  • sse_mathfun (zlib license) SSE2 and ARM NEON. Hasn’t been updated since about 2011 it seems. But does have implementations of single-vector math / trig functions.
