What is the downside of replacing if statements with function pointers?

I am looking for optimizing my update loops, and got around to this code:

// Define two lambda functions
auto iftrue = [](int x, int y) -> int {
    return x;
};

auto iffalse = [](int x, int y) -> int {
    return y;
};

// Define a pointer to a function that takes two integers and returns an integer
typedef int (*FunctionPointer)(int, int);

FunctionPointer ptr = nullptr;

void SetFalse()
{
    ptr = iffalse;
}
void SetTrue()
{
    ptr = iftrue;
}

int main() {
    int a = 5, b = 3;

    SetTrue();

    // Use the second lambda function
    int result = ptr(a, b);
    std::cout << "Result: " << result << std::endl;
    
    SetFalse();
    
    // Use the second lambda function
    result = ptr(a, b);
    std::cout << "Result: " << result << std::endl;

    return 0;
}

And it hit me, wouldn’t this always run better than an if statement?

I checked speeds, and, well… https://onlinegdb.com/mCZc6Eb7Z

It seems like functions are faster. Shouldn’t everything be changed to function pointers where possible?

  • 3

    The compiler is far too clever, it optimises away your function calls completely godbolt.org/z/jT5KWTPoe. A function call is likely to be far more expensive than an if in most cases, an if does create performance penalities when the branch predictor fails and the pipeline gets flushed but calling a function involves a lot more, it has to push arguments and registers onto the stack

    – 




  • 2

    Use quick-bench.com for finding out the answer.

    – 




  • 1

    Conditionals tends to be easier to predict than data dependent function calls. Modern processors have branch predictors for conditions and conditions are cheap if they are correctly predicted. If the processor do not know, it use the static prediction with 50% chance to be fast with no past infos on it. Function calls are always slow, especially the first time due to the hard-to-predict jump (which tends to cause big pipeline stalls). Put it shortly: keep the conditions. This is simpler, easier to read and certainly faster on most platforms, especially future ones.

    – 

Firstly, your testing methodology is quite flawed. You should use a website such as quick-bench if you want to get meaningful results.

Secondly, function pointers don’t get optimized very well by the compiler.
They don’t always get inlined and simplified nicely.

Is it plausible that function pointers could be faster

Let’s compare three implementations:

int choose_if(bool c, int x, int y) {
    if (c) return x; // most naive version using if statements
    else   return y;
}

int choose_ptr(bool c, int x, int y) {
    static constexpr int(*selectors[2])(int, int) = { // table of two function pointers
        [](int a, int b) { return a; },
        [](int a, int b) { return b; }
    };
    return selectors[c](x, y); // choose a function pointer and call it; no branches
}

int (*selector)(int, int);

int choose_ptr(int x, int y) {
    return selector(x, y); // let's assume that we've made the previous decision
                           // in advance, and just have to call the pointer
}

This gives us the following output (https://godbolt.org/z/bd1ceb9s8):

choose_if(bool, int, int):                        # @choose_if(bool, int, int)
        mov     eax, esi
        test    edi, edi
        cmove   eax, edx
        ret
choose_ptr(bool, int, int):                      # @choose_ptr(bool, int, int)
        mov     eax, edi
        lea     rcx, [rip + choose_ptr(bool, int, int)::selectors]
        mov     edi, esi
        mov     esi, edx
        jmp     qword ptr [rcx + 8*rax]         # TAILCALL
choose_ptr(int, int):                       # @choose_ptr(int, int)
        mov     rax, qword ptr [rip + selector]
        jmp     rax                             # TAILCALL

choose_if is most naive, but also the most simple. cmove has only two cycles latency.
choose_ptr requires quite a bit more work: it accesses memory and makes an indirect call, both of which may be costly. It does not get optimized to the simple cmove version above, even though that is technically possible.
choose_ptr is simply an indirect (tail) call.

On some architectures, it is plausible in principle that calling function pointers could beat if statements, especially on a very simple CPU with no form of branch prediction. However, due to poor optimization of function pointers and due to branchless instructions like cmov, it’s highly unlikely on x86_64.

Also keep in mind that accessing memory is much more latent and expensive than working with registers, and most uses of function pointers force you to access memory.

Benchmark

enter image description here

This extremely naive benchmark is obviously not representative of if statements vs. function pointers as a whole, but confirms that the performance of the three functions matches intuition.

The very short answer is that when your code is processed, the compiler tried to predict the outcome of the if statement for optimization purposes. However, if it predicts incorrectly, it can be more costly.

For function pointers, you get rid of the conditional part of the execution path, so the above doesn’t occur, and in some cases the code will run faster. That being said, this speed advantage here occurs in a minority of cases.

So ultimately, the answer to your question is branch predictions.

The compiler is far too clever, it optimises away your function calls completely https://godbolt.org/z/jT5KWTPoe. This sort of micro-benchmark can be very difficult to implement, modern compilers can take many shortcuts and distort your results.

With quick-bench we can see that if is 3 times faster than function pointers when the compiler is unable to optimise the function calls away: https://quick-bench.com/q/3-9AgNCJN1MwiD2dkJ-FttbGVWY

An if statement is potentially slow as the CPU has to guess which way the branch will go, if it gets it wrong then it will have speculatively executed the wrong instructions, this execution has to then be discarded and the computation restarted from the branch point.

However a function call is likely to be more expensive than even a failed branch prediction. To call a function you need to save various registers on the stack, perform a branch (which the CPU might fail to predict), run the function then restore the registers from the stack.

The slight possibility of a benefit would be if you change between two sets of functionality, swapping function pointers could potentially be cheaper than repeated if executions, however this is the ideal case for the branch predictor when the same branch is always taken so any benefit is likely to be minimal.

Leave a Comment