Fastest way to find both the minimum and maximum of an BGR array per cell in Numba?

Question

I have an array of 3 dimensional with shape (height, width, 3) representing a BGR image with values being floats between 0 and 1 (inclusive).

I need to find both the minimum and maximum channel of each pixel in one loop. If img is the array, then the effect would be the same as np.dstack([img.max(axis=-1), img.min(axis=-1)]), and I wish to do it in one loop.

I have already tried to implement it myself, and I have beaten NumPy builtin methods by far, but the efficiency is only because Numba. I think perhaps there can be more efficient ways to do this.

My idea is simple, with three numbers a, b, c, there are 6 possible relationships:

a <= b <= c
a <= c <= b
b <= a <= c
b < c < a
c < a < b
c < b < a

With 3 conditional variable swaps I have managed to find minimum and maximum in one function without calling min and max:

def extrema(a, b, c):
    if b > c:
        b, c = c, b

    if a > b:
        a, b = b, a

    if b > c:
        b, c = c, b

    return a, c

My method is faster than min, max calls:

In [9]: %timeit extrema(29, 500, 12)
235 ns ± 14.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [10]: %timeit max((29, 500, 12))
186 ns ± 7.26 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [11]: %timeit min((29, 500, 12))
186 ns ± 5.74 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

So I have implemented the following functions for benchmarking:

import numba as nb
import numpy as np

def extrema_img(img: np.ndarray) -> np.ndarray:
    ext = img.copy()
    for a, b in ((1, 2), (0, 1), (1, 2)):
        temp = ext[..., a][mask := ext[..., a] > ext[..., b]]
        ext[mask, a] = ext[mask, b]
        ext[mask, b] = temp
    return ext


def extrema_img_1(img: np.ndarray) -> np.ndarray:
    (h, w) = img.shape[:2]
    ext = np.zeros((h, w, 2))
    for y in range(h):
        for x in range(w):
            a, b, c = img[y, x]
            if b > c:
                b, c = c, b

            if a > b:
                a, b = b, a

            if b > c:
                b, c = c, b

            ext[y, x] = (a, c)
    return ext


@nb.njit(cache=True, fastmath=True)
def extrema_img_nb1(img: np.ndarray) -> np.ndarray:
    (h, w) = img.shape[:2]
    ext = np.zeros((h, w, 2))
    for y in range(h):
        for x in range(w):
            a, b, c = img[y, x]
            if b > c:
                b, c = c, b

            if a > b:
                a, b = b, a

            if b > c:
                b, c = c, b

            ext[y, x] = (a, c)
    return ext


@nb.njit(cache=True, fastmath=True)
def extrema_img_nb3(img: np.ndarray) -> np.ndarray:
    (h, w) = img.shape[:2]
    ext = np.zeros((h, w, 2))
    for y in range(h):
        for x in range(w):
            ext[y, x] = np.sort(img[y, x])[:3:2]
    return ext


def extrema_img_7(img: np.ndarray) -> np.ndarray:
    return np.sort(img, axis=-1)


def extrema_img_8(img: np.ndarray) -> tuple:
    return img.min(axis=-1), img.max(axis=-1)

With img = np.random.random((1080, 1920, 3)), the benchmark results are below:

In [2]: %timeit extrema_img(img)
338 ms ± 18.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: %timeit extrema_img_1(img)
3.88 s ± 55.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit extrema_img_nb1(img)
18.6 ms ± 788 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: %timeit extrema_img_nb1(img)
19.6 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: %timeit extrema_img_nb3(img)
1.67 s ± 26.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [7]: %timeit extrema_img_7(img)
89.3 ms ± 3.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [8]: %timeit extrema_img_8(img)
184 ms ± 1.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

extrema_img_nb1 is the fastest, but extrema_img_1 is much slower than the boolean masking based solution, I have tried to compile the masking based solution but it doesn’t work, I guess I should post a separate question.

How can it be faster?

Leave a Comment Cancel reply