Extract sub arrays and then concatenate them in Python

Using Python, I am sending 50 images to a cuda routine that calculates the location of 50 beads in each image. Below is a drawing explaining (I just have drawn 4 beads):

Image

The cuda routine is expecting a flattened array and so I have to extract from my image region of interests (say 64 x 64) that I flatten and then concatenate.

Here is what I have done :

import numpy as np
import time
from numpy.lib.stride_tricks import sliding_window_view

# Create a numpy array of 1024 x  1280 (the image)
image = np.random.randint(0, 255, (1024, 1280), dtype=np.uint8)


# Assuming x_values and y_values are lists of x and y coordinates of those 64 x 64 region of interests.
x_values =  np.random.randint(100, 900, 50)
y_values =  np.random.randint(100, 900, 50)

# create 2 arrays for the 2 methods. Such array is used by cuda
main_array_1 = np.zeros((50*50*64*64))
main_array_2 = np.zeros((50*50*64*64))


print("#########################")
# First method 
print("#########################")
start_time = time.time()
sub_images = np.array([image[x:x+64, y:y+64] for x, y in zip(x_values, y_values)])
flattened_sub_arrays = sub_images.ravel()
main_array_1 [:len(flattened_sub_arrays)] = flattened_sub_arrays
print("--- %s seconds ---" % ((time.time() - start_time)))
print(flattened_sub_arrays.shape)


print("#########################")
# second method (Thanks to  hpaulj, see comments).
print("#########################")
# Create an array of indices for x and y
x_indices = np.array([np.arange(x, x+64) for x in x_values])
y_indices = np.array([np.arange(y, y+64) for y in y_values])

start_time = time.time()
# Create a sliding window view of the image
window_view = np.lib.stride_tricks.sliding_window_view(image, (64, 64))
# Extract the sub-images using the x and y values
sub_images = window_view[x_values, y_values]
# Flatten the sub-images
flattened_sub_arrays_2 = sub_images.ravel()
main_array_2 [:len(flattened_sub_arrays_2)] = flattened_sub_arrays_2
print("--- %s seconds ---" % ((time.time() - start_time)))
print(flattened_sub_arrays_2.shape)


print("#########################")
print("#########################")

# compare the two methods
print(np.array_equal(main_array_1, main_array_2))

Is there a way to make this much faster ?

  • 2

    Have you used a profiler to figure out which part of this code takes the longest to run, and where there might be room for improvement?

    – 

  • Well. I just calculated the time for those 3 lines of code. I thought there may alternatives.

    – 




  • 1

    ‘sliding windows view’ might be useful here. Use it make all (64,64) blocks, and then select 50 of them for further processing.

    – 

  • That sub_images is most likely the big time consumer. It iterates on 50. It is possible to construct advanced indexing arrays that do thaat indexing once. np.linspace(x_values, x_values+64,64) would be a start.

    – 

One possible solution how to speed the function is to use numba + parallelize it:

from numba import njit, prange

@njit(parallel=True)
def store_array_numba(image, x_values, y_values, main_array):
    for i in prange(len(x_values)):
        x = x_values[i]
        y = y_values[i]
        idx = 64 * 64 * i
        for j in range(64):
            main_array[idx + 64 * j : idx + 64 * (j + 1)] = image[x + j, y : y + 64]

Benchmark on my machine (AMD 5700x):

from timeit import timeit

import numpy as np
from numba import njit, prange

# Create a numpy array of 1024 x  1280
image = np.random.randint(0, 255, (1024, 1280), dtype=np.uint8)

N = 50
# Assuming x_values and y_values are  lists of x and y coordinates (that define the top-left corner of each sub-image)
x_values = np.random.randint(100, 900, N)
y_values = np.random.randint(100, 900, N)

# This is the array that will store the sub-images and which will be used by cuda
main_array_1 = np.zeros((N * 64 * 64))
main_array_2 = np.zeros((N * 64 * 64))


def store_array(image, x_values, y_values, main_array):
    sub_images = np.array(
        [image[x : x + 64, y : y + 64] for x, y in zip(x_values, y_values)]
    )
    flattened_sub_arrays = sub_images.ravel()
    main_array[: len(flattened_sub_arrays)] = flattened_sub_arrays
    return main_array


@njit(parallel=True)
def store_array_numba(image, x_values, y_values, main_array):
    for i in prange(len(x_values)):
        x = x_values[i]
        y = y_values[i]
        idx = 64 * 64 * i
        for j in range(64):
            main_array[idx + 64 * j : idx + 64 * (j + 1)] = image[x + j, y : y + 64]


main_array_1 = store_array(image, x_values, y_values, main_array_1)
store_array_numba(image, x_values, y_values, main_array_2)
assert np.allclose(main_array_1, main_array_2)

t1 = timeit(
    "store_array(image, x_values, y_values, main_array_1)",
    number=50,
    globals=globals(),
)
t2 = timeit(
    "store_array_numba(image, x_values, y_values, main_array_2)",
    number=50,
    globals=globals(),
)
print(f"time normal = {t1}")
print(f"time numba  = {t2}")

Prints:

time normal = 0.0057696549920365214
time numba  = 0.001736361999064684

Leave a Comment