Correctly measure OpenCL kernel latency
I want to measure the execution time of a loop that enqueues OpenCL kernels and then copies a sub-buffer to another location on the device. I use the following structure to profile the kernel:

for (int i = 0; i < N; ++i) {
    cl_event test;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, &local, 0, NULL, &test);
    …
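A minimal sketch of one common way to time such a loop with OpenCL event profiling follows. It assumes the queue was created with `CL_QUEUE_PROFILING_ENABLE` (via `clCreateCommandQueue`, or `CL_QUEUE_PROPERTIES` in `clCreateCommandQueueWithProperties` on OpenCL 2.0+). It also assumes the sub-buffer copy can be expressed as `clEnqueueCopyBuffer` with byte offsets. The loop keeps every event, waits once with `clFinish` after the loop instead of blocking per iteration, reads `CL_PROFILING_COMMAND_START`/`END` (nanoseconds, as `cl_ulong`), and releases each event. The names `cmd_interval`, `busy_ns`, `span_ns`, `profile_loop` and the `USE_OPENCL` build switch are hypothetical, chosen for this illustration. The OpenCL part is compiled only with `-DUSE_OPENCL` (and linked with `-lOpenCL`), so the timing arithmetic builds on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* One device-side command interval, in nanoseconds of the profiling clock. */
typedef struct {
    uint64_t start_ns; /* CL_PROFILING_COMMAND_START */
    uint64_t end_ns;   /* CL_PROFILING_COMMAND_END */
} cmd_interval;

/* Sum of per-command execution times: device busy time, idle gaps excluded. */
uint64_t busy_ns(const cmd_interval *iv, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += iv[i].end_ns - iv[i].start_ns;
    return total;
}

/* Span from the earliest start to the latest end: includes gaps between commands. */
uint64_t span_ns(const cmd_interval *iv, size_t n) {
    if (n == 0) return 0;
    uint64_t lo = iv[0].start_ns, hi = iv[0].end_ns;
    for (size_t i = 1; i < n; ++i) {
        if (iv[i].start_ns < lo) lo = iv[i].start_ns;
        if (iv[i].end_ns > hi) hi = iv[i].end_ns;
    }
    return hi - lo;
}

#ifdef USE_OPENCL /* hypothetical switch: build with -DUSE_OPENCL -lOpenCL */
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <stdlib.h>

/* Kernel followed by a device-to-device copy of a sub-range, n times.
   iv must have room for 2 * n intervals (kernel, copy, kernel, copy, ...). */
cl_int profile_loop(cl_command_queue queue, cl_kernel kernel,
                    size_t global, size_t local,
                    cl_mem src, cl_mem dst, size_t src_off, size_t dst_off,
                    size_t bytes, size_t n, cmd_interval *iv) {
    cl_event *ev = calloc(2 * n, sizeof *ev);
    if (ev == NULL) return CL_OUT_OF_HOST_MEMORY;
    cl_int err = CL_SUCCESS;
    size_t enq = 0; /* number of events actually created */
    for (size_t i = 0; i < n && err == CL_SUCCESS; ++i) {
        err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, &local,
                                     0, NULL, &ev[enq]);
        if (err != CL_SUCCESS) break;
        ++enq;
        /* Waiting on the kernel event keeps the order even on out-of-order queues. */
        err = clEnqueueCopyBuffer(queue, src, dst, src_off, dst_off, bytes,
                                  1, &ev[enq - 1], &ev[enq]);
        if (err == CL_SUCCESS) ++enq;
    }
    /* One wait after the loop, so per-iteration host stalls do not skew timing. */
    cl_int fin = clFinish(queue);
    if (err == CL_SUCCESS) err = fin;
    for (size_t j = 0; j < enq; ++j) {
        cl_ulong t0 = 0, t1 = 0;
        if (err == CL_SUCCESS)
            err = clGetEventProfilingInfo(ev[j], CL_PROFILING_COMMAND_START,
                                          sizeof t0, &t0, NULL);
        if (err == CL_SUCCESS)
            err = clGetEventProfilingInfo(ev[j], CL_PROFILING_COMMAND_END,
                                          sizeof t1, &t1, NULL);
        iv[j].start_ns = t0;
        iv[j].end_ns = t1;
        clReleaseEvent(ev[j]); /* every event returned by an enqueue must be released */
    }
    free(ev);
    return err;
}
#endif
```

The two summaries answer different questions. `busy_ns` adds up the kernel and copy durations and ignores idle time between commands. `span_ns` measures the whole loop on the device timeline, so launch gaps are included. Dividing either by 1e6 gives milliseconds. If host-side enqueue overhead matters as well, `CL_PROFILING_COMMAND_QUEUED` and `CL_PROFILING_COMMAND_SUBMIT` can be read the same way.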