Huge performance boost with minimal changes

I recently started testing some methods of sorting the pixels of an image. Specifically, I am using the System.Drawing.Common.Bitmap. I have two methods:

[Benchmark]
public unsafe void HorizontalOuter()
{
    for (int row = 0; row < _bmpData.Height - 1; row++)
    {
        int bytes = _bmpData.Stride;
        void* ptr = (void*)(_bmpData.Scan0 + _bmpData.Stride * row);
        var span = new Span<TPixel>(ptr, bytes);

        InsertionSort(
            keys: span,
            comparer: (IComparer<TPixel>)new Comparer24bit_soA_stR());
    }
}

Creating a span around each row of the image data, but using a quicker InsertionSort.

and

[Benchmark]
public unsafe void HorizontalInner()
{
    int bytes = _bmpData.Stride * _bmpData.Height;
    void* ptr = (void*)_bmpData.Scan0;
    var span = new Span<TPixel>(ptr, bytes);

    for (int row = 0; row < _bmpData.Height - 1; row++)
    {
        InsertionSort(
            keys: span,
            comparer: (IComparer<TPixel>)new Comparer24bit_soA_stR(),
            step: 1,
            from: _bmpData.Width * row,
            to: (row + 1) * _bmpData.Width);
    }
}

Creating a span around the whole image data, but using a slower implementation InsertionSort.

I benchmarked the performance of both of these methods and I was suprised to see, that HorizontalInner Is over a magnitude faster than HorizontalOuter:

HorizontalOuter
Standard Deviation: 00:00:00.0550178
Average:            00:00:00.686941

HorizontalInner
Standard Deviation: 00:00:00.0531962
Average:            00:00:00.0619287

(Warmup time of the JIT is taken into consideration.)

My question now is: Why is HorizontalOuter so much slower than HorizontalInner?

Here are both implementations of InsertionSort:
(HorizontalInner)

private static void InsertionSort(Span<TPixel> keys, IComparer<TPixel> comparer, int step, int from, int to)
{
    for (int i = from; i < to - step; i += step)
    {
        TPixel t = keys[i + step];

        int j = i;
        while (j >= from && comparer.Compare(t, keys[j]) < 0)
        {
            keys[j + step] = keys[j];
            j -= step;
        }

        keys[j + step] = t;
    }
}

(HorizontalOuter)

private static void InsertionSort(Span<TPixel> keys, IComparer<TPixel> comparer)
{
    for (int i = 0; i < keys.Length - 1; i += 1)
    {
        TPixel t = keys[i + 1];

        int j = i;
        while (j >= 0 && comparer.Compare(t, keys[j]) < 0)
        {
            keys[j + 1] = keys[j];
            j -= 1;
        }

        keys[j + 1] = t;
    }
}

Edit: The code I was using to benchmark this:

{
    var methods = typeof(T).GetMethods().Where(x => x.GetCustomAttribute<BenchmarkAttribute>() != null);
    var results = new Dictionary<MethodInfo, List<long>>();

    foreach (var method in methods)
    {

        Console.WriteLine("\n" + method.Name);

        Stopwatch metTime;
        List<long> allElapsedTicks = new();


        Console.WriteLine("Warming up...");
        metTime = Stopwatch.StartNew();
        while (metTime.ElapsedTicks <= WarmupTimePerMethod.Ticks)
        {
            T instance = Activator.CreateInstance<T>();
            Stopwatch sw = Stopwatch.StartNew();
            method.Invoke(instance, null);
            sw.Stop();
        }
        metTime.Stop();
        Console.WriteLine("Finished warm up.");


        Console.WriteLine("Starting benchmark...");
        metTime = Stopwatch.StartNew();
        while (metTime.ElapsedTicks <= BenchmarkTimePerMethod.Ticks)
        {
            T instance = Activator.CreateInstance<T>();

            Stopwatch sw = Stopwatch.StartNew();
            method.Invoke(instance, null);
            sw.Stop();

            allElapsedTicks.Add(sw.ElapsedTicks);
            Console.WriteLine($"{TimeSpan.FromTicks(sw.ElapsedTicks)}");
        }
        metTime.Stop();


        results.Add(method, allElapsedTicks);
    }


    foreach (var result in results)
    {
        Console.WriteLine("\n" + result.Key.Name);
        Console.WriteLine($"Standard Deviation: {TimeSpan.FromTicks(StdDev(result.Value))}");
        Console.WriteLine($"Average:            {TimeSpan.FromTicks(Average(result.Value))}");
    }
}
class SorterBenchmark : Sorter<Pixel_24bit> 
{ 
    public SorterBenchmark() : base(GetBmpData(new Bitmap(IMG))) { } 
}


static BitmapData GetBmpData(Bitmap bmp)
{
    Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
    return bmp.LockBits(rect, ImageLockMode.ReadWrite, bmp.PixelFormat);
}
static void Main()
{
    Benchmark.WarmupTimePerMethod = TimeSpan.FromSeconds(10);
    Benchmark.BenchmarkTimePerMethod = TimeSpan.FromSeconds(10);
    Benchmark.Run<SorterBenchmark>();
}

  • Where exactly? Looking at the final images, Both methods compute the same output. The input into ˋInsertionSortˋ is relative to the pixel-count and not bytes.

    – 




  • Sorry, but what is TPixel? If it’s int (or any types rather than byte), then new Span<TPixel>(ptr, bytes); is wrong, because the second argument is the element-count.

    – 

  • BTW are you sure you’ve created a brand new Bitmap object before every benchmark? I tested this and I got almost the same elapsed time.

    – 




  • 1

    @AlexeiLevenkov I think you picked a wrong duplicate, both methods in this question are running in the same order, row by row.

    – 

  • 1

    @shingo you are right – indeed code iterates in the same order. Have to wait till OP post a minimal reproducible example compatible with BenchmarkDotNet…

    – 




Leave a Comment