I recently started testing some methods of sorting the pixels of an image. Specifically, I am using the System.Drawing.Common.Bitmap
. I have two methods:
[Benchmark]
public unsafe void HorizontalOuter()
{
for (int row = 0; row < _bmpData.Height - 1; row++)
{
int bytes = _bmpData.Stride;
void* ptr = (void*)(_bmpData.Scan0 + _bmpData.Stride * row);
var span = new Span<TPixel>(ptr, bytes);
InsertionSort(
keys: span,
comparer: (IComparer<TPixel>)new Comparer24bit_soA_stR());
}
}
Creating a span around each row of the image data, but using a quicker InsertionSort
.
and
[Benchmark]
public unsafe void HorizontalInner()
{
int bytes = _bmpData.Stride * _bmpData.Height;
void* ptr = (void*)_bmpData.Scan0;
var span = new Span<TPixel>(ptr, bytes);
for (int row = 0; row < _bmpData.Height - 1; row++)
{
InsertionSort(
keys: span,
comparer: (IComparer<TPixel>)new Comparer24bit_soA_stR(),
step: 1,
from: _bmpData.Width * row,
to: (row + 1) * _bmpData.Width);
}
}
Creating a span around the whole image data, but using a slower implementation InsertionSort
.
I benchmarked the performance of both of these methods and I was suprised to see, that HorizontalInner
Is over a magnitude faster than HorizontalOuter
:
HorizontalOuter
Standard Deviation: 00:00:00.0550178
Average: 00:00:00.686941
HorizontalInner
Standard Deviation: 00:00:00.0531962
Average: 00:00:00.0619287
(Warmup time of the JIT is taken into consideration.)
My question now is: Why is HorizontalOuter
so much slower than HorizontalInner
?
Here are both implementations of InsertionSort
:
(HorizontalInner
)
private static void InsertionSort(Span<TPixel> keys, IComparer<TPixel> comparer, int step, int from, int to)
{
for (int i = from; i < to - step; i += step)
{
TPixel t = keys[i + step];
int j = i;
while (j >= from && comparer.Compare(t, keys[j]) < 0)
{
keys[j + step] = keys[j];
j -= step;
}
keys[j + step] = t;
}
}
(HorizontalOuter
)
private static void InsertionSort(Span<TPixel> keys, IComparer<TPixel> comparer)
{
for (int i = 0; i < keys.Length - 1; i += 1)
{
TPixel t = keys[i + 1];
int j = i;
while (j >= 0 && comparer.Compare(t, keys[j]) < 0)
{
keys[j + 1] = keys[j];
j -= 1;
}
keys[j + 1] = t;
}
}
Edit: The code I was using to benchmark this:
{
var methods = typeof(T).GetMethods().Where(x => x.GetCustomAttribute<BenchmarkAttribute>() != null);
var results = new Dictionary<MethodInfo, List<long>>();
foreach (var method in methods)
{
Console.WriteLine("\n" + method.Name);
Stopwatch metTime;
List<long> allElapsedTicks = new();
Console.WriteLine("Warming up...");
metTime = Stopwatch.StartNew();
while (metTime.ElapsedTicks <= WarmupTimePerMethod.Ticks)
{
T instance = Activator.CreateInstance<T>();
Stopwatch sw = Stopwatch.StartNew();
method.Invoke(instance, null);
sw.Stop();
}
metTime.Stop();
Console.WriteLine("Finished warm up.");
Console.WriteLine("Starting benchmark...");
metTime = Stopwatch.StartNew();
while (metTime.ElapsedTicks <= BenchmarkTimePerMethod.Ticks)
{
T instance = Activator.CreateInstance<T>();
Stopwatch sw = Stopwatch.StartNew();
method.Invoke(instance, null);
sw.Stop();
allElapsedTicks.Add(sw.ElapsedTicks);
Console.WriteLine($"{TimeSpan.FromTicks(sw.ElapsedTicks)}");
}
metTime.Stop();
results.Add(method, allElapsedTicks);
}
foreach (var result in results)
{
Console.WriteLine("\n" + result.Key.Name);
Console.WriteLine($"Standard Deviation: {TimeSpan.FromTicks(StdDev(result.Value))}");
Console.WriteLine($"Average: {TimeSpan.FromTicks(Average(result.Value))}");
}
}
class SorterBenchmark : Sorter<Pixel_24bit>
{
public SorterBenchmark() : base(GetBmpData(new Bitmap(IMG))) { }
}
static BitmapData GetBmpData(Bitmap bmp)
{
Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
return bmp.LockBits(rect, ImageLockMode.ReadWrite, bmp.PixelFormat);
}
static void Main()
{
Benchmark.WarmupTimePerMethod = TimeSpan.FromSeconds(10);
Benchmark.BenchmarkTimePerMethod = TimeSpan.FromSeconds(10);
Benchmark.Run<SorterBenchmark>();
}
Where exactly? Looking at the final images, Both methods compute the same output. The input into ˋInsertionSortˋ is relative to the pixel-count and not bytes.
Sorry, but what is
TPixel
? If it’sint
(or any types rather than byte), thennew Span<TPixel>(ptr, bytes);
is wrong, because the second argument is the element-count.BTW are you sure you’ve created a brand new Bitmap object before every benchmark? I tested this and I got almost the same elapsed time.
@AlexeiLevenkov I think you picked a wrong duplicate, both methods in this question are running in the same order, row by row.
@shingo you are right – indeed code iterates in the same order. Have to wait till OP post a minimal reproducible example compatible with BenchmarkDotNet…
Show 6 more comments