Evolution of Solutions - and Perceived Performance

by Rasmus Kromann-Larsen December 20, 2008 00:04

I entered a small competition yesterday and wanted to write a short post describing my progress and findings. The competition was rather simple, write a method that returns an array containing the numbers from 1 to 1000 in a random order.

My first solution was the naive LINQ solution.

public static int[] NaiveLinq(int max)
{
    var random = new Random();
    var query = from number in Enumerable.Range(1, max)
                orderby random.Next()
                select number;

    return query.ToArray();
}

I reckoned that several people would post this solution, so I felt like doing something slightly more fancy - and since parallelization is so hot these days, why not do it with PLINQ instead, so it would actually be faster on multi-core systems. The change is really simple.

public static int[] PLinq(int max)
{
    var random = new Random();
    var query = from number in ParallelEnumerable.Range(1, max)
                orderby random.Next()
                select number;

    return query.ToArray();
}

I benchmarked it (on my quad-core machine), and saw that the PLINQ solution was indeed faster - but only for rather big solutions, the overhead was simply too big in small instances of the problem (array size < 3000). This is not too well for a competition that is supposed to make arrays of size 1000. However, reluctant to ditch my parallel idea, I made a hybrid solution, which would use regular LINQ for small instances and PLINQ for big instances, based on my benchmark:

public static int[] GenerateUsingHybridLinq(int arrayLength)
{
    var random = new Random();

    // Use PLINQ if we're above the "very scientific" limit of 3000.
    if (arrayLength >= 3000)
    {
        return (from i in ParallelEnumerable.Range(1, arrayLength)
                orderby random.Next()
                select i).ToArray();
    }

    return (from i in Enumerable.Range(1, arrayLength)
            orderby random.Next()
            select i).ToArray();
}

I posted this solution to the competition and decided I had spent enough time on it. However, it bugged me because I knew that the Random class in the .NET framework is not guaranteed to be thread-safe. It also bugged be that I was actually solving a shuffling problem by sorting. Shuffling can be solved in O(n) while sorting is O(n*log(n)). In addition I thought that the LINQ solutions were kind of dull.

So I decided to take a stab at solving it without resolving to sorting. Shuffling is a well known problem, so I implemented the Fisher-Yates algorithm, often used for shuffling cards. It is actually a rather elegant algorithm.

public static int[] Generate()
{
    Random random = new Random();
    var numbers = new int[1000];

    // Add sorted numbers to shuffle
    for (var i = 0; i < 1000; i++)
        numbers[i] = i + 1;

    var last = numbers.Length;

    while (last > 1)
    {
        // Select a random entry in the array to swap
        var swap = random.Next(last);

        // Decrease relevant end of array
        last = last - 1;

        // Swap numbers using XOR swap, we don't need no stinkin' temp variables
        if (last != swap)
        {
            numbers[last] ^= numbers[swap];
            numbers[swap] ^= numbers[last];
            numbers[last] ^= numbers[swap];
        }
    }

    return numbers;
}

I adjusted it a bit and finally found a place to make an ode to the awesomeness that is XOR swap, swapping two values without using a temporary variable. Even though the algorithm was faster asymptotically, I was curious how it would venture against the sort-based LINQ solutions performance-wise. Here is the result:

image

Note that both HLINQ and PLINQ use all 4 cores on my quad-core machine. I realize there's an overhead using LINQ, but I'm still impressed how much faster this simple little algorithm is.

I submitted my new solution, with a note to replace my hybrid solution, just before midnight, but unfortunately I think it might have been too late, it didn't seem to make the cut for the competition. At least my hybrid solution made it into the final round - and non-thread-safe Random surely won't be a problem for instances of size 1000 - since it will use regular LINQ for that.

kick it on DotNetKicks.com

Tags: , , ,

Development

Comments

12/22/2008 11:19:47 AM #

Jakob Christensen (t4rzsan)

Your Fisher-Yates solution is very similar to my entry to the competition (except I am using tmp variables for the swap Smile). This is the "standard" way of solving the random permutation problem. I believe the great performance is caused by the fact that the solution uses a single array which will probably fit into the processor cache.  Who knows how memory is handled by (P)LINQ when using OrderBy?

Jakob Christensen (t4rzsan) Denmark

Comments are closed

Powered by BlogEngine.NET 1.6.1.0
Theme by Mads Kristensen | Modified by Mooglegiant | Adjusted by Rasmus Kromann-Larsen

About Me

I am a danish .NET developer blogging about the technical side of my life, mostly .NET stuff, but also fundamental topics like design patterns, principles and productivity boosters.

In addition, I am a core group member of Aarhus .NET User Group.