Saturday 15 December 2018

LINQ TO OBJECTS AND THE PERFORMANCE OF NESTED “WHERE” CALLS

This post came out of this Stack Overflow question, which essentially boils down to which is better out of these two options:
var oneBigPredicate = collection.Where(x => Condition1(x)
                                         && Condition2(x)
                                         && Condition3(x));
var multiplePredicates = collection.Where(x => Condition1(x))
                                   .Where(x => Condition2(x))
                                   .Where(x => Condition3(x));
The first case is logically a single "wrapper" sequence around the original collection, with a filter which checks all three conditions (but applying short-circuiting logic, of course) before the wrapper will yield an item from the original sequence.
The second case is logically a set of concentric wrapper sequences, each applying a filter which checks a single condition. Asking the "outer" wrapper for the next item involves that one asking the "middle" wrapper for the next item, which asks the "inner" wrapper for the next item, which asks the original collection for the next item… the item is then passed "outwards" through the wrappers, with the filters being applied as we go, of course.
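To make that mental model concrete, here's a rough sketch of the kind of naive, iterator-block Where implementation I have in mind (argument validation omitted, and certainly not the real LINQ to Objects code):
using System;
using System.Collections.Generic;

public static class NaiveWhere
{
    // Each call to Where wraps its source in one more lazy filtering
    // sequence - exactly the "concentric wrappers" described above.
    // (Argument validation omitted for brevity.)
    public static IEnumerable<T> Where<T>(this IEnumerable<T> source,
                                          Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
}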
Now the two will achieve the same result, and I should say up-front that in most realistic cases, it won’t make a significant difference which you use. But realistic cases aren’t nearly as interesting to investigate as pathological cases, so I decided to benchmark a few different options for the second case. In particular, I wanted to find out how long it took to iterate over a query in the cases where the condition was either "always true" or "always false" – and vary the depth of nesting. Note that I’m not actually testing the first kind of query shown above… I suspect it wouldn’t be terribly interesting, at least compared with the results of the second query.
The simplest way of creating a "collection" of a large size is to use Enumerable.Repeat(), and the simplest way of iterating over the whole sequence is to just call Count() on it… so that’s what I do, timing how long the Count() call takes. (I don’t actually print out the results of Count(), but with an "always false" predicate I’ll get 0, and with an "always true" predicate I’ll get the size of the input collection.)
Here’s the sample code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
class Test
{
    static void Main()
    {
        int size = 10000000;
        Console.WriteLine("Always false");
        RunTests(size, x => false);
        
        Console.WriteLine("Always true");
        RunTests(size, x => true);
        
    }
    
    static void RunTests(int size, Func<string, bool> predicate)
    {
        for (int i = 1; i <= 10; i++)
        {
            RunTest(i, size, predicate);
        }
    }
    
    static void RunTest(int depth, int size, Func<string, bool> predicate)
    {
        IEnumerable<string> input = Enumerable.Repeat("value", size);
        
        for (int i = 0; i < depth; i++)
        {
            input = input.Where(predicate);
        }
        
        Stopwatch sw = Stopwatch.StartNew();
        input.Count();
        sw.Stop();
        Console.WriteLine("Depth: {0} Size: {1} Time: {2}ms",
                          depth, size, sw.ElapsedMilliseconds);
    }
}
You may notice there’s no JIT warm-up here – but I’ve tried that, and it doesn’t alter the results much. Likewise I’ve also tried with a larger size of collection, and the trend is the same – and the trend is the interesting bit.

Time to guess the results…

Before I include the results, I’ll explain what I thought would happen. I had the mental model I described before, with multiple sequences feeding each other.
When the condition is always false, I’d expected the call to MoveNext() from Count() to get all the way to the innermost filtering sequence, which would then iterate over all of the input collection, applying the filter on each item and never yielding a result. That innermost MoveNext() call returns false, and that propagates right back to Count(). All the (significant) time is spent in the innermost loop, testing and rejecting items. That shouldn’t depend on the depth of the nesting we’ve got, right? I’d expect it to be linear in terms of the size of the collection, but constant in terms of nesting. We’ll see.
When the condition is always true, we’re in a much worse situation. We need to propagate every item from the collection through all the filters, out to Count(), checking the condition and accepting the item on each check. Each MoveNext() call has to go all the way to the innermost filter, then the result has to propagate out again. That sounds like it should be roughly linear in the depth of the nesting, as well as still being linear in the size of the collection – assuming a very simplistic execution model, of course.
Before you look at the results, check that you understand my logic, and see if you can think why it might not be the case.

The actual results

Here’s what I’ve actually observed, with all times in milliseconds. The size of the collection is constant – we’re only varying the depth:
Depth    Time for "always false"    Time for "always true"
1        182                        305
2        219                        376
3        246                        452
4        305                        548
5        350                        650
6        488                        709
7        480                        795
8        526                        880
9        583                        996
10       882                        1849
There are a few things to note here:
  • The "always true" time is going up broadly linearly, as we’d expected
  • The "always false" time is also going going up broadly linearly, even though we’d expected it to be roughly constant
  • The "always false" time is still significantly better than the "always true" time, which is somewhat reassuring
  • The performance at a depth of 10 is significantly worse than at a depth of 9 in both cases
Okay, so that’s bizarre. What can possibly be making the time taken by the "inner" filtering sequence go up with the depth of the nesting? It shouldn’t know about the rest of the calls to Where – that’s not its job. Time to investigate…

Reimplementing Where

As you may have seen before, Where isn’t very hard to implement – at least it’s not hard to implement if you’re not trying to be clever. In order to mess around with things to check that my mental model was correct, I decided to run exactly the same tests again, but against the Edulinq implementation. This time, the results are very different:
Depth    Time for "always false"    Time for "always true"
1        232                        502
2        235                        950
3        256                        1946
4        229                        2571
5        227                        3103
6        226                        3535
7        226                        3901
8        232                        4365
9        229                        4838
10       226                        5219
Well look at that… suddenly our "always false" query has the expected characteristics. The "always true" query is basically linear except for the jump in time taken between a depth of 2 and a depth of 3. This may well be the same sort of discontinuity present in the earlier results, but at a different depth. (I’ll do more research on that another time, I think.)
So if the naive implementation actually works better in some cases, what’s going wrong in the "always false" case in the real LINQ to Objects? We can work this out by taking a stack trace from a predicate. Here’s a sample query which will throw an exception:
var query = Enumerable.Repeat("value", 1)
                      .Where(x => { throw new Exception("Bang!"); })
                      .Where(x => true)
                      .Where(x => true)
                      .Where(x => true)
                      .Where(x => true)
                      .Where(x => true)
                      .Where(x => true)
                      .Where(x => true);
When you try to count that query (or do anything else which will iterate over it) you get a stack trace like this:
Unhandled Exception: System.Exception: Bang!
   at Test.<Main>b__0(String x)
   at System.Linq.Enumerable.<>c__DisplayClassf`1.<CombinePredicates>b__e(TSource x)
   at System.Linq.Enumerable.<>c__DisplayClassf`1.<CombinePredicates>b__e(TSource x)
   at System.Linq.Enumerable.<>c__DisplayClassf`1.<CombinePredicates>b__e(TSource x)
   at System.Linq.Enumerable.<>c__DisplayClassf`1.<CombinePredicates>b__e(TSource x)
   at System.Linq.Enumerable.<>c__DisplayClassf`1.<CombinePredicates>b__e(TSource x)
   at System.Linq.Enumerable.<>c__DisplayClassf`1.<CombinePredicates>b__e(TSource x)
   at System.Linq.Enumerable.<>c__DisplayClassf`1.<CombinePredicates>b__e(TSource x)
   at System.Linq.Enumerable.WhereEnumerableIterator`1.MoveNext()
   at System.Linq.Enumerable.Count[TSource](IEnumerable`1 source)
   at Test.Main()
Ooh, look at that stack – we’ve got 8 Where clauses, and 7 levels of DisplayClassf`1 – which looks like a generated class, perhaps for a lambda expression.
Between this and a helpful email from Eric Lippert, we can basically work out what’s going on. LINQ to Objects knows that it can combine two Where clauses by just constructing a compound filter. Just for kicks, let’s do the same thing – almost certainly more simplistically than LINQ to Objects – but in a way that will give the right idea:
using System;
using System.Collections;
using System.Collections.Generic;

// We could have an interface for this, of course.
public class FilteredEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> source;
    private readonly Func<T, bool> predicate;
    
    internal FilteredEnumerable(IEnumerable<T> source,
                                Func<T, bool> predicate)
    {
        this.source = source;
        this.predicate = predicate;
    }
    
    public IEnumerator<T> GetEnumerator()
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
    
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
    
    public FilteredEnumerable<T> Where(Func<T, bool> extraPredicate)
    {
        return new FilteredEnumerable<T>(source,
                                         x => this.predicate(x) && extraPredicate(x));
    }
}
public static class Extensions
{
    public static IEnumerable<T> Where<T>(this IEnumerable<T> source,
                                          Func<T, bool> predicate)
    {
        var filtered = source as FilteredEnumerable<T>;
        return filtered == null ? new FilteredEnumerable<T>(source, predicate)
                                : filtered.Where(predicate);
    }
}
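With that in place, chaining calls collapses the wrappers: the second Where below spots that its source is already a FilteredEnumerable<string>, and produces a single wrapper around the original source with one combined predicate. (A hypothetical snippet just to show the effect – note that I’m deliberately not importing System.Linq here, so our extension method is the only Where in scope:)
string[] source = { "value1", "value2", "other" };

// One FilteredEnumerable<string> around source, with the combined predicate
// x => x.StartsWith("v") && x.Length > 5 - not two nested wrappers.
var query = source.Where(x => x.StartsWith("v"))
                  .Where(x => x.Length > 5);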
Let’s run the same tests as before, and see how we do…
Depth    Time for "always false"    Time for "always true"
1        240                        504
2        275                        605
3        332                        706
4        430                        797
5        525                        892
6        558                        980
7        638                        1085
8        715                        1208
9        802                        1343
10       1312                       2768
Oh look – it feels like the real LINQ to Objects implementation. A bit slower, certainly, but that’s okay. The way it trends is the same. In particular, it’s slower than the naive implementation for the "always false" filter, but faster for the "always true" filter.

Could things be improved?

The problem here is the creation of nested delegates: each call to Where combines the existing (possibly already combined) predicate with the new one, so invoking the final predicate means running through a chain of nested delegate calls. We end up with a large stack (which appears to be causing a bigger problem when the depth reaches 10) when really all we want is a single flat delegate which tests each condition in turn.
The thought occurs that we could potentially use expression trees to do this. Not in a signature-compatible way with LINQ to Objects, but we should be able to combine (a => a == 3) and (b => b != 10) into (x => x == 3 && x != 10) effectively. Then when we’re asked to iterate, we just need to compile the expression tree to a delegate, and filter using that single, efficient delegate.
There are three problems with this:
  • It goes against the normal approach of LINQ to Objects, using delegates instead of expression trees. Heck, with expression trees throughout we could do all kinds of interesting optimizations.
  • Creating and compiling the expression tree may be more expensive than the iteration – we don’t really know. It depends on how large the collection is, etc.
  • It’s somewhat complicated to implement, because we need to rewrite the constituent expressions; note how "a" and "b" both become "x" in the example above. This is the main reason I haven’t actually bothered trying this in order to benchmark it :) (The sketch just after this list gives an idea of the rewriting involved.)
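Just to give the flavour of that rewriting (a sketch I haven’t benchmarked, and certainly not how LINQ to Objects works), an ExpressionVisitor can substitute a single shared parameter into both bodies before combining them with AndAlso:
using System;
using System.Linq.Expressions;

public static class PredicateCombiner
{
    // Substitutes one node (the original lambda parameter) with another
    // (the shared parameter) throughout an expression tree.
    private class ReplaceVisitor : ExpressionVisitor
    {
        private readonly Expression from;
        private readonly Expression to;
        
        internal ReplaceVisitor(Expression from, Expression to)
        {
            this.from = from;
            this.to = to;
        }
        
        public override Expression Visit(Expression node)
        {
            return node == from ? to : base.Visit(node);
        }
    }
    
    public static Expression<Func<T, bool>> And<T>(
        Expression<Func<T, bool>> left,
        Expression<Func<T, bool>> right)
    {
        // Rewrite both bodies in terms of a single parameter "x"...
        var parameter = Expression.Parameter(typeof(T), "x");
        var leftBody = new ReplaceVisitor(left.Parameters[0], parameter)
                           .Visit(left.Body);
        var rightBody = new ReplaceVisitor(right.Parameters[0], parameter)
                           .Visit(right.Body);
        // ... then combine them with a short-circuiting &&.
        return Expression.Lambda<Func<T, bool>>(
            Expression.AndAlso(leftBody, rightBody), parameter);
    }
}
So PredicateCombiner.And<int>(a => a == 3, b => b != 10) produces x => x == 3 && x != 10, and a single call to Compile() then gives one flat delegate to filter with.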
There are various other options to do with building a more efficient way of evaluating the predicates. One pretty simple (but repetitive) solution is to have one class per number of predicates, with a field per predicate – FilteredEnumerable1, FilteredEnumerable2 etc. When you get bored (e.g. FilteredEnumerable9) you construct any further ones by combining predicates as per the LINQ to Objects approach. For example, here’s an implementation of FilteredEnumerable3:
public class FilteredEnumerable3<T> : IFilteredEnumerable<T>
{
    private readonly IEnumerable<T> source;
    private readonly Func<T, bool> predicate0;
    private readonly Func<T, bool> predicate1;
    private readonly Func<T, bool> predicate2;
    
    internal FilteredEnumerable3(IEnumerable<T> source,
                                 Func<T, bool> predicate0,
                                 Func<T, bool> predicate1,
                                 Func<T, bool> predicate2)
    {
        this.source = source;
        this.predicate0 = predicate0;
        this.predicate1 = predicate1;
        this.predicate2 = predicate2;
    }
    
    public IEnumerator<T> GetEnumerator()
    {
        foreach (T item in source)
        {
            if (predicate0(item) &&
                predicate1(item) &&
                predicate2(item))
            {
                yield return item;
            }
        }
    }
    
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
    
    public IFilteredEnumerable<T> Where(Func<T, bool> extraPredicate)
    {
        return new FilteredEnumerable4<T>(source,
            predicate0,
            predicate1,
            predicate2,
            extraPredicate);
    }
}
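For completeness, here’s roughly what the supporting pieces assumed above might look like – the IFilteredEnumerable<T> interface, and a replacement for the earlier Extensions.Where which starts the chain off with a hypothetical FilteredEnumerable1<T> (built along exactly the same lines as FilteredEnumerable3<T>, just with a single predicate):
using System;
using System.Collections.Generic;

public interface IFilteredEnumerable<T> : IEnumerable<T>
{
    // Returns a sequence equivalent to this one, but with one more
    // predicate applied.
    IFilteredEnumerable<T> Where(Func<T, bool> extraPredicate);
}

public static class Extensions
{
    public static IEnumerable<T> Where<T>(this IEnumerable<T> source,
                                          Func<T, bool> predicate)
    {
        // Start with one predicate; each further Where call asks the
        // existing filtered sequence to upgrade itself to the next size up.
        var filtered = source as IFilteredEnumerable<T>;
        return filtered == null
            ? new FilteredEnumerable1<T>(source, predicate)
            : filtered.Where(predicate);
    }
}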
In this case, I’m rather pleased with the results:
Depth    Time for "always false"    Time for "always true"
1        237                        504
2        231                        551
3        231                        599
4        225                        625
5        228                        703
6        224                        787
7        222                        876
8        225                        966
9        232                        1069
10       219                        1191
Now that’s more like it. Other than a few cells, we’re actually outperforming LINQ to Objects – and still compatible with it. The only problem is that the cells where it’s slower are the common ones – the top few in the right hand column. And this was just going up to FilteredEnumerable4<T>.
I honestly don’t know why my FilteredEnumerable1<T> is slower than whatever’s in LINQ to Objects. But it’s nice to see the rest going quickly… I suspect there are some downsides as well, and basically I trust the team to make the right decisions for normal cases.

And there’s more…

You may be surprised to hear I’ve kept things simple here – as Eric mentioned to me, it’s not just Where that can be transformed in this way: Select works as well, as you can combine projections together pretty easily, and Where and Select interact with each other too. So imagine that we’ve effectively got this transformation:
// Normal LINQ query
var query = list.Where(x => Condition1(x))
                .Where(x => Condition2(x))
                .Select(x => Projection1(x))
                .Select(y => Projection2(y));
// After optimization
var query = list.WhereSelect(x => Condition1(x) && Condition2(x),
                             x => Projection2(Projection1(x)));
Of course at that point, my FilteredEnumerableX<T> approach becomes even more unwieldy as you have two axes – the number of predicates and the number of projections.
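To show the shape of such a combined iterator, here’s a sketch of a single wrapper handling one predicate and one projection (loosely in the spirit of the WhereSelectEnumerableIterator class inside LINQ to Objects, though the real thing does rather more):
using System;
using System.Collections;
using System.Collections.Generic;

public class WhereSelectEnumerable<TSource, TResult> : IEnumerable<TResult>
{
    private readonly IEnumerable<TSource> source;
    private readonly Func<TSource, bool> predicate;
    private readonly Func<TSource, TResult> projection;
    
    internal WhereSelectEnumerable(IEnumerable<TSource> source,
                                   Func<TSource, bool> predicate,
                                   Func<TSource, TResult> projection)
    {
        this.source = source;
        this.predicate = predicate;
        this.projection = projection;
    }
    
    public IEnumerator<TResult> GetEnumerator()
    {
        foreach (TSource item in source)
        {
            if (predicate(item))
            {
                yield return projection(item);
            }
        }
    }
    
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
    
    // A further Select just composes the projections into a single delegate,
    // keeping the same source and predicate.
    public WhereSelectEnumerable<TSource, TNewResult> Select<TNewResult>(
        Func<TResult, TNewResult> extraProjection)
    {
        return new WhereSelectEnumerable<TSource, TNewResult>(
            source, predicate, x => extraProjection(projection(x)));
    }
}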

Conclusion

As is so often the case with performance, there’s more to Where than there first appears. When I started benchmarking this I had no idea what I was getting myself into. The great thing is that very few people would ever need to know this. It’s all hidden by the abstraction.
