Friday, 2 November 2012

Linq Best practices.

Best Practices for Linq Enumerables and Queryables

LINQ expressions and their associated extension methods have greatly improved developer productivity ever since their introduction in .NET 3.5. Unfortunately, like any code abstraction, they can hide execution details that come back to haunt you as performance problems and odd behavior.

I haven’t seen this information anywhere else, so I’m sharing my best practices for using LINQ and the C# language extensions.

(UPDATE: BTW, feel free to disagree with this, as long as you are thinking about your code. Maybe this should be called “Things to Think About with Linq Enumerables, etc.”, but it’s more fun to have an opinion.)

If you aren’t familiar with LINQ in C#, check out the classic 101 LINQ Samples in C#.

Also, click here for a great article on LINQ and Counting.

1. Use “Fluent C#” rather than the language extensions.

This is more of a personal preference. You can use the C# language extensions (query expressions) to write something that looks like a query. It’s usually pretty readable and reminds you of SQL.

var waCustomerIDs =
    from c in customers
    where c.Region == "WA"
    select c.ID;

The language extensions actually compile down to extension methods, and the actual code is more like this:

var waCustomerIDs = customers
    .Where(c => c.Region == "WA")
    .Select(c => c.ID);

Notice how this is a chain of method calls with lots of embedded anonymous lambda expressions. If you use the language extensions, the syntactic sugar masks the underlying flavor of the code. In other words, you are hiding the implementation from the next person to read the code.
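If you want to convince yourself that the two forms really are the same thing, here is a minimal sketch that runs both over the same list and compares the results. The Customer class and the sample data are made up for illustration; only the Where/Select calls come from the snippets above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical customer type, for illustration only.
public class Customer
{
    public int ID { get; set; }
    public string Name { get; set; }
    public string Region { get; set; }
}

public static class QueryVersusFluentDemo
{
    // Hypothetical sample data.
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer { ID = 1, Name = "Ann", Region = "WA" },
            new Customer { ID = 2, Name = "Bob", Region = "OR" },
            new Customer { ID = 3, Name = "Cy", Region = "WA" },
        };
    }

    // Query-expression form.
    public static List<int> WithQuerySyntax(IEnumerable<Customer> customers)
    {
        var ids =
            from c in customers
            where c.Region == "WA"
            select c.ID;
        return ids.ToList();
    }

    // The same query written as a chain of extension method calls.
    public static List<int> WithFluentSyntax(IEnumerable<Customer> customers)
    {
        return customers
            .Where(c => c.Region == "WA")
            .Select(c => c.ID)
            .ToList();
    }

    public static void Main()
    {
        var customers = SampleCustomers();
        var fromQuery = WithQuerySyntax(customers);
        var fromFluent = WithFluentSyntax(customers);

        Console.WriteLine(fromQuery.SequenceEqual(fromFluent)); // True
        Console.WriteLine(string.Join(",", fromFluent));        // 1,3
    }
}
```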

I believe that obvious code is better than pretty code (or clever code), because it allows the reader to better understand what is going on in the code, which helps in debugging and performance evaluation.

2. “Seal” LINQ chains as soon as they have been built (if they are going to be reused).

One of the beauties of LINQ is that the IEnumerable and IQueryable interfaces are composable. You can build chains of filters and maps and evaluate them after the chain is built. In fact, that is what the extension methods actually do:

var allCustomers = customers;
var waCustomers = allCustomers.Where(c => c.Region == "WA");
var waCustomerIDs = waCustomers.Select(c => c.ID);

The danger here is that if you use your chain in more than one place, the filter logic will get evaluated more than once. That means your source IEnumerable may be scanned multiple times, or in the case of an IQueryable, you may hit your database more than once. This can cause unexpected performance problems.

var waCustomers = customers.Where(c => c.Region == "WA");

// this will scan the entire list of customers twice
// (once for each of the two chains below, when it is enumerated)
var waCustomerIDs = waCustomers.Select(c => c.ID);
var waCustomerNames = waCustomers.Select(c => c.Name);

The solution to this is to “seal” your chain by converting it ToList as soon as you finish building the chain. (UPDATE: When I say “built”, I mean when you have added all of the filtering, sorting and other stuff to the chain. If you plan on adding more to the chain, you might want to defer execution until then. It depends on your situation.)

Then the data is resident in memory as a new filtered, sorted, etc. List. Since List is also an Enumerable, you can continue to build the chain with additional filters. You can think of ToList as a checkpoint to tell the system to evaluate the chain into a new chain in memory.

var waCustomers = customers.Where(c => c.Region == "WA")
    .ToList();

// NOW waCustomers is a List of customers that has already been filtered

// this will only scan the list of WA customers twice
var waCustomerIDs = waCustomers.Select(c => c.ID);
var waCustomerNames = waCustomers.Select(c => c.Name);
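One way to see the difference for yourself is to count how often the filter lambda runs. The sketch below is only an illustration: it uses a plain list of region strings instead of customers, and the method names are mine. With four source items and two consumers, the unsealed chain calls the predicate eight times and the sealed one calls it four times.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SealingDemo
{
    // Consumes an unsealed chain twice and reports how often the filter ran.
    public static int PredicateCallsUnsealed(IList<string> regions)
    {
        int calls = 0;
        IEnumerable<string> wa = regions.Where(r => { calls++; return r == "WA"; });

        // Each consumer re-runs the Where over the whole source.
        var lengths = wa.Select(r => r.Length).ToList();
        var lowered = wa.Select(r => r.ToLowerInvariant()).ToList();
        return calls;
    }

    // Seals the chain with ToList first, then consumes the list twice.
    public static int PredicateCallsSealed(IList<string> regions)
    {
        int calls = 0;
        List<string> wa = regions.Where(r => { calls++; return r == "WA"; }).ToList();

        // These only walk the already-filtered list; the filter is not re-run.
        var lengths = wa.Select(r => r.Length).ToList();
        var lowered = wa.Select(r => r.ToLowerInvariant()).ToList();
        return calls;
    }

    public static void Main()
    {
        var regions = new List<string> { "WA", "OR", "WA", "CA" };
        Console.WriteLine(PredicateCallsUnsealed(regions)); // 8
        Console.WriteLine(PredicateCallsSealed(regions));   // 4
    }
}
```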

UPDATE: note that if you aren’t going to reuse your LINQ chains, then converting it to a list is probably a waste. Also, if your list is going to have a lot of data, you might not want to pull everything into memory at once. In that case, you should really use IEnumerable, but you should be extra careful to make sure you aren’t exposing any unintended behavior.

var waCustomers = customers.Where(c => c.Region == "WA");

// no need to close up this chain, because you're only using it once and you aren't returning it
foreach (var customer in waCustomers)
    Console.WriteLine(customer.Name);

3. Only return “open” chains if you want to allow for composition.

A corollary of #2 is that your methods (particularly public API methods) should never return an “open” (“unsealed”) LINQ chain. If you return an open chain, you have no idea how the caller is going to use the chain, and your code may get called in unexpected ways at unexpected times.

public IEnumerable<Customer> GetCustomersForState(string state)
{
    return customers
        .Where(c => c.Region == state);
}

// this will scan the entire list of customers twice
// AND the caller won't know it
var waCustomers = GetCustomersForState("WA");
var waCustomerIDs = waCustomers.Select(c => c.ID);
var waCustomerNames = waCustomers.Select(c => c.Name);

The exception to this rule is when you want to allow for composition. Returning an open IQueryable may allow the caller to add additional filtering or sorting to your query, and then evaluate the query against a data source like SQL Server. The query is then built in the client and the logic is sent to the server for evaluation.

public IQueryable<Customer> GetCustomersForState(string state)
{
    return customers
        .Where(c => c.Region == state);
}

var waCustomers = GetCustomersForState("WA");
// waCustomers has not been evaluated and is still an IQueryable

var waPrimeMembers = waCustomers.Where(c => c.Status == "Prime");
// waPrimeMembers has not been evaluated and is still an IQueryable

var results = waPrimeMembers.ToList();
// NOW we hit the data source and filter on Region AND Status at the same time.
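You can try the composition pattern without a database. The sketch below assumes an in-memory List wrapped with AsQueryable() as a stand-in for a real provider such as SQL Server, so it shows the deferred, composable behavior but not the translation to SQL. The Customer class, the sample rows and the PrimeCustomerNames helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical customer type, for illustration only.
public class Customer
{
    public string Name { get; set; }
    public string Region { get; set; }
    public string Status { get; set; }
}

public static class ComposableQueryDemo
{
    // Returns an open IQueryable so that the caller can keep composing.
    public static IQueryable<Customer> GetCustomersForState(IQueryable<Customer> source, string state)
    {
        return source.Where(c => c.Region == state);
    }

    public static List<string> PrimeCustomerNames(List<Customer> backingStore)
    {
        IQueryable<Customer> waCustomers = GetCustomersForState(backingStore.AsQueryable(), "WA");
        IQueryable<Customer> waPrimeMembers = waCustomers.Where(c => c.Status == "Prime");

        // Nothing has been evaluated yet, so a row added now is still picked up.
        backingStore.Add(new Customer { Name = "Late", Region = "WA", Status = "Prime" });

        // Both filters run here, in a single pass over the source.
        return waPrimeMembers.Select(c => c.Name).ToList();
    }

    public static void Main()
    {
        var store = new List<Customer>
        {
            new Customer { Name = "Ann", Region = "WA", Status = "Prime" },
            new Customer { Name = "Bob", Region = "OR", Status = "Prime" },
            new Customer { Name = "Cy", Region = "WA", Status = "Basic" },
        };
        Console.WriteLine(string.Join(",", PrimeCustomerNames(store))); // Ann,Late
    }
}
```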

4. Make sure the semantics for an “open” chain return are clear

I would say you should always seal your IEnumerables before returning them. Sometimes you can leave an IQueryable unsealed when returning it, but make sure that your API semantics and documentation make the behavior of the returned object clear.
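One possible way to make the semantics visible is to let the method name and return type carry them. This is only a sketch of the idea, not the one true convention: the CustomerRepository class and its method names are hypothetical, and IReadOnlyList requires .NET 4.5 or later. The sealed method returns a snapshot, while the open method keeps reflecting later changes to the source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical customer type, for illustration only.
public class Customer
{
    public string Name { get; set; }
    public string Region { get; set; }
}

// Hypothetical repository, for illustration only.
public class CustomerRepository
{
    private readonly List<Customer> customers = new List<Customer>();

    public void Add(Customer customer)
    {
        customers.Add(customer);
    }

    // Sealed: the return type signals an already-evaluated snapshot.
    public IReadOnlyList<Customer> GetCustomersForState(string state)
    {
        return customers.Where(c => c.Region == state).ToList();
    }

    /// <summary>
    /// Returns a DEFERRED query. It is re-evaluated on every enumeration
    /// and sees changes made to the repository after this call.
    /// </summary>
    public IEnumerable<Customer> QueryCustomersForState(string state)
    {
        return customers.Where(c => c.Region == state);
    }
}

public static class ApiSemanticsDemo
{
    public static void Main()
    {
        var repo = new CustomerRepository();
        repo.Add(new Customer { Name = "Ann", Region = "WA" });

        var snapshot = repo.GetCustomersForState("WA");
        var open = repo.QueryCustomersForState("WA");

        repo.Add(new Customer { Name = "Cy", Region = "WA" });

        Console.WriteLine(snapshot.Count); // 1
        Console.WriteLine(open.Count());   // 2
    }
}
```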

5. If you are returning an open chain, make sure the inner code is free of side-effects

One of the dangers of LINQ is that the filter lambdas are actually anonymous functions that are called back during evaluation. If you don’t seal your chain, then you aren’t in control of when the lambdas get executed. This means your lambda will be executed in an unknown context.

public IEnumerable<Customer> GetCustomersForState(string state)
{
    return customers
        .Where(c => /* THIS WILL GET CALLED AT AN UNEXPECTED TIME */);
}

So make sure that your lambda expressions don’t have any side-effects. At one point I had to debug a crazy bug where:

An unsealed IEnumerable was returned from a WCF method.
WCF unwound the stack and reverted the security principal back to the service.
WCF attempted to serialize the IEnumerable to the return message.
The IEnumerable chain was evaluated and one of the filter lambda expressions was called.
The lambda expression evaluated a property on one of the objects.
The object’s property getter performed a security access check.
Since the security context was now the service and not the caller, the check failed.

This was solved simply by sealing the result with a ToList before returning.

So be careful when returning open chains. Better yet, seal everything before you let anyone else see it.
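The WCF bug above can be imitated in a few lines. The sketch below is a simplified stand-in, not the actual service code: a static CurrentUser field plays the role of the ambient security principal, and CanRead plays the role of the property that performs the access check. The open chain runs the check after the context has changed and returns nothing, while the sealed result was checked while the caller's context was still in place.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AmbientContextDemo
{
    // Hypothetical stand-in for an ambient security context.
    public static string CurrentUser = "caller";

    // Hypothetical access check that depends on the ambient context.
    public static bool CanRead(string document)
    {
        return CurrentUser == "caller";
    }

    // Open chain: CanRead runs whenever the result is finally enumerated.
    public static IEnumerable<string> OpenResult(IEnumerable<string> docs)
    {
        return docs.Where(d => CanRead(d));
    }

    // Sealed: CanRead runs here, before the method returns.
    public static List<string> SealedResult(IEnumerable<string> docs)
    {
        return docs.Where(d => CanRead(d)).ToList();
    }

    // Returns { count seen through the open chain, count in the sealed list }.
    public static int[] Run()
    {
        var docs = new List<string> { "a", "b" };

        CurrentUser = "caller";
        var open = OpenResult(docs);
        var snapshot = SealedResult(docs);

        // The context changes after the methods have returned,
        // much like WCF reverting the principal before serialization.
        CurrentUser = "service";

        return new[] { open.Count(), snapshot.Count };
    }

    public static void Main()
    {
        var counts = Run();
        Console.WriteLine(counts[0]); // 0
        Console.WriteLine(counts[1]); // 2
    }
}
```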

BTW – these best practices also apply to the Linq.js javascript library.

6 comments:

dasaradh reddy, 2 November 2012 at 23:49
http://www.devproconnections.com/article/database-development/guidelines-and-best-practices-in-optimizing-linq-performance
dasaradh reddy, 2 November 2012 at 23:53
http://blog.bobcravens.com/2009/09/best-practices-for-using-linq-in-your-data-access-layer/
dasaradh reddy, 4 November 2012 at 07:31
Optimize LINQ to SQL Performance

http://visualstudiomagazine.com/Articles/2007/11/01/Optimize-LINQ-to-SQL-Performance.aspx?Page=2&p=1
dasaradh reddy, 9 November 2012 at 19:58
Some performance issues and caveats of LINQ
THURSDAY, SEPTEMBER 25, 2008 AT 2:18 IN LINQ
20% of the code executes 80% of the time
As you may already know, LINQ to collections generally performs worse than the hand-coded approach. That happens because we deal with iterator classes and lazy evaluation.

Most of the time this is not a problem, since the impact on the overall performance is negligible. And the code itself gets much cleaner and easier to read.

However, in high-stress scenarios (e.g., heavy math calculations) improper LINQ usage could become a problem. Let's write a couple of micro-benchmarks and compare the performance of hand-written loops with LINQ.

For a simple selection and sum, LINQ is only about 1.5 times slower:

// Loop - 01.73 s
for (int j = 0; j < array.Length; j++)
{
    var v = array[j].Value;
    if (v > Math.PI)
    {
        sum1 += v;
    }
}

// LINQ - 02.54 s
sum2 += array.Where(o => o.Value > Math.PI).Sum(o => o.Value);
But the difference can grow (to about 2.6 times slower) if we forget that LINQ evaluates lazily:

// Loop - 01.77 s
for (int j = 0; j < array.Length; j++)
{
    var v = array[j].Value;
    if (v > Math.PI)
    {
        sum1 += v;
        total1 += 1;
    }
}

// Linq - 04.60 s
var doubles = array.Where(o => o.Value > Math.PI);
total2 += doubles.Count();
sum2 += doubles.Sum(o => o.Value);
One could be tempted to use .ToArray() to force the evaluation once, but this has its own cost and will degrade the performance even more (x3.1):

// Linq - 05.43 s
for (int i = 0; i < 100000; i++)
{
    var doubles = array
        .Select(o => o.Value)
        .Where(d => d > Math.PI)
        .ToArray();
    total2 += doubles.Count();
    sum2 += doubles.Sum();
}
Summary: LINQ is sharp. There is a lot going on inside these simple-looking extension methods for collections. Just be careful with them when writing performance-critical code, and you should be fine.

Appendix: these micro-snippets have been executed against array generated like this:

var r = new Random(1);
var array = Enumerable.Range(1, 1000)
    .Select(i => new
    {
        ID = i,
        Name = "Item_" + i,
        Category = "Category_" + (i % 10),
        Value = r.NextDouble() * 10
    })
    .ToArray();
Note: we pass a fixed seed to Random to make the experiment results repeatable.

Every statement was compiled in "Release" mode and executed 100000 times. Total execution time was measured with the Stopwatch class.
dasaradh reddy, 9 November 2012 at 20:23
Here is a simple example where LINQ helps performance. Consider this typical old-school approach:

List<Foo> foos = GetSomeFoos();
List<Foo> filteredFoos = new List<Foo>();
foreach (Foo foo in foos)
{
    if (foo.SomeProperty == "somevalue")
    {
        filteredFoos.Add(foo);
    }
}
myRepeater.DataSource = filteredFoos;
myRepeater.DataBind();
So the above code will iterate twice and allocate a second container to hold the filtered values. What a waste! Compare with:

var foos = GetSomeFoos();
var filteredFoos = foos.Where(foo => foo.SomeProperty == "somevalue");
myRepeater.DataSource = filteredFoos;
myRepeater.DataBind();
This only iterates once (when the repeater is bound); it only ever uses the original container; filteredFoos is just an intermediate enumerator. And if, for some reason, you decide not to bind the repeater later on, nothing is wasted. You don't even iterate or evaluate once.
dasaradh reddy, 9 November 2012 at 20:38
http://davepeck.org/linq-collection-performance/