Wednesday, 29 October 2014

Tips to improve Entity Framework Performance


 Version Support : EF 4.1, 5.0+
Keywords : performance considerations for entity framework 5, tips to improve linq to entity query performance, tips to improve linq query performance
LINQ to Entity is a great ORM for querying and managing database. It offers a lot of things, so it is mandatory to know about performance of it. These are right up to a certain point as LINQ comes with its own penalties. There are some tips and tricks that we should keep in mind while desiging and query database using entity framework ORM. Here is a list of some tips that I would like to share with you.
  1. Avoid to put all the DB Objects into One Single Entity Model

    Entity Model specifies a single unit of work, not all our database. If we have many database objects that are not connected to one another or these(log tables, objects used by batch processes,etc.) are not used at all. Hence these objects are consuming space in the memory and cause performance degrades. So try to make separate entity models of related database objects.
  2. Disable change tracking for entity if not needed

    Whenever you retrieve the data only for reading purpose, not for modification then there is no need of object tracking. So disable object tracking by using MergeOption as below:
    1. NorthwindDataContext context = new NorthwindDataContext() context.tblCities.MergeOption = MergeOption.NoTracking;
    This option allow us to turn off the object cache and unnecessary identity management of the objects.
  3. Use Pre-Generating Views to reduce response time for first request

    When the object of ObjectContext is created first time in the application, the entity framework creates a set of classes that is required to access the database. This set of classes is called view and if your data model is large then creating the view may delay the web application response to the first request for a page. We can reduce this response time by creating view at compile time by using T4 template or EdmGen.exe command-line tool.
  4. Avoid fetching all the fields if not required

    Avoid fetching not required fields from the database. Suppose I have table of Customer with 20 fields and I am interested only in three fields - CustomerID, Name, Address then fetch only these three fields instead of fetching all the fields of the Customer table.
    1. //Bad Practice
    2. var customer =
    3. (from cust in dataContext.Customers
    4. select cust).ToList();
    5. //Good Practice
    6. var customer =
    7. (from cust in dataContext.Customers
    8. select new {
    9. customer. CustomerID,
    10. customer.Name,
    11. customer.Address
    12. }). ToList ();
  5. Choose appropriate Collection for data manipulation

    In linq we have Var, IEnumerable, IQueryable, IList type collection for data manipulation. Each collection has its importance and performance impact on the query, so beware of using all these collection for data manipulation. For learning difference among all these collection refer my articles IEnumerable VS IQueryableIEnumerable VS IList and Var VS IEnumerable.
  6. Use Compiled Query wherever needed

    Make a query to compiled query if it is frequently used to fetch records from the database. This query is slow in first time but after that it boost the performance significantly. We use Compile method of CompiledQuery class for making compiled query.
    Suppose you required to retrieve customers details again and again based on city then make this query to compiled query like as
    1. // create the entity object
    2. NorthwindEntities mobjentity = new NorthwindEntities();
    3. //Simple Query
    4. IQueryable lstCus = from customer in mobjentity.tblCustomers
    5. where customer.City == "Delhi"
    6. select customer;
    7. //Compiled Query
    8. Func> compiledQuery
    9. = CompiledQuery.Compile>(
    10. (ctx, city) =>from customer in ctx.Customers
    11. where customer.City == city
    12. select customer);
    In above query we are passing the string parameter city for filtering the records. For more about anonymous method.
  7. Retrieve only required number of records

    When we are binding data to grid or doing paging, retrieve only required no of records to improve performance. This can achieved by using Take,While and Skip methods.
    1. // create the entity object
    2. NorthwindEntities mobjentity = new NorthwindEntities();
    3. int pageSize=10,startingPageIndex=2;
    4. List lstCus = mobjentity.tblCustomers.Take(pageSize)
    5. .Skip(startingPageIndex * pageSize)
    6. .ToList();
  8. Avoid using Contains

    In LINQ, we use contains method for checking existence. It is converted to "WHERE IN" in SQL which cause performance degrades.
  9. Avoid using Views

    Views degrade the LINQ query performance costly. These are slow in performance and impact the performance greatly. So avoid using views in LINQ to Entities.
  10. Debug and Optimize LINQ Query

    If you want to debug and optimize your query then LINQ Pad is a great tool for this purpose. I am a big fan of LINQ Pad. It is very useful for query construction, debugging and optimization.
    1. IQueryable lstCus = from customer in mobjentity.tblCustomers
    2. where customer.City == "Delhi"
    3. select customer;
    4. lstCus.Dump();
    Dump method of LINQ Pad give the result of above query in the result window.

Using Grouping instead of Distinct in Entity Framework to Optimize Performance

On a number of pages on this web site I display a list of articles. For example, the home page shows a list of all articles from all categories. The FAQs page shows articles from the FAQs category only. Internally, the content items in the database (sorted in the Content table) are linked through a junction table to the Roles table. This enables me to publish content targeting specific roles. Anonymous users see content for the Anonymous role only, while, for example, I as the site administrator can see content that is assigned to the Administrators role only. I use this in order to add content to my site which is not public yet.

Using Distinct Can Kill Performance

To query the content I use a JOIN between the content and the roles, like this:
var result = from c in _repositoryWrapper.ContentItems.GetActiveForUser(roles)
             from r in c.Roles
             where roles.Contains(r.Description) || r.Description == "Visitors"
             select c;
This works fine, except when a content item is assigned to multiple roles. This happens for example when an item is assigned to the Administrators role and Visitors roles at the same time, which is the case when I put an article live. When this happens, the article is listed twice. This leads to a few problems. First of all, the Count is off, returning more items that really exist in the database which screws up the paging system. Secondly, it would display the items multiple times in the content list.
My first take at solving this problem was with the Distinct method which is an extension on IEnumerable. The good thing and the bad thing with the parameterless version of Distinct() is that it carries out the distinct operation at the database level. That's a good thing as it improves performance, minimizing the number of records that are being brought into the application. It's a bad thing for me in this case as SQL Server doesn't support distinct on text or next columns, which happens to be the datatype for some of my columns (the Summary and Body of a ContentItem to be precise). If you try it, you get an error such as the following:
{System.Data.SqlClient.SqlException (0x80131904): The text data type cannot be selected as DISTINCT because it is not comparable.
My next attempt was using an overload of Distinct that accepts an IEqualityComparer<T>. Creating the comparer class is pretty simple:
public class ContentComparer : IEqualityComparer<Content>
{
  public bool Equals(Content x, Content y)
  {
    return x.Id == y.Id;
  }

  public int GetHashCode(Content obj)
  {
    return obj.Id.GetHashCode();
  }
}
Simple and effective. Or so I thought. I used the comparer as follows:
var result = (from c in _repositoryWrapper.ContentItems.GetActiveForUser(roles)
              from r in c.Roles
              where roles.Contains(r.Description) || r.Description == "Visitors"
              select c).Distinct(new ContentComparer());
However, when you run this, you're greeted with the following exception:
"LINQ to Entities does not recognize the method 'System.Linq.IQueryable`1[Spaanjaars.Imar.Model.Content] Distinct[Content](System.Linq.IQueryable`1[Spaanjaars.Imar.Model.Content], System.Collections.Generic.IEqualityComparer`1[Spaanjaars.Imar.Model.Content..."
Ouch. Simple: yes. Effective: no. Distinct with the comparer can't be translated to SQL by EF. The fix seems easy. Simply call ToList on the EF result, and call Distinct on the returned list:
var result = (from c in _repositoryWrapper.ContentItems.GetActiveForUser(roles)
              from r in c.Roles
              where roles.Contains(r.Description) || r.Description == "Visitors"
              select c).ToList().Distinct(new ContentComparer());
When you now run this code, you get a nice, unique list of content items in the database for the specified roles.
However, this code has introduced one major problem: it has moved the Distinct logic from the database to .NET. In order to successfully create a distinct list of all records, EF has to pull all records from the database and create the distinct list within your application. This is not that bad when you need to get all content anyway and there aren't that many duplicates, but it will surely kill performance when you have lots of records and want to query just a few; usingSkip() and Take() for example. When I added Take(10) to my query:
var result = (from c in _repositoryWrapper.ContentItems.GetActiveForUser(roles)
              from r in c.Roles
              where roles.Contains(r.Description) || r.Description == "Visitors"
              select c).ToList().Distinct(new ContentComparer()).Take(10);
I got exactly the 10 items I was expecting. However an inspection of the SQL code that gets executed using the SQL Server Profiler revealed that I am still pulling all records from the database; hundreds in my case. Clearly, this is undesired behavior. With just a few hundred records, this isn't too bad (but still completely unnecessary) but it will bring down your application if you're querying tables with thousands or millions of records this way.
The fix is pretty simple though: use grouping. Grouping is another LINQ and EF supported concept and fully translates to SQL. In my case, it was as easy as this:
var distinctResult = from c in result
             group c by c.Id into uniqueIds
             select uniqueIds.FirstOrDefault();
This groups each content item in a unique group for the ID of the content item. The FirstOrDefault method then ensures I am only getting a unique record per ID. The result variable is the result from the very first LINQ query with the JOIN from the beginning of this article.
The cool thing is that the result variable is still completely composable and results in proper and optimized SQL being sent to the database. With this code, getting only 10 records using Take(10) now only retrieves those 10 items from the database.
Thanks to Andreas Niedermair and Jon Skeet for originally bringing up and answering this question onstackoverflow.com.

2 comments:

Angular Tutorial (Update to Angular 7)

As Angular 7 has just been released a few days ago. This tutorial is updated to show you how to create an Angular 7 project and the new fe...