Sunday 24 April 2016

Basics for performance improvement in Entity Framework using LINQ

LINQ to Entity is a great ORM for querying and managing database. It offers a lot of things, so it is mandatory to know about performance of it. These are right up to a certain point as LINQ comes with its own penalties. There are some tips and tricks that we should keep in mind while designing and query database using entity framework ORM. Here is a list of some tips that I would like to share with you.
  1. Avoid to put all the DB Objects into One Single Entity Model
    Entity Model specifies a single unit of work, not all our database. If we have many database objects that are not connected to one another or these(log tables, objects used by batch processes etc.) are not used at all. Hence these objects are consuming space in the memory and cause performance degrades. So try to make separate entity models of related database objects.
  2. Disable change tracking for entity if not needed
    Whenever you retrieve the data only for reading purpose, not for modification then there is no need of object tracking. So disable object tracking by using MergeOption as below:
    NorthwindDataContext context = new NorthwindDataContext();
    context.tblCities.MergeOption = MergeOption.NoTracking;
    This option allows us to turn off the object cache and unnecessary identity management of the objects.
  3. Use Pre-Generating Views to reduce response time for first request
    When the object of ObjectContext is created first time in the application, the entity framework creates a set of classes that is required to access the database. This set of classes is called view and if your data model is large then creating the view may delay the web application response to the first request for a page. We can reduce this response time by creating view at compile time by using T4 template or EdmGen.exe command-line tool.
  4. Avoid fetching all the fields if not required
    Avoid fetching not required fields from the database. Suppose I have table of Customer with 20 fields and I am interested only in three fields – CustomerID, Name, Address then fetch only these three fields instead of fetching all the fields of the Customer table.
    //Bad Practice
    var customer =
    (from cust in dataContext.Customers
    select cust).ToList();
    //Good Practice
    var customerLite =
    (from cust in dataContext.Customers
    select new {
    customer. CustomerID,
    customer.Name,
    customer.Address
    }). ToList ();
  5. Choose appropriate Collection for data manipulation
    In linq we have Var, IEnumerable, IQueryable, IList type collection for data manipulation. Each collection has its importance and performance impact on the query, so beware of using all these collection for data manipulation.
  6. Use Compiled Query wherever needed
    Make a query to compiled query if it is frequently used to fetch records from the database. This query is slow in first time but after that it boost the performance significantly. We use Compile method of CompiledQuery class for making compiled query.
    Suppose you required to retrieve customers details again and again based on city then make this query to compiled query like as
    // create the entity object
    NorthwindEntities mobjentity = new NorthwindEntities();
    //Simple Query
    IQueryable lstCus = from customer in mobjentity.tblCustomers
    where customer.City == “Delhi”
    select customer;
    //Compiled Query
    Func> compiledQuery
    = CompiledQuery.Compile>(
    (ctx, city) =>from customer in ctx.Customers
    where customer.City == city
    select customer);
    In above query we are passing the string parameter city for filtering the records.
  7. Retrieve only required number of records
    When we are binding data to grid or doing paging, retrieve only required no of records to improve performance. This can achieved by using Take,While and Skip methods.
    // create the entity object
    NorthwindEntities mobjentity = new NorthwindEntities();
    int pageSize=10,startingPageIndex=2;
    List lstCus = mobjentity.tblCustomers.Take(pageSize)
    .Skip(startingPageIndex * pageSize)
    .ToList();
  8. Avoid using Contains
    In LINQ, we use contains method for checking existence. It is converted to “WHERE IN” in SQL which cause performance degrades.
  9. Avoid using Views
    Views degrade the LINQ query performance costly. These are slow in performance and impact the performance greatly. So avoid using views in LINQ to Entities.
  10. Debug and Optimize LINQ Query
    If you want to debug and optimize your query then LINQ Pad is a great tool for this purpose. I am a big fan of LINQ Pad. It is very useful for query construction, debugging and optimization.
    IQueryable lstCus = from customer in mobjentity.tblCustomers
    where customer.City == “Delhi”
    select customer;
    lstCus.Dump();
    Dump method of LINQ Pad give the result of above query in the result window.
1.      Remove all for each loops for accessing and manipulating data using entity framework. Instead use LINQ to SQL/ENTITIES.
a.       LINQ though adds slight additional overhead over for each loop when dealing with very large amount of data.
b.      But LINQ to SQL or Entity framework has a better performance as LINQ query gets directly converted to T-SQL.
c.       LINQ provides shorter better readable and maintainable code.
d.      Basic facts –
                                                              i.      LINQ gets executed when it gets enumerated by foreach loop.
                                                            ii.      LINQ gets executed by collection operations like ToArray(), ToList(), ToDictionary().


2.      Disable change tracking for entity if not needed.
a.       Whenever we are only reading data (no modification involved) then we should disable tracking of changes to entity.
b.      This prevents unnecessary entity caching and management.
unitResponse = (from unitRow in entities.Units.AsNoTracking()
                                    where unitRow.UnitID == unitId
                                    select new PocoEntities.Unit()
                                    {
                                        ID = unitRow.UnitID,
                                        Level = unitRow.Level,
                                 Theme = unitRow.Theme
  });

3.      Avoid fetching fields that are not required.
a.       For example I want to get only unit id, level and theme fields of the Unit table from DB.
b.      So I fetch only those fields in my LINQ query instead of selecting the entire unitRow variable in the query.
unitResponse = (from unitRow in entities.Units.AsNoTracking()
                                    where unitRow.UnitID == unitId
                                    select new PocoEntities.Unit()
                                    {
                                        ID = unitRow.UnitID,
                                        Level = unitRow.Level,
                                 Theme = unitRow.Theme
                             });
4.      Try to write LINQ in such a way that there is a single round trip while executing the query.
a.       Code with multiple round trips.
foreach (Customer c in Customers)
   foreach (Purchase p in c.Purchases)    // Another SQL round-trip
          Console.WriteLine (c.Name + " spent " + p.Price);
b.      Code with single round trip.
from c in Customers
select
   from p in c.Purchases
       select new { c.Name, p.Price }

5.      Avoid using Contains.
a.       When using LINQ to SQL avoid using contains.
b.      This gets converted to WHERE IN clause in SQL. This leads to performance degrades with large data sets.
c.       In WHERE IN sub query gets executed first, the result is indexed and made distinct. The output is then joined to first part of the query.
d.      But at the same time we can use Contains in LINQ to entities.


6.      For filtering data use DataLoadOptions.
a.       DataLoadOptions in LINQ allows immediate loading and filtering of related data. It allows you load related objects so this removes the need for firing a sub query every time you ask for related objects.
b.      If you have written code like below then each time inner loop fires a query on DB. This in turn decreases performance.
foreach (Customer c in Customers)
{
   Console.WriteLine (c.ID + " " + c.Name);
   foreach (Purchase p in c.Purchases)
          Console.WriteLine ("  - purchased a " + p.Description);
}
c.       Using DataLoadOptions load related objects.
var options = new DataLoadOptions();
options.AssociateWith <Customer> (c => c.Purchases.Where (p => p.Price > 1000));
LoadOptions = options;

foreach (Customer c in Customers)
{
   Console.WriteLine (c.ID + " " + c.Name);
   foreach (Purchase p in c.Purchases)
          Console.WriteLine ("  - purchased a " + p.Description);
}

7.      Eager loading means immediate loading of all the related objects when you query for an object. This can be done in two ways:-
a.       DataLoadOptions (LINQ to SQL) – This creates a single SQL query for querying two related tables.
var options = new DataLoadOptions();
options.LoadWith <Customer> (c => c.Purchases);
LoadOptions = options;

foreach (Customer c in Customers)
{
   Console.WriteLine (c.ID + " " + c.Name);
   foreach (Purchase p in c.Purchases)
          Console.WriteLine ("  - purchased a " + p.Description);
}
b.      Using include method (LINQ to Entity framework) – This creates a single query for related objects by using a SQL join.
List<Customer> customers = db.Customer.Include("Purchases").ToList();


8.      In order to debug and optimize LINQ queries I used the below two mentioned tools:-
a.       LINQPad – This helps us execute and debug LINQ queries. It also helps us in converting the LINQ queries to SQL queries. This way we can validate the performance of our LINQ query.
b.      Linqer – This helps us in converting our SQL queries to LINQ. So we can write our own optimized SQL queries and convert them to LINQ.

9.      Areas to explore :-
a.       Use Pre-Generating Views to reduce response time for first request.

b.      Use Compiled Query wherever needed.


How to Improve Performance of Entity Framework Query ?


What is an Entity Framework ( EF ) ?
  • Entity Framework is an Object Relational Mapper (ORM)
  • It basically generates business objects and entities according to the database tables
  • Performing basic CRUD (Create, Read, Update, Delete) operations
  • Easily managing "1 to 1", "1 to many", and "many to many" relationships
  • Ability to have Inheritance relationships between entities

Entity Framework Architecture as below :


Entity Framework Architecture

Cold vs. Warm Query Execution

What is a Cold Query Execution ?
  • The very first time any query is made against a given model ,
  • The Entity Framework does a lot of work behind the scenes to load and validate the model
  • We frequently refer to this first query as a "Cold" query

First Query Execution ( Cold query ) Performance is Like below :

First Query Execution ( Cold query )

Second Query Execution (Warm query ) Performance is Like below :

Second Query Execution (Warm query )

How to Improve Performance of First Query Execution ?
  • For that We have to Remove Cost of View Generation

What is View Generation ?
  • In order to understand what view generation is, we must first understand what “Mapping Views”are :

            What are Mapping Views ?

                     # Mapping Views are executable representations of the transformations
                         specified in the  mapping for each entity set and association

            There are 2 types of Mapping Views exist
                               
                1. Query Views
                         - These represent the Transformation necessary to go from the
                            database schema to the  conceptual schema

               2. Update Views
                        - These represent the Transformation necessary to go from the
                           conceptual schema to the database schema
        
  • The process of computing these views based on the specification of the mapping is what we call View Generation
  • View Generation can either take place dynamically when a model is loaded (run timeor at build time (compile time)
  • Default is a Run Time (that's why first query is very slow)
  • When Views are Generated, they are also Validated
  • From a performance standpoint, the Vast Majority of the cost of View Generation is actually theValidation of the Views
  • Which ensures that the connections between the Entities make sense and have the correct Cardinality for all the supported operations

What are the Factors that Affect View Generation Performance ?
  • Model size -  Referring to the number of entities and the amount of associations between these entities
  • Model complexity - Specifically inheritance involving a large number of types
  • Using Independent Associations, instead of Foreign Key Associations (will explain this in a separate blog post)

How to Use Pre-Generated Views to Decrease Model Load Time ?
  • We can use T4 Templates for Create Pre-Generated Views
  • Then View Generation is happened in Compile Time

What is a T4 ?
  • Text Template Transformation Toolkit
  • T4 is a general-purpose Templating Engine you can use to generate C# code, Visual Basic code, XML, HTML, or text of any kind

How to Use T4 For Generate Views ?

Here I am using VS 2010,C# and EF 4.1 with Code First Approach as Technologies

Step 1 : First you need to Download the Templates
              # Right click on your project's Models Folder (where, DbContext derived class exist)
              # Select Add -> New Item Like below
       
New Item


 Step 2 : # In the Add New Item dialog go to “Online Templates”
              # Then Give "EF Views" as Search Condition
              # After that Select "EF CodeFirst View Generation T4 Template for C#" Like below


EF CodeFirst View Generation T4 Template for C#


Step 3 : # In the above Step 2 Change the Name of the file at the bottom to {Context}.Views.tt
             # Where {Context} is the Name of the Class Derived from DbContext you want to
                 create Pre-Generated Views for. 


In my scenario it's MusicStoreEntities.Views.tt

B'cos my DbContext derived class in Models Folder as below

using System.Data.Entity;

namespace MvcMusicStore.Models
{
    public class MusicStoreEntities : DbContext
    {
        public DbSet<Album> Albums { getset; }
        public DbSet<Genre> Genres { getset; }
        public DbSet<Artist> Artists { getset; }
        public DbSet<Cart> Carts { getset; }
        public DbSet<Order> Orders { getset; }
        public DbSet<OrderDetail> OrderDetails { getset; }
    }
}

Step 4 : Install Template for First Time


Install Template for First Time

Step 5 : Then Finish the Template Installation by clicking OK Button

OK


What If You have already installed "EF CodeFirst View Generation T4 Template for C#" ?

  • Then You can Bypass Step 2,Step 4 & Step 5's Screens with below one
Bypass Step 2,Step 4 & Step 5's Screens


What is a Final Look of  T4 - Generated Views Inside a Project ?

Final Look of  T4 - Generated Views


Conclusion
  • By Creating Pre-Generated Views You can get More Than 4 ,5 Times Performance Boost for Your First Query Execution
  • This is a Big Difference when You're considering a Large Domain Models
  • So Try This and Enjoy the Performance Boost

No comments:

Post a Comment

Angular Tutorial (Update to Angular 7)

As Angular 7 has just been released a few days ago. This tutorial is updated to show you how to create an Angular 7 project and the new fe...