Saturday 3 November 2012

linq vs stored procs


10 Tips to Improve your LINQ to SQL Application Performance
Hey there, back again. In my first post about LINQ I tried to provide a brief(okay, bit detailed) introduction for those who want to get involved with LINQ to SQL. In that post I promised to write about a basic integration of WCF and LINQ to SQL working together, but this is not that post.

Since LINQ to SQL is a code generator and an ORM and it offers a lot of things, it is normal to be suspicious about performance of it. These are right up to a certain point as LINQ comes with its own penalties. But there are several benchmarks showing that DLINQ brings us up to %93 of the ADO.NET SQL DataReader performance if optimizations are done correctly.

Hence I summed up 10 important points for me that needs to be considered during tuning your LINQ to SQL’s data retrieval and data modifying process:

1 – Turn off ObjectTrackingEnabled Property of Data Context If Not Necessary

If you are trying only to retrieve data as read only, and not modifying anything, you don’t need object tracking. So turn it off using it like in the example below:

using (NorthwindDataContext context = new NorthwindDataContext())
{
  context.ObjectTrackingEnabled = false;
}



This will allow you to turn off the unnecessary identity management of the objects – hence Data Context will not have to store them because it will be sure that there will be no change statements to generate.

2 – Do NOT Dump All Your DB Objects into One Single DataContext

DataContext represents a single unit of work, not all your database. If you have several database objects that are not connected, or they are not used at all (log tables, objects used by batch processes,etc..). These objects just unnecessarily consume space in the memory hence increasing the identity management and object tracking costs in CUD engine of the DataContext.

Instead think of separating your workspace into several DataContexts where each one represents a single unit of work associated with it. You can still configure them to use the same connection via its constructors to not to loose the benefit of connection pooling.

3 – Use CompiledQuery Wherever Needed

When creating and executing your query, there are several steps for generating the appropriate SQL from the expression, just to name some important of them:

Create expression tree

Convert it to SQL

Run the query

Retrieve the data

Convert it to the objects

As you may notice, when you are using the same query over and over, hence first and second steps are just wasting time. This is where this tiny class in System.Data.Linq namespace achieves a lot. With CompiledQuery, you compile your query once and store it somewhere for later usage. This is achieved by static CompiledQuery.Compile method.

Below is a Code Snippet for an example usage:

Func<NorthwindDataContext, IEnumerable<Category>> func =
   CompiledQuery.Compile<NorthwindDataContext, IEnumerable<Category>>
   ((NorthwindDataContext context) => context.Categories.
      Where<Category>(cat => cat.Products.Count > 5));



And now, “func” is my compiled query. It will only be compiled once when it is first run. We can now store it in a static utility class as follows :

/// <summary>
/// Utility class to store compiled queries
/// </summary>
public static class QueriesUtility
{
  /// <summary>
  /// Gets the query that returns categories with more than five products.
  /// </summary>
  /// <value>The query containing categories with more than five products.</value>
  public static Func<NorthwindDataContext, IEnumerable<Category>>
    GetCategoriesWithMoreThanFiveProducts
    {
      get
      {
        Func<NorthwindDataContext, IEnumerable<Category>> func =
          CompiledQuery.Compile<NorthwindDataContext, IEnumerable<Category>>
          ((NorthwindDataContext context) => context.Categories.
            Where<Category>(cat => cat.Products.Count > 5));
        return func;
      }
    }
}



And we can use this compiled query (since it is now a nothing but a strongly typed function for us) very easily as follows:

using (NorthwindDataContext context = new NorthwindDataContext())
{
  QueriesUtility.GetCategoriesWithMoreThanFiveProducts(context);
}



Storing and using it in this way also reduces the cost of doing a virtual call that’s done each time you access the collection – actually it is decreased to 1 call. If you don’t call the query don’t worry about compilation too, since it will be compiled whenever the query is first executed.

4 – Filter Data Down to What You Need Using DataLoadOptions.AssociateWith

When we retrieve data with Load or LoadWith we are assuming that we want to retrieve all the associated data those are bound with the primary key (and object id). But in most cases we likely need additional filtering to this. Here is where DataLoadOptions.AssociateWith generic method comes very handy. This method takes the criteria to load the data as a parameter and applies it to the query – so you get only the data that you need.

The following code below associates and retrieves the categories only with continuing products:

using (NorthwindDataContext context = new NorthwindDataContext())
{
  DataLoadOptions options = new DataLoadOptions();
  options.AssociateWith<Category>(cat=> cat.Products.Where<Product>(prod => !prod.Discontinued));
  context.LoadOptions = options;
}



5 – Turn Optimistic Concurrency Off Unless You Need It

LINQ to SQL comes with out of the box Optimistic Concurrency support with SQL timestamp columns which are mapped to Binary type. You can turn this feature on and off in both mapping file and attributes for the properties. If your application can afford running on “last update wins” basis, then doing an extra update check is just a waste.

UpdateCheck.Never is used to turn optimistic concurrency off in LINQ to SQL.

Here is an example of turning optimistic concurrency off implemented as attribute level mapping:

[Column(Storage=“_Description”, DbType=“NText”,
            UpdateCheck=UpdateCheck.Never)]
public string Description
{
  get
  {
    return this._Description;
  }
  set
  {
    if ((this._Description != value))
    {
      this.OnDescriptionChanging(value);
      this.SendPropertyChanging();
      this._Description = value;
      this.SendPropertyChanged(“Description”);
      this.OnDescriptionChanged();
    }
  }
}



6 – Constantly Monitor Queries Generated by the DataContext and Analyze the Data You Retrieve

As your query is generated on the fly, there is this possibility that you may not be aware of additional columns or extra data that is retrieved behind the scenes. Use Data Context’s Log property to be able to see what SQL are being run by the Data Context. An example is as follows:

using (NorthwindDataContext context = new NorthwindDataContext())
{
  context.Log = Console.Out;
}



Using this snippet while debugging you can see the generated SQL statements in the Output Window in Visual Studio and spot performance leaks by analyzing them. Don’t forget to comment that line out for production systems as it may create a bit of an overhead. (Wouldn’t it be great if this was configurable in the config file?)

To see your DLINQ expressions in a SQL statement manner one can use SQL Query Visualizer which needs to be installed separately from Visual Studio 2008.

7 – Avoid Unnecessary Attaches to Tables in the Context


Since Object Tracking is a great mechanism, nothing comes for free. When you  Attach an object to your context, you mean that this object was disconnected for a while and now you now want to get it back in the game. DataContext then marks it as an object that potentially will change - and this is just fine when you really intent to do that.

But there might be some circumstances that aren’t very obvious, and may lead you to attach objects that arent changed. One of such cases is doing an AttachAll for collections and not checking if the object is changed or not. For a better performance, you should check that if you are attaching ONLY the objects in the collection those are changed.

I will provide a sample code for this soon.

8 – Be Careful of Entity Identity Management Overhead

During working with a non-read only context, the objects are still being tracked – so be aware that non intuitive scenarios this can cause while you proceed. Consider the following DLINQ code:

using (NorthwindDataContext context = new NorthwindDataContext())
{
  var a = from c in context.Categories
  select c;
}



Very plain, basic DLINQ isn’t it? That’s true; there doesn’t seem any bad thing in the above code. Now let’s see the code below:

using (NorthwindDataContext context = new NorthwindDataContext())
{
  var a = from c in context.Categories
  select new Category
  {
    CategoryID = c.CategoryID,
    CategoryName = c.CategoryName,
    Description = c.Description
  };
}



The intuition is to expect that the second query will work slower than the first one, which is WRONG. It is actually much faster than the first one.

The reason for this is in the first query, for each row the objects need to be stored, since there is a possibility that you still can change them. But in the 2nd one, you are throwing that object away and creating a new one, which is more efficient.

9 – Retrieve Only the Number of Records You Need

When you are binding to a data grid, and doing paging – consider the easy to use methods that LINQ to SQL provides. These are mainly Take and Skip methods. The code snippet involves a method which retrieves enough products for a ListView with paging enabled:

/// <summary>
/// Gets the products page by page.
/// </summary>
/// <param name=”startingPageIndex”>Index of the starting page.</param>
/// <param name=”pageSize”>Size of the page.</param>
/// <returns>The list of products in the specified page</returns>
private IList<Product> GetProducts(int startingPageIndex, int pageSize)
{
  using (NorthwindDataContext context = new NorthwindDataContext())
  {
    return context.Products
           .Take<Product>(pageSize)
           .Skip<Product>(startingPageIndex * pageSize)
           .ToList<Product>();
   }
}



10 – Don’t Misuse CompiledQuery

I can hear you saying “What? Are you kiddin’ me? How can such a class like this be misused?”

Well, as it applies to all optimization LINQ to SQL is no exception:

“Premature optimization is root all of evil” – Donald Knuth

If you are using CompiledQuery make sure that you are using it more than once as it is more costly than normal querying for the first time. But why?

That’s because the resulting function coming as a CompiledQuery is an object, having the SQL statement and the delegate to apply it. It is not compiled like the way regular expressions are compiled. And your delegate has the ability to replace the variables (or parameters) in the resulting query.

That’s the end folks, I hope you’ll enjoy these tips while programming with LINQ to SQL. Any comments or questions via sidarok at sidarok dot com or here to this post are welcome.

1 comment:


  1. 110
    down vote
    accepted
    Some advantages of LINQ over sprocs:

    Type safety: I think we all understand this.
    Abstraction: This is especially true with LINQ-to-Entities. This abstraction also allows the framework to add additional improvements that you can easily take advantage of. PLINQ is an example of adding multi-threading support to LINQ. Code changes are minimal to add this support. It would be MUCH harder to do this data access code that simply calls sprocs.
    Debugging support: I can use any .NET debugger to debug the queries. With sprocs, you cannot easily debug the SQL and that experience is largely tied to your database vendor (MS SQL Server provides a query analyzer, but often that isn't enough).
    Vendor agnostic: LINQ works with lots of databases and the number of supported databases will only increase. Sprocs are not always portable between databases, either because of varying syntax or feature support (if the database supports sprocs at all).
    Deployment: Others have mentioned this already, but it's easier to deploy a single assembly than to deploy a set of sprocs. This also ties in with #4.
    Easier: You don't have to learn T-SQL to do data access, nor do you have to learn the data access API (e.g. ADO.NET) necessary for calling the sprocs. This is related to #3 and #4.
    Some disadvantages of LINQ vs sprocs:

    Network traffic: sprocs need only serialize sproc-name and argument data over the wire while LINQ sends the entire query. This can get really bad if the queries are very complex. However, LINQ's abstraction allows Microsoft to improve this over time.
    Less flexible: Sprocs can take full advantage of a database's featureset. LINQ tends to be more generic in it's support. This is common in any kind of language abstraction (e.g. C# vs assembler).
    Recompiling: If you need to make changes to the way you do data access, you need to recompile, version, and redeploy your assembly. Sprocs can sometimes allow a DBA to tune the data access routine without a need to redeploy anything.
    Security and manageability are something that people argue about too.

    Security: For example, you can protect your sensitive data by restricting access to the tables directly, and put ACLs on the sprocs. With LINQ, however, you can still restrict direct access to tables and instead put ACLs on updatable table views to achieve a similar end (assuming your database supports updatable views).
    Manageability: Using views also gives you the advantage of shielding your application non-breaking from schema changes (like table normalization). You can update the view without requiring your data access code to change.

    ReplyDelete

Angular Tutorial (Update to Angular 7)

As Angular 7 has just been released a few days ago. This tutorial is updated to show you how to create an Angular 7 project and the new fe...