Friday, 9 November 2012

Increase LINQ Query Performance


Increase LINQ Query Performance
Jared Parsons
Code download available at: BasicInstincts2008_08.exe (2,291 KB)
Browse the Code Online
LINQ is a powerful tool enabling quick filtering data based on a standard query language. It can tear through a structured set of data using a simple and straightforward syntax. This makes it ideal for filtering data that is to be plugged into a data-binding scenario.
There is a catch, though. While it is true that LINQ is powerful and very efficient, large sets of data can still cause unexpected performance problems. This column will focus on a technique to achieve maximum performance from LINQ queries on large sets of data in order to create a responsive UI.

The Problem
In a recent project, I devised a word wheel search UI for a set of XML data using a LINQ-to-XML query for filtering the data. A word wheel search allows users to do a word-based search over a set of data. It will display results that contain a substring matching the text the user has typed in. The results are updated as the user types, providing immediate feedback.
I designed and implemented the word wheel UI using generated XML files. Users could type into a textbox, and the results would be narrowed to those elements whose name contained a substring of the typed-in text. The UI met the design specification and responded efficiently in my test environment. When the rest of the project was completed, my code began producing production XML files, I checked in the new code, and went home. Unfortunately, the next morning I found that QA had gifted me with many UI bugs. The UI had become unresponsive to the point of being unusable.
Figuring I had a bottleneck in one of my query conditions, I thought this would be an easy fix. But after profiling the UI and optimizing my query, I found that I couldn't improve the performance to an acceptable level. The efficiency of the query wasn't the issue; I was being blocked by the sheer amount of data. The test XML files only consisted of 5,000 to 10,000 elements, yet the production XML files had around 40 times as much data. Even with an ultra-fast query, searching several hundred thousand rows with no built-in indexing gets very sluggish.
I needed to find a way to speed up the search process, so I decided to take advantage of some special features of LINQ to improve performance: deferred execution and lazy evaluations. With these features, LINQ queries are not executed at the point at which they are defined. Instead, they are executed as you enumerate through the results. This gives a great deal of flexibility in how you use query results to efficiently scroll through data.

Building a Demo Application
In order to demonstrate my use of deferred execution and lazy evaluations, I'm going to walk you through building a simplified version of my word wheel search using a set of XML files and LINQ-to-XML queries. To make sure that you fully utilize the XML integration for Visual Basic®, it is necessary for you to add an XML schema to the project for the documents that you plan to process. XML schemas are similar to Microsoft® .NET Framework metadata in that they provide an outline for how the XML must look. This allows the Visual Studio® IDE to provide IntelliSense® for LINQ-to-XML queries.
Getting Visual Studio to provide IntelliSense for a schema is straightforward. Visual Basic will recognize all XML schemas that are in the current project. Simply add the schema to your project and then add an import statement for the XML namespace in any file where you will be defining a LINQ-to-XML query:
Imports <xmlns="http://tempuri.org/MsdnWheelSearch.xsd">
It is perfectly possible to use Xlinq without a schema. However, it's kind of like using Visual Studio with IntelliSense turned off. It's possible, but not any fun.
In the word wheel search application, the XML files contain information concerning assemblies, types, and members from the base class libraries (BCLs). Each XML element, or row, will contain an attribute named Id that uniquely identifies the row in its respective parent. This will be a generated number for the assembly rows. For all other rows, it will be the underlying metadata token. In addition, each row will also contain the unique ID of its parent. See Figure 1 for the XML schema of this data.
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="MsdnWheelSearch"
    targetNamespace="http://contoso.com/MsdnWheelSearch.xsd"
    elementFormDefault="qualified"
    xmlns="http://contoso.com/MsdnWheelSearch.xsd"
    xmlns:mstns="http://contoso.com/MsdnWheelSearch.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
>
  <xs:element name="Info">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Assemblies" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
               <xs:element name="Assembly" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:attribute name="Name" type="xs:string" />
                  <xs:attribute name="CodeBase" type="xs:string" />
                  <xs:attribute name="Id" type="xs:int" />
                  <xs:attribute name="FullName" type="xs:string" />
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="Types" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Type" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:attribute name="Name" type="xs:string" />
                  <xs:attribute name="Namespcae" type="xs:string" />
                  <xs:attribute name="Id" type="xs:int" />
                  <xs:attribute name="AssemblyId" type="xs:int" />
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="Members" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Member" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:attribute name="Name" type="xs:string" />
                  <xs:attribute name="Kind" type="xs:string" />
                  <xs:attribute name="Id" type="xs:int" />
                  <xs:attribute name="TypeId" type="xs:int" />
                  <xs:attribute name="AssemblyId" type="xs:int" />
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
You can download the sample project associated with this column to get several XML files that fit this schema. Included are both a small and a very large data set. You can use the UI to easily perform the same queries across both sets of data to get an idea of the relative performance impact they have.
The main query on which I am going to focus for this column returns all members where the name contains the specific substring currently entered by the user. It also will display the type information for the type that owns this member. This is accomplished by filtering the method rows based on a name match and then joining the results with all types having the identical Id and AssemblyId attributes, as shown here:
Private Function GetQueryMethodsOfName( _
  ByVal doc As XDocument, ByVal name As String) As IEnumerable
  Return From method In doc...<Member> _
    Where method.@Name.Contains(name) _
    Where 0 = String.Compare(method.@Kind, "Method") _
    Join type In doc...<Type> _
    On type.@Id Equals method.@TypeId _
    And type.@AssemblyId Equals method.@AssemblyId _
    Select Type = type.@Name, Method = method.@Name
End Function
The sample code has other queries that explore joining the rows to probe for data in the BCL, and I encourage you to try them out.

Basic User Interface
Figure 2 shows the example application. The primary interface for the user is the textbox. As the user types, the application filters the query results to include only rows that contain data matching the input. As the user types, the application constantly updates this value to produce a more refined result for the user.Figure 3 shows the code for my wheel search class.
Figure 2 Sample Word Wheel Application (Click the image for a larger view)
Public Class WheelSearch
  Private m_doc As XDocument = New XDocument()
  Private m_query As Query = Query.MethodsOfName

  Public Property Query() As Query
    Get
      Return m_query
    End Get
    Set(ByVal value As Query)
      m_query = value
      OnSearchTextChanged(Nothing, EventArgs.Empty)
    End Set
  End Property

  Public Property Xml() As XDocument
    Get
      Return m_doc
    End Get
    Set(ByVal value As XDocument)
      If value Is Nothing Then
        value = New XDocument
      End If
      m_doc = value
      OnSearchTextChanged(Nothing, EventArgs.Empty)
    End Set
  End Property

  Public Sub New()

    ' This call is required by the Windows Form Designer.
    InitializeComponent()

    ' Add any initialization after the InitializeComponent() call.
    m_dataGridView.AutoGenerateColumns = True
  End Sub

  Protected Overrides Sub OnLoad(ByVal e As System.EventArgs)
    MyBase.OnLoad(e)
    OnSearchTextChanged(Nothing, EventArgs.Empty)
  End Sub

#Region "Event Handlers"

  Private Sub OnSearchTextChanged(ByVal sender As Object, _
    ByVal e As EventArgs) Handles m_searchTb.TextChanged
    If m_doc Is Nothing Then
      Return
    End If
    m_dataGridView.IncrementalSearch = _
      QueryContainer.Search(m_query, m_doc, m_searchTb.Text)
  End Sub

#End Region

End Class
The results of the query will be shown in a DataGridView. And while the results of a LINQ query typically work great with data binding, it's necessary for you to convert the results to a type such as List(Of Object) when using the DataGridView. It will be the responsibility of each individual query to return an enumeration of a type that is easily plugged into the data-binding infrastructure.
Now you should load up the UI and test out the performance of the word wheel search. For the smaller data set, you will notice that the performance is snappy. With the larger data set, the performance becomes very sluggish and almost unusable.
The reason for the poor performance can be traced directly to the conversion of the LINQ query into a List(Of Object). This can be verified with a profiler. What happens under the hood here is the List(Of Object) must add each result from the query immediately, forcing the query to fully process each piece of data it encompasses. With the smaller data set, this occurs quickly enough not to be noticed. With the larger data set, this is just too much to process in between keystrokes.

Methods for Improving Performance
Assuming that the structure of the data cannot be modified, how can you increase the responsiveness of the UI? One option is to not search as the user types but instead wait until he explicitly invokes a search (for example, by pressing the Enter key or clicking a button). This solution, however, provides a much poorer user experience because it doesn't provide any immediate feedback. In addition, it doesn't solve the problem that any given search could take a significant amount of time.
Could you improve performance by capping the number of rows returned by the query? Sure, but this still detracts from the user experience because it's not showing all of the data. You'd have to design more UI to tell the user you didn't finish the search and provide a mechanism to do a full search. The full search would leave you right back where you started—with a slow experience.
So you need to show all the data, and you can't modify the structure. But do you have to show all of the data at once? What if you only give the user a small subset of the data at a time, and if a key press is detected you abandon the current search and start a new one? This would be close to other instant-search technologies, such as that found in Microsoft Outlook®.

Incremental Search
A great feature of LINQ is that it supports deferred execution. Queries are not executed at the point that they are written in code; they are merely defined at that point. They are not actually executed until the call to MoveNext on the returned enumerator. You can take advantage of this to search the data incrementally and only spend a minimal amount of time possibly blocking the UI.
I defined a class called IncrementalSearch that works on any object implementing IEnumerable. All LINQ queries implement this interface. IncrementalSearch supports the concept of stopping a search after a certain time period expires. The algorithm is to get the next value by calling MoveNext until the specified time period has expired. At that point, the data found is returned in a new Result object (see Figure 4).
Public Enum SearchStatus
  InProgress
  Completed
End Enum

Public Class Result
  Private m_allValues As IList
  Private m_incrementalValues As IList
  Private m_status As SearchStatus

  Public ReadOnly Property AllValues() As IList
    Get
      Return m_allValues
    End Get
  End Property

  Public ReadOnly Property IncrementalValues() As IList
    Get
      Return m_incrementalValues
    End Get
  End Property

  Public ReadOnly Property SearchStatus() As SearchStatus
    Get
      Return m_status
    End Get
  End Property

  Sub New(ByVal allValues As IList, _
    ByVal incrementalValues As IList, _
    ByVal status As SearchStatus)
    m_allValues = allValues
    m_incrementalValues = incrementalValues
    m_status = status
  End Sub
End Class

Public Class IncrementalSearch
  Private m_enumerable As IEnumerator
  Private m_allFound As New List(Of Object)
  Private m_status As SearchStatus = SearchStatus.InProgress
  Private m_delay As TimeSpan
  Public ReadOnly Property SearchStatus() As SearchStatus
    Get
      Return m_status
    End Get
  End Property

  Public Property Delay() As TimeSpan
    Get
      Return m_delay
    End Get
    Set(ByVal value As TimeSpan)
      m_delay = value
    End Set
  End Property

  Public Sub New(ByVal e As IEnumerable)
    m_delay = TimeSpan.FromSeconds(0.1)
    m_enumerable = e.GetEnumerator()
  End Sub

  Public Function Search() As Result
    If m_status = SearchStatus.Completed Then
      Return New Result(m_allFound, New List(Of Object), m_status)
    End If

    Dim found As New List(Of Object)
    Dim start = DateTime.Now
    Do
      If Not m_enumerable.MoveNext() Then
        m_enumerable = Nothing
        m_status = SearchStatus.Completed
        Exit Do
      End If

      found.Add(m_enumerable.Current)
    Loop Until DateTime.Now - start > m_delay

    m_allFound.AddRange(found)
    Return New Result(m_allFound, found, m_status)
  End Function

End Class
The Search method returns a Result object based on the data it finds, including the values found in the current call to Search, the list of all values found in all previous calls to Search, and the current status of the search. This means that the UI can now return data in chunks. The maximum time you can delay will be for getting the next element instead of the entire set of rows:
Dim found As New List(Of Object)
Dim start = DateTime.Now
Do
  If Not m_enumerable.MoveNext() Then
    m_enumerable = Nothing
    m_status = SearchStatus.Completed
    Exit Do
  End If

  found.Add(m_enumerable.Current)
Loop Until DateTime.Now - start > m_delay
The goal is for the UI to display the data through data binding and to provide an efficient search. Displaying the data should not cause the UI to hang. Return values are displayed in a custom Data­GridView using the incremental search pattern. The example implementation is called DataGridView­IncrementalSearch and derives from DataGridView (see Figure 5). This implementation works great with data binding and is designed to perform well with large sets of data.
Public Class DataGridViewIncrementalSearch
  Inherits DataGridView

  Private m_search As IncrementalSearch
  Private WithEvents m_timer As New Timer

  Public Property IncrementalSearch() As IncrementalSearch
    Get
      Return m_search
    End Get
    Set(ByVal value As IncrementalSearch)
      m_search = value
      m_timer.Start()
    End Set
  End Property

  Public Sub New()
    m_timer.Interval = CInt(TimeSpan.FromSeconds(0.2).TotalMilliseconds)
  End Sub

  Protected Overrides Sub Dispose(ByVal disposing As Boolean)
    If disposing Then
      m_timer.Dispose()
    End If
    MyBase.Dispose(disposing)
  End Sub

  Private Sub OnTick(ByVal sender As Object, ByVal e As EventArgs) _
    Handles m_timer.Tick
    m_timer.Stop()
    If m_search Is Nothing OrElse m_search.SearchStatus = _
      SearchStatus.Completed Then
      Return
    End If

    Dim result = m_search.Search()
    DataSource = Nothing
    DataSource = result.AllValues
    m_timer.Start()
  End Sub

End Class
There is one extra property of the same type called IncrementalSearch. Whenever this value is set, the control will call the Search method and continually update the displayed set of values. To achieve this, the control will have to call the Search method at regular intervals until the data source is exhausted.
A Windows® Forms timer is ideal for this scenario. It can be configured to call a method on the UI thread at a particular interval and can be enabled or disabled. When the IncrementalSearch property is updated, it enables the timer. The handler function will call Search. Once the search is complete, the timer will be disabled.
One issue with a Windows Forms timer is that the interval is calculated independently of how long it takes a handler to perform an action on the Tick event. That means that if the time it takes to execute the handler is longer than the interval, there will be another Tick event waiting to go off almost immediately. In the worst case scenario, these events will stack up and make the UI unresponsive. In order to work around this, you can disable the timer while calling Search and re-enable it if there is more data to process (see Figure 6). No explicit column selection functionality was created for this DataGridView. For this example, I'll let data binding choose the columns.
Private Sub OnTick(ByVal sender As Object, ByVal e As EventArgs) _
  Handles m_timer.Tick
  m_timer.Stop()
  If m_search Is Nothing _
    OrElse m_search.SearchStatus = SearchStatus.Completed Then
    ' Search is completed don't restart the timer
    Return
  End If

  ' Get the next update from the incremental search
  Dim result = m_search.Search()
  DataSource = Nothing
  DataSource = result.AllValues

  ' Restart the timer to perform another search
  m_timer.Start()
End Sub
One quirk in the code is the updating of the data source. The underlying IncrementalSearch object creates a List(Of Object) to hold the return values from the query. Every Result object contains a reference to this list instead of a copy. When updating the Data source property, the DataGridView will see that it is the same Object and will not refresh the displayed results. Setting the reference to Nothing in between will force it to reload the data.
While this is a nice, easy-to-use solution, it won't work in all scenarios. The underlying problem here is user delay. Because this is a single-threaded solution, there is no way to break the execution once you call MoveNext. Therefore, the longest you will delay the user from typing is now the DelayTime value for the IncrementalSearch class plus the longest time to find one element in the search.

Sorting and Lazy Evaluation
Most LINQ queries operate with lazy evaluation, which means that only a single element is processed each time MoveNext is called on the enumerator. The time it takes to find one element in the search is the same time it takes to find the next element. This allows for the result to be efficiently broken up into usable blocks of data:
Dim e = From method In doc...<Member> _
        Where method.@Name.Contains(name) _
        Where String.Compare(method.@Kind, "Method") = 0 _
        Select MethodName = method.@Name
' Returns when the first matching <Member> is found
Dim first = e.First()
This model will break down if the underlying LINQ query processes a substantial portion or all of the values during a single call to Move­Next. This form of query is known as eager evaluation. An example of this is the Order By operator, which returns the results in a sorted fashion. Because it is sorted, the first value cannot be returned until all parts of the query have finished executing. Otherwise it couldn't know which was the first element to return because the last element returned could be the first in terms of sorting. If you try the incremental search pattern with a query using eager evaluation, you are right back at the original problem:
Dim e = From method In doc...<Member> _
        Where method.@Name.Contains(name) _
        Where String.Compare(method.@Kind, "Method") = 0 _
        Order By method.@Name _
        Select MethodName = method.@Name
' The following line won't finish until the entire query is finished
' processing
Dim first = e.First()
Once again, the advantage here is in only dealing with a subset of the rows returned. Since the code is only displaying a subset of the data to the user, that is the only data that needs to be sorted at any given time. The underlying LINQ query can return the data unsorted, and then the code can sort it as more comes along.
It is easy to modify the IncrementalSearch class to sort the data at the end of each Search call. There are two approaches available to searching the data. The first approach is to take advantage of the built-in List(Of T).Sort method. This has an average complexity of O(N LogN) and is, generally speaking, fast enough for most applications:
Dim search As New List(Of String)(SearchForMethodNames)
search.Sort(StringComparer.OrdinalIgnoreCase)
The second method takes advantage of data being sorted at the start of the call. Instead of appending the data to the end of the existing data list, you could search for the proper place to insert and place the newly found data there. Finding the proper index is a relatively quick operation when compared to fully sorting the list, so at first this seems like the more logical choice. But there are two reasons why you cannot keep the list ordered in this manner.
First, while the IncrementalSearch class itself keeps the list sorted, there is nothing stopping the caller from reordering the values. So you cannot assume this is sorted. Second, there is an added cost to inserting into the middle of a List(Of T). Under the hood it is backed by an Array, so every element to the right of the value being inserted has to be shifted one element to the right. For large data structures, the time to do the shift is non-trivial and eventually makes this method of sorting much less efficient.
To use the collection's Sort method, you need a method for comparing the values. Ideally, you will be able to use existing comparers in the framework to sort well-known data types. However, the IncrementalSearch class is weakly typed, and most comparers are strongly typed to their object. You could not use String.Compare for a query that returned Strings, for instance, because they would be stored in a weakly typed object collection. To allow for searching and reuse of existing classes, you will need to make the Incremental­Search class more flexible by providing both a weakly and strongly typed implementation.

Making Incremental Search more Flexible
Right now the IncrementalSearch class is very flexible because it works against untyped collections. This limits the usefulness of the results, though, because everything is typed as object. LINQ queries typically return collections of anonymous types, and Intelli­Sense makes it much easier to use the types. So you must make IncrementalSearch strongly typed in order to allow for more processing on the results.
There is a problem with making IncrementalSearch itself generic. If IncrementalSearch is generic, then the control wrapping Incrementa­lSearch must be generic as well, or you'll have to create an instance of IncrementalSearch that is bound to a particular type. However, the Windows Forms designer does not support unbound generic controls, meaning that you would lose all designer support; hence, it is not an option. Having the control bind Incremental­Search to a specific type works well, but this will limit the reusability of the control to searches of that specific type.
The solution is a hybrid approach. The IncrementalSearch class remains non-generic and exposes the search results through non-generic collection interfaces such as IList. It then is made into an abstract class where the Search method becomes abstract. The Result class likewise will only expose non-generic collections. Keeping IncrementalSearch non-generic and providing abstract methods allows it to be reusable in a Windows Forms control and still have strongly typed collections.
IncrementalSearch(Of T) will be strongly typed on the search result and will derive from IncrementalSearch. It will override the Search method and provide the underlying strongly typed collection classes. See Figure 7 for the full definition of the refactored IncrementalSearch class.
Public MustInherit Class IncrementalSearch
  Protected m_status As SearchStatus = SearchStatus.InProgress
  Protected m_delay As TimeSpan

  Public MustOverride ReadOnly Property AllValues() As IList

  Public ReadOnly Property SearchStatus() As SearchStatus
    Get
      Return m_status
    End Get
  End Property

  Public Property Delay() As TimeSpan
    Get
      Return m_delay
    End Get
    Set(ByVal value As TimeSpan)
      m_delay = value
    End Set
  End Property

  Public MustOverride Function Search() As Result

  Public Shared Function Create(Of T)(ByVal e As IEnumerable(Of T)) _
    As IncrementalSearch(Of T)
    Return New IncrementalSearch(Of T)(e)
  End Function
End Class

Public NotInheritable Class IncrementalSearch(Of T)
  Inherits IncrementalSearch
  Private m_enumerator As IEnumerator(Of T)
  Private m_allFound As New List(Of T)
  Private m_comparer As IComparer(Of T)

  Public Overrides ReadOnly Property AllValues() _
    As System.Collections.IList
    Get
      Return m_allFound
    End Get
  End Property

  Public Property Comparer() As IComparer(Of T)
    Get
      Return m_comparer
    End Get
    Set(ByVal value As IComparer(Of T))
      m_comparer = value
    End Set
  End Property

  Public Sub New(ByVal e As IEnumerable(Of T))
    m_delay = TimeSpan.FromSeconds(0.1)
    m_enumerator = e.GetEnumerator()
  End Sub

  Public Overrides Function Search() As Result
    If m_status = SearchStatus.Completed Then
      Return New Result(m_allFound, New List(Of T), m_status)
    End If

    Dim found As New List(Of T)
    Dim start = DateTime.Now
    Do
      If Not m_enumerator.MoveNext() Then
        ' IEnumerable(Of T) must be disposed
        m_enumerator.Dispose()
        m_enumerator = Nothing
        m_status = SearchStatus.Completed
        Exit Do
      End If

      found.Add(m_enumerator.Current)
    Loop Until DateTime.Now - start > m_delay

    m_allFound.AddRange(found)
    If m_comparer IsNot Nothing Then
      m_allFound.Sort(m_comparer)
    End If
    Return New Result(m_allFound, found, m_status)
  End Function
End Class

Public NotInheritable Class IncrementalObjectSearch
  Inherits IncrementalSearch
  Private m_enumerator As IEnumerator
  Private m_allFound As New ArrayList
  Private m_comparer As IComparer

  Public Overrides ReadOnly Property AllValues() As _
    System.Collections.IList
    Get
      Return m_allFound
    End Get
  End Property

  Public Property Comparer() As IComparer
    Get
      Return m_comparer
    End Get
    Set(ByVal value As IComparer)
      m_comparer = value
    End Set
  End Property

  Public Sub New(ByVal e As IEnumerable)
    m_enumerator = e.GetEnumerator()
  End Sub

  Public Overrides Function Search() As Result
    If m_status <> SearchStatus.InProgress Then
      Return New Result(m_allFound, New List(Of Object), m_status)
    End If

    Dim found As New List(Of Object)
    Dim start = DateTime.Now
    Do
      If Not m_enumerator.MoveNext() Then
        m_enumerator = Nothing
        m_status = SearchStatus.Completed
        Exit Do
      End If
      found.Add(m_enumerator.Current)
    Loop Until DateTime.Now - start > m_delay

    m_allFound.AddRange(found)
    If m_comparer IsNot Nothing Then
      m_allFound.Sort(m_comparer)
    End If
    Return New Result(m_allFound, found, m_status)
  End Function
End Class
The IncrementalSearch(Of T) class will now have an additional property named Comparer of type IComparer(Of T). When this is present, the results will be sorted after each call to Search:
Public NotInheritable Class IncrementalSearch(Of T)
  Inherits IncrementalSearch
  Private m_enumerator As IEnumerator(Of T)
  Private m_allFound As New List(Of T)
  Private m_comparer As IComparer(Of T)
Now it's possible to reuse existing mechanisms to sort a query without introducing eager evaluation:
Private Function GetFullMethodNameSorted( _
  ByVal doc As XDocument, ByVal name As String) _
  As IncrementalSearch

  Dim e = From method In doc...<Member> _
    Where method.@Name.Contains(name) _
    Where 0 = String.Compare(method.@Kind, "Method") _
    Join type In doc...<Type> _
      On type.@Id Equals method.@TypeId _
      And type.@AssemblyId Equals method.@AssemblyId _
    Select type.@Namespace & "." & type.@Name & "." & method.@Name
  Dim search = IncrementalSearch.Create(e)
  search.Comparer = StringComparer.Ordinal
  Return search
End Function
For handwritten queries, this solution often is enough. However, this is primarily designed to hold the result of LINQ queries that often contain anonymous types.
In code you cannot refer to the type of an anonymous type. Instead you must depend on type inference to create strongly typed generic classes. This can be accomplished by adding a factory method named Create onto the IncrementalSearch base class. Since there is only one generic parameter and the argument satisfies this parameter, the compiler can infer the argument and there is no need to explicitly write it out:
Dim e = From method In doc...<Member> _
  Where String.Compare(method.@Kind, "Method") = 0 _
  Select MethodName = method.@Name

' What is the type name of the element?
Dim s1 As New IncrementalSearch(Of ???)(e)

' No need to write the type name
Dim s2 = IncrementalSearch.Create(e)  
To continue supporting weakly typed collections, you can implement another child of IncrementalSearch named IncrementalObject­Search. This will be similar to the current implementation except that it will be based on an ArrayList instead of a List(Of Object):
Public NotInheritable Class IncrementalObjectSearch
  Inherits IncrementalSearch
  Private m_enumerator As IEnumerator
  Private m_allFound As New ArrayList
  Private m_comparer As IComparer

Sorting Anonymous Types
Queries often return a collection of anonymous types. Because you cannot describe the type of an anonymous type in code, it is impossible to write an implementation of IComparer(Of T).
Luckily you're working in Visual Basic here. You know what members are present on the underlying values returned by the query. Using late binding you can access these directly. Figure 8 contains an example of how to create a sorted version of the original query. Since you are using a Comparer instead of an order by statement, this will be a lazy evaluation and will perform well in the UI.
Private Function GetQueryMethodsOfNameSorted( _
  ByVal doc As XDocument, ByVal name As String) As IncrementalSearch
  Dim e = From method In doc...<Member> _
          Where method.@Name.Contains(name) _
          Where 0 = String.Compare(method.@Kind, "Method") _
          Join type In doc...<Type> On type.@Id Equals method.@TypeId _
          And type.@AssemblyId Equals method.@AssemblyId _
          Select Type = type.@Name, Method = method.@Name
  Dim x = New IncrementalObjectSearch(e)
  x.Comparer = New MethodsOfNameComparer()
  Return x
End Function

Private Class MethodsOfNameComparer
  Implements IComparer

  Public Function Compare(ByVal x As Object, _
    ByVal y As Object) As Integer Implements _
    System.Collections.IComparer.Compare

    Dim comp = StrComp(x.Type, y.Type)
      If comp <> 0 Then
        Return comp
      End If

    Return StrComp(x.Method, y.Method)
  End Function
End Class
This was a fun pattern to implement and write about, but I only have so much space in a column. There are still many other ways to make the UI even more efficient. Here are some ideas that you might want to experiment with on your own:
  • Reorganizing the source data to be more efficient for queries.
  • Using LINQ to create more efficiently organized data sets in memory, letting further queries operate against this better-structured data.
  • Offloading the query onto another thread.
  • Providing UI to indicate to the user a search is in progress.
  • Putting the DataGridView into VirtualMode.
  • Searching only against previous results rather than the entire document as the user types more text.
  • Introducing Parallel LINQ queries.
Hopefully I'll have time to expand on these in my blog. Also, be sure to check out the Visual Basic Team Blog (blogs.msdn.com/vbteam) and the MSDN® Visual Basic Developer Center (msdn.com/vbasic) for more information.

Send your questions and comments to instinct@microsoft.com.

Jared Parsons is a Software Engineer at Microsoft working on the Visual Basic Debugger and IDE. His passion lies in code, coding, and pretty much anything related to programming. He blogs regularly at blogs.msdn.com/jaredpar.

No comments:

Post a Comment

Angular Tutorial (Update to Angular 7)

As Angular 7 has just been released a few days ago. This tutorial is updated to show you how to create an Angular 7 project and the new fe...