Increase LINQ Query Performance
Jared Parsons
LINQ is a powerful tool enabling quick filtering data based on a standard query language. It can tear through a structured set of data using a simple and straightforward syntax. This makes it ideal for filtering data that is to be plugged into a data-binding scenario.
There is a catch, though. While it is true that LINQ is powerful and very efficient, large sets of data can still cause unexpected performance problems. This column will focus on a technique to achieve maximum performance from LINQ queries on large sets of data in order to create a responsive UI.
The Problem
In a recent project, I devised a word wheel search UI for a set of XML data using a LINQ-to-XML query for filtering the data. A word wheel search allows users to do a word-based search over a set of data. It will display results that contain a substring matching the text the user has typed in. The results are updated as the user types, providing immediate feedback.
I designed and implemented the word wheel UI using generated XML files. Users could type into a textbox, and the results would be narrowed to those elements whose name contained a substring of the typed-in text. The UI met the design specification and responded efficiently in my test environment. When the rest of the project was completed, my code began producing production XML files, I checked in the new code, and went home. Unfortunately, the next morning I found that QA had gifted me with many UI bugs. The UI had become unresponsive to the point of being unusable.
Figuring I had a bottleneck in one of my query conditions, I thought this would be an easy fix. But after profiling the UI and optimizing my query, I found that I couldn't improve the performance to an acceptable level. The efficiency of the query wasn't the issue; I was being blocked by the sheer amount of data. The test XML files only consisted of 5,000 to 10,000 elements, yet the production XML files had around 40 times as much data. Even with an ultra-fast query, searching several hundred thousand rows with no built-in indexing gets very sluggish.
I needed to find a way to speed up the search process, so I decided to take advantage of some special features of LINQ to improve performance: deferred execution and lazy evaluations. With these features, LINQ queries are not executed at the point at which they are defined. Instead, they are executed as you enumerate through the results. This gives a great deal of flexibility in how you use query results to efficiently scroll through data.
Building a Demo Application
In order to demonstrate my use of deferred execution and lazy evaluations, I'm going to walk you through building a simplified version of my word wheel search using a set of XML files and LINQ-to-XML queries. To make sure that you fully utilize the XML integration for Visual Basic®, it is necessary for you to add an XML schema to the project for the documents that you plan to process. XML schemas are similar to Microsoft® .NET Framework metadata in that they provide an outline for how the XML must look. This allows the Visual Studio® IDE to provide IntelliSense® for LINQ-to-XML queries.
Getting Visual Studio to provide IntelliSense for a schema is straightforward. Visual Basic will recognize all XML schemas that are in the current project. Simply add the schema to your project and then add an import statement for the XML namespace in any file where you will be defining a LINQ-to-XML query:
Imports <xmlns="http://tempuri.org/MsdnWheelSearch.xsd">
It is perfectly possible to use Xlinq without a schema. However, it's kind of like using Visual Studio with IntelliSense turned off. It's possible, but not any fun.
In the word wheel search application, the XML files contain information concerning assemblies, types, and members from the base class libraries (BCLs). Each XML element, or row, will contain an attribute named Id that uniquely identifies the row in its respective parent. This will be a generated number for the assembly rows. For all other rows, it will be the underlying metadata token. In addition, each row will also contain the unique ID of its parent. See Figure 1 for the XML schema of this data.
Figure 1 XML Schema
<?xml version="1.0" encoding="utf-8"?> <xs:schema id="MsdnWheelSearch" targetNamespace="http://contoso.com/MsdnWheelSearch.xsd" elementFormDefault="qualified" xmlns="http://contoso.com/MsdnWheelSearch.xsd" xmlns:mstns="http://contoso.com/MsdnWheelSearch.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema" > <xs:element name="Info"> <xs:complexType> <xs:sequence> <xs:element name="Assemblies" minOccurs="0" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> <xs:element name="Assembly" minOccurs="0" maxOccurs="unbounded"> <xs:complexType> <xs:attribute name="Name" type="xs:string" /> <xs:attribute name="CodeBase" type="xs:string" /> <xs:attribute name="Id" type="xs:int" /> <xs:attribute name="FullName" type="xs:string" /> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="Types" minOccurs="0" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> <xs:element name="Type" minOccurs="0" maxOccurs="unbounded"> <xs:complexType> <xs:attribute name="Name" type="xs:string" /> <xs:attribute name="Namespcae" type="xs:string" /> <xs:attribute name="Id" type="xs:int" /> <xs:attribute name="AssemblyId" type="xs:int" /> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="Members" minOccurs="0" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> <xs:element name="Member" minOccurs="0" maxOccurs="unbounded"> <xs:complexType> <xs:attribute name="Name" type="xs:string" /> <xs:attribute name="Kind" type="xs:string" /> <xs:attribute name="Id" type="xs:int" /> <xs:attribute name="TypeId" type="xs:int" /> <xs:attribute name="AssemblyId" type="xs:int" /> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
You can download the sample project associated with this column to get several XML files that fit this schema. Included are both a small and a very large data set. You can use the UI to easily perform the same queries across both sets of data to get an idea of the relative performance impact they have.
The main query on which I am going to focus for this column returns all members where the name contains the specific substring currently entered by the user. It also will display the type information for the type that owns this member. This is accomplished by filtering the method rows based on a name match and then joining the results with all types having the identical Id and AssemblyId attributes, as shown here:
Private Function GetQueryMethodsOfName( _ ByVal doc As XDocument, ByVal name As String) As IEnumerable Return From method In doc...<Member> _ Where method.@Name.Contains(name) _ Where 0 = String.Compare(method.@Kind, "Method") _ Join type In doc...<Type> _ On type.@Id Equals method.@TypeId _ And type.@AssemblyId Equals method.@AssemblyId _ Select Type = type.@Name, Method = method.@Name End Function
The sample code has other queries that explore joining the rows to probe for data in the BCL, and I encourage you to try them out.
Basic User Interface
Figure 2 shows the example application. The primary interface for the user is the textbox. As the user types, the application filters the query results to include only rows that contain data matching the input. As the user types, the application constantly updates this value to produce a more refined result for the user.Figure 3 shows the code for my wheel search class.
Figure 2 Sample Word Wheel Application (Click the image for a larger view)
Figure 3 Wheel Search Class
Public Class WheelSearch Private m_doc As XDocument = New XDocument() Private m_query As Query = Query.MethodsOfName Public Property Query() As Query Get Return m_query End Get Set(ByVal value As Query) m_query = value OnSearchTextChanged(Nothing, EventArgs.Empty) End Set End Property Public Property Xml() As XDocument Get Return m_doc End Get Set(ByVal value As XDocument) If value Is Nothing Then value = New XDocument End If m_doc = value OnSearchTextChanged(Nothing, EventArgs.Empty) End Set End Property Public Sub New() ' This call is required by the Windows Form Designer. InitializeComponent() ' Add any initialization after the InitializeComponent() call. m_dataGridView.AutoGenerateColumns = True End Sub Protected Overrides Sub OnLoad(ByVal e As System.EventArgs) MyBase.OnLoad(e) OnSearchTextChanged(Nothing, EventArgs.Empty) End Sub #Region "Event Handlers" Private Sub OnSearchTextChanged(ByVal sender As Object, _ ByVal e As EventArgs) Handles m_searchTb.TextChanged If m_doc Is Nothing Then Return End If m_dataGridView.IncrementalSearch = _ QueryContainer.Search(m_query, m_doc, m_searchTb.Text) End Sub #End Region End Class
The results of the query will be shown in a DataGridView. And while the results of a LINQ query typically work great with data binding, it's necessary for you to convert the results to a type such as List(Of Object) when using the DataGridView. It will be the responsibility of each individual query to return an enumeration of a type that is easily plugged into the data-binding infrastructure.
Now you should load up the UI and test out the performance of the word wheel search. For the smaller data set, you will notice that the performance is snappy. With the larger data set, the performance becomes very sluggish and almost unusable.
The reason for the poor performance can be traced directly to the conversion of the LINQ query into a List(Of Object). This can be verified with a profiler. What happens under the hood here is the List(Of Object) must add each result from the query immediately, forcing the query to fully process each piece of data it encompasses. With the smaller data set, this occurs quickly enough not to be noticed. With the larger data set, this is just too much to process in between keystrokes.
Methods for Improving Performance
Assuming that the structure of the data cannot be modified, how can you increase the responsiveness of the UI? One option is to not search as the user types but instead wait until he explicitly invokes a search (for example, by pressing the Enter key or clicking a button). This solution, however, provides a much poorer user experience because it doesn't provide any immediate feedback. In addition, it doesn't solve the problem that any given search could take a significant amount of time.
Could you improve performance by capping the number of rows returned by the query? Sure, but this still detracts from the user experience because it's not showing all of the data. You'd have to design more UI to tell the user you didn't finish the search and provide a mechanism to do a full search. The full search would leave you right back where you started—with a slow experience.
So you need to show all the data, and you can't modify the structure. But do you have to show all of the data at once? What if you only give the user a small subset of the data at a time, and if a key press is detected you abandon the current search and start a new one? This would be close to other instant-search technologies, such as that found in Microsoft Outlook®.
Incremental Search
A great feature of LINQ is that it supports deferred execution. Queries are not executed at the point that they are written in code; they are merely defined at that point. They are not actually executed until the call to MoveNext on the returned enumerator. You can take advantage of this to search the data incrementally and only spend a minimal amount of time possibly blocking the UI.
I defined a class called IncrementalSearch that works on any object implementing IEnumerable. All LINQ queries implement this interface. IncrementalSearch supports the concept of stopping a search after a certain time period expires. The algorithm is to get the next value by calling MoveNext until the specified time period has expired. At that point, the data found is returned in a new Result object (see Figure 4).
Figure 4 IncrementalSearch Class
Public Enum SearchStatus InProgress Completed End Enum Public Class Result Private m_allValues As IList Private m_incrementalValues As IList Private m_status As SearchStatus Public ReadOnly Property AllValues() As IList Get Return m_allValues End Get End Property Public ReadOnly Property IncrementalValues() As IList Get Return m_incrementalValues End Get End Property Public ReadOnly Property SearchStatus() As SearchStatus Get Return m_status End Get End Property Sub New(ByVal allValues As IList, _ ByVal incrementalValues As IList, _ ByVal status As SearchStatus) m_allValues = allValues m_incrementalValues = incrementalValues m_status = status End Sub End Class Public Class IncrementalSearch Private m_enumerable As IEnumerator Private m_allFound As New List(Of Object) Private m_status As SearchStatus = SearchStatus.InProgress Private m_delay As TimeSpan Public ReadOnly Property SearchStatus() As SearchStatus Get Return m_status End Get End Property Public Property Delay() As TimeSpan Get Return m_delay End Get Set(ByVal value As TimeSpan) m_delay = value End Set End Property Public Sub New(ByVal e As IEnumerable) m_delay = TimeSpan.FromSeconds(0.1) m_enumerable = e.GetEnumerator() End Sub Public Function Search() As Result If m_status = SearchStatus.Completed Then Return New Result(m_allFound, New List(Of Object), m_status) End If Dim found As New List(Of Object) Dim start = DateTime.Now Do If Not m_enumerable.MoveNext() Then m_enumerable = Nothing m_status = SearchStatus.Completed Exit Do End If found.Add(m_enumerable.Current) Loop Until DateTime.Now - start > m_delay m_allFound.AddRange(found) Return New Result(m_allFound, found, m_status) End Function End Class
The Search method returns a Result object based on the data it finds, including the values found in the current call to Search, the list of all values found in all previous calls to Search, and the current status of the search. This means that the UI can now return data in chunks. The maximum time you can delay will be for getting the next element instead of the entire set of rows:
Dim found As New List(Of Object) Dim start = DateTime.Now Do If Not m_enumerable.MoveNext() Then m_enumerable = Nothing m_status = SearchStatus.Completed Exit Do End If found.Add(m_enumerable.Current) Loop Until DateTime.Now - start > m_delay
The goal is for the UI to display the data through data binding and to provide an efficient search. Displaying the data should not cause the UI to hang. Return values are displayed in a custom DataGridView using the incremental search pattern. The example implementation is called DataGridViewIncrementalSearch and derives from DataGridView (see Figure 5). This implementation works great with data binding and is designed to perform well with large sets of data.
Figure 5 DataGridViewIncrementalSearch
Public Class DataGridViewIncrementalSearch Inherits DataGridView Private m_search As IncrementalSearch Private WithEvents m_timer As New Timer Public Property IncrementalSearch() As IncrementalSearch Get Return m_search End Get Set(ByVal value As IncrementalSearch) m_search = value m_timer.Start() End Set End Property Public Sub New() m_timer.Interval = CInt(TimeSpan.FromSeconds(0.2).TotalMilliseconds) End Sub Protected Overrides Sub Dispose(ByVal disposing As Boolean) If disposing Then m_timer.Dispose() End If MyBase.Dispose(disposing) End Sub Private Sub OnTick(ByVal sender As Object, ByVal e As EventArgs) _ Handles m_timer.Tick m_timer.Stop() If m_search Is Nothing OrElse m_search.SearchStatus = _ SearchStatus.Completed Then Return End If Dim result = m_search.Search() DataSource = Nothing DataSource = result.AllValues m_timer.Start() End Sub End Class
There is one extra property of the same type called IncrementalSearch. Whenever this value is set, the control will call the Search method and continually update the displayed set of values. To achieve this, the control will have to call the Search method at regular intervals until the data source is exhausted.
A Windows® Forms timer is ideal for this scenario. It can be configured to call a method on the UI thread at a particular interval and can be enabled or disabled. When the IncrementalSearch property is updated, it enables the timer. The handler function will call Search. Once the search is complete, the timer will be disabled.
One issue with a Windows Forms timer is that the interval is calculated independently of how long it takes a handler to perform an action on the Tick event. That means that if the time it takes to execute the handler is longer than the interval, there will be another Tick event waiting to go off almost immediately. In the worst case scenario, these events will stack up and make the UI unresponsive. In order to work around this, you can disable the timer while calling Search and re-enable it if there is more data to process (see Figure 6). No explicit column selection functionality was created for this DataGridView. For this example, I'll let data binding choose the columns.
Figure 6 Enabling and Disabling the Search Timer
Private Sub OnTick(ByVal sender As Object, ByVal e As EventArgs) _ Handles m_timer.Tick m_timer.Stop() If m_search Is Nothing _ OrElse m_search.SearchStatus = SearchStatus.Completed Then ' Search is completed don't restart the timer Return End If ' Get the next update from the incremental search Dim result = m_search.Search() DataSource = Nothing DataSource = result.AllValues ' Restart the timer to perform another search m_timer.Start() End Sub
One quirk in the code is the updating of the data source. The underlying IncrementalSearch object creates a List(Of Object) to hold the return values from the query. Every Result object contains a reference to this list instead of a copy. When updating the Data source property, the DataGridView will see that it is the same Object and will not refresh the displayed results. Setting the reference to Nothing in between will force it to reload the data.
While this is a nice, easy-to-use solution, it won't work in all scenarios. The underlying problem here is user delay. Because this is a single-threaded solution, there is no way to break the execution once you call MoveNext. Therefore, the longest you will delay the user from typing is now the DelayTime value for the IncrementalSearch class plus the longest time to find one element in the search.
Sorting and Lazy Evaluation
Most LINQ queries operate with lazy evaluation, which means that only a single element is processed each time MoveNext is called on the enumerator. The time it takes to find one element in the search is the same time it takes to find the next element. This allows for the result to be efficiently broken up into usable blocks of data:
Dim e = From method In doc...<Member> _ Where method.@Name.Contains(name) _ Where String.Compare(method.@Kind, "Method") = 0 _ Select MethodName = method.@Name ' Returns when the first matching <Member> is found Dim first = e.First()
This model will break down if the underlying LINQ query processes a substantial portion or all of the values during a single call to MoveNext. This form of query is known as eager evaluation. An example of this is the Order By operator, which returns the results in a sorted fashion. Because it is sorted, the first value cannot be returned until all parts of the query have finished executing. Otherwise it couldn't know which was the first element to return because the last element returned could be the first in terms of sorting. If you try the incremental search pattern with a query using eager evaluation, you are right back at the original problem:
Dim e = From method In doc...<Member> _ Where method.@Name.Contains(name) _ Where String.Compare(method.@Kind, "Method") = 0 _ Order By method.@Name _ Select MethodName = method.@Name ' The following line won't finish until the entire query is finished ' processing Dim first = e.First()
Once again, the advantage here is in only dealing with a subset of the rows returned. Since the code is only displaying a subset of the data to the user, that is the only data that needs to be sorted at any given time. The underlying LINQ query can return the data unsorted, and then the code can sort it as more comes along.
It is easy to modify the IncrementalSearch class to sort the data at the end of each Search call. There are two approaches available to searching the data. The first approach is to take advantage of the built-in List(Of T).Sort method. This has an average complexity of O(N LogN) and is, generally speaking, fast enough for most applications:
Dim search As New List(Of String)(SearchForMethodNames) search.Sort(StringComparer.OrdinalIgnoreCase)
The second method takes advantage of data being sorted at the start of the call. Instead of appending the data to the end of the existing data list, you could search for the proper place to insert and place the newly found data there. Finding the proper index is a relatively quick operation when compared to fully sorting the list, so at first this seems like the more logical choice. But there are two reasons why you cannot keep the list ordered in this manner.
First, while the IncrementalSearch class itself keeps the list sorted, there is nothing stopping the caller from reordering the values. So you cannot assume this is sorted. Second, there is an added cost to inserting into the middle of a List(Of T). Under the hood it is backed by an Array, so every element to the right of the value being inserted has to be shifted one element to the right. For large data structures, the time to do the shift is non-trivial and eventually makes this method of sorting much less efficient.
To use the collection's Sort method, you need a method for comparing the values. Ideally, you will be able to use existing comparers in the framework to sort well-known data types. However, the IncrementalSearch class is weakly typed, and most comparers are strongly typed to their object. You could not use String.Compare for a query that returned Strings, for instance, because they would be stored in a weakly typed object collection. To allow for searching and reuse of existing classes, you will need to make the IncrementalSearch class more flexible by providing both a weakly and strongly typed implementation.
Making Incremental Search more Flexible
Right now the IncrementalSearch class is very flexible because it works against untyped collections. This limits the usefulness of the results, though, because everything is typed as object. LINQ queries typically return collections of anonymous types, and IntelliSense makes it much easier to use the types. So you must make IncrementalSearch strongly typed in order to allow for more processing on the results.
There is a problem with making IncrementalSearch itself generic. If IncrementalSearch is generic, then the control wrapping IncrementalSearch must be generic as well, or you'll have to create an instance of IncrementalSearch that is bound to a particular type. However, the Windows Forms designer does not support unbound generic controls, meaning that you would lose all designer support; hence, it is not an option. Having the control bind IncrementalSearch to a specific type works well, but this will limit the reusability of the control to searches of that specific type.
The solution is a hybrid approach. The IncrementalSearch class remains non-generic and exposes the search results through non-generic collection interfaces such as IList. It then is made into an abstract class where the Search method becomes abstract. The Result class likewise will only expose non-generic collections. Keeping IncrementalSearch non-generic and providing abstract methods allows it to be reusable in a Windows Forms control and still have strongly typed collections.
IncrementalSearch(Of T) will be strongly typed on the search result and will derive from IncrementalSearch. It will override the Search method and provide the underlying strongly typed collection classes. See Figure 7 for the full definition of the refactored IncrementalSearch class.
Figure 7 Refactored IncrementalSearch Code
Public MustInherit Class IncrementalSearch Protected m_status As SearchStatus = SearchStatus.InProgress Protected m_delay As TimeSpan Public MustOverride ReadOnly Property AllValues() As IList Public ReadOnly Property SearchStatus() As SearchStatus Get Return m_status End Get End Property Public Property Delay() As TimeSpan Get Return m_delay End Get Set(ByVal value As TimeSpan) m_delay = value End Set End Property Public MustOverride Function Search() As Result Public Shared Function Create(Of T)(ByVal e As IEnumerable(Of T)) _ As IncrementalSearch(Of T) Return New IncrementalSearch(Of T)(e) End Function End Class Public NotInheritable Class IncrementalSearch(Of T) Inherits IncrementalSearch Private m_enumerator As IEnumerator(Of T) Private m_allFound As New List(Of T) Private m_comparer As IComparer(Of T) Public Overrides ReadOnly Property AllValues() _ As System.Collections.IList Get Return m_allFound End Get End Property Public Property Comparer() As IComparer(Of T) Get Return m_comparer End Get Set(ByVal value As IComparer(Of T)) m_comparer = value End Set End Property Public Sub New(ByVal e As IEnumerable(Of T)) m_delay = TimeSpan.FromSeconds(0.1) m_enumerator = e.GetEnumerator() End Sub Public Overrides Function Search() As Result If m_status = SearchStatus.Completed Then Return New Result(m_allFound, New List(Of T), m_status) End If Dim found As New List(Of T) Dim start = DateTime.Now Do If Not m_enumerator.MoveNext() Then ' IEnumerable(Of T) must be disposed m_enumerator.Dispose() m_enumerator = Nothing m_status = SearchStatus.Completed Exit Do End If found.Add(m_enumerator.Current) Loop Until DateTime.Now - start > m_delay m_allFound.AddRange(found) If m_comparer IsNot Nothing Then m_allFound.Sort(m_comparer) End If Return New Result(m_allFound, found, m_status) End Function End Class Public NotInheritable Class IncrementalObjectSearch Inherits IncrementalSearch Private m_enumerator As IEnumerator Private m_allFound As New ArrayList Private m_comparer As IComparer Public Overrides ReadOnly Property AllValues() As _ System.Collections.IList Get Return m_allFound End Get End Property Public Property Comparer() As IComparer Get Return m_comparer End Get Set(ByVal value As IComparer) m_comparer = value End Set End Property Public Sub New(ByVal e As IEnumerable) m_enumerator = e.GetEnumerator() End Sub Public Overrides Function Search() As Result If m_status <> SearchStatus.InProgress Then Return New Result(m_allFound, New List(Of Object), m_status) End If Dim found As New List(Of Object) Dim start = DateTime.Now Do If Not m_enumerator.MoveNext() Then m_enumerator = Nothing m_status = SearchStatus.Completed Exit Do End If found.Add(m_enumerator.Current) Loop Until DateTime.Now - start > m_delay m_allFound.AddRange(found) If m_comparer IsNot Nothing Then m_allFound.Sort(m_comparer) End If Return New Result(m_allFound, found, m_status) End Function End Class
The IncrementalSearch(Of T) class will now have an additional property named Comparer of type IComparer(Of T). When this is present, the results will be sorted after each call to Search:
Public NotInheritable Class IncrementalSearch(Of T) Inherits IncrementalSearch Private m_enumerator As IEnumerator(Of T) Private m_allFound As New List(Of T) Private m_comparer As IComparer(Of T)
Now it's possible to reuse existing mechanisms to sort a query without introducing eager evaluation:
Private Function GetFullMethodNameSorted( _ ByVal doc As XDocument, ByVal name As String) _ As IncrementalSearch Dim e = From method In doc...<Member> _ Where method.@Name.Contains(name) _ Where 0 = String.Compare(method.@Kind, "Method") _ Join type In doc...<Type> _ On type.@Id Equals method.@TypeId _ And type.@AssemblyId Equals method.@AssemblyId _ Select type.@Namespace & "." & type.@Name & "." & method.@Name Dim search = IncrementalSearch.Create(e) search.Comparer = StringComparer.Ordinal Return search End Function
For handwritten queries, this solution often is enough. However, this is primarily designed to hold the result of LINQ queries that often contain anonymous types.
In code you cannot refer to the type of an anonymous type. Instead you must depend on type inference to create strongly typed generic classes. This can be accomplished by adding a factory method named Create onto the IncrementalSearch base class. Since there is only one generic parameter and the argument satisfies this parameter, the compiler can infer the argument and there is no need to explicitly write it out:
Dim e = From method In doc...<Member> _ Where String.Compare(method.@Kind, "Method") = 0 _ Select MethodName = method.@Name ' What is the type name of the element? Dim s1 As New IncrementalSearch(Of ???)(e) ' No need to write the type name Dim s2 = IncrementalSearch.Create(e)
To continue supporting weakly typed collections, you can implement another child of IncrementalSearch named IncrementalObjectSearch. This will be similar to the current implementation except that it will be based on an ArrayList instead of a List(Of Object):
Public NotInheritable Class IncrementalObjectSearch Inherits IncrementalSearch Private m_enumerator As IEnumerator Private m_allFound As New ArrayList Private m_comparer As IComparer
Sorting Anonymous Types
Queries often return a collection of anonymous types. Because you cannot describe the type of an anonymous type in code, it is impossible to write an implementation of IComparer(Of T).
Luckily you're working in Visual Basic here. You know what members are present on the underlying values returned by the query. Using late binding you can access these directly. Figure 8 contains an example of how to create a sorted version of the original query. Since you are using a Comparer instead of an order by statement, this will be a lazy evaluation and will perform well in the UI.
Figure 8 Custom Sorting
Private Function GetQueryMethodsOfNameSorted( _ ByVal doc As XDocument, ByVal name As String) As IncrementalSearch Dim e = From method In doc...<Member> _ Where method.@Name.Contains(name) _ Where 0 = String.Compare(method.@Kind, "Method") _ Join type In doc...<Type> On type.@Id Equals method.@TypeId _ And type.@AssemblyId Equals method.@AssemblyId _ Select Type = type.@Name, Method = method.@Name Dim x = New IncrementalObjectSearch(e) x.Comparer = New MethodsOfNameComparer() Return x End Function Private Class MethodsOfNameComparer Implements IComparer Public Function Compare(ByVal x As Object, _ ByVal y As Object) As Integer Implements _ System.Collections.IComparer.Compare Dim comp = StrComp(x.Type, y.Type) If comp <> 0 Then Return comp End If Return StrComp(x.Method, y.Method) End Function End Class
This was a fun pattern to implement and write about, but I only have so much space in a column. There are still many other ways to make the UI even more efficient. Here are some ideas that you might want to experiment with on your own:
- Reorganizing the source data to be more efficient for queries.
- Using LINQ to create more efficiently organized data sets in memory, letting further queries operate against this better-structured data.
- Offloading the query onto another thread.
- Providing UI to indicate to the user a search is in progress.
- Putting the DataGridView into VirtualMode.
- Searching only against previous results rather than the entire document as the user types more text.
- Introducing Parallel LINQ queries.
Hopefully I'll have time to expand on these in my blog. Also, be sure to check out the Visual Basic Team Blog (blogs.msdn.com/vbteam) and the MSDN® Visual Basic Developer Center (msdn.com/vbasic) for more information.
Send your questions and comments to instinct@microsoft.com.
Jared Parsons is a Software Engineer at Microsoft working on the Visual Basic Debugger and IDE. His passion lies in code, coding, and pretty much anything related to programming. He blogs regularly at blogs.msdn.com/jaredpar.
No comments:
Post a Comment