Does the “LINQ to Objects” provider have built-in performance optimization?

01/26/2009

Let’s start with the basics and maybe repeat some information that many of you already know. One of the most important concepts in LINQ performance and optimization is, of course, deferred execution. It simply means that when you declare a variable and assign it a query expression, that expression is not executed immediately.

// Query is not executed.
var query = from item in storage select item;
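A minimal runnable sketch of deferred execution follows. It assumes storage is a List<int> (the article does not show its type) and uses a hypothetical IsGreaterThanOne predicate that counts its own calls. The counts show that declaring the query runs nothing, and that the query sees changes made to the source before it executes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumption: storage is a List<int>; the article does not show its type.
var storage = new List<int> { 1, 2, 3 };
int predicateCalls = 0;

// Hypothetical predicate that records how often it is called.
bool IsGreaterThanOne(int item)
{
    predicateCalls++;
    return item > 1;
}

// Declaring the query only captures the predicate; nothing runs yet.
var query = from item in storage
            where IsGreaterThanOne(item)
            select item;

Console.WriteLine(predicateCalls); // 0

// The source changes before execution, and the query will see it.
storage.Add(4);

// Execution happens here: the predicate runs once per element.
var results = query.ToList();
Console.WriteLine(predicateCalls);             // 4
Console.WriteLine(string.Join(", ", results)); // 2, 3, 4
```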

The variable query now stores the query definition, and execution is deferred until you request the actual data. This usually happens in a foreach loop, when you call an aggregate method such as Count, Min, Max, or Average, or when you cache the query results by using the ToList or ToArray method.

// foreach loop.
foreach (var item in query)
    Console.WriteLine(item);

// Count method.
int total = query.Count();

// ToArray method.
var cachedQuery = query.ToArray();
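The difference between running a query and caching its results can be seen in a small sketch. It again assumes storage is a List<int>, which is an illustration rather than something the article specifies. The query executes again each time it is used, while the array returned by ToArray is a snapshot taken at the moment of the call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumption: storage is a List<int> (hypothetical data).
var storage = new List<int> { 1, 2, 3 };
var query = from item in storage select item;

// ToArray executes the query once and copies the current results.
var cachedQuery = query.ToArray();

// Modify the source after caching.
storage.Add(4);

// query executes again and sees the new element;
// cachedQuery is a snapshot and does not change.
int liveCount = query.Count();
int cachedCount = cachedQuery.Length;
Console.WriteLine($"{liveCount} {cachedCount}"); // 4 3
```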

Now let’s look at what else happens behind the scenes. Are any optimizations applied when the query executes? The answer is yes, but there is a catch: from now on we will talk only about queries over IEnumerable and IEnumerable<T> collections that use the “LINQ to Objects” provider. For other LINQ providers, including LINQ to SQL and LINQ to XML, different optimization rules might apply.

Note: It is often believed that, because of deferred execution, a query takes longer to execute the first time. However, for LINQ to Objects queries the first execution does the same work as any later one: each execution enumerates the data source and runs the query from the start. Other LINQ providers might behave differently (for example, they might cache something), so refer to the particular provider's documentation for details.
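As a rough check of this note, the following sketch counts predicate calls across two executions of the same LINQ to Objects query. The array contents and the IsSmall predicate are hypothetical; the point is that the second execution does exactly the same predicate work as the first, because nothing is cached between executions.

```csharp
using System;
using System.Linq;

// Hypothetical data source and call counter.
int[] storage = { 5, 10, 15, 20 };
int predicateCalls = 0;

bool IsSmall(int item)
{
    predicateCalls++; // Measure how much work each execution does.
    return item < 12;
}

var query = storage.Where(IsSmall);

// First execution.
int firstCount = query.Count();
int callsDuringFirst = predicateCalls;

// Second execution: the source is enumerated and filtered again.
int secondCount = query.Count();
int callsDuringSecond = predicateCalls - callsDuringFirst;

Console.WriteLine($"{firstCount} {secondCount}");             // 2 2
Console.WriteLine($"{callsDuringFirst} {callsDuringSecond}"); // 4 4
```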

LINQ to Objects queries are optimized in the following cases:

· Some method calls are optimized if the data source implements the required interface. The following table lists these optimizations.


	IEnumerable<T> T
	ICollection ICollection<T>
	ICollection ICollection<T>
	IList IList<T> Count

· If there is a sequence of one or more Where operators immediately followed by a sequence of one or more Select operators, the query creates a single IEnumerable or IEnumerable<T> object and generates no intermediate ones.
Consider the following query:

var query = from item in storage
            where item.Category == "Food"
            where item.Price < 100
            select item;

In this case, only one IEnumerable<T> object is generated for the query.

· If you query an array or a List<T>, iterating the query does not go through the source's IEnumerable<T> interface enumerator. For an array, a simple for loop over the array's length is used behind the scenes and the elements are accessed directly; for a List<T>, the list's own enumerator is used.

Furthermore, Where operators are implemented as simple if statements inside that loop, so no intermediate enumerators or enumerable objects are created.
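The interface-check optimizations in the first item of the list above follow a common pattern: test whether the data source implements a richer interface, and enumerate only as a fallback. The sketch below is a simplified illustration, not the library source. CountSketch is a hypothetical helper modeled on Enumerable.Count, which uses the Count property when the source implements ICollection<T>. The sketch also shows that, in the .NET implementation, Cast<T> returns the source object itself when it already implements IEnumerable<T>.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: a simplified version of the pattern, not library code.
static int CountSketch<T>(IEnumerable<T> source)
{
    if (source is ICollection<T> genericCollection)
        return genericCollection.Count; // O(1): no enumeration needed.
    if (source is ICollection collection)
        return collection.Count;        // O(1): non-generic fallback.

    // O(n) fallback: walk the sequence and count the elements.
    int count = 0;
    using (IEnumerator<T> e = source.GetEnumerator())
        while (e.MoveNext()) count++;
    return count;
}

var list = new List<int> { 10, 20, 30 };
IEnumerable<int> lazy = list.Where(x => x > 10); // Not an ICollection<int>.

Console.WriteLine(CountSketch(list)); // 3 (via ICollection<int>.Count)
Console.WriteLine(CountSketch(lazy)); // 2 (by enumerating)

// Cast<int> has nothing to convert, so the list itself comes back.
IEnumerable<int> cast = list.Cast<int>();
Console.WriteLine(ReferenceEquals(cast, list)); // True
```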
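The Where query from the list above can be run with a small hypothetical data source; tuples stand in for the article's items, whose type is not shown. The comparison uses ==, and the sketch also writes out the method-syntax form: because the query ends with a trivial select item clause, the compiler drops that Select call, leaving two chained Where calls. Merging them into a single iterator happens inside the library and is not directly observable, so printing the iterator's type name is only a hint whose result depends on the .NET version.

```csharp
using System;
using System.Linq;

// Hypothetical data source: named tuples stand in for the article's items.
var storage = new[]
{
    (Category: "Food", Price: 50m),
    (Category: "Food", Price: 150m),
    (Category: "Tools", Price: 20m),
};

// The article's query, with == for the comparison.
var query = from item in storage
            where item.Category == "Food"
            where item.Price < 100
            select item;

// Equivalent method syntax: the trivial "select item" is omitted
// by the compiler, so only the two Where calls remain.
var methodQuery = storage
    .Where(item => item.Category == "Food")
    .Where(item => item.Price < 100);

// Implementation detail: the iterator type name varies by .NET version.
Console.WriteLine(query.GetType().Name);
Console.WriteLine(query.Count()); // 1
```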
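To make the array case in the list above concrete, here is a hypothetical WhereArraySketch local function: an index-based loop with the predicate checked by an if statement and no call to the interface enumerator. It is only a sketch. It relies on yield return, whereas the real library iterators are hand-written classes, but the filtering logic follows the same idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of an array-specialized Where (not library code).
static IEnumerable<T> WhereArraySketch<T>(T[] source, Func<T, bool> predicate)
{
    for (int i = 0; i < source.Length; i++)
    {
        T item = source[i];  // Direct element access by index.
        if (predicate(item)) // The Where clause is just an if statement.
            yield return item;
    }
}

int[] numbers = { 3, 8, 1, 12, 7 };
var sketch = WhereArraySketch(numbers, n => n > 5).ToArray();
var linq = numbers.Where(n => n > 5).ToArray();

Console.WriteLine(string.Join(", ", sketch)); // 8, 12, 7
```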

Once again, other LINQ providers might have their own optimization rules, but the rules above should give you an idea of how LINQ to Objects works.

[author: Alexandra Rusina, Programming Writer]
