
LINQ: Cast and OfType

All the LINQ operations we’ve seen so far have worked on lists of type IEnumerable<T>, where T is the data type of the objects in the list. This is fine for most of the current data types in C#, such as the generic List<T> and the C# array. However, some older data types, such as the ArrayList, do not implement IEnumerable<T>; rather they implement the older, non-generic IEnumerable interface. If we want to use these older data types with LINQ, we must convert them to IEnumerable<T>.

There are two methods that can be used to do this: Cast<T> and OfType<T>. Let’s look at Cast<T> first.

Using our list of Canadian prime ministers, we can call the method that returns an ArrayList instead of an array. To apply LINQ operators to this list, we need to cast it first:

      ArrayList pmArrayList01 = PrimeMinisters.GetPrimeMinistersArrayList();

      var sorted = pmArrayList01.Cast<PrimeMinisters>().OrderBy(pm => pm.lastName);
      Console.WriteLine("*** cast ArrayList");
      foreach (var pm in sorted)
      {
        Console.WriteLine("{0}, {1}", pm.lastName, pm.firstName);
      }

If we tried to call OrderBy() directly on pmArrayList01, we would find that the code wouldn’t compile. If you’re using Visual Studio’s IntelliSense, you’ll also notice that most of the LINQ functions don’t show up in the list anyway. The problem is that the ArrayList is not an IEnumerable<T>, and operators such as OrderBy() are extension methods defined on IEnumerable<T>.

We call Cast<PrimeMinisters> on this list first, followed by a call to OrderBy() to sort the list by last name. Thus the general rule is that the object calling Cast<T> must implement IEnumerable, and the output from Cast<T> is of type IEnumerable<T>.
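As a self-contained sketch of that rule, here is the same pattern applied to an ArrayList of plain strings rather than PrimeMinisters objects. The class name CastDemo and the helper SortNames are made up for illustration and aren't part of the sample project.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class CastDemo
{
    // Hypothetical helper: sorts the strings held in a non-generic ArrayList.
    public static List<string> SortNames(ArrayList names)
    {
        // ArrayList only implements the non-generic IEnumerable, so OrderBy
        // is not available until Cast<string>() gives us an IEnumerable<string>.
        IEnumerable<string> typed = names.Cast<string>();
        return typed.OrderBy(name => name).ToList();
    }

    public static void Main()
    {
        var names = new ArrayList { "Laurier", "Borden", "Macdonald" };
        foreach (string name in SortNames(names))
        {
            Console.WriteLine(name);
        }
    }
}
```

The input parameter is typed as ArrayList and the intermediate variable as IEnumerable<string>, which makes the before and after types of the Cast call explicit.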

With this code, we get the expected output:

Abbott, John
Bennett, Richard
Borden, Robert
Bowell, Mackenzie
Campbell, Kim
Chrétien, Jean
Clark, Joe
Diefenbaker, John
Harper, Stephen
Laurier, Wilfrid
Macdonald, John
Mackenzie, Alexander
Mackenzie King, William
Martin, Paul
Meighen, Arthur
Mulroney, Brian
Pearson, Lester
St. Laurent, Louis
Thompson, John
Trudeau, Pierre
Tupper, Charles
Turner, John

Now, an ArrayList can store items of any data type (it’s defined to accept the generic ‘object’ type), so we could mix things up a bit and add some ordinary strings onto the end of the list of prime ministers. That is, we could try something like adding this code after that above:

      pmArrayList01.Add("A string item");
      pmArrayList01.Add("Isn't this interesting?");
      pmArrayList01.Add("End of list");
      sorted = pmArrayList01.Cast<PrimeMinisters>().OrderBy(pm => pm.lastName);
      foreach (var pm in sorted)
      {
        Console.WriteLine("{0}, {1}", pm.lastName, pm.firstName);
      }

There’s an obvious problem in that the three strings we’ve added at the end don’t have firstName and lastName fields, so we wouldn’t expect the code to run anyway. However, we find that the code does in fact compile without errors. If we try to run it, we get the following error:

Unhandled Exception: System.InvalidCastException: Unable to cast object of type
'System.String' to type 'LinqObjects01.PrimeMinisters'.

The problem is that the Cast<PrimeMinisters> method requires that all elements in the list passed to it are of type PrimeMinisters, and it throws an InvalidCastException if any elements in the input list aren’t of the correct type.

There is one important point about Cast<T>: remember that it is a deferred operator, so it isn’t actually executed until an attempt is made to enumerate its output. That is, if we omit the foreach loop in the above code, but retain the (erroneous) call to Cast<PrimeMinisters>, the code will compile and run, seemingly without errors, since we haven’t attempted to enumerate the ‘sorted’ object. The actual exception is thrown only in the foreach loop when we try to enumerate the elements of ‘sorted’.
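A minimal sketch of this deferred behaviour, using an ArrayList of ints with one stray string instead of the prime ministers list (the class and method names are invented for the example):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class DeferredCastDemo
{
    // Returns true if the exception appears while enumerating the query,
    // rather than while building it.
    public static bool ThrowsOnlyWhenEnumerated()
    {
        var mixed = new ArrayList { 1, 2, "three" };

        // No exception here: Cast<int>() only records what should be done.
        IEnumerable<int> query = mixed.Cast<int>();

        try
        {
            foreach (int n in query)
            {
                Console.WriteLine(n);
            }
            return false;
        }
        catch (InvalidCastException)
        {
            // Reached when the enumeration arrives at the string "three".
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsOnlyWhenEnumerated());
    }
}
```

Because there is no OrderBy in this sketch, the first two elements are printed before the exception is thrown. In the prime ministers example, OrderBy has to read the whole input before it can yield anything, so the exception appears before any output.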

If we want to handle lists that contain mixed types, we can use the OfType<T> method instead. This method accepts input IEnumerable objects containing any mixture of types, and looks for those of type T. It will add these objects to its output list and ignore any objects that aren’t of type T. So we can try the following on our mixed ArrayList:

      pmArrayList01.Add("A string item");
      pmArrayList01.Add("Isn't this interesting?");
      pmArrayList01.Add("End of list");

      var sorted = pmArrayList01.OfType<PrimeMinisters>().OrderBy(pm => pm.lastName);
      foreach (var pm in sorted)
      {
        Console.WriteLine("{0}, {1}", pm.lastName, pm.firstName);
      }

      var sortedStrings = pmArrayList01.OfType<string>().OrderBy(pm => pm);
      foreach (var pm in sortedStrings)
      {
        Console.WriteLine(pm);
      }

After adding the strings, we first call OfType<PrimeMinisters> and pass the result to OrderBy. The OfType call will look only for elements in the ArrayList of type PrimeMinisters, and ignore the string objects. Thus the list passed to OrderBy contains only the correct type, and the ordering and subsequent foreach loop both work properly. The results of this foreach loop are the same as with our original Cast above.

In the last bit of code, we use OfType<string>, which throws away all the PrimeMinisters objects and saves the three strings. Of course, we have to change the key selector in OrderBy so it operates on a simple string rather than a PrimeMinisters object, and similarly for the WriteLine() in the foreach loop. The output of this final loop is:

A string item
End of list
Isn't this interesting?

The Cast and OfType operators can also be applied to IEnumerable<T> lists. Cast isn’t much use in this regard, since if we start off with an IEnumerable<T>, we don’t usually need to convert it to a list of the same type. However, OfType is useful as a filter, since it can be used to create a list of a specific data type from a more generic starting list.

For example, we can create a (somewhat contrived, admittedly) array of type ‘object’ which contains both PrimeMinisters objects and strings, by putting the following method in our PrimeMinisters class:

    public static object[] GetObjectArray()
    {
      // Build the list once and leave room for three extra strings.
      ArrayList primes = GetPrimeMinistersArrayList();
      object[] pmArray = new object[primes.Count + 3];

      // Copy the PrimeMinisters objects to the start of the array.
      primes.CopyTo(pmArray);

      // Fill the last three slots with plain strings.
      pmArray[pmArray.Length - 3] = "String 1";
      pmArray[pmArray.Length - 2] = "String 2";
      pmArray[pmArray.Length - 1] = "String 3";
      return pmArray;
    }

We can isolate the PrimeMinisters objects by using OfType on the object[] array (remember that a C# array implements IEnumerable<T>, so an object[] is an IEnumerable<object>):

      object[] pmArray01 = PrimeMinisters.GetObjectArray();
      var sortedArray = pmArray01.OfType<PrimeMinisters>().OrderBy(pm => pm.lastName);
      foreach (var pm in sortedArray)
      {
        Console.WriteLine("{0}, {1}", pm.lastName, pm.firstName);
      }

Finally, it’s worth noting that there is a third conversion operator called AsEnumerable<T> which does take an IEnumerable<T> as input and produces another IEnumerable<T> as output. Although this may seem pointless, it’s actually essential when we deal with databases. But we’ll leave that until we consider the use of LINQ with databases.
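As a small sketch of what AsEnumerable<T> does with an in-memory list, the following check shows that it hands back the very same object, only typed as IEnumerable<T>. The class and method names here are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AsEnumerableDemo
{
    // Returns true if AsEnumerable() gives back the original list object.
    public static bool ReturnsSameObject()
    {
        var numbers = new List<int> { 3, 1, 2 };

        // Only the compile-time type of the reference changes here.
        IEnumerable<int> sequence = numbers.AsEnumerable();

        return object.ReferenceEquals(sequence, numbers);
    }

    public static void Main()
    {
        Console.WriteLine(ReturnsSameObject());
    }
}
```

No copying or conversion takes place, which is why the operator looks pointless for in-memory collections.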

LINQ – Introduction and a simple select clause

LINQ (short for Language INtegrated Query) is an addition to Microsoft’s .NET languages (C# and Visual Basic) that allows queries to be carried out on various data sources, ranging from the more primitive data types such as arrays and lists to more structured data sources such as XML and databases. Since I haven’t used Visual Basic since version 3, I’ll consider only C# code in these posts.

Deferred versus non-deferred operators

Before we start writing code, there are a few concepts that are important to understand. First, LINQ queries consist of commands that fall into two main categories: deferred and non-deferred. A query containing only deferred commands is not actually performed until the query is enumerated. What this means is that the code that specifies the query merely constructs an object containing instructions for performing the query, and the query itself is not performed until some other code (typically a foreach loop iterating through the results of the query) attempts to access the result of the query. This can be a mixed blessing. On one hand, it means that each time you access the query, an up-to-date version of the results is provided. If you’re querying a database, for example, then if changes are made to the database in between queries, the later query will return the updated information.

Sometimes, of course, this isn’t what you want – you want to run the query once and save these results for all future uses, even if the data source changes in the meantime. This is possible by using one of LINQ’s non-deferred commands, since placing any such command in a query forces the query to be run at the time it is defined, enabling you to save results for later use.
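To make the difference concrete, here is a minimal sketch with a List<int> as the data source. It uses Where as the deferred command and ToList as one example of a non-deferred command. The class and method names are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns { count seen by the deferred query, count held in the snapshot }.
    public static int[] CountsAfterChange()
    {
        var numbers = new List<int> { 1, 2, 3 };

        // Deferred: Where() only describes the query.
        IEnumerable<int> live = numbers.Where(n => n > 1);

        // Non-deferred: ToList() runs the query now and stores the results.
        List<int> snapshot = numbers.Where(n => n > 1).ToList();

        // Change the data source after both queries have been defined.
        numbers.Add(4);

        return new[] { live.Count(), snapshot.Count };
    }

    public static void Main()
    {
        int[] counts = CountsAfterChange();
        Console.WriteLine("Deferred query sees {0} items", counts[0]);
        Console.WriteLine("Snapshot holds {0} items", counts[1]);
    }
}
```

The deferred query picks up the 4 that was added later and reports three items, while the snapshot still holds the two items that matched when ToList() was called.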

As you might guess, it is very important to know which LINQ commands are deferred and which are non-deferred. Failure to distinguish between them can lead to bugs in the code that are hard to find. For example, since a deferred query is not actually run until some code accesses the results of the query, any errors in the query definition will not become apparent until this later code is run.

Query expression syntax

A second important concept is that many LINQ commands can be written using two types of syntax. All LINQ commands can be written using standard query operators, which are essentially just method calls. LINQ commands are performed on data sources, and the usual way of calling an operator on such a data source is with a statement of the form dataSource.LinqOperator(parameters). In this syntax, LinqOperator() is an extension method (not that you really need to know this to use it).

Although any LINQ command can be written using standard query operators, there is an alternative syntax known as query expression syntax which can be used for the most common query operators. Query expressions essentially introduce a number of new keywords into C#, and resemble standard SQL statements more than method calls. It is important to realize, however, that not all LINQ commands can be written using query expressions. In the examples that follow, we’ll try to give both forms if it is possible to use both syntaxes to write a query.
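As a quick illustration of the two syntaxes, here is the same filter-and-sort query written both ways over a small int array. The data and the names SyntaxDemo, WithQueryExpression and WithMethodCalls are assumptions made up for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SyntaxDemo
{
    private static readonly int[] Numbers = { 5, 2, 8, 1, 9 };

    // Query expression syntax: reads rather like SQL.
    public static List<int> WithQueryExpression()
    {
        var query = from n in Numbers
                    where n > 2
                    orderby n
                    select n;

        // ToList() has no query keyword, so it is always a method call.
        return query.ToList();
    }

    // Standard query operators: ordinary (extension) method calls.
    public static List<int> WithMethodCalls()
    {
        return Numbers.Where(n => n > 2).OrderBy(n => n).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", WithQueryExpression()));
        Console.WriteLine(string.Join(", ", WithMethodCalls()));
    }
}
```

Both methods return the same sequence (5, 8, 9), so the choice between them is largely a matter of readability.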

Data sources

We mentioned above that LINQ allows you to query several types of data source, ranging from simple types up to complex structures such as databases. In fact, LINQ contains separate versions of many commands for different types of data. We won’t go into the details quite yet, but it’s important to remember that commands used for querying objects such as arrays may differ from those for querying databases, even if they have the same name.

We’ll look at LINQ to Objects first and consider more complex data structures later. A data source for a LINQ to Objects query must implement the IEnumerable<T> generic interface, where T is the type of data stored in the object. If this sounds frightening, don’t worry unduly. In recent versions of C#, the common data sources such as arrays and lists implement IEnumerable<T> by default, so you can apply LINQ to these data types without any problems. For legacy data sources such as the ArrayList, there are ways of converting them to the correct form so LINQ can be applied to them too. We’ll get to that in due course.
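If you want to see for yourself which collections qualify as a data source for in-memory queries, a runtime check along the following lines will do. The helper name IsGenericIntSequence is invented for this sketch.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class DataSourceDemo
{
    // True if the object can be used directly as a LINQ source of ints.
    public static bool IsGenericIntSequence(object source)
    {
        return source is IEnumerable<int>;
    }

    public static void Main()
    {
        Console.WriteLine(IsGenericIntSequence(new[] { 1, 2, 3 }));          // array
        Console.WriteLine(IsGenericIntSequence(new List<int> { 1, 2, 3 }));  // List<T>
        Console.WriteLine(IsGenericIntSequence(new ArrayList { 1, 2, 3 }));  // ArrayList
    }
}
```

The array and the List<int> pass the check, while the ArrayList does not, since it implements only the non-generic IEnumerable.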

A simple LINQ query

That’s about all the background you need to start looking at some LINQ code. We’ll begin with probably the most common command, which is ‘select’. First, we need some data. We’ll use a list of all of Canada’s prime ministers, which we’ll encapsulate in a class like this:

  using System.Collections;  // needed for ArrayList

  public class PrimeMinisters
  {
    public int id;
    public string firstName, lastName, party;

    public static ArrayList GetPrimeMinistersArrayList()
    {
      ArrayList primes = new ArrayList();

      primes.Add(new PrimeMinisters { id = 1, firstName = "John", lastName = "Macdonald", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 2, firstName = "Alexander", lastName = "Mackenzie", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 3, firstName = "John", lastName = "Abbott", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 4, firstName = "John", lastName = "Thompson", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 5, firstName = "Mackenzie", lastName = "Bowell", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 6, firstName = "Charles", lastName = "Tupper", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 7, firstName = "Wilfrid", lastName = "Laurier", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 8, firstName = "Robert", lastName = "Borden", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 9, firstName = "Arthur", lastName = "Meighen", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 10, firstName = "William", lastName = "Mackenzie King", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 11, firstName = "Richard", lastName = "Bennett", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 12, firstName = "Louis", lastName = "St. Laurent", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 13, firstName = "John", lastName = "Diefenbaker", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 14, firstName = "Lester", lastName = "Pearson", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 15, firstName = "Pierre", lastName = "Trudeau", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 16, firstName = "Joe", lastName = "Clark", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 17, firstName = "John", lastName = "Turner", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 18, firstName = "Brian", lastName = "Mulroney", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 19, firstName = "Kim", lastName = "Campbell", party = "Conservative" });
      primes.Add(new PrimeMinisters { id = 20, firstName = "Jean", lastName = "Chrétien", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 21, firstName = "Paul", lastName = "Martin", party = "Liberal" });
      primes.Add(new PrimeMinisters { id = 22, firstName = "Stephen", lastName = "Harper", party = "Conservative" });

      return primes;
    }

    public override string ToString()
    {
      return id + ". " + firstName + " " + lastName + " (" + party + ")";
    }

    public static PrimeMinisters[] GetPrimeMinistersArray()
    {
      return (PrimeMinisters[])GetPrimeMinistersArrayList().ToArray(typeof(PrimeMinisters));
    }
  }

We’ve provided two forms of this data. The first method creates an old-fashioned ArrayList (which we’ll use later), and the last method converts this to a standard array. We’ve provided an override of the ToString() method as well so that we can print out each prime minister neatly.

A simple starting point is some LINQ code that just prints out the entire list of prime ministers. We can do this using a query expression as follows:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      IEnumerable<PrimeMinisters> pmList = from pm in primeMinisters
                                           select pm;
      foreach (PrimeMinisters pm in pmList)
      {
        Console.WriteLine(pm);
      }

We retrieve the array using the static method GetPrimeMinistersArray(). Remember that a C# array already implements IEnumerable<T>, so we can use it directly in a LINQ query. The query begins with a ‘from’ clause. The clause ‘from pm in primeMinisters’ means that each element of the primeMinisters array will be examined, and the element is referred to as ‘pm’ while it’s being examined. The ‘select’ clause says what is to be returned, or yielded, in response to each element passed to it. In this case, we simply return pm for each pm passed to it, so we get a sequence of PrimeMinisters objects as the result of the query.

Note that we’ve declared the result of the query as ‘pmList’, which is of type IEnumerable<PrimeMinisters>. Of course, since this is an interface, it doesn’t tell you the concrete type of the sequence that is returned by the query. You can find this type by stepping through the code using the debugger, and it turns out to be something quite unfriendly (in my case {System.Linq.Enumerable.WhereSelectArrayIterator<LinqObjects01.PrimeMinisters,LinqObjects01.PrimeMinisters>}). This shouldn’t cause any problems since the IEnumerable<T> interface, together with the LINQ extension methods defined on it, allows you to use the data in pretty well any way you like.

The output from this code is:

1. John Macdonald (Conservative)
2. Alexander Mackenzie (Liberal)
3. John Abbott (Conservative)
4. John Thompson (Conservative)
5. Mackenzie Bowell (Conservative)
6. Charles Tupper (Conservative)
7. Wilfrid Laurier (Liberal)
8. Robert Borden (Conservative)
9. Arthur Meighen (Conservative)
10. William Mackenzie King (Liberal)
11. Richard Bennett (Conservative)
12. Louis St. Laurent (Liberal)
13. John Diefenbaker (Conservative)
14. Lester Pearson (Liberal)
15. Pierre Trudeau (Liberal)
16. Joe Clark (Conservative)
17. John Turner (Liberal)
18. Brian Mulroney (Conservative)
19. Kim Campbell (Conservative)
20. Jean Chrétien (Liberal)
21. Paul Martin (Liberal)
22. Stephen Harper (Conservative)

As mentioned above, we can also write this query using standard method notation. We get:

      IEnumerable<PrimeMinisters> pmList2 = primeMinisters.Select(pm => pm);
      foreach (PrimeMinisters pm in pmList2)
      {
        Console.WriteLine(pm);
      }

This form reveals the underlying structure of the query expression. Select() is actually an extension method with prototype

public static IEnumerable<S> Select<T, S>(
  this IEnumerable<T> source,
  Func<T, S> selector);

Select() takes a source argument of type IEnumerable<T> (which is primeMinisters in our example) and a selector which is a Func that specifies what should be returned for each element in source. We’ve used a lambda expression to provide the selector. In this case, the selector just returns the same object that was passed to it. This means that the return data type S is the same as the source data type T (they are both of type PrimeMinisters).
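The two type parameters become more interesting when the selector returns something other than its input. As a hedged sketch (the class SelectDemo and its method are invented here, and plain strings stand in for PrimeMinisters objects), the following maps each name to its length, so T is string and S is int:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectDemo
{
    // Projects each name (T = string) to its length (S = int).
    public static List<int> NameLengths(string[] names)
    {
        IEnumerable<int> lengths = names.Select(name => name.Length);
        return lengths.ToList();
    }

    public static void Main()
    {
        var names = new[] { "Laurier", "Borden", "King" };
        foreach (int length in NameLengths(names))
        {
            Console.WriteLine(length);
        }
    }
}
```

The compiler infers both type parameters from the source array and the lambda expression, so we never have to write Select<string, int> explicitly.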

Note that the ‘from pm in primeMinisters’ clause in the query expression is replaced by giving primeMinisters as the source for the Select() method. In the query expression we declared the variable for the elements in the source by saying ‘from pm in…’, while in the method expression this variable is declared by giving it as the argument in the lambda expression.

In fact, the compiler translates a query expression into a method expression, so the first example will simply be translated into the second.

One final note for this introductory post. We’ve specified the data type of the result of the query explicitly by saying it’s IEnumerable<PrimeMinisters>. In many cases we won’t know the actual data type being returned; it may even be an anonymous type making it impossible to specify. In such cases, we can simply use ‘var’ to declare the return type of the query. Thus we could rewrite the first query above as:

      PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray();
      var pmList = from pm in primeMinisters
                   select pm;
      foreach (PrimeMinisters pm in pmList)
      {
        Console.WriteLine(pm);
      }

Remember that ‘var’ is still statically typed: the compiler infers the type of pmList (here IEnumerable<PrimeMinisters>) from the query, so we can still access individual fields of each pm object if we want.
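The anonymous type case mentioned above might look something like the following sketch. The sample data, the property names and the class VarDemo are all invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VarDemo
{
    public static List<string> Describe()
    {
        var people = new[]
        {
            new { First = "Kim", Last = "Campbell" },
            new { First = "Joe", Last = "Clark" }
        };

        // The element type has no name we can write down, so 'var' is
        // the only way to declare the query variable.
        var query = from p in people
                    select new { FullName = p.First + " " + p.Last, Length = p.Last.Length };

        var result = new List<string>();
        foreach (var item in query)
        {
            // The compiler still knows the properties of the anonymous type.
            result.Add(item.FullName + " (" + item.Length + ")");
        }
        return result;
    }

    public static void Main()
    {
        foreach (string line in Describe())
        {
            Console.WriteLine(line);
        }
    }
}
```

Misspelling FullName or Length inside the loop would be caught at compile time, which shows that ‘var’ doesn’t mean the type is unknown or dynamic.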