LINQ allows sequences to be combined using the usual set operations of union, intersection and difference. Using these commands is pretty straight forward, so we’ll give an example which illustrates all of them.
Using our lists of Canada’s prime ministers and their terms of office, we first use a Join() to construct a customized list where each term of office is connected with the name of the PM who served it. We then build two subsets of this list by considering terms that started in the 20th century and terms that ended in the 20th century. We can then apply the set operators to these two lists and see what we get. The code is:
PrimeMinisters[] primeMinisters = PrimeMinisters.GetPrimeMinistersArray(); Terms[] terms = Terms.GetTermsArray(); var pmList41 = primeMinisters .Join(terms, pm => pm.id, term => term.id, (pm, term) => new { first = pm.firstName, last = pm.lastName, start = term.start, end = term.end }) .OrderBy(pmTerm => pmTerm.start); var start20 = pmList41 .Where(pmTerm => pmTerm.start.Year > 1900 && pmTerm.start.Year < 2001); var end20 = pmList41 .Where(pmTerm => pmTerm.end.Year > 1900 && pmTerm.end.Year < 2001); var startOrEnd20 = start20.Union(end20) .OrderBy(pmTerm => pmTerm.start); var startAndEnd20 = start20.Intersect(end20); var startExceptEnd20 = start20.Except(end20); var endExceptStart20 = end20.Except(start20); foreach (var pmTerm in start20) { Console.WriteLine("{0} {1}: {2: dd MMM yyyy} to {3: dd MMM yyyy}", pmTerm.first, pmTerm.last, pmTerm.start, pmTerm.end); }
First, we’ll see the two sets we start with. Terms starting in the 20th century are in the list start20:
Robert Borden: 10 Oct 1911 to 10 Jul 1920 Arthur Meighen: 10 Jul 1920 to 29 Dec 1921 William Mackenzie King: 29 Dec 1921 to 28 Jun 1926 Arthur Meighen: 29 Jun 1926 to 25 Sep 1926 William Mackenzie King: 25 Sep 1926 to 07 Aug 1930 Richard Bennett: 07 Aug 1930 to 23 Oct 1935 William Mackenzie King: 23 Oct 1935 to 15 Nov 1948 Louis St. Laurent: 15 Nov 1948 to 21 Jun 1957 John Diefenbaker: 21 Jun 1957 to 22 Apr 1963 Lester Pearson: 22 Apr 1963 to 20 Apr 1968 Pierre Trudeau: 20 Apr 1968 to 03 Jun 1979 Joe Clark: 04 Jun 1979 to 02 Mar 1980 Pierre Trudeau: 03 Mar 1980 to 29 Jun 1984 John Turner: 30 Jun 1984 to 16 Sep 1984 Brian Mulroney: 17 Sep 1984 to 24 Jun 1993 Kim Campbell: 25 Jun 1993 to 03 Nov 1993 Jean Chrétien: 04 Nov 1993 to 11 Dec 2003
Terms ending in the 20th century are in end20:
Wilfrid Laurier: 11 Jul 1896 to 06 Oct 1911 Robert Borden: 10 Oct 1911 to 10 Jul 1920 Arthur Meighen: 10 Jul 1920 to 29 Dec 1921 William Mackenzie King: 29 Dec 1921 to 28 Jun 1926 Arthur Meighen: 29 Jun 1926 to 25 Sep 1926 William Mackenzie King: 25 Sep 1926 to 07 Aug 1930 Richard Bennett: 07 Aug 1930 to 23 Oct 1935 William Mackenzie King: 23 Oct 1935 to 15 Nov 1948 Louis St. Laurent: 15 Nov 1948 to 21 Jun 1957 John Diefenbaker: 21 Jun 1957 to 22 Apr 1963 Lester Pearson: 22 Apr 1963 to 20 Apr 1968 Pierre Trudeau: 20 Apr 1968 to 03 Jun 1979 Joe Clark: 04 Jun 1979 to 02 Mar 1980 Pierre Trudeau: 03 Mar 1980 to 29 Jun 1984 John Turner: 30 Jun 1984 to 16 Sep 1984 Brian Mulroney: 17 Sep 1984 to 24 Jun 1993 Kim Campbell: 25 Jun 1993 to 03 Nov 1993
One of the properties of a mathematical set is that it contains no duplicates. If we take the union of start20 and end20, each of the terms of office must appear only once. The way Union() works is it enumerates the first sequence, comparing each element to see if it is distinct from the others that have already been enumerated. Only distinct elements are saved. Then the second sequence is enumerated and again each element is compared with those elements saved from the first set. Thus the result of the command var startOrEnd20 = start20.Union(end20) is
Robert Borden: 10 Oct 1911 to 10 Jul 1920 Arthur Meighen: 10 Jul 1920 to 29 Dec 1921 William Mackenzie King: 29 Dec 1921 to 28 Jun 1926 Arthur Meighen: 29 Jun 1926 to 25 Sep 1926 William Mackenzie King: 25 Sep 1926 to 07 Aug 1930 Richard Bennett: 07 Aug 1930 to 23 Oct 1935 William Mackenzie King: 23 Oct 1935 to 15 Nov 1948 Louis St. Laurent: 15 Nov 1948 to 21 Jun 1957 John Diefenbaker: 21 Jun 1957 to 22 Apr 1963 Lester Pearson: 22 Apr 1963 to 20 Apr 1968 Pierre Trudeau: 20 Apr 1968 to 03 Jun 1979 Joe Clark: 04 Jun 1979 to 02 Mar 1980 Pierre Trudeau: 03 Mar 1980 to 29 Jun 1984 John Turner: 30 Jun 1984 to 16 Sep 1984 Brian Mulroney: 17 Sep 1984 to 24 Jun 1993 Kim Campbell: 25 Jun 1993 to 03 Nov 1993 Jean Chrétien: 04 Nov 1993 to 11 Dec 2003 Wilfrid Laurier: 11 Jul 1896 to 06 Oct 1911
Note that although all the elements are there, the one element that is in end20 but not in start 20 (Laurier’s term) appears at the end even though by date it came first. This is because Union() yields elements in the order in which it processes them, and since end20 was the second sequence processed, its unique entries appear at the end. We have added the OrderBy() clause in the code above to fix this, and this just results in placing Laurier’s term at the start.
It’s worth pausing here to reflect on what Union() and the other set commands do when they compare two elements in the sequence to test them for equality. In the absence of any external comparer class or implementation of the IEquatable<T> interface, the equality test is done by calling the built-in Equals() method from the Object class. For reference data types (that is, objects created from classes as opposed to value types such as int), Equals() compares the references of its two objects and returns ‘true’ only if both objects have the same reference, that is, if the two objects are actually the same object, in the sense that they occupy the same location in memory. Thus two different objects that happened to have the same values for all their data fields would not be considered equal by the default Equals() method.
The code we have written here thus depends implicitly on the fact that the set operators don’t make copies of the objects they are comparing; rather they simply reorder and classify the existing objects without modifying or copying them. If we wanted to make the code a bit more iron-clad, we should provide overrides of the Equals() and GetHashCode() methods, and/or implement the IEquatable<T> interface, as described in the earlier post. In some cases, this is difficult or impossible to do, as in our example where the data type being manipulated by the set operators is an anonymous type.
With that caution in mind, we can look at the results of the other set operators. The intersection of start20 and end20 gives startAndEnd20:
Robert Borden: 10 Oct 1911 to 10 Jul 1920 Arthur Meighen: 10 Jul 1920 to 29 Dec 1921 William Mackenzie King: 29 Dec 1921 to 28 Jun 1926 Arthur Meighen: 29 Jun 1926 to 25 Sep 1926 William Mackenzie King: 25 Sep 1926 to 07 Aug 1930 Richard Bennett: 07 Aug 1930 to 23 Oct 1935 William Mackenzie King: 23 Oct 1935 to 15 Nov 1948 Louis St. Laurent: 15 Nov 1948 to 21 Jun 1957 John Diefenbaker: 21 Jun 1957 to 22 Apr 1963 Lester Pearson: 22 Apr 1963 to 20 Apr 1968 Pierre Trudeau: 20 Apr 1968 to 03 Jun 1979 Joe Clark: 04 Jun 1979 to 02 Mar 1980 Pierre Trudeau: 03 Mar 1980 to 29 Jun 1984 John Turner: 30 Jun 1984 to 16 Sep 1984 Brian Mulroney: 17 Sep 1984 to 24 Jun 1993 Kim Campbell: 25 Jun 1993 to 03 Nov 1993
Laurier’s and Chrétien’s terms have been omitted since they extended outside the 20th century. In this case we didn’t need an OrderBy() since all the included terms were in the first sequence and were already ordered.
The set difference A – B produces the set that contains all elements in A that are not in B. Thus startExceptEnd contains terms that started in the 20th century but didn’t end there. The LINQ operator for set difference is Except(), and the results are:
Jean Chrétien: 04 Nov 1993 to 11 Dec 2003
Swapping start20 and end 20 produces terms that ended in the 20th century but didn’t start then:
Wilfrid Laurier: 11 Jul 1896 to 06 Oct 1911
There is a fourth operator that, although it’s not a set operator in the mathematical sense, is lumped in with them. This is Distinct(), which removes duplicates from a sequence. For example, suppose we join together end20 and start20 and then order the results by start date, as with the code:
var endPlusStart20 = end20.Concat(start20) .OrderBy(pmTerm => pmTerm.start);
We’ve used the Concat() operator which glues its argument onto the sequence that calls it. Note that Concat() is not the same as Union(), since it doesn’t exclude duplicates from its output. The result of this code is (with a loop to print out the results, as usual):
Wilfrid Laurier: 11 Jul 1896 to 06 Oct 1911 Robert Borden: 10 Oct 1911 to 10 Jul 1920 Robert Borden: 10 Oct 1911 to 10 Jul 1920 Arthur Meighen: 10 Jul 1920 to 29 Dec 1921 Arthur Meighen: 10 Jul 1920 to 29 Dec 1921 William Mackenzie King: 29 Dec 1921 to 28 Jun 1926 William Mackenzie King: 29 Dec 1921 to 28 Jun 1926 Arthur Meighen: 29 Jun 1926 to 25 Sep 1926 Arthur Meighen: 29 Jun 1926 to 25 Sep 1926 William Mackenzie King: 25 Sep 1926 to 07 Aug 1930 William Mackenzie King: 25 Sep 1926 to 07 Aug 1930 Richard Bennett: 07 Aug 1930 to 23 Oct 1935 Richard Bennett: 07 Aug 1930 to 23 Oct 1935 William Mackenzie King: 23 Oct 1935 to 15 Nov 1948 William Mackenzie King: 23 Oct 1935 to 15 Nov 1948 Louis St. Laurent: 15 Nov 1948 to 21 Jun 1957 Louis St. Laurent: 15 Nov 1948 to 21 Jun 1957 John Diefenbaker: 21 Jun 1957 to 22 Apr 1963 John Diefenbaker: 21 Jun 1957 to 22 Apr 1963 Lester Pearson: 22 Apr 1963 to 20 Apr 1968 Lester Pearson: 22 Apr 1963 to 20 Apr 1968 Pierre Trudeau: 20 Apr 1968 to 03 Jun 1979 Pierre Trudeau: 20 Apr 1968 to 03 Jun 1979 Joe Clark: 04 Jun 1979 to 02 Mar 1980 Joe Clark: 04 Jun 1979 to 02 Mar 1980 Pierre Trudeau: 03 Mar 1980 to 29 Jun 1984 Pierre Trudeau: 03 Mar 1980 to 29 Jun 1984 John Turner: 30 Jun 1984 to 16 Sep 1984 John Turner: 30 Jun 1984 to 16 Sep 1984 Brian Mulroney: 17 Sep 1984 to 24 Jun 1993 Brian Mulroney: 17 Sep 1984 to 24 Jun 1993 Kim Campbell: 25 Jun 1993 to 03 Nov 1993 Kim Campbell: 25 Jun 1993 to 03 Nov 1993 Jean Chrétien: 04 Nov 1993 to 11 Dec 2003
All the terms except the first and last are duplicated. If we now feed the result of this into the Distinct() operator, it strips out the duplicates and returns the original list. The code is:
var endPlusStart20 = end20.Concat(start20) .OrderBy(pmTerm => pmTerm.start) .Distinct();
Distinct() uses the same equality test as the other set operators, so in order for it work, the above list must contain the same object duplicated in each case rather two objects, one of which is a copy of the other. Again, if you want to remove duplicates where different objects have the same data field values, you’ll need to provide a customized equality tester in some form (choose one of: implement IEquatable<T>, override Equals() and GetHashCode() from object, or provide a separate class that implements IEqualityComparer<T> for your data type).
Finally, all four of these operators have a second version in which we can pass an IEqualityComparer<T> object as a second parameter, thus allowing a custom equality test. We’ve already seen how to do this, so we won’t repeat it here.