.NET LINQ from Scratch

As software developers, we spend a lot of time extracting and displaying data from many different data sources. Whether it’s an XML web service of some sort or a full-featured relational database, we have been forced to learn different methods of data access. Wouldn’t it be great if the method of access was the same for all data sources? Well, we are in luck because, as of the release of C# 3.0 and the .NET 3.5 Framework, LINQ has come to change the game forever.

Tutorial Details

Introduction to LINQ syntax
Projections using LINQ
Refining data
Standard operators

Current Data Access Overview

On the .NET platform we have been and still are utilizing ADO.NET
for accessing different data sources. The open source community has also provided
the developers with a number of alternatives.

Language Integrated Query (LINQ) is the new addition to the .NET
family. As the name suggests, it’s a query-style form of data access that
is fully supported by the language, unifying the way we access data
and making our lives easier. LINQ can target a number of different sources, including Oracle,
MSSQL and XML, but for now we will focus on the most basic of
all, LINQ to Objects.

LINQ to Objects

Normally, to process and refine the data within our lists
and various other data structures, we have used either the ‘foreach’ loop or another
type of looping method to iterate through the objects and process them one by
one according to some condition. This is fine, but frankly it requires a lot of
basic coding that we all wish we didn’t have to write. Essentially we’ve had to tell the
compiler every single detail of the process in order to manipulate the data.

This is exactly where LINQ shines. LINQ allows us
to simply tell the compiler what we’d like to do and let the compiler work
out how to actually achieve it. If you’ve used SQL before, the strong resemblance
between LINQ and SQL will be the first thing that you’ll notice.
Like SQL, LINQ supports the “select”, “from”, “where”, “join”, “group ... by”
and “orderby” keywords. Here is a simple example of querying a list of objects:

List initialization:

List<Car> ListOfCars = new List<Car>()
{
    new Car {name = "Toyota"    , owner = "Alex" , model = 1992},
    new Car {name = "Mitsubishi", owner = "Jeff" , model = 2005},
    new Car {name = "Land Rover", owner = "Danny", model = 2001},
    new Car {name = "BMW"       , owner = "Danny", model = 2006},
    new Car {name = "Subaru"    , owner = "Smith", model = 2003}
};

The query:

IEnumerable<Car> QueryResult = from car in ListOfCars
                               select car;

The first part of the preceding code simply populates a list
with five instances of the ‘Car’ class. The next part of the code, however, uses the
“from” and “select” keywords to select a group of objects. The main difference
between SQL and LINQ is that the “from” keyword comes before the “select”
keyword because we must first define the object we want to operate on. Finally
the “select” clause tells the compiler what we wish to extract in this query. The above
code simply extracts everything that is in the list and assigns it to the “QueryResult”
variable.

When we query objects (LINQ to Objects), our
queries return an “IEnumerable<T>” sequence of objects. Essentially, the
“IEnumerable<T>” type is the kind of list that exposes an enumerator, which
supports simple iteration over a generic collection, and <T>
is the type of each entry in the list.

Don’t worry if you aren’t familiar with “enumerators” and “generics”. Just
remember that the result of a LINQ query is always a collection-like data
structure that you can iterate through using a loop, as shown
below:

foreach(Car car in QueryResult)
    Console.WriteLine(car.name);
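
The snippets so far rely on a ‘Car’ class that is never shown. As a minimal sketch, assuming ‘Car’ is a plain class with three public properties (the property names are taken from the object initializers above; the class body, ‘BasicQueryDemo’, ‘BuildCars’ and ‘SelectAll’ are illustrative names, not part of the original code), the pieces could be put together like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the Car class; only the property names come from the article.
public class Car
{
    public string name { get; set; }
    public string owner { get; set; }
    public int model { get; set; }
}

public static class BasicQueryDemo
{
    // Builds the same five cars used in the list initialization above.
    public static List<Car> BuildCars()
    {
        return new List<Car>
        {
            new Car { name = "Toyota",     owner = "Alex",  model = 1992 },
            new Car { name = "Mitsubishi", owner = "Jeff",  model = 2005 },
            new Car { name = "Land Rover", owner = "Danny", model = 2001 },
            new Car { name = "BMW",        owner = "Danny", model = 2006 },
            new Car { name = "Subaru",     owner = "Smith", model = 2003 }
        };
    }

    // Selects every car, unchanged, in list order.
    public static IEnumerable<Car> SelectAll(List<Car> cars)
    {
        return from car in cars
               select car;
    }

    public static void Main()
    {
        // Prints the five car names, one per line, starting with "Toyota".
        foreach (Car car in SelectAll(BuildCars()))
            Console.WriteLine(car.name);
    }
}
```

The “using System.Linq;” line matters: without it, the compiler cannot find the methods that the query syntax is translated into.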

We learned that LINQ always returns a collection structure similar
to any other list. However, the LINQ query does not execute until its result is
accessed by some other piece of code, like the “foreach” loop above. This
allows us to keep building up the query without the overhead of re-evaluating
it at each new step.
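
Deferred execution is easiest to see by changing the source after the query has been defined. The following is a small sketch using a plain list of strings rather than the ‘Car’ class; the ‘DeferredExecutionDemo’ class and its ‘Run’ method are illustrative names only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Defines a query, then changes the source before the query is enumerated.
    public static List<string> Run()
    {
        List<string> names = new List<string> { "Toyota", "BMW" };

        // Nothing is evaluated here; the query only describes the work.
        IEnumerable<string> query = from name in names
                                    where name.Length > 3
                                    select name;

        // Added after the query was defined, but before it is enumerated.
        names.Add("Subaru");

        // Enumeration (here via ToList) is what actually runs the query.
        return query.ToList();
    }

    public static void Main()
    {
        // Prints "Toyota" and "Subaru"; "BMW" is filtered out by the where clause.
        foreach (string name in Run())
            Console.WriteLine(name);
    }
}
```

“Subaru” appears in the output even though it was added after the query was written, because the query only runs when its result is enumerated.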

Projections

So far so good, but most of the time our queries will need
to be more complex, so let’s try projecting data. In SQL, Projection means selecting
the column(s) of the table(s) that we wish to see in the result
of the query. In the case of LINQ to Objects, performing a Projection results
in a query result type that differs from the type of object that we perform the
query on.

There are two kinds of Projections that we can do. We can
either perform a Projection based on an existing object type, or go completely
the other way by using anonymous types. The following example is of the first
kind:

IEnumerable<CarOwner> QueryResult = from car in ListOfCars
                                    select new CarOwner { owner_name = car.owner };

In the preceding code, the type of the query result is declared as
IEnumerable<CarOwner>, which differs from the List<Car> type of the ‘ListOfCars’ variable. We have
also used the “new” keyword and have done some assignments inside the curly
brackets. In the above code, using “select” with the “new” keyword tells the
compiler to instantiate a new ‘CarOwner’ object for every entry in the query result.
The assignments inside the curly brackets initialize the properties of each new instance
of the ‘CarOwner’ class.
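
To run this projection, both classes need a definition. The sketch below assumes simple property-only classes; the ‘owner_name’ and ‘age’ property names appear elsewhere in this tutorial, while the class bodies, ‘ProjectionDemo’ and ‘OwnersOf’ are assumptions made for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed class shapes; only the property names appear in the article.
public class Car
{
    public string name { get; set; }
    public string owner { get; set; }
    public int model { get; set; }
}

public class CarOwner
{
    public string owner_name { get; set; }
    public int age { get; set; }
}

public static class ProjectionDemo
{
    // Builds one CarOwner per Car; only owner_name is assigned,
    // so age keeps its default value of 0.
    public static IEnumerable<CarOwner> OwnersOf(IEnumerable<Car> cars)
    {
        return from car in cars
               select new CarOwner { owner_name = car.owner };
    }

    public static void Main()
    {
        List<Car> cars = new List<Car>
        {
            new Car { name = "Toyota", owner = "Alex",  model = 1992 },
            new Car { name = "BMW",    owner = "Danny", model = 2006 }
        };

        // Prints "Alex (age 0)" then "Danny (age 0)".
        foreach (CarOwner owner in OwnersOf(cars))
            Console.WriteLine("{0} (age {1})", owner.owner_name, owner.age);
    }
}
```

Properties that the projection does not assign simply keep their default values, which is worth remembering when projecting into an existing type.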

Nonetheless, if you don’t already have a type defined to
use, you can still perform projections using anonymous types.

Projections using Anonymous Types

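It would be a big hassle if, for every Projection, you were
forced to create a new type. That is why, as of C# 3.0, support for Anonymous
types was added to the language. An Anonymous type is created using the “new” keyword
followed by an object initializer, without naming a type. Because such a type has no name
that we can write, the result is stored in a variable declared with the “var”
keyword, which tells the compiler to infer the variable’s type from the value
it’s initialized with.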

var QueryResult = from car in ListOfCars
                  select new {
                      car_name = car.name,
                      owner_name = car.owner
                  };

foreach(var entry in QueryResult)
    Console.WriteLine(entry.car_name);

The above is an example of performing a query with Anonymous
types. The only catch to look out for is that an Anonymous type has no name, so it
cannot be declared as the return type of a method.

Accessing the properties of an Anonymous type is easy. In Visual Studio 2008, the Code
Completion/IntelliSense also lists the properties exposed by the Anonymous type.

Refining Data

Usually as part of the LINQ query, we also need to refine the
query result by specifying a condition. Just like SQL, LINQ too uses the “where”
clause to tell the compiler what conditions are acceptable.

IEnumerable<Car> QueryResult = from car in ListOfCars
                               where car.name == "Subaru"
                               select car;

The preceding code demonstrates the use of the “where” clause and
the condition that follows it. To define multiple conditions, LINQ supports
the ‘and’ (&&) and ‘or’ (||) operators. The “where” part of the query must always be a
Boolean expression, otherwise the compiler will complain.
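
As a sketch of combining conditions, the query below keeps cars that are either newer than 2004 or are owned by Danny and older than 2002. The ‘Car’ class body, ‘WhereDemo’ and ‘NewOrDannysOld’ are illustrative assumptions; only the property names and sample data come from this tutorial:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the Car class; only the property names come from the article.
public class Car
{
    public string name { get; set; }
    public string owner { get; set; }
    public int model { get; set; }
}

public static class WhereDemo
{
    // Combines two conditions with || and &&; parentheses make the grouping explicit.
    public static IEnumerable<Car> NewOrDannysOld(IEnumerable<Car> cars)
    {
        return from car in cars
               where car.model > 2004 || (car.owner == "Danny" && car.model < 2002)
               select car;
    }

    public static void Main()
    {
        List<Car> cars = new List<Car>
        {
            new Car { name = "Toyota",     owner = "Alex",  model = 1992 },
            new Car { name = "Mitsubishi", owner = "Jeff",  model = 2005 },
            new Car { name = "Land Rover", owner = "Danny", model = 2001 },
            new Car { name = "BMW",        owner = "Danny", model = 2006 },
            new Car { name = "Subaru",     owner = "Smith", model = 2003 }
        };

        // Prints "Mitsubishi", "Land Rover" and "BMW".
        foreach (Car car in NewOrDannysOld(cars))
            Console.WriteLine(car.name);
    }
}
```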

Order By

When querying objects, it’s sometimes possible to rely on the query
target already being sorted. If that isn’t the case, LINQ can take care of it
with the “orderby” clause, which ensures the result of your query is
properly sorted.

IEnumerable<Car> QueryResult = from car in ListOfCars
                               orderby car.model
                               select car;

If you run the above code, you’ll see that the result of the
query is sorted in ascending order. You can alter the order by using the “ascending” and “descending”
keywords, and sort by more than one field by separating the fields with commas.
The following code sorts in descending order:

IEnumerable<Car> QueryResult = from car in ListOfCars
                               orderby car.model descending
                               select car;
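
Sorting by more than one field can be sketched as follows: cars are ordered by owner name first and, for cars with the same owner, by model year from newest to oldest. The ‘Car’ class body, ‘OrderByDemo’ and ‘ByOwnerThenNewest’ are illustrative assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the Car class; only the property names come from the article.
public class Car
{
    public string name { get; set; }
    public string owner { get; set; }
    public int model { get; set; }
}

public static class OrderByDemo
{
    // Primary key: owner (ascending). Secondary key: model (descending).
    public static IEnumerable<Car> ByOwnerThenNewest(IEnumerable<Car> cars)
    {
        return from car in cars
               orderby car.owner ascending, car.model descending
               select car;
    }

    public static void Main()
    {
        List<Car> cars = new List<Car>
        {
            new Car { name = "Toyota",     owner = "Alex",  model = 1992 },
            new Car { name = "Mitsubishi", owner = "Jeff",  model = 2005 },
            new Car { name = "Land Rover", owner = "Danny", model = 2001 },
            new Car { name = "BMW",        owner = "Danny", model = 2006 },
            new Car { name = "Subaru",     owner = "Smith", model = 2003 }
        };

        // Prints Toyota, BMW, Land Rover, Mitsubishi, Subaru (one per line).
        foreach (Car car in ByOwnerThenNewest(cars))
            Console.WriteLine("{0} ({1}, {2})", car.name, car.owner, car.model);
    }
}
```

Danny’s two cars show the effect of the second key: the 2006 BMW comes before the 2001 Land Rover.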

Grouping

LINQ also allows grouping the query result by the value of a
specific property as shown in this example:

var QueryResult = from car in ListOfCars
                  group car by car.owner into carOwnersGroup
                  select carOwnersGroup.Key;

As you can see, LINQ supports the “group ... by” clause to
specify what object to group and by what property. The “into” keyword
then gives the grouping result a name that we can project on, and the value that
each group was formed by is available through its “Key” property.
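
Each group is more than its key: it is itself a collection of the objects that share that key. The sketch below returns the groups directly and walks through the cars in each one. The ‘Car’ class body, ‘GroupingDemo’ and ‘ByOwner’ are illustrative assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the Car class; only the property names come from the article.
public class Car
{
    public string name { get; set; }
    public string owner { get; set; }
    public int model { get; set; }
}

public static class GroupingDemo
{
    // Each IGrouping has a Key (the owner) and can be enumerated for its cars.
    public static IEnumerable<IGrouping<string, Car>> ByOwner(IEnumerable<Car> cars)
    {
        return from car in cars
               group car by car.owner;
    }

    public static void Main()
    {
        List<Car> cars = new List<Car>
        {
            new Car { name = "Land Rover", owner = "Danny", model = 2001 },
            new Car { name = "Mitsubishi", owner = "Jeff",  model = 2005 },
            new Car { name = "BMW",        owner = "Danny", model = 2006 }
        };

        // Prints Danny's group (Land Rover, BMW) followed by Jeff's group (Mitsubishi).
        foreach (IGrouping<string, Car> ownerGroup in ByOwner(cars))
        {
            Console.WriteLine("{0} owns {1} car(s)", ownerGroup.Key, ownerGroup.Count());
            foreach (Car car in ownerGroup)
                Console.WriteLine("  {0}", car.name);
        }
    }
}
```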

Joins

LINQ supports joining data from different collections into one
query result. You can do this using the “join” keyword to specify what objects
to join and use the “on” keyword to specify the matching relationship between
the two objects.

Initializing related list:

List<Car> ListOfCars = new List<Car>()
{
    new Car {name = "Mitsubishi", owner = "Jeff" , model = 2005},
    new Car {name = "Land Rover", owner = "Danny", model = 2001},
    new Car {name = "Subaru"    , owner = "Smith", model = 2003},
    new Car {name = "Toyota"    , owner = "Alex" , model = 1992},
    new Car {name = "BMW"       , owner = "Danny", model = 2006},
};

List<CarOwner> ListOfCarOwners = new List<CarOwner>()
{
    new CarOwner {owner_name = "Danny", age = 22},
    new CarOwner {owner_name = "Jeff" , age = 35},
    new CarOwner {owner_name = "Smith", age = 19},
    new CarOwner {owner_name = "Alex" , age = 40}
};

Query:

var QueryResult = from car in ListOfCars
                  join carowner in ListOfCarOwners on car.owner equals carowner.owner_name
                  select new {name = car.name, owner = car.owner, owner_age = carowner.age};

In the above code, using an Anonymous type, we have joined
the two lists into a single query result.

Object Hierarchies using Group Joins

So far, we’ve learned how we can use LINQ to build a flat
list query result. With LINQ, it’s also possible to achieve a hierarchical query
result using a group join (“GroupJoin”). In simple words, every entry in the query
result can have a property that holds a collection of related objects.

List<Car> ListOfCars = new List<Car>()
{
    new Car {name = "Mitsubishi", owner = "Jeff" , model = 2005},
    new Car {name = "Land Rover", owner = "Danny", model = 2001},
    new Car {name = "Subaru"    , owner = "Smith", model = 2003},
    new Car {name = "Toyota"    , owner = "Alex" , model = 1992},
    new Car {name = "BMW"       , owner = "Danny", model = 2006},
};

List<CarOwner> ListOfCarOwners = new List<CarOwner>()
{
    new CarOwner {owner_name = "Danny", age = 22},
    new CarOwner {owner_name = "Jeff" , age = 35},
    new CarOwner {owner_name = "Smith", age = 19},
    new CarOwner {owner_name = "Alex" , age = 40}
};

var QueryResult = from carowner in ListOfCarOwners
                  join car in ListOfCars on carowner.owner_name equals car.owner into carsGroup
                  select new {name = carowner.owner_name, cars = carsGroup};

foreach(var carOwner in QueryResult)
    foreach(var car in carOwner.cars)
        Console.WriteLine("Owner name: {0}, car name: {1}, car model: {2}", carOwner.name, car.name, car.model);

In the above example, the “join” clause is followed by an “into”
part. This differs from the previous join operation that we looked at. Here, the “into”
clause is used to group cars by the owner (into carsGroup) and assign the grouping to the
“cars” property of the anonymous type.
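
One consequence of a group join is that an owner with no matching cars still appears in the result, with an empty group. The sketch below illustrates this by counting each owner’s cars; the owner “Maria” is a hypothetical entry added for the example, and the class bodies, ‘GroupJoinDemo’ and ‘Summarize’ are assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed class shapes; only the property names appear in the article.
public class Car
{
    public string name { get; set; }
    public string owner { get; set; }
    public int model { get; set; }
}

public class CarOwner
{
    public string owner_name { get; set; }
    public int age { get; set; }
}

public static class GroupJoinDemo
{
    // Produces one "owner: count" line per owner, including owners without cars.
    public static List<string> Summarize(List<CarOwner> owners, List<Car> cars)
    {
        var query = from owner in owners
                    join car in cars on owner.owner_name equals car.owner into carsGroup
                    select new { name = owner.owner_name, count = carsGroup.Count() };

        List<string> lines = new List<string>();
        foreach (var entry in query)
            lines.Add(string.Format("{0}: {1}", entry.name, entry.count));
        return lines;
    }

    public static void Main()
    {
        List<Car> cars = new List<Car>
        {
            new Car { name = "Land Rover", owner = "Danny", model = 2001 },
            new Car { name = "BMW",        owner = "Danny", model = 2006 },
            new Car { name = "Mitsubishi", owner = "Jeff",  model = 2005 }
        };
        List<CarOwner> owners = new List<CarOwner>
        {
            new CarOwner { owner_name = "Danny", age = 22 },
            new CarOwner { owner_name = "Jeff",  age = 35 },
            new CarOwner { owner_name = "Maria", age = 30 } // hypothetical owner with no cars
        };

        // Prints "Danny: 2", "Jeff: 1", "Maria: 0".
        foreach (string line in Summarize(owners, cars))
            Console.WriteLine(line);
    }
}
```

With the plain “join” shown earlier, Maria would not appear at all, because that form only returns entries that have a match on both sides.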

Standard Query Operators

Thus far, everything that we’ve seen has been supported by the C# 3.0
query syntax. However, there is still a large number of operations that have no
query keyword of their own. The standard query operators provide query capabilities including
filtering, projection, aggregation, sorting and more. These operations are therefore supported
as methods of the LINQ library and can be called on the result of a query.
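
As a rough sketch of how these methods are called, the example below applies a few operators to an array of model years. Some operators take a lambda expression (the “year => ...” form) to describe a condition or a sort key. ‘OperatorsDemo’ and its method names are illustrative, not part of the LINQ library:

```csharp
using System;
using System.Linq;

public static class OperatorsDemo
{
    public static readonly int[] ModelYears = { 1992, 2005, 2001, 2006, 2003 };

    // Operators called directly on the collection.
    public static int Newest() { return ModelYears.Max(); }
    public static int Oldest() { return ModelYears.Min(); }

    // An operator called on the result of a query expression.
    public static int CountAfter2000()
    {
        return (from year in ModelYears
                where year > 2000
                select year).Count();
    }

    // Operators chained with lambda expressions instead of query keywords.
    public static int FirstAfter2002Sorted()
    {
        return ModelYears.Where(year => year > 2002)
                         .OrderBy(year => year)
                         .First();
    }

    public static void Main()
    {
        Console.WriteLine(Newest());               // 2006
        Console.WriteLine(Oldest());               // 1992
        Console.WriteLine(CountAfter2000());       // 4
        Console.WriteLine(FirstAfter2002Sorted()); // 2003
    }
}
```

These methods live in the “System.Linq” namespace, so the “using System.Linq;” line is required for them to be available on arrays and lists.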

These operators are listed below for your reference.

Aggregate Operators

Sum: returns the sum of all entries
Max: returns the maximum value in the collection
Min: returns the minimum value in the collection
Average: returns the average value for the collection
Aggregate: used for creating a customized aggregation
LongCount: returns the count of items as a “long”, for collections too large to be counted with an “int”
Count: returns an “int” for the count of items in the collection

Element Operators

First: returns the first entry from the result collection, and throws an exception if the collection is empty
FirstOrDefault: if the collection is empty, returns the default value, otherwise returns the first entry from the collection
Single: returns the only element from the collection, and throws an exception if there is not exactly one element
SingleOrDefault: if the collection is empty, returns the default value, otherwise returns the only element from the collection
Last: returns the last entry from the collection
LastOrDefault: if the collection is empty, returns the default value, otherwise returns the last entry from the collection
ElementAt: returns the element at the specified position
ElementAtOrDefault: if the position is out of range, returns the default value, otherwise returns the element at the specified position
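
The difference between the plain and the “OrDefault” variants shows up with empty collections and out-of-range positions. In the sketch below, ‘ElementOperatorsDemo’ and its ‘Throws’ helper are illustrative names written for this example:

```csharp
using System;
using System.Linq;

public static class ElementOperatorsDemo
{
    // Helper for this example: reports whether the action throws InvalidOperationException.
    public static bool Throws(Action action)
    {
        try
        {
            action();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        int[] years = { 1992, 2005 };
        int[] empty = new int[0];

        Console.WriteLine(years.First());                // 1992
        Console.WriteLine(empty.FirstOrDefault());       // 0, the default value for int
        Console.WriteLine(years.ElementAtOrDefault(5));  // 0, position 5 is out of range
        Console.WriteLine(Throws(() => empty.First()));  // True: no first element exists
        Console.WriteLine(Throws(() => years.Single())); // True: more than one element
    }
}
```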

Set Related Operators

Except: returns the entries from one set that don’t exist in another set (a set difference)
Union: returns the unique entries from both collections
Intersect: returns the entries that appear in both sets
Distinct: returns unique entries from the collection
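
The set operators can be compared side by side on two small arrays of owner names. ‘SetOperatorsDemo’ and its fields are illustrative names for this sketch:

```csharp
using System;
using System.Linq;

public static class SetOperatorsDemo
{
    public static readonly string[] First = { "Alex", "Danny", "Jeff", "Danny" };
    public static readonly string[] Second = { "Danny", "Smith" };

    public static string[] OnlyInFirst() { return First.Except(Second).ToArray(); }
    public static string[] InEither()    { return First.Union(Second).ToArray(); }
    public static string[] InBoth()      { return First.Intersect(Second).ToArray(); }
    public static string[] Unique()      { return First.Distinct().ToArray(); }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", OnlyInFirst())); // Alex, Jeff
        Console.WriteLine(string.Join(", ", InEither()));    // Alex, Danny, Jeff, Smith
        Console.WriteLine(string.Join(", ", InBoth()));      // Danny
        Console.WriteLine(string.Join(", ", Unique()));      // Alex, Danny, Jeff
    }
}
```

Note that all four operators remove duplicates, which is why “Danny” appears only once in each result.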

Generation Operators

DefaultIfEmpty: if the collection is empty, returns a collection containing a single default value
Repeat: generates a collection that contains the same value repeated a specified number of times
Empty: will return an empty IEnumerable collection
Range: returns a range of numbers for a specified starting number and count

Refining Operators

Where: will return objects that meet the specified condition
OfType: will return objects of the specified type

Conversion Operators

ToLookup: returns the result as a lookup
ToList: returns the result as a List collection
ToDictionary: returns the result as a dictionary
ToArray: returns the result as an Array collection
AsQueryable: returns the result as an IQueryable<T>
AsEnumerable: returns the result as an IEnumerable<T>
OfType: filters the collection according to the specified type
Cast: used to convert a weakly typed collection into a strongly typed collection

Partitioning Operators

Take: returns a specified number of entries from the start of the collection
TakeWhile: returns entries from the start of the collection for as long as the specified condition evaluates to true
Skip: skips a specified number of entries and returns the rest
SkipWhile: skips entries for as long as the specified condition evaluates to true and returns the rest
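
A short sketch of the partitioning operators on a sorted array of model years follows. Combining “Skip” and “Take” is a common way to read one page of results at a time. ‘PartitioningDemo’ and its method names are illustrative:

```csharp
using System;
using System.Linq;

public static class PartitioningDemo
{
    public static readonly int[] Years = { 1992, 2001, 2003, 2005, 2006 };

    public static int[] FirstTwo()        { return Years.Take(2).ToArray(); }
    public static int[] AllButFirstThree() { return Years.Skip(3).ToArray(); }
    public static int[] Before2003()      { return Years.TakeWhile(year => year < 2003).ToArray(); }
    public static int[] From2003()        { return Years.SkipWhile(year => year < 2003).ToArray(); }

    // Second "page" when the page size is two entries.
    public static int[] SecondPage()      { return Years.Skip(2).Take(2).ToArray(); }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", FirstTwo().Select(y => y.ToString()).ToArray()));         // 1992, 2001
        Console.WriteLine(string.Join(", ", AllButFirstThree().Select(y => y.ToString()).ToArray())); // 2005, 2006
        Console.WriteLine(string.Join(", ", Before2003().Select(y => y.ToString()).ToArray()));       // 1992, 2001
        Console.WriteLine(string.Join(", ", From2003().Select(y => y.ToString()).ToArray()));         // 2003, 2005, 2006
        Console.WriteLine(string.Join(", ", SecondPage().Select(y => y.ToString()).ToArray()));       // 2003, 2005
    }
}
```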

Quantifier Operators

Any: returns true if at least one object meets the specified condition
Contains: returns true if the specified object exists in the collection
All: returns true if all objects meet the specified condition

Join Operators

Join: returns entries where keys in sets are the same
GroupJoin: used to build hierarchical objects based on a master and detail relationship

Equality Operators

SequenceEqual: returns true when two collections contain equal entries in the same order

Sorting Operators

Reverse: returns a reversed collection
ThenBy: used to perform further sorting
ThenByDescending: used to perform further sorting in descending order
OrderBy: used to define order
OrderByDescending: used to define descending order

Projection Operators

SelectMany: used to flatten a hierarchical collection
Select: used to identify the properties to return

Concatenation Operators

Concat: used to concatenate two collections

So What Now?

LINQ has proven itself to be very useful for querying objects, and the SQL-like syntax makes it easy to
learn and use. The large number of Standard Operators also makes it possible to chain operators together
to perform complex queries. In a follow-up to this tutorial, we’ll review how LINQ can be used to
query databases and XML content.