Introduction

There are often cases when data needs to be grouped by multiple properties for reporting, this part of the series presents simple examples for grouping using both LINQ and lambda syntax style of writing group statements along with going from anonymous to strong typed result sets. More examples will be added over time from the existing examples in the GitHub repository which contains source for both this and part 1 of this series.

Many of the code samples presented are demonstrated using classes with mocked data or with Entity Framework 6 code first with an existing database. Data scripts are located in the root of the Visual Studio solution available in a GitHub repository with code samples for part 1 of the series. 

Part 1 of the series

Requirements 

Suggested reading

Query vs. Method Syntax in LINQ

There are two different types of syntax that you can use to query with LINQ: query syntax and method syntax. In the code samples both types are used, as one becomes familiar with with writing queries one style lends itself better than the other.

Group by one property and multiple properties

The difference between grouping one property verses multiple properties is specifying multiple properties in an anonymous array as shown in the examples below. In figure 1 (using LINQ), a single property defines the group while in figure 2 (using Lambda) a anonymous array specifies which properties to group by. 




The following screenshot shows two Lambda style queries, the first a single property, City while the second groups on City and Country.

Besides grouping a count is set as a property .PersonCount which is many time needed in reports. A similar requirement may be to not include groups with one object, in this case a person object which can be done with a where predicate e.g.
Dim groupResults = personList.
        GroupBy(Function(person) New With {Key person.City, Key person.Country}).
        Where(Function(grp) grp.Count() > 1).ToList()

Then, as in the prior examples shown in the screenshots adding a select clause.
Dim groupResults = personList.GroupBy(Function(person) New With {Key person.City, Key person.Country}).
    Where(Function(grp) grp.Count() > 1).
    Select(Function(grp) New PersonGroup With {
              .City = grp.Key.City,
              .Country = grp.Key.Country, .List = grp.ToList()
              }).ToList()

Additionally, the last query written in LINQ syntax which some developers favor although it may also be dependent on the actual query complexity.

Dim groupResults = (
        From person In personList
        Group By personCityCountry = New With {
            Key .City = person.City,
            Key .Country = person.Country} Into group = Group
        Where group.Count() > 1
        Select New PersonGroup With {
            .City = personCityCountry.City,
            .Country = personCityCountry.Country,
            .List = group.ToList()
        }).ToList()


Both of the above examples use mocked data, nothing read from a database or text file.

GroupJoin

A GroupJoin is in SQL what is called a “Left Outer JOIN” while a Join in SQL refer to “Inner Join” (see join definition). In short, a GroupJoin will do a link between 2 entities even if the right side of the link has nothing to link to. In contrast, the Join will link 2 entities only if both entities contain a link between them.

The GroupJoin operator performs the same task as Join operator except that GroupJoin returns a result in group based on specified group key. The GroupJoin operator joins two sequences based on key and groups the result by matching key and then returns the collection of grouped result and key. The following example performs a group join on customer/order relations by customer identifier which has been setup in the SQL-Server database.

Results are placed into the following container.
Public Class CustomerOrder
    Public Property Customer As Customer
    Public Property Order As IEnumerable(Of Order)
End Class

Code
Public Function GroupJoinCustomersWithCompanyNameStartsWith(
    context As NorthWindContext,
    startsWithValue As String) As List(Of CustomerOrder)
 
    context.Configuration.LazyLoadingEnabled = True
 
    '
    ' Note to use Include the following Import is needed
    '   Imports System.Data.Entity
    '
    Dim results As List(Of CustomerOrder) = context.CustomerIncludes().
            Where(Function(customer) customer.CompanyName.StartsWith(startsWithValue)).
            GroupJoin(context.Orders, Function(c) c.CustomerIdentifier,
                      Function(order) order.CustomerIdentifier,
                      Function(customer, order) New With {
                         Key .Customer = customer,
                         Key .Order = order}).
            Select(Function(customerData) New CustomerOrder With {
                      .Customer = customerData.Customer,
                      .Order = customerData.Order}).
            ToList()
 
    Return results
 
End Function

Using the following to run.
Imports NorthWindEntityLibrary
Imports NorthWindEntityLibrary.Classes
Imports NorthWindEntityLibrary.Models
 
Module Module1
    Private operation As New NorthOperations
    Sub Main()
 
        Dim standardColor = Console.ForegroundColor
 
        Using context = New NorthWindContext
 
            Dim results As List(Of CustomerOrder) =
                    operation.GroupJoinCustomersWithCompanyNameStartsWith(context, "A")
 
            For Each current As CustomerOrder In results
 
                Console.ForegroundColor = ConsoleColor.Green
                Console.WriteLine($"{current.Customer.CompanyName} - {current.Customer.Contact.FullName}")
                Console.ForegroundColor = standardColor
 
                For Each order As Order In current.Order
 
                    Console.ForegroundColor = ConsoleColor.White
                    Console.WriteLine($"   {order.OrderID}, {order.OrderDate.Value.ToShortDateString()}")
                    Console.ForegroundColor = standardColor
 
                    For Each orderDetail As OrderDetail In order.OrderDetails
                        Console.WriteLine($"      {orderDetail.Product.ProductName}, {orderDetail.Quantity}")
                    Next
                Next
 
                Console.WriteLine()
 
            Next
 
        End Using
 
    End Sub
 
End Module

Results


Working with readonly properties

There is a possibility that Entity Framework will throw an exception because the readonly property is not part of the model, in these cases an Expression builder may be needed or use .ToList off the DbSet the query is executed againsts. This will not happen when working with a Plain Old CLR Objects (POCO) class.

Statistics

If statistics are required e.g. minimum, maximum, count, sum etc then setup predicates in the GroupBy, in the following example min, max and count are returned by age of a person. The following example reads data from a SQL-Server database using Entity Framework 6.
Sub MinMaxCountForAgeProperty()
    Using context As New PeopleContext
        Dim people = context.People.ToList()
        Dim query = people.GroupBy(
            Function(person) Math.Floor(person.Age),
            Function(person) person.Age,
            Function(baseAge, ages) New With {
                      .Key = baseAge,
                      .Count = ages.Count(),
                      .Min = ages.Min(),
                      .Max = ages.Max()}
            )
 
 
        Dim output As New System.Text.StringBuilder
        ' Iterate over each anonymous type.
        For Each result In query
            output.AppendLine(vbCrLf & "Age group: " & result.Key)
            output.AppendLine("Number of people in this age group: " & result.Count)
            output.AppendLine("Minimum age: " & result.Min)
            output.AppendLine("Maximum age: " & result.Max)
        Next
 
        ' Display the output.
        Console.WriteLine(output.ToString)
    End Using
 
End Sub

Multiple tables multiple property grouping

Using the what has been presented so far the following performs a group by on three tables, Customer, Orders and Order Details. The new aspect is using three From clauses, one for each table.

To ensure there are proper relations e.g. order for each customer and customer for an order in the From a Where condition uses DefaultIfEmpty method.

Important: In the group sum there can be no null values on either (in this case) UnitPrice or Quantity properties. Since Entity Framework code first is used in the provided database, Order details table its possible to have a null value for UnitPrice so the the type was changed from Decimal to a nullable Decimal e.g. Decimal?

The query will look extremely complex (do not worry there is another one more complex than this) if time has not been spent learning the basics and practicing. It's extremely important as a query becomes complex to have decent names for property names and variables within a query as even an hour later the query can be difficult to figure out let along months later.

Public Sub ThreeTablesGrouping(north As NorthWindContext)
 
    Dim query = From order In north.Orders
                From customer In north.Customers.
                    Where(Function(c) CBool(c.CustomerIdentifier = order.CustomerIdentifier)).DefaultIfEmpty()
                From orderDetail In north.OrderDetails.Where(Function(d) d.OrderID = order.OrderID).DefaultIfEmpty()
                Group New With {
                Key .order = order,
                Key .customer = customer,
                Key .details = orderDetail
                } By GroupKey = New With {Key order.OrderDate.Value.Year, Key customer.CompanyName
                } Into group = Group
                Select New With {
                    Key .Company = GroupKey.CompanyName,
                    Key .OrderYear = GroupKey.Year,
                    Key .Amount = group.Sum(Function(e) e.details.UnitPrice * e.details.Quantity)
                }
 
    Dim queryResults = query.ToList()
    Console.WriteLine()
 
End Sub

As with any anonymous result set this one can be done strongly typed.

IEqualityComparer / IEnumerable(Of IGrouping(Of String, Product))

When a distinct results is needed for grouping, IEqualityComparer(Of In T) Interface against a list can solve this. The following example has a list of products which has duplicate entries by Make and Model.

Namespace Classes
 
    Public Class Products
        Public Shared Function List() As List(Of Product)
            Dim products As New List(Of Product) From {
                New Product With {.Id = 1, .Make = "Samsung", .Model = "Galaxy S3"},
                New Product With {.Id = 2, .Make = "Samsung", .Model = "Galaxy S4"},
                New Product With {.Id = 3, .Make = "Samsung", .Model = "Galaxy S5"},
                New Product With {.Id = 4, .Make = "Apple", .Model = "iPhone 5"},
                New Product With {.Id = 5, .Make = "Apple", .Model = "iPhone 6"},
                New Product With {.Id = 6, .Make = "Apple", .Model = "iPhone 6"},
                New Product With {.Id = 7, .Make = "Apple", .Model = "iPhone 6"},
                New Product With {.Id = 8, .Make = "HTC", .Model = "Sensation"},
                New Product With {.Id = 9, .Make = "HTC", .Model = "Desire"},
                New Product With {.Id = 11, .Make = "HTC", .Model = "Desire"},
                New Product With {.Id = 12, .Make = "Nokia", .Model = "Lumia 735"},
                New Product With {.Id = 13, .Make = "Nokia", .Model = "Lumia 930"},
                New Product With {.Id = 14, .Make = "Nokia", .Model = "Lumia 930"},
                New Product With {.Id = 15, .Make = "Sony", .Model = "Xperia Z3"}
            }
 
            Return products
 
        End Function
 
    End Class
End Namespace

ProductComparer class provides methods that carry out value comparisons which allow greater control of results in a consistent manner.

Namespace Classes
    ''' <summary>
    ''' Comparer on Make and Model properties
    ''' </summary>
    Public Class ProductComparer
        Implements IEqualityComparer(Of Product)
 
        Public Shadows Function Equals(p1 As Product, p2 As Product) As Boolean _
            Implements IEqualityComparer(Of Product).Equals
 
            If ReferenceEquals(p1, p2) Then
                Return True
            End If
 
            If ReferenceEquals(p1, Nothing) OrElse ReferenceEquals(p2, Nothing) Then
                Return False
            End If
 
            Return p1.Make = p2.Make AndAlso p1.Model = p2.Model
 
        End Function
        Public Shadows Function GetHashCode(product As Product) As Integer _
            Implements IEqualityComparer(Of Product).GetHashCode
 
            If ReferenceEquals(product, Nothing) Then
                Return 0
            End If
 
            Dim hashProductName As Integer = If(product.Make Is Nothing, 0, product.Make.GetHashCode())
            Dim hashProductCode As Integer = product.Model.GetHashCode()
 
            Return hashProductName Xor hashProductCode
 
        End Function
    End Class
End Namespace

In the following code the base source list is presented followed by results from the comparer then a lambda group by is used to group products.

Module Module1
    ''' <summary>
    ''' Example to first get distinct products in one query
    ''' followed by grouping on make property in a second query.
    ''' </summary>
    Sub Main()
        Dim productList = Products.List()
 
        Console.WriteLine("Original list")
 
        productList.ForEach(Sub(product) Console.WriteLine(product))
 
        Console.WriteLine()
        Console.WriteLine("ProductComparer results")
 
        Dim productsQuery As IEnumerable(Of Product) =
                productList.Distinct(New ProductComparer).
                OrderBy(Function(product) product.Make)
 
 
        For Each product As Product In productsQuery
            Console.WriteLine(product)
        Next
 
        Console.WriteLine()
        Console.WriteLine("Group by make")
 
        Dim grouped As IEnumerable(Of IGrouping(Of String, Product)) =
                productsQuery.GroupBy(Function(product) product.Make)
 
        For Each grouping As IGrouping(Of String, Product) In grouped
            Console.WriteLine(grouping.Key)
            For Each product As Product In grouping
                Console.WriteLine($"   {product}")
            Next
        Next
 
        Console.ReadLine()
 
    End Sub
 
End Module

Results

Extremely complex group by queries

Extremely complex grouping for business requirements can also be done once the basics have been mastered and there are other options dependent on willingness to think outside of LINQ and Lambda, especially when the work can be performed with conventional SQL.

Example, a database table is know to have duplicate rows of data where the duplicates are by multiple columns in a table which requires a business user to make decisions on which data is valid and which data is not valid along with the possibilities to merge data then delete extra rows.

In the following example a customer table has duplicates based on the following columns, CompanyName, ContactName, ContactTitle, Address, City and postal code. To get duplicates the following SELECT will work were the data may be stored in a DataTable or List(Of T).

Note that in a proper database schema there would be 
  • Proper constraints to guard against duplicates
  • Unit testing done on applications to interact with the database
  • The data should be normalized, it's not to keep the exercise easy to follow.
SELECT A.*
FROM Customers A
INNER JOIN
    (
    SELECT
        CompanyName,
        ContactName,
        ContactTitle,
        Address,
        City,
        PostalCode
    FROM
        Customers
    GROUP BY
        CompanyName,
        ContactName,
        ContactTitle,
        Address,
        City,
        PostalCode
    HAVING COUNT(*) > 1
    ) B
ON
A.CompanyName = B.CompanyName AND
A.ContactName = B.ContactName AND
A.ContactTitle = B.ContactTitle AND
A.Address = B.Address AND
A.City = B.City AND
A.PostalCode = B.PostalCode
ORDER BY
    A.CompanyName

Data presented (less controls to perform deletes and merging).



To perform the same operation with grouping is a two step process to keep code easy to read.

Using a DataTable for returning results to a control e.g. DataGridView in this case.

Step 1
Read data from the database followed by grouping data on the exact same column as per above. The variable duplicates is an anonymous type.



Step 2
Iterate results from step 1 into a pre defined DataTable.

Returning duplicates in a List of a class which inherits from a base class to represent a single customer.

Namespace Classes
    ''' <summary>
    ''' Provides a column to mark a row in a DataGridView
    ''' to act upon
    ''' </summary>
    Public Class CustomerRigger
        Inherits Customer
 
        Public Property Process() As Boolean
 
    End Class
End Namespace

Customer class
Namespace Classes
    Public Class Customer
        Public Property Identifier As Integer
        Public Property CompanyName As String
        Public Property ContactName As String
        Public Property ContactTitle As String
        Public Property Address As String
        Public Property City As String
        Public Property PostalCode As String
        Public Property Exists As Boolean
        Public ReadOnly Property ItemArray() As String()
            Get
                Return {CStr(Identifier).PadLeft(3, "0"c), CompanyName, ContactName, ContactTitle, Address, City, PostalCode}
            End Get
        End Property
        Public Overrides Function ToString() As String
            Return $"'{CompanyName}' '{ContactName}' '{ContactTitle}' '{Address}' '{City}' '{PostalCode}'"
        End Function
    End Class
End Namespace

The following code differs from the first example with a DataTable is to use a LINQ statement to create the strong type results.
Public Function GetCustomerDuplicatesAsList() As List(Of CustomerRigger)
    Dim dataOperations As New DataOperations
 
    Dim dt As DataTable = dataOperations.ReadCustomersFromDatabase()
 
    Dim duplicates As IEnumerable(Of IEnumerable(Of CustomerRigger)) = From dataRow In dt.AsEnumerable() Select item = New With {
            Key .Identifier = dataRow.Field(Of Integer)("Identifier"),
            Key .CompanyName = dataRow.Field(Of String)("CompanyName"),
            Key .ContactName = dataRow.Field(Of String)("ContactName"),
            Key .ContactTitle = dataRow.Field(Of String)("ContactTitle"),
            Key .Street = dataRow.Field(Of String)("Address"),
            Key .City = dataRow.Field(Of String)("City"),
            Key .PostalCode = dataRow.Field(Of String)("PostalCode")}
group temp = item By Key = New With {
                Key .CompanyName = item.CompanyName,
                Key .ContactName = item.ContactName,
                Key .ContactTitle = item.ContactTitle}
            Into Group Where Group.Count() > 1 Select Group.Select(Function(g) New CustomerRigger With {
                                     .Identifier = g.Identifier,
                                     .CompanyName = g.CompanyName,
                                     .ContactName = g.ContactName,
                                     .ContactTitle = g.ContactTitle,
                                     .Address = g.Street,
                                     .City = g.City,
                                     .PostalCode = g.PostalCode
                              })
 
 
    Return (From item In duplicates From row In item Select row).ToList()
 
End Function

Special note (from Microsoft documentation)
The Key keyword enables you to specify behavior for properties of anonymous types. Only properties you designate as key properties participate in tests of equality between anonymous type instances, or calculation of hash code values. The values of key properties cannot be changed.

Tips

Analysers recommendations

When a piece of code is greyed out this is an indication that the code is never coded or utilized. Most of the time this is true although there are some cases Visual Studio does not analyze code correctly. This may be frustrating when writing Group By logic. In the following snippet note the shadow properties which works as expected.

Removing the anonymous properties


Clearly Visual Studio did not make the correct recommendation so when using standard analysers and third analysers don't always trust their recommendations.

Code folding

Code folding is taking a single line of code and "folding" a long line.

Without folding as shown in figure 1 below is inherently harder to read then in figure 2. Consider writing figure 1 today then need to modify figure 1 six or more months down the road verses the code in figure 2. 

Figure 1
Dim groupResults = (
        From person In personList Group By personCityCountry = New With {Key .City = person.City, Key .Country = person.Country} Into group = Group
        Where group.Count() > 1 Select New PersonGroup With {.City = personCityCountry.City, .Country = personCityCountry.Country, .List = group.ToList()}).ToList()
Figure 2 
Dim groupResults = (
        From person In personList
        Group By personCityCountry = New With {
            Key .City = person.City,
            Key .Country = person.Country} Into group = Group
        Where group.Count() > 1
        Select New PersonGroup With {
            .City = personCityCountry.City,
            .Country = personCityCountry.Country,
            .List = group.ToList()
        }).ToList()

Know complete results of queries

A common mistake when writing any code which interacts with any data is knowing the results beforehand. When working with LINQ and Lambda against a database first write SQL statements, review the results then write the code quires, run the queries in a unit test method and validate results against SQL. By first writing SQL a deeper knowledge of data will be when there are unexpected results in code. This also ties into optimization of queries, write a bad SQL statement is no different than writing a bad Entity Framework query that when viewing a statement looks great yet Entity Framework takes the code and writes good or bad underlying SQL dependent on a developers understanding of their data. 

Debugging

Take time to learn how to properly debug code which includes using Visual Studio breakpoints, breakpoint conditions, data tipslocal window, watch and quick watch windows. Depending on the edition of Visual Studio include learning to work with IntelliTrace which should only be turned on when needed.

Summary

Basic to advance grouping methods have been presented although not ever possible situation have been covered. By using what has been presented in this and part 1 of the series provides what is needed to write statements for most situations. One thing to be cognitive of is that if a query is written poorly the performance will suffer. There are countless post on the Internet about grouping poor performance where 95 percent of the time the query needs optimization and since each query is different so will be what needs to be done performance wise. Learn to use IntelliTrace in a debug session to assist with optimizing queries.

See also

Introduction to Grouping with LINQ/Lambda (VB.NET)

Source code

The following GitHub repository Jump contains all code samples presented along with extra code samples all guaranteed to work.