How to Query Trees Using LINQ

How to Query Trees Using LINQ

 A (directed) tree is one of the most common data types. But how do you use trees in LINQ? This article introduces a generic LINQ-Extension which solves the problem, gives an example and discusses alternatives.





Motivation

Starting point of this article was a simple question in the LINQ forum which has been unanswered for four years...

How can I select tree nodes independently from their nesting level?

So it might be worth to have a closer look at the task. Let's start with an example. A simple TreeNode class may look like this:

public class TreeNode
{
    public string Id { get; set; }
    public int Data { get; set; }
    public List<TreeNode> Children { get; private set; }
 
    public TreeNode(string id, int data)
        : this(id, data, new TreeNode[0])
    { }
 
    public TreeNode(string node, int data, params TreeNode[] children)
    {
        Id = node;
        Data = data;
        Children = new List<TreeNode>(children);
    }
}

The test data are:

TreeNode rootNode =
    new TreeNode("node-0", 35,
        new TreeNode("node-1", 17,
            new TreeNode("node-2", 20)),
        new TreeNode("node-3", 19),
        new TreeNode("node-4", 5,
            new TreeNode("node-5", 25),
            new TreeNode("node-6", 40)));

↑ Return to Top


Solution

Implement a generic extension class which can traverse over arbitrary trees. The only precondition is that the node's children are stored as an IEnumerable. The selector lambda expression avoids hard-coding of the access path and makes the extension reusable.

public static class LinqTreeExtension
{
    public static IEnumerable<T> SelectNestedChildren<T>
        (this IEnumerable<T> source, Func<T, IEnumerable<T>> selector)
    {
        foreach (T item in source)
        {
            yield return item;
            foreach (T subItem in SelectNestedChildren(selector(item), selector))
            {
                yield return subItem;
            }
        }
    }
}

The extension method SelectNestedChildren traverses the tree depth-first, left to right.

↑ Return to Top


Usage Example


Example 1:
Flattening the tree structure  (without the root node)

var flattenedTreeWithoutRoot
    = rootNode.Children
        .SelectNestedChildren(t => t.Children).ToList();

The resulting list enumerates the nodes with IDs "note-1" to "note-6". Please note:  The root node is not contained in the resulting list. We'll fix this in the next example.


Example 2: 
Flattening the tree structure  (including the root node)

var flattenedTreeIncludingRoot
    = new TreeNode[] {rootNode }
        .SelectNestedChildren(t => t.Children).ToList();

The query is executed on an array, which contains only the root node. The resulting list enumerates the nodes with IDs "note-0" to "note-6" as expected.


Example 3: Performing a LINQ query on all nested children

var result
    = new TreeNode[] { rootNode }
        .SelectNestedChildren(t => t.Children)
        .Where(t => t.Data > 30).ToList();

The resulting list contains two nodes with the IDs "node-0" and "node-6"

↑ Return to Top


Contradictions and alternatives

The proposed LINQ extension has significant pros:
  • The extension is applicable to all tree-structures which use an IEnumerable to hold the children.
  • It fits seamlessly into other LINQ operations.
However, in some cases you may want to consider an alternative:


Cyclic graph instead of a tree


Problem: The extension will loop forever if it runs into a cycle of the graph.
Solution: Modify the extension:
  • Add a HashSet<T> to the extension which keeps track of the already visited notes.
  • Your class T should override GetHashCode() and Equals()
  • The extensions has to skip already visited nodes.

Persisted tree in a database


Problem: If the tree is stored in a database (and not in memory), the extension produces unnecessary overhead.
Solution: All tree nodes reside in a database table which can be queried directly. The effect that they represent a logical tree or even a cyclic graph can be ignored. Using the Entity Framework the query looks like this:

var databaseResult = context.TreeNodes
    .Where(t => t.Data > 30).ToList();

However, at least two small changes should be applied to the sample class TreeNode:
  • The property Children should be changed to a virtual property to enable lazy loading.
  • The Entity Framework requires an additional parameterless constructor.

XML data structures

An XML data structure is a special case of an directed tree.

Problem: LINQ to XML has specialized on XML Documents.
Solution: Use LINQ to XML. It reduces your coding effort. You stick to a standard. You may also use other benefits from LINQ to XML. Example:

var xmlElements = XDocument.Load("myfile.xml").Descendants();
 
var result = xmlElements.Where(x => ...);

XDocument.Descendents() returns an IEnumerable<XElement> which can be used in a query directly.


Conclusion:
Using trees with LINQ is simple. However, there is no "one size fits all" solution. 

↑ Return to Top


Links


↑ Return to Top


See Also

Another important place to find a huge amount of Visual C# related articles is the TechNet Wiki itself. The best entry point is Visual C# Resources on the TechNet Wiki

↑ Return to Top


Other languages

This article is also available in other languages:
Sort by: Published Date | Most Recent | Most Useful
Comments
  • Carsten Siemens edited Revision 4. Comment: Fixed misspelling

  • Can you please explain the usage of the word contraindication www.thefreedictionary.com/contraindication in the context of this article?

  • Hello Naomi,

    according to Merriam-Webster - www.merriam-webster.com/.../contraindication - a contraindication is

    "something (as a symptom or condition) that makes a particular treatment or procedure inadvisable".

    Let’s map this definition to the article:

    * The “treatment” is the class LinqTreeExtension which is explained in the “Solution” section.

    * A condition which makes this “treatment” inadvisable is for example a cyclic graph.

    The term contraindication is most often used by physicians and pharmacists. But (at least in Germany) it is not limited to medicine.

    I like a figurative language, but I’m not a native speaker. You may have better synonyms.

    Naomi, I really appreciate your careful editing. Thank you!

  • I am also not a native speaker, so when I saw that word, I had to check it up. It may be nice if an English native speaker will verify if it is indeed permissible to be used in this context. That's why I originally changed it to more often and commonly used contradiction although may be just Pros and Cons may be used

  • Naomi is correct, it's ConTRADiction. ConTRAINdication is a rarely used medical term.

    Simple typo IMO.

    "Contradiction" is very much the right word.

    Regards,

    Pete