Using the Nested Set Data Model for Breadcrumb Links

Tuesday Jul 5th 2005 by Jason Mauss
Share:

Discover a new way to think about modeling hierarchical data using the nested set model. The more you use this model, the more you come to appreciate the power and inherent simplicity it provides.

It's likely that you have seen what I'm referring to as breadcrumb links before. For example, an online sporting goods store might display breadcrumb links (usually found around the top of the page) so you can see which area of their site you are currently browsing and how you got there:

Home > Sporting Equipment > Tennis > Racquets > Wilson

It's less likely however, that you've heard of what's known as the Nested Set Model of Hierarchies. I know it was new to me when I read about it in a book titled Joe Celko Trees and Hierarchies in SQL for Smarties. So, before I write about how to model breadcrumb link data with this model, I'm going to provide an explanation of what the nested set model is. My explanation might be an oversimplification when compared with the chapter on nested sets from Celko book (and in fact I will be borrowing some ideas and concepts from his book), but it should serve the purpose of getting you familiar with the concept. I would strongly encourage anyone interested in this article to pick up Celko book. (And no, he didn't pay me or ask me give him the ringing endorsement.)

Adjacency List Model vs. Nested Set Model

Even if you've only been a database developer for short time, you've probably either read about or encountered a data model that required storing hierarchical data. The most common example of this kind of data is an organizational chart or "org chart." Most database designers and books on database design will show you how to model an org chart using the simple adjacency list model, like this:

CREATE TABLE OrgChart
(EmpName VARCHAR(32) NOT NULL PRIMARY KEY,
 BossName VARCHAR(32),
 JobTitle VARCHAR(32));

You could add columns such as Salary, HireDate, and others to this table to make it more useful, but I'm sure you get the idea. This model produces a table like this:

EmpName BossName JobTitle
Dave NULL CEO
Steve Dave Sr. Vice President
Gary Dave VP Sales
Julia Gary Sales Rep
Kristy Gary Sales Rep
John Gary Sales Rep

Even if you were to fix the obvious normalization problems this example has, it is fraught with weaknesses when it comes to managing the insertion, deletion, and update of records within the hierarchy. It is also limited when it comes to writing self-join queries to do tree traversals. Additionally, the more you try to extend this self-join pattern, the worse query performance becomes.

So what's the answer? The nested set model, of course.

The nested set model design uses two columns to model containment, which represents the subordination that was represented previously in the adjacency list model with the BossName column acting as a foreign key. These two columns can be called "Left" and "Right" or "Begin" and "End" or anything that widely appeals to most people's concept of containment. If you decide to go with "Left" and "Right," you'll need to abbreviate them somehow because they are reserved words in SQL. Something like "lft" and "rgt," for example. I'm going to use "Lft" and "Rgt" because of the left-to-right nature of the images included along with the text of this article.

Breadcrumb Links with the Nested Set Model

Here is what the DDL looks like for your BreadCrumbLinks table. I'll explain which columns hold which pieces of data if it's not clear to you from the column names and the table data.

CREATE TABLE BreadCrumbLinks(
SiteAreaName  CHAR(32) NOT NULL PRIMARY KEY,
SiteAreaUrl   CHAR(32) NOT NULL,
Lft  INTEGER NOT NULL,
Rgt  INTEGER NOT NULL);

Which, after some INSERT statements, gives you this:

SiteAreaName SiteAreaUrl Lft Rgt
Home Default.aspx 1 24
SportingGoods SportingGoods.aspx 2 23
Tennis Tennis.aspx 3 14
Racquets TennisRacquets.aspx 4 13
Prince RacquetCategory.aspx?Brand=Prince 5 6
Wilson RacquetCategory.aspx?Brand=Wilson 7 8
Head RacquetCategory.aspx?Brand=Head 9 10
Yonex RacquetCategory.aspx?Brand=Yonex 11 12
Golf Golf.aspx 15 22
Drivers GolfClubs.aspx 16 21
Ping ClubCategory.aspx?Brand=Ping 17 18
TaylorMade ClubCategory.aspx?Brand=TaylorMade 19 20

As you can tell, I've once again used the Sporting Goods online store concept for the test data. If you're confused by the numbers in the Lft and Rgt columns, don't worry. I was also confused by them until I saw how they were used and managed. I'll discuss that shortly. First, I want to explain the columns and the data they hold.

The SiteAreaName column holds the text to be displayed for the link. This is what will go between your <a> and </a> tags. The SiteAreaUrl column holds the URL for the link. This is the href attribute value for the anchor tag. The Lft and Rgt columns hold the hierarchical information that you'll look at now.

If you look at the first row (Home), you will notice that the value in the Rgt column is (and will always be) double the number of rows in the table. Now, look at this data represented in a several different ways so that the relationships become more noticeable.

Listing 1: Data represented as XML

1   <Home>
2      <SportingGoods>
3         <Tennis>
4            <Racquets>
5               <Prince>
6               </Prince>
7               <Wilson>
8               </Wilson>
9               <Head>
10              </Head>
11              <Yonex>
12              </Yonex>
13           </Racquets>
14        </Tennis>
15        <Golf>
16           <Drivers>
17              <Ping>
18              </Ping>
19              <TaylorMade>
20              </TaylorMade>
21           </Drivers>
22        </Golf>
23   </SportingGoods>
24  </Home>

In this XML data, you can see the relationship between the Lft column values (the opening tags) and the Rgt column values (the closing tags). Now, look at some images.

Note: I've omitted one of the racquet types (Prince) so that things would fit nicely in the images; therefore, the numbers are slightly different but still correct.

Figure 1 shows the data as a nested circle diagram that looks similar to a ven diagram without intersecting areas.

Figure 2 shows the data as line intervals, similar to a horizontal bar (or Gantt) chart.

Both of these images help illustrate the relationship between the data in the Lft and Rgt columns. Hopefully, this has made you familiar with how the nested set model uses the Lft and Rgt columns to define the hierarchical structure of the data. Now, look at how you can write queries to retrieve the data you need to build breadcrumb links.

Retrieving the Data

To get the data returned in a way that's conducive to constructing the HTML needed for breadcrumb links, it's likely that you will want to find all the subordinate rows (starting from the top) that lead down to the row for the site area you're concerned with. To make the construction of the HTML as easy as possible, you'll also want the rows ordered in a top-down fashion, so that you can just loop through the result set. If you wanted to find all breadcrumb link information down to the level of Ping golf clubs, you could write a query like this:


SELECT C.*
FROM    BreadCrumbLinks AS B, BreadCrumbLinks AS C
WHERE   (B.Lft BETWEEN C.Lft AND C.Rgt)
AND     (B.SiteAreaName = 'Ping)
ORDER BY C.Lft

And see these results:

SiteAreaName SiteAreaUrl Lft Rgt
Home Default.aspx 1 24
SportingGoods SportingGoods.aspx 2 23
Golf Golf.aspx 15 22
Drivers GolfClubs.aspx 16 21
Ping ClubCategory?Brand=Ping 17 18

The nice thing about the structure of this query is that it will work no matter how many levels deep the hierarchy is. Now, look at another query to get you the row set for Tennis Racquets:


SELECT C.*
FROM    BreadCrumbLinks AS B, BreadCrumbLinks AS C
WHERE  (B.Lft BETWEEN C.Lft AND C.Rgt)
AND    (B.SiteAreaName = 'Racquets)
ORDER BY C.Lft

And the results:

SiteAreaName SiteAreaUrl Lft Rgt
Home Default.aspx 1 24
SportingGoods SportingGoods.aspx 2 23
Tennis Tennis.aspx 3 14
Racquets TennisRacquets.aspx 4 13

Now, this data could easily be used to create a series of breadcrumb links that look something like this:

Home > Sporting Goods > Tennis > Racquets

It is also probably obvious to you at this point that you don't need to select the Lft and Rgt columns in your queries. I selected those columns in these examples only to demonstrate the structure of the data.

Managing the Data: Inserting, Updating, and Deleting Rows

Another benefit of the nested set model is that it is relatively easy to manage the data in the Lft and Rgt columns. Take a look at deleting an item from the hierarchy. This could entail deleting one of the lowest level items with no subordinates or a level with several items contained beneath it. You'll need to get and then hold onto the SiteAreaName, Lft, and Rgt values for the level (or node) being deleted to use those values throughout the SQL. Because I'm a bigger fan of tennis than golf, pretend I want to delete all the golfing equipment from your BreadCrumbLinks table.


--SELECT the values for the Deleted level into variables
DECLARE DeletedAreaName CHAR(32);
DECLARE DeletedLft INTEGER;
DECLARE DeletedRgt INTEGER;
SELECT SiteAreaName, Lft, Rgt
INTO   DeletedAreaName, DeletedLft, DeletedRgt
FROM   BreadCrumbLinks
WHERE  SiteAreaName = 'Golf;
--Perform the deletion
DELETE FROM BreadCrumbLinks
WHERE Lft BETWEEN DeletedLft AND DeletedRgt;
--UPDATE the table so that the gaps between Lft
-- and Rgt values are removed
UPDATE BreadCrumbLinks
   SET Lft = CASE WHEN Lft > DeletedLft THEN
             Lft - (DeletedRgt - DeletedLft + 1)
          ELSE
             Lft
          END,
       Rgt = CASE WHEN Rgt > DeletedLft THEN
             Rgt - (DeletedRgt - DeletedLft + 1)
          ELSE
             Rgt
          END
   WHERE Lft > DeletedLft
      OR Rgt > DeletedLft;

Now, look at what it takes to insert a new tennis racquet within the Home > SportingGoods > Tennis > Racquets level (node). This is basically the Delete process, but reversed. It requires identifying a specific level (node) that the new item (node) is going to be inserted beneath, and then incrementing the numbers of the Lft and Rgt columns of levels in the path up the hierarchy chain. Lastly, you'll need to number the Lft and Rgt columns of the newly inserted item(s) to fit within the hierarchy. So, the parent level will be Racquets, and you'll insert a new brand of racquet named Volkl.


-- Declare an int var to hold the ParentLevel Rgt value
DECLARE ParentLevel INTEGER;
-- Fill the ParentLevel var w/ the Rgt value
SET ParentLevel = (SELECT Rgt FROM BreadCrumbLinks WHERE
SiteAreaName = 'Racquets);

--UPDATE the table in preparation for the INSERT statement
UPDATE BreadCrumbLinks
   SET Lft = CASE WHEN Lft > ParentLevel THEN
      Lft + 2
    ELSE
      Lft
    END,
   SET Rgt = CASE WHEN Rgt >= ParentLevel THEN
      Rgt + 2
   ELSE
      Rgt
   END
WHERE  Rgt >= ParentLevel;
--INSERT the new record w/ the values derived from the SQL above
INSERT INTO BreadCrumbLinks (SiteAreaName, SiteAreaUrl, Lft, Rgt)
VALUES ('Volkl, 'RacquetCategory.aspx?Brand=Volkl, ParentLevel,
        (ParentLevel + 1));

Not too difficult to manage, especially given the benefits the nested set model provides.

A Couple of Considerations

There are a couple of things worth mentioning about the table I've used in this article. Most experienced database designers will have noticed right away that it's far away from a perfect example of normalization. Also, there is no mention of how the pages themselves are related to the BreadCrumbLinks table. I've left these things out to try and focus on the point of the article (the nested set model) and didn't want to delve into topics that would probably just distract from that focus. If you spend a moment thinking about the relationship, though (between the page and the BreadCrumbLinks table), it's fair to assume that a something like a QueryString value in the Url of the page address or a piece of data associated with the page content would hold a key reference to one of the keys in your BreadCrumbLinks table, thus allowing you to retrieve the appropriate breadcrumb data. The reason I mention this is to remind you that this dependency should be modeled somewhere in your database and not left to be ignored until it causes trouble later on.

Conclusion

I hope I have achieved my goal with this article: providing you with a new way to think about modeling hierarchical data using the nested set model. Breadcrumb links were merely a means to show the benefits of this model. It has been my experience that the more you use the nested set model, the more you come to appreciate the power and inherent simplicity it provides. Organizational data, genealogy data, and many other types of data are perfect candidates for the nested set model. Choosing the right way to model this data is essential to providing a solid foundation to build applications on top of.

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved