Monday 1 December 2014

Using Self Referencing Tables With Entity Framework

Since EF was released I have been a fan. However, every once in a while I’ll run into a table design situation that I am not sure how to handle with EF. This week, I needed to setup a self-referencing table in order to store some hierarchical data. A self referencing table is a table where the primary key on the table is also defined as a foreign key. Sounds a little confusing right?

Let’s clarify the solution with an example. Let’s say I am building an application where I have a list of categories and subcategories. One of my top level categories is “Programming Languages” and under programming languages I have to subcategories which are “C#” and “Java”. In order to store this data I can use a single table with the following structure:
The actual data would look like this:
image
Just to clarify, a top level category will have a null value for the ParentId field. For all child categories the ParentId field is used as to represent its parent’s primary key value. As a programmer you may want to think about the ParentId field as a pointer. To complete the example lets take a look at the SQL used to create the table.
01.CREATE TABLE [dbo].[Categories] (
02.[CategoryId] [int] IDENTITY(1,1) NOT NULL,   
03.[Name] [nvarchar](255) NOT NULL,   
04.[ParentId] [intNULL,
05.PRIMARY KEY CLUSTERED
06.(
07.[CategoryId] ASC
08.)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS =ON, ALLOW_PAGE_LOCKS = ONON [PRIMARY]
09.ON [PRIMARY]
10. 
11.GO
12. 
13.ALTER TABLE [dbo].[Categories]  WITH CHECK ADD  CONSTRAINT [Category_Parent] FOREIGNKEY([ParentId])
14.REFERENCES [dbo].[Categories] ([CategoryId])
15.GO
16. 
17.ALTER TABLE [dbo].[Categories] CHECK CONSTRAINT [Category_Parent]
18.GO
Upon examining the SQL, you should have noticed that the CategoryId is the primary key on the table and the ParentId field is a foreign key which points back to the CategoryId field. Since we have a key referencing a another key on the same table we can classify this this as a self-referencing table. Now that we fully understand what a self-referencing table is, we can move forward to the Entity Framework code. To get started we first need to create a simple C# object to represent the Category table. Of course, keep in mind that if you are using EF Code first you do not need to create the table or database ahead of time. I only showed the table first because I wanted to better illustrate what a self referencing table is.
1.public class Category
2.{
3.public int CategoryId { getset; }
4.public string Name { getset; }
5.public int? ParentId { getset; }
6.}
So far the Category class is very simple. However, we really want to add a few more properties in order to make this class useful. For example, if you are a child category you really want to be able to use dot notation to get the name of the parent category (e.g. subCategory.Parent.Name). Using EF, we will create a virtual property named Parent. By making the property virtual we are letting EF know that when this property is accessed we want to load some data. Based on your configuration settings and the code you use to retrieve your data (whether or not you used DbSet.Include), EF will lazy load or eager load this data.
1.public class Category
2.{
3.public int CategoryId { getset; }
4.public string Name { getset; }
5.public int? ParentId { getset; }
6.public virtual Category Parent { getset; }
7.}
Finally, we also want a property called Children so we can use dot notation to enumerate over the child categories. Once again, here is the modified class:
1.public class Category
2.{
3.public int CategoryId { getset; }
4.public string Name { getset; }
5.public int? ParentId { getset; }
6.public virtual Category Parent { getset; }
7.public virtual ICollection<Category> Children { getset; }
8.}
The final step is to let EF know how these properties are related to one another. This can be done using EF's fluent API. If you are new to EF and are unaware of the fluent API then you may want to read this article first.
01.public class CommodityCategoryMap : EntityTypeConfiguration<Category> {
02.public CommodityCategoryMap() {
03.HasKey(x => x.CategoryId);
04. 
05.Property(x => x.CategoryId)
06..HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity);
07. 
08.Property(x => x.Name)
09..IsRequired()
10..HasMaxLength(255);
11. 
12.HasOptional(x => x.Parent)
13..WithMany(x => x.Children)
14..HasForeignKey(x => x.ParentId)
15..WillCascadeOnDelete(false);
16.}
17.}
Hopefully you paid careful attention to the last section of code where we state the a Category has an optional Parent property. In database speak, this simply means that the ParentID field is nullable. The code also states that if a Category object can have zero or many children. In order to specify that a record is a child, we  leverage the ParentId field to hold the primary key value of the parent record. As I mentioned earlier, if you are a programmer its easier to think of the ParentId field as a pointer. Finally, I disabled the cascade on delete option. This step is optional and probably based on your own personal preferences. If you enable cascade on delete and you delete a category that has 100 children then you will effectively remove 101 records. For whatever reason this scares me a little bit. Perhaps, my short career as a DBA caused me to not trust people with large volume delete statements. However, you may decide differently depending on your circumstances.
Hopefully, this short EF tutorial will help you if you are working through a scenario where you need to capture and manipulate hierarchical data. If you have any questions please leave a comment.

No comments:

Post a Comment

Angular Tutorial (Update to Angular 7)

As Angular 7 has just been released a few days ago. This tutorial is updated to show you how to create an Angular 7 project and the new fe...