Wednesday 7 June 2017

Repository pattern, done right

The repository pattern has been discussed a lot lately, especially its usefulness since the introduction of OR/M libraries. This post (the third in a series about the data layer) aims to explain why it’s still a great choice.
Let’s start with the definition:
A Repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection. Client objects construct query specifications declaratively and submit them to Repository for satisfaction. Objects can be added to and removed from the Repository, as they can from a simple collection of objects, and the mapping code encapsulated by the Repository will carry out the appropriate operations behind the scenes.
The repository pattern is an abstraction. Its purpose is to reduce complexity and make the rest of the code persistence ignorant. As a bonus, it allows you to write unit tests instead of integration tests. The problem is that many developers fail to understand the pattern’s purpose and create repositories which leak persistence-specific information up to the caller (typically by exposing IQueryable<T>).
By doing so they get no benefit over using the OR/M directly.

Common misconceptions

Here are some common misconceptions regarding the purpose of the pattern.

Repositories are about being able to switch DAL implementation

Using repositories is not about being able to switch persistence technology (i.e. changing database or using a web service instead).
The repository pattern does allow you to do that, but it’s not the main purpose.
A more realistic scenario is that UserRepository.GetUsersGroupOnSomeComplexQuery() uses ADO.NET directly while UserRepository.Create() uses Entity Framework. By doing so you probably save a lot of time compared to struggling with LINQ to SQL to get your complex query running.
The repository pattern allows you to choose the technology that fits the current use case.

Unit testing

When people talk about the repository pattern and unit tests, they are not saying that the pattern allows you to unit test the data access layer.
What they mean is that it allows you to unit test the business layer. That’s possible because you can fake the repository (which is a lot easier than faking NHibernate/EF interfaces) and by doing so write clean and readable tests for your business logic.
As you’ve separated business from data, you can also write integration tests for your data layer to make sure that it works with your current database schema.
If you use ORM/LINQ in your business logic you can never be sure why the tests fail. It can be because your LINQ query is incorrect, because your business logic is incorrect, or because the ORM mapping is incorrect.
If you have mixed them and fake the ORM interfaces you can’t be sure either, because LINQ to Objects does not work in the same way as LINQ to SQL.
The repository pattern reduces the complexity of your tests and allows you to specialize your tests for the current layer.

How to create a repository

Building a correct repository implementation is very easy. In fact, you only have to follow a single rule:
Do not add anything into the repository class until the very moment that you need it
A lot of coders are lazy and try to make a generic repository, using a base class with a lot of methods that they might need. YAGNI. You write the repository class once and keep it as long as the application lives (which can be years). Why fuck it up by being lazy? Keep it clean without any base class inheritance. It will make it much easier to read and maintain.
The above statement is a guideline and not a law. A base class can very well be justified. My point is that you should think before you add it, so that you add it for the right reasons.

Mixing DAL/Business

Here is a simple example of why it’s hard to spot bugs if you mix LINQ and business logic.
    var brokenTrucks = _session.Query<Truck>().Where(x => x.State == 1);
    foreach (var truck in brokenTrucks)
    {
        if (truck.CalculateReponseTime().TotalDays > 30)
            SendEmailToManager(truck);
    }
What does that give us? Broken trucks?
Well, no. The statement was copied from another place in the code and the developer had forgotten to update the query. Any unit tests would likely just check that some trucks are returned and that they are emailed to the manager.
So we basically have two problems here:
a) Most developers will likely just read the name of the variable and not the query.
b) Any unit tests are written against the business logic and not the query.
Both problems would have been fixed with repositories, since with repositories we have unit tests for the business logic and integration tests for the data layer.
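As a rough sketch of what that separation could look like for the truck example: the business class below only asks the repository for broken trucks, so a unit test can feed it a hand-written fake. BrokenTruckNotifier, FakeTruckRepository and the settable ResponseTime property are names made up for this illustration, and the Truck class is a simplified stand-in for the real entity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the Truck entity (assumption for this sketch).
public class Truck
{
    public string Id { get; set; }
    public TimeSpan ResponseTime { get; set; }

    // Method name spelled as in the snippet above.
    public TimeSpan CalculateReponseTime() { return ResponseTime; }
}

public interface ITruckRepository
{
    IEnumerable<Truck> FindBrokenTrucks();
}

// Business logic: no LINQ provider and no state codes in sight.
public class BrokenTruckNotifier
{
    private readonly ITruckRepository _repository;
    private readonly Action<Truck> _sendEmailToManager;

    public BrokenTruckNotifier(ITruckRepository repository, Action<Truck> sendEmailToManager)
    {
        if (repository == null) throw new ArgumentNullException("repository");
        if (sendEmailToManager == null) throw new ArgumentNullException("sendEmailToManager");
        _repository = repository;
        _sendEmailToManager = sendEmailToManager;
    }

    public void NotifyManagerAboutSlowRepairs()
    {
        foreach (var truck in _repository.FindBrokenTrucks())
        {
            if (truck.CalculateReponseTime().TotalDays > 30)
                _sendEmailToManager(truck);
        }
    }
}

// Hand-written fake for unit tests; no database involved.
public class FakeTruckRepository : ITruckRepository
{
    private readonly List<Truck> _brokenTrucks;

    public FakeTruckRepository(params Truck[] brokenTrucks)
    {
        _brokenTrucks = brokenTrucks.ToList();
    }

    public IEnumerable<Truck> FindBrokenTrucks() { return _brokenTrucks; }
}

public static class BrokenTruckNotifierScenario
{
    // Returns the ids of the trucks the manager was emailed about.
    public static List<string> Run()
    {
        var repository = new FakeTruckRepository(
            new Truck { Id = "slow", ResponseTime = TimeSpan.FromDays(45) },
            new Truck { Id = "fast", ResponseTime = TimeSpan.FromDays(2) });
        var emailed = new List<string>();
        var notifier = new BrokenTruckNotifier(repository, truck => emailed.Add(truck.Id));
        notifier.NotifyManagerAboutSlowRepairs();
        return emailed;
    }
}
```

The unit test now only verifies the 30-day rule. Whether FindBrokenTrucks() filters on the right state is a question for an integration test against the real database.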

Implementations

Here are some different implementations with descriptions.

Base classes

These classes can be reused for all different implementations.

UnitOfWork

The unit of work represents a transaction when used in data layers. Typically the unit of work will roll back the transaction if SaveChanges() has not been invoked before being disposed.
    public interface IUnitOfWork : IDisposable
    {
        void SaveChanges();
    }
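To make the "roll back unless saved" behaviour concrete, here is a minimal in-memory sketch. InMemoryUnitOfWork and its Add() method are hypothetical and only exist for this illustration; a real implementation would wrap a database transaction instead of a list.

```csharp
using System;
using System.Collections.Generic;

public interface IUnitOfWork : IDisposable
{
    void SaveChanges();
}

// Hypothetical in-memory unit of work: staged items only reach the
// "database" list when SaveChanges() is called before Dispose().
public class InMemoryUnitOfWork : IUnitOfWork
{
    private readonly List<string> _database;
    private readonly List<string> _pending = new List<string>();

    public InMemoryUnitOfWork(List<string> database)
    {
        if (database == null) throw new ArgumentNullException("database");
        _database = database;
    }

    public void Add(string item) { _pending.Add(item); }

    public void SaveChanges()
    {
        _database.AddRange(_pending);
        _pending.Clear();
    }

    public void Dispose()
    {
        // Anything that was not saved is discarded, i.e. "rolled back".
        _pending.Clear();
    }
}

public static class UnitOfWorkDemo
{
    public static List<string> Run()
    {
        var database = new List<string>();

        using (var uow = new InMemoryUnitOfWork(database))
        {
            uow.Add("committed");
            uow.SaveChanges();
        }

        using (var uow = new InMemoryUnitOfWork(database))
        {
            uow.Add("abandoned");
            // SaveChanges() is never reached, for instance because
            // an exception was thrown or validation failed.
        }

        return database;
    }
}
```

The using block is what makes the pattern safe: leaving the block early always ends in Dispose(), so forgetting to save can never leave half-written data behind.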

Paging

We also need paged results.
    public class PagedResult<TEntity>
    {
        IEnumerable<TEntity> _items;
        int _totalCount;

        public PagedResult(IEnumerable<TEntity> items, int totalCount)
        {
            _items = items;
            _totalCount = totalCount;
        }

        public IEnumerable<TEntity> Items { get { return _items; } }
        public int TotalCount { get { return _totalCount; } }
    }
With that in place we can create methods like:
    public class UserRepository
    {
        public PagedResult<User> Find(int pageNumber, int pageSize)
        {
            // Query one page of users and the total count here.
            throw new NotImplementedException();
        }
    }
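The method body is left out above. As a sketch of what it might contain, here is an in-memory variant. InMemoryUserRepository, the one-based page numbers and the sort on FirstName are assumptions made for this example; a database-backed repository would run the same Skip/Take against its query instead of a list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public string FirstName { get; set; }
}

public class PagedResult<TEntity>
{
    public PagedResult(IEnumerable<TEntity> items, int totalCount)
    {
        Items = items;
        TotalCount = totalCount;
    }

    public IEnumerable<TEntity> Items { get; private set; }
    public int TotalCount { get; private set; }
}

// In-memory stand-in for a repository (assumption for this sketch).
public class InMemoryUserRepository
{
    private readonly List<User> _users;

    public InMemoryUserRepository(IEnumerable<User> users)
    {
        _users = users.ToList();
    }

    // pageNumber is assumed to be one-based.
    public PagedResult<User> Find(int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException("pageNumber");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");

        var items = _users
            .OrderBy(x => x.FirstName)           // paging needs a stable order
            .Skip((pageNumber - 1) * pageSize)
            .Take(pageSize)
            .ToList();

        // TotalCount is the size of the whole result, not of the page.
        return new PagedResult<User>(items, _users.Count);
    }
}
```

Returning the total count together with the page lets the caller render "page 2 of 3" without a second repository call.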

Sorting

Finally, we want to sort and page the items, right?
    var constraints = new QueryConstraints<User>()
        .SortBy("FirstName")
        .Page(1, 20);
    var page = repository.Find("Jon", constraints);
Do note that I used the property name, but I could also have written constraints.SortBy(x => x.FirstName). However, that is a bit hard to do in web applications, where we get the sort property as a string.
The class is a bit big, but you can find it on GitHub.
If the repository supports LINQ, we can apply the constraints like this:
    public class UserRepository
    {
        public PagedResult<User> Find(string text, QueryConstraints<User> constraints)
        {
            var query = _dbContext.Users.Where(x => x.FirstName.StartsWith(text) || x.LastName.StartsWith(text));

            // Count before paging so that we get the total number of matches.
            var count = query.Count();

            //easy
            var items = constraints.ApplyTo(query).ToList();
            return new PagedResult<User>(items, count);
        }
    }
The extension methods are also available on GitHub.
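The real QueryConstraints class is not reproduced here. Purely as an illustration of the idea, a stripped-down version could look like the sketch below. It is my own guess at the shape (string-based SortBy, one-based Page, ApplyTo as an instance method), not the code from the repository mentioned above.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Minimal, hypothetical take on QueryConstraints<T>.
public class QueryConstraints<T>
{
    private string _sortProperty;
    private int _pageNumber;
    private int _pageSize;

    public QueryConstraints<T> SortBy(string propertyName)
    {
        // The name typically comes from a query string, so reject
        // anything that is not a public property of T.
        if (typeof(T).GetProperty(propertyName) == null)
            throw new ArgumentException("Unknown property: " + propertyName, "propertyName");
        _sortProperty = propertyName;
        return this;
    }

    public QueryConstraints<T> Page(int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException("pageNumber");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");
        _pageNumber = pageNumber;
        _pageSize = pageSize;
        return this;
    }

    public IQueryable<T> ApplyTo(IQueryable<T> query)
    {
        if (_sortProperty != null)
        {
            // Builds query.OrderBy(x => x.<property>) as an expression tree.
            var parameter = Expression.Parameter(typeof(T), "x");
            var property = Expression.Property(parameter, _sortProperty);
            var keySelector = Expression.Lambda(property, parameter);
            var orderBy = Expression.Call(
                typeof(Queryable), "OrderBy",
                new[] { typeof(T), property.Type },
                query.Expression, Expression.Quote(keySelector));
            query = query.Provider.CreateQuery<T>(orderBy);
        }

        if (_pageSize > 0)
            query = query.Skip((_pageNumber - 1) * _pageSize).Take(_pageSize);

        return query;
    }
}

public class User
{
    public string FirstName { get; set; }
}

public static class QueryConstraintsDemo
{
    public static string[] Run()
    {
        var users = new[] { "Jonathan", "Jonas", "Jon" }
            .Select(name => new User { FirstName = name })
            .AsQueryable();

        var constraints = new QueryConstraints<User>()
            .SortBy("FirstName")
            .Page(1, 2);

        return constraints.ApplyTo(users).Select(x => x.FirstName).ToArray();
    }
}
```

Because the sort is expressed as an expression tree on IQueryable<T>, a LINQ provider can translate it to SQL. The constraints object itself stays free of persistence details, so it is safe to pass across the repository boundary.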

Basic contract

I usually start with a small definition for the repository, since it makes my other contracts less verbose. Do note that some of my repository contracts do not inherit this interface (for instance if some of the methods do not apply).
    public interface IRepository<TEntity, in TKey> where TEntity : class
    {
        TEntity GetById(TKey id);
        void Create(TEntity entity);
        void Update(TEntity entity);
        void Delete(TEntity entity);
    }
I then specialize it per domain model:
    public interface ITruckRepository : IRepository<Truck, string>
    {
        IEnumerable<Truck> FindBrokenTrucks();
        IEnumerable<Truck> Find(string text);
    }
That specialization is important. It keeps the contract simple. Only create methods that you know that you need.

Entity Framework

Do note that the repository pattern is only useful if you have POCOs which are mapped using Code First. Otherwise you’ll just break the abstraction by using the generated entities, and the repository pattern isn’t very useful then.
What I mean is that if you use the model designer you’ll always get a perfect representation of the database (but as classes). The problem is that those classes might not be a perfect representation of your domain model. Hence you have to cut corners in the domain model to be able to use your generated DB classes.
If you on the other hand use Code First, you can modify the models to be a perfect representation of your domain model (if the DB is reasonably similar to it). You don’t have to worry about your changes being overwritten, as they would have been by the model designer.
You can follow this article if you want to get a foundation generated for you.

Base class

    public class EntityFrameworkRepository<TEntity, TKey> where TEntity : class
    {
        private readonly DbContext _dbContext;

        public EntityFrameworkRepository(DbContext dbContext)
        {
            if (dbContext == null) throw new ArgumentNullException("dbContext");
            _dbContext = dbContext;
        }

        protected DbContext DbContext
        {
            get { return _dbContext; }
        }

        public void Create(TEntity entity)
        {
            if (entity == null) throw new ArgumentNullException("entity");
            DbContext.Set<TEntity>().Add(entity);
        }

        public TEntity GetById(TKey id)
        {
            return _dbContext.Set<TEntity>().Find(id);
        }

        public void Delete(TEntity entity)
        {
            if (entity == null) throw new ArgumentNullException("entity");
            DbContext.Set<TEntity>().Attach(entity);
            DbContext.Set<TEntity>().Remove(entity);
        }

        public void Update(TEntity entity)
        {
            if (entity == null) throw new ArgumentNullException("entity");
            DbContext.Set<TEntity>().Attach(entity);
            DbContext.Entry(entity).State = EntityState.Modified;
        }
    }
Then I write the implementation:
    public class TruckRepository : EntityFrameworkRepository<Truck, string>, ITruckRepository
    {
        private readonly TruckerDbContext _dbContext;

        // The base class has no parameterless constructor,
        // so the context must be passed on to it.
        public TruckRepository(TruckerDbContext dbContext)
            : base(dbContext)
        {
            _dbContext = dbContext;
        }

        public IEnumerable<Truck> FindBrokenTrucks()
        {
            //compare having this statement in a business class compared
            //to invoking the repository methods. Which says more?
            return _dbContext.Trucks.Where(x => x.State == 3).ToList();
        }

        public IEnumerable<Truck> Find(string text)
        {
            return _dbContext.Trucks.Where(x => x.ModelName.StartsWith(text)).ToList();
        }
    }

Unit of work

The unit of work implementation is simple for Entity Framework:
    public class EntityFrameworkUnitOfWork : IUnitOfWork
    {
        private readonly DbContext _context;

        public EntityFrameworkUnitOfWork(DbContext context)
        {
            _context = context;
        }

        public void Dispose()
        {
        }

        public void SaveChanges()
        {
            _context.SaveChanges();
        }
    }

NHibernate

I usually use Fluent NHibernate to map my entities. IMHO it has a much nicer syntax than the built-in code mappings. You can use NHibernate Mapping Generator to get a foundation created for you, but you most often have to clean up the generated files a bit.

Base class

    // Variance modifiers ("in") are only allowed on interfaces and delegates,
    // so the class declares a plain TKey.
    public class NHibernateRepository<TEntity, TKey> where TEntity : class
    {
        ISession _session;

        public NHibernateRepository(ISession session)
        {
            _session = session;
        }

        protected ISession Session { get { return _session; } }

        public TEntity GetById(TKey id)
        {
            return _session.Get<TEntity>(id);
        }

        public void Create(TEntity entity)
        {
            _session.SaveOrUpdate(entity);
        }

        public void Update(TEntity entity)
        {
            _session.SaveOrUpdate(entity);
        }

        public void Delete(TEntity entity)
        {
            _session.Delete(entity);
        }
    }

Implementation

    public class TruckRepository : NHibernateRepository<Truck, string>, ITruckRepository
    {
        public TruckRepository(ISession session)
            : base(session)
        {
        }

        public IEnumerable<Truck> FindBrokenTrucks()
        {
            // The session field is private in the base class,
            // so go through the protected Session property.
            return Session.Query<Truck>().Where(x => x.State == 3).ToList();
        }

        public IEnumerable<Truck> Find(string text)
        {
            return Session.Query<Truck>().Where(x => x.ModelName.StartsWith(text)).ToList();
        }
    }

Unit of work

    public class NHibernateUnitOfWork : IUnitOfWork
    {
        private readonly ISession _session;
        private ITransaction _transaction;

        public NHibernateUnitOfWork(ISession session)
        {
            _session = session;
            _transaction = _session.BeginTransaction();
        }

        public void Dispose()
        {
            // Still set means SaveChanges() was never called: roll back.
            if (_transaction != null)
                _transaction.Rollback();
        }

        public void SaveChanges()
        {
            if (_transaction == null)
                throw new InvalidOperationException("UnitOfWork has already been saved.");

            _transaction.Commit();
            _transaction = null;
        }
    }

Typical mistakes

Here are some mistakes that are easy to make when using OR/Ms.

Do not expose LINQ methods

Let’s get it straight. There are no complete LINQ to SQL implementations. They all are either missing features or implement things like eager/lazy loading in their own way. That means that they all are leaky abstractions. So if you expose LINQ outside your repository you get a leaky abstraction. You could really stop using the repository pattern then and use the OR/M directly.
    public interface IRepository<TEntity>
    {
        IQueryable<TEntity> Query();
        // [...]
    }
Those repositories really do not serve any purpose. They are just lipstick on a pig.

Learn about lazy loading

Lazy loading can be great. But it’s a curse for everyone who is not aware of it. If you don’t know what it is, Google it.
If you are not careful you could get 101 executed queries instead of 1 if you traverse a list of 100 items.
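The 101 queries are easy to reproduce without a real OR/M. The sketch below simulates it: FakeDatabase just counts how many "queries" it runs, and Order loads its Customer on first access. All the types are invented for this illustration and do not show how any particular OR/M implements lazy loading.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Order
{
    private readonly Lazy<Customer> _customer;

    public Order(int id, Func<Customer> loadCustomer)
    {
        Id = id;
        _customer = new Lazy<Customer>(loadCustomer);
    }

    public int Id { get; private set; }

    // First access runs the loader, i.e. one extra query.
    public Customer Customer { get { return _customer.Value; } }
}

// Simulated database that only counts the queries it executes.
public class FakeDatabase
{
    public int QueryCount { get; private set; }

    public List<Order> LoadOrdersLazily(int count)
    {
        QueryCount++; // one query for the orders
        return Enumerable.Range(1, count)
            .Select(id => new Order(id, () => LoadCustomer(id)))
            .ToList();
    }

    public List<Order> LoadOrdersWithCustomers(int count)
    {
        QueryCount++; // one joined query fetches orders and customers
        return Enumerable.Range(1, count)
            .Select(id =>
            {
                var customer = new Customer { Id = id, Name = "Customer " + id };
                return new Order(id, () => customer);
            })
            .ToList();
    }

    private Customer LoadCustomer(int id)
    {
        QueryCount++; // one query per lazily loaded customer
        return new Customer { Id = id, Name = "Customer " + id };
    }
}

public static class LazyLoadingDemo
{
    public static int CountQueriesWithLazyLoading()
    {
        var db = new FakeDatabase();
        var names = db.LoadOrdersLazily(100).Select(o => o.Customer.Name).ToList();
        return db.QueryCount;
    }

    public static int CountQueriesWithEagerLoading()
    {
        var db = new FakeDatabase();
        var names = db.LoadOrdersWithCustomers(100).Select(o => o.Customer.Name).ToList();
        return db.QueryCount;
    }
}
```

The loop over the orders looks identical in both cases, which is exactly why the problem is so easy to miss in a code review.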

Invoke ToList() before returning

The query is not executed in the database until you invoke ToList(), FirstOrDefault() etc. So if you want to keep all data-related exceptions inside the repositories, you have to invoke those methods before returning.
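Deferred execution can be shown with plain LINQ to Objects. In this sketch ReadRows() stands in for a data source that fails while it is being read, and RepositoryException is a made-up exception type; both are assumptions for the sake of the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical exception type that the data layer exposes to callers.
public class RepositoryException : Exception
{
    public RepositoryException(string message, Exception inner)
        : base(message, inner)
    {
    }
}

public static class DeferredExecutionDemo
{
    // Stand-in for a data source that breaks while rows are being read.
    private static IEnumerable<int> ReadRows()
    {
        yield return 1;
        throw new InvalidOperationException("Connection lost");
    }

    // Nothing is read here. The failure surfaces later, in whatever
    // code happens to enumerate the result.
    public static IEnumerable<int> FindDeferred()
    {
        return ReadRows().Where(x => x > 0);
    }

    // ToList() forces the read inside the repository, so the failure
    // can be caught and translated before it reaches the caller.
    public static IEnumerable<int> FindMaterialized()
    {
        try
        {
            return ReadRows().Where(x => x > 0).ToList();
        }
        catch (InvalidOperationException ex)
        {
            throw new RepositoryException("Failed to load rows.", ex);
        }
    }
}
```

With FindDeferred() the caller gets a raw InvalidOperationException from the middle of its own foreach loop. With FindMaterialized() the error is raised at the repository call, where it belongs.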

Get is not the same as search

There are two types of reads which are made in the database.
The first one is to search for items, i.e. the user wants to identify the items that he/she would like to work with.
The second one is when the user has identified the item and wants to work with it.
Those queries are different. In the first one, the user only wants to get the most relevant information. In the second one, the user likely wants to get all information. Hence in the former case you should probably return UserListItem or similar, while the latter case returns User. That also helps you to avoid the lazy loading problems.
I usually let search methods start with FindXxxx() while those getting the entire item start with GetXxxx(). Also, don’t be afraid of creating specialized POCOs for the searches. Two searches don’t necessarily have to return the same kind of entity information.
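A small sketch of that split is shown below. UserListItem is the name used above; the properties on it, the IUserRepository contract and the in-memory implementation are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Full entity, returned when the user works with a single item.
public class User
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public List<string> Roles { get; set; }
}

// Slim POCO with just enough information to show a search result.
public class UserListItem
{
    public int Id { get; set; }
    public string DisplayName { get; set; }
}

public interface IUserRepository
{
    User GetById(int id);                        // Get: the entire item
    IEnumerable<UserListItem> Find(string text); // Find: search results only
}

// In-memory stand-in used to keep the sketch self-contained.
public class InMemoryUserRepository : IUserRepository
{
    private readonly List<User> _users;

    public InMemoryUserRepository(IEnumerable<User> users)
    {
        _users = users.ToList();
    }

    public User GetById(int id)
    {
        return _users.FirstOrDefault(x => x.Id == id);
    }

    public IEnumerable<UserListItem> Find(string text)
    {
        return _users
            .Where(x => x.FirstName.StartsWith(text, StringComparison.OrdinalIgnoreCase)
                     || x.LastName.StartsWith(text, StringComparison.OrdinalIgnoreCase))
            .Select(x => new UserListItem
            {
                Id = x.Id,
                DisplayName = x.FirstName + " " + x.LastName
            })
            .ToList();
    }
}
```

Since UserListItem has no navigation properties at all, there is nothing for a search result to lazy load by accident.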

Summary

Don’t be lazy and try to make overly generic repositories. They give you no upsides compared to using the OR/M directly. If you want to use the repository pattern, make sure that you do it properly.
Want to generate repositories in the way that I describe here? Try my new DAL Generator

