
Friday, 6 November 2015

Dependency Injection Best Practices in an N-tier Modular Application

Introduction

Microsoft Unity is one of the most popular tools to implement Dependency Injection (DI).
Note: If you are not familiar with Inversion of Control (IoC) and DI, you can find more detail about them on Martin Fowler's blog here: Dependency Injection. I don't believe IoC is better explained anywhere else.
In a typical modern ASP.NET MVC Web Application, you will find a three tier layered architecture with units of isolation dependent on each other as illustrated in Figure 1.
A modern web three tier application
Figure 1: A modern web three tier application
A DI framework allows you to inject the dependencies; in our web application, to be precise, it allows you to inject the data layer contracts into the business layer and the business layer contracts into the presentation layer without having to create concrete instances of the implementations.

Problem

The DI container builds a dependency graph (the Composition Root) between the various registered objects. It requires that we register our concrete implementations with the DI container. Host applications like ASP.Net MVC or WCF must therefore reference all the assemblies so that the DI container can register every dependency during the application start phase. This means the host application has direct access to the various implementations of the business or data layer. That causes a few major issues: it violates the architectural rule of layer separation, and it can lead to bad coding practices when a developer starts consuming the data layer or entity models directly in the presentation layer. This blurred layer separation also means the compiler can no longer enforce the architectural rules at compile time.
Furthermore, if the application has no way to discover components on its own, it must be told explicitly which components are available and should be loaded. This is typically done by registering the available components in code or in a configuration file, which can become a maintenance issue.
Let's discuss this in detail with an example:
Implementation details of Business and Data access layers
Figure 2: Implementation details of Business and Data access layers
In Figure 2, the home controller in the presentation layer depends on IUserDomain (a business layer interface) whose implementation UserDomain is internal to the business layer assembly. The UserDomain class depends on IUserData, whose implementation UserData is in turn internal to the data access assembly.
The DI container lives in the ASP.Net MVC application and is in charge of instantiating all of the registered types, but in order to do so it needs access to all the internal concrete implementations of IUserDomain or IUserData. This would mean we need to make those implementations public and reference all of the assemblies in the host application.

Let's Dive into the Code

I have created a sample ASP.Net MVC 4 application that uses Unity.Mvc4 to set up the Unity container and dependency resolver.
Visual Studio Solution design:
Sample ASP.Net MVC 4 Application
Figure 3: Sample ASP.Net MVC 4 Application
Common: UnityDemo.Common. It holds all the shared resources.
Domain: A collection of modules. Each domain or module is fully capable of handling a particular business function, so it should know its own data access layer (shared among domains or independent). Security, in this case, is one such domain or module.
UnityDemo.Security: The business layer for the security module or domain.
UnityDemo.Security.Data: The data access layer for the security domain.
Web: The ASP.Net MVC 4 application.

Some Important Pieces of the Above Solution

IUserDomain: Business layer façade. The 'Domain' suffix means an independent business function/area. The entire business layer can be divided into multiple Domain layers, which may share or have their own data layer.
namespace UnityDemo.Common
{
    public interface IUserDomain
    {
        IUser GetUser(int userId);
    }
}
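The IUser contract returned by GetUser is defined in UnityDemo.Common, but its members are not shown in the post. A rough sketch, assuming it simply mirrors the Id, Name and Age properties of the data entity used later:
namespace UnityDemo.Common
{
    // Assumed shape of the shared user contract; the actual sample may differ.
    public interface IUser
    {
        int Id { get; }
        string Name { get; }
        int Age { get; }
    }
}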
IUserData: Data layer façade.
namespace UnityDemo.Security.Data
{
    public interface IUserData
    {
        User GetUser(int id);
    }
}
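The User type returned by IUserData is the data layer's entity; in the sample it would typically come from the EF model (SecurityModelContainer). A minimal stand-in, assuming only the properties this post actually uses:
namespace UnityDemo.Security.Data
{
    // Stand-in for the EF-generated entity; only the properties used in this post.
    public class User
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public int Age { get; set; }
    }
}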
UserData: Data layer façade concrete implementation.
namespace UnityDemo.Security.Data
{
    public class UserData : IUserData
    {
        public User GetUser(int id)
        {
            // Just instantiate and return; ideally this would come from a database through EF.
            return new User
            {
                Id = id,
                Name = "Manoj Kumar",
                Age = 28
            };
        }

        public IEnumerable<User> GetAllUsers()
        {
            // Placeholder; not implemented in the sample.
            return null;
        }
    }
}
UserDomain: Domain layer façade concrete implementation.
namespace UnityDemo.Security
{
    using Data = UnityDemo.Security.Data;

    public class UserDomain : IUserDomain
    {
        private readonly IUserData _data;

        public UserDomain(IUserData data)
        {
            _data = data;
        }

        public IUser GetUser(int userId)
        {
            var user = _data.GetUser(userId);
            return user.MapTo();
        }

        public IEnumerable<Data.User> GetAllUsers()
        {
            // Placeholder; not implemented in the sample.
            return null;
        }
    }
}
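The MapTo() call above converts the data layer's User entity into the shared IUser contract. The post doesn't show its implementation, so the following is only a sketch, assuming a hypothetical UserModel class and extension method inside the business layer:
namespace UnityDemo.Security
{
    using UnityDemo.Common;
    using Data = UnityDemo.Security.Data;

    // Hypothetical business-layer model implementing the shared contract.
    internal class UserModel : IUser
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public int Age { get; set; }
    }

    // Hypothetical mapping helper; the sample's actual MapTo() may differ.
    internal static class UserMappingExtensions
    {
        public static IUser MapTo(this Data.User user)
        {
            return new UserModel { Id = user.Id, Name = user.Name, Age = user.Age };
        }
    }
}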
HomeController: The default controller in ASP.Net MVC application. It depends on IUserDomain.
public class HomeController : Controller
{
    private readonly IUserDomain _domain;

    public HomeController(IUserDomain domain)
    {
        _domain = domain;
    }
}
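With the contract injected, a controller action only ever talks to IUserDomain. For illustration (this Index action is an assumption, not part of the sample):
public ActionResult Index()
{
    // Only the business-layer contract is used here; no concrete
    // UserDomain or UserData type is visible to the controller.
    IUser user = _domain.GetUser(1);
    return View(user);
}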
Bootstrapper: The Bootstrapper registers all the concrete implementations in the web application. Bootstrapper.Initialise() will be called from Application_Start() in Global.asax.
namespace UnityDemo.Web
{
    public static class Bootstrapper
    {
        public static IUnityContainer Initialise()
        {
            var container = BuildUnityContainer();
            DependencyResolver.SetResolver(new UnityDependencyResolver(container));
            return container;
        }

        private static IUnityContainer BuildUnityContainer()
        {
            var container = new UnityContainer();
            RegisterTypes(container);
            return container;
        }

        public static void RegisterTypes(IUnityContainer container)
        {
            container.RegisterType<IUserData, UserData>();
            container.RegisterType<IUserDomain, UserDomain>();
        }
    }
}
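For completeness, here is roughly what the Application_Start hook in Global.asax.cs looks like with the Bootstrapper call wired in. This follows the default MVC 4 template (the real file typically also registers filters and bundles):
namespace UnityDemo.Web
{
    using System.Web.Mvc;
    using System.Web.Routing;

    public class MvcApplication : System.Web.HttpApplication
    {
        protected void Application_Start()
        {
            AreaRegistration.RegisterAllAreas();
            RouteConfig.RegisterRoutes(RouteTable.Routes);

            // Build the Unity container and plug it into MVC's dependency resolver.
            Bootstrapper.Initialise();
        }
    }
}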

Drawbacks with this Design:

Here are the consequences of directly accessing the concrete business and data layer types while registering them with the Unity container in the Bootstrapper:
container.RegisterType<IUserData, UserData>();
container.RegisterType<IUserDomain, UserDomain>();
1. There is no clear separation of layers. We can't have truly independent modules.
2. The public implementations can be misused in the presentation layer (HomeController):
public ActionResult UsersEdit()
{
    //Accessing the domain implementation directly
    var userDomain = new UserDomain(new UserData());
    var users = userDomain.GetAllUsers();

    //Or worse: accessing the data layer directly
    var data = new UserData();
    var users2 = data.GetAllUsers();

    //Or worst of all: using the entity model instance directly
    using (var model = new SecurityModelContainer())
    {
        var users3 = from u in model.Users select u;
    }
    return View(users);
}
3. The presentation layer must reference the business and data layer assemblies directly in order to register their concrete public types:
Ref in Web Problem
Figure 4: Ref in Web Problem
Let us now explore an approach to make this better.

Solution

Without a DI container, the MVC application would not need to reference the data access assemblies of any module (which may also contain Entity Framework (EF) models); it could reference just the business layer assemblies. But that would mean losing some of the benefits of decoupling. At the end of the day all assemblies end up in the bin folder of the MVC application anyway; the real problem is the accessibility of the concrete data and business implementations from the presentation layer. Let's dive into a technique that solves this problem by keeping the concrete implementations internal while still benefiting from DI.
We can leverage the power of MEF. The Managed Extensibility Framework (MEF) is a library for creating lightweight, extensible applications. It allows implicit discovery of extensions via composition, without any configuration. An MEF component, called a part, declaratively specifies both its dependencies (known as imports) and the capabilities (known as exports) it makes available. When a part is created, the MEF composition engine satisfies its imports with what is available from other parts.
This approach solves the problems of hard-coding the references or configuring them in a fragile config file. MEF allows applications to discover and examine parts by their metadata, without instantiating them or even loading their assemblies. As a result, there is no need to carefully specify when and how extensions should be loaded. For more detail on MEF read Managed Extensibility Framework. MEF is an integral part of the .NET Framework 4.
Let's see how to implement MEF along with Unity to solve this problem:
We can define an interface IModule and an implementation, ModuleInit, in every business or data layer assembly. The interface makes each ModuleInit discoverable, and the ModuleInit class is responsible for registering all of that assembly's internal implementations against the defined interfaces. Let's dive straight into the code:
We will need a reference to the System.ComponentModel.Composition assembly to implement MEF.
Define the IModuleRegistrar interface (a wrapper around IUnityContainer for registration). It allows objects implementing IModule to register types with Unity.
namespace UnityDemo.Common
{
    public interface IModuleRegistrar
    {
        void RegisterType<TFrom, TTo>() where TTo : TFrom;
    }
}
Define the IModule interface, used to identify modules/domains at runtime:
namespace UnityDemo.Common
{
    public interface IModule
    {
        void Initialize(IModuleRegistrar registrar);
    }
}
A wrapper to register all the internal types with Unity:
internal class ModuleRegistrar : IModuleRegistrar
{
    private readonly IUnityContainer _container;

    public ModuleRegistrar(IUnityContainer container)
    {
        this._container = container; //Interception behaviour, if any, could also be registered here
    }

    public void RegisterType<TFrom, TTo>() where TTo : TFrom
    {
        this._container.RegisterType<TFrom, TTo>();
    }
}
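If a module ever needs a non-default lifetime (a per-container singleton, for example), the registrar could expose an extra overload that forwards a Unity lifetime manager. This is an assumption layered on top of the sample, not part of it; a hypothetical extra method on ModuleRegistrar (with a matching member on IModuleRegistrar):
// Hypothetical addition to ModuleRegistrar (not in the original sample):
// registers TTo with a container-controlled (singleton) lifetime instead of
// the default transient lifetime.
public void RegisterSingleton<TFrom, TTo>() where TTo : TFrom
{
    this._container.RegisterType<TFrom, TTo>(new ContainerControlledLifetimeManager());
}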
Create ModuleInit classes in the domain and data assemblies (wherever we need to register types with the DI container) and export them as typeof(IModule) through MEF attributes:
To register all the internal types of the data layer (UserData):
namespace UnityDemo.Security.Data
{
    [Export(typeof(IModule))]
    public class ModuleInit : IModule
    {
        public void Initialize(IModuleRegistrar registrar)
        {
            registrar.RegisterType<IUserData, UserData>();
        }
    }
}
To register all the internal types of the business layer/domain:
namespace UnityDemo.Security
{
    [Export(typeof(IModule))]
    public class ModuleInit : IModule
    {
        public void Initialize(IModuleRegistrar registrar)
        {
            registrar.RegisterType<IUserDomain, UserDomain>();
        }
    }
}
To invoke registration: Identify the ModuleInit classes and call Initialize:
For this, we can define a static class ModuleLoader in the Common project and call it from the Bootstrapper of the host application. This class has a method called LoadContainer, which finds all the MEF parts using an import definition. All the ModuleInit classes export the type IModule, so we can find every module and register it by calling the Initialize() method of its ModuleInit class.
Find all ModuleInit classes and call Initialize to register the types in the various modules (the MEF part):
namespace UnityDemo.Common
{
    public static class ModuleLoader
    {
        public static void LoadContainer(IUnityContainer container, string path, string pattern)
        {
            var dirCat = new DirectoryCatalog(path, pattern);
            var importDef = BuildImportDefinition();
            try
            {
                using (var aggregateCatalog = new AggregateCatalog())
                {
                    aggregateCatalog.Catalogs.Add(dirCat);
                    using (var compositionContainer = new CompositionContainer(aggregateCatalog))
                    {
                        IEnumerable<Export> exports = compositionContainer.GetExports(importDef);
                        IEnumerable<IModule> modules =
                            exports.Select(export => export.Value as IModule).Where(m => m != null);
                        var registrar = new ModuleRegistrar(container);
                        foreach (IModule module in modules)
                        {
                            module.Initialize(registrar);
                        }
                    }
                }
            }
            catch (ReflectionTypeLoadException typeLoadException)
            {
                var builder = new StringBuilder();
                foreach (Exception loaderException in typeLoadException.LoaderExceptions)
                {
                    builder.AppendFormat("{0}\n", loaderException.Message);
                }
                throw new TypeLoadException(builder.ToString(), typeLoadException);
            }
        }

        private static ImportDefinition BuildImportDefinition()
        {
            return new ImportDefinition(
                def => true, typeof(IModule).FullName, ImportCardinality.ZeroOrMore, false, false);
        }
    }
}
Call LoadContainer() on ModuleLoader from the Bootstrapper in the host application, passing the assemblies' path and a file pattern (usually the project name as prefix and .dll as suffix):
Register all the internal types with Unity by invoking ModuleLoader:
public static void RegisterTypes(IUnityContainer container)
{
    //The old way, now commented out:
    //container.RegisterType<IUserData, UserData>();
    //container.RegisterType<IUserDomain, UserDomain>();

    //Module initialization through MEF
    ModuleLoader.LoadContainer(container, ".\\bin", "UnityDemo.*.dll");
}

Advantages/Benefits

1. Automatic registration of the types exposed by any module. If we need to change the implementation of any interface consumed by the MVC web application or a business layer, we can simply build a new assembly and drop it into the bin folder of the web application.
2. The application can easily be extended by combining various modules, which are registered with Unity at runtime through MEF's extensibility support.
3. All implementations of the domain or data layer remain internal to their respective assemblies, so they can't be misused. See the breaking code below, which no longer compiles because the concrete domain and data types are not accessible in the controller:
public ActionResult UsersEdit()
{
    //Accessing the domain directly (no longer accessible)
    var userDomain = new UserDomain(new UserData());
    var users = userDomain.GetAllUsers();

    //Or worse: accessing the data layer (no longer accessible)
    var data = new UserData();
    var users2 = data.GetAllUsers();

    //Or worst of all: using the entity model instance in the UI layer (no longer accessible)
    using (var model = new SecurityModelContainer())
    {
        var users3 = from u in model.Users select u;
    }
    return View(users);
}
4. No reference to the data layer in the MVC application. The business layer (UnityDemo.Security) is referenced only because we need all the assemblies in the bin folder of the MVC app, but that does no harm because all the implementations in the business layer are internal to that assembly.
Ref in Web Solved
Figure 5: Ref in Web Solved

Source Code

I have developed this sample application; you can download it here. You can also find it on GitHub. The application is in ASP.Net MVC and demonstrates the technique discussed above. The source code is not perfect in every sense, as its primary focus is to demonstrate this MEF-based Unity container initialization.
To run this application, you will need Visual Studio 2010 with MVC 4 installed. Once you open the solution, go to Manage NuGet Packages on the UnityDemo.Web project and click the following button to restore the missing packages.
Restore Package
Figure 6: Restore Package

Conclusion

In this article, you have learned a different technique to configure DI (Dependency Injection) using the Unity container and MEF. This technique solves a couple of problems: first, it keeps the business and data layer implementations internal to their respective assemblies; second, the architectural rules are not violated while you still enjoy the benefits of a DI container. The result is a much cleaner modular application that can be extended quite easily.

Friday, 19 June 2015

Is Entity Framework Suitable For High-Traffic Websites?

All of the following things (roughly in order of importance) are going to affect throughput, and all of them are handled (sometimes in different ways) by most of the major ORM frameworks out there:
  1. Database Design and Maintenance
    This is, by a wide margin, the single most important determinant of the throughput of a data-driven application or web site, and often totally ignored by programmers.
    If you don't use proper normalization techniques, your site is doomed. If you don't have primary keys, almost every query will be dog-slow. If you use well-known anti-patterns such as using tables for Key-Value Pairs (AKA Entity-Attribute-Value) for no good reason, you'll explode the number of physical reads and writes.
    If you don't take advantage of the features the database gives you, such as page compression, FILESTREAM storage (for binary data), SPARSE columns, hierarchyid for hierarchies, and so on (all SQL Server examples), then you will not see anywhere near the performance that you could be seeing.
    You should start worrying about your data access strategy after you've designed your database and convinced yourself that it's as good as it possibly can be, at least for the time being.
  2. Eager vs. Lazy Loading
    Most ORMs use a technique called lazy loading for relationships, which means that by default they load one entity (table row) at a time and make a round-trip to the database every time they need to load one or many related (foreign key) rows.
    This isn't a good or bad thing, it rather depends on what's actually going to be done with the data, and how much you know up-front. Sometimes lazy-loading is absolutely the right thing to do. NHibernate, for example, may decide not to query for anything at all and simply generate a proxy for a particular ID. If all you ever need is the ID itself, why should it ask for more? On the other hand, if you are trying to print a tree of every single element in a 3-level hierarchy, lazy-loading becomes an O(N²) operation, which is extremely bad for performance.
    One interesting benefit to using "pure SQL" (i.e. raw ADO.NET queries/stored procedures) is that it basically forces you to think about exactly what data is necessary to display any given screen or page. ORMs and lazy-loading features don't prevent you from doing this, but they do give you the opportunity to be... well, lazy, and accidentally explode the number of queries you execute. So you need to understand your ORM's eager-loading features and be ever vigilant about the number of queries you're sending to the server for any given page request (a per-query eager-loading example appears in the sketch after this list).
  3. Caching
    All major ORMs maintain a first-level cache, AKA "identity cache", which means that if you request the same entity twice by its ID, it doesn't require a second round-trip, and also (if you designed your database correctly) gives you the ability to use optimistic concurrency.
    The L1 cache is pretty opaque in L2S and EF, you kind of have to trust that it's working. NHibernate is more explicit about it (Get/Load vs. Query/QueryOver). Still, as long as you try to query by ID as much as possible, you should be fine here. A lot of people forget about the L1 cache and repeatedly look up the same entity over and over again by something other than its ID (i.e. a lookup field). If you need to do this then you should save the ID or even the entire entity for future lookups.
    There's also a level 2 cache ("query cache"). NHibernate has this built-in. Linq to SQL and Entity Framework have compiled queries, which can help reduce app server loads quite a bit by compiling the query expression itself, but it doesn't cache the data. Microsoft seems to consider this an application concern rather than a data-access concern, and this is a major weak point of both L2S and EF. Needless to say it's also a weak point of "raw" SQL. In order to get really good performance with basically any ORM other than NHibernate, you need to implement your own caching façade.
    There's also an L2 cache "extension" for EF4 which is okay, but not really a wholesale replacement for an application-level cache.
  4. Number of Queries
    Relational databases are based on sets of data. They're really good at producing large amounts of data in a short amount of time, but they're nowhere near as good in terms of query latency because there's a certain amount of overhead involved in every command. A well-designed app should play to the strengths of this DBMS and try to minimize the number of queries and maximize the amount of data in each.
    Now I'm not saying to query the entire database when you only need one row. What I'm saying is, if you need the Customer, Address, Phone, CreditCard, and Order rows all at the same time in order to serve a single page, then you should ask for them all at the same time; don't execute each query separately. Sometimes it's worse than that: you'll see code that queries the same Customer record 5 times in a row, first to get the Id, then the Name, then the EmailAddress, then... it's ridiculously inefficient.
    Even if you need to execute several queries that all operate on completely different sets of data, it's usually still more efficient to send it all to the database as a single "script" and have it return multiple result sets. It's the overhead you're concerned with, not the total amount of data.
    This might sound like common sense but it's often really easy to lose track of all the queries that are being executed in various parts of the application; your Membership Provider queries the user/role tables, your Header action queries the shopping cart, your Menu action queries the site map table, your Sidebar action queries the featured product list, and then maybe your page is divided into a few separate autonomous areas which query the Order History, Recently Viewed, Category, and Inventory tables separately, and before you know it, you're executing 20 queries before you can even start to serve the page. It just utterly destroys performance.
    Some frameworks - and I'm thinking mainly of NHibernate here - are incredibly clever about this and allow you to use something called futures which batch up entire queries and try to execute them all at once, at the last possible minute. AFAIK, you're on your own if you want to do this with any of the Microsoft technologies; you have to build it into your application logic.
  5. Indexing, Predicates, and Projections
    At least 50% of devs I speak to and even some DBAs seem to have trouble with the concept of covering indexes. They think, "well, the Customer.Name column is indexed, so every lookup I do on the name should be fast." Except it doesn't work that way unless the Name index covers the specific column you're looking up. In SQL Server, that's done with INCLUDE in the CREATE INDEX statement.
    If you naïvely use SELECT * everywhere - and that is more or less what every ORM will do unless you explicitly specify otherwise using a projection - then the DBMS may very well choose to completely ignore your indexes because they contain non-covered columns. A projection means that, for example, instead of doing this:
    from c in db.Customers where c.Name == "John Doe" select c
    
    You do this instead:
    from c in db.Customers where c.Name == "John Doe"
    select new { c.Id, c.Name }
    
    And this will, for most modern ORMs, instruct it to only go and query the Id and Name columns which are presumably covered by the index (but not the Email, LastActivityDate, or whatever other columns you happened to stick in there).
    It's also very easy to completely blow away any indexing benefits by using inappropriate predicates. For example:
    from c in db.Customers where c.Name.Contains("Doe")
    
    ...looks almost identical to our previous query but in fact will result in a full table or index scan because it translates to LIKE '%Doe%'. Similarly, another query which looks suspiciously simple is:
    from c in db.Customers where (maxDate == null) || (c.BirthDate >= maxDate)
    
    Assuming you have an index on BirthDate, this predicate has a good chance to render it completely useless. Our hypothetical programmer here has obviously attempted to create a kind of dynamic query ("only filter the birth date if that parameter was specified"), but this isn't the right way to do it. Written like this instead:
    from c in db.Customers where c.BirthDate >= (maxDate ?? DateTime.MinValue)
    
    ...now the DB engine knows how to parameterize this and do an index seek. One minor, seemingly insignificant change to the query expression can drastically affect performance.
    Unfortunately LINQ in general makes it all too easy to write bad queries like this, because sometimes the providers are able to guess what you were trying to do and optimize the query, and sometimes they aren't. So you end up with frustratingly inconsistent results which would have been blindingly obvious (to an experienced DBA, anyway) had you just written plain old SQL.
    Basically it all comes down to the fact that you really have to keep a close eye on both the generated SQL and the execution plans they lead to, and if you're not getting the results you expect, don't be afraid to bypass the ORM layer once in a while and hand-code the SQL. This goes for any ORM, not just EF.
  6. Transactions and Locking
    Do you need to display data that's current up to the millisecond? Maybe - it depends - but probably not. Sadly, Entity Framework doesn't give you nolock, you can only use READ UNCOMMITTED at the transaction level (not table level). In fact none of the ORMs are particularly reliable about this; if you want to do dirty reads, you have to drop down to the SQL level and write ad-hoc queries or stored procedures. So what it boils down to, again, is how easy it is for you to do that within the framework.
    Entity Framework has come a long way in this regard - version 1 of EF (in .NET 3.5) was god-awful, made it incredibly difficult to break through the "entities" abstraction, but now you have ExecuteStoreQuery and Translate, so it's really not too bad. Make friends with these guys because you'll be using them a lot.
    There's also the issue of write locking and deadlocks and the general practice of holding locks in the database for as little time as possible. In this regard, most ORMs (including Entity Framework) actually tend to be better than raw SQL because they encapsulate the Unit of Work pattern, which in EF is SaveChanges. In other words, you can "insert" or "update" or "delete" entities to your heart's content, whenever you want, secure in the knowledge that no changes will actually get pushed to the database until you commit the unit of work.
    Note that a UOW is not analogous to a long-running transaction. The UOW still uses the optimistic concurrency features of the ORM and tracks all changes in memory. Not a single DML statement is emitted until the final commit. This keeps transaction times as low as possible. If you build your application using raw SQL, it's quite difficult to achieve this deferred behaviour.
    What this means for EF specifically: make your units of work as coarse as possible and don't commit them until you absolutely need to. Do this and you'll end up with much lower lock contention than you would using individual ADO.NET commands at random times (the sketch below illustrates this batching).
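To make a few of the points above concrete, here is a minimal sketch in DbContext-style EF code (EF 4.1+). The ShopContext, Customer and Order types are hypothetical stand-ins, not from any real project; the sketch shows per-query eager loading (item 2), a narrow projection that a covering index can serve (item 5), and a coarse unit of work committed with a single SaveChanges (item 6).
using System.Collections.Generic;
using System.Data.Entity;   // EF 4.1+ DbContext API
using System.Linq;

// Hypothetical entities and context, for illustration only.
public class Order
{
    public int Id { get; set; }
    public int CustomerId { get; set; }
}

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public virtual ICollection<Order> Orders { get; set; }
}

public class ShopContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
    public DbSet<Order> Orders { get; set; }
}

public static class CustomerQueries
{
    public static void LoadCustomerPage(int customerId)
    {
        using (var db = new ShopContext())
        {
            // Item 2: eager-load the related Orders in the same round-trip
            // instead of triggering one lazy query per related row.
            var customer = db.Customers
                             .Include("Orders")
                             .Single(c => c.Id == customerId);

            // Item 5: project only the columns you actually display,
            // rather than pulling back the whole row with SELECT *.
            var summaries = db.Customers
                              .Where(c => c.Name == "John Doe")
                              .Select(c => new { c.Id, c.Name })
                              .ToList();
        }
    }

    public static void RenameAnonymousCustomers()
    {
        using (var db = new ShopContext())
        {
            // Item 6: keep the unit of work coarse; all changes are tracked in
            // memory and emitted as one short transaction at SaveChanges().
            foreach (var c in db.Customers.Where(x => x.Name == "").ToList())
            {
                c.Name = "Unknown";
            }
            db.SaveChanges();
        }
    }
}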

In Conclusion:

EF is completely fine for high-traffic/high-performance applications, just like every other framework is fine for high-traffic/high-performance applications. What matters is how you use it. Here's a quick comparison of the most popular frameworks and what features they offer in terms of performance (legend: N = Not supported, P = Partial, Y = yes/supported):
                                | L2S | EF1 | EF4 | NH3 | ADO
                                +-----+-----+-----+-----+-----
Lazy Loading (entities)         |  N  |  N  |  N  |  Y  |  N
Lazy Loading (relationships)    |  Y  |  Y  |  Y  |  Y  |  N
Eager Loading (global)          |  N  |  N  |  N  |  Y  |  N
Eager Loading (per-session)     |  Y  |  N  |  N  |  Y  |  N
Eager Loading (per-query)       |  N  |  Y  |  Y  |  Y  |  Y
Level 1 (Identity) Cache        |  Y  |  Y  |  Y  |  Y  |  N
Level 2 (Query) Cache           |  N  |  N  |  P  |  Y  |  N
Compiled Queries                |  Y  |  P  |  Y  |  N  | N/A
Multi-Queries                   |  N  |  N  |  N  |  Y  |  Y
Multiple Result Sets            |  Y  |  N  |  P  |  Y  |  Y
Futures                         |  N  |  N  |  N  |  Y  |  N
Explicit Locking (per-table)    |  N  |  N  |  N  |  P  |  Y
Transaction Isolation Level     |  Y  |  Y  |  Y  |  Y  |  Y
Ad-Hoc Queries                  |  Y  |  P  |  Y  |  Y  |  Y
Stored Procedures               |  Y  |  P  |  Y  |  Y  |  Y
Unit of Work                    |  Y  |  Y  |  Y  |  Y  |  N
As you can see, EF4 (the version current when this comparison was written) doesn't fare too badly, but it's probably not the best if performance is your primary concern. NHibernate is much more mature in this area and even Linq to SQL provides some performance-enhancing features that EF still doesn't. Raw ADO.NET is often going to be faster for very specific data-access scenarios, but, when you put all the pieces together, it really doesn't offer a lot of important benefits that you get from the various frameworks.
And, just to make completely sure that I sound like a broken record, none of this matters in the slightest if you don't design your database, application, and data access strategies properly. All of the items in the chart above are for improving performance beyond the baseline; most of the time, the baseline itself is what needs the most improvement.
