I did it again! That’s why I’m writing this blog post...I do it almost every time and the main reason is: trying to optimize NHibernate queries without thinking.

As the title of the posts says the problem here is related to fetching the data of different collections attached to a single entity. To show you up the problem let’s first see the test domain, it is very simple:

public class Post : Entity<int>
{
	protected Post()
	{
	}

	public Post(string slug)
	{
		Slug = slug;
		Categories = new List<Category>();
		Tags = new List<Tag>();
	}

	public string Slug { get; set; }

	public IList<Category> Categories { get; set; }

	public IList<Tag> Tags { get; set; }	
}

public class Category : Entity<int>
{
	public string Name { get; set; }
}

public class Tag : Entity<int>
{
	public string Name { get; set; }
}

I generated some test data consisting of a single post which has 2 categories and 2 tags associated to it.

We want to query the Post object together with all it’s data, the first thing we can do is this:

[Test]
public void NoFetch_Success()
{
	using (ISession s = _nh.SessionFactory.OpenSession())
	{
		// look for a non existing revision
		using (var tx = s.BeginTransaction())
		{
			var p = s.CreateQuery("from Post p where p.Slug = :slug")
				.SetParameter("slug", "Slug")
				.UniqueResult<Post>();

			tx.Commit();

			Assert.IsNotNull(p);
			Assert.AreEqual(2, p.Categories.Count);
			Assert.AreEqual(2, p.Tags.Count);
		}
	}
}

Straightforward and simple; this thing produces 3 different queries, one for each kind of object:

select post0_.Id as Id0_, post0_.Slug as Slug0_ from Post post0_ where post0_.Slug=@p0;@p0 = 'Slug' [Type: String (4000)]

SELECT categories0_.PostId as PostId1_, categories0_.Id as Id1_, categories0_.Id as Id1_0_, categories0_.Name as Name1_0_ FROM Category categories0_ WHERE categories0_.PostId=@p0;@p0 = 1 [Type: Int32 (0)]

SELECT tags0_.PostId as PostId1_, tags0_.Id as Id1_, tags0_.Id as Id2_0_, tags0_.Name as Name2_0_ FROM Tag tags0_ WHERE tags0_.PostId=@p0;@p0 = 1 [Type: Int32 (0)]
		

The returned data are all correct.

But we really have too many queries here and we want to optimize it a bit (in order to limit/prevent the impact of the n+1 selects problem), so in my optimization berserker fury I started using fetch joins to retrieve more data in a single pass and I wrote this query:

from Post p left join fetch p.Categories left join fetch p.Tags where p.Slug = :slug

which is an absolute mistake, because it generates duplicated data; let’s see some test code:

[Test]
public void FetchALL_Failure()
{
	using (ISession s = _nh.SessionFactory.OpenSession())
	{
		// look for a non existing revision
		using (var tx = s.BeginTransaction())
		{
			// try adding a: select distinct p 
			var p = s.CreateQuery("from Post p left join fetch p.Categories left join fetch p.Tags where p.Slug = :slug")
				.SetParameter("slug", "Slug")
				.UniqueResult<Post>();

			tx.Commit();

			Assert.IsNotNull(p);
			Assert.AreEqual(2, p.Categories.Count); // <- big exception... we have 4 rows
			Assert.AreEqual(2, p.Tags.Count);
		}
	}
}

which produces 1 single query:

select post0_.Id as Id0_0_, categories1_.Id as Id1_1_, tags2_.Id as Id2_2_, post0_.Slug as Slug0_0_, categories1_.Name as Name1_1_, categories1_.PostId as PostId0__, categories1_.Id as Id0__, tags2_.Name as Name2_2_, tags2_.PostId as PostId1__, tags2_.Id as Id1__ from Post post0_ left outer join Category categories1_ on post0_.Id=categories1_.PostId left outer join Tag tags2_ on post0_.Id=tags2_.PostId where post0_.Slug=@p0;@p0 = 'Slug' [Type: String (4000)]

But the data are all wrong! In fact the Assert that should check for the correct numbers of categories are telling me that I have 4 rows inside that collection. I should have reminded earlier what the NHibernate documentation says about this situation:

It is possible to create a cartesian product by join fetching more than one collection in a query, so take care in this case. Join fetching multiple collection roles is also disabled for bag mappings. Note also that the fetch construct may not be used in queries called using Enumerable(). Finally, note that full join fetch and right join fetch are not meaningful. (Reference documentation, 13.3 Associations and joins)

Note that adding a ‘select distinct p’ will not help in this case.

To fix this we need to have just one fetch join at each level of the hierarchy; we can write something like this:

var p = s.CreateQuery("from Post p left join fetch p.Categories where p.Slug = :slug")
						.SetParameter("slug", "Slug")
						.UniqueResult<Post>();

This code will generate 2 queries (one to get the Post and the Categories and one to get the Tags) and the test will pass:

select post0_.Id as Id12_0_, categories1_.Id as Id13_1_, post0_.Slug as Slug12_0_, categories1_.Name as Name13_1_, categories1_.PostId as PostId0__, categories1_.Id as Id0__ from Post post0_ left outer join Category categories1_ on post0_.Id=categories1_.PostId where post0_.Slug=@p0;@p0 = 'Slug' [Type: String (4000)]
		
SELECT tags0_.PostId as PostId1_, tags0_.Id as Id1_, tags0_.Id as Id14_0_, tags0_.Name as Name14_0_ FROM Tag tags0_ WHERE tags0_.PostId=@p0;@p0 = 1 [Type: Int32 (0)]

But...wait! .UniqueResult<T>() here is smart enough to hide us a fetch join side effect; if we use .List<T>() instead we will see it clearly: the fetch introduces ‘ghosts’ in our returned resultset; the data will be a combination of Posts and Categories fields, so for any single Post object we will have ‘n’ records (one for each Category bound to the Post) and NHibernate will output ‘n’ Post entities, all of them containing the same data.

The solution here is pretty simple: just add the ‘distinct’ keyword to the select statement (or use the DistinctRootEntityTransformer), this way we are telling NHibernate to give us back all the unique instances of Post objects together with their Categories, here is the code:

[Test]
public void FetchSingleCollection_Distinct_Success()
{
	using (ISession s = _nh.SessionFactory.OpenSession())
	{
		// look for a non existing revision
		using (var tx = s.BeginTransaction())
		{
			var p = s.CreateQuery("select distinct p from Post p left join fetch p.Categories where p.Slug = :slug")
				.SetParameter("slug", "Slug")
				.UniqueResult<Post>();
			tx.Commit();

			Assert.IsNotNull(p);
			Assert.AreEqual(2, p.Categories.Count);
			Assert.AreEqual(2, p.Tags.Count);
		}
	}
}

If I remind it well, there’s no need to use the distinct on the queries generated using Linq to NHibernate, because the provider does it for you.

As usual here’s the test project for those who want to play with it:

I am hoping, now that I’ve wrote it down, to not make the same mistake again Open-mouthed smile...

Related Content