A WinDbg Debugging Journey - NHibernate Memory Leak

by Rasmus Kromann-Larsen December 19, 2008 20:29

Disclaimer: This is not a stab at the NHibernate team. They are doing an awesome job, it might as well (and for a long time I thought it was) have been in my own code. In addition - the memory leak is already solved on the NHibernate trunk.

Introduction

A few weeks back, an ASP.NET application (using NHibernate 2.0.0.4000) I am running got under heavier load than usual. I had noted that the memory usage was slightly high earlier, but it had never been a real problem - this is all the server is doing. However, under heavier load, memory pressure started approaching 700-800mb and the dreaded OutOfMemoryException started appearing when doing big chunks of work.

To be honest, I have never done much memory debugging - learning opportunity! If you do a sweep of the web these days on .NET and debugging, you will no doubt find the blog of Tess Ferrandez, who is an ASP.NET Escalation Engineer working at Microsoft. She has even done a lab series aptly named buggy bits that ease you through debugging and identifying various kinds of application problems.

After reading through her articles and watching her TechEd presentation on the subject, I downloaded WinDbg, configured it as Tess had described and started experimenting. This blog post will describe my journey and hopefully help other solve similar problems.

The Puzzle

The first thing I did was to grab a memory with adplus, one of the tools included with WinDbg. From my understanding, it stops the application momentarily and just writes the entire contents of memory to disk. This produced a huge .DMP file - a memory dump. My managed heap was at around 800mb at the time, but the dump file was slightly bigger.

Working with WinDbg is not your standard draggy-droppy windows application, it looks sort of like a console and you type bizarre command and it produces even more bizarre output for you to reason about. I started out using the "!eeheap -gc" command, which produces some basic information about your heaps.

image

As you can see, my heap size was around 814mb. If you dig into the information (not all shown on screenshot), you will find that my garbage collectors generation 2 is much bigger than generation 0 and 1. (You can read more about garbage collection and generational garbage collection here.)

After looking at this, I fired off the "!dumpheap -stat" command to get an overview of the objects in the heap.

image

The output looked like that, the first column denoting the type of object, second is number of objects, third is the shallow size of the objects, that is, not including whatever it references - the fourth is the type name.

Now, the first time I looked at this, I noted the NHibernate objects but focused more on the 385mb of strings - usually, SELECT isn't broken - I was convinced this was a problem in my code. I dug a bit deeper but didn't really find much, partly because WinDbg isn't super easy. This lead me to find some other places in my code that needed StringBuilders, but this proved not to be the root cause (thanks anyway Søren).

Later, when thinking about the problem, it came to me that that maybe 1.9 million NHibernate SqlStrings was a wee bit too many. I decided to did deeper into this - I found the SqlString in my list and found that the type is denoted with 0eafd714. Now, !dumpheap can do more that just give you statistics, it can give you filtered lists using various arguments. I wanted to sample some instances of these SqlStrings to see where they were bounded, so I used the command "!dumpheap -mt 0eafd714". This makes WinDbg give me a list of all the instances of the NHibernate SqlString - this is a very long list.

image

Now, the second column denotes the type, the third is the shallow object size and the first is what we are looking for - the instance address. I picked a few of them at random and used the !gcroot command to show where they were rooted. That is, give me the chain of references that lead to this object. An example is "!gcroot 57c2f130", which produces the following output.

image

You can see the actual instance at the bottom and then follow the chain upwards. It seems this current SqlString is rooted in a QueryPlanCache in the NHibernate SessionFactory.

At this point I actually downloaded the NHibernate source and started looking around. Conceptually, the NHibernate SessionFactory keeps a cache of recent HQL queries, so it doesn't have to rebuild them. According to the source code, it would store the 128 most recently used queries.

Now WinDbg can actually tell you the "deep" size of an object, object size + objects it references. This is done using the !objsize command. Now this literally took several hours of processing, so I don't have a screenshot for the blog post, but executing "!objsize 067016bc" command should give me the memory size of my SessionFactory. According to my log file, it told me:

sizeof(067016bc) =    716798348 (  0x2ab9798c) bytes (NHibernate.Impl.SessionFactoryImpl)

That is one big SessionFactory (~700mb). I dug further down the reference chain to try and figure out what was wrong with the cache. Remember I said that this cache was supposed to hold 128 queries. When I got to the hashtable in the cache and dumped it using the "!do 067030d4" command, it revealed the following:

image

According to this, my cache contains 92000 queries. After digging around in the code, writing a few unit tests and getting some help from the NHibernate user group, I finally found out that it was a bug in the object indexer in the LRUMap, such that it didn't enforce the 128 limit properly.

It was a small innocent bug, but having a 700mb (and growing) hashtable hanging around in your system forever is not really that pleasant. I ended up writing a hack that used reflection to access the field that contained the cache and clearing it periodically. It is already fixed on the NHibernate trunk, but I haven't gotten around to updating yet.

Since implementing my clear hack, I haven't seen memory usages above 50mb.

Conclusion

I've told my small debugging tale of how I got introduced to WinDbg and how it helped me track down a major issue in my application and reduce memory usage from ~800mb to ~50mb. It's a funky tool and can be quite scary at first, but if it helps me remove memory leaks, I am all for it. The second lesson learned is that sometimes - although I still won't look there first - SELECT is broken.

kick it on DotNetKicks.com

Tags: , , ,

Development | NHibernate

Comments

12/19/2008 8:34:25 PM #

trackback

Trackback from DotNetKicks.com

A WinDbg Debugging Journey - NHibernate Memory Leak

DotNetKicks.com

12/20/2008 10:17:43 AM #

Brian Holmgård Kristensen

Hi Rasmus,

Very interesting article - we are also using NHibernate on one of the projects that I'm currently working on. However we are firing only a very small count of SQL-queries per lifetime of the SessionFactory instance, so I guess that we hopefully don't have to apply a patch/fix/upgrade for that in the near future.

Have a nice Christmas Smile

/Brian

Brian Holmgård Kristensen Denmark

12/20/2008 12:04:43 PM #

Rasmus Kromann-Larsen

No it shouldn't be a problem then I guess.

Instantiating the SessionFactory is just expensive, so what I have found at least is that the recommended practice is to keep it around for a long time.

I'm guessing you have good reasons for re-creating it Smile

Rasmus Kromann-Larsen Denmark

1/30/2009 9:05:17 AM #

nico

Thanks for the very informative Blog-Post,
If possible, could you further specify or post a code snippet that frees the particular structure? That would be a great help for me, i'm using nhibernate
binary version 1.2.1.4000 which probably still contains the leak.
Regards nico

nico Germany

1/30/2009 9:35:36 AM #

Rasmus Kromann-Larsen

Hi nico.

This problem is specific to NHibernate 2.0 - so I don't think you have the problem in version 1.2.1.4000.

Rasmus Kromann-Larsen Denmark

2/2/2009 4:25:45 PM #

Carl Lamm


I have run into a similar problem. We have just introduced NHibernate in a few pages of .NET based web app, and now the aspnet_wp memory keeps growing until it recycles due to OutOfMemoryException every third day or so.

Hoping for a solution to the problem, I got the latest NHibernate source (version 2.1.0.1001), compiled and deployed it in my app. But I see that memory is still growing...

Doing a similar wndbg analysis as above I notice that the size of SessionFactoryIpmpl is about 1 MB. But then I looked at the count of "NHibernate.Impl.SessionFactoryImpl" objects, and there are 163 of them... explaining why the GC Heap is about 163 MB. Shouldn't there only be one, or is there one for every NHibernate session that is open?

Brgds,
CC

Carl Lamm Sweden

2/2/2009 6:09:57 PM #

Rasmus Kromann-Larsen

Hi Carl,

You should check the lifecycle of your session factories - there is only supposed to be one. The memory leak I described is fixed in the NHibernate trunk.

Having multiple session factories will both cost you a lot of memory (like you already experienced) and it is a very expensive object to create, so it's almost certain to hurt performance quite a bit.

Rasmus Kromann-Larsen Denmark

2/3/2009 8:59:38 PM #

Carl Lamm

Hi again,

By the way, thanks for the valuable article which headed me in the right direction!

It seems that I have now foud a solution to my OutOfMemoryException problem.

Investigating the creation of SessionFcatory instances led me to the "Init()" function in global.asax where a call initializes NHibernate and creates the SessionFactory as a part of a singleton class (We have based the session handling on an existing best practice). However, reading about the Init() function I realize that it is called once for every instance of the httpApplication class - which indeed can exist in multiple instances. It is the Application_OnStart() event which is triggered only once per application in Global.asax.

So - I simply moved the NHibernate initialization from Init() to Application_OnStart(), and the aspnet_wp process memory now stays at around 60MB without growing - and we do not run into the OOM exception anymore!

I am not yet fully sure if there is something else I am missing when I do this change, but it seems more logical if we are to stay with just ONE instance of the SessionFactory. Just out of curiosity - in your implementation, where are you triggering the creation of the SessionFactory?

Best regards,
CC

Carl Lamm Sweden

2/3/2009 9:57:26 PM #

Rasmus Kromann-Larsen

Hi Carl

I'm happy that my post helped you to solve your problem (even though it wasn't the same problem :-P).

I'm not even aware of the effects of using init in global.asax, but my NHibernate configuration is indeed also initiated from application start.

I know that the bug I talked about it fixed on the NHibernate trunk and I guess in your version too. I believe that it's in both NHibernate version 2.0.0.4000 and 2.0.1.4000 though - checked the sources yesterday.

Rasmus Kromann-Larsen Denmark

Comments are closed

Powered by BlogEngine.NET 1.6.1.0
Theme by Mads Kristensen | Modified by Mooglegiant | Adjusted by Rasmus Kromann-Larsen

About Me

I am a danish .NET developer blogging about the technical side of my life, mostly .NET stuff, but also fundamental topics like design patterns, principles and productivity boosters.

In addition, I am a core group member of Aarhus .NET User Group.