It seems Tomcat and Java are gushing memory on us. I’ve been kicking the tires on YourKit Java Profiler, which thus far has been running without incident on one of our production servers. Over the past 48 hours, the memory usage has grown by 34 MB, which is consistent with past evidence of memory leaks in our system. (Soundtrack for this post: “When Animals Attack” - Institute)
The results are fairly interesting - the number one culprit is org.xbill.DNS.*, which accounts for over half of the growth. Apparently Tomcat 3.3 (we’re upgrading to 5.5 soon) has DNS lookups and caching - that cache seems to be unlimited in size, with no expiration, and cannot be turned off [in 3.3]. So in an application that receives callbacks from high-traffic corporate web sites, with a broad range of inbound IP addresses, this becomes a large memory leak.
This is why JVMTI is such a leap over JVMPI - my first experience with a Java profiler was JProbe, and my main impression of it was how tediously slow it was. With JVMTI agents, our app runs pretty much at full speed, and can be profiled on demand. Finding this leak in a development or QA environment with a JVMPI profiler would be nearly impossible, because you just don’t tend to test with thousands of distinct IP addresses in QA.
The other interesting leak is a JVM bug that will be fixed in Java 6. It seems that ObjectInputStream uses a SoftCache that results in retained references to classes that have been serialized. This explains the wealth of new HashMap$Entry and byte[] objects that are sitting in memory.
In a way, the results are encouraging - over the past 48 hours, running at high volume, the memory used by classes we’ve written has actually dropped by 150 KB. So the up side is that the memory growth isn’t our fault. The down side is that it’s much harder for us to resolve the issues since they’re not in our code.
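To make the cache problem concrete, here is a minimal sketch of a lookup cache with both a size limit and an expiration, the two properties the DNS cache described above seems to lack. This is not the org.xbill.DNS or Tomcat code; the class name `BoundedLookupCache`, its methods, and the choice to pass the current time in as a parameter are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: not the org.xbill.DNS or Tomcat implementation.
final class BoundedLookupCache {

    // A cached value together with the time after which it is stale.
    private static final class CachedValue {
        final String value;
        final long expiresAtMillis;

        CachedValue(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final long ttlMillis;
    private final Map<String, CachedValue> entries;

    BoundedLookupCache(final int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder=true keeps the least recently used entry first,
        // so removeEldestEntry evicts it once the size limit is exceeded.
        this.entries = new LinkedHashMap<String, CachedValue>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedValue> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // The caller supplies the clock value (an assumption that keeps the sketch testable).
    synchronized void put(String key, String value, long nowMillis) {
        entries.put(key, new CachedValue(value, nowMillis + ttlMillis));
    }

    // Returns the cached value, or null if it is absent or has expired.
    synchronized String get(String key, long nowMillis) {
        CachedValue cached = entries.get(key);
        if (cached == null) {
            return null;
        }
        if (nowMillis >= cached.expiresAtMillis) {
            entries.remove(key); // drop stale entries on access
            return null;
        }
        return cached.value;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

With a cap on entries, thousands of distinct inbound IP addresses can only ever occupy a fixed amount of memory, and the expiration keeps stale lookups from lingering. Expired entries are only removed when they are read, so a real implementation would probably also want a periodic sweep.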