
Java Memory Management: A Tragedy of the Commons (April 19, 2004)

I was working with NetBeans the other day, debugging some of my own code. I had been running the debugger repeatedly, tracing through the code over and again. NetBeans gradually slowed down, and stepping through lines of code became downright painful. I finally stopped and restarted it.

Some time later, maybe the next day, it occurred to me that this might be due to a memory limitation--that NB was running out of memory and the garbage collector was working overtime trying to free up the small portion near the top that wasn't spoken for. Sure enough, some time in the past I had capped the memory at 96M in my configuration settings. Increasing it to 128M cleared the problem, at least for the time being.
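
If you haven't run into this setting, the cap in question is the JVM's maximum heap size, controlled by the -Xmx flag (standard on Sun's JVM; the application class name below is just a placeholder, and the launcher script name is from memory, so check your install):

    # Start the JVM with a 32MB initial heap, capped at 128MB.
    java -Xms32m -Xmx128m com.example.MyApp

    # The NetBeans launcher forwards options prefixed with -J to its JVM,
    # so the equivalent when starting the IDE is something like:
    runide.sh -J-Xmx128m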

I'm not writing to bag on NB. It's a good tool--the 3.6 release is particularly good--and I use it, as well as jEdit, regularly. What I started to think about was the problem with memory usage in Java applications, particularly GUI applications using the Swing toolkit.

The thought I had was that, on the one hand, Java frees us from worrying about memory management, which is great. On the other hand, perhaps this apparent freedom leads us to treat memory too casually--particularly, as an undifferentiated, massive pool from which we can draw without restraint.

If you are reading this blog, you probably don't need to be told of the advantages of memory management in Java. But just to point out--we have safe access to arrays, protection from invalid pointer references, and automatic memory allocation and deallocation. This is all good. It seems to save a lot of time. The garbage collectors have gotten much better over the last few years, and the dreaded "GC! Stop and put down your weapons!" pause is less noticeable even in complex GUI apps these days.

On the other hand, we now create programs within a safe environment--an environment which, while simpler and less dangerous than where we used to live, also insulates us from the realities of the world our programs run in. These realities cannot be done away with just by locking the door and taping the windows. There are costs to using memory on most operating systems that most programmers develop for these days. There is a cost, in time, to allocate memory, track it, and release it; there are real limitations on physical memory, and virtual memory on disk is incredibly expensive to use (performance-wise). But we are isolated from these costs because in the sandbox where we live, they are invisible. We have to look for them, test for them, probe for their presence. When they stick their ugly heads up, as happened with me, tracking down the root cause of the problem is a daunting prospect, given the growing complexity of our applications. Not impossible, just very time consuming.

What makes this even worse is that in using Java, a great deal of our power comes not from the expressive power of the language itself, but from the large, and growing, library of packages we have access to. This includes not only the large and impressive JDK, but a multitude of free software, open source and commercial packages we may end up pulling into our projects. It takes time enough to find and learn how to use any of these packages. In almost all of those I can think of, there is nothing in the documentation related to how much memory any given class will use, and, by extension, no information about how much a combination of classes in that package will use. I have a 10KB XML file. How much memory will it take when loaded into an XML DOM? What about using toolkit X versus toolkit Y? What about if I include or exclude comments from the XML file? Do I have options for reducing the footprint at all? In my experience, we just don't know. At best we might get a general comment on a toolkit's readme, something like, "Memory footprint reduced 10% in this release."
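
Short of a real profiler, about the best you can do today is a crude before-and-after measurement. A rough sketch of what I mean (the numbers are approximate at best, since System.gc() is only a hint to the collector; pass the XML file name as the first argument):

    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    public class DomFootprint {
        public static void main(String[] args) throws Exception {
            Runtime rt = Runtime.getRuntime();

            // Try to quiet the heap so the before/after delta is less noisy.
            System.gc();
            long before = rt.totalMemory() - rt.freeMemory();

            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new File(args[0]));

            System.gc();
            long after = rt.totalMemory() - rt.freeMemory();

            System.out.println("Approximate DOM footprint: "
                + (after - before) + " bytes");

            // Touch the document afterward so it stays live through the
            // measurement and isn't collected early.
            System.out.println("Root element: "
                + doc.getDocumentElement().getNodeName());
        }
    }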

And yet the tools for figuring this out, for making the problem visible, are employed after the fact--memory profilers. Some of the new information available from the VM in JDK 1.5 might make this process a little easier.
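
For instance, the new java.lang.management API in 1.5 at least lets a program ask the VM about its own heap without attaching an external profiler. A minimal sketch (the same beans are also exposed remotely over JMX):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    public class HeapReport {
        public static void main(String[] args) {
            // Ask the VM directly about heap usage, in bytes.
            MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
            MemoryUsage heap = bean.getHeapMemoryUsage();
            System.out.println("used:      " + heap.getUsed());
            System.out.println("committed: " + heap.getCommitted());
            System.out.println("max:       " + heap.getMax());
        }
    }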

I suggest the root problem is that inherent in the language design is a message: don't worry about it. Don't worry about how memory is allocated, by which process, using which API call in the O/S. Don't worry about how much space a given class will take. Just start coding. When you need an object, just instantiate the class. Once your program is written, you can handle major memory problems by just increasing the size of the heap, or by running a profiler and punishing the worst offenders.

To make matters worse, current coding trends often recommend that we use caching to improve performance across an application. So not only do we not know how much memory we are using--we grab it and hold on to it for the long term. I suspect a design feature of this kind in NB is what aggravated my problem. The garbage collector was doing its job--there was just barely any memory it could free up, because most of it was spoken for and would not be released.
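
One partial mitigation, for what it's worth, is to build caches on soft references, so the collector is at least allowed to raid the cache under memory pressure instead of stalling. A minimal sketch (pre-generics; no eviction policy or synchronization, which a real cache would need):

    import java.lang.ref.SoftReference;
    import java.util.HashMap;
    import java.util.Map;

    // A cache the garbage collector may raid: values are held through
    // SoftReferences, which the VM clears under memory pressure before
    // it would throw OutOfMemoryError.
    public class SoftCache {
        private final Map map = new HashMap();

        public void put(Object key, Object value) {
            map.put(key, new SoftReference(value));
        }

        public Object get(Object key) {
            SoftReference ref = (SoftReference) map.get(key);
            if (ref == null) return null;
            Object value = ref.get();
            if (value == null) map.remove(key); // entry was reclaimed
            return value;
        }
    }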

So I was thinking of this as a modern version of "The Tragedy of the Commons". I won't repeat that chestnut here. The point is that memory is a shared resource, in two senses. First, it is shared among the programs running on your PC, and with the OS itself. Second, it is shared between you (writing your program), the other people you are coding with, and every person who had a hand in all those libraries and toolkits you are using. All of them are drawing from the same pool of memory. All of them are acting as if, in general, there were no real cost to using that memory. And even if they did think about the cost, we have no way of knowing.

My general thought here is not that there is a fundamental problem in Java's memory management model, just that it gives us a false sense of complete isolation and freedom. We are not completely isolated from, or completely free of, worrying about memory. It's similar to what Joel Spolsky calls "leaky abstractions". JDBC doesn't isolate us from differences in database engines. You could write completely generic SQL using JDBC (I think), but in the real world you find you can't--you have to optimize for access paths to tables, just to take one example, and that may take advantage of indexing features available on one database platform but not another. Your code is pretty portable, but will run differently on another RDBMS, because the JDBC abstraction is "leaky": in this case, the underlying RDBMS shows through in how the application performs.
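
By "completely generic" I mean something like the sketch below--the table and column names are invented for illustration, and the JDBC URL comes in as an argument. It runs unchanged against any compliant driver, but whether the query crawls or flies depends entirely on the engine underneath (say, whether last_name happens to be indexed):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class PortableQuery {
        public static void main(String[] args) throws Exception {
            // Identical code on every RDBMS; very different performance.
            Connection con = DriverManager.getConnection(args[0]);
            PreparedStatement ps = con.prepareStatement(
                "SELECT first_name, last_name FROM customers"
                + " WHERE last_name = ?");
            ps.setString(1, "Smith");
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getString(1) + " " + rs.getString(2));
            }
            rs.close();
            ps.close();
            con.close();
        }
    }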

So I think the danger is this illusion of freedom. What's not clear to me is whether this problem would be less prevalent if my code were sprinkled with calls to allocate and deallocate memory (thus reminding me of what I was using).

I'm not sure, actually; as I write this, I have few ideas for how to properly name the problem, much less suggest an alternative or a solution. The problems seem to include:

  1. Without extensive runtime profiling, I have no idea how much memory my application will require at startup, using various configuration parameters, on different JVMs, etc.
  2. Without extensive runtime profiling, I have no idea how much memory classes, or combinations of classes, will use. That includes my own classes and those in many other packages I reference in my application, as well as all the classes referenced indirectly by those packages I know nothing about.
  3. I have few options for controlling memory use once I find a problem. Repeat after me: "I will not recode javax.swing.text to be more memory efficient, I will not recode..." and so on.
  4. If I do accurately profile an application--or, if I do profile a single class--I have no idea how much of the memory use is data dependent, and how much is data independent.
and so on.

Finally, it's possible that even if I did have all this information, it would be mostly pointless. Outside of very memory-constrained applications, we usually don't care unless our users are impacted (say, by being able to run fewer apps side by side before performance degrades due to GC and virtual-memory swapping), or unless we are in danger of running out of memory.

You can comment on this on the jRoller website, the host for the blog entry above.