linugb118--java space

Java Memory Leaks

Don't we all remember the days when we programmed C or C++? You had to use new and delete to explicitly create and remove objects. Sometimes you even had to malloc() an amount of memory. With all these constructs you had to take special care that you cleaned up afterwards, else you were leaking memory.

Now however, in the days of Java, most people aren't that concerned with memory leaks anymore. The common line of thought is that the Java Garbage Collector will take care of cleaning up behind you. This is of course totally true in all normal cases. But sometimes, the Garbage Collector can't clean up, because you still have a reference, even though you didn't know that.

I stumbled across this small program while reading JavaPedia, which clearly shows that Java is also capable of inadvertent memory leaks.

import java.util.ArrayList;

public class TestGC {
  private String large = new String(new char[100000]);

  public String getSubString() {
    return this.large.substring(0,2);
  }

  public static void main(String[] args) {
    ArrayList<String> subStrings = new ArrayList<String>();
    for (int i = 0; i < 1000000; i++) {
      TestGC testGC = new TestGC();
      subStrings.add(testGC.getSubString());
    }
  }
}

 

Now, if you run this, you'll see that it crashes with something like the following stacktrace:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.String.<init>(String.java:174)
at TestGC.<init>(TestGC.java:4)
at TestGC.main(TestGC.java:13)

Why does this happen? We should only be storing 1,000,000 Strings of length 2, right? That would amount to about 40 MB, which should fit in the heap easily. So what happened here? Let's have a look at the substring method in the String class.

public class String {
  // Package private constructor which shares value array for speed.
  String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
  }

  public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
      throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
      throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
      throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
      new String(offset + beginIndex, endIndex - beginIndex, value);
  }

  // ... rest of the class omitted
}

We see that the substring call creates a new String using the package-private constructor. The one-line comment immediately shows what the problem is: the character array is shared with the large string. So instead of storing very small substrings, we were storing the entire large character array every time, just with a different offset and length.
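The sharing described above can be modeled outside the JDK. The following is a minimal sketch, not the real String implementation: the Slice class and its sharedSub, trimmedCopy and retainedChars methods are hypothetical names invented for illustration. A stand-in class is used because newer JDKs (7u6 and later, as far as is known) copy the characters in substring, so calling String directly would no longer show the effect.

```java
import java.util.Arrays;

public class SharedArrayDemo {

    // Hypothetical stand-in for the old String layout: a view onto a char array.
    static final class Slice {
        final char[] value;  // backing array, possibly shared with other slices
        final int offset;    // index of the first visible character
        final int count;     // number of visible characters

        Slice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // Mirrors the sharing substring: a new view over the same backing array.
        Slice sharedSub(int begin, int end) {
            return new Slice(value, offset + begin, end - begin);
        }

        // Mirrors new String(substring): copies only the visible characters.
        Slice trimmedCopy() {
            char[] copy = Arrays.copyOfRange(value, offset, offset + count);
            return new Slice(copy, 0, copy.length);
        }

        // Size of the array this slice keeps reachable.
        int retainedChars() {
            return value.length;
        }
    }

    public static void main(String[] args) {
        Slice large = new Slice(new char[100000], 0, 100000);
        Slice shared = large.sharedSub(0, 2);
        Slice trimmed = shared.trimmedCopy();

        // prints "2 visible, 100000 retained"
        System.out.println(shared.count + " visible, " + shared.retainedChars() + " retained");
        // prints "2 visible, 2 retained"
        System.out.println(trimmed.count + " visible, " + trimmed.retainedChars() + " retained");
    }
}
```

Scaled to the loop in TestGC, every list entry pins a 100,000-char array, which at 2 bytes per char is roughly 200 KB per entry instead of a few dozen bytes. A million such entries cannot fit in any ordinary heap.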

This problem extends to other operations, like String.split() and . The problem can be easily avoided by adapting the program as follows:

import java.util.ArrayList;

public class TestGC {
  private String large = new String(new char[100000]);

  public String getSubString() {
    return new String(this.large.substring(0,2)); // <-- fixes leak!
  }

  public static void main(String[] args) {
    ArrayList<String> subStrings = new ArrayList<String>();
    for (int i = 0; i < 1000000; i++) {
      TestGC testGC = new TestGC();
      subStrings.add(testGC.getSubString());
    }
  }
}

 

I have often heard, and also shared, the opinion that the String copy constructor is useless and causes problems because the copies it creates are not interned. But in this case it seems to have a right to exist, as it effectively trims the character array and keeps us from holding a reference to the very large String.
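The interning concern mentioned here can be shown in a few lines. This is a small sketch; the class name CopyConstructorDemo and the helper copyOf are made up for the example. It relies only on documented behavior: string literals are interned, new String(s) always creates a distinct object, and intern() returns the pooled instance.

```java
public class CopyConstructorDemo {

    // Returns a fresh String object with the same characters as s.
    static String copyOf(String s) {
        return new String(s);
    }

    public static void main(String[] args) {
        String literal = "ab";          // literals are interned automatically
        String copy = copyOf(literal);  // a new object, not taken from the pool

        System.out.println(copy == literal);          // prints "false"
        System.out.println(copy.equals(literal));     // prints "true"
        System.out.println(copy.intern() == literal); // prints "true"
    }
}
```

So the copy costs an extra object and breaks == comparisons, which is the usual objection. In the substring case that extra object is the point, because it carries its own small array.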

8 Responses to “Leaking Memory in Java”

  1. GadgetGadget.info - Gadgets on the web » Leaking Memory in Java Says:

    […] Devlib wrote an interesting post today! Here’s a quick excerpt: Now however, in the days of Java, most people aren’t that concerned with memory leaks anymore. The common line of thought is that the Java Garbage Collector will take care of cleaning up behind you. This is of course totally true in all … […]

  2. Sherif Mansour Says:

    Hi There,
    Thanks for the insightful article! I found this quite useful - especially in understanding why Java OutOfMemory’s work…
    Sherif

  3. Jos Hirth Says:

    Well, that’s not a memory leak. See:
    http://en.wikipedia.org/wiki/Memory_leak

    The behavior is intentional - it trades memory for performance. As most things in the standard library (eg collections) it’s optimized for general usage and, well, generally it’s alright. But you certainly shouldn’t tokenize a really big string this way.

    The classic type of memory leak doesn’t exist in managed languages. The only thing we can produce are so-called reference leaks. That is… referencing stuff (and thus preventing em from being GCed) for longer than necessary (or for all eternity).

    Fortunately it’s easy to avoid - for the most part.

    The important things to know:

    Locally defined objects can be GCed as soon as there are no more references to them. Typically that’s the end of the block they are defined in (if you don’t store the reference anywhere). If you do store references, be sure to remove em if you don’t need em anymore.

    If you overwrite a reference with a new object, the object is first created and /then/ the reference is overwritten, which means the object can be only GCed /after/ the new object has been created.

    Usually this doesn’t matter. However, if you want to overwrite an object which is so big that it only fits once into the memory, you’ll need to null the reference before creating/assigning the new instance.

    Eg:
    //FatObject fits only once into memory
    FatObject fatty;
    fatty=new FatObject();
    fatty=new FatObject();

    Will bomb with OOME. Whereas…

    FatObject fatty;
    fatty=new FatObject();
    fatty=null;
    fatty=new FatObject();

    Will be fine, because the second creation of the FatObject will trigger a full GC and the GC will be able to clear enough memory (since the old reference has been nulled).

    Well, that rarely matters, but it’s good to know.

  4. Randomly Intermittent Thoughts » A Good Reasoning to Nullify an Object! Says:

    […] Jos Hirth wrote this in response to this post by Jeroen van Erp. […]

  5. links for 2007-10-06 - smalls blogger Says:

    […] Xebia Blog Leaking Memory in Java (tags: java memoryleak programming jvm) […]

  6. James McInosh Says:

    I don’t know which version of the JVM you are running but when it constructs a new string using this constructor:

    String(char value[], int offset, int count)

    It sets the value using this:

    this.value = Arrays.copyOfRange(value, offset, offset+count);

  7. creyle Says:

    To be more obvious, with the underlying big char array being referenced, all the TestGC objects created in the big for-loop could not be GCed. That’s the problem.

    Thanks

  8. Jeroen van Erp Says:

    James,

    True for String(char[] value, int offset, int count), but not for String(int offset, int count, char[] value). The constructor you mention is a public constructor. The constructor that is called from the substring method is a package private constructor.

posted on 2007-11-15 13:02 linugb118

