GATK Indel Realignment Dies in a Memory Explosion... Any Fixes?
13.2 years ago
Wjeck

I got the following rather strange error while running GATK indel realignment on a considerably larger (HiSeq) file than I am used to:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
    at net.sf.picard.reference.IndexedFastaSequenceFile.getSubsequenceAt(IndexedFastaSequenceFile.java:188)
    at org.broadinstitute.sting.gatk.datasources.providers.ReferenceView.getReferenceBases(ReferenceView.java:89)
    at org.broadinstitute.sting.gatk.datasources.providers.ReadReferenceView.getReferenceContext(ReadReferenceView.java:73)
    at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:87)
    at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:48)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:69)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:175)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.executeGATK(CommandLineExecutable.java:94)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:76)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:104)

This makes it look like the program ran out of memory. However, I am giving the program 4 GB of memory, and judging from the input .bam file, coverage is not excessive in the regions where it craps out. Moreover, the runs are inconsistent: the failure occurs at different places in different files on different runs.
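A quick way to rule out a misconfigured heap is to ask the JVM what limit it actually received. The class below is a minimal, hypothetical sketch. The name HeapReport is illustrative and not part of GATK. It uses only the standard Runtime and ManagementFactory APIs.

```java
import java.lang.management.ManagementFactory;

/** Hypothetical helper: reports the heap limit this JVM actually received (e.g. from -Xmx). */
public class HeapReport {
    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MiB (Long.MAX_VALUE / MIB if unlimited). */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently occupied by objects, in MiB. */
    public static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        // Shows exactly which flags (e.g. -Xmx4g) reached the JVM.
        System.out.println("JVM arguments: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
        System.out.printf("max heap: %d MiB, in use: %d MiB%n", maxHeapMiB(), usedHeapMiB());
    }
}
```

Running `java -Xmx4g HeapReport` (or, on Java 11 and later, `java -Xmx4g HeapReport.java` straight from source) should report a maximum close to 4096 MiB. HotSpot may report somewhat less than the -Xmx value, because part of the young generation is not counted.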

My command line call is:

java -Xmx4g -jar /datastore/nextgenproc/projects/hybrid_selection/hybrid_selection/GenomeAnalysisTK/GenomeAnalysisTK.jar \
-T IndelRealigner \
--max_reads_at_locus 50000 \
--maxReadsInRam 1000000 \
-R $GEN \
-I $prefix.s.Dupmark.recal.bam \
-targetIntervals $prefix.indelrecal.intervals \
-O $prefix.s.Dupmark.recal.realign.bam \
-D $GEN.snp.rod

Has anyone else seen this?

gatk indel bam java

Well, I think the obvious fix is to give it more memory. Also, I wouldn't read too much into exactly where the error occurs, as long as it is the same type of memory error. Memory allocation is a tricky business, and things can go differently from one run to the next.


In case you didn't know, the folks developing GATK are very good at answering questions on this site: http://getsatisfaction.com/gsa

13.2 years ago

That particular OutOfMemoryError ("GC overhead limit exceeded") indicates that you have pinned the garbage collector. The JVM is spending almost all of its time trying to free memory while recovering hardly anything from the heap. This suggests that you are working with barely enough RAM for your task. The error is thrown to stop an application from continuing to run when all it is doing is GC.
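To see how close a JVM is to that state, you can sample the cumulative collection time exposed by the standard GarbageCollectorMXBean API and compare it with elapsed wall-clock time. The sketch below is illustrative only. The class name and the allocation loop are hypothetical. The 0.98 constant mirrors HotSpot's documented default for the parallel collector, which gives up when more than 98% of the time goes to GC and less than 2% of the heap is recovered. The code checks only the time half of that rule, not the heap-recovery half.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical monitor: what fraction of an interval did the JVM spend collecting garbage? */
public class GcOverheadMonitor {
    // Time half of HotSpot's default GC overhead check (heap-recovery half not modeled here).
    static final double HOTSPOT_TIME_LIMIT = 0.98;

    /** Cumulative collection time (ms) across all collectors since JVM start. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 means the value is not available for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of an interval spent in GC, clamped to [0, 1]. */
    public static double gcFraction(long gcMillisBefore, long gcMillisAfter, long wallMillis) {
        if (wallMillis <= 0) {
            return 0.0;
        }
        double f = (double) (gcMillisAfter - gcMillisBefore) / wallMillis;
        return Math.min(1.0, Math.max(0.0, f));
    }

    public static void main(String[] args) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();

        // Hypothetical workload: churn 16 KiB arrays, retaining a few so they are really allocated.
        byte[][] keep = new byte[64][];
        for (int i = 0; i < 200_000; i++) {
            keep[i % keep.length] = new byte[16 * 1024];
        }

        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        double fraction = gcFraction(gcBefore, totalGcMillis(), wallMillis);
        System.out.printf("GC used %.1f%% of %d ms (%d arrays retained)%n",
                100 * fraction, wallMillis, keep.length);
        if (fraction > HOTSPOT_TIME_LIMIT) {
            System.out.println("Almost all time is going to GC: give the JVM more heap.");
        }
    }
}
```

A healthy run should print a small percentage. A job that is thrashing on an undersized -Xmx would drift toward 100% shortly before the JVM throws the error.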

You can give the JVM an argument to prevent this exception (-XX:-UseGCOverheadLimit), but that doesn't fix the underlying problem. Allocate more memory. Since you are getting this error rather than an ordinary "Java heap space" OutOfMemoryError, you might be quite close to having enough RAM. You could also reduce the value of --maxReadsInRam. The trade-off is that (I think) GATK will then use more temporary files, and you risk starving other resources (e.g. file handles).
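The maxReadsInRam trade-off is the usual one for external sorting. You cap the number of records held on the heap and write a sorted chunk to a temporary file whenever the cap is reached. The sketch below is a hypothetical illustration of that idea in plain Java, not GATK's actual implementation. SpillingSorter and maxRecordsInRam are made-up names. It also leaves out the final merge step, which would have to open every spill file at once (hence the file-handle concern).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch (not GATK code): the spill phase of an external sort with a RAM cap. */
public class SpillingSorter implements AutoCloseable {
    private final int maxRecordsInRam;
    private final List<String> buffer = new ArrayList<>();
    private final List<Path> spillFiles = new ArrayList<>();

    public SpillingSorter(int maxRecordsInRam) {
        if (maxRecordsInRam <= 0) {
            throw new IllegalArgumentException("maxRecordsInRam must be positive");
        }
        this.maxRecordsInRam = maxRecordsInRam;
    }

    /** Buffers a record; once the cap is reached, the buffer is sorted and written to disk. */
    public void add(String record) throws IOException {
        buffer.add(record);
        if (buffer.size() >= maxRecordsInRam) {
            spill();
        }
    }

    private void spill() throws IOException {
        Collections.sort(buffer);
        Path tmp = Files.createTempFile("spill-", ".txt");
        Files.write(tmp, buffer, StandardCharsets.UTF_8);
        spillFiles.add(tmp);
        buffer.clear(); // heap is freed, but another file now exists
    }

    public int spillCount() { return spillFiles.size(); }

    public int recordsInRam() { return buffer.size(); }

    /** Deletes the temporary files (a real sorter would merge them first). */
    @Override
    public void close() throws IOException {
        for (Path p : spillFiles) {
            Files.deleteIfExists(p);
        }
        spillFiles.clear();
        buffer.clear();
    }

    public static void main(String[] args) throws IOException {
        for (int cap : new int[] {1_000_000, 10_000, 1_000}) {
            try (SpillingSorter s = new SpillingSorter(cap)) {
                for (int i = 0; i < 50_000; i++) {
                    s.add("read" + (i * 7919 % 50_000));
                }
                System.out.printf("cap=%d -> %d spill files, %d records still in RAM%n",
                        cap, s.spillCount(), s.recordsInRam());
            }
        }
    }
}
```

With 50,000 records, a cap of 1,000,000 keeps everything in RAM and creates no spill files. Caps of 10,000 and 1,000 produce 5 and 50 temporary files respectively. Less heap pressure is paid for with more files to open and merge.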

There is no such thing as a free lunch.


This appears to be the answer: 4 GB is simply not enough to run indel realignment on a HiSeq lane. I quadrupled it to 16 GB and am now seeing better performance and no failed runs (though I presume failures are still possible with very deep sequencing).
