I got the following rather strange error while running GATK indel realignment on a considerably larger file (HiSeq) than I am used to:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
at net.sf.picard.reference.IndexedFastaSequenceFile.getSubsequenceAt(IndexedFastaSequenceFile.java:188)
at org.broadinstitute.sting.gatk.datasources.providers.ReferenceView.getReferenceBases(ReferenceView.java:89)
at org.broadinstitute.sting.gatk.datasources.providers.ReadReferenceView.getReferenceContext(ReadReferenceView.java:73)
at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:87)
at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:48)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:69)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:175)
at org.broadinstitute.sting.gatk.CommandLineExecutable.executeGATK(CommandLineExecutable.java:94)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:76)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:104)
which makes it look like the program ran out of memory. I am giving the program 4 GB of memory, though, and the coverage in the regions where it fails is (judging from the input .bam file) not excessive. Moreover, the failures vary between runs: they occur at different places in different files at different times.
My command line call is:
java -Xmx4g -jar /datastore/nextgenproc/projects/hybrid_selection/hybrid_selection/GenomeAnalysisTK/GenomeAnalysisTK.jar \
    -T IndelRealigner \
    --max_reads_at_locus 50000 \
    --maxReadsInRam 1000000 \
    -R $GEN \
    -I $prefix.s.Dupmark.recal.bam \
    -targetIntervals $prefix.indelrecal.intervals \
    -O $prefix.s.Dupmark.recal.realign.bam \
    -D $GEN.snp.rod
Has anyone else seen this?
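Well, I think the obvious fix is to give it more memory, i.e. raise -Xmx above 4g if the machine has the RAM. "GC overhead limit exceeded" means the JVM was spending nearly all of its time in garbage collection while reclaiming very little heap, so the heap was effectively full. Lowering --maxReadsInRam from 1000000 might also help, on the assumption that those reads are held on the heap at once. Also, I wouldn't read too much into tracking down where exactly the error occurs, as long as it is the same type of memory error. Memory allocation is a tricky business, and the point at which the heap finally fills up can differ from one run to the next.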
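Before rerunning, it can be worth checking how much heap the JVM really gets from a given -Xmx setting on your machine. Here is a minimal sketch for that; the class name HeapCheck and its two helper methods are made up for illustration and have nothing to do with GATK itself. It only uses the standard java.lang.Runtime calls.

```java
// Hypothetical helper, not part of GATK: reports the heap limits of the running JVM.
class HeapCheck {
    // Upper bound on the heap, in MiB, as reported by the running JVM.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Heap currently in use (allocated minus free), in MiB.
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB):  " + maxHeapMiB());
        System.out.println("used heap (MiB): " + usedHeapMiB());
    }
}
```

Compile it with javac HeapCheck.java and run it with the same flag as the GATK call, e.g. java -Xmx4g HeapCheck. The reported maximum can come out somewhat below the requested size depending on the garbage collector in use, and a 32-bit JVM may refuse a large -Xmx value outright.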
In case you didn't know, the folks developing GATK are very good at answering questions on this site: http://getsatisfaction.com/gsa