11.5 years ago
xiaoyanli82
Hi everyone,
I am using GATK to realign a BAM file of about 400 MB. The command I use is:
java -Xmx24g -jar GenomeAnalysis.jar -T RealignerTargetCreator -rbs 10000000 -R ref.fa -I sample.bam -o sample.realigner.intervals
After running for about an hour, the program exited with this error:
ERROR MESSAGE: There was a failure because you did not provide enough memory to run this program. See the -Xmx JVM argument to adjust the maximum heap size provided to Java
How can I solve this problem, and what can I do to speed up the program? Thank you!
Did you try removing -rbs 10000000?
I tried without the -rbs 10000000 argument and got the same error. I also realigned the same BAM file against only the CDS regions of my reference genome, and that run succeeded. The reason may be that my reference, a wheat draft genome of over 4 GB, is too large.
Mentioning the genome size up front would have been a good addition to the question :) Also, this is probably more suitable for the GATK forums: http://gatkforums.broadinstitute.org/
The reason may be that my reference has too many contigs. Thanks for all your answers!
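To check the contig-count explanation, a minimal Python sketch can summarise the reference's samtools-style `.fai` index, whose tab-separated columns are name, length, offset, bases per line and bytes per line. It assumes the index already exists (for example `ref.fa.fai`, as produced by `samtools faidx ref.fa`); the toy index below is hypothetical and is not data from the question.

```python
import io
from typing import Iterable, List, Tuple


def read_fai(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse a samtools-style .fai index into (contig_name, length) pairs.

    Only the first two tab-separated columns (name, length) are used.
    """
    contigs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        contigs.append((fields[0], int(fields[1])))
    return contigs


def summarize(contigs: List[Tuple[str, int]]) -> dict:
    """Return the contig count, total length and largest contig length."""
    lengths = [length for _, length in contigs]
    return {
        "n_contigs": len(lengths),
        "total_bp": sum(lengths),
        "max_contig_bp": max(lengths) if lengths else 0,
    }


if __name__ == "__main__":
    # Hypothetical toy index; for a real run, open "ref.fa.fai" instead.
    toy_index = io.StringIO(
        "contig_1\t5000\t10\t60\t61\n"
        "contig_2\t1200\t5104\t60\t61\n"
        "contig_3\t800\t6334\t60\t61\n"
    )
    print(summarize(read_fai(toy_index)))
```

Running the same summary on the full reference and on the CDS-only reference would show how much larger the failing input is, both in total length and in number of contigs.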
What version of GATK?
(GATK) v2.5-2-gf57256b
I have never needed to allocate that much memory to RealignerTargetCreator. What is the purpose of buffering ten million reads in memory at this point? That seems a likely cause of running out of memory.
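A back-of-the-envelope estimate makes the buffering concern concrete. This sketch assumes, as the answer above implies, that `-rbs 10000000` means ten million reads held in memory; the per-read sizes used (500 and 1000 bytes) are pure guesses, since the real in-memory size of a read depends on read length, tags and the JVM.

```python
def buffered_reads_gb(n_reads: int, bytes_per_read: float) -> float:
    """Rough heap needed to hold n_reads in memory, in decimal gigabytes.

    bytes_per_read is an ASSUMPTION, not a measured GATK/JVM figure.
    """
    return n_reads * bytes_per_read / 1e9


if __name__ == "__main__":
    # 10,000,000 reads comes from the -rbs value in the question;
    # the per-read sizes are hypothetical.
    for per_read in (500, 1000):
        print(f"{per_read} B/read -> {buffered_reads_gb(10_000_000, per_read):.1f} GB")
```

Under these guesses the buffer alone would take roughly 5 to 10 GB of the 24 GB heap, before the reference and any other data structures are loaded.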
As for speeding it up, very few GATK operations are 'fast'. There are multiple routes to parallelising GATK; you will need to read: http://www.broadinstitute.org/gatk/guide/article?id=1988 and http://www.broadinstitute.org/gatk/guide/article?id=1975
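One route, splitting the work by genomic region, can be sketched as grouping contigs into chunks of similar total length. Each chunk could then be written one contig name per line to a file (hypothetically `chunk0.intervals`, `chunk1.intervals`, ...) and passed with `-L` to a separate RealignerTargetCreator run. Check the articles linked above for how your GATK version expects interval files and how it recommends combining outputs. The helper below is hypothetical and uses a simple greedy heuristic (longest contig first, into the currently lightest chunk), not GATK's own scatter logic.

```python
from typing import List, Tuple


def chunk_contigs(contigs: List[Tuple[str, int]], n_chunks: int) -> List[List[str]]:
    """Greedily split (name, length) contigs into n_chunks balanced groups.

    Contigs are taken longest first; each goes into the chunk with the
    smallest total length so far (longest-processing-time heuristic).
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    chunks: List[List[str]] = [[] for _ in range(n_chunks)]
    totals = [0] * n_chunks
    for name, length in sorted(contigs, key=lambda c: c[1], reverse=True):
        i = totals.index(min(totals))  # lightest chunk; ties go to the lowest index
        chunks[i].append(name)
        totals[i] += length
    return chunks


if __name__ == "__main__":
    # Hypothetical toy contigs; real names and lengths would come from ref.fa.fai.
    toy = [("c1", 100), ("c2", 80), ("c3", 60), ("c4", 40), ("c5", 20)]
    for i, names in enumerate(chunk_contigs(toy, 2)):
        print(f"chunk{i}.intervals:", names)
```

Balancing by total length rather than by contig count matters for a draft assembly, where a few long scaffolds can sit alongside many tiny contigs.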