I faced the 'java.lang.OutOfMemoryError: Java heap space' problem when running snpEff locally.
My input VCF file is 4.6 MB. Its header meets the requirement listed on the snpEff official GitHub page:
"#CHROM POS ID REF ALT QUAL FILTER INFO"
I installed snpEff in my conda environment with the command 'conda install bioconda::snpeff'. The snpEff version is 5.2. The input VCF files were generated earlier with VcfAllelicPrimitives (vcflib v1.0.3). My local computer has 32 GB RAM. I have increased the Java InitialHeapSize to 10g and MaxHeapSize to 20g, but the error still occurs.
My procedure:
I first activated my conda environment and checked the Java version.
java -version
openjdk version "23.0.1-internal" 2024-10-15
OpenJDK Runtime Environment (build 23.0.1-internal-adhoc.conda.src)
OpenJDK 64-Bit Server VM (build 23.0.1-internal-adhoc.conda.src, mixed mode, sharing)
I then checked the Java initial and maximum heap sizes; the default InitialHeapSize is about 0.5g and the default MaxHeapSize is about 8g.
java -XX:+PrintFlagsFinal -version | grep -e 'InitialHeapSize' -e 'MaxHeapSize'
size_t InitialHeapSize = 528482304 {product} {ergonomic}
size_t MaxHeapSize = 8392802304 {product} {ergonomic}
size_t SoftMaxHeapSize = 8392802304 {manageable} {ergonomic}
I then increased my java initial and maximum heap size to 10g and 20g respectively
export JAVA_TOOL_OPTIONS="-Xms10g -Xmx20g"
I then checked the java heap size again
java -XX:+PrintFlagsFinal -version | grep -e 'InitialHeapSize' -e 'MaxHeapSize'
Picked up JAVA_TOOL_OPTIONS: -Xms10g -Xmx20g
size_t InitialHeapSize = 10737418240 {product} {command line}
size_t MaxHeapSize = 21474836480 {product} {command line}
size_t SoftMaxHeapSize = 21474836480 {manageable} {ergonomic}
It appears that the Java initial and maximum heap sizes were successfully increased.
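The byte counts printed by PrintFlagsFinal are easier to compare with the -Xms/-Xmx values once converted to GiB. Below is a minimal sketch of that conversion, assuming awk is available; the function name heap_gib is a hypothetical helper for illustration and is not part of Java or snpEff.

```shell
# Hypothetical helper: read `java -XX:+PrintFlagsFinal -version` output on
# stdin and print InitialHeapSize and MaxHeapSize in GiB
# (1 GiB = 1073741824 bytes). SoftMaxHeapSize is deliberately skipped.
heap_gib() {
  awk '$2 ~ /^(InitialHeapSize|MaxHeapSize)$/ && $3 == "=" {
         printf "%s = %.2f GiB\n", $2, $4 / 1073741824
       }'
}

# Usage:
#   java -XX:+PrintFlagsFinal -version | heap_gib
```

With the values shown above, 10737418240 bytes is 10 GiB and 21474836480 bytes is 20 GiB, which matches -Xms10g and -Xmx20g.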
Then I tried to run snpEff eff (I had previously downloaded the snpEff GRCh38.86 database locally):
snpEff eff -c ./anaconda3/envs/NGS_analysis/share/snpeff-5.2-1/snpEff.config GRCh38.86 -nodownload cut.vcf > snpeff_out.vcf
The problem still occurs
Picked up JAVA_TOOL_OPTIONS: -Xms10g -Xmx20g
java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects
at java.base/jdk.internal.misc.Unsafe.allocateUninitializedArray(Unsafe.java:1381)
at java.base/java.lang.StringConcatHelper.newArray(StringConcatHelper.java:451)
at java.base/java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(DirectMethodHandle$Holder)
at java.base/java.lang.invoke.LambdaForm$MH/0x0000737bdb019800.invoke(LambdaForm$MH)
at java.base/java.lang.invoke.Invokers$Holder.linkToTargetMethod(Invokers$Holder)
at org.snpeff.interval.Transcript.introns(Transcript.java:1286)
at org.snpeff.interval.Transcript.createSpliceSites(Transcript.java:724)
at org.snpeff.interval.Genes.createSpliceSites(Genes.java:129)
at org.snpeff.snpEffect.SnpEffectPredictor.createGenomicRegions(SnpEffectPredictor.java:185)
at org.snpeff.snpEffect.SnpEffectPredictor.buildForest(SnpEffectPredictor.java:134)
at org.snpeff.SnpEff.loadDb(SnpEff.java:620)
at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:890)
at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:875)
at org.snpeff.SnpEff.run(SnpEff.java:1173)
at org.snpeff.SnpEff.main(SnpEff.java:163)
I would like to know how I can solve this problem. The SnpEff official site sets the Java MaxHeapSize to 8g, and that seems to be sufficient there.
Thanks!
I would like to follow up on this post.
Somehow, adding the -canon option resolved the issue.
This option uses only canonical transcripts. I am not sure why adding it solved the Java heap space error. (Regarding hg38 vs GRCh38.86: I tried each of them in previous runs and both gave the same error. The issue is resolved only when the -canon option is added.)
You could try -Xmx28g with the original command and see if it completes. However, doing bioinformatics on a machine with only 32 GB of RAM is difficult.
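One thing worth ruling out first: options given on the java command line take precedence over JAVA_TOOL_OPTIONS, so if the conda-installed snpEff launcher passes its own -Xmx to java, the exported 20g would not be the limit actually in effect. I am not certain what the launcher for 5.2 does, so the sketch below only looks for heap flags inside it. The helper name wrapper_heap_flags is made up for illustration, and it assumes the launcher is a plain-text script rather than a binary.

```shell
# Hypothetical helper: print any -Xms/-Xmx settings found inside the launcher
# script that a command name resolves to (assumes the launcher is plain text).
wrapper_heap_flags() {
  local path
  path=$(command -v "$1") || { echo "not found: $1" >&2; return 1; }
  grep -n -e '-Xm[sx]' "$path" || echo "no -Xms/-Xmx found in $path"
}

# Usage (run inside the activated conda environment):
#   wrapper_heap_flags snpEff
```

If a small default shows up there, the heap size has to reach java on its command line. The form used in the snpEff documentation, java -Xmx8g -jar snpEff.jar ..., does that directly; whether the launcher also forwards a -Xmx argument placed after snpEff is something to confirm in the launcher itself.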
Also use htop, bottom, glances, bashtop, etc. to monitor your tool's RAM usage at runtime.
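If you would rather log the peak than watch an interactive monitor, a small polling loop is enough. This is a rough sketch, assuming a ps that supports -o rss= and -p (procps on Linux does); watch_rss is a hypothetical helper name, and it samples only the single PID you give it, so pass the PID of the java process rather than that of a launcher script.

```shell
# Hypothetical helper: poll the resident set size (RSS) of one process once a
# second until it exits, then print the largest value seen.
# ps reports RSS in KiB; an exited or zombie process shows no RSS or 0.
watch_rss() {
  local pid=$1 peak=0 rss
  while rss=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ') &&
        [ -n "$rss" ] && [ "$rss" -gt 0 ]; do
    if [ "$rss" -gt "$peak" ]; then peak=$rss; fi
    sleep 1
  done
  echo "peak RSS of $pid: $((peak / 1024)) MiB"
}

# Usage (in a second terminal while snpEff is running):
#   watch_rss "$(pgrep -n java)"
```

Sampling once a second can miss short spikes, so treat the result as a lower bound on the real peak.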