I am trying to phase a large genotype data set (~330 samples and ~25 million SNPs) using Beagle v5.4. My command is:
java -Xmx160g -jar ./beagle.22Jul22.46e.jar gt=./lines_ch1.vcf.gz ref=./ref_lines.bref3 chrom=1 map=./genmap_ch1.map nthreads=60 window=10.0 out=./pased_lines_ch1 impute=false
The job terminates when it reaches window 10, with the following error:
Window 10 [1:52402376-72298110]
Reference markers: 1,624,703
Study markers: 1,624,703
java.lang.OutOfMemoryError: Java heap space
    at phase.HmmParamData.<init>(HmmParamData.java:73)
    at phase.PhaseLS.lambda$getParamEst$1(PhaseLS.java:129)
    at phase.PhaseLS$$Lambda$588/0x00007f767127f568.run(Unknown Source)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Terminating program.
I have decreased the window size from the default value (40) to 10, but it still fails. Is my data too big for Beagle to handle, or am I using the wrong phasing parameters? Thanks
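For a sense of scale, the log numbers above can be turned into a naive count of study genotype entries in window 10. This is a rough sketch, not Beagle's memory model: it assumes diploid samples (two haplotypes each) and a hypothetical 1 byte per allele, and it ignores the reference panel and any per-thread HMM data. The class name `WindowSize` and method `alleleEntries` are illustrative only.

```java
// WindowSize.java: naive count of study allele entries in one Beagle window.
// Assumptions (not Beagle internals): diploid samples, 1 byte per allele.
public class WindowSize {
    // Each diploid sample contributes two haplotypes; each haplotype has one allele per marker.
    static long alleleEntries(long samples, long markers) {
        return 2L * samples * markers;
    }

    public static void main(String[] args) {
        long samples = 330;        // approximate sample count from the question
        long markers = 1_624_703L; // "Study markers" reported for window 10
        long entries = alleleEntries(samples, markers);
        // Hypothetical lower bound at 1 byte per allele, expressed in GiB.
        double gib = entries / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("%,d allele entries, about %.2f GiB at 1 byte each%n", entries, gib);
    }
}
```

The point is only the order of magnitude: a single 10 cM window here holds over a billion allele entries for the study samples alone.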
This is a memory issue: the JVM ran out of Java heap space. You may have to specify a temp directory to fix it:
java -Djava.io.tmpdir=/home/temp/ -Xmx160g -jar ./beagle.22Jul22.46e.jar ......
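Before rerunning, it can help to confirm how much heap the JVM actually grants on the machine running Beagle. The sketch below is a hypothetical helper class (`HeapCheck`, not part of Beagle) that prints the value of `Runtime.getRuntime().maxMemory()`; it assumes Java 11 or later so the file can be launched directly from source. Depending on the garbage collector, the reported value may be somewhat lower than the `-Xmx` setting.

```java
// HeapCheck.java: prints the maximum heap size the JVM reports.
// Hypothetical helper for checking that an -Xmx setting is in effect.
public class HeapCheck {
    // Returns the JVM's maximum heap in GiB (Runtime.maxMemory() is in bytes).
    static double maxHeapGiB() {
        long bytes = Runtime.getRuntime().maxMemory();
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        System.out.printf("Max heap: %.1f GiB%n", maxHeapGiB());
    }
}
```

Run it with the same heap flag as the Beagle job, for example `java -Xmx160g HeapCheck.java`, and compare the printed value with what was requested.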