Hello all,
I'm sorry if my question is a bit naive, but I am trying to run HaplotypeCaller on a human WGS sample at 30X coverage.
I use GATK 4.2.0.0 and would like to get an idea of the "normal" run time for such data.
I run HaplotypeCaller in GVCF mode, scattered by interval (see the command below).
For the intervals, I take the "good" sequence from the GATK bundle and split it into 50 sub-intervals with the GATK SplitIntervals tool (default parameters). Then I run HaplotypeCaller in parallel on the 50 sub-intervals. The problem is that some sub-intervals finish in less than 2 hours while others take 12 hours. How could I improve the run time? To be precise, each sub-interval job uses 1 CPU and 5 GB of memory.
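To see whether the uneven run times simply reflect uneven sub-interval sizes, the number of bases covered by each scattered file can be summed. This is only a minimal sketch. It assumes the files are Picard-style interval lists (header lines starting with "@", then tab-separated contig, start, end, strand, name, with 1-based inclusive coordinates). The names count_bases and report_shards are hypothetical helpers, not GATK tools. Base count is also only a rough proxy, since HaplotypeCaller run time depends on the content of a region as well as its length.

```shell
# Hypothetical helper: total bases covered by one Picard-style interval_list.
# Skips "@" header lines; coordinates are 1-based and inclusive.
count_bases() {
  awk -F'\t' '!/^@/ && NF >= 3 { total += $3 - $2 + 1 } END { printf "%d\n", total }' "$1"
}

# Hypothetical helper: print "<bases><TAB><file name>" for every
# *.interval_list file in a directory, largest shard first.
report_shards() {
  local f
  for f in "$1"/*.interval_list; do
    [ -e "$f" ] || continue   # directory contains no interval lists
    printf '%s\t%s\n' "$(count_bases "$f")" "$(basename "$f")"
  done | sort -k1,1nr
}
```

With the variables from my command this would be called as: report_shards "${Interval_DIR}". If the base counts come out similar while the run times differ six-fold, the imbalance is coming from the content of the regions rather than from how the intervals were split.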
Thanks in advance :-)
The command used:
# Launch one HaplotypeCaller job step per scattered interval list (0000..0049),
# each in the background so that all 50 run in parallel.
for i in {0000..0049}
do
    srun --ntasks=1 gatk --java-options "-Xmx${SLURM_MEM_PER_CPU}M" HaplotypeCaller \
        -R "${REF_Genome}" \
        -L "${Interval_DIR}/${i}.scattered.interval_list" \
        -I "${BAM_INPUT_DIR}/${BAM_INPUT}" \
        -O "${GVCF_OUTPUT_DIR}/${GVCF_OUTPUT}.${i}" \
        -G StandardAnnotation -G AS_StandardAnnotation -G StandardHCAnnotation \
        -GQB 10 -GQB 20 -GQB 30 -GQB 40 -GQB 50 -GQB 60 -GQB 70 -GQB 80 -GQB 90 \
        -ERC GVCF \
        --pcr-indel-model NONE \
        --tmp-dir "${TMP_DIR}" &
done
# Block until all background job steps have finished.
wait
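
One detail of the command above: -Xmx is set to the full per-CPU SLURM allocation, so the Java heap alone may grow to everything the job is allowed, with nothing left for the JVM's non-heap memory. Below is a minimal sketch of deriving a slightly smaller heap. It assumes SLURM_MEM_PER_CPU is given in MB, as the M suffix in the command implies. heap_mb is a hypothetical helper, and the reserve of 10% (at least 512 MB) is an arbitrary illustration, not a GATK recommendation. Whether this has any effect on the run times described above is not established.

```shell
# Hypothetical helper: heap size in MB for -Xmx, given the total allocation in MB.
# Reserves 10% of the allocation, and at least 512 MB, for non-heap JVM memory.
# Both numbers are illustrative assumptions.
heap_mb() {
  local total_mb=$1
  local headroom_mb=$(( total_mb / 10 ))
  if [ "$headroom_mb" -lt 512 ]; then
    headroom_mb=512
  fi
  echo $(( total_mb - headroom_mb ))
}
```

In the loop this would replace the heap option with: --java-options "-Xmx$(heap_mb "${SLURM_MEM_PER_CPU}")M"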