Hello everyone,
I need some help. I have around 50,000 WES gVCFs that I am trying to merge with the GATK GenomicsDBImport tool so that I can do multi-sample calling later. Even though I am using the biggest node on our cluster, the merging is extremely slow: it takes almost 5-6 days to merge a batch of only 5,000 samples. I was wondering if you have a solution to this problem. I have tried different things, but with no success so far.
gatk --java-options "-Xmx250g -Xms250g" GenomicsDBImport --genomicsdb-workspace-path $SCRATCH/database/UKB_Database --batch-size 5000 -L $SCRATCH/interval.list --sample-name-map $SCRATCH/cohort.sample_map.txt --tmp-dir $SCRATCH/temp/ --reader-threads 6
gatk --java-options "-Xmx250g -Xms250g" GenomicsDBImport --genomicsdb-update-workspace-path $SCRATCH/database/Batch1 --batch-size 5000 --sample-name-map $SCRATCH/batch0.txt --tmp-dir $SCRATCH/temp/ --reader-threads 6
These are the commands I am using. Since I only need selected regions, I pass an interval list in .bed format in the first command.
Looking forward to hearing from you. Thanks in advance.
Kind Regards, Haider
Split your $SCRATCH/interval.list per chromosome.
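One way to do the split is sketched below. It assumes the interval list really is BED-formatted (whitespace-separated, chromosome name in the first column), as described in the question. The function name `split_by_chrom`, the output directory, and the per-chromosome workspace names in the commented usage are made up for illustration and are not part of GATK.

```shell
#!/usr/bin/env bash
# Sketch: split a BED interval list into one BED file per chromosome.
# Assumes column 1 holds the chromosome name; split_by_chrom is a hypothetical helper.
split_by_chrom() {
    local bed="$1" outdir="$2"
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        # Skip comments, track/browser header lines, and lines with fewer than 3 fields
        /^(#|track|browser)/ || NF < 3 { next }
        {
            out = dir "/" $1 ".bed"
            # Truncate each output file the first time it is seen in this run
            if (!(out in seen)) { printf "" > out; close(out); seen[out] = 1 }
            print >> out
            close(out)   # avoid running out of open file handles
        }
    ' "$bed"
}

# Example use (paths and workspace names are hypothetical; other options are
# copied unchanged from the command in the question):
#   split_by_chrom "$SCRATCH/interval.list" "$SCRATCH/intervals_by_chrom"
#   for bed in "$SCRATCH"/intervals_by_chrom/*.bed; do
#       chrom=$(basename "$bed" .bed)
#       gatk --java-options "-Xmx250g -Xms250g" GenomicsDBImport \
#           --genomicsdb-workspace-path "$SCRATCH/database/UKB_Database_$chrom" \
#           --batch-size 5000 -L "$bed" \
#           --sample-name-map "$SCRATCH/cohort.sample_map.txt" \
#           --tmp-dir "$SCRATCH/temp/" --reader-threads 6
#   done
```

Each loop iteration writes to its own workspace, so the iterations could instead be submitted as separate cluster jobs and run side by side. The memory setting would probably need lowering if several jobs share one node.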
Hi, thanks, it worked.