Hi all, I am working on exome capture data for barley (1.3 Gbp). I want to call variants to find SNPs in my sample. SAMtools SNP calling finishes in ~1 hr, whereas GATK (with its several steps to prepare the BAM for the variant caller) takes forever. I understand that my reference is large, and since this is an exome capture, the targeted region is only 60 Mbp of the 1.3 Gbp. The indel realigner is the slow step: it takes forever to locate the sites where indel realignment is required. Does anyone have suggestions to speed it up? Or should I try another variant caller?
Thanks, D
You don't actually need to split per chromosome. Just index the BAM :-)
I do split per chromosome right after BWA: sorting and removing duplicates are then faster.
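For anyone following along, here is a rough sketch of the two ideas above: index the BAM once, then pull out one chromosome at a time with a region query. It assumes the BAM is already coordinate-sorted (region queries need that plus an index). The file name `sample.sorted.bam`, the chromosome names, and the helper `per_chromosome_commands` are made up for illustration. The script only builds and prints the `samtools` command lines and does not run them.

```python
import shlex


def per_chromosome_commands(bam_path, chromosomes):
    """Build samtools command lines as argument lists; nothing is executed here.

    Assumes bam_path is a coordinate-sorted BAM (hypothetical input).
    """
    # Index once so that region queries can jump straight to a chromosome.
    commands = [["samtools", "index", bam_path]]
    for chrom in chromosomes:
        out_path = f"{chrom}.bam"  # hypothetical output naming scheme
        # -b writes BAM output, -o names the output file,
        # and the trailing argument restricts the query to that chromosome.
        commands.append(
            ["samtools", "view", "-b", "-o", out_path, bam_path, chrom]
        )
    return commands


if __name__ == "__main__":
    # Chromosome names here are placeholders; use the names in your BAM header.
    for cmd in per_chromosome_commands("sample.sorted.bam", ["chr1H", "chr2H"]):
        print(shlex.join(cmd))
```

Whether this actually beats splitting right after BWA will depend on your data, so treat it as a starting point to time on your own files.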
Thanks both. I also found on the GATK forum that downsampling the reads based on coverage can sometimes speed things up. I am currently trying both approaches and will report back on whether they change the runtime.