Hello,
I have whole human genome sequence data from Illumina and I am developing a pipeline to analyze it. I would like to call variants with samtools mpileup. To speed up this step, I am wondering whether I should call variants per chromosome, or divide the chromosomes into smaller regions and call variants on each region. Which is faster or more convenient?
Thank you Jean
You should consider GATK's UnifiedGenotyper, which has built-in multithreading. I switch back and forth between samtools and UnifiedGenotyper depending on the project. Both have pros and cons. If you are working with human data, you probably want to use the Broad's "best practices" variant calling pipeline.
I've seen an increase in samtools variant calling pipelines since GATK's new commercial license policy.
My experience: I split the output of bwa per chromosome and process each chunk on our cluster in parallel. I merge the results (BAMs/VCFs) at the end. I don't have any logs to show whether it's much faster.
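For what it's worth, here is a rough Python sketch of the split-and-merge idea, not anyone's actual pipeline. It assumes a recent bcftools (where `bcftools mpileup` has taken over the calling role of `samtools mpileup`), a coordinate-sorted and indexed BAM, and an indexed reference. Instead of physically splitting the BAM, it restricts each job with `-r`. The function names, the output file naming, and the worker count are all made up for illustration. Passing `window=None` gives one job per chromosome; passing a window size cuts each chromosome into smaller regions.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def make_regions(chrom_lengths, window=None):
    """Return region strings: whole chromosomes if window is None,
    otherwise 1-based, inclusive windows of at most `window` bases."""
    regions = []
    for chrom, length in chrom_lengths.items():
        if window is None:
            regions.append(chrom)
            continue
        for start in range(1, length + 1, window):
            end = min(start + window - 1, length)
            regions.append(f"{chrom}:{start}-{end}")
    return regions


def region_commands(bam, ref, region, out_vcf):
    """Build the two commands that are piped together for one region."""
    mpileup = ["bcftools", "mpileup", "-f", ref, "-r", region, "-Ou", bam]
    call = ["bcftools", "call", "-mv", "-Oz", "-o", out_vcf]
    return mpileup, call


def call_region(bam, ref, region, out_vcf):
    """Run `bcftools mpileup | bcftools call` for a single region."""
    mpileup, call = region_commands(bam, ref, region, out_vcf)
    p1 = subprocess.Popen(mpileup, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(call, stdin=p1.stdout)
    p1.stdout.close()  # let mpileup receive SIGPIPE if call exits early
    rc2 = p2.wait()
    rc1 = p1.wait()
    if rc1 or rc2:
        raise RuntimeError(f"variant calling failed for {region}")
    return out_vcf


def run_all(bam, ref, chrom_lengths, merged_vcf, window=None, workers=4):
    """Call every region in parallel, then concatenate the per-region VCFs."""
    regions = make_regions(chrom_lengths, window)
    # Hypothetical naming scheme: one compressed VCF per region.
    outs = [r.replace(":", "_").replace("-", "_") + ".vcf.gz" for r in regions]
    # Threads are enough here because the real work happens in subprocesses.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        vcfs = list(pool.map(lambda ro: call_region(bam, ref, ro[0], ro[1]),
                             zip(regions, outs)))
    # Regions are generated in reference order, so concat keeps the output sorted.
    subprocess.run(["bcftools", "concat", "-Oz", "-o", merged_vcf, *vcfs],
                   check=True)
    return merged_vcf
```

The chromosome names and lengths for `chrom_lengths` can be read from the first two columns of the reference `.fai` index. Whether whole chromosomes or smaller windows end up faster probably depends on how many cores or cluster slots you have: per-chromosome jobs are simpler, but the largest chromosome sets the total run time, while smaller windows balance the load better at the cost of more files to merge.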