I'm trying to do variant calling (SNPs, indels) from exome-sequencing data, and the sequencing was done with paired-end reads. I would like to use BWA for mapping/alignment, followed by Picard and GATK to do variant calling.
The question now is how to do the sequence alignment with BWA. Should I use the short paired-end reads to generate a single SAM file, like this:
"BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the rest two for longer sequences ranged from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads."
Thanks for your reply. I was reading the BWA manual and thought that producing the aligned .sam file would require two steps if I use bwa aln: first, I need to align each read file separately to generate .sai files, and second, I need to merge the two separate .sai files to generate the .sam file. Is this strongly recommended over bwa mem even if my reads are less than 100 bp?
Yup, that's the way the software was written. BWA-MEM does the same task in one step, but that doesn't mean it is better for reads shorter than 100 bp. If your reads are shorter than 100 bp, then use bwa aln and bwa sampe, i.e. the two-step process.
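For anyone who wants to see the two workflows side by side, here is a minimal sketch that only builds the command strings and prints them (it does not run BWA). The file names ref.fa, reads_1.fq, reads_2.fq and aln.sam are placeholders, the helper names are made up for illustration, and it assumes the reference has already been indexed with bwa index.

```python
import shlex


def aln_sampe_commands(ref, fq1, fq2, out_sam):
    """Build the shell commands for the two-step bwa aln / bwa sampe workflow.

    Hypothetical helper: one `bwa aln` per mate file produces a .sai file,
    then `bwa sampe` pairs the two .sai files into a single SAM file.
    """
    q = shlex.quote
    sai1 = fq1 + ".sai"  # naming the .sai after the FASTQ is just a convention here
    sai2 = fq2 + ".sai"
    return [
        f"bwa aln {q(ref)} {q(fq1)} > {q(sai1)}",
        f"bwa aln {q(ref)} {q(fq2)} > {q(sai2)}",
        f"bwa sampe {q(ref)} {q(sai1)} {q(sai2)} {q(fq1)} {q(fq2)} > {q(out_sam)}",
    ]


def mem_command(ref, fq1, fq2, out_sam):
    """Build the single shell command for paired-end alignment with bwa mem."""
    q = shlex.quote
    return f"bwa mem {q(ref)} {q(fq1)} {q(fq2)} > {q(out_sam)}"


if __name__ == "__main__":
    # Placeholder file names; substitute your own reference and FASTQ files.
    for cmd in aln_sampe_commands("ref.fa", "reads_1.fq", "reads_2.fq", "aln.sam"):
        print(cmd)
    print(mem_command("ref.fa", "reads_1.fq", "reads_2.fq", "aln.sam"))
```

Either way you end up with one SAM file for the pair; the difference is only whether the pairing happens in a separate bwa sampe step or inside bwa mem.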
I think the recommendation re: use of bwa-mem vs bwa aln/sampe is now to use mem for anything over 70bp (which would now be the vast majority of Illumina runs even after trimming). Not sure if anyone has assessed this on a human data set yet...
Yup, you are right. That comment is pretty old; I think it dates from when BWA-MEM was still in its beta phase. I think everyone should use BWA-MEM for alignment now, given that almost all sequencers produce reads longer than 75 bp.
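If you want to encode the rule of thumb from this thread in a script, something like the sketch below would do. The 70 bp cutoff is just the figure mentioned above, not an official threshold, and the function name and the use of the median read length are my own assumptions.

```python
import statistics


def choose_bwa_algorithm(read_lengths, min_mem_length=70):
    """Return "mem" or "aln+sampe" based on the median read length.

    min_mem_length=70 is the rule of thumb quoted in this thread (an
    assumption of this sketch), not a cutoff enforced by BWA itself.
    """
    if not read_lengths:
        raise ValueError("need at least one read length")
    median_length = statistics.median(read_lengths)
    return "mem" if median_length >= min_mem_length else "aln+sampe"


if __name__ == "__main__":
    # Made-up read lengths, e.g. after trimming a 2x101 run and an old 36 bp run.
    print(choose_bwa_algorithm([101, 101, 98, 100]))
    print(choose_bwa_algorithm([36, 36, 35, 36]))
```

In practice most current Illumina data will land on the mem side of this check even after trimming.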