I have installed BWA and built an index of hg19 using the command:
bwa index -a bwtsw hg19.fa
Now I run the alignment using the command:
./bwa aln hg19.fa SRR44930951.fastq > alnsa.sai
However, I want to find alignments allowing 1 or 2 mismatches.
From the BWA home page, I gathered that I have to use the XM tag.
But I can't work out how to use it, i.e. what the command should be.
Can anybody please help me with this?
Thanks.
Also asked at http://seqanswers.com/forums/showthread.php?t=21208
Do you want to align reads with a maximum of 1 or 2 mismatches (while running BWA), or do you want to find reads (e.g. in a SAM file) having at most 1 or 2 mismatches after mapping?
I want to align reads with a maximum of 1 or 2 mismatches (while running BWA).
Also asked at: http://sourceforge.net/mailarchive/forum.php?thread_name=CAJdGSff4KwmuqfNLCUBvrYuXTOmez1nnD%3DCurriPYzohtoPkQQ%40mail.gmail.com&forum_name=bio-bwa-help
Hmm. But I found no answer there, not even a pointer :).
You need to combine the XM and NM tags. It also depends on your definition of a mismatch. If substitutions and indels are all included, then BWA won't give you that number directly. Recovering the alignment from the CIGAR string, the read and the reference would be the most correct way, but it is time-consuming.
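As a rough illustration of the CIGAR part of that idea, the sketch below counts inserted and deleted bases per read from column 6 of a SAM file, so that indels can be looked at separately from substitutions. The function name cigar_indel_bases is made up for this example, and it assumes a tab-separated SAM file such as the one bwa samse writes; it does not look at the reference at all.

```shell
# Hypothetical helper: print read name, inserted bases and deleted bases
# for every read that has a CIGAR string (SAM column 6).
cigar_indel_bases() {
    awk -F '\t' '
        /^@/      { next }                 # skip SAM header lines
        $6 == "*" { next }                 # no CIGAR available (e.g. unmapped read)
        {
            cigar = $6; ins = 0; del = 0
            # Consume the CIGAR one <length><operation> pair at a time.
            while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
                len = substr(cigar, 1, RLENGTH - 1) + 0
                op  = substr(cigar, RLENGTH, 1)
                if (op == "I") ins += len
                if (op == "D") del += len
                cigar = substr(cigar, RLENGTH + 1)
            }
            print $1 "\t" ins "\t" del
        }
    ' "$@"
}

# Usage (file name is only an example):
#   cigar_indel_bases aln.sam > indels_per_read.tsv
```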
@jingtao09, in my case the "definition of a mismatch" is the number of insertions/deletions/substitutions that occurred (as specified in the BWA paper). NM (edit distance) will be equally costly. However, I can't understand how to include the XM tag in the ./bwa aln hg19.fa SRR44930951.fastq > alnsa.sai command.
Another point: in some places I found that the -n 0/1/2 option sets the allowed number of mismatches. However, I can't tell whether that is true or not. It's a little confusing, because some places say to use XM (but don't specify how) and some places say to use -n 0/1/2.
The XM tag will be reported in your SAM file by default.
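To make that concrete: the tags are read from the SAM output, not passed to bwa aln. The .sai file is a binary intermediate, and bwa samse turns it into SAM. A minimal sketch for pulling XM (mismatches) and NM (edit distance) out of such a file follows. The function name sam_mismatch_tags is made up for this example, and it assumes the optional tags start at column 12, as in standard SAM.

```shell
# Hypothetical helper: print read name, XM and NM for each mapped read.
# "NA" is printed when a tag is absent.
sam_mismatch_tags() {
    awk -F '\t' '
        /^@/                 { next }      # skip SAM header lines
        int($2 / 4) % 2 == 1 { next }      # skip unmapped reads (FLAG bit 0x4)
        {
            xm = "NA"; nm = "NA"
            for (i = 12; i <= NF; i++) {   # optional tags start at column 12
                if ($i ~ /^XM:i:/) xm = substr($i, 6)
                if ($i ~ /^NM:i:/) nm = substr($i, 6)
            }
            print $1 "\t" xm "\t" nm
        }
    ' "$@"
}

# Usage (file names are only examples):
#   bwa samse hg19.fa alnsa.sai SRR44930951.fastq > aln.sam
#   sam_mismatch_tags aln.sam | cut -f2 | sort -n | uniq -c   # reads per XM value
```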
So, if I want to find alignments allowing 0/1/2 errors, I just have to run the ./bwa aln hg19.fa SRR44930951.fastq > aln.sa.sai command, and later I can find the 0/1/2 mismatches from aln.sa.sai? Actually, I am comparing BWA vs Bowtie. For Bowtie, I can restrict the output to a given number of mismatches with the command ./bowtie --all -v 0 hg19 SRR4930952.fastq SRR4930952.txt (-v specifies the number of mismatches). I want the equivalent command for BWA, to compare (its paper says it allows this).
So you can run bwa aln once with -n 4, then separate out the reads with n = 0, 1, 2, 3. There is no easy way to compare with Bowtie under the edit-distance definition, because BWA uses the Smith-Waterman algorithm, which assigns different penalties to substitutions and indels. The best you can do is to retrieve the alignments and calculate the mismatches with your own scripts. Otherwise, you can do a statistical comparison; it can save you a lot of time.
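A sketch of that workflow, in case it helps. In bwa aln, an integer -n is the maximum edit distance (substitutions plus gaps), and -o is the maximum number of gap opens, so adding -o 0 should bring it closer to Bowtie's substitution-only -v, though the two are still not guaranteed to be equivalent. The function names bwa_max_edits and filter_by_nm and all file names are made up for this example.

```shell
# Hypothetical helper: align single-end reads allowing at most $4 edits,
# then convert the binary .sai to SAM with bwa samse.
bwa_max_edits() {
    ref=$1; reads=$2; prefix=$3; max=$4
    bwa aln -n "$max" "$ref" "$reads" > "$prefix.sai" &&
    bwa samse "$ref" "$prefix.sai" "$reads" > "$prefix.sam"
}

# Hypothetical helper: keep the header plus mapped reads whose NM tag
# (edit distance) is at most $1. Reads without an NM tag are dropped.
filter_by_nm() {
    max=$1; shift
    awk -F '\t' -v max="$max" '
        /^@/                 { print; next }   # keep SAM header lines
        int($2 / 4) % 2 == 1 { next }          # drop unmapped reads (FLAG bit 0x4)
        {
            for (i = 12; i <= NF; i++)
                if ($i ~ /^NM:i:/ && substr($i, 6) + 0 <= max) { print; next }
        }
    ' "$@"
}

# Usage (file names are only examples):
#   bwa_max_edits hg19.fa SRR44930951.fastq aln_n4 4
#   filter_by_nm 2 aln_n4.sam > aln_le2.sam
```

Running once with a larger -n and filtering afterwards avoids re-aligning for each threshold, at the cost of a slower single run.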