In BWA-MEM, what is the best way to distinguish reads that map uniquely from those that map to multiple positions? The XT tags that we used with BWA are no longer available. We are thinking of classifying as unique the reads that have XS:i:0 (a suboptimal alignment score of 0). Someone also suggested using a threshold on the difference between AS and XS. This would require somewhat heavier parsing of the file, but we can do that if the improvement in results is significant. Which of the two methods is better? Are there other, better approaches?
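To make the two candidate criteria concrete, here is a minimal sketch that parses the AS and XS tags from plain SAM text and applies both rules. The function names, the `min_gap` value of 10, and the example records are all made up for illustration. The sketch also assumes that a record lacking AS or XS should not be called unique, which may not match what you want.

```python
def parse_tags(sam_line):
    """Return the optional fields of one SAM record as a dict.

    Integer-typed tags (type 'i') are converted to int; others stay strings.
    """
    fields = sam_line.rstrip("\n").split("\t")[11:]
    tags = {}
    for field in fields:
        tag, tag_type, value = field.split(":", 2)
        tags[tag] = int(value) if tag_type == "i" else value
    return tags


def unique_by_xs_zero(tags):
    """Criterion 1: call the read unique if the suboptimal score XS is 0."""
    return "AS" in tags and tags.get("XS") == 0


def unique_by_score_gap(tags, min_gap=10):
    """Criterion 2: call the read unique if AS exceeds XS by at least min_gap.

    min_gap=10 is an arbitrary placeholder, not a recommended value.
    """
    if "AS" not in tags or "XS" not in tags:
        # Assumption: records lacking either tag are not called unique.
        return False
    return tags["AS"] - tags["XS"] >= min_gap


# Made-up records for illustration only (SEQ and QUAL are given as '*').
records = [
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\tNM:i:0\tAS:i:50\tXS:i:0",
    "r2\t0\tchr1\t200\t12\t50M\t*\t0\t0\t*\t*\tNM:i:1\tAS:i:45\tXS:i:30",
    "r3\t0\tchr1\t300\t0\t50M\t*\t0\t0\t*\t*\tNM:i:0\tAS:i:50\tXS:i:50",
]

for rec in records:
    tags = parse_tags(rec)
    print(rec.split("\t")[0], unique_by_xs_zero(tags), unique_by_score_gap(tags))
```

The second record is the kind of case where the two rules disagree: XS is non-zero, but the gap between AS and XS is still above the placeholder threshold. Counting how many reads fall into that category on real data would be one way to judge whether the heavier parsing is worth it.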
OK, I think we will go with your suggestion. I also refer readers to another BS post that shows a command line for filtering on mapping quality with samtools: "bwa: "XT:A:U" and MAPQ of 0 at the same time". It looks very useful!
While digging into the SAM file, its optional tags, and the options available in samtools view, I also noticed some tags that do not make sense to me.
Even if I run samtools view on single-end reads with the options -F 4 -F 256 -q 1,
I still get the SA:Z and XA:Z optional tags. They appear to be hybrids.
NOTE that there are no XT:A:U tags.
Please check my post here:
BWA 0.7.12 and Unique reads, -r, -c options, XA: and SA: optional tags on SAM output
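As far as I understand the SAM specification, SA:Z lists the other parts of a chimeric alignment and XA:Z is a BWA-specific list of alternative hits, so a primary record can carry either tag and still pass flag and MAPQ filters. If the goal is to drop such records too, one option is to filter on the tags themselves. The sketch below does that on plain SAM text. The function names and the default `drop_tags` are my own choices, and excluding supplementary records (flag 0x800) is an addition that goes beyond the options quoted above.

```python
UNMAPPED = 0x4         # read is unmapped
SECONDARY = 0x100      # secondary alignment
SUPPLEMENTARY = 0x800  # supplementary (chimeric) alignment


def keep_record(sam_line, min_mapq=1, drop_tags=("XA", "SA")):
    """Return True if an alignment record passes flag, MAPQ and tag filters."""
    cols = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(cols[1]), int(cols[4])
    if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
        return False
    if mapq < min_mapq:
        return False
    # Collect the names of the optional tags present on this record.
    present = {field.split(":", 1)[0] for field in cols[11:]}
    return not present.intersection(drop_tags)


def filter_sam(lines, min_mapq=1, drop_tags=("XA", "SA")):
    """Yield header lines unchanged and only the records that pass keep_record."""
    for line in lines:
        if line.startswith("@") or keep_record(line, min_mapq, drop_tags):
            yield line
```

Passing `drop_tags=()` reduces this to the flag and MAPQ filters only, which makes it easy to compare how many reads the tag check removes on top of them.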
Yes, that's an excellent approach! However, I had assumed that the tags related to mapping uniqueness were introduced because they gave better results. I wasn't able to find anything to confirm (or reject) that hypothesis.
MEM is a new addition to the tool, so I would expect more features to be added to it over time. Other very useful tags are missing as well, for example MD.
MD can easily be added with samtools fillmd (called calmd in current samtools releases). You're right, though: others like SM:i are missing.
Good point, and a good reminder of how to work around that.