I am mapping some RNA-seq data with bwa and would like to do some analysis on where multi-mapped reads fall.
I know that I can extract multi-mapped reads by looking for MAPQ < 23 and/or the XA tag on the reads. However, I am wondering how bwa decides which location to report for a read that maps equally well to two different locations. Does it choose one at random? Does it always report the first one? Something else?
Does anybody know what exactly bwa does here?
Do you know whether the MAPQ will be 0 for a multi-mapped read, or some other value? (I have read two conflicting opinions about that.)
MAPQ=0 for multi-mapped reads is a convention that bwa uses, not a standard.
And even as a convention it is not quite right: a read being multi-mapped does not mean that the chance of the reported alignment being correct is zero.
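To put numbers on that: the SAM specification defines MAPQ as -10 log10 of the probability that the mapping position is wrong, rounded to the nearest integer. Here is a minimal sketch of that arithmetic. The function name `mapq_from_error_prob` is made up for illustration, and this is only the textbook formula, not how bwa computes its scores internally.

```python
import math

def mapq_from_error_prob(p_wrong: float) -> int:
    """Phred-scale the probability that the reported position is wrong."""
    if not 0.0 < p_wrong <= 1.0:
        raise ValueError("p_wrong must be in (0, 1]")
    return round(-10 * math.log10(p_wrong))

# Two equally good locations, one reported: ~50% chance it is the wrong one.
print(mapq_from_error_prob(0.5))  # 3
# MAPQ 0 corresponds to a position that is wrong with certainty.
print(mapq_from_error_prob(1.0))  # 0
```

So under the strict definition, a read with two equally good hits would get a MAPQ of about 3 rather than 0.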
The best way to detect multimapping is to check the SAM record for the tag that lists alternative mappings (XA in bwa output).
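As a rough sketch of that check, the snippet below pulls the alternative hits out of the XA tag of a single SAM line using plain string handling. It assumes the bwa XA layout `chr,±pos,CIGAR,NM;` with a leading strand sign on the position. The names `AltHit` and `parse_xa` and the example record are invented for illustration.

```python
from typing import List, NamedTuple

class AltHit(NamedTuple):
    chrom: str
    strand: str
    pos: int
    cigar: str
    nm: int

def parse_xa(sam_line: str) -> List[AltHit]:
    """Return the alternative hits listed in the XA tag of one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional tags start at column 12
        if tag.startswith("XA:Z:"):
            hits = []
            for entry in tag[5:].split(";"):
                if not entry:  # the tag ends with a trailing ';'
                    continue
                # rsplit keeps any commas in the reference name intact
                chrom, pos, cigar, nm = entry.rsplit(",", 3)
                hits.append(AltHit(chrom, pos[0], int(pos[1:]), cigar, int(nm)))
            return hits
    return []  # no XA tag on this record

# Made-up record, for illustration only.
record = "\t".join([
    "read1", "0", "chr1", "100", "0", "50M", "*", "0", "0",
    "A" * 50, "I" * 50, "NM:i:0", "XA:Z:chr2,+500,50M,0;chr3,-900,50M,1;",
])
for hit in parse_xa(record):
    print(hit)
```

One caveat: as far as I know, bwa does not write XA when a read has more hits than its configured limit, so a missing XA tag is not proof that the read maps uniquely.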