Hi, I would appreciate help understanding how BWA-MEM aligns reads when a pseudogene is present. When a region of a gene is 100% identical to the pseudogene and the reads carry a variant, how can the aligner map those reads to the correct location? I expected all such reads to have mapping quality zero, but I see many reads with the variant mapped to the gene with a mapping quality (MAPQ) of about 27, while reads without the variant are mapped to the pseudogene with lower mapping quality. I do not understand how BWA places the reads so decisively despite the repeated region. I have attached the MSA, which includes the gene region, the pseudogene, and a read with the variant.
Thanks in advance.
I am not sure I understand the full question. As far as I know, when a read fits several locations equally well, BWA picks one of them at random.
See: BWA with multiple references (I have seen other sources, but I can't find them now).
If the region is 100% identical, BWA of course cannot know the correct location. To solve this, people use long reads, for example from PacBio.
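To make the tie concrete, here is a minimal sketch in Python. It is not how BWA-MEM is implemented: it only scores an ungapped, full-length alignment with a match score of 1 and a mismatch penalty of 4 (the BWA-MEM defaults for `-A` and `-B`), and it ignores gaps, clipping, and paired-end information. The sequences and the function name are made up for illustration and are not taken from the attached MSA.

```python
# Toy scoring only; hypothetical sequences, not the ones from this thread.
MATCH = 1      # BWA-MEM default match score (-A)
MISMATCH = 4   # BWA-MEM default mismatch penalty (-B)


def ungapped_score(read, ref):
    """Score a full-length, ungapped alignment of read against ref."""
    if len(read) != len(ref):
        raise ValueError("read and reference window must have equal length")
    return sum(MATCH if a == b else -MISMATCH for a, b in zip(read, ref))


gene = "ACGTACGTTAGC"
pseudo_identical = "ACGTACGTTAGC"   # 100% identical to the gene window
pseudo_one_diff = "ACGTACCTTAGC"    # differs from the gene at one base
read = "ACGTACGTTAGA"               # gene sequence plus a variant at the last base

score_gene = ungapped_score(read, gene)
score_identical = ungapped_score(read, pseudo_identical)
score_one_diff = ungapped_score(read, pseudo_one_diff)

print(score_gene, score_identical, score_one_diff)
```

Under these assumptions the variant costs the same against both identical windows, so the two scores tie and the variant alone cannot favour either location. A single reference difference inside the read's span is enough to break the tie, so it may be worth checking whether the two regions are truly identical over the full read length (and over the mate, for paired-end data).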
So when the aligner cannot find a single optimal alignment, it randomly chooses where to map the read? In my case, is it just by chance that the reads with the variant aligned to the gene and the reads without the variant aligned to the pseudogene? I also do not understand how the mapping quality is calculated. I searched a lot for a reference but have not found a proper one so far.
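On what the mapping quality number means: the SAM specification defines MAPQ as -10 * log10 of the probability that the mapping position is wrong, rounded to the nearest integer. The sketch below only converts between the two scales. It does not reproduce BWA-MEM's own estimate, which as far as I know is a heuristic based mainly on the best and second-best alignment scores and is capped at 60. The function names are hypothetical.

```python
import math


def mapq_to_error_prob(mapq):
    """Probability that the position is wrong, per the SAM MAPQ definition."""
    return 10 ** (-mapq / 10)


def error_prob_to_mapq(p_wrong):
    """Phred-scaled MAPQ for a given probability of a wrong position."""
    if not 0 < p_wrong <= 1:
        raise ValueError("p_wrong must be in (0, 1]")
    return round(-10 * math.log10(p_wrong))


for q in (0, 27, 60):
    print(q, mapq_to_error_prob(q))
```

Read this way, a MAPQ of 27 corresponds to roughly a 0.2% chance that the position is wrong, which suggests the aligner saw a clearly better score at the gene than at the pseudogene for those reads rather than an exact tie.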
Please do not use personal cloud storage services such as Google Drive or DropBox to share images on the forum. Use a free image hosting service such as imgbb instead. Read How to add images to a Biostars post for more detailed instructions.