Hello,
For an allele specific expression analysis, one of the ways to reduce ambiguous mapping reads is to extract or use only unique mapping (not referring to primary alignment or unique alignment here) reads from an alignment file.
For this, highest MAPQ score (column 5 in a BAM file) can be used as a uniquely mapped read will always have the highest mapping quality and also the NH (no. of hits) tag will always be 1 (NH:i:1).
HISAT2 versions 2.0.4 and 2.0.5 have given the uniquely mapping reads a MAPQ of 60 and their NH tag is always 1. Extracting or using uniquely mapping reads from BAM files produced from these versions of HISAT is fairly simple (can use samtools view -q 60 for considering only uniqely mapped reads). If we do not make a new BAM file consisiting of only reads having MAPQ=60 from the original BAM file, we can also instruct downstream tools (such as variant callers or read counter tools) to consider reads having MAPQ=60.
Unfortunately, some BAM files I am dealing with have been produced using HISAT2 (v2.0.0-beta) in which there is a bug where in the NH tag is not properly assigned. To be more specific, the highest MAPQ assigned is 255 (for uniqely mapped read) but the NH tag is not always 1. This bug has been solved v 2.0.4 onwards though.
Can someone help me or guide me on how to extract uniquely mapped reads in this case? Also, if I have understood or explained some concept wrongly, please let me know.
SAM specifications defines MAPQ of 255 as "indicating that the mapping quality is not available". Assuming HISAT2 follows the specifications, MAPQ of 255 does not indicate a uniquely mapped read and does not enforce NH=1.
Thanks for your reply. But that was the reason the Hisat developers changed it from 255 to 60 from v2.0.4 onwards.