8.2 years ago
EVR
Hi,
I have mapped my RNA-seq reads to the genome using TopHat2. Though the mapping rate was 80%, about 35% of the reads are reported as having multiple alignments. By default, for a multi-read (a read aligning to multiple locations), the alignment with the best alignment score is selected. For multi-reads whose alignments have the same score, TopHat will report a random alignment.
I would like to find out from the BAM/SAM file how many multi-reads have alignments with the same score.
Also, is there any possibility of assigning multi-reads with the same alignment score to the best location?
Kindly guide me
You did not "assemble", you "mapped". And it is impossible to assign multi-mapping reads with the same alignment score to the best location, because the locations have the same alignment score. You can either assign them to all, assign them to one at random, or ignore them. For RNA-seq, I think random is typically best, though it depends on how you plan to do post-processing.
Thank you for your reply. How can I find how many reads in the BAM/SAM file map to multiple positions with the same score? Is there any field in the BAM/SAM file that denotes multi-reads and their alignment score?
If a read maps to multiple positions with the same score, it should be assigned a mapq of 3 or less. So, filtering by mapq should do the trick.
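To make that concrete, here is a rough sketch of how the counting could be done on SAM text. It is not a definitive recipe. It assumes that every reported alignment of a multi-read is present in the file, and that the optional `AS:i` (alignment score) tag is set on each record. The function names, the `mapq_cutoff` parameter, and the demo records are made up for illustration. It counts reads with MAPQ at or below the cutoff, reads with more than one alignment record, and reads whose best alignment score is shared by at least two locations.

```python
from collections import defaultdict


def parse_sam_line(line):
    """Return (read name, flag, MAPQ, optional tags) for one SAM record."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for item in fields[11:]:
        tag, typ, value = item.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return fields[0], int(fields[1]), int(fields[4]), tags


def summarize_multireads(sam_lines, mapq_cutoff=3):
    """Count low-MAPQ reads, multi-reads, and multi-reads with a tied best AS."""
    n_alignments = defaultdict(int)
    scores = defaultdict(list)
    low_mapq = set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        name, flag, mapq, tags = parse_sam_line(line)
        if flag & 0x4:
            continue  # skip unmapped records
        # Keep the two mates of a pair apart (0x40 = first, 0x80 = second).
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        key = (name, mate)
        n_alignments[key] += 1
        if mapq <= mapq_cutoff:
            low_mapq.add(key)
        if "AS" in tags:  # assumption: the aligner wrote an AS:i tag
            scores[key].append(tags["AS"])
    multi = [key for key, n in n_alignments.items() if n > 1]
    tied = sum(
        1 for key in multi
        if scores[key] and scores[key].count(max(scores[key])) > 1
    )
    return {
        "low_mapq_reads": len(low_mapq),
        "multi_reads": len(multi),
        "tied_best_score": tied,
    }


# Hypothetical records: r1 is unique, r2 has two equally good hits,
# r3 has two hits with different scores.
demo = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t50\t50M\t*\t0\t0\t*\t*\tAS:i:0\tNH:i:1",
    "r2\t0\tchr1\t200\t3\t50M\t*\t0\t0\t*\t*\tAS:i:-5\tNH:i:2",
    "r2\t256\tchr2\t300\t3\t50M\t*\t0\t0\t*\t*\tAS:i:-5\tNH:i:2",
    "r3\t0\tchr1\t400\t3\t50M\t*\t0\t0\t*\t*\tAS:i:0\tNH:i:2",
    "r3\t256\tchr3\t500\t3\t50M\t*\t0\t0\t*\t*\tAS:i:-10\tNH:i:2",
]
print(summarize_multireads(demo))
```

On a real file, the same function could be fed the lines produced by `samtools view` on the BAM. If the file also carries the `NH:i` tag (number of reported alignments for the read), checking `NH > 1` is another way to flag multi-reads without relying on MAPQ.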
Hi Brian,
I agree with your comment. However, I think you can have the same alignment score and still assign a "best" location. This could happen since typically the alignment score accounts only for the number of matches and mismatches while the probability of incorrect mapping (i.e. the mapq) accounts for match/mismatch but also for base qualities. (Did I get it right?)