Question

Finding alignment scores for multi-reads from a BAM/SAM file

0

8.3 years ago

EVR ▴ 610

Hi,

I have mapped my RNA-seq reads to the genome using TopHat2. Though the mapping rate was 80%, ~35% of reads were reported to have multiple alignments. By default, for multi-reads (reads aligning to multiple locations), the best alignment is selected based on the alignment score. For multi-reads whose alignments have the same alignment score, TopHat will report a random alignment.

I would like to find out, from the BAM/SAM file, how many multi-reads have alignments with the same alignment score.

Also, is there any way to assign multi-reads with the same alignment score to the best location?

Kindly guide me.

RNA-Seq Bam Sam Tophat Multi-read • 2.3k views

ADD COMMENT • link 8.3 years ago by EVR ▴ 610

2

You did not "assemble", you "mapped". And it is impossible to assign multi-mapping reads with the same alignment score to the best location, because the locations have the same alignment score. You can either assign them to all, assign them to one at random, or ignore them. For RNA-seq, I think random is typically best, though it depends on how you plan to do post-processing.

ADD REPLY • link 8.3 years ago by Brian Bushnell 20k

0

Thank you for your reply. How can I find out how many reads in the BAM/SAM file have mapped to multiple positions with the same score? Is there any field in the BAM/SAM file that denotes multi-reads and their alignment scores?

ADD REPLY • link 8.3 years ago by EVR ▴ 610

0

If a read maps to multiple positions with the same score, it should be assigned a mapq of 3 or less. So, filtering by mapq should do the trick.
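As a rough illustration of the MAPQ filter, and of counting ties directly, here is a minimal Python sketch that works on SAM text lines. It rests on assumptions not stated in this thread: that the file holds one line per reported alignment, and that each alignment carries the optional `AS:i` (alignment score) tag. The function names are made up for this example.

```python
from collections import defaultdict

def parse_sam_line(line):
    """Return (read_name, flag, mapq, tags) for one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for item in fields[11:]:  # optional fields look like TAG:TYPE:VALUE
        tag, _type, value = item.split(":", 2)
        tags[tag] = value
    return fields[0], int(fields[1]), int(fields[4]), tags

def mate_of(flag):
    """1 or 2 for paired reads (flags 0x40 / 0x80), 0 for single-end."""
    return 1 if flag & 0x40 else 2 if flag & 0x80 else 0

def count_low_mapq(sam_lines, threshold=3):
    """Count mapped reads having an alignment with MAPQ <= threshold."""
    reads = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        name, flag, mapq, _tags = parse_sam_line(line)
        if not flag & 0x4 and mapq <= threshold:
            reads.add((name, mate_of(flag)))
    return len(reads)

def count_tied_multireads(sam_lines):
    """Return (multi-reads, multi-reads whose best AS value is tied)."""
    scores = defaultdict(list)  # (read name, mate) -> list of AS values
    for line in sam_lines:
        if line.startswith("@"):
            continue
        name, flag, _mapq, tags = parse_sam_line(line)
        if flag & 0x4 or "AS" not in tags:  # unmapped, or no score tag
            continue
        scores[(name, mate_of(flag))].append(int(tags["AS"]))
    multi = [v for v in scores.values() if len(v) > 1]
    tied = sum(1 for v in multi if v.count(max(v)) > 1)
    return len(multi), tied
```

Both functions accept any iterable of SAM lines, for example `sys.stdin` when the script is fed the output of `samtools view` on the BAM file. If the aligner was told to report only one alignment per read, the tie count will be zero and only the MAPQ filter is informative.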

ADD REPLY • link 8.3 years ago by Brian Bushnell 20k

0

Hi Brian,

"it is impossible to assign multi-mapping reads with the same alignment score to the best location"

I agree with your comment. However, I think you can have the same alignment score and still assign a "best" location. This could happen since typically the alignment score accounts only for the number of matches and mismatches while the probability of incorrect mapping (i.e. the mapq) accounts for match/mismatch but also for base qualities. (Did I get it right?)
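To make the idea concrete, here is a small Python sketch of a MAQ-style mapping quality, in which each candidate location is weighted by the summed base qualities at its mismatched positions. This is a generic model chosen for illustration, not TopHat2's actual MAPQ calculation; the function names and the cap of 60 are assumptions of this example.

```python
import math

def mismatch_quality_sum(read, ref, quals):
    """Sum the Phred qualities at positions where read and reference differ."""
    return sum(q for r, g, q in zip(read, ref, quals) if r != g)

def posterior_mapq(quality_sums, cap=60):
    """Phred-scaled probability that the best candidate location is wrong.

    Each candidate gets weight 10^(-quality_sum / 10); the cap is an
    arbitrary ceiling used when there is no competing candidate.
    """
    weights = [10 ** (-s / 10) for s in quality_sums]
    p_wrong = (sum(weights) - max(weights)) / sum(weights)
    if p_wrong <= 0:
        return cap
    return min(cap, -10 * math.log10(p_wrong))

# Two candidate locations, each with exactly one mismatch against the read.
read = "ACGT"
quals = [10, 30, 30, 30]                             # first base is low quality
loc_a = mismatch_quality_sum(read, "ACGA", quals)    # mismatch at a Q30 base
loc_b = mismatch_quality_sum(read, "TCGT", quals)    # mismatch at a Q10 base
print(loc_a, loc_b, round(posterior_mapq([loc_a, loc_b])))
```

Under this model both locations have the same mismatch count, yet the one whose mismatch falls on the low-quality base is favoured. When the quality sums are identical the two weights are equal and no location can be preferred, which is the case Brian describes.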

ADD REPLY • link 8.3 years ago by dariober 15k