Question

Unmapped reads from TopHat

8.3 years ago

maheetha.b ▴ 70

Hello all,

I'm new to TopHat, so I have a few questions. I have the hg19 reference genome and I'm aligning FASTQ files (paired-end RNA). These sequences may contain SNPs that differ from the reference. I'm getting both mapped and unmapped reads, and I'm using -N 2 (the default), which I hope means it can tolerate SNPs. I just want to make sure that reads carrying SNPs aren't going into the unmapped BAM, but are only being given a lower mapping quality score in the mapped BAM.

So far, in my unmapped BAM I see a bunch of reads with flag 4. I also see flags of 0, 16, and 272, which I think are for single-end rather than paired reads, so I'm not too worried about those.

My goal is to settle on a reasonable mapping quality cutoff that doesn't let in too much junk but still keeps reads with a SNP here and there, so that I can go through my reads again and use a program to realign those that need to be realigned.

Could someone with more insight into TopHat and how it handles SNPs give some input on this?

tophat alignment • 3.0k views

updated 8.3 years ago by WouterDeCoster 47k • written 8.3 years ago by maheetha.b ▴ 70

score 0 · Answer 1 · 2016-08-26

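If a read has more mismatches than the number specified with '-N', it will go to the unmapped BAM. Flag 4 is for unmapped reads. The flags 16 and 272 do not mean single-end: 16 marks a read aligned to the reverse strand, and 272 (16 + 256) marks a secondary alignment on the reverse strand. You should look up what each flag means.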
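To make the flag values concrete, here is a minimal sketch that decodes a SAM flag into its component bits. The bit values follow the FLAG field of the SAM specification; the names `SAM_FLAG_BITS` and `decode_flag` are just illustrative, not part of TopHat or samtools.

```python
# Bit meanings of the SAM FLAG field (short labels chosen for this example).
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM flag."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # The flag values mentioned in the question.
    for flag in (0, 4, 16, 272):
        print(flag, decode_flag(flag))
```

A flag of 0 simply means no bit is set, i.e. a primary alignment on the forward strand with no pairing information recorded.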

The mapping quality does not depend on the number of mismatches (which could be SNPs) in a read; rather, it reflects the number of regions in the genome to which the read could be mapped. The alignment score is what tells you about the number of mismatches or indels.

In most cases, you can simply filter reads by mapping quality, as long as you are not altering the default parameters for mismatches or indels (i.e. the edit distance).
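As a rough illustration of filtering on mapping quality while still looking at mismatches separately, the sketch below parses SAM text lines (for example, the output of `samtools view`) and keeps mapped reads above a MAPQ cutoff. It assumes the aligner writes the standard optional `NM` tag (edit distance); the function names, the cutoff of 10, and the three example records are made up for illustration and are not TopHat output.

```python
def parse_sam_line(line):
    """Parse one SAM alignment line into a small dict (mandatory fields + tags)."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for item in fields[11:]:  # optional fields look like TAG:TYPE:VALUE
        name, typ, value = item.split(":", 2)
        tags[name] = int(value) if typ == "i" else value
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "mapq": int(fields[4]),
        "tags": tags,
    }


def keep_read(rec, min_mapq=10, max_edit_distance=2):
    """Keep mapped reads with MAPQ >= min_mapq and, if NM is present, NM <= max_edit_distance."""
    if rec["flag"] & 0x4:  # unmapped
        return False
    if rec["mapq"] < min_mapq:  # likely maps to several places
        return False
    nm = rec["tags"].get("NM")
    return nm is None or nm <= max_edit_distance


# Hypothetical records: unique hit with one mismatch, multi-mapper, unmapped read.
EXAMPLE_SAM = [
    "r1\t0\tchr1\t100\t50\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:1\tNH:i:1",
    "r2\t16\tchr2\t200\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:0\tNH:i:3",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]

kept = [r["qname"] for r in map(parse_sam_line, EXAMPLE_SAM) if keep_read(r)]

if __name__ == "__main__":
    print(kept)
```

In this toy input the read with a single mismatch is kept because it maps uniquely, while the perfectly matching multi-mapper is dropped, which is the distinction between mapping quality and mismatch count described above.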