Question

TopHat2 number of mapped reads?


Asked 11.2 years ago by samuelrivero

Hello

I am new to RNA-seq. I am using TopHat2 to map single-end reads to mm9 with the following command:

tophat -p 10 --max-multihits 1 -G genes_mm9.gff -o output genome_mm9 reads.fastq

With --max-multihits 1, I assume I will get one alignment per read. If so, the total number of reads TopHat2 used for mapping (19196075) should equal the number of alignments in the accepted_hits.bam file (8797938, because of --max-multihits 1) plus the number of reads in the unmapped.bam file (7538885). But that is not the case: 2859252 reads are missing. Is my reasoning correct?
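One way to test the "one alignment per read" assumption is to compare the number of mapped alignment records in accepted_hits.bam with the number of distinct read names in it. The sketch below assumes you feed it plain SAM text, for example the output of `samtools view accepted_hits.bam`; the function names `count_alignments` and `missing_reads` are hypothetical helpers written for this illustration, not part of TopHat2. It also reproduces the arithmetic from the question.

```python
from typing import Iterable, Tuple


def count_alignments(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (mapped alignment records, distinct mapped read names) from SAM text.

    Header lines (starting with '@') and blank lines are skipped, and records
    with the unmapped flag (0x4) are not counted.
    """
    records = 0
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:  # unmapped record: not an alignment
            continue
        records += 1
        names.add(qname)
    return records, len(names)


def missing_reads(used: int, aligned: int, unmapped: int) -> int:
    """Reads not accounted for by accepted_hits.bam plus unmapped.bam."""
    return used - (aligned + unmapped)
```

With the numbers from the question, `missing_reads(19196075, 8797938, 7538885)` gives 2859252. If `count_alignments` reports more records than distinct names, some reads have more than one record in accepted_hits.bam, so the alignment count is not a read count. To run it on a real file, a script could call `count_alignments(sys.stdin)` and be used as `samtools view accepted_hits.bam | python script.py`.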

Thank you for your help

Samuel

Tags: RNA-Seq, next-gen, rna-seq, alignment


Were all the reads the same length? TopHat2 filters out reads that are too short.

Reply by Devon Ryan, 11.2 years ago

19196075 is the number of reads actually used for mapping. From the left_kept_reads.info file:

reads_in =19197489
reads_out=19196075

So TopHat2 filtered out only 1414 reads.
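The filtered count is simply reads_in minus reads_out. A minimal sketch of reading those two values, assuming only the `key=value` layout quoted above (any other lines in the file are kept as strings and not interpreted); `parse_kept_reads_info` and `filtered_count` are hypothetical names for this illustration:

```python
def parse_kept_reads_info(text: str) -> dict:
    """Parse 'key=value' lines like those shown from left_kept_reads.info.

    Whitespace around keys and values is stripped, so 'reads_in =19197489'
    and 'reads_out=19196075' are both handled. Lines without '=' are ignored.
    """
    info = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key.strip()] = value.strip()
    return info


def filtered_count(info: dict) -> int:
    """Number of reads removed by TopHat2's initial filtering step."""
    return int(info["reads_in"]) - int(info["reads_out"])
```

For the values posted here, 19197489 - 19196075 = 1414, so the initial filtering cannot explain the 2859252 missing reads.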

Reply by samuelrivero, 11.2 years ago (last edited 5.9 years ago by Ram)

Comment deleted.

Reply by Ashutosh Pandey, 11.2 years ago (last edited 5.9 years ago by Ram)

Thanks Ashutosh, that was my first thought, but I don't find the TopHat manual very clear on this point. According to the manual:

-g/--max-multihits <int> Instructs TopHat to allow up to this many alignments to the 
                         reference for a given read, and choose the alignments based on 
                         their alignment scores if there are more than this number. The
                         default is 20 for read mapping. Unless you use
                         --report-secondary-alignments, TopHat will report the
                         alignments with the best alignment score. If there are more
                         alignments with the same score than this number, TopHat will
                         randomly report only this many alignments. In case of using
                         --report-secondary-alignments, TopHat will try to report
                         alignments up to this option value, and TopHat may randomly
                         output some of the alignments with the same score to meet 
                         this number.

My understanding is that, with -g/--max-multihits 1, TopHat will report the best alignment for each read, or randomly select one alignment if several alignments share the same best score.
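The reading described above can be written as a toy model, which makes it clear that under this interpretation every mapped read would end up with exactly one reported alignment when max_multihits is 1. This is only an illustration of the manual text as quoted, not TopHat's actual implementation; it assumes a higher score is better (as with Bowtie2-style AS values, where 0 is a perfect end-to-end match), and the locus labels and scores are made up.

```python
import random
from typing import List, Optional, Tuple

# (hypothetical locus label, alignment score); higher score is assumed better
Alignment = Tuple[str, int]


def pick_alignments(alignments: List[Alignment], max_multihits: int,
                    rng: Optional[random.Random] = None) -> List[Alignment]:
    """Toy model of the -g/--max-multihits description (no secondary alignments).

    Keep only the alignments with the best score; if more of them tie than
    max_multihits allows, randomly keep max_multihits of them.
    """
    if not alignments:
        return []
    rng = rng or random.Random()
    best = max(score for _, score in alignments)
    top = [a for a in alignments if a[1] == best]
    if len(top) > max_multihits:
        top = rng.sample(top, max_multihits)
    return top
```

For example, `pick_alignments([("chr1:100", -5), ("chr2:200", -5), ("chr3:300", -12)], 1)` returns one of the two alignments scoring -5, chosen at random.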

Maybe my interpretation is wrong.

Thanks

Reply by samuelrivero, 11.2 years ago (last edited 5.9 years ago by Ram)

Actually, my interpretation of that parameter was wrong. I will delete my explanation above so that other people don't get confused.

Reply by Ashutosh Pandey, 11.2 years ago (last edited 5.9 years ago by Ram)