10.6 years ago
samuelrivero
Hello
I am new to RNA-seq. I am using TopHat2 to map single-end reads to mm9, and I run it like this:
tophat -p 10 --max-multihits 1 -G genes_mm9.gff -o output genome_mm9 reads.fastq
With --max-multihits 1, I assume I will get at most one alignment per read. If so, the total number of reads that TopHat2 uses for mapping (19196075) should equal the number of alignments in the accepted_hits.bam file (8797938) plus the number of reads in the unmapped.bam file (7538885). But that is not the case: 2859252 reads are unaccounted for. Is my reasoning correct?
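As a quick sanity check of the arithmetic, here is a minimal shell sketch. The variable names are only illustrative, and the three counts are the ones quoted above, not values read from the actual files:

```shell
#!/bin/sh
# Counts quoted in the post (hard-coded here; not read from the BAM files).
total=19196075      # reads TopHat2 reports using for mapping
mapped=8797938      # alignments in accepted_hits.bam
unmapped=7538885    # reads in unmapped.bam

# Reads that appear in neither output file under the one-alignment-per-read assumption.
accounted=$(( mapped + unmapped ))
missing=$(( total - accounted ))

echo "accounted for: $accounted"
echo "unaccounted:   $missing"
```

The difference matches the 2859252 figure given above, so the arithmetic itself is consistent. The open question is whether the one-alignment-per-read assumption holds.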
Thank you for your help
Samuel
Were all the reads of the same length? tophat2 will filter out reads that are too short.
19196075 is the number of reads used for mapping.
left_kept_reads.info file:
TopHat2 filtered out only 1414 reads.
Comment deleted.
Thanks Ashtosh, that was my first thought. But the TopHat manual is not really clear to me on this point. According to the manual, with -g/--max-multihits 1, what I understand is that TopHat will report the best alignment for each read, or randomly select one alignment when several alignments have the same score. Maybe my interpretation is wrong.
Thanks
Actually, my interpretation of that parameter was wrong. I will delete my explanation above so that other people don't get confused.