First, I ran tophat with -r 50; however, I have since calculated that the mean inner distance is around 0:
mean insert (195) - 2 x read_size (100) = mean distance between mate pairs (-5).
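As a quick check of that arithmetic, here is a minimal Python sketch. The function name mate_inner_distance is made up for illustration, and it assumes that "mean insert" means the full fragment length including both reads, which is how the formula above treats it.

```python
def mate_inner_distance(fragment_len: int, read_len: int) -> int:
    """Expected inner distance between mates: fragment length minus both reads."""
    return fragment_len - 2 * read_len


# Values from the post: 195 bp mean fragment, 100 nt paired-end reads.
print(mate_inner_distance(195, 100))

# Example from the TopHat manual: 300 bp fragments with 50 bp reads.
print(mate_inner_distance(300, 50))
```

A negative result means the two mates overlap on average, here by about 5 bases.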
With -r -5 I got:
40915584 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
40915584 + 0 mapped (100.00%:-nan%)
40915584 + 0 paired in sequencing
20381317 + 0 read1
20534267 + 0 read2
21446582 + 0 properly paired (52.42%:-nan%)
33030608 + 0 with itself and mate mapped
7884976 + 0 singletons (19.27%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
and with -r 50:
52783519 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
52783519 + 0 mapped (100.00%:-nan%)
52783519 + 0 paired in sequencing
26328045 + 0 read1
26455474 + 0 read2
33127790 + 0 properly paired (62.76%:-nan%)
42606738 + 0 with itself and mate mapped
10176781 + 0 singletons (19.28%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
The library is RNA-seq mapped to hg19. The total number of reads in the R1 FASTQ is 78751541, so I'm confused.
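To compare the two runs side by side, the flagstat lines can be parsed with a short script. This is only a minimal sketch: the helper names flagstat_count and properly_paired_pct are made up for illustration, and just a few lines from each report above are pasted in as test data.

```python
import re


def flagstat_count(text: str, label: str) -> int:
    """Return the QC-passed count of the first flagstat line whose description starts with label."""
    for line in text.splitlines():
        # Lines look like: "21446582 + 0 properly paired (52.42%:-nan%)"
        m = re.match(r"\s*(\d+) \+ \d+ (.*)", line)
        if m and m.group(2).startswith(label):
            return int(m.group(1))
    raise KeyError(label)


def properly_paired_pct(text: str) -> float:
    """Percentage of records flagged as properly paired."""
    return 100.0 * flagstat_count(text, "properly paired") / flagstat_count(text, "in total")


# Excerpts copied from the two reports in the post.
run_minus5 = """\
40915584 + 0 in total (QC-passed reads + QC-failed reads)
21446582 + 0 properly paired (52.42%:-nan%)
7884976 + 0 singletons (19.27%:-nan%)
"""
run_50 = """\
52783519 + 0 in total (QC-passed reads + QC-failed reads)
33127790 + 0 properly paired (62.76%:-nan%)
10176781 + 0 singletons (19.28%:-nan%)
"""

for name, text in (("-r -5", run_minus5), ("-r 50", run_50)):
    print(f"{name}: {flagstat_count(text, 'in total')} records, "
          f"{properly_paired_pct(text):.2f}% properly paired")
```

The percentages it prints should match the ones flagstat reports, which makes it easy to extend the comparison to the other lines.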
What are your read lengths? 100? 250? 300? Paired-end or not?
paired-end 100nt
This probably isn't a sufficient answer to your question, but when I use tophat (for rough analysis), I tend to use -r with 1/3 the read length. Giving tophat too high a number for -r has always caused me problems in the past. When I went to the documentation, it gives the example of entering -r 200 for paired-end runs with 300 bp fragments and 50 bp reads.
Why do you have different numbers of starting reads (40915584 vs 52783519) in the two cases?
I think these are not starting reads but mapped ones.
My bad. This is output from samtools flagstat. I thought it was a log file from Tophat2. I definitely need some coffee. I can't say what's going on right now, but I would definitely go with the results from the second case.