I have paired-end RNA-seq files (Illumina, FASTQ).
I trimmed the reads with Trimmomatic:
java -jar <path>/trimmomatic-0.36.jar PE -threads 4 -phred33 1.fastq 2.fastq 1_trim1.fastq 1_unpaired1.fastq 2_trim2.fastq 2_unpaired2.fastq ILLUMINACLIP:<path>/TruSeq3-PE-2.fa:3:30:10 SLIDINGWINDOW:5:20 MINLEN:20
Around 14M read pairs survived.
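One way to double-check that figure is to count the records in the trimmed files directly. This is a minimal sketch, assuming uncompressed FASTQ with exactly four lines per record; `count_fastq_reads` is a helper name made up for this post, not part of any tool.

```shell
# Hypothetical helper: count records in an uncompressed FASTQ file,
# assuming exactly four lines per record (no wrapped sequence lines).
count_fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Both trimmed files should report the same number of records,
# since Trimmomatic PE keeps the paired outputs in sync.
for f in 1_trim1.fastq 2_trim2.fastq; do
    if [ -f "$f" ]; then
        echo "$f: $(count_fastq_reads "$f") reads"
    fi
done
```

The result can then be compared with the "Input" lines that TopHat reports in align_summary.txt.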
I aligned the trimmed FASTQ files to the genome with TopHat (old version = 2.1.0, new version = 2.1.1), using exactly the same arguments.
# This is old version (2.1.0)
tophat --num-threads 4 --read-mismatches 1 --read-edit-dist 2 --read-realign-edit-dist 1000 -a 8 -m 0 -i 30 -I 1000 -g 1 --min-segment-intron 30 --max-segment-intron 1000 --segment-mismatches 1 --segment-length 25 --library-type fr-secondstrand --max-insertion-length 3 --max-deletion-length 3 --no-coverage-search -r 100 --mate-std-dev 20 -o ./local_tophat_old_alignments <genome> 1_trim1.fastq 2_trim2.fastq
# This is new version (2.1.1)
~/script/tophat-2.1.1/tophat --num-threads 4 --read-mismatches 1 --read-edit-dist 2 --read-realign-edit-dist 1000 -a 8 -m 0 -i 30 -I 1000 -g 1 --min-segment-intron 30 --max-segment-intron 1000 --segment-mismatches 1 --segment-length 25 --library-type fr-secondstrand --max-insertion-length 3 --max-deletion-length 3 --no-coverage-search -r 100 --mate-std-dev 20 -o ./local_tophat_new_alignments <genome> 1_trim1.fastq 2_trim2.fastq
I checked the number of reads in each output:
samtools view -c accepted_hits.bam
The old version gave 6,910,198.
The new version gave 27,645,322.
I don't know why the read counts are so different.
Just in case, here is align_summary.txt from each run:
#old version
Left reads:
Input : 3540517
Mapped : 3449572 (97.4% of input)
Right reads:
Input : 3540517
Mapped : 3460626 (97.7% of input)
97.6% overall read mapping rate.
Aligned pairs: 3380776
1025738 (30.3%) are discordant alignments
66.5% concordant pair alignment rate.
#new version
Left reads:
Input : 14189364
Mapped : 13781744 (97.1% of input)
Right reads:
Input : 14189364
Mapped : 13863578 (97.7% of input)
97.4% overall read mapping rate.
Aligned pairs: 13503616
4100383 (30.4%) are discordant alignments
66.3% concordant pair alignment rate.
Could anyone please explain it?
Every version of TopHat is by now an old version. It should not be used any more - the authors themselves state this.
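That said, the two align_summary.txt files do show where the gap comes from. The mapping rates are nearly identical, but the 2.1.0 run reports only about a quarter as many input pairs as the 2.1.1 run, and in both runs the `samtools view -c` count equals mapped left reads plus mapped right reads. The arithmetic below is only a sketch using the numbers posted in the question; the variable names are mine.

```shell
# Numbers copied from the two align_summary.txt files in the question.
old_input=3540517;  old_left=3449572;  old_right=3460626
new_input=14189364; new_left=13781744; new_right=13863578

# Mapped left + mapped right reproduces the samtools view -c counts.
old_total=$(( old_left + old_right ))
new_total=$(( new_left + new_right ))
echo "old mapped total: $old_total"
echo "new mapped total: $new_total"

# Ratio of input pairs seen by the two runs.
ratio=$(awk -v a="$new_input" -v b="$old_input" 'BEGIN { printf "%.2f", a / b }')
echo "input pairs, new/old: $ratio"
```

So the aligners behaved much the same on what they were given; the 2.1.0 run simply saw far fewer reads. The output shown here does not say why, so the first thing to check is whether both runs really read the same 1_trim1.fastq and 2_trim2.fastq files.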