Dear all,
I have a problem with my RNA-seq data: my samples align at a low rate with TopHat2, as shown below.
tophat2 -p 1 --report-secondary-alignments --no-convert-bam -G MSU/MSU.gff -o Os1_thout MSU/MSU Os1_R1_output_paired.fastq Os1_R2_output_paired.fastq
Left reads:
Input: 13644823
Mapped: 3075519 (22.5% of input)
of these: 262852 ( 8.5%) have multiple alignments (187 have >20)
Right reads:
Input: 13644823
Mapped: 3093432 (22.7% of input)
of these: 269827 ( 8.7%) have multiple alignments (164 have >20)
22.6% overall read alignment rate.
Aligned pairs: 3044687
of these: 197245 ( 6.5%) have multiple alignments
and: 15454 ( 0.5%) are discordant alignments
22.2% concordant pair alignment rate.
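As a sanity check, the percentages in this summary can be recomputed from the raw counts. The snippet below is only a sketch: it assumes the concordant rate is (aligned pairs minus discordant pairs) divided by input pairs, which reproduces the reported figure but is not taken from the TopHat2 documentation.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Counts copied from the TopHat2 summary above.
left_input, left_mapped = 13644823, 3075519
right_input, right_mapped = 13644823, 3093432
aligned_pairs, discordant_pairs = 3044687, 15454

# Overall rate: all mapped reads over all input reads (both mates).
overall_rate = percent(left_mapped + right_mapped, left_input + right_input)

# Assumed formula: concordant pairs = aligned pairs minus discordant pairs.
concordant_rate = percent(aligned_pairs - discordant_pairs, left_input)

print(f"overall: {overall_rate}%  concordant: {concordant_rate}%")
# prints "overall: 22.6%  concordant: 22.2%"
```

So roughly 77% of the read pairs that survived trimming are left unaligned, and those are the reads worth examining.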
Sequencing was performed on the Illumina 2500 platform, paired-end 100x100. I used FastQC to check sequencing quality and Trimmomatic to remove adapters and low-quality bases:
java -jar /home/willian/softwares/Trimmomatic-0.36/trimmomatic-0.36.jar PE -phred33 Os1_S27_L004_R1_001.fastq Os1_S27_L004_R2_001.fastq Os1_R1_output_paired.fastq Os1_R1_output_unpaired.fastq Os1_R2_output_paired.fastq Os1_R2_output_unpaired.fastq ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:70
Attached are the FastQC results before and after trimming, respectively.
Trimmomatic results:
#Os1
Input Read Pairs: 15171431 Both Surviving: 13644823 (89.94%) Forward Only Surviving: 1287637 (8.49%) Reverse Only Surviving: 87906 (0.58%) Dropped: 151065 (1.00%)
One problem I see with these samples is that their RINs were low (6.0) compared with those of another rice cultivar whose alignment results are good (8.0). The main difference I see in FastQC between the RIN 6 samples and the RIN 8 samples is in the overrepresented sequences module.
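To make that comparison concrete, the overrepresented sequences can be pulled out of each sample's FastQC report and compared directly. This is a rough sketch, assuming the usual tab-separated `fastqc_data.txt` layout (a `>>Overrepresented sequences` module closed by `>>END_MODULE`); the file paths in the usage comment are hypothetical.

```python
def overrepresented(lines):
    """Map each overrepresented sequence to (count, percentage, possible source).

    `lines` is any iterable of lines from a FastQC fastqc_data.txt file.
    """
    found = {}
    in_module = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">>Overrepresented sequences"):
            in_module = True
        elif line.startswith(">>END_MODULE"):
            in_module = False
        elif in_module and line and not line.startswith("#"):
            # Data rows: sequence, count, percentage, possible source.
            seq, count, pct, source = line.split("\t")[:4]
            found[seq] = (int(count), float(pct), source)
    return found

# Hypothetical usage, comparing a RIN 6 sample against a RIN 8 sample:
# with open("Os1_fastqc/fastqc_data.txt") as low, open("other_fastqc/fastqc_data.txt") as high:
#     low_rin, high_rin = overrepresented(low), overrepresented(high)
# for seq in sorted(set(low_rin) - set(high_rin)):
#     print(seq, *low_rin[seq], sep="\t")
```

Sequences that appear only in the RIN 6 samples are natural candidates to look up first.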
I have already relaxed TopHat's mismatch settings and also tested another aligner (HPG Aligner), but the alignment rate increased very little.
What could I do to increase the alignment rate?
Kind Regards
Marcelo Amaral
I had a look at the syntax of the Trimmomatic command above and it looks fine; the paired reads should have remained in sync.
Low alignment rates are sometimes due to contamination, or to the sample not being from the species you expect. Try running BLAST on some of the unaligned reads and see whether they match something you didn't expect.
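One way to prepare such a BLAST query is to draw a small random sample of the unaligned reads and write it out as FASTA. The sketch below assumes the unaligned reads have already been exported to a FASTQ file with four lines per record; the file names in the usage note are hypothetical, and the sample size of 100 is an arbitrary choice.

```python
import random
import sys
from itertools import islice

def read_fastq(handle):
    """Yield (read_name, sequence) from a 4-line-per-record FASTQ source."""
    handle = iter(handle)
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            break
        name = record[0].rstrip("\n")[1:].split()[0]  # drop '@' and any comment
        yield name, record[1].rstrip("\n")

def sample_reads(records, n, seed=0):
    """Reservoir-sample up to n records without loading the whole file."""
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = rec
    return reservoir

def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)

if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python sample_unaligned.py unaligned.fastq unaligned_sample.fasta
    with open(sys.argv[1]) as fq, open(sys.argv[2], "w") as out:
        out.write(to_fasta(sample_reads(read_fastq(fq), 100)))
```

The resulting FASTA file can be pasted into the NCBI BLAST web form or passed to a local `blastn` run. If most hits point to rRNA, organelle sequences, or another organism, that would suggest where the unaligned reads come from.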