Low alignment rate
7.8 years ago

Dear all,

I have a problem with my RNA-seq data: my samples show a low alignment rate with TopHat2, as can be seen below.

tophat2 -p 1 --report-secondary-alignments --no-convert-bam -G MSU/MSU.gff -o Os1_thout MSU/MSU Os1_R1_output_paired.fastq Os1_R2_output_paired.fastq

Left reads:
               Input:  13644823
              Mapped:   3075519 (22.5% of input)
            of these:    262852 ( 8.5%) have multiple alignments (187 have >20)
Right reads:
               Input:  13644823
              Mapped:   3093432 (22.7% of input)
            of these:    269827 ( 8.7%) have multiple alignments (164 have >20)
22.6% overall read alignment rate.

Aligned pairs:   3044687
     of these:    197245 ( 6.5%) have multiple alignments
          and:     15454 ( 0.5%) are discordant alignments
22.2% concordant pair alignment rate.

Sequencing was performed on the Illumina 2500 platform, paired-end 100x100. I used FastQC to check sequencing quality and Trimmomatic to remove adapters and low-quality bases:

java -jar /home/willian/softwares/Trimmomatic-0.36/trimmomatic-0.36.jar PE -phred33 Os1_S27_L004_R1_001.fastq Os1_S27_L004_R2_001.fastq Os1_R1_output_paired.fastq Os1_R1_output_unpaired.fastq Os1_R2_output_paired.fastq Os1_R2_output_unpaired.fastq ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:70

Attached are the FastQC results before and after trimming, respectively.

Trimmomatic results:

#Os1
Input Read Pairs: 15171431 Both Surviving: 13644823 (89.94%) Forward Only Surviving: 1287637 (8.49%) Reverse Only Surviving: 87906 (0.58%) Dropped: 151065 (1.00%)

One problem I see with these samples is that their RINs were low (6.0) compared with another rice cultivar whose alignment results are good (RIN 8.0). The main difference I see between the RIN 6 samples and the RIN 8 samples is in FastQC's "Overrepresented sequences" module.

I have already relaxed TopHat's mismatch settings, and I have also tested another alignment program (HPG Aligner), but the rate increased very little.

What could I do to increase the alignment rate?

Kind Regards

Marcelo Amaral

[Images: FastQC reports before trimming (Before Trimmer, Before Trimmer2) and after trimming (After Trimmer, After Trimmer2)]

rna-seq alignment sequence • 2.8k views
7.8 years ago
brent_wilson ▴ 140

I haven't used Trimmomatic, but trimming programs can sometimes cause your forward and reverse read files to get out of sync. That is, if a read is removed from R2 but not from R1, the aligner will fail to align a huge percentage of reads, because nearly everything will be aligned without its proper mate. I would check that you have the same number of reads in both files.
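One way to run that check is sketched below in Python. It is only a minimal illustration, not part of Trimmomatic or TopHat: the function names `read_ids` and `check_pairs` are made up for this example, and it assumes uncompressed FASTQ files with exactly four lines per record and read names that are identical between mates apart from an optional `/1` or `/2` suffix.

```python
from itertools import zip_longest


def read_ids(path):
    """Yield the read name from each 4-line FASTQ record in the file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:
                continue  # only header lines carry the read name
            fields = line[1:].split()  # drop the leading "@" and any comment
            if not fields:
                continue  # tolerate a trailing blank line
            name = fields[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # strip old-style mate suffix
            yield name


def check_pairs(r1_path, r2_path):
    """Return (pairs_in_sync, first_mismatch).

    first_mismatch is None when both files hold the same reads in the
    same order; otherwise it is the (R1 name, R2 name) pair where they
    first differ, with None standing in for a file that ended early.
    """
    count = 0
    for id1, id2 in zip_longest(read_ids(r1_path), read_ids(r2_path)):
        if id1 != id2:
            return count, (id1, id2)
        count += 1
    return count, None
```

Calling `check_pairs("Os1_R1_output_paired.fastq", "Os1_R2_output_paired.fastq")` on the files from the question should return the number of pairs and `None` if the files are still in sync.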

Brent Wilson, PhD | Project Scientist | Cofactor Genomics

4044 Clayton Ave. | St. Louis, MO 63110 | tel. 314.531.4647

Catch the latest from Cofactor on our blog.


I had a look at the syntax of the Trimmomatic command above and it looks fine; the reads should have remained in sync.

Low alignment rates are sometimes due to contamination, or to the sample not being from the species you expect. Try running BLAST on some of the unaligned reads and see whether they match something you didn't expect.
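To get a manageable set of reads to paste into BLAST, something like the following Python sketch could be used. It assumes the unaligned reads have already been written to an uncompressed 4-line FASTQ file; the file names and the function name `sample_fastq_to_fasta` are hypothetical, chosen only for this example. It draws a uniform random sample with reservoir sampling, so the whole file never has to be held in memory.

```python
import random


def sample_fastq_to_fasta(fastq_path, fasta_path, n=100, seed=0):
    """Write a random sample of up to n reads from a FASTQ file as FASTA.

    Returns the number of reads written.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    seen = 0
    with open(fastq_path) as handle:
        while True:
            header = handle.readline()
            if not header.strip():
                break  # end of file (or a trailing blank line)
            seq = handle.readline().strip()
            handle.readline()  # "+" separator line, ignored
            handle.readline()  # quality line, not needed for FASTA
            record = (header[1:].strip(), seq)
            if len(sample) < n:
                sample.append(record)
            else:
                # Reservoir sampling: keep each later read with probability n/(seen+1)
                j = rng.randint(0, seen)
                if j < n:
                    sample[j] = record
            seen += 1
    with open(fasta_path, "w") as out:
        for name, seq in sample:
            out.write(f">{name}\n{seq}\n")
    return len(sample)
```

For example, `sample_fastq_to_fasta("unaligned.fastq", "unaligned_sample.fasta", n=50)` would write 50 reads (or fewer, if the file is smaller) that can then be submitted to BLAST against the nt database. If most hits come back as rRNA, bacteria, or another plant, that would point to the source of the low alignment rate.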
