Optimization of RNA-Seq data mapping with tophat2
7.3 years ago
Denis ▴ 310

Hi there,

I'm trying to map Illumina paired-end RNA-Seq data from one fish genus to the genome of a related genus in the same family using TopHat2. I started with near-default TopHat2 settings (apart from the -r flag). Then I added --mate-std-dev 4000 --read-edit-dist 20 to my command line, but the mapping statistics are still poor.

Left reads:
    Input     : 14811423
    Mapped    : 7078897 (47.8% of input)
      of these: 4273616 (60.4%) have multiple alignments (1521255 have >20)
Right reads:
    Input     : 14811423
    Mapped    : 6753768 (45.6% of input)
      of these: 4051398 (60.0%) have multiple alignments (1521195 have >20)
46.7% overall read mapping rate.

Aligned pairs: 5026382
     of these: 3371900 (67.1%) have multiple alignments
               1614729 (32.1%) are discordant alignments
23.0% concordant pair alignment rate.
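
For reference, the two summary percentages can be re-derived from the raw counts. This is only a sketch of the arithmetic: the counts are copied from the summary above, the helper name `pct` is made up for illustration, and the formula for the concordant rate (aligned pairs minus discordant pairs, divided by input pairs) is an assumption that happens to reproduce the reported figure.

```python
# Counts copied from the TopHat2 alignment summary above.
left_input, left_mapped = 14811423, 7078897
right_input, right_mapped = 14811423, 6753768
aligned_pairs, discordant_pairs = 5026382, 1614729


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Overall rate: all mapped reads (left + right) over all input reads.
overall = pct(left_mapped + right_mapped, left_input + right_input)

# Assumption: concordant rate = (aligned pairs - discordant pairs) / input pairs.
concordant = pct(aligned_pairs - discordant_pairs, left_input)

print(f"overall read mapping rate: {overall}%")
print(f"concordant pair alignment rate: {concordant}%")
```

Under that reading, fewer than a quarter of the input pairs align concordantly, and most of the aligned pairs are multi-mapped.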

My questions are: Which TopHat2 settings should I try in my case? Which other RNA-Seq read mappers would be worth testing besides TopHat2?

RNA-Seq alignment • 2.8k views

Use the STAR aligner.


You may have rRNA in this data if you did not use a ribo-depletion method or if it did not work well. You can take a sample of the reads that don't align and BLAST them to see what you find. If you do have rRNA contamination, then using a different aligner will not help.
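
A minimal sketch of the sampling step, assuming the unaligned reads have already been exported to a plain 4-line-per-record FASTQ file. The function names `read_fastq` and `sample_to_fasta` are hypothetical helpers, not part of any aligner. It draws a fixed-size random sample with reservoir sampling and writes FASTA text that could be pasted into a BLAST search.

```python
import io
import random


def read_fastq(handle):
    """Yield (name, sequence) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            return  # end of file (or a blank line) stops the parser
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield header[1:].split()[0], seq


def sample_to_fasta(handle, n, seed=0):
    """Reservoir-sample up to n reads and return them as FASTA text."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(read_fastq(handle)):
        if i < n:
            reservoir.append(record)
        else:
            # Replace an existing entry with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = record
    return "".join(f">{name}\n{seq}\n" for name, seq in reservoir)


# Tiny in-memory demo; a real run would pass an open FASTQ file instead.
demo = io.StringIO(
    "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n"
)
fasta = sample_to_fasta(demo, n=2)
print(fasta)
```

A few hundred reads are usually enough to see whether the hits are dominated by rRNA, another organism, or nothing recognizable.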


Thanks for your reply. I've consulted our wet lab specialists: we extracted and sequenced only the poly-A RNA fraction. But it's a good idea anyway! I'll take a look at the unaligned reads. It would also be interesting to check the reads that have multiple alignments to the reference genome.


Last time I had scores like this, I tried trimming according to the FastQC results, and it improved the numbers a lot.


Did you mean quality and adapter trimming of the data before the mapping step?


If not quality trimming, then at least adapter trimming, for sure, before mapping.
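
To illustrate what adapter trimming does to a read, here is a toy sketch. It is not a replacement for a dedicated trimming tool: it only handles exact matches, the function name `trim_adapter` and the `min_overlap` parameter are made up for the example, and the adapter string is the common Illumina TruSeq adapter prefix, which is an assumption about the library prep.

```python
# Assumption: the library carries the common Illumina TruSeq adapter prefix.
ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(seq, qual, adapter=ADAPTER, min_overlap=3):
    """Cut a 3' adapter (exact matches only) from a read and its qualities."""
    pos = seq.find(adapter)
    if pos == -1:
        # No full copy found: look for an adapter prefix at the read's 3' end,
        # trying the longest possible overlap first.
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                pos = len(seq) - k
                break
    if pos == -1:
        return seq, qual  # nothing to trim
    return seq[:pos], qual[:pos]


# Full adapter inside the read, partial adapter at the end, and no adapter.
print(trim_adapter("ACGTACGTAGATCGGAAGAGCTT", "I" * 23))
print(trim_adapter("ACGTACGTAGATC", "I" * 13))
print(trim_adapter("ACGTACGT", "I" * 8))
```

Real trimmers also tolerate mismatches and low-quality tails, which matters for reads like these where untrimmed adapter sequence can stop a read from aligning at all.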

7.3 years ago
Satyajeet Khare ★ 1.6k

You can use HISAT2 or STAR, as recommended by @geek_y, but since you are aligning RNA-Seq reads from one fish genus to the genome of another, the low alignment rate might be real. Other issues that may lower the alignment rate are incorrect de-multiplexing and contamination during sample preparation.


Thanks, that definitely makes sense. How can I check whether my data were de-multiplexed correctly? Contamination I could probably check with a BLAST search of the unmapped reads.


In my experience, rRNA contamination does not affect the alignment percentage. It does affect differential expression analysis, though. I generally check for rRNA contamination by looking at the FastQC report or at the reads over rRNA genes. The duplication graph in FastQC will climb, with duplication levels above 90%. De-multiplexing issues will lead to cross-contamination. You can BLAST reads manually. That might help.
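
One rough way to look at de-multiplexing is to tally the index sequences recorded in the FASTQ headers. The sketch below assumes Illumina-style headers in which the second whitespace-separated field ends with the index (for example `1:N:0:ATCACG`); the function name `count_indexes` and the demo reads are hypothetical.

```python
import io
from collections import Counter


def count_indexes(handle):
    """Count index sequences found in Illumina-style FASTQ header lines."""
    counts = Counter()
    for line_no, line in enumerate(handle):
        # Header lines are every fourth line, starting with the first.
        if line_no % 4 == 0 and line.startswith("@"):
            fields = line.strip().split()
            if len(fields) > 1:
                # Assumption: the index is the last ':'-separated item.
                counts[fields[1].split(":")[-1]] += 1
    return counts


# Made-up reads for illustration; a real run would pass an open FASTQ file.
demo = io.StringIO(
    "@M1:1:FC:1:1101:10:20 1:N:0:ATCACG\nACGT\n+\nIIII\n"
    "@M1:1:FC:1:1101:11:21 1:N:0:ATCACG\nGGCC\n+\nIIII\n"
    "@M1:1:FC:1:1101:12:22 1:N:0:CGATGT\nTTAA\n+\nIIII\n"
)
for index, n in count_indexes(demo).most_common():
    print(index, n)
```

If a noticeable share of reads carries an index other than the one assigned to the sample, that may hint at a de-multiplexing problem, although it is not proof on its own.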


"rRNA contamination does not affect alignment percentage"

Perhaps you include the rDNA repeat in your reference? Not many do.


Okay, that is a good piece of information. I was not aware that most others leave the rDNA regions out of their reference.


Would you recommend repeat masking (including rDNA repeats) before read mapping in RNA-Seq experiments? I thought that was relevant only for mapping DNA-Seq reads.
