Question

Optimization of RNA-Seq data mapping with tophat2

0

7.2 years ago

Denis ▴ 310

Hi there,

I'm trying to map Illumina PE RNA-Seq data from one fish genus to the genome of a related genus in the same family using tophat2. I started with almost default tophat2 settings (except for the -r flag). Then I added --mate-std-dev 4000 --read-edit-dist 20 to my command line, but the mapping statistics are still bad.

Left reads:
    Input    : 14811423
    Mapped   : 7078897 (47.8% of input)
    of these : 4273616 (60.4%) have multiple alignments (1521255 have >20)
Right reads:
    Input    : 14811423
    Mapped   : 6753768 (45.6% of input)
    of these : 4051398 (60.0%) have multiple alignments (1521195 have >20)
46.7% overall read mapping rate.

Aligned pairs: 5026382
    of these : 3371900 (67.1%) have multiple alignments
               1614729 (32.1%) are discordant alignments
23.0% concordant pair alignment rate.

My questions are: Which tophat2 settings should I try in my case? Which other RNA-Seq read mappers would be worth testing besides tophat2?
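For anyone reading the summary above, the headline percentages can be reproduced from the raw counts. This is only a small arithmetic sketch in Python: the counts are copied from the question, and the way the concordant rate is derived (aligned pairs minus discordant pairs, divided by input pairs) is an assumption that happens to match the reported 23.0%, not something taken from the tophat2 source.

```python
# Counts copied from the tophat2 summary in the question.
left_input, left_mapped = 14811423, 7078897
right_input, right_mapped = 14811423, 6753768
aligned_pairs, discordant_pairs = 5026382, 1614729
left_multi = 4273616


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Overall rate: all mapped reads (left + right) over all input reads.
overall_rate = percent(left_mapped + right_mapped, left_input + right_input)

# Assumption: concordant pairs = aligned pairs - discordant pairs,
# expressed relative to the number of input pairs.
concordant_rate = percent(aligned_pairs - discordant_pairs, left_input)

# Share of mapped left reads that align to more than one location.
left_multi_rate = percent(left_multi, left_mapped)

print(f"overall read mapping rate:      {overall_rate}%")
print(f"concordant pair alignment rate: {concordant_rate}%")
print(f"left reads with multiple hits:  {left_multi_rate}%")
```

Seen this way, roughly 60% of the mapped reads are multi-mappers, and more than half of the input reads do not map at all, so both groups are worth inspecting.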

RNA-Seq alignment • 2.8k views

ADD COMMENT • link updated 7.2 years ago by Satyajeet Khare ★ 1.6k • written 7.2 years ago by Denis ▴ 310

2

Use the STAR aligner.

ADD REPLY • link 7.2 years ago by GouthamAtla 12k

1

You may have rRNA in this data if you did not use a ribo-depletion method or it did not work well. You can take a sample of reads that don't align and blast them to see what you find. If you do have rRNA contamination then using a different aligner will not help.
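A minimal sketch of the "take a sample of reads that don't align" step, assuming the unaligned reads are already available as plain four-line FASTQ records (tophat2 writes them to a BAM file, so a conversion step, for example with `samtools fastq`, would come first). The function name `sample_fastq_as_fasta` and the demo read names are made up for illustration. It uses reservoir sampling so the whole file never has to be held in memory, and returns FASTA text that can be pasted into a BLAST search.

```python
import random


def sample_fastq_as_fasta(fastq_lines, n_reads, seed=0):
    """Reservoir-sample n_reads records from FASTQ lines; return FASTA text.

    Assumes strict 4-line FASTQ records (no wrapped sequence lines).
    """
    rng = random.Random(seed)
    reservoir = []
    seen = 0
    lines = iter(fastq_lines)
    for header in lines:
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = next(lines).strip()
        next(lines)  # '+' separator line
        next(lines)  # quality string, not needed for BLAST
        seen += 1
        name = header.strip()[1:].split()[0]
        if len(reservoir) < n_reads:
            reservoir.append((name, sequence))
        else:
            # Keep the new record with probability n_reads / seen.
            slot = rng.randrange(seen)
            if slot < n_reads:
                reservoir[slot] = (name, sequence)
    return "".join(f">{name}\n{seq}\n" for name, seq in reservoir)


# Tiny in-memory demo; with real data, pass an open file handle instead.
demo = [
    "@read1 1:N:0\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
    "@read2 1:N:0\n", "GGGGCCCC\n", "+\n", "IIIIIIII\n",
    "@read3 1:N:0\n", "TTTTAAAA\n", "+\n", "IIIIIIII\n",
]
fasta = sample_fastq_as_fasta(demo, n_reads=2)
print(fasta)
```

A few hundred reads is usually enough to see whether the hits are dominated by rRNA, another organism, or nothing recognisable.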

ADD REPLY • link 7.2 years ago by GenoMax 147k

0

Thanks for your reply. I've consulted our wet lab specialists: we extracted and sequenced only the poly-A RNA fraction. But it's a good idea anyway! I should take a look at the unaligned reads. It would also be interesting to check the reads that have multiple alignments to the reference genome.

ADD REPLY • link 7.2 years ago by Denis ▴ 310

1

Last time I had scores like this, I tried trimming according to the FastQC results and it improved the numbers a lot.

ADD REPLY • link 7.2 years ago by firatuyulur ▴ 320

0

Did you mean quality and adapter trimming of the data before the mapping step?

ADD REPLY • link 7.2 years ago by Denis ▴ 310

0

If not quality trimming, then at least adapter trimming, for sure, before mapping.
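To illustrate what adapter trimming does to a read, here is a toy Python sketch. It is not a replacement for a dedicated trimmer: it only handles exact matches at the 3' end, ignores quality strings and sequencing errors, and the function name `trim_3prime_adapter` is made up. The adapter used in the demo, `AGATCGGAAGAGC`, is the common start of the Illumina TruSeq adapters; whether it applies depends on the library kit, so treat it as an assumption.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Clip an exactly matching adapter (or adapter prefix) from the 3' end."""
    # Case 1: the full adapter occurs inside the read; keep what precedes it.
    position = read.find(adapter)
    if position != -1:
        return read[:position]
    # Case 2: the read ends part-way through the adapter, i.e. a suffix of
    # the read equals a prefix of the adapter. Try the longest overlap first.
    longest = min(len(adapter), len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    # No adapter evidence of at least min_overlap bases: leave the read as is.
    return read


ADAPTER = "AGATCGGAAGAGC"  # assumed TruSeq-style adapter start

full = trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER)
partial = trim_3prime_adapter("ACGTACGTAGATC", ADAPTER)
untouched = trim_3prime_adapter("ACGTACGTAG", ADAPTER)
print(full, partial, untouched)
```

Untrimmed adapter bases count as mismatches against the genome, which matters even more here because the cross-genus alignment already uses up much of the allowed edit distance.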

ADD REPLY • link 7.2 years ago by GenoMax 147k

score 1 · Answer 1 · 2017-08-29

1

7.2 years ago

Satyajeet Khare ★ 1.6k

You can use HISAT2 or STAR as recommended by @geek_y, but since you are aligning RNA-Seq reads from one fish genus to the genome of another fish genus, the low alignment rate might be real. Other issues that may lead to a lower alignment rate are incorrect de-multiplexing and contamination during sample preparation.

ADD COMMENT • link 7.2 years ago by Satyajeet Khare ★ 1.6k

0

Thanks. That definitely makes sense. How can I check my data for de-multiplexing correctness? I could probably check for contamination with a BLAST search of the unmapped reads.

ADD REPLY • link 7.2 years ago by Denis ▴ 310

0

In my experience, samples with rRNA contamination do not affect alignment percentage. rRNA contamination does affect differential expression analysis, though. I generally check for rRNA contamination by looking at the FastQC report or at the reads mapped to rRNA genes. The duplication graph in FastQC will climb to duplication levels above 90%. Demultiplexing issues will lead to cross-contamination. You can BLAST reads manually. That might help.
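A rough way to look at duplication outside of a QC report is to count identical read sequences directly. The sketch below is a simplification and does not reproduce how FastQC computes its duplication plot; it assumes four-line FASTQ records, and the function name `duplication_summary` is made up. The most frequent sequences it returns are natural candidates for a manual BLAST.

```python
from collections import Counter


def duplication_summary(fastq_lines, top_n=3):
    """Return (total_reads, duplicated_fraction, most_common_sequences).

    Assumes strict 4-line FASTQ records; the sequence is line 2 of each.
    """
    counts = Counter(
        line.strip() for i, line in enumerate(fastq_lines) if i % 4 == 1
    )
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0, []
    # Every copy beyond the first of a distinct sequence is a duplicate.
    duplicated = total - len(counts)
    return total, duplicated / total, counts.most_common(top_n)


# Tiny in-memory demo; with real data, pass an open file handle instead.
demo_reads = [
    "@r1\n", "AAAA\n", "+\n", "IIII\n",
    "@r2\n", "AAAA\n", "+\n", "IIII\n",
    "@r3\n", "CCCC\n", "+\n", "IIII\n",
    "@r4\n", "AAAA\n", "+\n", "IIII\n",
]
total, dup_fraction, top = duplication_summary(demo_reads)
print(total, dup_fraction, top)
```

Counting every sequence of a full run can need a lot of memory, so running this on the first few hundred thousand reads is a reasonable compromise.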

ADD REPLY • link 7.2 years ago by Satyajeet Khare ★ 1.6k

2

"samples with rRNA contamination do not affect alignment percentage"

Perhaps you include the rDNA repeat in your reference? Not many do.

ADD REPLY • link 7.2 years ago by GenoMax 147k

0

Okay, that is a good piece of information. I was not aware that most others typically leave rDNA regions out of their reference.

ADD REPLY • link 7.2 years ago by Satyajeet Khare ★ 1.6k

0

Would you recommend repeat masking (including rDNA repeats) before read mapping in RNA-Seq experiments? I thought that was relevant only for mapping DNA-Seq reads.

ADD REPLY • link 7.2 years ago by Denis ▴ 310