Hi there,
I'm trying to map Illumina PE RNA-Seq data originating from one fish genus to the genome of a related genus in the same family using TopHat2.
I started with almost default TopHat2 settings (apart from the -r
flag). I then added --mate-std-dev 4000 --read-edit-dist 20
to my command line, but the mapping statistics are still bad:
Left reads:
Input : 14811423
Mapped : 7078897 (47.8% of input)
of these: 4273616 (60.4%) have multiple alignments (1521255 have >20)
Right reads:
Input : 14811423
Mapped : 6753768 (45.6% of input)
of these: 4051398 (60.0%) have multiple alignments (1521195 have >20)
46.7% overall read mapping rate.
Aligned pairs: 5026382
of these: 3371900 (67.1%) have multiple alignments
1614729 (32.1%) are discordant alignments
23.0% concordant pair alignment rate.
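As a sanity check, the two summary percentages can be reproduced from the raw counts above. This is only a sketch: the formulas are an assumption about how the summary is derived (overall rate = mapped reads over input reads, concordant rate = aligned pairs minus discordant pairs, over input pairs), chosen because they match the reported figures, and the variable names are made up for illustration.

```python
# Counts copied from the alignment summary above.
left_input = right_input = 14811423
left_mapped = 7078897
right_mapped = 6753768
aligned_pairs = 5026382
discordant_pairs = 1614729

# Assumed formula: all mapped reads over all input reads.
overall_rate = 100 * (left_mapped + right_mapped) / (left_input + right_input)

# Assumed formula: non-discordant aligned pairs over input pairs.
concordant_rate = 100 * (aligned_pairs - discordant_pairs) / left_input

print(f"overall read mapping rate: {overall_rate:.1f}%")
print(f"concordant pair alignment rate: {concordant_rate:.1f}%")
```

Under these assumptions, roughly a third of the pairs that do align are discordant, and about 60% of mapped reads have multiple alignments, so both the unmapped and the multi-mapped reads look worth inspecting.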
My questions are: which TopHat settings should I try in my case? Which other RNA-Seq read mappers would be worth testing besides TopHat2?
Use the STAR aligner.
You may have rRNA in this data if you did not use a ribo-depletion method or if it did not work well. You can take a sample of the reads that don't align and BLAST them to see what you find. If you do have rRNA contamination, then using a different aligner will not help.
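A minimal sketch of the sampling step, assuming the unaligned reads have already been exported to a plain four-line-per-record FASTQ file (TopHat writes them to unmapped.bam, so a conversion step is needed first). The function name sample_fastq_to_fasta and its parameters are hypothetical, not part of any aligner. It uses reservoir sampling, so the whole file never has to fit in memory, and returns FASTA text that can be pasted into BLAST.

```python
import random


def sample_fastq_to_fasta(fastq_lines, n, seed=0):
    """Pick up to n reads uniformly at random from 4-line FASTQ records.

    fastq_lines: any iterable of lines (e.g. an open file handle).
    Returns the sampled reads as FASTA-formatted text.
    """
    rng = random.Random(seed)
    reservoir = []  # holds (read_name, sequence) tuples
    record = []
    count = 0       # number of complete records seen so far

    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            count += 1
            # Drop the leading '@' of the header; keep the sequence line.
            entry = (record[0][1:], record[1])
            if len(reservoir) < n:
                reservoir.append(entry)
            else:
                # Replace a random slot with probability n / count.
                j = rng.randrange(count)
                if j < n:
                    reservoir[j] = entry
            record = []

    return "".join(f">{name}\n{seq}\n" for name, seq in reservoir)
```

Passing an open handle to the FASTQ file and a sample size of a few hundred reads should be enough to see whether rRNA or another contaminant dominates the BLAST hits.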
Thanks for your reply. I've consulted our wet lab specialists: we extracted and sequenced only the poly-A RNA fraction. It's a good idea anyway! I'll take a look at the unaligned reads. It would also be interesting to check the reads that have multiple alignments to the reference genome.
The last time I had scores like this, I trimmed the reads according to the FastQC results, and it improved the numbers a lot.
Did you mean quality and adapter trimming of the data before the mapping step?
If not quality trimming, then at least adapter trimming, for sure, before mapping.