RNA-seq mapper that allows variability and handles multi-mapping?
8.7 years ago
Ekarl2 ▴ 120

I have a species with 4-5% sequence variation between alleles, and each RNA-seq sample contains several hundred individuals. I am mapping against a reference from another, single individual that has a much more complete transcriptome assembly.

Currently, I am using bowtie + RSEM, but the way bowtie works seems sub-optimal for this dataset: many reads have a few mismatches in the seed compared with the reference, so they do not map well (65% or so). This is despite trying more lenient bowtie parameters, such as decreasing the seed length, increasing the allowed mismatches, and allowing more backtracking. Mapping with e.g. BWA with default parameters recovers a sizable share of the lost mappings, so the issue seems to be with bowtie.
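To make the seed problem concrete, here is a rough back-of-the-envelope sketch in Python. It assumes mismatches against the reference occur independently at a fixed per-base rate (a simplification that also ignores sequencing errors and indels), and the function name, seed lengths, and mismatch limits are illustrative values of my own, not settings taken from any particular run.

```python
from math import comb


def seed_pass_probability(seed_len, max_mismatches, divergence):
    """Probability that a seed of `seed_len` bases has at most
    `max_mismatches` mismatches, assuming each base mismatches
    independently with probability `divergence` (simplified model)."""
    return sum(
        comb(seed_len, k) * divergence**k * (1 - divergence) ** (seed_len - k)
        for k in range(max_mismatches + 1)
    )


if __name__ == "__main__":
    # Illustrative grid: how seed length and mismatch limit interact
    # at 5% per-base divergence from the reference.
    for seed_len in (20, 24, 28):
        for max_mm in (0, 1, 2, 3):
            p = seed_pass_probability(seed_len, max_mm, 0.05)
            print(f"seed={seed_len} max_mm={max_mm} pass={p:.3f}")
```

Under this model, shortening the seed or raising the mismatch limit increases the fraction of seeds that pass, but real reads will deviate from it because variation is not spread evenly along transcripts.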

Are there any alternatives to bowtie that can be combined with either RSEM (e.g. by removing gapped alignments?) or any other similar program that handles multi-mapping reads? If not, are there any alternatives that might work better for this kind of dataset? For example, if we give up the ability to accurately handle multi-mapping reads, what other options might be suitable for datasets like this one?

RNA-Seq mapping • 3.7k views

I'd recommend trying one of our tools (either sailfish or salmon). Both are very fast and accurate tools for transcript-level quantification. They both use a custom algorithm for mapping reads to the transcriptome that is accurate and tolerant of errors and variation. Optionally, salmon can be paired with an aligner, and it doesn't require removing gaps in the alignments prior to quantification.


Have you looked at BBMap? GMAP is also supposed to be more SNP-tolerant. Changing aligners may not necessarily give you the desired outcome, but since you are looking for options, both would be worth a try.


What do you mean by 'handles multi-mapping reads'? What is it you want to do with multi-mapping reads? Ignore them or count them? Neither option is ideal.

Multi-maps are a function of paralogy and sequence accuracy. The higher the paralogy or the lower the accuracy, the more multi-mapping reads you'll get. By increasing the leniency, you're increasing the chance of multi-maps that are probably artifactual.

You mention BWA is better, so why not just use that?


Given that @Ekarl2 is using Bowtie in conjunction with RSEM, I assume that "handles multi-mapping reads" is used to mean "resolves multi-mapping reads" (i.e. fractionally allocating each multi-mapping read among its candidate transcripts in the manner that maximizes the likelihood of the observed reads, at least locally).
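As a toy illustration of what "fractionally allocating" means, here is a minimal expectation-maximization sketch in Python. It is not RSEM's actual implementation: it assumes every compatible alignment is equally good, ignores fragment-length and error models, and the function name `em_allocate` and the example data are hypothetical.

```python
def em_allocate(read_hits, lengths, n_iter=200):
    """Fractionally assign reads to transcripts with a simple EM loop.

    read_hits: list of tuples, each holding the transcript ids one read maps to.
    lengths:   dict of transcript id -> length in bases.
    Returns a dict of transcript id -> expected read count.
    """
    transcripts = sorted(lengths)
    # Start from equal abundance for every transcript.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = dict.fromkeys(transcripts, 0.0)
    for _ in range(n_iter):
        counts = dict.fromkeys(transcripts, 0.0)
        # E-step: split each read among its hits in proportion to
        # current abundance divided by transcript length.
        for hits in read_hits:
            weights = {t: theta[t] / lengths[t] for t in hits}
            total = sum(weights.values())
            for t, w in weights.items():
                counts[t] += w / total
        # M-step: abundance is the share of expected reads.
        n_reads = sum(counts.values())
        theta = {t: counts[t] / n_reads for t in transcripts}
    return counts


if __name__ == "__main__":
    # Hypothetical data: two equal-length paralogs A and B.
    reads = [("A",)] * 30 + [("B",)] * 10 + [("A", "B")] * 20
    print(em_allocate(reads, {"A": 1000, "B": 1000}))
```

In this toy case the ambiguous reads end up split in roughly the same ratio as the uniquely mapping reads, which is the intuition behind likelihood-based resolution: unique evidence informs how shared reads are divided. As with any EM procedure, the result is only guaranteed to be a local optimum.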


You might find http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.3519.html to be useful. It is focused precisely on the issue of how to accurately assign multi-mapping reads for RNA-Seq abundance estimation.
