What % Of Reads Are You Usually Able To Align When Mapping Reads From Rna-Seq To The Genome?
3
6
13.1 years ago
Steffi ▴ 580

I very often observe percentages as low as 30%. Of course, it's hard to generalize, but I would be interested in other people's experiences. Does anybody observe any major mapping improvements when using certain mapping programs?

Specifically, I am talking about aligning paired-end mRNA reads (read length 100) to the genome (Mus musculus). I have already seen several mRNA experiments and have frequently observed such low mapping efficiency when aligning the data with Bowtie.

mapping rna mouse • 20k views
3

Can we have some details about the reads (length, paired or not)? What aligner are you using? What organism? What reference sequences are you using, genome or transcriptome?

0

In bacterial samples I have seen mapping rates of up to 80-90%. It probably depends on many factors, but 30% seems very low. Maybe your sample is contaminated, or your parameters are too strict. Other than that, I have had good experiences with BFAST and MOSAIK.

0

Your question is hard to address without asking "align at what stringency?" Are you asking "what percentage of reads should have a unique alignment of high quality"? I'm doing an RNA-seq analysis right now, and a significant minority of my reads come from repetitive regions that do not have a unique alignment.
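To make the "at what stringency?" point concrete, here is a minimal Python sketch that reports two different mapping percentages from the same SAM text: reads with any alignment, and reads whose alignment passes a MAPQ cutoff. The function name `mapping_summary` and the cutoff of 10 are assumptions for illustration only; MAPQ values are scaled differently by different aligners, so the cutoff would need adjusting for whatever mapper is actually used.

```python
from typing import Dict, Iterable


def mapping_summary(sam_lines: Iterable[str], min_mapq: int = 10) -> Dict[str, float]:
    """Count reads, mapped reads, and confidently mapped reads in SAM text."""
    total = mapped = confident = 0
    for line in sam_lines:
        # Skip blank lines and header lines (SAM headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        mapq = int(fields[4])
        # Skip secondary (0x100) and supplementary (0x800) records so that
        # each read is counted only once.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if flag & 0x4:  # 0x4 = read unmapped
            continue
        mapped += 1
        if mapq >= min_mapq:
            confident += 1

    def pct(n: int) -> float:
        return 100.0 * n / total if total else 0.0

    return {
        "reads": total,
        "pct_mapped": pct(mapped),
        "pct_confident": pct(confident),
    }


# Tiny hand-made example: one confident hit, one multi-mapper (MAPQ 0 plus a
# secondary record), and one unmapped read.
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t100M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t500\t0\t100M\t*\t0\t0\t*\t*",
    "r2\t256\tchr2\t900\t0\t100M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(mapping_summary(example))
```

The gap between the two percentages is a rough indication of how many reads fall in repetitive regions, which is why a single "percent aligned" figure is hard to compare across experiments.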

4
13.1 years ago

Be sure that you QC your data. A simple step is to run FastQC on each file.
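As a rough idea of the kind of check a QC tool performs, here is a small Python sketch that computes the mean base quality at each read position from FASTQ text. It is not a replacement for FastQC and does not reproduce its output; the function name `mean_quality_per_position` is hypothetical, and the sketch assumes unwrapped four-line FASTQ records with Phred+33 quality encoding.

```python
from typing import Iterable, List


def mean_quality_per_position(fastq_lines: Iterable[str], offset: int = 33) -> List[float]:
    """Return the mean Phred quality at each read position.

    Assumes four lines per record (no line wrapping) and Phred+33 encoding.
    """
    sums: List[int] = []
    counts: List[int] = []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        for pos, ch in enumerate(line.rstrip("\n")):
            if pos == len(sums):  # first time we see a read this long
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(ch) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


# Two toy reads of length 2: qualities (40, 40) and (0, 40).
toy_fastq = ["@r1", "AC", "+", "II", "@r2", "GT", "+", "!I"]
print(mean_quality_per_position(toy_fastq))
```

A sharp drop in mean quality toward the 3' end of 100 bp reads would be one reason to trim before mapping.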

In terms of alignments, I would suggest that Bowtie is not the tool to use. First, you are using relatively long reads. The chance of an indel occurring in a read is quite significant, and Bowtie cannot deal with indels (as of the most recent version). Second, Bowtie cannot align across intron/exon boundaries. Given that the "average" exon in mouse is ~100 bp, a significant percentage of the 100 bp reads in a paired-end library would be expected to cross an intron/exon boundary, so seeing 30% of the reads aligning is perhaps what might be expected. I would suggest at least using a splice-aware RNA-seq aligner such as TopHat, GSNAP, RUM, or one of a dozen others. See this paper for some examples and an evaluation of performance:
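The exon-length argument above can be illustrated with a toy calculation. The Python sketch below assumes a single spliced transcript, reads starting uniformly at every position where a full read fits, and exon lengths supplied by the caller; the function name `fraction_spanning_junction` and the example exon lengths are made up for illustration and are not taken from a mouse annotation.

```python
from typing import Sequence


def fraction_spanning_junction(exon_lengths: Sequence[int], read_length: int) -> float:
    """Fraction of read start positions whose read overlaps an exon-exon junction.

    Toy model: reads start uniformly along the spliced transcript, and a read
    is "unspliced" only if it lies entirely inside one exon.
    """
    transcript_length = sum(exon_lengths)
    n_starts = transcript_length - read_length + 1
    if n_starts <= 0:
        raise ValueError("read is longer than the transcript")
    # Start positions that keep the whole read inside a single exon.
    within_one_exon = sum(max(0, exon_len - read_length + 1) for exon_len in exon_lengths)
    return 1.0 - within_one_exon / n_starts


# Hypothetical transcripts: ten short exons versus ten longer exons.
for exons in ([100] * 10, [300] * 10):
    frac = fraction_spanning_junction(exons, read_length=100)
    print(f"exon length {exons[0]} bp: {frac:.1%} of reads span a junction")
```

Real transcripts have a wide spread of exon lengths and long terminal exons, so the true fraction will differ, but the sketch shows why an aligner that cannot split reads across introns loses a large share of 100 bp reads when they are mapped to the genome.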

http://www.ncbi.nlm.nih.gov/pubmed/21775302

0

Thanks a lot for the advice. That's what I am doing at the moment: extensive QC and EDA first, and then I will try a more suitable mapper.

0

I agree with Sean. I have tried HMMsplicer, GSNAP, RUM and MapSplice, and so far I have had the best results with the last one.

3
13.1 years ago
Marina Manrique ★ 1.3k

Hey,

Regarding the low mapping efficiency of Bowtie, you might like to take a look at this question about the 'low' performance of Bowtie when mapping PE reads due to a 'wrong' choice of insert size. brentp's answer is particularly good.

HTH,
Marina

0

I will definitely try this (i.e., using a rather high max insert size). Until now I have used a rather strict estimate of the insert size. Maybe being a bit more lenient at this point helps a lot. I will keep you informed.
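One way to pick a less strict limit is to look at the observed fragment sizes from a pilot alignment and set the maximum insert size (Bowtie's `-X`/`--maxins` option) comfortably above them. The Python sketch below reads the TLEN column of properly paired SAM records; the function name `suggest_max_insert`, the 99th percentile, and the 1.2x margin are arbitrary assumptions, not recommendations from the Bowtie authors.

```python
from typing import Iterable, List


def suggest_max_insert(sam_lines: Iterable[str], percentile: float = 0.99,
                       margin: float = 1.2) -> int:
    """Suggest a maximum insert size from observed template lengths (TLEN)."""
    sizes: List[int] = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        tlen = int(fields[8])
        # Keep properly paired (0x2) primary records; tlen > 0 selects the
        # leftmost mate so that each pair is counted once.
        if flag & 0x2 and not flag & 0x900 and tlen > 0:
            sizes.append(tlen)
    if not sizes:
        raise ValueError("no properly paired alignments found")
    sizes.sort()
    idx = min(len(sizes) - 1, int(percentile * len(sizes)))
    return round(sizes[idx] * margin)


# Toy input: two pairs with template lengths of 200 and 300.
pilot = [
    "r1\t99\tchr1\t100\t60\t100M\t=\t200\t200\t*\t*",
    "r1\t147\tchr1\t200\t60\t100M\t=\t100\t-200\t*\t*",
    "r2\t99\tchr1\t500\t60\t100M\t=\t700\t300\t*\t*",
    "r2\t147\tchr1\t700\t60\t100M\t=\t500\t-300\t*\t*",
]
print(suggest_max_insert(pilot))
```

Two caveats: the pilot alignment itself has to be run with a generous limit, otherwise pairs beyond the old cutoff never show up as properly paired; and when RNA-seq reads are mapped to the genome, TLEN includes any introns between the mates, so the observed values can be much larger than the library's physical fragment size.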

0
13.1 years ago

For Illumina 2x100 PE to mm9 I would expect something like 60-80% with TopHat, depending on library quality, the quality of the sequencing run, etc.
