Average percentage of RNA-seq reads coming from known annotation (Ensembl, RefSeq, ...)
11.8 years ago

Hi,

I have a more general question about RNA-seq data: what is the typical percentage of reads that come from known annotation?

For example, take 2x50 bp strand-specific human data. After alignment (TopHat, STAR, ...), what percentage of reads map, and what percentage of reads come from known annotation (for example, Ensembl genes)?

Thanks a lot,

N.

read annotation rnaseq • 5.6k views
11.8 years ago

Here are some examples of RNA-seq libraries generated and sequenced in different ways. The same human sample was used for all 8 approaches: starting from either total RNA or polyA RNA, using either the NuGEN Ovation V2 or the Encore kit for cDNA synthesis, and finally, either subjecting the library to exome capture or not. Yes, I know this is an unusual thing to do with an RNA-seq library ;) All data are 2x100 bp paired-end reads. The Encore libraries are strand-specific and the Ovation libraries are not. Alignments were done with TopHat v2 using Bowtie v2.
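The 8 approaches are simply the full 2 x 2 x 2 combination of the three choices listed above. A small Python sketch (labels paraphrased from the post; the variable names are illustrative, not from any real pipeline) enumerates them:

```python
from itertools import product

# The three binary choices described in the post (labels paraphrased).
starting_material = ["total RNA", "polyA RNA"]
cdna_kit = ["Ovation V2 (unstranded)", "Encore (strand-specific)"]
exome_capture = [False, True]

# Every combination of the three choices corresponds to one library prep.
designs = list(product(starting_material, cdna_kit, exome_capture))

for rna, kit, capture in designs:
    print(f"{rna:10s} | {kit:26s} | exome capture: {capture}")

print(len(designs))  # 2 x 2 x 2 = 8
```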

The plot shows the proportion of reads that map to known coding regions, known UTR regions, intronic regions, etc. So to get the total proportion of reads mapping to known transcript annotations, add the UTR and coding components. The annotations are from Ensembl.
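As a minimal sketch of that addition, assuming each mapped read has been assigned to exactly one compartment (so the compartment counts sum to the number of mapped reads), the known-annotation fraction is (coding + UTR) / total. The dictionary keys and the counts below are hypothetical and are not taken from the plot:

```python
def annotated_fraction(compartment_counts: dict[str, int]) -> float:
    """Fraction of mapped reads that fall in known transcript annotation.

    Known-transcript reads = coding + UTR. The keys "coding" and "utr" are
    hypothetical labels; use whatever names your counting tool reports.
    Assumes compartments are mutually exclusive, so their sum is the
    number of mapped reads.
    """
    total_mapped = sum(compartment_counts.values())
    if total_mapped == 0:
        raise ValueError("no mapped reads")
    known = compartment_counts.get("coding", 0) + compartment_counts.get("utr", 0)
    return known / total_mapped


# Hypothetical counts for illustration only.
counts = {
    "coding": 5_000_000,
    "utr": 2_000_000,
    "intronic": 2_000_000,
    "intergenic": 1_000_000,
}
print(f"{annotated_fraction(counts):.0%}")  # 70%
```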

[Figure: RNA-seq read alignments broken down by gene compartments]


What did you use to visualize this? Excel? The colors are chosen very well.


Yes, it was Excel. I'm not a fan, but in this case it did a decent job, I guess.

11.8 years ago

I don't have a specific number, and I don't think anyone can give you one. If you tell us the organism and tissue type you are working with, people may be able to give you an idea. The percentage of reads mapping to known annotation depends on how well your organism is annotated and on the tissue you are studying. I have analysed mouse hippocampus data, and although the mouse genome is considered well annotated, I found interesting novel transcripts being expressed. Sometimes library preparation goes wrong, and you may see a lot of sequence not mapping at all.


It's human T-cells: total RNA with rRNA depletion using Ribo-Zero.

11.8 years ago

We have used about 80% mapped and, of that, 80% "mRNA fraction" (coding, UTR) as a rough rule of thumb for poly-A selected mRNA in human and mouse tissues and cell lines. The vast majority of experiments fall within 70-90% for both of those metrics. That seems to fit Malachi's graph pretty well, although his answer is much more comprehensive than this one :-) The fraction would be expected to be much lower for rRNA-depleted libraries, because you will observe a lot of unannotated RNA species.
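One way to apply this rule of thumb as a quick QC check is sketched below. The function name, its arguments and the default 70-90% bounds are hypothetical, taken only from the figures quoted above, and they are meant for poly-A libraries, not Ribo-Zero total RNA. Note that "80% of 80%" means roughly 64% of all sequenced reads land in known mRNA annotation.

```python
def check_polya_library(total_reads: int, mapped_reads: int, mrna_reads: int,
                        low: float = 0.70, high: float = 0.90) -> dict:
    """Compare a poly-A library against the ~70-90% rule of thumb.

    mrna_reads = reads assigned to coding + UTR. The mRNA fraction is taken
    relative to mapped reads, as in the answer. Not intended for rRNA-depleted
    total RNA, where a lower mRNA fraction is expected.
    """
    mapped_rate = mapped_reads / total_reads
    mrna_fraction = mrna_reads / mapped_reads
    return {
        "mapped_rate": mapped_rate,
        "mrna_fraction": mrna_fraction,
        # Share of ALL sequenced reads that fall in known mRNA annotation.
        "mrna_of_total": mapped_rate * mrna_fraction,
        "within_typical_range": (low <= mapped_rate <= high
                                 and low <= mrna_fraction <= high),
    }


# Hypothetical library matching the "80% of 80%" rule of thumb.
report = check_polya_library(total_reads=1_000_000,
                             mapped_reads=800_000,
                             mrna_reads=640_000)
print(report)
```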

ADD COMMENT
