Question

non-exonic reads in RNAseq

0

8.5 years ago

kanwarjag ★ 1.2k

I understand that in RNAseq we will always get some level of reads mapping to non-exonic areas. However, I am seeing almost 85% of reads align outside coding regions. I was wondering what may be the cause of this. Does that mean contamination has overpowered the actual data?

RNA-Seq intergenic • 2.5k views

updated 8.5 years ago by Michele Busby ★ 2.2k • written 8.5 years ago by kanwarjag ★ 1.2k

3

Are you using the correct reference? ;-)

It sounds like a wet-lab problem. Are you sure you have removed DNA from your samples?

8.5 years ago by Benn 8.3k

0

Yes, I used the correct genome for aligning. I am looking at the distribution of reads in the BAM file.

8.5 years ago by kanwarjag ★ 1.2k

2

Almost every time this happens it's because someone's using the wrong genome in IGV :)

8.5 years ago by Devon Ryan 104k

1

Do you define as "intergenic" everything that couldn't be assigned to a gene? What about intronic regions? This could also be caused by looking at the wrong strand, or by using different genome versions for the annotation and the mapping. Did you already check these cases?
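To make the exonic / intronic / intergenic distinction concrete, here is a minimal sketch of how reads could be binned. It is only an illustration under simplifying assumptions: a single chromosome, no strand information, each read reduced to one position, and made-up coordinates. The function names (`merge`, `contains`, `classify`) are hypothetical helpers, not part of any existing tool.

```python
from bisect import bisect_right
from collections import Counter


def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def contains(intervals, pos):
    """True if pos lies inside one of the merged, sorted intervals."""
    # Index of the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


def classify(pos, exons, genes):
    """Label a position as exonic, intronic (in a gene, not in an exon) or intergenic."""
    if contains(exons, pos):
        return "exonic"
    if contains(genes, pos):
        return "intronic"
    return "intergenic"


# Toy annotation (assumed coordinates): one gene with two exons.
genes = merge([(100, 500)])
exons = merge([(100, 200), (400, 500)])

# Toy read positions (assumed).
read_positions = [150, 300, 450, 50, 600]
counts = Counter(classify(p, exons, genes) for p in read_positions)
print(dict(counts))
```

With real data the same idea would have to be applied per chromosome and per strand, and the chromosome names in the annotation and the alignment would need to match.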

8.5 years ago by Martombo ★ 3.1k

0

Is your genome/species of interest well characterized?

8.5 years ago by WouterDeCoster 47k

0

mm9, which is well characterized.

8.5 years ago by kanwarjag ★ 1.2k

1

Are these ribo-depleted samples or polyA-enriched? If the former, then perhaps you're seeing expressed repeat regions (there can be quite a few).

8.5 years ago by Devon Ryan 104k

0

Ribo-depleted. I think it is the sample preparation, but I want to make sure.

8.5 years ago by kanwarjag ★ 1.2k

1

Try aligning against the Rn45S sequence and see how many reads you get. I routinely do that with our rRNA-depleted datasets to see how depleted they actually are. My guess is that Michael is right and you're getting a bunch of rRNA (and probably tRNA). Just look at some of the higher-coverage areas in the UCSC Genome Browser with the RepeatMasker track enabled. I suspect that'll be illuminating.
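As a rough sketch of the counting step: assuming the reads have already been aligned to an rRNA-only reference with some aligner that writes SAM, the fraction of reads that hit rRNA can be computed from the FLAG column alone. The function name `mapped_fraction`, the read names and the reference length in the toy header are assumptions for illustration only.

```python
def mapped_fraction(sam_lines):
    """Fraction of primary alignment records that are mapped.

    Uses the SAM FLAG bits: 0x4 = unmapped, 0x100 = secondary,
    0x800 = supplementary. Secondary and supplementary records are
    skipped so that each read (or mate) is counted once.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:  # secondary or supplementary record
            continue
        total += 1
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


# Toy SAM records (assumed values; sequence and quality fields shortened).
toy_sam = [
    "@SQ\tSN:Rn45s\tLN:13400",
    "r1\t0\tRn45s\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tRn45s\t250\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t256\tRn45s\t900\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r5\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(mapped_fraction(toy_sam))
```

For paired-end data each mate is a separate record, so this counts mates rather than fragments.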

8.5 years ago by Devon Ryan 104k

0

Does anyone know why this was deleted?

8.5 years ago by Devon Ryan 104k

0

Nope, can we restore it?

8.5 years ago by WouterDeCoster 47k

0

I opened it, but I guess the OP deleted this.

8.5 years ago by GouthamAtla 12k

score 2 · Answer 1 · 2016-06-01

Hi Kanwarjag,

However I am seeing almost 85% of reads align to intergenic region.

In my opinion this is extremely unlikely, and I have never seen it, even with the imperfect annotation of our model organism, the salmon louse. I did a small randomization experiment on our data by placing random gene models in intergenic regions, and found that for most samples the background read count to expect in intergenic regions at the 99% confidence level is 1. I don't have exact figures for reads overlapping intergenic regions, but 85% is very high. I would check the following:

correct annotation version
ribosomal genes missing from the annotation, combined with a high level of rRNA
draft annotation with a large number of truncated gene models and missing or truncated UTRs (add a few kb of flanking sequence to the genes and check again)
genomic DNA contamination
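The randomization idea mentioned above can be sketched roughly as follows. This is not the exact procedure used on the salmon louse data, just a simplified illustration: random windows (standing in for random gene models) are dropped into one intergenic stretch, reads starting inside each window are counted, and an approximate 99th percentile of those counts is taken as the background level. The function name `background_threshold`, the window size and the toy coordinates are all assumptions.

```python
import random
from bisect import bisect_left


def background_threshold(read_starts, region, window,
                         n_windows=1000, quantile=0.99, seed=0):
    """Approximate quantile of read counts in random windows of a region.

    read_starts: read start positions inside the intergenic region
    region:      half-open (start, end) of the intergenic region
    window:      length of each random window
    """
    rng = random.Random(seed)
    starts = sorted(read_starts)
    lo, hi = region
    counts = []
    for _ in range(n_windows):
        # Pick a window [w, w + window) that fits inside the region.
        w = rng.randrange(lo, hi - window + 1)
        n = bisect_left(starts, w + window) - bisect_left(starts, w)
        counts.append(n)
    counts.sort()
    index = min(len(counts) - 1, int(quantile * len(counts)))
    return counts[index]


# Toy data (assumed): a sparse background of 20 reads in a 100 kb region.
rng = random.Random(1)
toy_reads = [rng.randrange(0, 100_000) for _ in range(20)]
print(background_threshold(toy_reads, (0, 100_000), window=1_000))
```

A gene whose read count is not clearly above such a background level is hard to distinguish from intergenic noise, which is why a sample with 85% of reads outside annotated genes stands out.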

score 0 · Answer 2 · 2016-06-01

8.5 years ago

Michele Busby ★ 2.2k

Try running it through RNA-SeQC. It takes a while to set up, but once you have done that, you have it. It will tell you whether the reads are rRNA, and whether they are intergenic or intronic.