Question

RNA-Seq: Normalize Gene Expression in 1 Sample

13.7 years ago

Arun 2.4k

I am working on alternative splicing (AS) events in 4 different tomato species and am trying to find interesting AS events.

Let's say, for example, that the AS event is exon skipping (ES). After mapping RNA-Seq reads to the tomato genome, I look at junctions (i.e., intron coordinates) and count two kinds of reads: those that are spliced normally, which I call normal junctions (NJ), and those where the 3' exon is skipped. So, for every junction, I have a count of reads that map to the junction normally (exactly where the intron is and where splicing is expected to occur) and a count of reads where an ES event occurred at the same junction (the 3' exon is skipped). In the end I have a table like this for each junction (I have already removed junctions with no ES event in all 4 species).

Junction 1:

    S1  S2  S3  S4  
ES  10   0  27   0  
NJ  95  20  50 380

Then I run Fisher's exact test on this 2×4 table and correct for multiple testing using the Benjamini-Hochberg method (from the R multtest package) to obtain the events that differ significantly across species.
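For readers without R at hand, here is a minimal sketch of that workflow in plain Python. It is not the multtest code referred to above: `exact_test_2xk` and `bh_adjust` are hypothetical helper names, the exact test simply enumerates every 2×k table with the observed margins (so it is only practical while the ES row total stays small), and the Benjamini-Hochberg step is a hand-written version of the standard step-up procedure.

```python
from math import comb

def exact_test_2xk(es, nj):
    """Exact (Fisher-Freeman-Halton style) p-value for a 2 x k count table.

    es, nj: per-sample read counts for the two rows (ES and NJ).
    Sums the probability of every table with the same margins that is
    no more likely than the observed one.
    """
    cols = [e + n for e, n in zip(es, nj)]   # column (sample) totals
    row_total = sum(es)                      # total ES reads

    # Integer weight of the observed table: product of C(col_i, es_i)
    observed = 1
    for c, e in zip(cols, es):
        observed *= comb(c, e)

    def extreme_weight(i, remaining, weight):
        # Distribute the remaining ES reads over columns i..k-1
        if i == len(cols) - 1:
            if remaining > cols[i]:
                return 0
            w = weight * comb(cols[i], remaining)
            return w if w <= observed else 0
        return sum(
            extreme_weight(i + 1, remaining - a, weight * comb(cols[i], a))
            for a in range(min(cols[i], remaining) + 1)
        )

    return extreme_weight(0, row_total, 1) / comb(sum(cols), row_total)

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for step, idx in enumerate(order):
        rank = m - step                      # 1-based rank, ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Junction 1 from the table above (ES row, NJ row)
p_junction1 = exact_test_2xk([10, 0, 27, 0], [95, 20, 50, 380])
```

In practice one p-value would be computed per junction and the whole list passed to `bh_adjust` once, since the correction depends on the total number of tests.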

Now, of course, the questions are: 1) What if the gene to which this junction (or intron) belongs is over- (or under-) expressed between these species? For example, S2 has only a total of 20 reads mapped. 2) What about the number of reads sequenced for each species? 3) What about gene length (as transcript abundance is also found to be positively correlated with gene length)?

So, I have to normalize this data somehow. So far, with the exception of RPKM (which I am not convinced is an appropriate measure), all the other methods I have found are about finding differential expression of genes (and require 2 or more samples), for example quantile normalization, TMM, the edgeR package, etc. However, I would like to normalize gene expression within each of these samples.
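To make the coverage concern concrete, the sketch below tabulates the total junction reads and the ES fraction per species for Junction 1. The helper name `es_fractions` is hypothetical, and the ES fraction ES / (ES + NJ) is only one possible within-sample summary, not a normalization anyone in this thread has endorsed.

```python
species = ["S1", "S2", "S3", "S4"]
es = [10, 0, 27, 0]      # reads supporting exon skipping
nj = [95, 20, 50, 380]   # reads supporting the normal junction

def es_fractions(es_counts, nj_counts):
    """Return (total junction reads, ES fraction) for each sample."""
    out = []
    for e, n in zip(es_counts, nj_counts):
        total = e + n
        # The fraction is undefined for a junction with no reads at all
        out.append((total, e / total if total else None))
    return out

for name, (total, frac) in zip(species, es_fractions(es, nj)):
    print(name, total, frac)
```

Both S2 and S4 show no ES reads, but S2 rests on 20 junction reads and S4 on 380, which is exactly the difference in coverage the questions above are about.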

Does anyone have an idea how to go about it?

Thank you! Arun.

gene data multiple rna • 4.3k views

updated 13.7 years ago by Gww ★ 2.7k • written 13.7 years ago by Arun 2.4k

score 2 · Answer 1 · 2011-08-04

13.7 years ago

Gww ★ 2.7k

The easiest way to normalize for exon skipping events is to convert the read counts to exon inclusion scores, i.e.:

100 * (inc1 + inc2) / (exc)

Where:

inc1 = the number of reads mapping across the first junction
       supporting exon inclusion
inc2 = the number of reads across the second junction
exc  = the number of reads supporting exon exclusion
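As a small illustration, the score can be computed exactly as the formula above is written. The function name `inclusion_score`, the example counts, and the choice to return `None` when there are no exclusion reads are assumptions made for this sketch. Note that, as written, the score is a ratio of inclusion to exclusion reads scaled by 100, so it is not capped at 100.

```python
def inclusion_score(inc1, inc2, exc):
    """Exon inclusion score: 100 * (inc1 + inc2) / exc.

    inc1, inc2: reads across the two junctions supporting inclusion.
    exc: reads supporting exon exclusion.
    """
    if exc == 0:
        # Assumption: the score is treated as undefined without exclusion reads
        return None
    return 100 * (inc1 + inc2) / exc

# Hypothetical counts for one exon in one sample
score = inclusion_score(40, 50, 10)
```

Samples with `exc == 0` need a deliberate decision (drop them, or add a pseudocount) before scores are compared across species.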

However, it's a good idea to make sure that inc1 and inc2 have similar values, since large differences may indicate additional splicing events. For other types of events, you could try to find similar ways to calculate scores.

This approach has been used before in this paper.
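The consistency check on inc1 and inc2 mentioned above could be sketched as a simple fold-difference filter. The function name `junctions_balanced` and the default threshold of 2-fold are hypothetical; the answer does not say how large a difference should count as suspicious.

```python
def junctions_balanced(inc1, inc2, max_fold=2.0):
    """True if the two inclusion-junction counts differ by at most max_fold."""
    lo, hi = sorted((inc1, inc2))
    if hi == 0:
        return True    # no inclusion reads at either junction
    if lo == 0:
        return False   # reads at one junction only
    return hi / lo <= max_fold

# Hypothetical counts: the first exon looks consistent, the second does not
print(junctions_balanced(40, 50))
print(junctions_balanced(5, 60))
```

Exons that fail the check are better inspected by hand than scored automatically, since the imbalance may come from a different event such as an alternative splice site.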

13.7 years ago by Gww ★ 2.7k

Thanks for your response. I have a couple of questions regarding this, even though, for some reason, I am quite convinced by the way I do it. 1) What statistical test would you use to test whether a particular junction is "significant" or "interesting" in my sense, i.e., between these 4 species? I have 4 values for every ES event. 2) The problem arises again when I compare between species. I guess there would be a dependency on the total number of reads (or gene/exon length). Or do you find this a sufficiently good measure?

Thanks again!

13.7 years ago by Arun 2.4k