Question

Normalization of Gene Expression Using RNA-seq RPKM Values

2

11.2 years ago

J.F.Jiang ▴ 930

Hi all,

I am working with RNA-seq data. After mapping the reads to the reference and obtaining RPKM values for each gene, I want to normalize the expression values.

Starting from the RPKM values, I removed the rows (genes) with too many zeros, which left 12K gene expression profiles.

The ranges of RPKM are 0 to 1e-6, which cannot fit a normal distribution.

I tried two methods to normalize the expression profiles:

1) Assign the smallest value to the zeros, then log2-transform the data. The distribution looks roughly normal, but it is not actually standardized (it does not fit an N(0,1) distribution).

2) Transform the ranks of the expression values for each gene to the corresponding quantiles of an N(0,1) distribution. However, the resulting distributions did not look good enough.
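
For reference, method 2 is usually called a rank-based inverse normal transformation. A minimal sketch in plain Python follows, assuming one gene's RPKM values across samples are held in a list. The function name `rank_to_normal`, the `offset` of 0.5, and the example values are illustrative assumptions, not anything taken from the poster's pipeline.

```python
from statistics import NormalDist


def rank_to_normal(values, offset=0.5):
    """Map values to N(0, 1) quantiles by rank; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied values that starts at sorted position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # (rank - offset) / n stays strictly inside (0, 1), so inv_cdf is finite.
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]


# Hypothetical RPKM values for one gene across five samples.
rpkm_one_gene = [0.0, 0.0, 1.2, 3.4, 150.0]
z = rank_to_normal(rpkm_one_gene)
```

Note that tied zeros all receive the same transformed value, so a gene with many zeros will still show a spike after the transformation. That may be one reason the resulting profiles look poor.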

Does anyone have a better solution?

Thanks.

normalization expression rnaseq rpkm • 12k views

ADD COMMENT • link updated 11.2 years ago by Sean Davis 27k • written 11.2 years ago by J.F.Jiang ▴ 930

0

Just curious what is the need to transform the data into a normal distribution? There are normalization and analysis methods adapted for RNAseq data specifically -- Bullard et al (2010) is a good reference.

ADD REPLY • link 11.2 years ago by kristen.dang ▴ 10

0

Thank you for your comment. The method you referred to is mainly used to detect DE genes, which the authors claim works better than RPKM values. However, I want to use RPKM values to represent the expression of genes.

The reason for transforming the data to a normal distribution is that the variance of a gene fluctuates too much if I use RPKM values directly. For array expression data, one would use quantile normalization and then scaling to obtain a normal distribution. Is there a better solution for RNA-seq?
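
To make the array-style procedure mentioned above concrete, here is a minimal sketch of quantile normalization across samples followed by per-gene scaling. It assumes a small genes-by-samples matrix of already log-transformed values stored as nested lists. The function names and the toy matrix are hypothetical, and ties are handled naively (by gene order), unlike production implementations.

```python
from statistics import mean, stdev


def quantile_normalize(matrix):
    """matrix[g][s] is gene g in sample s; every sample gets the same distribution."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [mean(col[k] for col in sorted_cols) for k in range(n_genes)]
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for s, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda g: col[g])
        for k, g in enumerate(order):
            result[g][s] = reference[k]  # ties are broken by gene order in this sketch
    return result


def scale_rows(matrix):
    """Center each gene to mean 0 and unit sample variance; constant genes become 0."""
    scaled = []
    for row in matrix:
        m, sd = mean(row), stdev(row)
        scaled.append([(x - m) / sd if sd > 0 else 0.0 for x in row])
    return scaled


# Toy matrix: 3 genes (rows) by 3 samples (columns), hypothetical log2 values.
expr = [[1.0, 6.0, 3.0],
        [4.0, 2.0, 5.0],
        [7.0, 9.0, 11.0]]
qn = quantile_normalize(expr)
scaled = scale_rows(qn)
```

After `quantile_normalize`, each sample column contains the same set of values, only in a different gene order. Scaling then standardizes each gene across samples, which is a separate step with a different purpose.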

ADD REPLY • link 11.2 years ago by J.F.Jiang ▴ 930

0

When you mention that the rank transformation (method 2) didn't produce good enough results, can you talk more about what exactly you mean by that?

ADD REPLY • link 11.2 years ago by Devon Ryan 104k

0

The ranges of RPKM are 0 to 1e-6...

That seems like a very narrow range. Did you mean 0 to 1e6 instead?

ADD REPLY • link updated 5.8 years ago by Ram 44k • written 11.2 years ago by polarise ▴ 380

0

I ran into the same question. Some papers state in their methods sections that they applied a log2 transformation followed by mean-centering, for example Yue Li (2014) and TCGA-AML (2013). This normalization was already common in the microarray era. The Bioconductor R package affy provides log2-transformed expression values.

Another question is how to do this step. Many RPKM values are 0, and the log2 of 0 is -Inf. This post about log2-transformed RPKM says that you can use log2(x+1) or log2(x+0.25) instead.
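
As a rough illustration of the log2 plus mean-centering step described above, the sketch below adds a pseudocount before taking the logarithm so that zeros stay finite. The function name `log2_mean_center` and the example values are assumptions made for illustration; the pseudocount of 1 is one of the two options mentioned, and the choice between them is a judgment call.

```python
import math


def log2_mean_center(rpkm_row, pseudocount=1.0):
    """log2(x + pseudocount) for one gene, then subtract the gene's mean."""
    logged = [math.log2(x + pseudocount) for x in rpkm_row]
    center = sum(logged) / len(logged)
    return [v - center for v in logged]


# Hypothetical RPKM values for one gene across four samples.
row = [0.0, 1.0, 3.0, 7.0]
centered = log2_mean_center(row)  # log2 values 0, 1, 2, 3 minus their mean 1.5
```

A smaller pseudocount such as 0.25 spreads low values further apart on the log scale, so it is worth checking how sensitive downstream results are to that choice.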

My idea is simple. Hope it's helpful.

Thanks

ADD REPLY • link updated 5.8 years ago by Ram 44k • written 9.9 years ago by zju.whw ▴ 70

Ram · Answer 1 · 2013-09-12

4

11.2 years ago

Sean Davis 27k

See the voom() function in the Bioconductor limma package or the vst functionality in DESeq2.
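
Both functions are R functions and expect raw read counts rather than RPKM values. To give a sense of what the first part of voom() does, here is a hedged Python sketch of its log2 counts-per-million transform, log2((count + 0.5) / (library size + 1) * 1e6). It deliberately omits the mean-variance modelling and precision weights that voom() also computes, and it uses plain column sums as library sizes. The function name `log_cpm` and the toy counts are hypothetical.

```python
import math


def log_cpm(counts_per_sample):
    """counts_per_sample[s] holds the raw read counts of every gene in sample s."""
    out = []
    for counts in counts_per_sample:
        lib_size = sum(counts)  # total reads in this sample
        out.append([math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in counts])
    return out


# One hypothetical sample with three genes and 999 reads in total.
toy_counts = [[0, 10, 989]]
lcpm = log_cpm(toy_counts)
```

The 0.5 offset keeps zero counts finite, which addresses the same log-of-zero problem raised earlier in the thread. For real analyses the R implementations should be used rather than this sketch.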

ADD COMMENT • link updated 5.8 years ago by Ram 44k • written 11.2 years ago by Sean Davis 27k