Normalization of Gene Expression Using RNA-seq RPKM Values
11.2 years ago
J.F.Jiang ▴ 930

Hi all,

I am working with RNA-seq data. After mapping the reads to the reference and obtaining RPKM values for each gene, I want to normalize the expression values.

Starting from the RPKM values, I removed the rows with too many zeros and ended up with 12K gene expression profiles.

The ranges of RPKM are 0 to 1e-6, which cannot be fit to a normal distribution.

I tried two methods to normalize the expression profiles:

1) Assign the smallest observed value to the zero entries and then log2-transform the data. The distribution looks roughly normal, but it is not actually normalized (it does not fit an N(0, 1) distribution).

2) Transform the ranks of the expression values for each gene to the corresponding quantiles of an N(0, 1) distribution. However, the resulting distributions did not look good enough.
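Method 2 is often called a rank-based inverse normal transform. A minimal Python sketch of the idea, using only the standard library, might look like the following. The function name `rank_inverse_normal`, the 0.5 rank offset, and the sample values are assumptions for illustration, not the code actually used in this question.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.5):
    """Map values to N(0, 1) quantiles by rank; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # (rank - offset) / n always lies strictly between 0 and 1.
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]


# Made-up RPKM values for one gene across six samples.
rpkm_one_gene = [0.0, 0.0, 1.2, 3.4, 10.5, 250.0]
transformed = rank_inverse_normal(rpkm_one_gene)
```

Tied values, such as repeated zeros, all receive the same quantile, so a gene with many zero RPKM values cannot come out exactly normal under this transform.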

Does anyone have a better solution?

Thanks.

normalization expression rnaseq rpkm • 12k views

Just curious: why do you need to transform the data to a normal distribution? There are normalization and analysis methods designed specifically for RNA-seq data; Bullard et al. (2010) is a good reference.

Thank you for your comment. The method you referred to is mainly used to detect differentially expressed (DE) genes, for which the authors claim it works better than RPKM values. However, I want to use RPKM values to represent the expression of genes.

The reason for transforming the data to a normal distribution is that the variance of a gene fluctuates too much if I use the RPKM values directly. For arrays, one would apply quantile normalization to the expression values and then use scale to obtain a normal distribution. Are there any better solutions for RNA-seq?
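The array-style workflow mentioned here (quantile normalization across samples, then per-gene scaling) can be sketched in plain Python as follows. The function names, the tie handling (ties are broken by position, a simplification), and the toy values are assumptions for illustration. Scaling uses the sample standard deviation, which is what R's `scale` does by default.

```python
from statistics import mean, stdev


def quantile_normalize(columns):
    """Give every sample (column) the same distribution: the mean of the sorted columns."""
    n_genes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: average of the i-th smallest value across samples.
    reference = [mean(col[i] for col in sorted_cols) for i in range(n_genes)]
    normalized = []
    for col in columns:
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, gene_idx in enumerate(order):
            out[gene_idx] = reference[rank]
        normalized.append(out)
    return normalized


def scale_gene(values):
    """Center one gene to mean 0 and scale it to unit sample standard deviation."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


# Made-up log2 expression values: three samples (columns) by four genes.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 3.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
qn = quantile_normalize(samples)
gene_rows = [list(row) for row in zip(*qn)]  # one list per gene
scaled = [scale_gene(row) for row in gene_rows]
```

After `quantile_normalize`, every sample contains the same set of values, only assigned to different genes. Scaling each gene afterwards gives mean 0 and unit variance, but it does not by itself make the per-gene distribution normal.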

When you mention that the rank transformation (method 2) didn't produce good enough results, can you talk more about what exactly you mean by that?

The ranges of RPKM are 0 to 1e-6...

That seems like a very narrow range. Did you mean 0 to 1e6 instead?

I ran into the same question. Some papers state in their methods sections that they applied a log2 transformation followed by mean-centering, for example Yue Li (2014) and TCGA-AML (2013). This kind of normalization was already common in the microarray era. The Bioconductor R package affy provides log2-transformed expression values.

Another question is how to carry out this step. Many RPKM values are 0, and log2(0) is -Inf. This post about log2-transformed RPKM says that you can use log2(x+1) or log2(x+0.25).
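As a rough illustration of the log2-plus-pseudocount and mean-centering steps, here is a small Python sketch. The function names and the example values are made up for illustration; the pseudocount is a parameter so that either 1 or 0.25 can be used.

```python
import math
from statistics import mean


def log2_pseudocount(values, pseudocount=1.0):
    """Compute log2(x + pseudocount) so that zero RPKM values stay finite."""
    return [math.log2(v + pseudocount) for v in values]


def mean_center(values):
    """Subtract the mean so the values for one gene are centered on 0."""
    mu = mean(values)
    return [v - mu for v in values]


# Made-up RPKM values for one gene across four samples.
rpkm = [0.0, 1.0, 3.0, 7.0]
logged = log2_pseudocount(rpkm)    # log2(x + 1)
centered = mean_center(logged)
logged_quarter = log2_pseudocount(rpkm, pseudocount=0.25)  # log2(x + 0.25)
```

A smaller pseudocount spreads the low-expression values further apart on the log scale, so the choice matters most for genes with many near-zero values.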

My idea is simple. Hope it's helpful.

Thanks

11.2 years ago

See the voom() function in the Bioconductor limma package or the vst functionality in DESeq2.
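Both of these are R functions that operate on read counts rather than RPKM values. To give a feel for the transformation step only, the sketch below computes log2 counts per million with the offsets described for voom, log2((count + 0.5) / (library size + 1) * 1e6). It is a Python illustration with a hypothetical function name, not the limma implementation, and it leaves out the mean-variance modelling and precision weights that voom also produces.

```python
import math


def log_cpm(counts, lib_size=None):
    """log2 counts per million for one sample, using the offsets described for voom."""
    if lib_size is None:
        lib_size = sum(counts)  # assumption: library size is the column total
    return [math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in counts]


# Made-up read counts for three genes in one sample.
counts = [0, 10, 989]
values = log_cpm(counts)
```

Because of the 0.5 offset, a gene with zero reads still gets a finite value. For a real analysis, the R functions themselves are the appropriate tools.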
