Question

Do Rpkm / Fpkm Follow Normal Distribution And Can I Run Microarray Stats On Them?

4

13.4 years ago

Student ▴ 50

Hi there,

I have come to understand that RPKMs/FPKMs follow a normal distribution. Can you confirm that?

So that would mean I could run statistics designed for normally distributed data on them, right? Why do people prefer the negative binomial distribution, then?

Is it legitimate to run statistics developed for microarrays on RPKMs/FPKMs?

Thanks

rna fpkm rpkm microarray statistics • 8.4k views

ADD COMMENT • link updated 13.4 years ago by Philippe ★ 1.9k • written 13.4 years ago by Student ▴ 50

2

Perhaps you assume that, but I expect what you mean is "it is assumed that after transforming the data to be normally distributed the intensities follow a normal distribution". It's not my experience that the data are naturally normally distributed.

ADD REPLY • link 13.4 years ago by David Quigley 11k

1

Which distribution the data follow always depends on the actual experiment and data. A statement like "data from super-technology always follow a (duper-)distribution" is void. What you can ask is "can the data be modelled as normally distributed, and what does that imply?" (one implication is that if the data are normal, you can use Student's t-test). There are also tests for normality that you could apply to your data to check whether your intuition holds.

ADD REPLY • link 13.4 years ago by Michael 56k

1

Actually, this has been discussed here more than once; read the top-ranked questions in this list: http://biostar.stackexchange.com/questions/tagged?tagnames=rna-seq&sort=votes&pagesize=50 In general, the consensus seems to be that statistical tests on counts are superior to tests on normalized data (e.g. RPKM, quantile-normalized).

ADD REPLY • link 13.4 years ago by Michael 56k

0

For microarrays it is assumed that intensities follow a normal distribution.

ADD REPLY • link 13.4 years ago by Student ▴ 50

Ram · Answer 1 · 2012-03-23

Hi,

As Michael Dondrup said, it depends on the data/experiment. For example, a study by Hebenstreit et al. reported a bimodal distribution of RPKM values for mouse Th2 cells. I think I have read and seen a bit of everything concerning the distribution followed by RPKM values, from normal to power law.

Therefore, the best thing is to determine for yourself whether the data you are studying follow a normal distribution. As Michael mentioned, some statistical tests are available for this purpose. You can see this Wikipedia article for an introduction to the topic.

Nonetheless, I would like to draw your attention to the fact that these tests are sensitive to sample size. As shown in this blog post, data that look like a perfect example of a normal distribution can still fail some normality tests. With many observations, even a negligible departure from normality can give a significant result.
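
To make the sample-size point concrete, here is a minimal sketch using only the Python standard library. It is not taken from the blog post: the data are simulated, the helper names (`jarque_bera`, `slightly_skewed_sample`) and the distortion parameter `a` are my own assumptions, and I chose the Jarque-Bera test only because it is easy to write by hand (in practice a library routine such as `scipy.stats.shapiro` or `scipy.stats.jarque_bera` would normally be used). The same mildly skewed distribution is sampled at two sizes and tested both times.

```python
import math
import random

def jarque_bera(values):
    """Return (JB statistic, p-value) for the Jarque-Bera normality test."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exactly exp(-x / 2).
    return jb, math.exp(-jb / 2.0)

def slightly_skewed_sample(n, rng, a=0.05):
    """Simulated data (assumption): a standard normal with a small
    quadratic distortion, so it is close to normal but not exactly."""
    sample = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        sample.append(z + a * z * z)
    return sample

rng = random.Random(0)  # fixed seed so the run is reproducible
small = slightly_skewed_sample(50, rng)
large = slightly_skewed_sample(20000, rng)

jb_small, p_small = jarque_bera(small)
jb_large, p_large = jarque_bera(large)
print(f"n=50:    JB={jb_small:.2f}  p={p_small:.3g}")
print(f"n=20000: JB={jb_large:.2f}  p={p_large:.3g}")
```

The large sample comes from exactly the same distribution as the small one, yet its p-value is driven towards zero simply because there are enough observations to detect the small skew. Note that the chi-square approximation used here is itself rough for small samples.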

A good way to check the normality of your distribution is visual inspection. Look at the shape of the distribution (is it bell-shaped?) and at QQ-plots, which compare the quantiles of your data with those of a theoretical normal distribution. You just have to be careful when you interpret them.
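
As a rough illustration of what a QQ-plot compares, the sketch below computes the QQ points by hand with the Python standard library and summarises how straight the plot is with a correlation coefficient. Everything here is an assumption for illustration: the values are simulated from a log-normal distribution (they are not real FPKMs), the `log2(x + 1)` transform is just one common choice, and `qq_points`, `pearson` and `qq_correlation` are hypothetical helper names. To actually draw the plot you would pass the two lists to your plotting library of choice.

```python
import math
import random
from statistics import NormalDist

def qq_points(values):
    """Return (theoretical normal quantiles, sorted sample values)."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n stay strictly between 0 and 1,
    # where the inverse CDF is defined.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, ordered

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def qq_correlation(values):
    """Correlation of the QQ points; close to 1 means a nearly straight line."""
    theoretical, ordered = qq_points(values)
    return pearson(theoretical, ordered)

rng = random.Random(1)  # fixed seed so the run is reproducible
# Simulated right-skewed values standing in for expression data (assumption).
raw_values = [rng.lognormvariate(2.0, 1.0) for _ in range(500)]
log_values = [math.log2(v + 1.0) for v in raw_values]

r_raw = qq_correlation(raw_values)
r_log = qq_correlation(log_values)
print(f"QQ correlation, raw values:  {r_raw:.3f}")
print(f"QQ correlation, log2(x + 1): {r_log:.3f}")
```

A single number like this does not replace looking at the plot: curvature at the tails, which is where the interpretation gets tricky, is much easier to see by eye.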

Finally, if you are not sure your data follow a normal distribution, you can use rank-based non-parametric tests such as the Mann-Whitney U test (instead of the t-test) or Spearman correlation (instead of Pearson correlation). These tests do not assume that your data are normally distributed. As for the microarray analysis methods, you just have to make sure that their prerequisites are also fulfilled by your current RNA-Seq data.
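
To show why a rank-based test does not care about the shape of the distribution, here is a small hand-written sketch of the Mann-Whitney U test in standard-library Python. The expression values are made up, the function names are my own, and the p-value uses the normal approximation with a tie correction and no continuity correction, which is crude for samples this small. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used.

```python
import math
from collections import Counter

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation).

    Assumes the pooled values are not all identical (variance would be 0).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    # Tie correction: each group of t tied values reduces the variance.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2.0) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided
    return u1, p_value

# Made-up expression values for one gene in two conditions (assumption).
control = [1.2, 3.4, 2.2, 0.8, 2.9]
treated = [5.1, 7.3, 4.4, 6.0, 9.8]

u, p = mann_whitney_u(control, treated)
print(f"U = {u}, p = {p:.4f}")
```

Only the ordering of the values enters the calculation, so any monotonic transform of the data (a log transform, for instance) gives the same U and the same p-value.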