Hi there,
It occurred to me that RPKMs/FPKMs follow a normal distribution. Can you confirm that?
If so, that would mean I could run statistics designed for normal distributions on them, right? Why do people prefer the negative binomial distribution, then?
Is it legitimate to run statistics developed for microarrays on R/FPKMs?
Thanks
Perhaps you assume that for microarrays, but I expect what you mean is: "it is assumed that, after a suitable transformation (typically log2), the intensities approximately follow a normal distribution". In my experience, the data are not naturally normally distributed.
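To illustrate the effect of the transformation, here is a minimal sketch that compares the sample skewness of some right-skewed intensities before and after a log2 transform. The numbers are made up for illustration, not real microarray data, and the function name `sample_skewness` is just a hypothetical helper. A log transform typically reduces the skew, but it does not guarantee normality.

```python
import math


def sample_skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (biased estimator)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical right-skewed intensities (made-up numbers, not real data).
raw = [120, 150, 180, 220, 300, 410, 650, 1100, 2400, 7800]
logged = [math.log2(x) for x in raw]

print(f"skewness raw:  {sample_skewness(raw):.2f}")
print(f"skewness log2: {sample_skewness(logged):.2f}")
```

On these values the raw data are strongly right-skewed, while the log2 values are much closer to symmetric, though still not perfectly so.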
Which distribution the data follow always depends on the actual experiment and data. A statement like "data from super-technology always follow a (duper-)distribution" is meaningless. Instead, you can ask: "can the data be modelled as normally distributed, and what does that imply?" (one implication is that if the data are normal, you can use Student's t-test). There are also tests for normality (e.g. Shapiro-Wilk) that you could apply to your data to check whether your intuition holds.
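As a rough sketch of such a check, the Jarque-Bera test below needs only the Python standard library: under normality its statistic is approximately chi-squared with 2 degrees of freedom, whose survival function is exactly exp(-x/2). I picked Jarque-Bera only because it is easy to write out; in practice you would more likely call a ready-made test such as `scipy.stats.shapiro` in Python or `shapiro.test` in R. The simulated samples here are illustrative assumptions, not expression data.

```python
import math
import random


def jarque_bera(xs):
    """Jarque-Bera normality test; returns (statistic, asymptotic p-value).

    Under the null hypothesis of normality the statistic is approximately
    chi-squared with 2 degrees of freedom, whose survival function is
    exp(-x / 2). The approximation is poor for small samples.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


rng = random.Random(42)  # fixed seed so the run is reproducible
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]

for name, sample in (("normal", normal_sample), ("exponential", skewed_sample)):
    jb, p = jarque_bera(sample)
    print(f"{name:12s} JB = {jb:8.2f}  p = {p:.3g}")
```

Keep in mind that with very large samples any normality test will flag tiny, practically irrelevant deviations, and with only a handful of replicates it has little power to detect anything.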
Actually, this has been discussed here more than once; read the top-ranked questions in this list:
http://biostar.stackexchange.com/questions/tagged?tagnames=rna-seq&sort=votes&pagesize=50
In general, the consensus seems to be that statistical tests on raw counts are superior to tests on normalized values (e.g. RPKM or quantile-normalized data).
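One way to see why counts carry more information: RPKM (count × 10^9 / (total mapped reads × gene length in bp)) can be identical for a gene measured with 10 reads and with 1000 reads, even though the Poisson counting noise differs tenfold. Replicate counts are also usually overdispersed (variance > mean), which is what the negative binomial models via Var = mu + alpha·mu². The sketch below assumes made-up counts, and `nb_dispersion_mom` is a hypothetical helper using a crude method-of-moments estimate, not what count-based tools such as edgeR or DESeq2 actually do internally (they fit negative binomial models with more careful dispersion estimation).

```python
import math
import statistics


def rpkm(count, total_mapped_reads, gene_length_bp):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return count * 1e9 / (total_mapped_reads * gene_length_bp)


# Same hypothetical 2 kb gene in a shallow and a deep library.
shallow = rpkm(10, 1_000_000, 2000)     # 10 reads, 1 M mapped reads
deep = rpkm(1000, 100_000_000, 2000)    # 1000 reads, 100 M mapped reads
print(shallow, deep)  # both are 5.0

# Poisson counting noise depends on the raw count, not on the RPKM:
# the relative standard error is roughly 1 / sqrt(count).
for count in (10, 1000):
    print(count, round(1 / math.sqrt(count), 3))


def nb_dispersion_mom(counts):
    """Method-of-moments estimate of alpha in Var = mu + alpha * mu**2."""
    mu = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance (n - 1 denominator)
    # Clamp at 0: a variance below the mean gives no evidence of overdispersion.
    return max(0.0, (var - mu) / mu ** 2)


replicates = [80, 95, 130, 60, 150]  # hypothetical counts for one gene
print(statistics.mean(replicates), statistics.variance(replicates))
print(round(nb_dispersion_mom(replicates), 3))
```

Here the replicate variance (1345) is far above the mean (103), so a Poisson model would understate the noise, and the RPKM values alone would not tell a test how many reads stood behind each number.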
For microarrays it is assumed that intensities follow a normal distribution.