5.4 years ago
ma23
Hi everyone!
I have some data that describe the gene expression of several people.
I want to understand whether the distribution of the data can be modeled as a Poisson or a negative binomial distribution.
For the Poisson I use the following commands:
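n <- length(x)
lambda <- mean(x) # I use the MLE for the Poisson parameter
vals <- 0:max(x) # every count value from 0 up to the largest observed
f.obs <- as.vector(table(factor(x, levels = vals))) # observed frequency of each value
f.hyp <- dpois(vals, lambda) * n # expected frequency of each value under the Poisson
chiSquare.pois <- sum((f.obs - f.hyp)^2 / f.hyp)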
Am I right with this code?
How can I estimate the parameters for the negative binomial distribution and compare these two models (Poisson and negative binomial)?
I would probably start with the papers and source code of the established tools that model RNA-seq as NB, such as DESeq2 and edgeR, to get an impression of how and why they do it.

As pointed out, check previous work to understand why people decided on a particular distribution for a given data type. Typically, when trying to decide which distribution best approximates the data, visual tools (e.g. density and QQ plots) and goodness-of-fit tests (e.g. the chi-squared test) are used. For choosing between (families of) distributions, have a look at the R package fitdistrplus and its descdist() function.
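To make the parameter estimation and model comparison concrete, here is a minimal sketch in base R. It is only one possible approach, not what DESeq2 or edgeR do internally. The assumptions are mine: `x` is simulated here as a stand-in for the counts of a single gene, the negative binomial is fitted by numerical maximum likelihood with `optim()`, and AIC is used as the comparison criterion.

```r
# Simulated overdispersed counts stand in for real expression data (assumption).
set.seed(1)
x <- rnbinom(200, size = 2, mu = 10)

# Poisson: the MLE of lambda is the sample mean.
lambda_hat <- mean(x)
ll_pois <- sum(dpois(x, lambda_hat, log = TRUE))

# Negative binomial: minimise the negative log-likelihood over (size, mu).
# Both parameters are optimised on the log scale so they stay positive.
nb_negll <- function(par) {
  -sum(dnbinom(x, size = exp(par[1]), mu = exp(par[2]), log = TRUE))
}
fit_nb <- optim(c(0, log(mean(x))), nb_negll)
size_hat <- exp(fit_nb$par[1])  # dispersion parameter ("size" in dnbinom)
mu_hat <- exp(fit_nb$par[2])    # mean
ll_nb <- -fit_nb$value

# AIC = 2k - 2 * logLik, with k = 1 for the Poisson and k = 2 for the NB.
aic_pois <- 2 * 1 - 2 * ll_pois
aic_nb <- 2 * 2 - 2 * ll_nb

print(c(lambda = lambda_hat, size = size_hat, mu = mu_hat))
print(c(AIC_poisson = aic_pois, AIC_negbin = aic_nb))
```

The model with the lower AIC is preferred. With real data, replace the simulated `x` with your own count vector; if the variance is much larger than the mean, the negative binomial will usually come out ahead.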