Entering edit mode
5.3 years ago
ma23
▴
40
Hi all! I have a table of counts named countDF. I want to take one row from the table and check if it obeys the law of the negative distribution. Here is the code I use:
avector <- as.vector(as.numeric(countDF[rownames(countDF)=='ENSG00000223972',]))
nbfit = fitdistr(avector,'negative binomial')
r.fit = nbfit$estimate['size']
p.fit = r.fit / (nbfit$estimate['mu'] + r.fit)
f.obs = table(avector)
bins = as.numeric(names(f.obs))
f.hyp = dnbinom(bins,size=r.fit,prob=p.fit)*nsamples
chiSquare.nb = sum((f.obs-f.hyp)^2/f.hyp)
dof <- length(bins)-1
q <- qchisq(p = 0.95, df = dof)
So I understand if the value of chiSquare.nb is larger than the value of q, then the row can be described by the negative binomial distribution. I expect my data to fit neg.binomial distribution well, but the thing is, I get chiSquare.nb=24089.75 and q=21.02607. Is something wrong with my code ? Can anything be done to fix the problem ?
Is this RNA-seq data?
Yes, it is RNA-seq data.
I think you mean "if the value of chiSquare.nb is smaller than the value of q"? A large chisq means a less good fit.
You might find you get more helpful advice on this from stats.stackexchange.com
Have you checked that
fitdistr
is doing anything approaching a good job of fitting the distribution?The alternative, of course, is that the data is not negative bionmially distributed. Which might happen, if, for example, you had two different conditions in the data.
I've plotted it. It fits not so well of course but also not very bad..