Chi Square test for negative binomial distribution using R.

0

Entering edit mode

5.3 years ago

ma23 ▴ 40

Hi all! I have a table of counts named countDF. I want to take one row from the table and check if it obeys the law of the negative distribution. Here is the code I use:

avector <- as.vector(as.numeric(countDF[rownames(countDF)=='ENSG00000223972',]))
nbfit = fitdistr(avector,'negative binomial')
r.fit = nbfit$estimate['size']
p.fit = r.fit / (nbfit$estimate['mu'] + r.fit)

f.obs = table(avector)
bins = as.numeric(names(f.obs))
f.hyp = dnbinom(bins,size=r.fit,prob=p.fit)*nsamples
chiSquare.nb = sum((f.obs-f.hyp)^2/f.hyp)
dof <- length(bins)-1
q <- qchisq(p = 0.95, df = dof)

So I understand if the value of chiSquare.nb is larger than the value of q, then the row can be described by the negative binomial distribution. I expect my data to fit neg.binomial distribution well, but the thing is, I get chiSquare.nb=24089.75 and q=21.02607. Is something wrong with my code ? Can anything be done to fix the problem ?

R negative binomial • 2.6k views

ADD COMMENT • link 5.3 years ago by ma23 ▴ 40

0

Entering edit mode

Is this RNA-seq data?

ADD REPLY • link 5.3 years ago by ATpoint 85k

0

Entering edit mode

Yes, it is RNA-seq data.

ADD REPLY • link 5.3 years ago by ma23 ▴ 40

0

Entering edit mode

I think you mean "if the value of chiSquare.nb is smaller than the value of q"? A large chisq means a less good fit.

You might find you get more helpful advice on this from stats.stackexchange.com

ADD REPLY • link 5.3 years ago by i.sudbery 20k

0

Entering edit mode

Have you checked that fitdistr is doing anything approaching a good job of fitting the distribution?

The alternative, of course, is that the data is not negative bionmially distributed. Which might happen, if, for example, you had two different conditions in the data.