I have a time course RNAseq data from 8 time-points with 3 replicates per sample. These data were aligned with STAR, I get the featureCount output. With these file, I wan't to use MasigPro package to evaluate which genes are moving during this process. But I've got a problem with the notion of negative binomial model. I read a lot of post to understand that the RNAseq libraries follow this model due to differences between biological replicates inducing that variance differs from mean.
I've got a table with sample in column and gene in row, and effectively when I make a plot with log(row mean) vs log(var mean) I can see the deviation from the line mean=var.
But in this package, they said "θ must be specified and it can be computed by using available methods as edgeR" I understand that Theta is a parameter for the glm (seem to be the "shape" of the glm). But I don't know how to determine it. I read that MASS package can do this, I follow this tutoriel to better understand: https://stats.stackexchange.com/questions/10419/what-is-theta-in-a-negative-binomial-regression-fitted-with-r
but It not seem to work if i make
mean <- apply(data, 1, mean)
var <- apply(data, 1, var)
mod.NB<-glm.nb(mean~var)
Error in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
NA/NaN/Inf dans 'x'
De plus : There were 50 or more warnings (use warnings() to see the first 50)
Does-anybody can explain that, or I miss something ?
Thanks a lot nicolas
I don't know the package you're using but glm.nb estimates theta from the data when given a fixed mean using the function theta.md. Also why do you want to regress the mean on the variance ? Besides this, you should give a reproducible example including data and code.