Dear all,
I am currently researching the differences in RNA-Seq data analysis, comparing the two well known EdgeR and Voom methods. However, there is one thing I can not manage to reproduce, namely the logCPM value in the output of the LRT table of EdgeR, after analyzing a certain contrast or coefficient. I understand from the manual and helpfile, that this logCPM value is a log2 counts per million, normalized for library sizes. I also understand that it is not a simple mean that is being used, but rather the mglmOneGroup() function. However, when I try to recalculate this myself, I can not obtain the same value. My workflow in doing so looks like this:
#Calculate through the table
counts <- read.delim("file.txt")
Treat <- factor(rep(c(rep(c("Cont","DPN","OHT"),4)),each=2))[-19] #Delete removed sample 19 by authors
y.s <- DGEList(counts=counts.filtered.smart,group=Treat)
y.s <- calcNormFactors(y.s)
y.common <- estimateGLMCommonDisp(y, design, verbose=TRUE)
y.trended <- estimateGLMTrendedDisp(y.common, design)
y.tagwise <- estimateGLMTagwiseDisp(y.trended, design,prior.df=22)
lrt <- glmLRT(fit,coef= c(5,6,9,8))
lrt$table$logCPM
#Calculate manually
cpmstest <- cpm(y, normalized.lib.sizes = TRUE)
LogCpmstest <- log(cpmstest + 0.5, base = 2)
mglmOneGroup(LogCpmstest) #not identical
mglmOneGroup(y.tagwise) #also non-identical
Has anyone got any recommendations in what should be changed in this manual calculation workflow? What am I doing wrong? Any help would be greatly appreciated.
Sincerely, Koen.
Thank you for adding this response to the user's question!