I am trying perform differential analysis with edgeR
on sequencing data across two conditions, each of which has two replicates. I first try the following scripts learned from edgeR
.
d0 <- DGEList(count_mat)
d <- calcNormFactors(d0)
mm <- model.matrix(~0 + group)
y <- voom(d, mm, plot = TRUE)
fit <- lmFit(y, mm)
contr <- makeContrasts(grouptreated - groupuntreated, levels = colnames(coef(fit)))
tmp <- contrasts.fit(fit, contr)
tmp <- eBayes(tmp)
top.table <- topTable(tmp, sort.by = "P", n = Inf)
This works well but I need to customize normalization methods because we are not working on RNA-seq data but other sequencing data and we intend to explore different normalization methods. So, I am wondering if I can prepare y
by myselft but not with voom()
, and using my customized y
as the input of lmFit()
.
I make the following attempts: if lmFit just uses the normalized reads from d0
which I think should be y$E
, then it should work if I perform fit_new <- lmFit(y$E, mm)
, and fit
should be the same as fit_new
. However, they are not equivalent. So, here are my questions:
- If I have a customized normalized reads matrix -- each row is a gene, and each column is a replicate under a condition -- how I can wrap it up as a
lmFit
accepted object. - Why
fit_new <- lmFit(y$E, mm)
andfit <- lmFit(y, mm)
are not equivalent?
Besides, I am also wondering if I use the default settings of voom()
as the script I showed above, what exactly normalization voom()
has performed? At the beginning, I believed that it is log2 CPM with prior count=0.5. However, I found that y$E
(from y <- voom(d, mm, plot = TRUE)
is different from cpm_d <- cpm(d, log=TRUE, prior.count=0.5)
(both d
here are from d <- calcNormFactors(d0)
), so I am wondering if it implies that the default settings of voom()
is different from log2 CPM with prior count=0.5.
Finally, could anyone reveal the math equations of the normalization methods of the default settings of voom()
and cpm(d, log=TRUE, prior.count=0.5)
so that I can understand more about it. I checked the source code of cpm()
but it just show out <- .Call(.cxx_calculate_cpm_log, y, lib.size, prior.count)
and I cannot find .cxx_calculate_cpm_log
.
Thank you very much!