Question

edgeR customized normalization for differential analysis

0

Entering edit mode

3 months ago

Wu-Sheng Zhang • 0

I am trying perform differential analysis with edgeR on sequencing data across two conditions, each of which has two replicates. I first try the following scripts learned from edgeR.

d0 <- DGEList(count_mat)
d <- calcNormFactors(d0)
mm <- model.matrix(~0 + group)
y <- voom(d, mm, plot = TRUE)
fit <- lmFit(y, mm)
contr <- makeContrasts(grouptreated - groupuntreated, levels = colnames(coef(fit)))
tmp <- contrasts.fit(fit, contr)
tmp <- eBayes(tmp)
top.table <- topTable(tmp, sort.by = "P", n = Inf)

This works well but I need to customize normalization methods because we are not working on RNA-seq data but other sequencing data and we intend to explore different normalization methods. So, I am wondering if I can prepare y by myselft but not with voom(), and using my customized y as the input of lmFit().

I make the following attempts: if lmFit just uses the normalized reads from d0 which I think should be y$E, then it should work if I perform fit_new <- lmFit(y$E, mm), and fit should be the same as fit_new. However, they are not equivalent. So, here are my questions:

If I have a customized normalized reads matrix -- each row is a gene, and each column is a replicate under a condition -- how I can wrap it up as a lmFit accepted object.
Why fit_new <- lmFit(y$E, mm) and fit <- lmFit(y, mm) are not equivalent?

Besides, I am also wondering if I use the default settings of voom() as the script I showed above, what exactly normalization voom() has performed? At the beginning, I believed that it is log2 CPM with prior count=0.5. However, I found that y$E (from y <- voom(d, mm, plot = TRUE) is different from cpm_d <- cpm(d, log=TRUE, prior.count=0.5) (both d here are from d <- calcNormFactors(d0)), so I am wondering if it implies that the default settings of voom() is different from log2 CPM with prior count=0.5.

Finally, could anyone reveal the math equations of the normalization methods of the default settings of voom() and cpm(d, log=TRUE, prior.count=0.5) so that I can understand more about it. I checked the source code of cpm() but it just show out <- .Call(.cxx_calculate_cpm_log, y, lib.size, prior.count) and I cannot find .cxx_calculate_cpm_log.

Thank you very much!

voom CPM normalization edgeR DGE • 504 views

ADD COMMENT • link updated 3 months ago by ATpoint 87k • written 3 months ago by Wu-Sheng Zhang • 0

score 1 · Answer 1 · 2024-12-22

voom indeed converts counts to logCPM with a constant prior of 0.5. In cpm() the prior is scaled to library size. The magic of voom is the precision weights, so a constant prior is not a problem.

For 1) If your counts are normalized and on log2-scale then simply use limma-trend, aka eBayes() (or treat) with the trend = TRUE option. The performance and inference is very similar to voom, see the limma papers.

For 2) Because the first one does not, but the second uses the precision weights from voom which are the actual magic behind the method.

For finally) see the documentation of cpm().