I've read, and been told, that edgeR must take in raw counts only. However, I noticed that in the GeTMM method
(paper linked below), RPK values are being fed into the TMM
normalization procedure. Is this a correct use of TMM, given the assumptions the method makes?
Should the resulting values be used for downstream DGE analysis?
Note: I am no expert with these methods, but I wanted to ask the community.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2246-7
library(edgeR)
# x: first column holds gene length, remaining columns hold raw counts per sample
# calculate RPK (reads per kilobase)
rpk <- x[, 2:ncol(x)] / x[, 1]
# remove length col in x
x <- x[, -1]
# for normalization purposes, no grouping of samples
group <- rep("A", ncol(x))
# edgeR: TMM on raw counts
x.norm.edger <- DGEList(counts = x, group = group)
x.norm.edger <- calcNormFactors(x.norm.edger)
norm.counts.edger <- cpm(x.norm.edger)
# GeTMM: TMM on RPK values
rpk.norm <- DGEList(counts = rpk, group = group)
rpk.norm <- calcNormFactors(rpk.norm)
norm.counts.rpk_edger <- cpm(rpk.norm)
# Source:
# https://static-content.springer.com/esm/art%3A10.1186%2Fs12859-018-2246-7/MediaObjects/12859_2018_2246_MOESM4_ESM.docx
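For DGE, use raw counts, as the software requires: edgeR fits a negative binomial model to the counts, and TMM normalization factors are applied inside that model rather than by transforming the data beforehand. Other normalizations, such as the length-scaled values above, can be used for things like visualization.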
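To make that split concrete, here is a minimal sketch of how it could look in edgeR. The count matrix, the gene lengths, and the two-group design are all simulated and hypothetical, and the quasi-likelihood pipeline is just one common choice, not necessarily what the GeTMM authors did. Raw counts go into the test; length-scaled values are computed separately and used only for display.

```r
library(edgeR)

set.seed(1)
# Hypothetical data: 200 genes x 6 samples of simulated raw counts
n_genes <- 200
counts <- matrix(rnbinom(n_genes * 6, mu = 100, size = 5), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", 1:6)))
# Hypothetical gene lengths in base pairs
gene_length_bp <- round(runif(n_genes, 500, 5000))
# Hypothetical design: 3 controls vs 3 treated samples
group <- factor(rep(c("ctrl", "trt"), each = 3))

# DGE: raw integer counts go in; TMM factors are used inside the model
y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y)                  # drop genes with too few reads
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)                  # TMM on raw counts
design <- model.matrix(~ group)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
res <- glmQLFTest(fit, coef = 2)         # trt vs ctrl
top <- topTags(res)                      # top 10 genes by default

# Visualization only: length-scaled and log-scaled values
rpkm_vals <- rpkm(y, gene.length = gene_length_bp[keep])
logcpm_vals <- cpm(y, log = TRUE)
```

The `rpkm_vals` and `logcpm_vals` matrices would be what you pass to heatmaps, PCA, or clustering, while `top` comes from a model fitted on the untransformed counts.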