Hey, guys.
I am used to analysing data starting from raw reads. Now I have a table from GEO Datasets containing CPM-normalized counts produced by edgeR. I can proceed normally from it, without calcNormFactors(), right?
Like
> x <- read.delim("Table.csv",row.names="Gene")
> group <- factor(c(1,1,1,2,2,2))
> y <- DGEList(counts=x,group=group)
> design <- model.matrix(~group)
> y <- estimateDisp(y,design)
> et <- exactTest(y)
Thank you very much for any help.
You need raw counts for edgeR and DESeq2, primarily because they normalize for library size. Your best bet would be to reanalyze the data if they don't have raw counts.
Thank you for your answer.
Actually, the data is already normalized by edgeR, so it is normalized for library size. The only difference is that they provided the CPM table, not the raw table.
Even so, should I reanalyze the data?
Thanks again
Hi,
If you have CPM-normalized counts from edgeR, then they should already be normalized for library size. You do not need to run calcNormFactors() again; you can see here: https://reneshbedre.github.io/blog/expression_units.html#tmm-trimmed-mean-of-m-values
Yes, you should reanalyze the data. CPM is considered a simple summary statistic, but calcNormFactors() does much more. Compare what cpm() does with what calcNormFactors() does. Indeed, the TMM output of calcNormFactors() is a sort of between-sample normalization method, which is very important for differential expression analysis, while CPM provides within-sample normalization stats.
The data was normalized using calcNormFactors(). That is, the CPM is based on normalized values. The only difference is that I do not have the raw data to perform the whole process, only the CPM table.
In short, what I want to know is how to use the CPM table in edgeR. Is it possible? Can I simply use the basic code that I wrote above?
Thank you again
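For anyone following the thread, the distinction drawn in the replies between plain CPM and CPM computed after calcNormFactors() can be sketched in base R, without edgeR itself. This is only an illustration under stated assumptions: the `counts` matrix is made-up toy data, and `norm_factors` is a hypothetical stand-in for the scaling factors that calcNormFactors() would estimate (they are not computed by TMM here). It shows that plain CPM divides each sample by its own library size, while the normalized version divides by the effective library size (library size times scaling factor).

```r
# Toy raw counts: 4 genes x 3 samples (made-up numbers, for illustration only)
counts <- matrix(c(10, 20, 30, 40,
                   100, 200, 300, 400,
                   5, 50, 500, 5000),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Library size = total counts per sample
lib_size <- colSums(counts)

# Plain CPM: each column is divided by its own library size
cpm_plain <- t(t(counts) / lib_size) * 1e6

# Hypothetical scaling factors standing in for calcNormFactors() output
# (assumed values whose product is 1; NOT estimated by TMM here)
norm_factors <- c(0.9, 1.0, 1.0 / 0.9)

# CPM on effective library sizes (lib_size * norm_factors)
cpm_scaled <- t(t(counts) / (lib_size * norm_factors)) * 1e6

# Plain CPM columns all sum to one million, whatever the sequencing depth was
print(colSums(cpm_plain))

# Scaled CPM columns sum to 1e6 / norm_factors instead
print(colSums(cpm_scaled))
```

In this toy example, samples s1 and s2 end up with identical CPM values even though s2 was sequenced ten times deeper, so the original depth cannot be read back from a CPM table alone. That is one reason a CPM table is not a drop-in replacement for the raw counts that DGEList() expects.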