I want to create a matrix of normalized read counts using the RLE method from edgeR or DESeq.
Below are the commands I am using to normalize my raw read counts with RLE in edgeR. How do I output the matrix of normalized counts? I cannot find the object from which the table should be extracted.
```r
library(edgeR)
x <- read.delim("my_path/raw_counts.txt", row.names = "symbol")
## filter out lowly expressed genes: keep genes whose total count across all samples exceeds 50
keep <- rowSums(x) > 50
x <- x[keep, ]
dim(x)
norm_factors <- calcNormFactors(as.matrix(x), method = "RLE")
group <- c(rep("A", 24), rep("B", 25), rep("C", 23))
y <- DGEList(counts = x, group = group, norm.factors = norm_factors)
design <- model.matrix(~group)
```
Now I want to extract the normalized count matrix from here. How do I do that?
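In edgeR the usual way to get normalized values out of `y` is `cpm(y)`, which (with its default `normalized.lib.sizes = TRUE`) divides each column by its effective library size, `y$samples$lib.size * y$samples$norm.factors`, and scales to one million. As a minimal sketch, assuming edgeR's `method = "RLE"` works as I understand it (per-sample median ratio to the gene-wise geometric mean, divided by the library size, then rescaled so the factors multiply to one), here is the same arithmetic in base R on a made-up matrix. `rle_norm_factors` and `toy` are hypothetical names, not edgeR functions.

```r
# Sketch of edgeR-style RLE normalization factors and CPM in base R.
# Assumption: mirrors calcNormFactors(method = "RLE") and cpm() as I understand
# them; with the real DGEList from the question, just use cpm(y).
rle_norm_factors <- function(counts) {
  gm <- exp(rowMeans(log(counts)))        # gene-wise geometric mean (0 if any zero)
  ratio_median <- apply(counts, 2, function(u) median((u / gm)[gm > 0]))
  f <- ratio_median / colSums(counts)     # divide by the library size ...
  f / exp(mean(log(f)))                   # ... then rescale so prod(f) == 1
}

# Hypothetical counts: s2 and s3 are 2x and 3x copies of s1 for genes 1-4;
# gene 5 contains a zero, so it is ignored when computing the factors.
toy <- cbind(s1 = c(10, 40, 5, 80, 0),
             s2 = c(20, 80, 10, 160, 3),
             s3 = c(30, 120, 15, 240, 1))

nf      <- rle_norm_factors(toy)
eff_lib <- colSums(toy) * nf              # lib.size * norm.factors
cpm_toy <- t(t(toy) / eff_lib) * 1e6      # counts per million of effective library
print(round(cpm_toy, 1))
```

With the objects from the question, something like `write.table(cpm(y), "normalized_counts.txt", sep = "\t", quote = FALSE)` would write the matrix to disk; the file name is only an example.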
Also, the default normalization in DESeq is the RLE method. I would like to extract the normalized matrix from DESeq as well; I just need a normalized matrix from either method. How do I do that? Below is the DESeq code:
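```r
library(DESeq)
## note: the group sizes here (25/24/23) differ from the edgeR code (24/25/23); both should match the real sample order
cds <- newCountDataSet(x, c(rep("A", 25), rep("B", 24), rep("C", 23)))
disp <- estimateSizeFactors(cds)    # RLE (median-of-ratios) size factors
disp1 <- estimateDispersions(disp)
```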
How do I now extract the matrix of normalized counts for my samples?
Can anyone tell me how to extract the normalized read count matrix from either of the above methods? It would be very helpful.
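For the DESeq code, the size factors from `estimateSizeFactors()` are all that is needed; dispersions play no part in normalized counts. `sizeFactors(disp)` returns the factors and `counts(disp, normalized = TRUE)` returns the count matrix with each column divided by its size factor (DESeq2 uses the same `counts(dds, normalized = TRUE)` call). As a sketch of what that computes, assuming the standard median-of-ratios definition, here is the arithmetic in base R; `median_of_ratios` is a hypothetical helper and the matrix is made up.

```r
# Sketch of DESeq's median-of-ratios size factors in base R.
# Assumption: mirrors estimateSizeFactors() as I understand it; with the real
# object from the question, use sizeFactors(disp) and counts(disp, normalized = TRUE).
median_of_ratios <- function(counts) {
  log_geomean <- rowMeans(log(counts))    # -Inf for genes containing a zero
  usable <- is.finite(log_geomean)
  apply(counts, 2, function(cnts)
    exp(median((log(cnts) - log_geomean)[usable])))
}

# Hypothetical counts: s2 and s3 are 2x and 3x copies of s1 for genes 1-4.
toy <- cbind(s1 = c(10, 40, 5, 80, 0),
             s2 = c(20, 80, 10, 160, 3),
             s3 = c(30, 120, 15, 240, 1))

sf          <- median_of_ratios(toy)
norm_counts <- t(t(toy) / sf)             # each column divided by its size factor
print(round(sf, 3))
print(round(norm_counts, 1))
```

Unlike CPM, these values stay on roughly the scale of the raw counts, because the size factors are close to 1 rather than being library sizes.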
Thanks @Devon Ryan, but the two give different normalized read counts. Ideally they should be the same, right? DESeq uses RLE normalization by default and I am forcing the same method in edgeR. Can you tell me why the two outputs differ? Is there anything wrong in my edgeR code for obtaining normalized read counts with the RLE method?
They implement RLE slightly differently, so floating-point arithmetic could lead to slight differences. Also, it looks like cpm() multiplies the normalization factor by the library size, which is appropriate for TMM but probably not for RLE. If you do the following in edgeR, are the results more similar to what you get from DESeq(2)?
```r
as.matrix(y$counts)/y$samples$norm.factors
```
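One concrete way the two RLE implementations can diverge, assuming I am reading both correctly: edgeR takes the median of the ratios `count / geometric mean`, while DESeq takes the median of the *log* ratios and exponentiates it. With an odd number of usable genes both pick the same middle gene, but with an even number `median()` averages the two middle values, arithmetically in one case and geometrically in the other. The ratio values below are made up for illustration, not output from either package.

```r
# Hypothetical per-gene ratios (count / gene-wise geometric mean) for one sample.
ratios_odd  <- c(0.8, 0.9, 1.1, 1.3, 2.0)
ratios_even <- c(0.8, 0.9, 1.1, 1.3)

edger_style <- function(r) median(r)             # median of ratios
deseq_style <- function(r) exp(median(log(r)))   # exp of median log-ratio

c(edgeR = edger_style(ratios_odd),  DESeq = deseq_style(ratios_odd))   # identical
c(edgeR = edger_style(ratios_even), DESeq = deseq_style(ratios_even))  # 1 vs sqrt(0.9 * 1.1), about 0.995
```

With thousands of genes the two middle ratios are usually very close, so the resulting difference is small.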
For DESeq with the default RLE, below are the normalized read counts of one gene for just 3 samples from the matrix:
With edgeR, using:
```r
as.matrix(y$counts)/y$samples$norm.factors
```
Below are the raw read counts for the same gene and samples:
The values are slightly different but quite close. I was expecting the output to be identical for edgeR and DESeq when both use RLE normalization, but it is only similar. What do you think, @Devon Ryan?
Looking at what edgeR does internally, I don't see a simple way to get the equivalent of the plain normalized counts you would get from DESeq2. edgeR computes the same size factor/normalization factor, but then divides it by the library size and rescales the factors so they multiply to 1 before storing them, whereas DESeq2 stores the size factors as they are. That said, cpm is also useful for graphing, which should be all you are using normalized counts for anyway.
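To make this concrete, here is a sketch under the same assumptions about both packages' internals. If edgeR stores `norm.factors = (s / lib.size) / c`, where `s` is the RLE size factor and `c` is the constant that makes the factors multiply to 1, then `lib.size * norm.factors` equals `s / c`: DESeq's size factors up to one constant shared by all samples. So `cpm(y)` is proportional to DESeq's normalized counts, with the same proportionality constant for every sample. The helper names are hypothetical, and the matrix is made up with an odd number of zero-free genes so that both medians pick the same gene.

```r
# Sketch: relate edgeR-style RLE norm.factors to DESeq-style size factors.
# Both helpers re-implement the packages' arithmetic as I understand it.
deseq_sf <- function(counts) {
  lg <- rowMeans(log(counts))
  apply(counts, 2, function(x) exp(median((log(x) - lg)[is.finite(lg)])))
}
edger_rle_nf <- function(counts) {
  gm <- exp(rowMeans(log(counts)))
  f <- apply(counts, 2, function(u) median((u / gm)[gm > 0])) / colSums(counts)
  f / exp(mean(log(f)))
}

# Hypothetical counts for 5 genes (no zeros) in 3 samples.
toy <- cbind(s1 = c(12, 45, 7, 90, 30),
             s2 = c(25, 70, 9, 200, 41),
             s3 = c(18, 60, 15, 110, 22))

sf      <- deseq_sf(toy)
eff_lib <- colSums(toy) * edger_rle_nf(toy)   # lib.size * norm.factors
print(eff_lib / sf)                           # the same constant for every sample

# Count-scale values from the edgeR quantities: rescale the effective
# library sizes to geometric mean 1 before dividing.
sf_from_edger   <- eff_lib / exp(mean(log(eff_lib)))
norm_from_edger <- t(t(toy) / sf_from_edger)
norm_deseq      <- t(t(toy) / sf)
print(norm_from_edger / norm_deseq)  # constant: the geometric mean of DESeq's size factors
```

The remaining constant only shifts every sample by the same factor, so it does not affect comparisons between samples or plots on a log scale.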