Hello all,
To normalize my read count data I used two different approaches:
1) I normalized the counts with DESeq2 and then log2-transformed them.
2) I only log2-transformed the read counts (without normalizing with DESeq2).
I noticed that the outputs are pretty much the same! Can anyone tell me why that is? If the output is the same, why do we use DESeq2 to normalize at all? Why not just do the log2 transformation?
This is part of my read count data:
WT1 WT2 WT3 ACA1 ACA2 ACA3
ENSMUSG00000022857 61 27 54 733 238 332
ENSMUSG00000094478 1 321 0 0 2 0
ENSMUSG00000096410 1225 1319 648 126 32 119
1) I normalized them using DESeq2 and then transformed to log2:
my script:
library(DESeq2)
# build the DESeq2 object; the first column of read.count holds the gene IDs
cds <- DESeqDataSetFromMatrix(countData = read.count[,-1], colData = colData, design = ~ group)
dds <- estimateSizeFactors(cds)                     # estimate per-sample size factors
normalized_df <- counts(dds, normalized = TRUE)     # size-factor-normalized counts
normalized_df.log <- log2(normalized_df + 1)        # log2 transform with a pseudocount of 1
This is part of the output after normalizing with DESeq2 and log2-transforming:
WT1 WT2 WT3 ACA1 ACA2 ACA3
ENSMUSG00000022857 5.9533944 4.821842 5.792608 9.524640 7.902013 8.380811
ENSMUSG00000094478 0.9995925 8.345891 0.000000 0.000000 1.585730 0.000000
ENSMUSG00000096410 10.2589289 10.381332 9.353513 6.993656 5.045510 6.908315
2) This is the result after only the log2 transformation (without normalizing with DESeq2):
WT1 WT2 WT3 ACA1 ACA2 ACA3
ENSMUSG00000022857 5.954196 4.807355 5.781360 9.519636 7.900867 8.379378
ENSMUSG00000094478 1.000000 8.330917 0.000000 0.000000 1.584963 0.000000
ENSMUSG00000096410 10.259743 10.366322 9.342075 6.988685 5.044394 6.906891
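For reference, here is a rough sketch of how the two matrices can be compared numerically (object names follow my script above; the comparison itself is only illustrative):

raw.counts <- as.matrix(read.count[,-1])    # raw counts, gene ID column dropped
rownames(raw.counts) <- read.count[,1]
raw.log <- log2(raw.counts + 1)             # approach 2: log2 only, no normalization

max(abs(normalized_df.log - raw.log))       # a small value means the two matrices are nearly identical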
Many thanks!
DESeq2 normalizes for library depth. If your samples are all from the same library, normalization may not have a pronounced effect. Could that be the case here?
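A quick check, as a minimal sketch assuming the dds object from the script in the question:

colSums(counts(dds))   # total raw reads per sample; if these are similar, size-factor normalization changes little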
No, they are from different libraries, because the total read count differs between samples.
It's not CPM normalization, where each count is divided by the per-million read total; the counts are normalized with size factors. Since the samples' distributions apparently look similar to begin with, maybe the size factors are close to one! You can obtain the normalization factor for each sample by dividing the raw counts by the normalized counts.
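For example, a minimal sketch (again assuming the dds object from the question's script):

raw  <- counts(dds, normalized = FALSE)
norm <- counts(dds, normalized = TRUE)

# within each column the raw/normalized ratio is constant (for genes with non-zero counts)
# and equals that sample's size factor
apply(raw / norm, 2, median, na.rm = TRUE)

sizeFactors(dds)       # the factors DESeq2 stored; values near 1 explain the near-identical output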
Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text`), or select a chunk of text and use the code button to format it as a code block. I've done it for you this time.
Why exactly do you want to normalize the counts with log2? Can you also print the output counts after normalization from DESeq2?
This is my data after normalization using DESeq2:
As stated below, the counts after normalization look similar to the original counts because of the distribution of the original data.
Also, I normally prefer using VST-normalized data for PCA or any other downstream processing (like WGCNA).
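A minimal sketch of that workflow, assuming the dds object from the question (for very small test matrices, varianceStabilizingTransformation() can be used in place of vst()):

library(DESeq2)

vsd <- vst(dds, blind = TRUE)        # variance-stabilizing transformation, log2-like scale
plotPCA(vsd, intgroup = "group")     # PCA of the samples, colored by experimental group

wgcna_input <- t(assay(vsd))         # WGCNA expects samples in rows, genes in columns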