The way I understand things, normalization (such as in DESeq2, edgeR, etc.) serves two purposes: 1) to model the "real" abundances in the original samples from the read counts, and 2) to make the abundance distributions conform to the assumptions of the statistical analysis by removing heteroskedasticity, dependence, dispersion, etc.
It has been stated many times here that it is very difficult to reproduce the fold change you get from DESeq2 by extracting the normalized counts, although you can come close. Taken at face value, "fold change" sounds like it should refer to the ratio of the "real" abundances: the fold change of "actual" expression or "actual" community representation.
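To make that concrete, here is a minimal sketch (R / Bioconductor) of extracting the normalized counts from DESeq2 and comparing a "manual" fold change to the one reported by `results()`. The object names `count_mat` and `sample_info`, the `~ group` design, and the "treated"/"control" levels are hypothetical placeholders, not anything prescribed by DESeq2:

```r
library(DESeq2)

dds <- DESeqDataSetFromMatrix(countData = count_mat,   # taxa x samples integer matrix (placeholder)
                              colData   = sample_info, # data.frame with a 'group' column (placeholder)
                              design    = ~ group)
dds <- DESeq(dds)

## Normalized counts: raw counts divided by per-sample size factors
norm_counts <- counts(dds, normalized = TRUE)

## "Manual" log2 fold change from normalized counts (ratio of group means, with
## a small pseudocount); this only approximates results()$log2FoldChange,
## because DESeq2 fits a negative-binomial GLM and can additionally shrink the
## estimates (lfcShrink), which is why the two rarely match exactly.
grp <- sample_info$group
manual_lfc <- log2(rowMeans(norm_counts[, grp == "treated"]) + 0.5) -
              log2(rowMeans(norm_counts[, grp == "control"]) + 0.5)

res <- results(dds, contrast = c("group", "treated", "control"))
head(cbind(manual = manual_lfc, deseq2 = res$log2FoldChange))
```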
So if the normalized (or normalized + VST, or normalized + MLE) abundances better represent the "real" abundances, then shouldn't I use the normalized counts for ALL of my analysis steps (a sketch for the diversity steps follows this list):
- Alpha diversity
- Beta diversity
- F2B ratio
- IgA sorting analysis
- Other regression analysis
- etc.
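For the diversity items on that list, a minimal sketch of what "using the normalized counts" could look like, feeding DESeq2-normalized or variance-stabilized values into a standard vegan workflow. `dds` is assumed to be the fitted DESeqDataSet from the sketch above; vegan expects samples in rows, hence the transposes:

```r
library(vegan)

norm_counts <- counts(dds, normalized = TRUE)
vst_counts  <- assay(vst(dds, blind = TRUE))   # variance-stabilized values

## Beta diversity: Bray-Curtis on normalized counts (samples as rows)
bray <- vegdist(t(norm_counts), method = "bray")

## Euclidean distance on VST values is a common alternative, since VST output
## can be negative and is no longer suitable for Bray-Curtis
eucl <- dist(t(vst_counts))

## Ordination, e.g. a simple PCoA on the Bray-Curtis distances
pcoa <- cmdscale(bray, k = 2)
```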
Hi ariel, did you get any further?
My take is "mostly yes, but it depends". I use the normalized counts for pretty much everything, especially when comparing samples. When you look at each sample individually, there is no harm in using the original count data. The important thing is to know the limitations and to compare results when in doubt.
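One quick way to "compare results when in doubt" is to build the same distance matrix from raw and from normalized counts and check how strongly the two agree, e.g. with a Mantel test from vegan. Again a hedged sketch, reusing the `dds` object assumed above:

```r
library(vegan)

bray_raw  <- vegdist(t(counts(dds)),                    method = "bray")
bray_norm <- vegdist(t(counts(dds, normalized = TRUE)), method = "bray")

## High correlation suggests normalization barely changes the sample relationships;
## a weak correlation is a sign the choice actually matters for your data.
mantel(bray_raw, bray_norm, method = "spearman")
```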