Hello,
I am using both edgeR and DESeq2 to normalize raw counts (not RNA-seq or 16S amplicon sequencing data, but it is amplicon sequencing data). I just need to normalize the counts before creating a visualization. This is preliminary work, so the parts of these packages that calculate differential expression are not useful to me.
I have two sets of scaling factors (from edgeR, using the TMM and RLE methods). My question is: what is the correct way to apply these scaling factors to my raw counts? Is it:
raw count / scaling factor
or
raw count / (library size * scaling factor)
I've been researching these methods, and so far I have seen it done both ways. I'm still not sure how to get just the normalization factors from DESeq2, as I only installed that package yesterday evening. I've kept the DESeq2 tag because the question applies to both packages, and any advice regarding DESeq2 could be helpful to me and others.
Rookie question: the dispersion calculation would make sense for evaluating DE, not as part of the normalization, right?
Thanks for the help.
Thank you! That was very helpful.
@Kevin: Is this method still valid for scale factors generated by upper quartile or scaled median normalization? Are RLE and median of ratios described in your link the same calculation? Same question for median and scaled median methods?
I cannot say that each normalisation method just involves a division by a particular size factor - each has a different formula that may or may not involve a 'size factor'.
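To make the two forms from the original question concrete, here is a minimal NumPy sketch of both conventions. As far as I know, DESeq2-style size factors divide the raw counts directly, while edgeR-style normalization factors rescale the library size into an "effective library size" before dividing. The function names and the toy matrix are made up for illustration and are not part of either package.

```python
import numpy as np

def divide_by_size_factor(counts, size_factors):
    """DESeq2-style: raw count / size factor (one factor per sample/column)."""
    counts = np.asarray(counts, dtype=float)
    return counts / np.asarray(size_factors, dtype=float)

def cpm_with_effective_library_size(counts, norm_factors):
    """edgeR-style: raw count / (library size * norm factor), scaled to counts per million."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)                      # total counts per sample
    effective = lib_sizes * np.asarray(norm_factors, dtype=float)
    return counts / effective * 1e6

# Toy matrix (rows = features, columns = samples); sample 2 is sequenced twice as deep.
toy = np.array([[10, 20],
                [30, 60],
                [60, 120]])

by_size_factor = divide_by_size_factor(toy, [1.0, 2.0])
by_effective_lib = cpm_with_effective_library_size(toy, [1.0, 1.0])
```

In this toy case both approaches make the two columns identical, but the resulting values are on different scales (counts versus counts per million), so the two kinds of factor are not interchangeable.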
From what I understand, the median-of-ratios method is an extension of RLE, and is currently the method used by DESeq2, as per the link that I gave. For 100% clarification, I would suggest re-posting your question on the Bioconductor forum, where the DESeq2 developers are more likely to respond.
A good practice would be to calculate the size factors manually and then via DESeq2, and then you'll have empirical evidence of how exactly it works.
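As a starting point for the manual calculation, here is a rough sketch of the median-of-ratios idea in Python/NumPy. It assumes a features-by-samples count matrix and skips any feature with a zero count in any sample. The function name is hypothetical, and this is my reading of the method rather than DESeq2's actual code, so do compare the output against what DESeq2 reports.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Sketch of median-of-ratios size factors (rows = features, columns = samples)."""
    counts = np.asarray(counts, dtype=float)
    # Use only features observed in every sample, so every log is finite.
    keep = (counts > 0).all(axis=1)
    log_counts = np.log(counts[keep])
    # Log of the per-feature geometric mean across samples (the pseudo-reference sample).
    log_geo_means = log_counts.mean(axis=1)
    # Per-sample median of the log ratios to the reference, back on the original scale.
    log_ratios = log_counts - log_geo_means[:, None]
    return np.exp(np.median(log_ratios, axis=0))

# Toy data: sample 2 has exactly twice the counts of sample 1; the zero row is skipped.
toy = np.array([[10, 20],
                [30, 60],
                [60, 120],
                [0, 5]])

size_factors = median_of_ratios_size_factors(toy)
normalized = toy / size_factors   # raw count / size factor
```

With this toy matrix the second size factor comes out at twice the first, and the rows that were used for the estimate end up equal across the two samples after division.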