Hello, I have RNA-seq data from a major perturbation in which we expect a large fraction of genes (>50% of the dataset) to be differentially expressed (DE). I'm aware that this can be a problem for TMM normalization, which assumes that the majority of genes are not DE between samples. However, I haven't seen any alternative approaches suggested for when this assumption isn't met.
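To see why the "majority not DE" assumption matters, here is a minimal sketch of a simplified, unweighted trimmed mean of M-values. It is not edgeR's `calcNormFactors` itself: it keeps only the 30% trim on each tail of M (edgeR's default `logratioTrim`) and omits the precision weights and abundance trimming. The counts are hypothetical. When 70% of genes are truly down-regulated, the trimmed core contains only DE genes, so those genes become the de facto "unchanged" reference.

```python
import numpy as np

def trimmed_mean_m_factor(obs, ref, trim=0.3):
    """Simplified, unweighted trimmed mean of M-values (illustration only).

    M = log2 of the ratio of library-size-scaled counts, obs vs ref.
    Omits edgeR's precision weights and A-value (abundance) trimming.
    Returns a scaling factor for `obs` relative to `ref`.
    """
    obs = np.asarray(obs, dtype=float)
    ref = np.asarray(ref, dtype=float)
    keep = (obs > 0) & (ref > 0)                  # M is undefined for zero counts
    m = np.log2((obs[keep] / obs.sum()) / (ref[keep] / ref.sum()))
    m = np.sort(m)
    n_trim = int(np.floor(len(m) * trim))         # values dropped from each tail
    core = m[n_trim:len(m) - n_trim]
    return float(2.0 ** core.mean())

# Hypothetical toy data: 20 genes with 100 counts each in the control.
ref = np.full(20, 100.0)
# Treated: 14 genes (70%) truly down 4-fold, 6 genes truly unchanged.
obs = np.array([25.0] * 14 + [100.0] * 6)

f = trimmed_mean_m_factor(obs, ref)

# Normalized treated/control ratio, using effective library size = lib * factor.
norm_ratio = (obs / (obs.sum() * f)) / (ref / ref.sum())
print(f"factor: {f:.3f}")
print("normalized ratios:", np.round(norm_ratio, 2))
```

In this toy case the truly down-regulated genes come out with a normalized ratio of about 1, and the truly unchanged genes come out about 4-fold up, because the trimmed core was made up entirely of DE genes.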
I suspect an alternative may be needed for our data because, in addition to the large perturbation, we observe systematic differences in the normalization factors between conditions that are independent of library size. This leads to sign changes, where the CPM data show genes as down-regulated but the TMM-normalized data show the same genes as up-regulated. I understand this could be driven by outlier genes (which is what TMM is designed to correct for), but given the genome-wide differences, I'm unsure whether applying the correction is appropriate here.
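The sign change follows directly from how the factor enters the fold change: dividing by an effective library size (library size × factor) shifts the log2 fold change by -log2(f_treated / f_control) for every gene. A small sketch with a hypothetical gene, hypothetical library sizes and a hypothetical treated-sample factor of 0.56 shows a gene that looks down by CPM but up after normalization.

```python
import math

def log2_fold_change(count_ctrl, lib_ctrl, count_trt, lib_trt,
                     factor_ctrl=1.0, factor_trt=1.0):
    """log2(treated / control) expression, dividing each count by its
    effective library size (library size x normalization factor)."""
    expr_ctrl = count_ctrl / (lib_ctrl * factor_ctrl)
    expr_trt = count_trt / (lib_trt * factor_trt)
    return math.log2(expr_trt / expr_ctrl)

# Hypothetical gene and library sizes (not real data).
count_ctrl, lib_ctrl = 1_000, 20_000_000
count_trt, lib_trt = 400, 8_900_000

# Plain CPM: both normalization factors left at 1.
cpm_lfc = log2_fold_change(count_ctrl, lib_ctrl, count_trt, lib_trt)

# Hypothetical TMM-style factor well below 1 for the treated sample.
factor_trt = 0.56
norm_lfc = log2_fold_change(count_ctrl, lib_ctrl, count_trt, lib_trt,
                            factor_trt=factor_trt)

print(f"CPM log2FC:        {cpm_lfc:+.2f}")
print(f"normalized log2FC: {norm_lfc:+.2f}")
```

Because the same factor multiplies the library size for every gene, a condition-dependent factor moves all fold changes by the same constant. That is why it can flip the sign of many modestly changed genes at once.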
Any advice or suggestions regarding a better way to normalize datasets with a major perturbation?
This is a great point, @ATpoint. No need to go all the way to pathway analysis (point 4 in my answer below) when routine parts of the QC pipeline should already provide related information, as long as the analyst is comfortable interpreting it.
Awesome, thank you both for the advice!