Hi everyone,
I'm having trouble deciding whether I should normalize samples from multiple conditions together or not.
The situation is as follows: I've got 9 RNA-seq libraries from 3 conditions, a, b and c (each in triplicate). I'm interested in differential gene expression for a vs b and a vs c.
Usually, I normalize all samples together using edgeR and the TMM method, then apply exactTest(a,b) and exactTest(a,c). That way I get the same normalized expression values for both comparisons, which I can also use for, e.g., cluster analyses. Furthermore, I thought that variances are better estimated using more samples.
However, the FDR (BH) adjusted p-values tend to be far worse when I normalize all samples together, even though for DE testing I only use the samples from the respective 2 groups in a pairwise test.
Could anyone briefly explain why this is the case or if I made a logical mistake here? Thanks in advance!
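For reference, here is a minimal Python sketch of the BH adjustment mentioned above. It is not edgeR's code; the function name `bh_adjust` is made up for illustration, and it is only meant to mirror the standard Benjamini-Hochberg procedure. It shows that the adjusted values depend only on the raw p-values and the number of genes tested, so if both strategies test the same set of genes, any difference has to come from the raw p-values themselves.

```python
import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper, for illustration)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                       # indices that sort p ascending
    scaled = p[order] * n / np.arange(1, n + 1) # p_(i) * n / i
    # Enforce monotonicity: running minimum taken from the largest p downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)   # cap at 1 and restore input order
    return adjusted

# Toy raw p-values, invented for this example.
adjusted = bh_adjust([0.001, 0.5, 0.9])
print(adjusted)
```

The same raw p-values always give the same adjusted p-values, so the adjustment step by itself cannot explain a difference between the two normalization strategies.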
Can we just get a bit more information on the nature of the data?
From what you have given so far, this might help: within/across sample/dataset normalisation. You may already know it all, but it helped me. If you want to compare the absolute values between 2 samples, you need across-sample normalisation, i.e. TMM (edgeR) or vst/rlog (DESeq2).
Hi BioinfGuru
regarding your questions:
My question is: why do the adjusted p-values differ so much when I normalize all 9 samples together and then test, e.g., a vs b, compared to normalizing only the 6 a and b samples together and then performing the DE test? And which strategy should be used?
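One thing that may be relevant, offered as a toy sketch and not a confirmed diagnosis: in edgeR the dispersions stored in the object are estimated from all samples it contains, and exactTest then uses those dispersions for the selected pair. So with 9 samples, group c also feeds into the variance estimate used for a vs b. The Python sketch below uses a plain pooled within-group variance as a stand-in (edgeR's actual negative binomial dispersion estimation with empirical Bayes moderation is more involved), and the counts are invented for illustration.

```python
import numpy as np

def pooled_variance(groups):
    """Pooled within-group variance: summed squared deviations over summed df."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    ss = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df = sum(g.size - 1 for g in groups)
    return ss / df

# Hypothetical normalized values for one gene, three replicates per condition.
a = [10.0, 11.0, 12.0]
b = [20.0, 21.0, 22.0]
c = [5.0, 15.0, 25.0]   # condition c is much noisier for this gene

var_ab = pooled_variance([a, b])       # only the two groups being compared
var_abc = pooled_variance([a, b, c])   # all three groups pooled

print(var_ab, var_abc)
```

In this toy case the noisy third group inflates the pooled estimate, which would make the a vs b test less sensitive. If c were as tight as a and b, the extra samples would instead stabilise the estimate, so the direction of the effect depends on the data.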