Hello all,
I've seen people recommend repeating the DESeq analysis if you want to study multiple contrasts for a single factor, for example in: A: DESeq2 compare all levels
However, in the above, aren't you rerunning hypothesis tests repeatedly? Why isn't there an additional p-value adjustment step after you finish running all the tests (each of which already applies its own p-value adjustment for the multiple genes tested within that contrast)?
Is there something different about running 5 different tests on 5 different genes versus running 2 tests of 5 genes apiece? Should we as biologists refrain from using an individual test's adjusted p-value, and instead run a multiple hypothesis testing correction on all the unadjusted p-values from all the tests, taking into account that multiple tests have been performed?
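To make the question concrete, here is a minimal sketch of the two options I have in mind, using base R's p.adjust with the Benjamini-Hochberg method. The p-values and gene names are made up for illustration; with real DESeq2 output, I assume the raw values would come from the pvalue column of each results table, and that genes removed by independent filtering (NA values) would need separate handling.

```r
# Hypothetical raw p-values for 5 genes in two contrasts (made-up numbers).
p1 <- c(geneA = 0.001, geneB = 0.010, geneC = 0.030, geneD = 0.200, geneE = 0.800)
p2 <- c(geneA = 0.002, geneB = 0.040, geneC = 0.500, geneD = 0.010, geneE = 0.900)

# Option 1: adjust within each contrast separately (5 tests at a time).
padj_separate <- c(p.adjust(p1, method = "BH"), p.adjust(p2, method = "BH"))

# Option 2: pool all 10 raw p-values and adjust them together.
padj_pooled <- p.adjust(c(p1, p2), method = "BH")

# Side-by-side comparison of the two adjustment schemes.
comparison <- data.frame(
  contrast      = rep(c("contrast1", "contrast2"), each = length(p1)),
  gene          = rep(names(p1), times = 2),
  raw           = unname(c(p1, p2)),
  padj_separate = unname(padj_separate),
  padj_pooled   = unname(padj_pooled)
)
print(comparison)
```

My question is essentially whether Option 2 is needed, or whether Option 1 is already considered sufficient.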
Also, does the above impact whether or not to examine intersections of gene lists? For example, say your model was:
design=~condition
and the conditions were Control, treatment1, treatment2, treatment3. You then make the following contrasts:
results1 <- results(dds, contrast=c("condition", "Control", "treatment1"))
results2 <- results(dds, contrast=c("condition", "Control", "treatment2"))
results3 <- results(dds, contrast=c("condition", "Control", "treatment3"))
Let's say you want to understand which genes are statistically significantly upregulated in treatment1 as well as in treatment2. Would it be appropriate to look at the intersection of results1 and results2 to address that question? Or is that bad form, because each DE gene list was generated assuming statistical significance for that specific contrast alone? Or would looking at the intersection actually help identify the "true DE genes" in results1 and results2, by eliminating genes erroneously called DE as a consequence of multiple hypothesis testing?
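For reference, this is roughly the intersection I mean. It is only a sketch: res1 and res2 are small made-up data frames standing in for as.data.frame(results1) and as.data.frame(results2), the helper sig_up and the 0.05 cutoff are my own choices, and I am assuming that with contrast=c("condition", "Control", "treatment1") the log2FoldChange is Control over treatment, so upregulation in the treatment shows up as a negative value.

```r
# Hypothetical stand-ins for as.data.frame(results1) and as.data.frame(results2).
genes <- c("geneA", "geneB", "geneC", "geneD")
res1 <- data.frame(log2FoldChange = c(-2.1, -1.5, 0.3, 1.8),
                   padj           = c(0.001, 0.03, 0.60, 0.02),
                   row.names      = genes)
res2 <- data.frame(log2FoldChange = c(-1.9, -0.4, -2.5, -1.2),
                   padj           = c(0.004, 0.20, NA, 0.01),
                   row.names      = genes)

# Genes significant at `alpha` and higher in the treatment than in Control.
# Negative log2FoldChange = up in treatment, because Control is the numerator.
# NA padj values (e.g. from independent filtering) are treated as not significant.
sig_up <- function(res, alpha = 0.05) {
  keep <- !is.na(res$padj) & res$padj < alpha & res$log2FoldChange < 0
  rownames(res)[keep]
}

# Genes called upregulated in both treatment1 and treatment2.
shared_up <- intersect(sig_up(res1), sig_up(res2))
print(shared_up)
```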
Thank you very much for your help!