1 day ago
p-radwan.derbala
My question is about multiple testing in statistics, in the context of DESeq2.
Let's imagine the following scenario:
- I ran a differential expression analysis with DESeq2 on all genes (as usual) in a specific tumor type (lung) and found one biomarker, e.g. ACT8, based on an adjusted p-value < 0.05.
- If I want to confirm that this marker is also differentially expressed in another tumor type (stomach) and I run a second DESeq2 analysis, should I use the adjusted p-value for ACT8, or is a raw p-value < 0.05 enough?
In other words, does the specificity of my scientific question, which rests on solid evidence that a marker or set of markers was significant in another tumor type, restrict the multiple-testing correction to only the gene or set of genes in question?
If I understand you correctly, you're asking whether you should correct for multiple testing across all genes, across only a subset, or, in the extreme case, across only the genes you care about.
Basically, I (not being a statistician at all) think that you should correct over the genes that went into the analysis. The power of DESeq2 and tools like it comes from sharing information across many genes to accurately estimate variance over the full range of average expression values. Without many genes it could not achieve this power. Cherry-picking afterwards which genes go into the multiple-testing (MT) correction seems inaccurate to me. I think you can filter a bit, for example keeping only protein-coding genes to lower the MT burden somewhat, but selecting a handful of genes seems off to me. Not a very scientific comment, I realize.
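To make concrete why the choice of gene set matters, here is a minimal sketch of the Benjamini-Hochberg adjustment (the default method behind DESeq2's padj) in plain Python. It is not DESeq2's own code and leaves out its independent filtering; the p-values are made up for illustration, and the names `bh_adjust`, `gene_of_interest`, and `other_genes` are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values: one gene of interest plus 999 unremarkable genes.
gene_of_interest = 0.01
other_genes = [0.2 + 0.8 * k / 998 for k in range(999)]

# Same raw p-value, two different correction sets.
padj_all = bh_adjust([gene_of_interest] + other_genes)[0]
padj_single = bh_adjust([gene_of_interest])[0]

print("adjusted over all 1000 genes:", padj_all)
print("adjusted over the single gene:", padj_single)
```

With these made-up numbers, the same raw p-value passes the 0.05 cutoff when it is the only test and fails when it is corrected together with the other 999 genes. The adjusted value is a property of the whole set of tests, which is why the set should not be chosen after looking at the results.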