Differential expression for one or more predefined genes and multiple testing
6 months ago

My question is about multiple testing in statistics, in the context of DESeq2:

Let's imagine the following scenario:

  1. I ran a differential expression analysis with DESeq2 on all genes (as usual) in a specific tumor type (lung) and found one biomarker, e.g. ACT8, based on an adjusted p-value < 0.05.
  2. If I want to confirm that this marker is also differentially expressed in another tumor type (stomach) and I run a second DESeq2 analysis, should I use the adjusted p-value for ACT8, or only the raw p-value < 0.05?

In other words, if my scientific question is specific, resting on solid evidence that a marker or set of markers was significant in another tumor type, can the multiple-testing correction be restricted to only the gene or set of genes that the question is about?

Deseq2 Multiple-testing Differential-Expression • 1.2k views

If I get you correctly, you're asking whether you should correct all genes for multiple testing or only a subset, or in an extreme case only the genes you care about.

Basically, I (not being a statistician at all) think that you should correct across all the genes that went into the analysis. The power of DESeq2 and similar tools comes from sharing information across many genes to accurately estimate variance over the full range of average expression values. Without many genes, it could not achieve this power. Cherry-picking afterwards which genes go into the multiple-testing (MT) correction seems inaccurate to me. I think you can filter a bit, for example keeping only protein-coding genes to lower the MT burden somewhat, but selecting a handful of genes seems off to me. Not a very scientific comment, I realize.

6 months ago
Gordon Smyth ★ 8.2k

This question has been asked many times on the Bioconductor support forum in the context of limma and edgeR analyses.

If you are doing a validation analysis where you are only interested in validating differential expression of a limited number of pre-specified genes, then you should still conduct the linear modelling and empirical Bayes analysis on the whole universe of genes, but you only need to apply multiple testing to the genes of interest.

In limma or edgeR, this is easy. You just conduct the full analysis as usual, then subset at the final step when applying multiple testing by

topTable(fit[genesofinterest,])

or

topTags(fit[genesofinterest,])

If there is only one gene of interest, then you would just be looking at the unadjusted p-value.
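To make the arithmetic concrete, here is a minimal base-R sketch of Benjamini-Hochberg adjustment applied to a pre-specified subset only. The p-values are invented for illustration, and every gene name other than ACT8 is a hypothetical placeholder.

```r
# Hypothetical raw p-values for five pre-specified genes (made-up numbers).
p_raw <- c(ACT8 = 0.004, GENE_B = 0.03, GENE_C = 0.20,
           GENE_D = 0.012, GENE_E = 0.65)

# Benjamini-Hochberg adjustment over the five genes of interest only.
p_adj_subset <- p.adjust(p_raw, method = "BH")
print(p_adj_subset)

# With a single gene of interest there is nothing to correct for:
# the adjusted p-value equals the raw p-value.
p_adj_single <- p.adjust(p_raw["ACT8"], method = "BH")
print(p_adj_single)
```

The size of the correction depends only on how many genes are in the pre-specified set, which is why that set has to be fixed before looking at the validation data.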

The genes of interest must be pre-specified. Choosing the genes from the same dataset would be double-dipping or cherry-picking.

These statistical principles would also apply to DESeq2, but I don't know enough about DESeq2 to say how to implement it.
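One way this might be done in DESeq2, offered as a sketch rather than an endorsed recipe: run the full analysis on all genes, then take the raw `pvalue` column for the pre-specified genes and adjust it with base R's `p.adjust()`. The simulated dataset from `makeExampleDESeqDataSet()` stands in for real stomach data, and the gene identifiers in `genes_of_interest` are hypothetical.

```r
library(DESeq2)

# Simulated stand-in for a real dataset (assumption for illustration);
# makeExampleDESeqDataSet() names its genes "gene1", "gene2", ...
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 8)

# Fit the model on the whole universe of genes, so that dispersion
# estimation still borrows information across all of them.
dds <- DESeq(dds)
res <- results(dds)

# Hypothetical pre-specified genes, chosen before looking at this dataset.
genes_of_interest <- c("gene1", "gene2", "gene3")

# Subset at the final step and adjust the raw p-values over the subset only.
sub <- as.data.frame(res[genes_of_interest, c("log2FoldChange", "pvalue")])
sub$padj_subset <- p.adjust(sub$pvalue, method = "BH")
print(sub)
```

The `padj` column returned by `results()` is adjusted over all genes that passed independent filtering, so it is not the quantity wanted here. Genes for which DESeq2 reports an `NA` p-value (for example, those flagged as count outliers) are skipped by `p.adjust()` and stay `NA`.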


Dear Dr. Gordon, I am so grateful for that answer.

I am about to submit my work based on the approach you described. Is there a reference I can give the reviewer to justify applying the p-value adjustment only to the targets I obtained from a prior independent test?

Thanks again for your time; it helped a lot in finalizing the project.


The advice I've given you follows from the basic principles of multiple testing, but I don't know of any reference that discusses your context specifically. For the purposes of a response to reviewers, you could refer to the discussion here and on Bioconductor. You have to convince the reviewer that there isn't some hidden cherry-picking in your procedure, for example trying out the lung marker genes in several other tissues but reporting only stomach because that is where they were significant.
