Question

Forum: Specific question on torturing the data until it confesses

2

4.3 years ago

Aspire ▴ 370

One cannot use the same dataset both for exploratory data analysis and to test hypotheses derived from that exploratory analysis (that would be circular reasoning).

This means that one cannot play with the data to select the model that fits the data best, and then declare the results statistically valid.
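To make the circularity concrete, here is a minimal simulation sketch. It assumes pure-noise "expression" data (no gene truly differs between the two groups), and all names and parameter values (`simulate`, `n_genes`, `n_selected`, and so on) are hypothetical choices for illustration. Genes are selected because they look most different in one dataset, and their apparent effect is then measured both on that same dataset and on an independent one.

```python
import random
import statistics


def mean_difference(group_a, group_b):
    """Difference in group means for one gene."""
    return statistics.fmean(group_a) - statistics.fmean(group_b)


def simulate(n_genes=2000, n_per_group=10, n_selected=50, seed=0):
    """Select top genes on noise data, then re-measure them on fresh noise.

    Returns (mean |effect| on the selection data, mean |effect| on new data).
    """
    rng = random.Random(seed)

    def noise_effects():
        # One absolute mean difference per gene; there is no true effect.
        return [
            abs(mean_difference(
                [rng.gauss(0, 1) for _ in range(n_per_group)],
                [rng.gauss(0, 1) for _ in range(n_per_group)],
            ))
            for _ in range(n_genes)
        ]

    discovery = noise_effects()    # data used to pick the "DE" genes
    replication = noise_effects()  # independent data, same genes

    # "Torture" step: keep the genes that look most different.
    top = sorted(range(n_genes), key=lambda g: discovery[g], reverse=True)
    top = top[:n_selected]

    same_data = statistics.fmean(discovery[g] for g in top)
    fresh_data = statistics.fmean(replication[g] for g in top)
    return same_data, fresh_data


if __name__ == "__main__":
    same, fresh = simulate()
    print(f"apparent effect on the data used for selection: {same:.2f}")
    print(f"apparent effect on independent data:            {fresh:.2f}")
```

In this sketch the selected genes look strongly "differential" only on the data that was used to select them, which is the circularity described above.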

However, a way to circumvent it was suggested to me. The person suggesting it claimed:

Software such as IPA receives a list of genes, and checks for gene-set enrichment. The results will be biologically meaningful if and only if the model we have used to generate our list of significantly DE genes is valid. The reason for this is that if we supply to such software (even a large) list of genes which is random noise, we will not see meaningful enrichment.

Hence, we can torture the data to get, say, a large set of DE genes between two conditions. At a later stage we will "check if the confession is valid" by feeding the gene list to IPA and checking whether the results are meaningful.

Also, if one performs additional validation tests, then one could use them to check the truthfulness of "the confession" (the results).
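For readers unfamiliar with gene-set enrichment, the sketch below shows a generic over-representation test (a right-tailed hypergeometric test). It is only an illustration of the kind of calculation involved; whether it matches what IPA computes internally is an assumption, and the function name and all numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_set, n_in_list, n_overlap):
    """Right-tailed hypergeometric p-value: P(overlap >= n_overlap).

    n_background: number of genes that could have been selected
    n_in_set:     background genes annotated to the gene set
    n_in_list:    size of the submitted gene list
    n_overlap:    submitted genes that fall in the gene set
    """
    total = comb(n_background, n_in_list)
    upper = min(n_in_set, n_in_list)
    tail = sum(
        comb(n_in_set, i) * comb(n_background - n_in_set, n_in_list - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20000 background genes, a 100-gene pathway,
    # a 500-gene list. About 2.5 overlapping genes are expected by chance.
    print(enrichment_p_value(20000, 100, 500, 2))   # overlap near chance level
    print(enrichment_p_value(20000, 100, 500, 15))  # overlap well above chance
```

Note that this test takes the gene list as given: it says nothing about how the list was produced, so it cannot by itself certify the model that generated the list.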

What do you think about this claim? I don't think it's correct (and I will try to explain why below), but since I am not familiar with IPA, I would be happy for other opinions.

ipa gene-set-enrichment statistics • 1.5k views

ADD COMMENT • link updated 23 months ago by Ram 45k • written 4.3 years ago by Aspire ▴ 370

1

I'm not sure what different models are being used to generate different lists of DE genes, but every time you do that you'd have to correct for multiple testing, even at the enrichment stage.
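As one example of such a correction, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. The function name is hypothetical, and the choice of Benjamini-Hochberg (rather than, say, Bonferroni) is an assumption made for illustration; the thread does not prescribe a specific method.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values, e.g. one per tested pathway.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

If several models are tried and each produces its own gene list and enrichment results, the set of tests to adjust over would need to include all of them, not only those from the model that is finally reported.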

ADD REPLY • link 4.3 years ago by Jeremy Leipzig 23k

1

4.3 years ago

Aspire ▴ 370

In my opinion, this is invalid for two reasons. First, "biologically meaningful" is not a well-defined term. Since it is not well defined, it can be confused with "results that confirm the hypothesis we want".

Second, even if it were possible to define it, dividing all possible cases into "random noise" and "biologically meaningful results" is a gross oversimplification. Results can be partially meaningful and partially random noise. If one fits the model to the specific data at hand and then checks the results for meaningfulness, one cannot be sure what percentage of the results is random noise and what percentage is biologically meaningful.

Regarding the possibility of validation through other means, that is always worthwhile; but it means one should treat the results one already has as exploratory data analysis, and not use terms such as p-value when reporting them.

ADD COMMENT • link 4.3 years ago by Aspire ▴ 370

0

I agree with your first point. Nonetheless, I do find "biologically meaningful" useful, although dangerous.

I didn't understand your second point.

I partially agree with your third point; see my answer below.

ADD REPLY • link 4.3 years ago by h.mon 35k

score 3 · Accepted Answer · 2020-12-29

One can perform hypothesis testing (i.e., use p-values) in an exploratory statistical analysis. The problem is not the use of p-values per se; the problem is confusing an exploratory analysis with a hypothesis-driven analysis, and presenting it as one.

So it is perfectly fine to perform a differential gene expression analysis, use the p-values to select interesting genes and perform enrichment analysis with IPA, and so on. But keep in mind the analysis is exploratory: at best, you are searching for patterns in order to, later on, conduct a rigorous hypothesis-driven experiment, where you will test the hypothesis generated with the exploratory analysis in an independent data set, deciding in advance all the analysis steps you will perform.

A further question is: do researchers really act this way? My feeling is that most researchers don't, for a plethora of reasons:

hypothesis-driven experiments can be really expensive and labor-intensive.
hypothesis-driven experiments carry higher risk, as there is publication bias against negative results.
pressure to publish drives researchers to fast exploratory analyses.