Let's say you perform a large number of statistical tests based on some high-throughput screen. Since some of the resulting p-values could have arisen by chance, you apply FDR correction and only proceed with the genes whose FDR-adjusted p-values fall below 0.05. Subsequent statistical tests on the follow-up experiments then show that X of those genes indeed had p-values below your desired threshold. The question then becomes: should one stop here and consider the original screen results for these particular genes to be successfully validated, or should one also perform FDR correction on the validation p-values?
Yep, you should FDR adjust for the number (X) of hypotheses tested in your follow-up experiment.
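For example, a minimal sketch using Benjamini-Hochberg adjustment via statsmodels (the follow-up p-values below are made up purely for illustration):

```python
# Hypothetical raw p-values from the follow-up experiment for the X genes
# carried forward; replace with your actual values.
from statsmodels.stats.multitest import multipletests

followup_pvals = [0.001, 0.012, 0.030, 0.048, 0.200]  # made-up numbers

# method="fdr_bh" applies the Benjamini-Hochberg FDR procedure.
reject, pvals_bh, _, _ = multipletests(followup_pvals, alpha=0.05, method="fdr_bh")

for p, q, sig in zip(followup_pvals, pvals_bh, reject):
    print(f"raw p = {p:.3f}  BH-adjusted = {q:.3f}  significant at FDR 0.05: {sig}")
```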
Re "Subsequent statistical tests applied to the experimental follow-up results then gives that X number of those genes indeed had p-values below your desired threshold.". Does this imply that you are re-checking the expression changes in the same set of samples?
Sorry it is hard to follow...
What kind of subsequent statistical tests are you referring to?
A concrete example that may clarify what I mean:
Say the original screen was an shRNA dropout screen, in which changes in the abundance of shRNAs mediating knockdown of various genes were assessed between two time points. According to this screen, shRNAs targeting Genes 1-5 were significantly (FDR < 0.05) depleted from the cell population at the later time point, indicating that these genes may be essential in this cell type (knockdown reduces survival, so the corresponding shRNAs are under negative selection).
To validate this, follow-up experiments were designed to knock down only Genes 1-5, using either one of the original shRNAs targeting those genes or other shRNAs confirmed to mediate efficient knockdown. The validation experiment had a different setup and simply measured the decrease in growth rate following knockdown of a given gene. It found that some of the genes assessed this way indeed yielded significant decreases in growth rate at a raw p-value threshold of 0.05. Would this be enough to conclude that the original screen hits were true, or should those validation p-values also be corrected?
The reasoning is that if the original screen hits were false positives arising from the multiple testing problem, then the corresponding genes should be unlikely to reach even raw significance in the independent validation experiment.
OK, thanks for the clear explanation!
Your second (validation) experiment must be seen as a new experiment, with its own null hypothesis and so on. The null hypothesis will probably be the same as in your first experiment (hence no practical difference). If your second experiment is again high-throughput, then yes, you'll need to correct for multiple testing again. If you only look at a few genes (say, your 5 genes of interest), then it is not strictly necessary.
I hope this explains it a bit?
OK. In this case the validation experiment only considers 5 genes, so strictly speaking multiple hypotheses are still being tested; additionally, the experiment uses a different setup, so the null hypothesis is not identical either. I guess it would then be appropriate to correct for multiple testing.
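As a back-of-the-envelope illustration of why this matters (assuming, hypothetically, that all 5 hits were false positives, i.e. true nulls): each true-null gene clears a raw 0.05 threshold with 5% probability, so the chance that at least one of the 5 does so by chance is already substantial:

```python
# Probability that at least one of 5 true-null genes reaches raw p < 0.05
n_genes = 5
alpha = 0.05
p_any_chance_hit = 1 - (1 - alpha) ** n_genes
print(f"P(>=1 chance hit among {n_genes} raw tests) = {p_any_chance_hit:.3f}")
# ~0.226, so a single raw-significant validation result could plausibly
# occur by chance alone, which is why adjusting the 5 p-values is safer.
```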