Dear all,
I would like to perform a meta-analysis of DNA mutations in cancer using two completely independent cohorts. My aim is to identify events that occur more often in a subgroup of samples than in controls.
My analysis is based on Fisher's exact test, performed separately for each gene, to find the mutations occurring significantly more often in the subgroup; since this yields one test per gene, I subsequently correct the p-values by FDR.
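In case it is useful, here is a minimal Python sketch of this per-cohort step (the gene names and counts are invented, purely to show the structure; I use a one-sided test here since I am looking for enrichment in the subgroup, but that detail is not essential to the question):

```python
from scipy.stats import fisher_exact
from statsmodels.stats.multitest import multipletests

# counts[gene] = (mut_subgroup, wt_subgroup, mut_controls, wt_controls)
# -- illustrative numbers only
counts = {
    "TP53":  (30, 20, 10, 40),
    "KRAS":  (12, 38,  9, 41),
    "BRCA1": ( 5, 45,  4, 46),
}

genes, pvals = [], []
for gene, (a, b, c, d) in counts.items():
    # 2x2 table: rows = subgroup / controls, cols = mutated / wild-type
    _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
    genes.append(gene)
    pvals.append(p)

# Benjamini-Hochberg FDR across all genes tested in this cohort
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
significant = [g for g, r in zip(genes, reject) if r]
```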
This analysis is currently performed independently for each cohort, so I end up with two lists of genes that are mutated significantly more often in the subgroup.
To 'integrate' both cohorts, my idea was to simply select genes with a significant FDR (say q < 0.05) in both analyses and then see how big the overlap is. However, since the second cohort suffers from a low sample size and I might lose a lot of real events, I was also thinking of a more direct way of validating: using cohort 1 as the discovery data set and cohort 2 as the validation data set.
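In code, the overlap idea is just a set intersection of the two significant gene lists (the lists below are illustrative; in practice they would come from running the testing step above on each cohort):

```python
# Genes with q < 0.05 in each independent analysis (made-up examples)
sig_cohort1 = {"TP53", "KRAS", "BRCA1"}
sig_cohort2 = {"TP53", "KRAS"}

overlap = sig_cohort1 & sig_cohort2
print(f"{len(overlap)} genes significant in both cohorts: {sorted(overlap)}")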
This is where my question comes in:
If I select genes by FDR in cohort 1 (q < 0.05) and then test only these genes in cohort 2 to reduce the number of tests, do I still need to apply a p-value correction to the results of my validation? After all, I would expect most of these genes to be truly different, and if I am not mistaken, FDR/Bonferroni would in this case throw out too many true positives in order to control the number of false positives.
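To make the setup concrete, this is roughly what the validation step would look like (counts again invented); note that the correction, if it is kept at all, would run only over the discovery hits rather than the full gene set:

```python
from scipy.stats import fisher_exact
from statsmodels.stats.multitest import multipletests

# Genes with q < 0.05 in cohort 1 (illustrative)
discovery_hits = ["TP53", "KRAS", "BRCA1"]

# 2x2 counts for the same genes in cohort 2 (illustrative)
counts_cohort2 = {
    "TP53":  (8, 12, 3, 17),
    "KRAS":  (5, 15, 4, 16),
    "BRCA1": (2, 18, 1, 19),
}

pvals = []
for gene in discovery_hits:
    a, b, c, d = counts_cohort2[gene]
    _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
    pvals.append(p)

# Only len(discovery_hits) tests are corrected here, not the whole exome --
# whether this correction is needed at all is exactly my question.
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
```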
Any help is greatly appreciated!