Hi,
I am a Ph.D. student in bioinformatics with a background in biochemistry, medicine, and statistics. I wonder why, in regular differential analysis, it isn't common to run a case-control comparison first, to find the probes/genes associated with the disease, before carrying out a case-case comparison.
Here is a scenario. If I have raw counts for 30,000 probes, the case-control comparison might give me 8,000 probes associated with the disease. I would then subset my raw counts to those 8,000 probes and perform the case-case comparison (for example, tumor grade 3 vs tumor grade 1). My rationale is that dropping the 22,000 probes not flagged by the case-control comparison would help remove probes that could come out significant because of confounding variables but are not associated with the disease. However, most studies go straight to the case-case comparison even when controls are available. Articles have suggested cost and other factors as reasons.
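To make the scenario concrete, here is a rough Python sketch of the two-stage filtering I have in mind. It is only an illustration under several assumptions: the expression values are taken to be already normalised log-expression (not raw counts), a plain Welch t-statistic with an arbitrary cutoff stands in for a proper limma-voom fit with multiple-testing correction, and the names `two_stage_filter`, `welch_t`, `t_cutoff`, and the toy probes are all made up for this example.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two lists of values (0.0 if there is no spread)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def pick(values, labels, wanted):
    """Return the values whose sample label is in `wanted`."""
    return [v for v, lab in zip(values, labels) if lab in wanted]


def two_stage_filter(expr, labels, t_cutoff=2.0):
    """Stage 1: case vs control. Stage 2: grade 3 vs grade 1 on stage-1 probes.

    `expr` maps probe name -> list of normalised log-expression values,
    one per sample, in the same order as `labels`.
    The |t| >= t_cutoff rule is a placeholder, not a recommended threshold.
    """
    cases = {"grade1", "grade3"}

    # Stage 1: keep probes that differ between all cases and controls.
    stage1 = [
        probe for probe, values in expr.items()
        if abs(welch_t(pick(values, labels, cases),
                       pick(values, labels, {"control"}))) >= t_cutoff
    ]

    # Stage 2: among those probes only, compare grade 3 against grade 1.
    stage2 = [
        probe for probe in stage1
        if abs(welch_t(pick(expr[probe], labels, {"grade3"}),
                       pick(expr[probe], labels, {"grade1"}))) >= t_cutoff
    ]
    return stage1, stage2


# Toy, made-up data: 3 controls, 3 grade-1 tumors, 3 grade-3 tumors.
labels = ["control"] * 3 + ["grade1"] * 3 + ["grade3"] * 3
expr = {
    # differs from controls and between grades
    "probe_A": [5.0, 5.1, 4.9, 7.0, 7.1, 6.9, 9.0, 9.1, 8.9],
    # differs from controls, but not between grades
    "probe_B": [5.0, 5.1, 4.9, 8.0, 8.1, 7.9, 8.0, 8.1, 7.9],
    # no difference anywhere
    "probe_C": [6.0, 6.1, 5.9, 6.1, 5.9, 6.0, 5.9, 6.0, 6.1],
}

stage1, stage2 = two_stage_filter(expr, labels)
print(stage1, stage2)  # ['probe_A', 'probe_B'] ['probe_A']
```

In the real analysis both stages would be limma-voom fits on the count data; the sketch only shows the order of the two comparisons and the subsetting step in between.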
I feel this approach is better, as it might even help tools like limma-voom when fitting their models, since the probes associated with the disease would have a more uniform distribution.
Since I am still in training, I would like to hear from experienced bioinformaticians on this.
Thanks