3.8 years ago
mfshiller
I have a dataset with about 700 cases and only about 100 controls. That means my power to find relevant SNPs through a standard GWAS analysis is very small (less than 10%). How can I work around this? Is there any literature on this? I haven't been able to find much. I'd appreciate any suggestions.
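For a rough sense of how the case/control imbalance affects power, here is a minimal sketch using a normal approximation to a per-SNP allelic test. The function name, the control minor allele frequency (0.3), the odds ratio (1.5), and the genome-wide threshold of 5e-8 are all illustrative assumptions, not values from this dataset, and the approximation ignores covariates, LD, and genotype-level models.

```python
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, maf_controls, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic test (normal approximation).

    All inputs are hypothetical; this is a back-of-the-envelope estimate only.
    """
    nd = NormalDist()
    p0 = maf_controls
    # Allele frequency in cases implied by the assumed per-allele odds ratio.
    odds_cases = odds_ratio * p0 / (1 - p0)
    p1 = odds_cases / (1 + odds_cases)
    # Each diploid individual contributes two alleles.
    alleles_cases, alleles_controls = 2 * n_cases, 2 * n_controls
    se = (p1 * (1 - p1) / alleles_cases + p0 * (1 - p0) / alleles_controls) ** 0.5
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(p1 - p0) / se
    # Probability that the test statistic lands in either rejection region.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Same total sample size, imbalanced versus balanced design.
    print("700 cases / 100 controls:", allelic_test_power(700, 100, 0.3, 1.5))
    print("400 cases / 400 controls:", allelic_test_power(400, 400, 0.3, 1.5))
```

Under these assumed inputs the small control group dominates the standard error, so adding more cases changes the estimate very little, while adding controls changes it a lot.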
Is this a population that is otherwise well studied? If it's a model organism, there are plenty of variant databases you can filter with. Sometimes a case/control design is just not going to work and you need a Mendelian trio sort of study.
This is in humans. It's a complex trait, so not really Mendelian. Given the class imbalance, I'm not sure logistic regression is advisable in this case. Approaches based on Fisher's exact test should be more robust to class imbalance, so maybe that is the way to go.
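As an illustration of what a Fisher's exact test on a single SNP would look like, here is a minimal pure-Python sketch on a 2x2 table of allele counts. The function name and the counts in the usage example are hypothetical, and this only shows the mechanics of the test; it does not by itself establish that the test is better suited to imbalanced designs.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be cases/controls and columns minor/major allele counts.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties).
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical allele counts: 700 cases (1400 alleles), 100 controls (200 alleles).
    print(fisher_exact_two_sided(600, 800, 60, 140))
```

In a real analysis this would be run per SNP and corrected for multiple testing; an established tool would normally be used rather than hand-rolled code.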
Can you bring in the ESP6500 data and assume those are generally unaffected individuals? They're not matched controls, but if your trait is low frequency, a general population dataset will give you the control genotypes.
That's actually one of the solutions I thought about. Either: