4.1 years ago
curious
Let's say you had 100,000 samples but were looking at some super rare diseases. Even with a huge overall sample size, I would think that at some point the small number of cases makes interpretation hard, even if a GWAS produces some really small p-values.
Is 5 cases and 99,995 controls ridiculous?
Is 10 cases and 99,990 controls ridiculous?
Is 100 cases and 99,900 controls ridiculous?
Is there an industry standard for this limit? If not, how do you tell? People are often interested in the really rare stuff, so someone must have thought about this.
It depends on the proportion of disease-associated variants and on their effect sizes. Look for papers that discuss this.
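To make the dependence on effect size concrete, here is a rough sketch of an approximate power calculation for a single variant under the three case/control splits from the question. Everything in it is an assumption for illustration: the function names are hypothetical, the test is a simple allelic two-proportion z-test, the control allele frequency (0.05) and odds ratio (2.0) are made-up inputs, and the threshold is the conventional genome-wide 5e-8. The normal approximation is unreliable when only a handful of risk alleles are expected among cases, so treat the output as an order-of-magnitude guide rather than a real power analysis.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def approx_power(n_cases, n_controls, control_freq, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic z-test (normal approximation).

    Each person contributes two alleles, so the allele counts are
    2 * n_cases and 2 * n_controls. The far tail is ignored.
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    se = sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p0) / se - z_alpha)


# Hypothetical inputs: 5% allele frequency in controls, odds ratio of 2.
for n_cases in (5, 10, 100):
    n_controls = 100_000 - n_cases
    expected_alleles = 2 * n_cases * case_allele_freq(0.05, 2.0)
    power = approx_power(n_cases, n_controls, 0.05, 2.0)
    print(f"{n_cases:>4} cases: ~{expected_alleles:.1f} risk alleles expected "
          f"in cases, approx. power {power:.2e}")
```

Under these assumed inputs the extra controls barely help: the standard error is dominated by the term for the cases, so the number of cases, together with the allele frequency and the effect size, sets the limit.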
Why are you doing an association study? It's only ever going to give you a hint of a correlation, so these numbers are about statistical confidence. Are you really collecting 100,000 genomes? No, I think the more common thing is to allocate 100,000 dollars and then get as many cases as you can find!
It was just a dummy example to illustrate the idea. I was thinking more about biobanking, where you can't just pay for more cases.