Question

How Many Samples At Least Are Needed To Discover New SNPs For Statistical Significance

12.4 years ago

frewise ▴ 30

Hi, as the title says: what is the minimum number of samples needed to discover new SNPs with statistical significance? And what is the minimum number needed to validate previously reported SNPs? Thanks!

snp • 4.8k views

ADD COMMENT • link updated 11.0 years ago by Biostar 20 • written 12.4 years ago by frewise ▴ 30

Are you talking about SNP calling from DNA-Seq or RNA-Seq? Sorry, but how are these two connected? Or are these samples related to accessions in GWAS?

ADD REPLY • link 12.4 years ago by Arun 2.4k

I'm talking about SNP calling from DNA-Seq. And I'm not sure about your other two questions.

ADD REPLY • link 12.4 years ago by frewise ▴ 30

That's what I thought. However, the current answer seems to be for GWAS (due to the ambiguity in the question), if I am right? As for my second question, "how are these two connected?": what do you mean by samples? Do you mean biological replicates? If so, what does that have to do with SNP calling, in your opinion?

ADD REPLY • link 12.4 years ago by Arun 2.4k

In my opinion, the type of technology/assay does not really change the basic requirements regarding sample size. If anything, using DNA-Seq for genome-wide discovery of novel SNPs would increase the number of samples required to reach statistical significance, because you test many more variants than, e.g., on a 1M SNP chip, and the multiple-testing burden grows accordingly.
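To make the point about testing more variants concrete, here is a rough sketch of how the required sample size grows with the number of tests. It assumes a two-sided test under a normal approximation and a Bonferroni correction, neither of which comes from this thread; the function name `sample_size_inflation` and the default `alpha` and `power` values are illustrative only.

```python
from statistics import NormalDist


def sample_size_inflation(n_tests, alpha=0.05, power=0.8):
    """Factor by which the sample size must grow, relative to a single test,
    when alpha is Bonferroni-corrected for n_tests independent tests.

    Assumption: sample size scales with (z_alpha + z_power)^2, which holds
    for simple normal-approximation tests with a fixed effect size.
    """
    z = NormalDist().inv_cdf
    z_power = z(power)
    single = (z(1 - alpha / 2) + z_power) ** 2
    corrected = (z(1 - alpha / (2 * n_tests)) + z_power) ** 2
    return corrected / single


# Compare one candidate SNP, a 1M SNP chip, and a larger sequencing-based scan.
for n_tests in (1, 1_000_000, 10_000_000):
    print(n_tests, round(sample_size_inflation(n_tests), 2))
```

The factor grows only slowly with the number of tests, but going from a single candidate SNP to a genome-wide scan still multiplies the required sample size several times over for the same effect size.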

The minimum sample size strongly depends on the design of your study: Are you looking for SNPs in a population of unrelated individuals? Is it a family study? Do you have candidate genes/genomic regions? Are you searching genome-wide? What are your SNPs associated with? A disease phenotype that could be binary, i.e. sick/healthy? Or a phenotype encoded by continuous variables, e.g. BMI or blood pressure?

I think frewise would be well advised to seek help from a statistician (which I am not) when designing the experiments. Those are the people he/she will most likely turn to when the real data analysis begins, and in my experience statisticians hate working with data from poorly designed experiments: everyone is looking for significant p-values, which they then won't be able to provide... :-)

ADD REPLY • link 12.4 years ago by Sebastian Kurscheid ▴ 300

score 3 · Answer 1 · 2012-07-09

As with so many other questions, the answer has to begin with the words "It depends...".

Unfortunately, the answer depends on many more things than the information given in your question. I would therefore refer you to, e.g., this 2009 paper in PLoS Genetics, which retrospectively compared large GWAS and also ran simulations to help estimate the sample sizes required to detect an association between genotype and phenotype: http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1000477
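Just to give a feel for the orders of magnitude, here is a back-of-the-envelope sketch for the simplest case: a binary phenotype, equal numbers of cases and controls, and an allele-based comparison of two proportions under a normal approximation. This is not the method of the paper above. The allele frequencies, the thresholds and the function name `individuals_per_group` are made-up assumptions for illustration, and a real study needs a proper power analysis.

```python
from math import ceil
from statistics import NormalDist


def individuals_per_group(p_control, p_case, alpha, power=0.8):
    """Approximate number of cases (and of controls) needed to detect a
    difference in allele frequency with a two-sided two-proportion test.

    Assumptions: each individual contributes two independent alleles,
    groups are of equal size, and alpha is already corrected for
    multiple testing.
    """
    if p_control == p_case:
        raise ValueError("allele frequencies must differ")
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    variance = p_control * (1 - p_control) + p_case * (1 - p_case)
    alleles = z_sum ** 2 * variance / (p_case - p_control) ** 2
    return ceil(alleles / 2)  # two alleles per diploid individual


# Hypothetical SNP: allele frequency 0.20 in controls, 0.25 in cases.
discovery = individuals_per_group(0.20, 0.25, alpha=5e-8)      # genome-wide threshold
validation = individuals_per_group(0.20, 0.25, alpha=0.05 / 10)  # 10 known SNPs
print("discovery:", discovery, "validation:", validation)
```

Under these assumptions, validating a handful of previously reported SNPs needs far fewer individuals than genome-wide discovery, because the significance threshold is much less strict. Smaller effect sizes or rarer alleles push both numbers up quickly.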

If you could provide more details on your planned study, I am sure that other posters would be able to provide you with more helpful answers.