Question

SNPs and calculation of sample size

6.1 years ago

efsvdo • 0

Hi, this is a new area, far from my expertise. I want to check whether a specific gene has SNPs, but I don't have the MAF or any other information to set the cohort size, so I am not sure where to start. How do I calculate it?

snps • 3.2k views

ADD COMMENT • link updated 6.1 years ago by pltbiotech_tkarthi ▴ 180 • written 6.1 years ago by efsvdo • 0

You can search dbSNP or Ensembl to see whether the gene has SNPs or not.

ADD REPLY • link 6.1 years ago by BAGeno ▴ 190

Thanks for the answer. Yes, I know the gene has around 5000 SNPs, but they are not pathogenic, and I'd like to test them for association with a specific condition afterwards. I was wondering if I can sequence the gene in my problem cohort and see what I find, but I am not sure what the sample size should be, and there is no information at all about this gene and its relationship with the condition. Thanks again.

ADD REPLY • link 6.1 years ago by efsvdo • 0

Nothing on incidence rate (of the disease)? You need to perform a power analysis in order to ascertain an adequate sample size. Take a look here: A: Power Analysis for SNPs QTL GWAS

ADD REPLY • link 6.1 years ago by Kevin Blighe 89k

Thanks. Yes, the incidence of the disease is 24%, but I don't have information on the SNPs. I want to sequence the gene in this population; I will probably find the SNPs already reported plus new ones, and I want to see if there is some association with the disease. Most of the calculations ask me for a number I don't have, for example the MAF, the Disease Allele Frequency ratio, or the Genotype Relative Risk ...

ADD REPLY • link 6.1 years ago by efsvdo • 0

24% is quite high (?). For a disease with that incidence, there must surely be a lot of research already conducted, or is it 24% in a specific population group? Using the tools that I listed in the other thread, you will be able to determine a suitable sample size. If allele frequencies are unknown, then determine sample sizes for different levels of statistical power and for different allele frequencies.

For example, "With X controls and Y patients, we will be able to detect a disease association signal at 5% alpha and 80% power assuming an allele frequency of 5%"
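To make a statement like that concrete, here is a minimal sketch in Python (standard library only). It assumes a simple two-sided comparison of two proportions (allele frequency in cases vs. controls) using the usual normal approximation; the function name `n_per_group` and the example frequencies are illustrative, not taken from any particular tool. It treats each sampled unit as independent and ignores multiple testing across many SNPs, so a dedicated genetic power calculator may give different numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_case, p_control, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-proportion z-test.

    p_case / p_control are the assumed allele frequencies in each group.
    """
    if p_case == p_control:
        raise ValueError("frequencies must differ to compute a sample size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p_case + p_control) / 2               # pooled frequency under H0
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(p_case * (1 - p_case) + p_control * (1 - p_control))
    return ceil(((null_term + alt_term) / (p_case - p_control)) ** 2)


# Assumed frequencies for illustration only: 5% in patients, 1% in controls.
n = n_per_group(p_case=0.05, p_control=0.01)
print(f"About {n} per group at 5% alpha and 80% power")
```

Changing `p_case`, `p_control`, `alpha`, or `power` shows how sensitive the required sample size is to each assumption.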

ADD REPLY • link 6.1 years ago by Kevin Blighe 89k

There is a lot of information about the disease (infertility) and a lot about the gene, but not about the two together. Would you consider a MAF of 5% for the study population and 1% for the controls a good start?

ADD REPLY • link 6.1 years ago by efsvdo • 0

I would just test varying combinations of allele frequencies for the population and controls, and come up with multiple different sample size estimates. That way, the reviewer, etc., cannot be too critical.
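As a rough sketch of such a sweep, the Python snippet below tabulates sample sizes over a grid of allele frequencies and power levels. The grid values are arbitrary starting points chosen for illustration, and `n_per_group` is a hypothetical helper based on the normal approximation for comparing two proportions, not the method of any specific calculator.

```python
from itertools import product
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_case, p_control, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_case + p_control) / 2
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(p_case * (1 - p_case) + p_control * (1 - p_control))
    return ceil(((null_term + alt_term) / (p_case - p_control)) ** 2)


# Assumed grid: adjust to whatever range is plausible for the gene.
control_mafs = [0.01, 0.05, 0.10]
case_mafs = [0.05, 0.10, 0.20]
powers = [0.80, 0.90]

table = {}
for p_control, p_case, power in product(control_mafs, case_mafs, powers):
    if p_case <= p_control:
        continue  # only scenarios where the allele is enriched in cases
    table[(p_control, p_case, power)] = n_per_group(p_case, p_control, power=power)

print("control_MAF  case_MAF  power  n_per_group")
for (p_control, p_case, power), n in sorted(table.items()):
    print(f"{p_control:11.2f}  {p_case:8.2f}  {power:5.2f}  {n:11d}")
```

Reporting the whole table, rather than a single number, makes the dependence on the assumed frequencies explicit.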

ADD REPLY • link 6.1 years ago by Kevin Blighe 89k

OK, thanks. With these percentages the sample size is acceptable, even low: around 300 cases per group. Are there standard, basic, or more frequently used starting numbers, so that I can include the ones in between? Thanks again.

ADD REPLY • link 6.1 years ago by efsvdo • 0

I have seen 300 for a few studies. The larger studies have 1000s of samples, of course. There is no right or wrong answer, but obviously you cannot really do much with 10 or 20 samples.
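To illustrate why very small cohorts are of little use, this sketch turns the calculation around and estimates power at a fixed sample size. It assumes the same two-sided two-proportion normal approximation, with illustrative allele frequencies of 5% in cases and 1% in controls; `power_two_proportions` is a hypothetical helper. The approximation is crude when expected allele counts are small, so treat the low-n values as indicative only.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(n, p_case, p_control, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test with n per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p_case + p_control) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar))  # spread under no association
    se_alt = sqrt(p_case * (1 - p_case) + p_control * (1 - p_control))
    # Ignores the negligible chance of rejecting in the wrong direction.
    z = (abs(p_case - p_control) * sqrt(n) - z_alpha * se_null) / se_alt
    return NormalDist().cdf(z)


# Assumed frequencies for illustration: 5% in cases, 1% in controls.
for n in (10, 20, 300, 1000):
    pw = power_two_proportions(n, p_case=0.05, p_control=0.01)
    print(f"n = {n:4d} per group -> approximate power {pw:.2f}")
```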

ADD REPLY • link 6.1 years ago by Kevin Blighe 89k

Regarding the MAF: are there standard, basic, or more frequently used starting values, so that I can include the ones in between?

ADD REPLY • link 6.1 years ago by efsvdo • 0

score 0 · Answer 1 · 2019-03-08

6.1 years ago

pltbiotech_tkarthi ▴ 180

You can try BLASTN at Ensembl to retrieve related sequences and look for variants; perhaps you could observe natural variants or EMS-induced variants from the Ensembl server. If you have a VCF file, you can assess the variants with SIFT scores. Try this: A: Allele frequency visualization