Question

What's the probability of the same SNP occurring across all samples in a dataset by random chance?

0

3.2 years ago

TheWonderer • 0

Hi there, I'm analyzing some WES data and it appears that one specific variant (same gene, same position) is called across all samples. I'm not certain of the best method for determining how likely this is to occur by random chance. From my understanding, we could use something like 1 - (1 - <mutation rate>^<number of samples>)^<genome length>, but it's not clear to me whether this would work. I think the genome length is too large a number here because we can't observe every location in the genome.

Thank you for your help!
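The formula in the question can be sketched in a few lines, reading it as 1 - (1 - mu^n)^L, where mu is a per-site mutation rate, n the number of samples, and L the number of sites that can actually be observed. This is only an illustration: the function name and the numbers in the usage note are hypothetical, and the model assumes that sites and samples are independent, which real data rarely satisfies.

```python
import math

def prob_shared_site_by_chance(mutation_rate, n_samples, n_sites):
    """P(at least one of n_sites is independently mutated in all n_samples).

    Assumes 0 < mutation_rate < 1 and full independence between
    samples and between sites (a strong simplification).
    """
    # Per-site probability that every sample carries a mutation: mu ** n,
    # computed via logs so that very small values stay well-behaved.
    p_site = math.exp(n_samples * math.log(mutation_rate))
    # 1 - (1 - p_site) ** n_sites, written with log1p/expm1 so the result
    # does not round to exactly 0 when p_site is tiny.
    return -math.expm1(n_sites * math.log1p(-p_site))
```

For example, calling it with a hypothetical rate of 1e-8, 10 samples, and 3e7 callable exome sites (instead of the full genome length) gives a value that is vanishingly small, which is why a variant seen in every sample is unlikely to be explained by independent chance mutations.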

SNP WES probability • 1.2k views

updated 3.2 years ago by lethalfang ▴ 160 • written 3.2 years ago by TheWonderer • 0

0

Not sure what you mean by random chance here.

Only errors (and not even those) could perhaps be modeled as a stochastic process. The chance that random sequencing errors add up to the same SNP in multiple samples is probably very, very low (OK... zero).

That multiple samples show the same valid SNP has to do solely with the allele frequency of the variant in the population. The more frequent the variant, the more likely it is that all samples have it.

The chance that two separate genomes develop the same de novo mutation is, again, very, very low.
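To make the allele-frequency point concrete, here is a minimal sketch. It assumes unrelated samples drawn from one population in Hardy-Weinberg equilibrium, so that the fraction of carriers is 1 - (1 - q)^2 for an allele frequency q. The function name is hypothetical, and q would have to come from a population database that matches the cohort.

```python
def prob_all_carriers(allele_freq, n_samples):
    """P(every one of n_samples carries at least one copy of the allele).

    Assumes unrelated diploid samples from a single population in
    Hardy-Weinberg equilibrium (an assumption, not a given).
    """
    # A sample is a non-carrier only if both of its alleles are reference.
    carrier_fraction = 1.0 - (1.0 - allele_freq) ** 2
    # Independent samples: multiply the per-sample carrier probabilities.
    return carrier_fraction ** n_samples
```

With an allele frequency near 1 the result is also near 1, so seeing the variant in every sample is expected. With a rare allele (say q = 0.01) and 20 samples it is effectively zero, which would point towards an artifact or a cohort-specific explanation instead.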

3.2 years ago by Istvan Albert 102k

0

Hi there, thanks for responding! Yes, I assume that the chance of seeing the same SNP by random chance is very low. However, what I'm really after is how significant it is to find the same SNP across multiple samples, and how that could be calculated. I'm considering using MuSiC but wondering if there is a better method.

3.2 years ago by TheWonderer • 0

0

The ethnic backdrop of your cohort is important to consider, too.

3.2 years ago by Kevin Blighe 89k

score 0 · Answer 1 · 2022-04-10

0

3.2 years ago

lethalfang ▴ 160

You need to know the fraction of the population that carries this SNP. Sometimes that fraction is close to 1, because the reference genome happens to carry a rare variant at that position instead of the "wild type."

ADD COMMENT • link 3.2 years ago by lethalfang ▴ 160