Hi there, I'm analyzing some WES data and it appears that one specific variant (same gene, same position) is called across all samples. I'm not certain of the best method for determining how likely this would be to occur by random chance. From my understanding, we could use something like 1 - (1 - <mutation rate>^<number of samples>)^<genome length>, but it's not clear to me if this would work. I think the genome length is too high a number, because with WES we can't observe every location in the genome.
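For illustration, here is a minimal Python sketch of the formula above. It assumes every site mutates independently with the same rate and that samples are independent of each other, which is a strong simplification. The function name `prob_shared_site` and the numbers in the usage lines are hypothetical placeholders, not values from my data. Passing the number of callable (covered) sites instead of the full genome length is one way to address the concern about unobserved positions.

```python
import math


def prob_shared_site(mutation_rate, n_samples, n_sites):
    """P(at least one of n_sites is mutated in every one of n_samples).

    Implements 1 - (1 - rate^n_samples)^n_sites, assuming independent
    sites and independent samples (a simplifying assumption).
    """
    # Probability that one given site is mutated in all samples.
    p_site_all = mutation_rate ** n_samples
    if p_site_all >= 1.0:
        return 1.0
    # log1p/expm1 keep precision when p_site_all is tiny; a naive
    # 1 - (1 - p) ** L would round to exactly 0.0.
    return -math.expm1(n_sites * math.log1p(-p_site_all))


if __name__ == "__main__":
    # Placeholder inputs: per-site rate, sample count, callable sites.
    print(prob_shared_site(mutation_rate=1e-8, n_samples=10, n_sites=3e7))
```

With any realistic per-site rate the result is vanishingly small, so this mostly confirms that independent random events are not a plausible explanation.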
Thank you for your help!
Not sure what you mean by random chance here.
Only errors (and not even those) could perhaps be modeled as a stochastic process. The chance that random sequencing errors add up to the same SNP in multiple samples is probably very, very low (ok... zero).
Whether multiple samples show the same valid SNP depends solely on the allele frequency of that variant in the population. The more frequent the variant, the more likely it is that all samples have it.
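As a rough sketch of the allele-frequency point, the Python snippet below computes the chance that every sample carries at least one copy of a variant. It assumes unrelated diploid samples drawn from one population in Hardy-Weinberg equilibrium; the function name `prob_all_carry` and the frequencies in the loop are made up for illustration.

```python
def prob_all_carry(allele_freq, n_samples):
    """P(all n_samples carry at least one copy of the variant allele).

    Assumes Hardy-Weinberg proportions and unrelated samples.
    """
    # A diploid individual lacks the allele with probability (1 - q)^2.
    carrier = 1.0 - (1.0 - allele_freq) ** 2
    # Independent samples: multiply the per-sample carrier probability.
    return carrier ** n_samples


if __name__ == "__main__":
    # Hypothetical allele frequencies and a hypothetical cohort of 20.
    for q in (0.01, 0.3, 0.9, 0.99):
        print(q, prob_all_carry(q, n_samples=20))
```

The probability only gets close to 1 when the variant is very common, so looking up its population allele frequency is a quick first check.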
The chance that two separate genomes develop the same de novo mutation is, again, very, very low.
Hi there, thanks for responding! Yes, I assume the chance of seeing the same SNP by random chance is very low. However, what I'm really after is how significant it is to find the same SNP across multiple samples, and how that could be calculated. I'm considering using MuSiC but wondering if there is a better method.
The ethnic background of your cohort is important to consider, too.