Question

How many reads are needed to be sure a variant does not exist?

7.2 years ago

steve ★ 3.5k

We are doing targeted exome sequencing with variant calling, and need to determine the minimum number of reads required to be 95% certain that a variant is not present in a given target region. How do you typically do that?

I was thinking some type of power analysis, but wasn't sure what values to use for which parameters.

Are there other ways to do this?

variant calling • 1.5k views

updated 7.2 years ago by Gabriel R. ★ 2.9k • written 7.2 years ago by steve ★ 3.5k

This has been done: https://www.ncbi.nlm.nih.gov/pubmed/23773188

7.2 years ago by WouterDeCoster 48k

Thanks for the link. Just curious, did you have this on hand, or did you find it on Google? I spent some time trying to Google this topic but did not find this paper; I must have been using the wrong keywords :)

7.2 years ago by steve ★ 3.5k

I knew it existed, from a couple of years ago. I faintly remembered the first author's name. No idea how I found it originally.

7.2 years ago by WouterDeCoster 48k

score 4 · Answer 1 · 2018-03-01

I assume you are talking about heterozygous sites. In a perfect world where each read samples one of the two chromosomes with p = 1/2, at a coverage of 5X you will observe only one particular allele (for example, only the reference) with p = (1/2)^5 = 0.03125. The probability of observing only one allele, whichever one it is, is twice that: 0.0625.
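The arithmetic above can be turned into a small depth calculator. This is only a sketch of the idealized model (independent reads, no sequencing error, no mapping bias); the function names and the `alt_fraction` parameter are made up for illustration, with `alt_fraction = 0.5` standing for a heterozygous site.

```python
def p_miss_alt(depth: int, alt_fraction: float = 0.5) -> float:
    """Probability that none of `depth` reads carries the variant allele,
    assuming each read independently shows it with probability `alt_fraction`."""
    return (1.0 - alt_fraction) ** depth


def min_depth(confidence: float = 0.95, alt_fraction: float = 0.5) -> int:
    """Smallest depth at which the variant allele is missed entirely
    with probability at most 1 - confidence (idealized model only)."""
    depth = 1
    while p_miss_alt(depth, alt_fraction) > 1.0 - confidence:
        depth += 1
    return depth


print(p_miss_alt(5))         # 0.03125, as in the 5X example
print(min_depth(0.95))       # 5 reads for a heterozygous site
print(min_depth(0.95, 0.1))  # 29 reads if only 10% of reads carry the variant
```

Under these assumptions, 5 reads already leave less than a 5% chance of seeing only reference reads at a heterozygous site. In practice a caller will want more than a single supporting read, so treat these numbers as lower bounds.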

However, observing a divergent base does not immediately indicate a variant. It could be due to mismapping, sequencing error, residual adapter sequence, contamination, etc.

Genotypers are usually equipped to quantify your belief in one genotype versus another. I suggest that you look at genotyper output at various coverage levels and decide what confidence you want for a particular genotype.
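To illustrate what "quantifying belief in a genotype" can look like, here is a toy two-genotype comparison (homozygous reference versus heterozygous) using binomial likelihoods. It is not how any particular genotyper works; the function name, the per-base `error` rate, and the `prior_het` value are all assumptions chosen for the example.

```python
from math import comb


def genotype_posteriors(n_ref: int, n_alt: int,
                        error: float = 0.01,
                        prior_het: float = 0.001) -> tuple[float, float]:
    """Return (P(hom-ref | reads), P(het | reads)) under a toy model.

    Hom-ref: an alt read appears only through error, with probability `error`.
    Het: each read shows the alt allele with probability 0.5
    (sequencing error is ignored for the het case to keep the sketch short).
    """
    n = n_ref + n_alt
    lik_hom_ref = comb(n, n_alt) * error ** n_alt * (1.0 - error) ** n_ref
    lik_het = comb(n, n_alt) * 0.5 ** n
    # Weight each likelihood by its (assumed) prior and normalize.
    w_hom = lik_hom_ref * (1.0 - prior_het)
    w_het = lik_het * prior_het
    total = w_hom + w_het
    return w_hom / total, w_het / total


# Compare how confidence in "no variant" changes with depth when no alt read is seen.
for depth in (2, 5, 10, 20):
    p_hom, p_het = genotype_posteriors(n_ref=depth, n_alt=0)
    print(depth, p_hom, p_het)
```

Running this over a range of depths and alt-read counts gives a feel for how quickly the posterior moves, which is the kind of inspection suggested above for real genotyper output.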