Question

When Is A Heterozygous Call Not Heterozygous?

2

11.8 years ago

Chris Cole ▴ 800

I've been 'doing' whole exome sequencing (WES) for a few months now, but I'm still unclear on the reliability of zygosity calls.

From simple genetics a het should be 50:50 for each allele and a hom should be 100:0. However, as I'm sure you're all aware, in WES you rarely get these ratios. So at what point do we say that a het call is unreliable? 30:70 or 40:60? Also, when does a het become a hom? 75:25 or 80:20?
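
To illustrate why a single ratio cutoff is hard to pin down, here is a minimal sketch of an exact binomial test. It assumes reads at a true het site are independent draws with probability 0.5 per allele, which real exome data usually violates (capture and reference bias add extra spread), so treat the p-values as optimistic. The function name is hypothetical.

```python
from math import comb


def het_binomial_pvalue(alt_reads, depth):
    """Two-sided exact binomial p-value for seeing alt_reads out of depth
    reads at a site where each allele is expected with probability 0.5."""
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need 0 <= alt_reads <= depth and depth > 0")
    # Under p = 0.5 the distribution is symmetric, so sum every outcome
    # that lies at least as far from depth / 2 as the observed count.
    distance = abs(alt_reads - depth / 2)
    extreme = sum(
        comb(depth, k)
        for k in range(depth + 1)
        if abs(k - depth / 2) >= distance
    )
    return extreme / 2 ** depth


# The same 30:70 ratio at two different depths.
print(het_binomial_pvalue(3, 10))
print(het_binomial_pvalue(30, 100))
```

Under this assumption, 30:70 at 10x is unremarkable for a het, while 30:70 at 100x is very unlikely, so any ratio cutoff only makes sense together with the depth at the site.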

How do others deal with this uncertainty when communicating to clinicians or geneticists?

exome human variant-calling • 5.7k views

ADD COMMENT • link updated 10.7 years ago by Biostar 20 • written 11.8 years ago by Chris Cole ▴ 800

0

Are these tumor samples? If so, issues of ploidy and purity come into play and often result in skewed frequencies. If not, Istvan's answer below is solid.

ADD REPLY • link 11.8 years ago by Chris Miller 22k

0

Nope, not cancer related.

ADD REPLY • link 11.8 years ago by Chris Cole ▴ 800

0

I suggest calling your alleles with a minimum allele frequency threshold (let's say 10%). Then plot the allele frequency of all SNPs: you should get a bell curve with the mode at 50%, and its width will tell you how stretched your distribution is. I personally use 35% as a cutoff typically, but I do think it depends on the data.
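
A rough sketch of that workflow, using only the standard library and a text histogram instead of a plot. The read counts are made up for illustration, the function names are hypothetical, and the symmetric upper cutoff of 65% is an assumption (only the 10% and 35% values come from the comment above).

```python
from collections import Counter

MIN_AF = 0.10    # minimum alt-allele fraction to keep a site (10%, as suggested above)
HET_LOW = 0.35   # lower het cutoff (35%, as suggested above)
HET_HIGH = 0.65  # assumption: symmetric upper cutoff, not stated in the thread
N_BINS = 20      # 5%-wide histogram bins


def allele_fraction(ref_reads, alt_reads):
    """Fraction of reads at a site that support the alternate allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("site has no reads")
    return alt_reads / depth


def histogram(fractions):
    """Count allele fractions per bin; a fraction of 1.0 goes in the last bin."""
    counts = Counter()
    for f in fractions:
        counts[min(int(f * N_BINS), N_BINS - 1)] += 1
    return counts


def classify(fraction):
    """Label a site by where its alt-allele fraction falls."""
    if fraction < MIN_AF:
        return "below_threshold"
    if fraction < HET_LOW:
        return "ambiguous"
    if fraction <= HET_HIGH:
        return "het"
    return "above_het_range"


# Made-up (ref_reads, alt_reads) pairs for illustration only.
sites = [(50, 50), (52, 48), (48, 52), (70, 30), (88, 12), (0, 40)]
fractions = [allele_fraction(r, a) for r, a in sites]
kept = [f for f in fractions if f >= MIN_AF]
counts = histogram(kept)
for b in range(N_BINS):
    print(f"{b / N_BINS:.2f}-{(b + 1) / N_BINS:.2f} {'#' * counts[b]}")
```

With real data, the width of the peak around 0.5 is what suggests whether 35% is a sensible cutoff or whether it should be tighter or looser.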

ADD REPLY • link 10.7 years ago by apelin20 ▴ 490

0

Just to add to what I said earlier, here is a publication, http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1003215#s2, where in Figure 3 they do an allele frequency plot of a tetraploid. In that case your 50/50 peak would start at 0.4 and finish at 0.6.

ADD REPLY • link 10.7 years ago by Adrian Pelin ★ 2.7k

score 6 · Answer 1 · 2014-02-06

6

11.8 years ago

Istvan Albert 103k

Technically the variant caller should produce a probability value that indicates the likelihood of each genotype. This works well in many cases (speaking on a large scale) but individually I always see issues where I can't figure out why a given call was made as such. Typically these involve insertion/deletions nearby.
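
As an illustration of reading those per-genotype values, here is a minimal sketch that assumes the caller writes the standard VCF `PL` field (Phred-scaled genotype likelihoods, ordered 0/0, 0/1, 1/1 for a biallelic site). Normalizing them into probabilities implicitly assumes a flat prior over genotypes; the function name and the example values are hypothetical.

```python
def pl_to_probabilities(pl):
    """Convert Phred-scaled genotype likelihoods (VCF PL) into
    probabilities that sum to 1, assuming a flat genotype prior."""
    # PL = -10 * log10(likelihood), scaled so the best genotype has PL 0.
    likelihoods = [10 ** (-value / 10) for value in pl]
    total = sum(likelihoods)
    return [lk / total for lk in likelihoods]


# Made-up PL values for one biallelic site: 0/0, 0/1, 1/1.
genotypes = ["0/0", "0/1", "1/1"]
probs = pl_to_probabilities([30, 0, 45])
for gt, p in zip(genotypes, probs):
    print(f"{gt}\t{p:.6f}")
```

A site where the second-best genotype has a low PL (say under 20) is one where the het/hom distinction is weakly supported, whatever the raw allele ratio looks like.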

ADD COMMENT • link 11.8 years ago by Istvan Albert 103k

0

Thanks for the reply.

Would you say they're reliable? How do you interpret the probability?

ADD REPLY • link 11.8 years ago by Chris Cole ▴ 800

0

The probability is usually a p-value indicating the likelihood of observing the event by random chance.

The reliability is more difficult to assess. The distribution of errors is usually not governed by true random chance (and SNP callers try to account for that). This means that errors tend to recur at similar positions rather than being uniformly distributed, because different parts of the genome have different mappability levels (they are easier or more difficult to align to due to repetitiveness and other factors).

Brad Chapman has a blog called Blue Collar Bioinformatics with a series of posts on the reliability of SNP calling: http://bcbio.wordpress.com/

It is a great resource to learn more about the challenges of validating these calls.

ADD REPLY • link 11.8 years ago by Istvan Albert 103k

0

Ah, thanks. I'd forgotten about Brad's blog. Now bookmarked.

Will look into it in more detail.

ADD REPLY • link 11.8 years ago by Chris Cole ▴ 800