When Is A Heterozygous Call Not Heterozygous?
10.8 years ago
Chris Cole ▴ 800

I've been 'doing' whole exome sequencing for a few months now, but I'm still unclear on the reliability of zygosity calls.

From simple genetics a het should be 50:50 for each allele and a hom should be 100:0. However, as I'm sure you're all aware, in WES you rarely get these ratios. So at what point do we say that a het call is unreliable? 30:70 or 40:60? Also, when does a het become a hom? 75:25 or 80:20?

How do others deal with this uncertainty when communicating to clinicians or geneticists?

exome human variant-calling • 5.1k views

Are these tumor samples? If so, issues of ploidy and purity come into play and often result in skewed frequencies. If not, Istvan's answer below is solid.


Nope, not cancer related.


I suggest calling your alleles with a minimum allele frequency threshold (say, 10%). Then plot the allele frequency of all SNPs; you should get a bell curve with the mode at 50%, and its width will tell you how stretched the distribution is in your data. I personally use 35% as a cutoff typically, but I do think it depends on the data.
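As a rough illustration of the approach described above, here is a minimal Python sketch that bins alt-allele fractions from per-site read depths and flags het calls outside a cutoff. The function names, the `(ref_depth, alt_depth)` input format, and the symmetric upper bound (65% when the lower cutoff is 35%) are assumptions made for this example, not part of the original suggestion.

```python
from collections import Counter


def allele_fraction_histogram(sites, min_fraction=0.10, n_bins=20):
    """Bin alt-allele fractions for an iterable of (ref_depth, alt_depth) pairs.

    Sites with no coverage, or with an alt fraction below min_fraction,
    are skipped. Returns a Counter mapping bin index -> number of sites,
    where bin i covers fractions in [i / n_bins, (i + 1) / n_bins).
    """
    counts = Counter()
    for ref_depth, alt_depth in sites:
        total = ref_depth + alt_depth
        if total == 0 or alt_depth / total < min_fraction:
            continue
        # Integer arithmetic avoids floating-point surprises at bin edges;
        # a fraction of exactly 1.0 is placed in the last bin.
        bin_index = min(alt_depth * n_bins // total, n_bins - 1)
        counts[bin_index] += 1
    return counts


def is_balanced_het(ref_depth, alt_depth, low=0.35):
    """Return True if the alt fraction lies within [low, 1 - low].

    The symmetric upper bound is an assumption of this sketch.
    """
    total = ref_depth + alt_depth
    if total == 0:
        return False
    return low <= alt_depth / total <= 1 - low


def print_histogram(counts, n_bins=20):
    """Print a crude text histogram, one row per non-empty bin."""
    for i in sorted(counts):
        print(f"{i / n_bins:.2f}-{(i + 1) / n_bins:.2f} {'#' * counts[i]}")
```

Calling `print_histogram(allele_fraction_histogram(sites))` on your own depth pairs gives a quick view of how wide the peak around 50% is before you settle on a cutoff.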


Just to add to what I said earlier, here is a publication (http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1003215#s2) in which Figure 3 shows an allele frequency plot of a tetraploid, so in that case your 50/50 peak would start at 0.4 and finish at 0.6.

10.8 years ago

Technically, the variant caller should produce a probability value that indicates the likelihood of each genotype. This works well in many cases (speaking on a large scale), but for individual calls I always see cases where I can't figure out why a given call was made. Typically these involve insertions/deletions nearby.
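To make the idea of a per-genotype likelihood concrete, here is a deliberately simplified sketch: a plain binomial model over ref/alt read counts with a fixed sequencing error rate and a flat prior. This is an illustrative assumption, not the algorithm of any particular variant caller; real callers also use base qualities, mapping qualities, and population priors. The function names and the default `error_rate=0.01` are made up for the example.

```python
from math import comb


def genotype_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Binomial likelihood of the observed read counts under each genotype.

    Assumed probability that a single read shows the alt allele:
      0/0 -> error_rate, 0/1 -> 0.5, 1/1 -> 1 - error_rate
    """
    n = ref_count + alt_count
    p_alt_by_genotype = {"0/0": error_rate, "0/1": 0.5, "1/1": 1 - error_rate}
    return {
        gt: comb(n, alt_count) * p**alt_count * (1 - p) ** ref_count
        for gt, p in p_alt_by_genotype.items()
    }


def genotype_posteriors(ref_count, alt_count, error_rate=0.01):
    """Normalise the likelihoods under a flat prior over the three genotypes."""
    likelihoods = genotype_likelihoods(ref_count, alt_count, error_rate)
    total = sum(likelihoods.values())
    return {gt: lik / total for gt, lik in likelihoods.items()}


def most_likely_genotype(ref_count, alt_count, error_rate=0.01):
    """Return the genotype with the highest posterior probability."""
    posteriors = genotype_posteriors(ref_count, alt_count, error_rate)
    return max(posteriors, key=posteriors.get)
```

Under this toy model a 15:5 split at 20x is still called het, because five alt reads are far more likely from a true het than from sequencing error alone. It does not model the indel-related alignment artefacts mentioned above.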


Thanks for the reply.

Would you say they're reliable? How do you interpret the probability?


The probability is usually a p-value indicating the likelihood of observing the event by random chance.
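In VCF output these probabilities are commonly reported on the Phred scale (for example in the QUAL and GQ fields), where Q = -10 * log10(p). A small sketch of the conversion, with helper names chosen for this example:

```python
from math import log10


def phred_to_prob(q):
    """Convert a Phred-scaled score to a probability: p = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Convert a probability (0 < p <= 1) to a Phred-scaled score."""
    return -10 * log10(p)
```

So a score of 20 corresponds to a probability of 0.01, and a score of 30 to 0.001.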

The reliability is more difficult to assess. The distribution of errors is usually not governed by true random chance (and SNP callers try to account for that). This means that errors will keep occurring at similar positions rather than being uniformly distributed, because different parts of the genome have different mappability levels (they are easier or more difficult to align to due to repetitiveness and other factors).

Brad Chapman has a blog called Blue Collar Bioinformatics with a series of posts on the reliability of SNP calling: http://bcbio.wordpress.com/

It is a great resource to learn more about the challenges of validating these calls.


Ah, thanks. I'd forgotten about Brad's blog. Now bookmarked.

Will look into it in more detail.
