why is this example homozygous 0/0 ?
1
0
Entering edit mode
8.6 years ago
fernardo ▴ 180

Hi,

An example of aligned reads as this image:

enter image description here

After variant calling, in the VCF there are many GT: 0/0 including the example above (red rectangle) which is A --> C and GT: 0/0.

Can someone explain why such example is "Homozygous reference 0/0" while there is both A and C in the non-reference(reads) ? shouldn't it be "Heterozygous 0/1" ?

Note: my only idea is that the Variant Caller doesn't count for the Cs through whatever reason or calculation and assumes there is only A that is why called it as 0/0.

Thanks

SNP • 3.0k views
ADD COMMENT
1
Entering edit mode
ADD REPLY
0
Entering edit mode

good point. reverse checking when you have an already confirmed particular variant helps to understand how the variant calling process works. trying to know why an already known variant may not be called is usually quite tricky.

ADD REPLY
0
Entering edit mode

Thanks. Yours is also a nice point to learn from.

ADD REPLY
0
Entering edit mode

What variant caller are you using and with what settings? Could be that the threshold for calling a hetrozygous call is >0.25 and when you have less than 25% of non reference alleles the call usually will be homozygous reference...

ADD REPLY
0
Entering edit mode

I used "Freebayes" with no option (default):

freebayes -f ref.fa aln.bam >var.vcf

But I also used GATK which didn't return any SNP with 0/0.

ADD REPLY
0
Entering edit mode

What version of "freebayes" did you used to call vcf? I want to call 0/0 with freebayes but I don't get any 0/0s with same option with freebayes-5d5b8ac0

ADD REPLY
3
Entering edit mode
8.6 years ago

you shouldn't be surprised if a minimal variant evidence is not considered by the variant caller as enough evidence to change from 0/0 to 0/1. that decision, plus the 0/1 to 1/1 decision, are the main reasons why you use a variant caller rather than a simple base counting script. you must remember that NGS is not a direct method like Sanger, but a massive one, hence errors can occur and estimations must be made. there are several reasons why a particular position may not be considered as a variant (base qualities, mapping qualities, strand bias,...), but the most appealed parameter to recall those close-to-the-threshold variants is the minimum_allele_frequency. you may play around with that variant caller's parameter or any others, but have in mind that it's perfectly feasible to have non-reference bases along your alignments due to the massive nature of the technique, yet be able to consider those positions as non-variant perfectly.

the reason why you may not see any 0/0 variants is that a variant caller reports only variants by default, and an homozygous reference is not considered as such. if you want the variant caller to report any particular or all homozygous reference positions you will have to use a genotyping method (GATK has it) rather than variant discovery.

ADD COMMENT
0
Entering edit mode

Thanks. I confirmed what I was expecting based on your answer. Yes for GATK, it is great but I don't disqualify "Freebayes" as if I will run Variant Filtering step after, then I may exclude 0/0s otherwise Freebayes rate drops down to me.

ADD REPLY

Login before adding your answer.

Traffic: 1492 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6