I'm new to a lot of this and I've been looking through the X and Y chromosome regions of a WGS VCF file.
I'm confused. How are variant calls made on X and Y?
I have a few examples:
X 144878186 C T 46.06 PASS AC=1;AF=1.000;AN=1;DP=21 GT:AD:AF:DP:F1R2:F2R1:GQ:PL:GP:PRI:SB:MB 1:9,10:0.526:19:4,6:5,4:46:81,0:4.6058e+01,1.0769e-04:0.00,34.77:2,7,5,5:5,4,3,7
X 48435405 C T 47.92 PASS AC=1;AF=1.000;AN=1;DP=31 GT:AD:AF:DP:F1R2:F2R1:GQ:PL:GP:PRI:SB:MB 1:14,15:0.517:29:5,8:9,7:48:83,0:4.7921e+01,6.9893e-05:0.00,34.77:7,7,7,8:10,4,12,3
Y 3714028 T C 44.78 PASS AC=1;AF=1.000;AN=1;DP=31 GT:AD:AF:DP:F1R2:F2R1:GQ:PL:GP:PRI:SB:MB 1:15,15:0.500:30:7,8:8,7:45:80,0:4.4780e+01,1.4445e-04:0.00,34.77:4,11,8,7:8,7,9,6
How do the AC and AN values relate to the GT and the AF in that string of data? In the first example the allele number (AN) is given as 1 and GT is called as 1, yet the second AF (the one in the sample column) is given as 0.526.
In the third example, for Y, the second AF is given as 0.500. Why are AFs of ~0.5 called as heterozygous on the autosomes, but as a single (haploid) allele on X and Y?
Many thanks for any help!
I feel like you have the answer in hand: males can't have heterozygous calls on X or Y (outside the pseudoautosomal regions).
Indeed! I am a little confused about how they determine a call as being ref or alt when the AF (which, as Pierre pointed out, is ALT reads / (ALT reads + REF reads)) is 0.500. There are as many reads for ref as for alt. Shouldn't that be a no-call of some sort?
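To make the two AFs concrete, here is a small sketch that recomputes both from the three records in the question. It assumes the AD values are the usual ref,alt pair; the function names are mine, not from any caller.

```python
# Allelic depths (ref, alt) taken from the AD field of the three records above.
records = {
    "X:144878186": (9, 10),
    "X:48435405": (14, 15),
    "Y:3714028": (15, 15),
}

def alt_read_fraction(ref_reads, alt_reads):
    """Sample-level AF: fraction of reads supporting ALT (None if no reads)."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else None

def info_af(ac, an):
    """INFO-level AF: called ALT alleles (AC) over called alleles (AN)."""
    return ac / an

fractions = {site: alt_read_fraction(*ad) for site, ad in records.items()}
for site, frac in fractions.items():
    print(f"{site}\tAD={records[site]}\tread AF={frac:.3f}\tINFO AF={info_af(1, 1):.3f}")
```

So the AF in the sample column describes the reads, while AC, AN and the INFO AF describe the genotype that was called: one haploid allele was called (AN=1) and it was ALT (AC=1), hence 1.000.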
It's even more curious that the WES for the same individual gives hom and het calls for XX, and some (off-target, I assume) heterozygous calls for the Y chromosome, i.e. AN=2! (These are not in the PAR regions.)
I bet the cutoff to begin considering a variant to be called is very low (it's a depth of 2 reads and an AF of 0.05 in freebayes). Hemizygous calling is kind of weird, since by its very nature any read that disagrees with the genotype you call must be an artefact.
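A toy illustration of why a haploid caller still emits a genotype at a ~0.5 read fraction: with ploidy 1 the only candidates are 0 and 1, so whichever is slightly more likely wins and there is no "het" option to fall back on. This is a deliberately simplified binomial model with a made-up flat error rate of 0.01. It is not how freebayes, GATK or any other caller actually computes likelihoods (those use base and mapping qualities, priors and more).

```python
import math

def log10_read_likelihood(ref_reads, alt_reads, p_alt):
    # log10 probability of the reads if each read shows ALT with probability
    # p_alt (binomial coefficient dropped: it is identical for every genotype).
    return alt_reads * math.log10(p_alt) + ref_reads * math.log10(1 - p_alt)

def best_genotype(ref_reads, alt_reads, ploidy, error=0.01):
    """Return (best ALT copy number, scores) under the toy model.

    error is a hypothetical per-read error rate chosen for illustration.
    """
    scores = {}
    for alt_copies in range(ploidy + 1):
        frac = alt_copies / ploidy
        # Chance a read shows ALT: true ALT read correctly, or REF read in error.
        p_alt = frac * (1 - error) + (1 - frac) * error
        scores[alt_copies] = log10_read_likelihood(ref_reads, alt_reads, p_alt)
    best = max(scores, key=scores.get)
    return best, scores

haploid_call, haploid_scores = best_genotype(9, 10, ploidy=1)
diploid_call, diploid_scores = best_genotype(9, 10, ploidy=2)
print("ploidy 1:", haploid_call, haploid_scores)
print("ploidy 2:", diploid_call, diploid_scores)
```

With AD=9,10 the haploid model picks 1 (one more ALT read than REF), while the diploid model picks one ALT copy, i.e. a het. With an exact 15/15 split this toy model is a dead tie, so in real data the call would be decided by the things it leaves out.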
I used to work with some human data that had "PAR-called-on-X" calls, where "PAR" means "pseudoautosomal region". In that region you might get an AF of 0.5 in males...
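If you want to check whether a site falls in a PAR, something like the sketch below would do. The coordinates are the GRCh38 PAR1/PAR2 boundaries as I know them; the thread doesn't say which build the VCF uses, so treat the build as an assumption and check the intervals against your own reference. The helper name `in_par` is made up.

```python
# GRCh38 pseudoautosomal regions, 1-based inclusive (assumed build; verify
# against the reference your VCF was called on).
PAR_GRCH38 = {
    "X": [(10_001, 2_781_479), (155_701_383, 156_030_895)],
    "Y": [(10_001, 2_781_479), (56_887_903, 57_217_415)],
}

def in_par(chrom, pos, par=PAR_GRCH38):
    """True if chrom:pos lies inside a pseudoautosomal region."""
    if chrom.startswith("chr"):
        chrom = chrom[3:]  # accept both "X" and "chrX"
    return any(start <= pos <= end for start, end in par.get(chrom, []))

for chrom, pos in [("X", 144878186), ("X", 48435405), ("Y", 3714028)]:
    print(chrom, pos, "PAR" if in_par(chrom, pos) else "non-PAR")
```

Under that assumption all three sites from the question come out as non-PAR, so a male sample would be expected to be haploid there.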