Hi friends,
My question might seem trivial to others. I am working on VCF files, and recently I got confused about how genotypes are represented on the male X and Y chromosomes.
Humans are diploid, which is a well-known fact. So we have the following genotypes for all autosomal chromosomes.
1) 0/0 - First allele is a reference base and second allele is a reference base (two alleles are present in two chromosomes)
2) 0/1 - First allele is a reference base and second allele is an alternate base (1 chr has ref base and its pair has alt base)
3) 1/1 - First allele is an alternate base and second allele is an alternate base (1 chr has alt base & its pair has alt base)
4) 1/2, 1/3, 1/4, and so on
For males, chrX and chrY should have haploid calls, so the genotype should be:
GT : 0 instead of 0/0
GT : 1 instead of 1/1
GT : 2 instead of 1/2
But why are the VCFs showing 0/1, 1/1, 1/2, etc., just like the autosomal chromosomes?
That's not true of all variant callers. The RTG variant caller is sex-aware and will produce haploid GT where appropriate, according to the sex of the individuals as specified (including producing diploid calls for males within PAR regions).
RTG also includes a chrstats command which will help identify the sex for those samples where the sex is unknown.
Good to know, thanks for the info!
Thanks, Devon. I have some samples for which I don't have the gender information.
I read somewhere that in males there should be many mutations on chrY, and that the majority of the mutations on chrX should be homozygous alternate. Why can't they be heterozygous?
Similarly, females don't have a Y chromosome, so there shouldn't be any mutations on chrY, and mutations on chrX can be either heterozygous or homozygous alternate.
Sorry, I am from a computer science background. Can you explain it?
You should be able to tell just from chromosome X. The samples with higher numbers of heterozygous variants are female. As for why, think of how many copies of X and Y males and females each have.
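For a rough way to try this on your own files, here is a minimal Python sketch that counts heterozygous versus homozygous-alternate calls on chrX for each sample. The function names (classify_gt, chrx_het_fraction) are made up for illustration, it assumes an uncompressed, tab-delimited VCF with chrX named "X" or "chrX", and it does not pick a cutoff for you: what counts as a "high" het fraction depends on your caller and data, and PAR sites or genotyping errors can give males some het calls too.

```python
from collections import defaultdict


def classify_gt(gt):
    """Classify a VCF GT string such as '0/1', '1|1' or '1' (hypothetical helper)."""
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return "missing"
    if len(alleles) == 1:
        return "haploid"
    if len(set(alleles)) > 1:
        return "het"
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


def chrx_het_fraction(vcf_lines, x_names=("X", "chrX")):
    """Per sample: het / (het + hom_alt) over diploid chrX calls, or None if no calls."""
    samples = []
    counts = defaultdict(lambda: {"het": 0, "hom_alt": 0})
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if fields[0] not in x_names or len(fields) < 10:
            continue  # only chrX records with genotype columns
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for sample, call in zip(samples, fields[9:]):
            kind = classify_gt(call.split(":")[gt_idx])
            if kind in ("het", "hom_alt"):
                counts[sample][kind] += 1
    result = {}
    for sample in samples:
        total = counts[sample]["het"] + counts[sample]["hom_alt"]
        result[sample] = counts[sample]["het"] / total if total else None
    return result


if __name__ == "__main__":
    import sys

    # Usage: python chrx_het.py my_samples.vcf
    with open(sys.argv[1]) as handle:
        for name, frac in chrx_het_fraction(handle).items():
            print(name, frac)
```

Samples whose fraction is close to zero look like they have one X; samples with a clearly higher fraction look like they have two. Comparing the samples against each other is safer than trusting any single number.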
Since OP is from a computer science background, I would like to point to the pseudo-autosomal regions on the X/Y chromosomes: https://en.wikipedia.org/wiki/Pseudoautosomal_region
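To make the PAR idea concrete in code terms, here is a small Python sketch of the ploidy a sex-aware caller would be expected to use at a chrX position. The function names are hypothetical, and the coordinates are the commonly quoted GRCh38 chrX PAR boundaries, written here as an assumption: please check them against the assembly you actually aligned to (GRCh37/hg19 uses different positions).

```python
# Assumed GRCh38 chrX pseudo-autosomal regions (1-based, inclusive).
# Verify against your reference build before relying on these values.
GRCH38_CHRX_PAR = (
    (10_001, 2_781_479),          # PAR1
    (155_701_383, 156_030_895),   # PAR2
)


def in_chrx_par(pos, par_regions=GRCH38_CHRX_PAR):
    """True if a chrX position falls inside one of the given PAR intervals."""
    return any(start <= pos <= end for start, end in par_regions)


def expected_chrx_ploidy(pos, sex, par_regions=GRCH38_CHRX_PAR):
    """Expected number of chrX alleles at pos for sex 'male' or 'female'."""
    if sex == "female":
        return 2  # two X chromosomes everywhere
    if sex == "male":
        # One X, but the PARs have a homologous copy on Y, so they behave as diploid.
        return 2 if in_chrx_par(pos, par_regions) else 1
    raise ValueError("sex must be 'male' or 'female'")


if __name__ == "__main__":
    for position in (1_000_000, 50_000_000):
        print(position, expected_chrx_ploidy(position, "male"))
```

So a heterozygous call in a male at a chrX position inside a PAR can be real, while one outside the PARs is more likely a caller or mapping artifact.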