In the VCF distributed by the GEUVADIS project, I have discovered a number of instances of loci for which all genotype entries are 0|1 or 1|0. Two examples are snp_1_145075854 and snp_1_144860026. Is this a known issue?
Yes, this is relatively common. It is usually caused by misaligned reads, especially when the region the reads were mapped to has pseudogenes. I assume snp_1_145075854 is at chr1:145075854, which falls in the gene PDE4DIP, which has 7 known pseudogenes (see here). What likely happened is that PDE4DIP and one of its pseudogenes are homozygous for different alleles at analogous positions, and some of the reads originating from the pseudogene were mapped to PDE4DIP, creating the universal het you are observing.
In general, when I see a shared het in many people I am very suspicious and sometimes will just filter out these universal hets (although I admit that may not always be the best choice).
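If you want to list these sites before deciding whether to filter them, here is a minimal sketch in plain Python. It assumes an uncompressed, tab-delimited VCF with a GT entry in the FORMAT column. The function names `is_universal_het` and `universal_het_sites` are my own, not part of any library, and missing calls (`.` or `./.`) are simply ignored, which may or may not be what you want.

```python
def is_universal_het(genotypes):
    """Return True if every called genotype is heterozygous (e.g. 0|1 or 1|0)."""
    called = []
    for gt in genotypes:
        # Treat phased (|) and unphased (/) separators the same way.
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # assumption: missing calls are skipped, not counted
        called.append(alleles)
    # Require at least one called sample, all with two or more distinct alleles.
    return bool(called) and all(len(set(a)) > 1 for a in called)


def universal_het_sites(vcf_lines):
    """Yield (chrom, pos, id) for records where all called samples are het."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample columns in this record
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        gts = [sample.split(":")[gt_idx] for sample in fields[9:]]
        if is_universal_het(gts):
            yield fields[0], int(fields[1]), fields[2]
```

To run it on a file, pass the open file handle, e.g. `with open("my.vcf") as fh: hits = list(universal_het_sites(fh))`, where `my.vcf` is a placeholder path. For a gzipped VCF, opening it with `gzip.open(path, "rt")` should work the same way. Whether to drop the flagged sites is still a judgment call, as noted above.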