I'm finding some variants that are present at a high allele frequency in the 1000 Genomes Project but absent from the ESP data, despite good coverage in the region.
For example:
11 58207204 rs4938895 A G 100 PASS AC=2166;THETA=0.0002;SNPSOURCE=LOWCOV,EXOME;AA=G;AN=2184;RSQ=0.9430;VT=SNP;AVGPOST=0.9991;ERATE=0.0008;LDAF=0.9912;AF=0.99;ASN_AF=1.00;AMR_AF=1.00;AFR_AF=0.97;EUR_AF=1.00
is an exonic variant in 1000 Genomes, and while the region has high enough coverage in the ESP data, there is no variant call.
The human reference sequence contains some rare alleles, and some sequencing errors.
The 1000 Genomes data reports the frequency of alleles that differ from the reference sequence, so in the above example, all of the EUR and AMR individuals differed from the reference, hence an allele frequency of 1.00 (i.e. everybody was different from the reference).
In the ESP, all of the alleles at the above position are the same; there is no variation within the cohort, and no variant is reported.
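To make the arithmetic concrete, here is a minimal Python sketch that reads the record quoted in the question and works out how common the reference allele is. The function name `summarize` and the returned keys are hypothetical, and the record is split on whitespace because that is how it was pasted here (a real VCF line is tab-separated).

```python
# The 1000 Genomes record quoted in the question (whitespace-separated as pasted).
RECORD = (
    "11 58207204 rs4938895 A G 100 PASS "
    "AC=2166;THETA=0.0002;SNPSOURCE=LOWCOV,EXOME;AA=G;AN=2184;RSQ=0.9430;"
    "VT=SNP;AVGPOST=0.9991;ERATE=0.0008;LDAF=0.9912;AF=0.99;"
    "ASN_AF=1.00;AMR_AF=1.00;AFR_AF=0.97;EUR_AF=1.00"
)


def summarize(line):
    """Return allele frequencies for a single-ALT VCF record (hypothetical helper)."""
    chrom, pos, rsid, ref, alt, qual, flt, info = line.split()[:8]
    # INFO is a ';'-separated list of KEY=VALUE pairs (flags without '=' are skipped).
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    ac = int(fields["AC"])  # number of ALT allele copies observed
    an = int(fields["AN"])  # total number of allele copies genotyped
    alt_freq = ac / an
    return {
        "ref": ref,
        "alt": alt,
        "alt_freq": alt_freq,
        "ref_freq": 1.0 - alt_freq,
        "ref_copies": an - ac,
        # AA is the inferred ancestral allele; here it matches ALT, not REF.
        "alt_is_ancestral": fields.get("AA", "").upper() == alt.upper(),
    }


summary = summarize(RECORD)
print(f"ALT {summary['alt']} frequency: {summary['alt_freq']:.4f}")
print(f"REF {summary['ref']} copies: {summary['ref_copies']}")
print(f"ALT is ancestral allele: {summary['alt_is_ancestral']}")
```

Read this way, it is the reference base A that is the rare allele. A cohort in which every chromosome carries G has no variation at this site, which fits the absence of a call in the ESP.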
The ESP project has samples from populations with European ethnicity as well. So, does it not seem a little strange that not one person in the population has an alternate allele at this locus when 1000 Genomes suggests that it's pretty much a common variant in the same population?
All people in the population have the alternate allele, so there is no variation within the cohort to report.