Solved: Suspiciously High Frequencies Of Alternate Alleles In 1000 Genomes Data
10.9 years ago
Pierre ▴ 130

Hi,

I am having trouble interpreting the genotype calls and the respective allele frequency information in the most recent 1000 genomes data release.

Let's take a look at an example from phase 1 integrated call sets - it's a SNP with the id rs3748597.

Reference allele is T and the reported alternate allele is C at chromosome 1, position 888659. The corresponding functional change at the protein sequence level is a change from Isoleucine to Valine at amino acid position 300.

This is the raw VCF from the integrated call sets for this SNP reported for 1092 individuals:

1    888659    rs3748597    T    C    100    PASS    AVGPOST=1.0000;AA=C;SNPSOURCE=LOWCOV,EXOME;AN=2184;THETA=0.0005;LDAF=0.9282;VT=SNP;AC=2027;RSQ=1.0000;ERATE=0.0003;AF=0.93;ASN_AF=0.92;AMR_AF=0.92;AFR_AF=0.90;EUR_AF=0.95    GT:DS:GL    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00        1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5,-2.3279,-0.002046    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-3.74,-0.00    0|1:1.000:-5.00,0.00,-5.00    1|1:2.000:-5.00,-5.00,0.00

Although I removed a substantial portion of the genotype information for the sake of space, the observation still holds: all of the 1092 individuals carry this variant - most are even homozygous for it, that is, they carry the alternate allele on both chromosomes.
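As a rough illustration of how the genotype columns translate into allele counts, here is a minimal Python sketch. It assumes the GT value is the first colon-separated field of each sample column (as the GT:DS:GL format string above indicates) and uses only four sample columns copied from the record; the function name `count_alleles` is made up for this example.

```python
from collections import Counter

# A few sample columns copied from the record above (FORMAT is GT:DS:GL).
sample_columns = [
    "1|1:2.000:-5.00,-5.00,0.00",
    "1|1:2.000:-5,-2.3279,-0.002046",
    "1|1:2.000:-5.00,-3.74,-0.00",
    "0|1:1.000:-5.00,0.00,-5.00",
]


def count_alleles(columns):
    """Count allele indices (0 = REF, 1 = ALT) over phased or unphased GT fields."""
    counts = Counter()
    for column in columns:
        genotype = column.split(":")[0]  # GT is the first FORMAT key here
        for allele in genotype.replace("|", "/").split("/"):
            if allele != ".":  # skip missing calls
                counts[allele] += 1
    return counts


allele_counts = count_alleles(sample_columns)
alt_fraction = allele_counts["1"] / sum(allele_counts.values())
print(dict(allele_counts), alt_fraction)
```

Run over all 1092 sample columns instead of this small subset, the same tally should correspond to the AC and AN values in the INFO field.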

There are more such examples.

Could you please help me understand:

  1. Has this observation - that some variants have incredibly high frequencies, so that some "alternate" alleles might well be the true reference - already been reported? Am I missing something obvious, or misreading the genotype and frequency information? (Explained: I understand that some reference alleles are in fact true minor alleles - I am simply surprised to come across cases where the alternate allele reaches frequencies as high as 93%.)

  2. dbSNP reports MAF/MinorAlleleCount: T=0.072/156. I understand that there may be discrepancies in allele frequency for conceptual or methodological reasons; however, I am completely puzzled by the difference between the observed MAF=1.0/2184 and the dbSNP MAF=0.072/156. Any explanation? (Explained: dbSNP correctly reports the true minor allele, which happens to be the reference allele. The reference call was possibly made based on individuals carrying the true minor allele.)

Thank you.

1000genomes snp • 4.2k views

Maybe I misunderstand your question but ...

It's not necessary that the reference allele be the "major" allele. In this case, apparently, the reference is based on someone who carries the true "minor" allele.


Hey brentp - that is indeed the case. In fact, dbSNP correctly reports the minor allele in this example, which happens to be the reference allele. (I will modify that part accordingly.) I am just surprised that, given such high frequencies for a non-trivial number of alternate alleles, the reference managed to pick up the true minor allele. Thanks.


I'm a little confused by your use of "reference". The human population has 7 billion people in it. There is no Platonic 'true reference' sequence. We just pick one sequence and call it the reference, knowing the limitations of that approach.


perhaps you could add the answer separately as well, it would help new readers


done.

10.9 years ago
brentp 24k

Copied comment from above:

It's not necessary that the reference allele be the "major" allele. In this case, apparently, the reference is based on someone who carries the true "minor" allele.
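To make this concrete, here is a small Python sketch that works out which allele is the minor one from the AC and AN values in the INFO field of the record in the question. The helper `parse_info` and the variable names are made up for this illustration, and it assumes a biallelic site where AC counts the alternate allele and AN is the total number of called alleles.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value if value else True
    return fields


# REF, ALT and INFO copied from the rs3748597 record in the question.
ref, alt = "T", "C"
info = (
    "AVGPOST=1.0000;AA=C;SNPSOURCE=LOWCOV,EXOME;AN=2184;THETA=0.0005;"
    "LDAF=0.9282;VT=SNP;AC=2027;RSQ=1.0000;ERATE=0.0003;AF=0.93;"
    "ASN_AF=0.92;AMR_AF=0.92;AFR_AF=0.90;EUR_AF=0.95"
)

fields = parse_info(info)
total_alleles = int(fields["AN"])  # called alleles (2 per diploid sample)
alt_count = int(fields["AC"])      # copies of the ALT allele (biallelic assumption)
ref_count = total_alleles - alt_count

# The minor allele is simply whichever allele is rarer, REF or ALT.
if ref_count < alt_count:
    minor_allele, minor_count = ref, ref_count
else:
    minor_allele, minor_count = alt, alt_count
maf = minor_count / total_alleles

print(minor_allele, minor_count, round(maf, 3))
```

For this record the minor allele comes out as the reference T, with a count of 157 and a frequency of about 0.072, which is close to the T=0.072/156 that dbSNP reports.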
