I am confused about which allele is the wildtype for certain SNPs
4.0 years ago
koay • 0

Hi, I am confused about the genotypes of certain SNPs that I am looking into. I will use the SNP rs1128503 as an example. It is located in the ABCB1 gene, which is on the minus strand.

If I check dbSNP, the alleles are listed as A>G. However, since this gene is on the minus strand, RefSeqGene lists it as T>C.

However, SNPedia and a general Google search show that publications always write the mutation as C>T instead.

My question is, which is the wildtype? T or C?

Thank you.

dbSNP for rs1128503 https://ibb.co/7nnHmmR

google search for rs1128503 https://ibb.co/7KmqkDj

SNP • 1.6k views
4.0 years ago
Emily 24k

A (or T if you're talking about the gene) is the reference allele. That means it is the allele that is found on the reference genome.

The major allele is G. This is the most frequent allele in the 1000 Genomes and gnomAD databases. It is worth noting, however, that A is the most common allele in the South and East Asian populations in both databases.

The ancestral allele is G as determined by comparison of this locus to other primates.

Neither allele has been associated with a phenotype, according to ClinVar.

"Wildtype" is not an appropriate word to use for human variants as it implies that one allele is "normal" and the other allele is not. This has all kinds of racial and other discriminatory associations so it is best to avoid it.


Dear Emily,

Thank you for your explanation and for pointing out the other databases and sites that I can refer to. However, I am still not sure why articles write the alleles the way they do.

Based on the Ensembl site, the minor allele in the African, American and European populations is A, while the minor allele in the East Asian and South Asian populations is G.

Papers that study this SNP, whether in Caucasian or Asian populations, usually write it as "ABCB1 C1236T (rs1128503)" or "SNP 1236C>T (rs1128503)". From my understanding, these articles are saying that the reference allele is C. Please share some information about this nomenclature. Are we supposed to report based on the ancestral allele G? And since the gene is located on the minus strand, do we complement the G to C instead?

Thank you.


The alleles of the variant are A/G, the alleles in the gene are T/C. When referring only to the variant, the convention is to always talk about the forward strand allele. The papers you are referring to are using the alleles in the gene.
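To make the strand relationship concrete, here is a minimal sketch of how the forward-strand alleles map to the alleles read on a minus-strand gene. The function name `to_gene_strand` and its arguments are made up for illustration and are not taken from any library.

```python
# Watson-Crick complement for single-nucleotide alleles.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_gene_strand(allele: str, gene_strand: str) -> str:
    """Return a forward-strand allele as it reads on the gene's strand.

    gene_strand must be "+" or "-". Alleles are unchanged for "+" genes
    and complemented for "-" genes. (Hypothetical helper for illustration.)
    """
    if gene_strand not in ("+", "-"):
        raise ValueError(f"unknown strand: {gene_strand!r}")
    allele = allele.upper()
    if gene_strand == "+":
        return allele
    return COMPLEMENT[allele]


# rs1128503: alleles A/G on the forward strand, ABCB1 on the minus strand.
forward_alleles = ("A", "G")
gene_alleles = tuple(to_gene_strand(a, "-") for a in forward_alleles)
print(gene_alleles)  # ('T', 'C')
```

Only single bases are handled here; a longer allele on a minus-strand gene would need to be reverse-complemented, not just complemented.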


Hi,

Yes, I am aware of the difference between the alleles of the variant and the alleles in the gene. My question is, why was it written C>T?

Shouldn't it be T>C for the alleles in the gene, as opposed to A>G for the alleles of the variant?


It depends on whatever convention the authors have used. In many cases, lab protocols were set up before NGS, so the reporting of variants follows the lab-specific protocol. There are also other reasons. You therefore need to direct your questions to the authors of the published works you are referring to.
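As a rough illustration of how the written direction depends on the convention, the sketch below formats an "X>Y" label from whichever allele is taken as the baseline. The helper `substitution` is hypothetical, and this is not a claim about which convention any particular paper followed; it only shows that the same two alleles can yield A>G, T>C or C>T.

```python
# Watson-Crick complement for single-nucleotide alleles.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def substitution(baseline: str, other: str, minus_strand: bool = False) -> str:
    """Format 'baseline>other' from forward-strand alleles.

    If minus_strand is True, both alleles are complemented so the label
    reads on the gene's strand. (Hypothetical helper for illustration.)
    """
    if minus_strand:
        baseline, other = COMPLEMENT[baseline], COMPLEMENT[other]
    return f"{baseline}>{other}"


# Forward-strand alleles of rs1128503 as described in this thread:
# A is the reference allele, G is the major and ancestral allele.
print(substitution("A", "G"))                     # A>G  (reference first, forward strand)
print(substitution("A", "G", minus_strand=True))  # T>C  (reference first, gene strand)
print(substitution("G", "A", minus_strand=True))  # C>T  (G taken as baseline, gene strand)
```

So a label such as C>T is what you get on the gene strand when G, not the reference allele A, is treated as the starting allele. Whether that is the authors' reasoning is something only they can confirm.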
