How Is The Reference Sequence For A SNP Defined?
14.1 years ago
Andrea_Bio ★ 2.8k

Hi

I am interested in the distribution of SNPs in different populations. I know that for a typical biallelic SNP, the frequencies of the major and minor alleles can differ between populations. Can they ever differ to the extent that one allele is the major allele in one population while the other allele is the major allele in a different population?

This then leads me on to the definition of a reference sequence for a SNP:

Is the reference sequence for a SNP simply the base that was present at this locus in the DNA that was sequenced for the reference genome (or the base with the highest occurrence, if the reference genome was heterozygous at that locus)? Or is the definition of the reference allele linked in any way to the major allele for that SNP? If it is the latter, I was wondering how different populations are taken into account. If it is the former, the population issue is irrelevant.

Many thanks.

snp • 11k views
14.1 years ago

To answer your first question: yes, absolutely. The minor allele frequency (MAF) of a given SNP can differ between populations, to the point that the major allele in one population is the minor allele in another. As a randomly drawn example, look at rs11652704 in p53. The frequency of the T (reference) allele in ASW (African ancestry in Southwest USA) is 27%, whereas in CEU (Utah residents with Northern and Western European ancestry) it is 87%. So, for a biallelic SNP, T is the minor allele in ASW but the major allele in CEU. This is easy to see if you browse HapMap.
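To make the arithmetic concrete, here is a minimal Python sketch using only the two T-allele frequencies quoted above. It assumes the SNP is biallelic, so the frequency of the second allele is 1 minus the frequency of T. The second allele is not named in this answer, so the code just calls it "other"; the function names are made up for illustration.

```python
# T-allele frequencies quoted in the answer (HapMap, rs11652704).
t_freq = {"ASW": 0.27, "CEU": 0.87}


def major_allele(freq_t):
    """Return 'T' if T is the more common allele, else 'other'.

    Assumes a biallelic SNP, so the other allele has frequency 1 - freq_t.
    """
    return "T" if freq_t > 0.5 else "other"


def minor_allele_frequency(freq_t):
    """MAF is the frequency of whichever allele is rarer (biallelic case)."""
    return min(freq_t, 1.0 - freq_t)


# Per-population summary: (major allele, MAF rounded to two decimals).
summary = {
    pop: (major_allele(f), round(minor_allele_frequency(f), 2))
    for pop, f in t_freq.items()
}

for pop, (major, maf) in summary.items():
    print(f"{pop}: major allele = {major}, MAF = {maf:.2f}")
```

With these numbers the MAF is 0.27 in ASW and 0.13 in CEU, but it refers to a different allele in each population, which is why reporting the MAF alone, without saying which allele it belongs to, can be misleading.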

This issue is very relevant when interpreting association studies in human genetics, because genetic architecture differs between populations. It is a matter of debate to what extent polymorphisms identified in one population (say, Caucasians from the UK) are relevant to others (say, Southeast Asians), and vice versa. "More studies are needed...", as the saying goes, though some direct comparisons have been done and published in Nature Genetics on breast and other cancers. For example, see Zheng et al., Nat. Genet. 2009 and Long et al., PLoS Genet. 2010, which identified a locus on 6q using samples from Asian women, compared with earlier studies in Caucasian women such as Hunter et al., Nat. Genet. 2007.

Added some references; there are also numerous studies in African-derived women, though I'm not sure any really large ones have been published yet. These are primary literature; for a better overview, it is probably worth looking for reviews on admixture studies.

Hi David, could you please add the link to the comparison paper you mentioned in your answer? Thanks.

Thanks David, I appreciate the concrete example.

14.1 years ago
lh3 33k

Most commonly, the first definition is used: the reference base of a SNP is the base in the reference genome. It is not clearly defined when the reference base is ambiguous. Either way, the reference is not defined by frequency.
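A small Python sketch of that first definition, on made-up data. The toy sequence, the position, the population labels, the alternate allele "C" and all frequencies are hypothetical and only there for illustration. The point is that the reference-allele lookup takes the reference sequence and a position and nothing else, so whether the reference base is the major allele can differ from one population to the next.

```python
def reference_allele(reference_sequence, position):
    """Base on the reference sequence at a 1-based position."""
    return reference_sequence[position - 1].upper()


def ref_is_major(ref_base, allele_freqs):
    """True if the reference base is the most frequent allele in a population."""
    return max(allele_freqs, key=allele_freqs.get) == ref_base


# Hypothetical toy data, not taken from any real assembly or database.
toy_reference = "ACGTTGCA"
snp_position = 4  # 1-based; the base here is T
toy_freqs = {
    "POP1": {"T": 0.27, "C": 0.73},
    "POP2": {"T": 0.87, "C": 0.13},
}

# The reference allele is looked up once, without any frequency data.
ref = reference_allele(toy_reference, snp_position)

# Whether it is also the major allele is a separate, per-population question.
status = {pop: ref_is_major(ref, freqs) for pop, freqs in toy_freqs.items()}

print("reference allele:", ref)
for pop, is_major in status.items():
    print(f"{pop}: reference allele is major = {is_major}")
```

In this toy case the reference allele is T in both populations, but it is the major allele only in POP2.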

Thanks for your prompt response. That's what I thought, but I doubted myself for some reason.

14.1 years ago

From the dbSNP manual:

The NCBI "Reference allele" for a given SNP refers to the nucleotide base on the NCBI reference assembly at the SNP’s position
