Question

Polymorphism Definition Across Databases

14.1 years ago

Jarretinha 3.5k

Hi all,

I'm digging deep into human cancer-related polymorphisms right now and noticed something that could be a problem. The definition of polymorphism isn't the same across databases (when you manage to FIND a definition/reference at all). Is there a place where I can find a compilation of these definitions? Does anyone have such a nice piece of information? Can anyone help to produce one?

-- Edit -- If you find other definitions, post them or add them here!!!

Definition from dbSNP:

"... dbSNP database to serve as a central repository for both single base nucleotide subsitutions and short deletion and insertion polymorphisms. ... Note that dbSNP takes the looser 'variation' definition for SNPs, so there is no requirement or assumption about minimum allele frequency.)..."

Common SNP from the HapMap:

"...The most common type of variant, a SNP, is a difference between chromosomes in the base present at a particular site in the DNA sequence. ... It has been estimated that, in the world's human population, about 10 million sites (that is, one variant per 300 bases on average) vary such that both alleles are observed at a frequency of greater than or equal to 1%,..."

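Since the goal is a compilation, one way to make the two quoted definitions directly comparable is to write each one down as a predicate over the same variant record. The Python below is a minimal sketch under stated assumptions: the `Variant` record, the function names and the `DEFINITIONS` table are hypothetical (not a dbSNP or HapMap schema), a variant's frequency is reduced to a single minor allele frequency, and only the two rules quoted above are encoded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Variant:
    """Hypothetical minimal record for one variant site (not a real dbSNP/HapMap schema)."""
    ref: str                  # reference allele, e.g. "A"
    alt: str                  # alternate allele, e.g. "G", or "AT" for a 1-base insertion
    minor_allele_freq: float  # frequency of the less common allele in some population


def is_single_base_substitution(v: Variant) -> bool:
    return len(v.ref) == 1 and len(v.alt) == 1 and v.ref != v.alt


def dbsnp_variation(v: Variant) -> bool:
    # dbSNP quote: substitutions plus short insertions/deletions, with
    # "no requirement or assumption about minimum allele frequency".
    # The quote does not say how short "short" is, so no length cap is applied.
    return v.ref != v.alt


def hapmap_common_snp(v: Variant) -> bool:
    # HapMap quote: a difference in the base at a single site, with both
    # alleles observed at a frequency >= 1%.
    return is_single_base_substitution(v) and v.minor_allele_freq >= 0.01


# The compilation itself: definition name -> rule. Other resources
# (e.g. Human Variome) would be added as their definitions are found.
DEFINITIONS: Dict[str, Callable[[Variant], bool]] = {
    "dbSNP variation": dbsnp_variation,
    "HapMap common SNP": hapmap_common_snp,
}


def accepted_by(v: Variant) -> List[str]:
    """Names of the compiled definitions under which v counts."""
    return [name for name, rule in DEFINITIONS.items() if rule(v)]


if __name__ == "__main__":
    print(accepted_by(Variant("A", "G", 0.20)))   # ['dbSNP variation', 'HapMap common SNP']
    print(accepted_by(Variant("A", "G", 0.002)))  # ['dbSNP variation']
    print(accepted_by(Variant("A", "AT", 0.10)))  # ['dbSNP variation']
```

Run against the same record, the table shows at a glance where two databases disagree: a rare substitution or a short insertion is a dbSNP entry but not a HapMap common SNP.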
database • 5.9k views

updated 14.1 years ago by Jorge Amigo 14k • written 14.1 years ago by Jarretinha 3.5k

Can you provide some examples of the problem, especially tied to different databases? I think this would help in getting answers to your question.

14.1 years ago by Larry_Parnell 16k

Ram · Answer 1 · 2011-02-23

14.1 years ago

User 59 13k

I think this is partially answered in this thread in the discussion of what is a mutation vs what is a polymorphism.

Asking why the definition differs across databases is like asking why people can't agree on the definition of a gene. There is no standard terminology, only a broad understanding that we largely mean the same thing.

updated 5.6 years ago by Ram 45k • written 14.1 years ago by User 59 13k

That's not as prosaic as it seems. HapMap, Human Variome, and dbSNP do not think the same way about the same data! E.g., to dbSNP, a mutation and a SNP in the HapMap sense are the same thing!!!! And my question was very pragmatic. I don't want to discuss the definitions, I just want to compile them to avoid a lot of problems. Hopefully with a little help from my friends :D

14.1 years ago by Jarretinha 3.5k

Fair enough, I read the question with tired eyes and missed the call to arms ;) My bad.

14.1 years ago by User 59 13k

No problem! I'm compiling some illustrative examples right now. It's just hard to pin down the scope of some databases...

14.1 years ago by Jarretinha 3.5k

score 1 · Answer 2 · 2011-02-24

The definition of SNP has been fixed for years: a single-base variation in the genome that is established within a population at a frequency of >1%. This definition tried to convey the idea that certain sites in the genome were more "flexible" than others, capable of being segregated to offspring, and could therefore be measured in terms of population genetics through allele frequencies and haplotypes. That is what a SNP is and has always been, although I agree that the way the term SNP is used in different databases may differ from this original definition.
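To make the frequency part of this definition concrete, here is a hedged Python sketch that derives the minor allele frequency (MAF) at a biallelic site from diploid genotype counts and applies the >1% rule, reading "frequency" as the frequency of the less common allele. The function names and the genotype-count input are illustrative assumptions, not an API of any database. Note that the HapMap quote in the question says "greater than or equal to 1%", so a site at exactly 1% is common there but not under a strict >1% rule; the `threshold` comparison is where that choice is made.

```python
def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """MAF at a biallelic site, from counts of diploid genotypes."""
    n_samples = n_hom_ref + n_het + n_hom_alt
    if n_samples == 0:
        raise ValueError("no genotyped samples")
    # Each heterozygote carries one alt allele, each alt homozygote carries two.
    alt_freq = (n_het + 2 * n_hom_alt) / (2 * n_samples)
    # The minor allele is whichever allele is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def is_classic_snp(ref: str, alt: str, maf: float, threshold: float = 0.01) -> bool:
    """Classic SNP: a single-base change whose minor allele frequency exceeds 1%."""
    single_base = len(ref) == 1 and len(alt) == 1 and ref != alt
    return single_base and maf > threshold


if __name__ == "__main__":
    print(is_classic_snp("C", "T", minor_allele_frequency(900, 95, 5)))  # MAF 0.0525 -> True
    print(is_classic_snp("C", "T", minor_allele_frequency(980, 20, 0)))  # MAF 0.01 -> False
```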

This definition, which was possibly coined as a reference around 2000, did in fact work at the time it was defined, since the only SNP knowledge available came from sequencing a very small number of human samples, and so plenty of frequency assumptions had to be made (this is why the term "common variant" arose: it was almost impossible to determine which variants belonged to a single individual and which ones were shared with others). But now NGS lets us find variants that are not necessarily common, and for that reason we are moving from genotyping SNPs (searching for DNA variation at particular sites we already know) to sequencing variants (reading the genome and describing how it differs from an agreed reference).
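The genotyping vs. sequencing distinction can be illustrated with a toy Python sketch. It assumes the sample sequence is already aligned to the reference with no insertions or deletions, which real variant calling does not get for free; the function names and sequences are made up for illustration. Genotyping reads the sample only at sites known in advance, while sequencing compares every position against the reference and can therefore report variants nobody has catalogued yet.

```python
from typing import Dict, List, Tuple


def genotype_known_sites(sample: str, known_sites: List[int]) -> Dict[int, str]:
    """Genotyping-style: look at the sample only where variation is already known."""
    return {pos: sample[pos] for pos in known_sites}


def call_variants(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Sequencing-style: report every position where the sample differs from the reference."""
    if len(reference) != len(sample):
        raise ValueError("this sketch assumes pre-aligned sequences with no indels")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    sample = "ACGTTCGTAA"
    print(genotype_known_sites(sample, known_sites=[4]))  # {4: 'T'}
    print(call_variants(reference, sample))  # [(4, 'A', 'T'), (9, 'C', 'A')]
```

Position 9 differs from the reference but is invisible to the genotyping-style function, because it was not in the list of known sites.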

The main source of the discordance between databases is that their scopes may differ, so one has to know where the data comes from and why it was stored there. This is why you can't directly compare dbSNP with HapMap: HapMap data comes from genotyping ~4M already-defined sites in an effort to determine the haplotype distribution in humans as accurately as possible, whereas dbSNP is "just" a variation repository. Major dbSNP updates have in fact come from HapMap data, but once those ~4M sites are known, all dbSNP can obtain from HapMap is "just" population data, not new SNPs.

So, to conclude, let me get back to the genotyping vs. sequencing issue. HapMap has maintained the original SNP concept in its database since the project started, because what it does is genotyping, and it already knows that the ~4M sites it is studying are indeed polymorphic. But since build 130, dbSNP has accepted submissions from 1000 Genomes, which contain rare variants because the technique used is now sequencing. This has lowered the frequency threshold for the variations present in dbSNP from a minimum of ~1% to ~0.1%, so what you find in dbSNP doesn't have to be strictly a SNP. This is why the term SNV (single nucleotide variation) is increasingly favoured over the term SNP.
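Because dbSNP entries are no longer restricted to common variants, recovering "strict" SNPs from a dbSNP-style list means filtering on frequency yourself. Here is a minimal Python sketch, assuming each record has already been reduced to a hypothetical identifier and a minor allele frequency. The identifiers and frequencies are invented, and the 1% cut-off follows the classic definition given earlier in this answer.

```python
from typing import Iterable, List, Tuple

# (hypothetical variant id, minor allele frequency)
Record = Tuple[str, float]


def split_by_frequency(
    records: Iterable[Record], snp_threshold: float = 0.01
) -> Tuple[List[str], List[str]]:
    """Separate classic SNPs (MAF above the threshold) from rarer single nucleotide variants."""
    snps: List[str] = []
    rare_snvs: List[str] = []
    for var_id, maf in records:
        if maf > snp_threshold:
            snps.append(var_id)
        else:
            rare_snvs.append(var_id)
    return snps, rare_snvs


if __name__ == "__main__":
    # Invented records spanning the old ~1% and new ~0.1% frequency floors.
    records = [("var1", 0.25), ("var2", 0.004), ("var3", 0.012), ("var4", 0.001)]
    snps, rare_snvs = split_by_frequency(records)
    print(snps)       # ['var1', 'var3']
    print(rare_snvs)  # ['var2', 'var4']
```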

As a final suggestion, you definitely have to be aware of which database you are working with and what kind of data is stored in it. Knowing why that data was stored there also helps you understand what you can extract from it and how you can use those results.