Polymorphism Definition Across Databases
13.8 years ago

Hi all,

I'm digging deep into human cancer-related polymorphisms right now and noticed something that could be a problem. The definition of polymorphism isn't the same across databases (when you manage to find a definition or reference at all). Is there a place where I can find a compilation of these definitions? Does anyone already have such a compilation? Can anyone help produce one?

-- Edit -- If anyone finds other definitions, post them or add them here!

Definition from dbSNP:

"... dbSNP database to serve as a central repository for both single base nucleotide subsitutions and short deletion and insertion polymorphisms. ... Note that dbSNP takes the looser 'variation' definition for SNPs, so there is no requirement or assumption about minimum allele frequency.)..."

Common SNP from the HapMap:

"...The most common type of variant, a SNP, is a difference between chromosomes in the base present at a particular site in the DNA sequence. ... It has been estimated that, in the world's human population, about 10 million sites (that is, one variant per 300 bases on average) vary such that both alleles are observed at a frequency of greater than or equal to 1%,..."


Can you provide some examples of the problem, especially tied to different databases? I think this would help in getting answers to your question.

13.8 years ago
User 59 13k

I think this is partially answered in this thread in the discussion of what is a mutation vs what is a polymorphism.

Asking why the definition differs across databases is like asking why people can't agree on the definition of a gene. There is no standard terminology, only a broad understanding that we largely mean the same thing.


It's not as prosaic as it seems. HapMap, Human Variome, and dbSNP do not treat the same data the same way! E.g., to dbSNP, a mutation and a SNP in the HapMap sense are the same thing! And my question was very pragmatic: I don't want to discuss the definition, I just want to compile the definitions to avoid a lot of problems. Hopefully with a little help from my friends :D


Fair enough, I read the question with tired eyes and missed the call to arms ;) My bad.


No problem! I'm compiling some illustrative examples right now. It's just hard to find out the scope of some databases...

13.8 years ago

the definition of SNP has been fixed for years: a single base variation on the genome that is established within a population with a frequency of >1%. this definition tried to capture the idea that certain sites on the genome are more "flexible" than others and can be passed on to the offspring, so they can be measured in population genetics terms through allele frequencies and haplotypes. that is what a SNP is and has always been, although I agree that how the term SNP is used in different databases may differ from this original definition.
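
as a minimal sketch of that >1% rule (the function names and the allele counts are made up for illustration, they do not come from any database), one could compute the minor allele frequency at a site and compare it with the threshold:

```python
def minor_allele_frequency(allele_counts):
    """Return the frequency of the second most common allele at a site."""
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed at this site")
    if len(allele_counts) < 2:
        return 0.0  # monomorphic site: there is no minor allele
    counts = sorted(allele_counts.values(), reverse=True)
    return counts[1] / total


def is_classical_snp(allele_counts, threshold=0.01):
    """Apply the classical rule: minor allele frequency strictly above 1%."""
    return minor_allele_frequency(allele_counts) > threshold


# Hypothetical allele counts observed in a sample of chromosomes.
common = {"A": 180, "G": 20}  # minor allele at 10%
rare = {"C": 1999, "T": 1}    # minor allele at 0.05%

print(is_classical_snp(common))  # True
print(is_classical_snp(rare))    # False
```

the second site is still a real single nucleotide variant, it just does not reach the frequency the classical definition asks for.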

this definition, which was possibly coined as a reference around 2000, did in fact work at the time it was defined, since the only SNP knowledge available came from the sequencing of a very small number of human samples, and for that reason plenty of frequency assumptions had to be made (this is why the term "common variant" had to arise: it was almost impossible to determine which variants were private to one individual and which ones could be shared with others). but now we are able to find variants through NGS that don't necessarily have to be common, and for that reason we are moving from genotyping SNPs (we look for DNA variation at particular places we already know) to sequencing variants (we read the genome and describe how it differs from an agreed reference).

the main problem behind the discordance between databases is that their scope may not be the same, so one has to know where the data comes from and why it has been stored there. this is why you can't directly compare dbSNP with HapMap: HapMap data comes from typing ~4M already defined sites to determine as well as possible the haplotype distribution in humans, whereas dbSNP is "just" a variation repository. major dbSNP updates have in fact come from HapMap data, but once those ~4M sites are known, all that dbSNP can obtain from HapMap is "just" population data, not new SNPs.

so, to conclude, let me get back to the genotyping vs. sequencing issue. HapMap has maintained the original SNP concept throughout its database since it started, because what it does is genotyping, and it already knows that the ~4M sites it studies are indeed polymorphic. but dbSNP, since build 130, started to accept submissions from 1000 Genomes, which contain rare variants since the technique used is now sequencing rather than genotyping. this has lowered the frequency threshold for the variations present in dbSNP from a minimum of ~1% to ~0.1%, so what you find in dbSNP doesn't have to be strictly a SNP. this is why the term SNV (single nucleotide variation) is getting favoured over the term SNP.

as a final suggestion, you definitely have to be aware of which database you are working with and which kind of data is stored in it. knowing why that data was stored there will also help you understand what you can extract from it and how you can use those results.


The definition of SNP varies in the literature, including the frequency threshold (1-5%). dbSNP doesn't even require frequency data for submission; they expect you to be able to tell the difference between polymorphisms and mutations. Certain mutation databases just treat everything as mutations. Being fixed for years doesn't say anything about quality, scope, or reliability. And if the very object that a DB stores is poorly defined or conflicts with another DB, then the scope of the DB will suffer from the same problem on a bigger scale. That's why I'm trying to gather definitions, not define them!
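
To illustrate why the cutoff matters, here is a rough sketch of how the same variant gets a different label depending on whose usage you apply. The thresholds are read loosely from the quotes in this thread (no minimum for dbSNP, >= 1% for HapMap's common SNPs, and the upper end of the 1-5% range seen in the literature); the dictionary layout, the labels, and the function names are hypothetical, and real databases do not expose anything like this.

```python
# Minimum minor allele frequency each usage asks for before calling a site a SNP.
# None means no frequency requirement at all (dbSNP's looser "variation" usage).
# These values are assumptions taken from the quotes in this thread.
MIN_MAF_BY_USAGE = {
    "dbSNP": None,
    "HapMap common SNP": 0.01,
    "literature (strict, 5%)": 0.05,
}


def label_variant(maf, usage):
    """Label a single-base variant under one usage; maf may be None if unknown."""
    min_maf = MIN_MAF_BY_USAGE[usage]
    if min_maf is None:
        return "SNP"  # accepted regardless of frequency, even when it is unknown
    if maf is None:
        return "unknown"  # a frequency-based usage cannot decide without data
    return "SNP" if maf >= min_maf else "not a SNP"


def labels_across_usages(maf):
    """Return the label the variant gets under every usage in the table."""
    return {usage: label_variant(maf, usage) for usage in MIN_MAF_BY_USAGE}


# A variant with a 2% minor allele frequency is a SNP for two usages only.
print(labels_across_usages(0.02))
```

A compilation of definitions could start as exactly this kind of table, with one row per database and a reference for each threshold.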


then what you are trying to find are definition usages, not the definition itself ;) the definition is what has been established for years, and of course the definition itself does not imply anything about quality, scope or reliability. it is its usage that has been varying, hence I agree that what the SNP databases mean by the term SNP should be clearly stated on those databases.


In my opinion this is a real problem. I think that the gray area between mutation and polymorphism DBs is much larger than we think. Another example, from Ensembl: "...(specifically our term SNP is closer to the SO term SNV than the SO term SNP)...". SO is the Sequence Ontology. I'm a population genetics guy, so the definition doesn't matter much to me. But the function-oriented people I work with are lost without ontologies and definitions.


it is a real problem indeed. I mainly work with population geneticists, and I couldn't agree more!


i am currently reviewing lots of variation resources. shall i collate the definitions i find and share them with you?


I think that was the purpose of this question from Jarretinha. definitely go for it; I agree it would be useful to gather in a single post the different usages of the term SNP depending on the database chosen.
