What Is The Difference Between A SNP And An Entry In A Mutation Database?
14.2 years ago
Andrea_Bio ★ 2.8k

Hello

I am returning to bioinformatics after a long absence and the landscape has certainly changed and there are a few (well, a lot of) things I'm not clear about.

Taking the human genome as an example, what is the difference between a SNP and an entry in the HGMD? I've been using the Ensembl genome browser to look at sequence variations and I'm unsure why some are classed as SNPs and some as mutations. My understanding of a SNP is that it is a mutation that happened a long time ago and has now become established as a polymorphism in the population, so you refer to it as a polymorphism, not a mutation. Using that logic, a mutation would be a variation from the reference genome which has recently occurred, but I don't think that's the right answer. I think a mutation is a SNP associated with a disease phenotype. However, SNPs and HGMD mutations that cause missense changes seem to be treated differently by Ensembl, and I don't understand why.

Take this example gene and transcript Gene: ENSG00000163914 Transcript: ENST00000296271

There is a 'mutation' at the 4th codon which gives rise to a missense mutation and is linked to the HGMD via this ID: CM930647. However this mutation is not listed among the protein variants for this transcript. Other variants from SNPs and some other mutations are listed. Perhaps this is nothing more than an error in the database.

Thanks in advance for your help

snp mutation • 12k views
14.2 years ago
User 59 13k

I'll have a stab that this is just a nomenclature difference due to the data sources. Of the 6 sequence variants listed for that transcript 5 are from dbSNP, one is from HGMD.

Entries from dbSNP have variation class SNP, and entries from HGMD have variation class HGMD_MUTATION.

Having looked at other HGMD_PUBLIC database entries in Ensembl, these are also classed as HGMD_MUTATION, which suggests it is the data source that is giving rise to this difference in classification. Therefore it's probably best to know what HGMD considers to be a mutation.

Your point about polymorphism vs mutation would still stand to most people, but that isn't necessarily what is being referred to here.

Why can't I add a comment?

I'm trying to add a reply but the website won't let me add a link: nothing happens when I click 'add comment'. I've also tried wrapping the link in [?] tags. How did you add a link to your reply? Thanks

Links will be parsed automatically in comments, there is a link button in the text editor for posts.

Hi. I've got 8 variations, but I was looking at an archive version, if that makes a difference.

If you then look at the cDNA you can see a mutation at the 4th codon which causes a missense change, but this isn't listed in the protein variants.

Do you have any idea why this wouldn't be listed as a variant?

I tried posting links to be helpful but the page wouldn't let me. I was looking at the May 2010 archive.

My point is that blindly trusting database fields to contain what their names suggest is not always the best idea. As we all know, database fields are meant to handle information, and the exact meaning of that information does not always match the field names. Databases cope well with a very high percentage of cases, but when it comes to diseases and other particular issues they are not so straightforward. I'm expanding on this in a proper answer, as I'm running out of characters here ;)

14.2 years ago

This is an example of why relying on the Wikipedia "SNP" entry alone is not the best way to dig into biological questions. I see a vocabulary problem here rather than a real issue: both a SNP and a single-base mutation are simply point variations in DNA. Although the precise meaning of the three terms (SNP, mutation and variation) varies from one reference to another, "SNP" is commonly used to describe a DNA change with a frequency of at least 1% in a population, and "mutation" is reserved for pathogenic DNA changes. But some DNA changes are more frequent in one population than in another, so what is a SNP in one population may be just an infrequent variation in another.
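To make the population dependence concrete, here is a minimal Python sketch of the 1% convention described above. The function name, the population labels and the frequencies are all made up for illustration; they do not come from any real database.

```python
# Conventional 1% cut-off for calling a variant a polymorphism (SNP).
COMMON_THRESHOLD = 0.01


def classify_by_population(allele_freqs, threshold=COMMON_THRESHOLD):
    """Label one variant per population using the frequency convention.

    allele_freqs maps a population label to the minor allele frequency
    observed in that population (hypothetical values in this example).
    """
    return {
        pop: "polymorphism" if freq >= threshold else "rare variant"
        for pop, freq in allele_freqs.items()
    }


# Hypothetical frequencies for a single variant in two populations.
example = {"population_A": 0.04, "population_B": 0.002}
labels = classify_by_population(example)
print(labels)
```

The same variant gets a different label in each population, which is why frequency alone cannot settle whether something "is" a SNP or a mutation.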

Historically, the SNP databases were built using methods that were supposed to highlight common variations, or at least variations above 1% in the genotyped population of interest. But the arrival of NGS, and the bulk loading of NGS results into SNP databases, is confusing people about the meaning of SNP, since SNP databases now record variations that are not in fact SNPs. The current problem is that we can't really filter out mutations (variations found associated with a disease) using SNP databases alone, since some mutations may be frequent in certain populations (hence reported in the past) or may have been captured by large NGS efforts (such as the 1000 Genomes Project).

Now, trying to answer your question: an entry in a mutation database, which is by definition a variation, can also happen to be a SNP in a particular population, or even a SNP across the whole human population, if the variation is not the sole cause of the disease, or if the disease penetrance is not high enough to be statistically significant, so that the variation could not have been reported as pathogenic. In fact, most of the available mutation databases are not totally trustworthy (take a look at the HVP project, which is trying to unify and standardise the building of such databases), and the only thing left to the researcher is to investigate each detected variation thoroughly.

+1 thanks for a good answer. I wasn't totally clear on the distinction before

13.0 years ago

I do not think one can clearly say what is a mutation (present in only one person or a family) and what is a SNP (present in more than 1% of the population). We need far more data from many populations to figure this out. If you define a mutation as anything that is pathogenic, it gets even more difficult: what is pathogenic to one person might not be pathogenic to another. Maybe we need to define things better? For example: mutations are not inherited (this should be relatively easy to find out if family members consent), whereas SNPs are inherited. Then within mutations/SNPs one can have pathogenic and non-pathogenic ones. Over the next decade it is expected that we will be able to find quite a few of these pathogenic mutations/SNPs.

Absolutely mutations can be inherited. This is the mechanism by which autosomal dominant diseases are passed from one generation to the next.

13.0 years ago
Laura ★ 1.8k

Something to note here is that HGMD is a subscription database. While Ensembl has the locations of the HGMD SNPs, it does not have the allele strings of these variants. This is why the variant is presented with a consequence of "coding unknown": without the allele string, Ensembl cannot determine whether it is synonymous or nonsynonymous.

This is listed here though
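To illustrate why the allele string matters, here is a small Python sketch. It is not Ensembl's actual implementation; the function name `coding_consequence` and the example codons are assumptions chosen for illustration. It only shows that, given a position but no alternate allele, there is nothing to translate and compare.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def coding_consequence(ref_codon, position, alt_base=None):
    """Classify a single-base change within a codon.

    position is the 0-based index inside the codon. If alt_base is
    None (the allele string is unavailable), no call can be made.
    """
    if alt_base is None:
        return "coding unknown"
    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "nonsynonymous"


# Hypothetical codon GAA (Glu) with and without a known alternate allele.
print(coding_consequence("GAA", 2, "G"))  # GAG still encodes Glu
print(coding_consequence("GAA", 0, "A"))  # AAA encodes Lys
print(coding_consequence("GAA", 0))       # alternate allele not known
```

With only the location, the third call is the best any tool can do, which matches the "coding unknown" consequence described above.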
