dbSNP: Inconsistency in Reported Amino Acids?
14.0 years ago
Chris ★ 1.6k

Hey,

I may have found an inconsistency in dbSNP. On the dbSNP page for rs4784677 [1], for example, the GeneView section reports a misleading SNP position in the protein sequence:

When I look at position 70 (1-based) in the sequence for NP_114091.3, I see an N (asparagine). However, the report insists that the residue there is an S (serine), with a change from S to {N, T, I}. How could that happen? I have thousands of such cases (extracted from the dbSNP SQL tables) where the actual residue at the given sequence position does not match the residue reported in the web interface. Am I missing something here, or did I indeed stumble over a mapping error?
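The check itself is simple. Here is a minimal sketch, assuming the protein sequence is available as a plain string and the position is 1-based as above. The function name and the toy sequence are made up for illustration; the sequence is not NP_114091.3.

```python
def residue_matches(protein_seq: str, position: int, reported_residue: str) -> bool:
    """Return True if protein_seq carries reported_residue at the 1-based position."""
    if not 1 <= position <= len(protein_seq):
        raise IndexError(
            f"position {position} is outside the sequence (length {len(protein_seq)})"
        )
    # Convert the 1-based position to Python's 0-based string index.
    return protein_seq[position - 1] == reported_residue


# Toy sequence (hypothetical): position 4 holds N, not S.
toy_seq = "MKTNSLV"
print(residue_matches(toy_seq, 4, "N"))
print(residue_matches(toy_seq, 4, "S"))
```

A mismatch like the second call is what I see for rs4784677: the sequence has N where the report claims S.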

Thanks, Chris

[1] http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=4784677

dbsnp mapping error • 2.9k views
14.0 years ago

Chris,

I have seen this as well, but on a case-by-case basis for particular genes of interest. Having worked on the Human Genome Project, knowing its history, and having worked next door to a lab doing the bioinformatics of the Golden Path and SNP mapping, I attribute such differences to the allele(s) found in the reference genome differing from those found during variation discovery. (Remember that the source of the NP_nnnnnn sequence is the reference genome.) In other words, DNA from different individuals was cloned and sequenced for the two projects, the reference genome and SNP discovery. Thus, the alleles can easily differ, and often do.

14.0 years ago
Chris ★ 1.6k

Thanks Larry, sounds plausible. I wrote to the dbSNP team about those inconsistencies. They confirmed the issue and told me that this is indeed a serious problem. They seem to be very interested in fixing it. In the meantime I ran some further checks, which uncovered a large number of these errors in the mapping onto protein sequences. The three main error types are:

  1. the residue position is out of sequence bounds,
  2. a synonymous residue change is not synonymous,
  3. a non-synonymous change is actually synonymous.
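The three checks above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, its parameters, and the returned labels are hypothetical and do not correspond to the dbSNP SQL schema, and I assume here that "synonymous" simply means the reference and variant residues are identical.

```python
def classify_annotation(seq_length: int, position: int,
                        ref_residue: str, alt_residue: str,
                        annotated_as: str) -> str:
    """Classify one SNP-to-protein annotation into an error type or 'ok'.

    annotated_as is assumed to be 'synonymous' or 'non-synonymous'
    (hypothetical labels, not dbSNP's own codes).
    """
    # 1. The residue position is out of sequence bounds (1-based).
    if not 1 <= position <= seq_length:
        return "out_of_bounds"
    same_residue = ref_residue == alt_residue
    # 2. Annotated as synonymous, but the residue actually changes.
    if annotated_as == "synonymous" and not same_residue:
        return "synonymous_failed"
    # 3. Annotated as non-synonymous, but the residue stays the same.
    if annotated_as == "non-synonymous" and same_residue:
        return "non_synonymous_failed"
    return "ok"


# Made-up records for illustration only.
print(classify_annotation(120, 150, "S", "N", "non-synonymous"))
print(classify_annotation(120, 70, "S", "N", "synonymous"))
print(classify_annotation(120, 70, "S", "S", "non-synonymous"))
```

Checking bounds first keeps the other two tests from being applied to a position that does not exist in the sequence.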

I've put the affected rs IDs as SQL dumps on my homepage for those of you who are interested:

  1. http://www.rostlab.org/~schaefer/dbSNP_reports/sequence_out_of_bounds.html
  2. http://www.rostlab.org/~schaefer/dbSNP_reports/synonymous_SNP_failed.html
  3. http://www.rostlab.org/~schaefer/dbSNP_reports/non_synonymous_SNP_failed.html

Chris

14.0 years ago
Shigeta ▴ 470

It's never a bad idea to screen dbSNP for inconsistencies. It's a humongous dataset, and in my experience QC is iterative.
