dbSNP: Mappings to Protein Sequence?
14.1 years ago
Chris ★ 1.6k

Hey,

we are trying to set up a local copy of part of dbSNP on our group's servers. Since we are only interested in nsSNPs, what we need is the mapping from rs# to protein sequence, i.e. the concrete RefSeq identifier, the sequence position and the mutant residue. According to the dbSNP handbook from NCBI, the organism-specific SNPContigLocusId tables are the relevant ones, and they do contain everything we need. However, those tables exist for only 14 of the 100 organisms in total. Does that mean that these mappings to protein sequences don't exist for the large majority of organisms? If so, why? Or could this information be stored somewhere else among the many dbSNP tables?

Thanks for sharing any insights, Chris

dbsnp mapping protein snp • 3.7k views

Are you interested in SNPs from all organisms, or only a subset? Such mappings are available in various nsSNP annotation databases for human; I'm not sure about other organisms.


I'm interested in nsSNPs from all organisms that show up in dbSNP. Human is among the 14 organisms that have the mappings. Thanks, Chris


Hi Chris,

How is your mapping from nsSNPs to protein sequences going? I am working on a similar project right now. Did you find out why the mapping from nsSNPs to protein sequences is only available for a limited number of organisms?

14.1 years ago
Jan Kosinski ★ 1.6k

A server has just been developed in my group that does more or less what you want (if I understood your question correctly ;-)).

http://www.biocomputing.it/picmi/

You can try the Nucleotide input option; see Help for a description of the input.

However, in the output you would get the sequence position and the mutant residue not on a RefSeq sequence but on an Ensembl transcript. Ensembl transcripts do have links to RefSeq, but I don't know how to retrieve them automatically for high-throughput input.

Give it a try, and contact authors if you need more.


Thanks Jan, I'll give it a try. However, I'd really like to know why dbSNP only has these mappings for 14 organisms. There must be a reason for that. Chris

12.9 years ago
User 6318 • 0

Hi, Chris! In my group, we are currently trying to build a database of human protein variants generated from nsSNPs. We need to store the amino acid sequence of both the protein variant and the original protein. However, in the SNPContigLocusId tables I can only find protein_acc, the residue for the SNP allele and the position, but not the protein sequence itself. Where can I find and download all human protein variant sequences mapped from nsSNPs?


Hi, the fields protein_acc and protein_ver are pointers to RefSeq. To get the corresponding sequences, go to the RefSeq FTP server and download the FASTA file [1] that contains all human sequences. This file normally does not contain every sequence that is referenced in dbSNP. For the missing ones you have to download the sequences from NCBI case by case, e.g. by using Entrez.

[1] ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/H_sapiens/protein/protein.fa.gz
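The lookup described above could be sketched roughly as follows in Python. This is only an illustration under several assumptions: the function names `read_fasta` and `apply_variant` and the parameter name `aa_position` are made up for this example, the FASTA headers are assumed to look like either `>gi|...|ref|NP_xxx.v|` or `>NP_xxx.v description`, and the position is treated as 1-based by default (please check which convention your dbSNP dump uses before relying on it).

```python
from typing import Dict, Iterable, Optional


def _accession_from_header(header: str) -> str:
    """Extract 'accession.version' from a FASTA header line (assumed formats)."""
    token = header[1:].split()[0]
    if "|" in token:
        parts = token.split("|")
        # Older NCBI style: gi|<number>|ref|<accession.version>|
        if "ref" in parts and parts.index("ref") + 1 < len(parts):
            return parts[parts.index("ref") + 1]
    return token


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into a dict {accession.version: sequence}."""
    seqs: Dict[str, str] = {}
    acc = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if acc is not None:
                seqs[acc] = "".join(chunks)
            acc = _accession_from_header(line)
            chunks = []
        else:
            chunks.append(line)
    if acc is not None:
        seqs[acc] = "".join(chunks)
    return seqs


def apply_variant(seqs: Dict[str, str], protein_acc: str, protein_ver: int,
                  aa_position: int, residue: str,
                  one_based: bool = True) -> Optional[str]:
    """Return the variant sequence, or None if the RefSeq entry is not in
    the local FASTA (those would have to be fetched separately, e.g. via Entrez)."""
    seq = seqs.get(f"{protein_acc}.{protein_ver}")
    if seq is None:
        return None
    index = aa_position - 1 if one_based else aa_position
    if not 0 <= index < len(seq):
        raise ValueError("position lies outside the protein sequence")
    return seq[:index] + residue + seq[index + 1:]


# Toy data only; with the real file one would do something like
#   with gzip.open("protein.fa.gz", "rt") as handle: seqs = read_fasta(handle)
toy = [">gi|1|ref|NP_000001.1| toy protein", "MKTAY", "IAKQR"]
seqs = read_fasta(toy)
variant = apply_variant(seqs, "NP_000001", 1, 3, "W")
```

Returning `None` for accessions that are absent from the local file keeps the "download case by case" step explicit, so those entries can be collected and fetched in a second pass.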

8.7 years ago

Hi, I have a bioinformatics question. I have a gene, IL8, with the mutation TGC>TGG. How can I find it, given that the mutation is in codon 36?
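As a first step, it may help to work out which amino acid change the codon change corresponds to, since that is what the protein-level fields (residue and position) in tables such as SNPContigLocusId describe. A minimal sketch in Python using the standard genetic code; the helper name `describe_codon_change` is made up for this example, and whether codon 36 is counted from the start of the precursor or of the mature protein is an assumption you would have to check.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def describe_codon_change(ref_codon: str, alt_codon: str, codon_number: int):
    """Translate both codons and classify the change at the protein level."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return f"{ref_aa}{codon_number}{alt_aa}", kind


change, kind = describe_codon_change("TGC", "TGG", 36)
print(change, kind)  # C36W missense
```

So TGC>TGG in codon 36 is a cysteine-to-tryptophan change, and the records to look for are the ones for IL8 with that position and residue.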
