Obtaining SNP locations on protein sequences.
9.2 years ago
Bioaln ▴ 360

Hello. I'm a researcher in the field of proteomics. I'm currently trying to obtain SNP locations on protein sequences, along with the protein sequences themselves.

My question is: which database or service currently hosts protein SNP locations and lets me batch download the data for every protein? Is this even possible?

Thank you very much.

SNP protein-sequence • 2.3k views
9.2 years ago

This information is available in UniProt: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.xml.gz

(...)
  <feature description="In dbSNP:rs11542705." id="VAR_048095" type="sequence variant">
    <original>M</original>
    <variation>I</variation>
    <location>
      <position position="155"/>
    </location>
  </feature>
(...)
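For anyone who wants to pull these features out in bulk, here is a minimal Python sketch, not a definitive parser. It assumes that `feature` and `accession` elements are direct children of each `entry`, as in the excerpt above, and it matches tags by local name so the XML namespace does not matter. The function names, the sample accession `P00000`, and the `rsid` field (taken from the `dbSNP:rs...` text in the description) are my own illustrative choices.

```python
import gzip
import io
import re
import xml.etree.ElementTree as ET

RSID = re.compile(r"dbSNP:(rs\d+)")


def local(tag):
    """Drop the namespace part of a tag: '{ns}entry' -> 'entry'."""
    return tag.rsplit("}", 1)[-1]


def iter_variants(source):
    """Yield one dict per 'sequence variant' feature in a UniProt XML stream."""
    for _, entry in ET.iterparse(source, events=("end",)):
        if local(entry.tag) != "entry":
            continue
        # Assumption: the first <accession> child is the primary accession.
        accession = next(
            (c.text for c in entry if local(c.tag) == "accession"), None
        )
        for feature in entry:
            if local(feature.tag) != "feature":
                continue
            if feature.get("type") != "sequence variant":
                continue
            description = feature.get("description") or ""
            match = RSID.search(description)
            record = {
                "accession": accession,
                "id": feature.get("id"),
                "description": description,
                "rsid": match.group(1) if match else None,
                "original": None,
                "variation": None,
                "begin": None,
                "end": None,
            }
            for node in feature.iter():
                name = local(node.tag)
                if name in ("original", "variation"):
                    record[name] = node.text
                elif name == "position":
                    # Single-residue variant: begin and end coincide.
                    record["begin"] = record["end"] = int(node.get("position"))
                elif name in ("begin", "end") and node.get("position"):
                    record[name] = int(node.get("position"))
            yield record
        entry.clear()  # release the parsed entry to keep memory low


def variants_from_gz(path):
    """Stream variants from a gzipped file such as uniprot_sprot.xml.gz."""
    with gzip.open(path, "rb") as handle:
        yield from iter_variants(handle)


# Tiny in-memory example (hypothetical accession, feature from the excerpt).
SAMPLE = b"""<uniprot xmlns="http://uniprot.org/uniprot">
<entry>
  <accession>P00000</accession>
  <feature description="In dbSNP:rs11542705." id="VAR_048095" type="sequence variant">
    <original>M</original>
    <variation>I</variation>
    <location><position position="155"/></location>
  </feature>
</entry>
</uniprot>"""

for variant in iter_variants(io.BytesIO(SAMPLE)):
    print(variant["accession"], variant["original"], variant["begin"],
          variant["variation"], variant["rsid"])
```

Streaming with `iterparse` avoids loading the whole file at once, which matters because the full Swiss-Prot XML is large.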

Sorry for the late reply. I tried parsing this with Python and successfully got all of the variants. Is there any way to obtain only the nsSNPs, or only the ones associated with pathogenic effects?

9.1 years ago

You can use a tool we recently published in Bioinformatics: I-PV. It prints your protein sequence along with SNVs, their PolyPhen and SIFT scores, indels, the amino acid sequence, the chemical properties of the amino acids, and the corresponding codons. You will be able to see the possible point mutations at each location and how substitutions are distributed from one set of amino acids to another. I have uploaded a set of introductory videos to the I-PV website. Here is one of them: http://i-pv.org/intro_ipv_alt4.html

You will need the FASTA files of your mRNA (NM_...) and protein sequences. You will also need a text file of conservation scores separated by newline characters (you can upload a dummy conservation file of random numbers if you like). Lastly, you will need the variant file for your SNVs, which you can download from BioMart for your protein of interest. Alternatively, you can use a VCF file.

The resulting image is interactive, and you can still show or hide data on it using the highlight tool or the drop-down menus. To get an idea of what the output looks like and whether it fits what you need, take a look at some examples:
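If you go the dummy-file route, a rough Python sketch like the following could generate it. Two things here are assumptions on my part rather than documented I-PV requirements: that one score is expected per residue of the protein FASTA, and that values between 0 and 1 are acceptable. The function names and file names are only illustrative.

```python
import random


def fasta_length(fasta_text):
    """Count residues in FASTA text, skipping header lines and blanks."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def dummy_conservation(n_scores, seed=0):
    """Return n_scores random numbers in [0, 1], one per line.

    Assumption: one score per residue and a 0-1 range; adjust both
    if the tool expects something different.
    """
    rng = random.Random(seed)
    return "\n".join(f"{rng.random():.3f}" for _ in range(n_scores)) + "\n"


def write_dummy_file(fasta_path, out_path, seed=0):
    """Read a protein FASTA file and write a matching dummy score file."""
    with open(fasta_path) as handle:
        n = fasta_length(handle.read())
    with open(out_path, "w") as out:
        out.write(dummy_conservation(n, seed))
    return n
```

For example, `write_dummy_file("protein.fasta", "conservation.txt")` would write one random score per residue; both file names are placeholders.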

I hope this helps,

Good luck with your research,


Thanks for the answer, were those made with Circos?


Dear Bioaln,

That is correct: the software is built on top of Circos. It is a combination of Circos and JavaScript. However, you do not have to generate the data tracks yourself; they are generated automatically from the FASTA files you provide. The output opens in a browser, and when you click on a SNP, it takes you to the corresponding dbSNP page for further information, like this example:

http://i-pv.org/gifs/snpToDbsnp.gif

I hope this helps,
