Is It Possible To Get Uniprot Ft Features Without Using The Available Flat Files
13.0 years ago

Hello,

I am interested in getting the FT features for a set of kinases. For instance, for AKT1 I would get "FT DOMAIN 150 408 Protein kinase." from the file available at http://www.uniprot.org/uniprot/P31749.txt.

So I was wondering whether parsing the dedicated UniProt text annotation file is the only way to get this information, or whether it is also stored in a publicly accessible database.

Thanks in advance for your suggestions.

uniprot parsing mysql database • 6.7k views
13.0 years ago
Jerven ▴ 660

There are quite a few ways to get this information out of the uniprot website. Please write to help@uniprot.org

For example, this can be done via the REST interface. I am out of the office and won't have time to write a complete answer until next week (12th of December 2011).


Very nice! I can't wait for December 12th. I didn't know about this REST possibility.


My colleague @Elisabeth_Gasteiger hopefully answered your question. @Pierre_Lindenbaum also gave a good answer.

13.0 years ago

Here is the FAQ on using the REST interface: http://www.uniprot.org/faq/28

The batch service doesn't seem to let you output a custom tab format: http://www.uniprot.org/batch/

Assuming you have a file of accession IDs, you can query each entry with this Python script:

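# Python 3: urllib2's urlopen now lives in urllib.request
import sys
import urllib.request

# Query URL: returns the id, feature, and domain columns as tab-separated text
url = 'http://www.uniprot.org/uniprot?columns=id%2Cfeature%2Cdomain%2Cdomains&format=tab&query=accession%3A'

# Read one accession per line from the file given as the first argument
with open(sys.argv[1]) as accFile:
    for line in accFile:
        acc = line.strip()
        if not acc:
            continue  # skip blank lines

        with urllib.request.urlopen(url + acc) as response:
            rows = response.read().decode('utf-8').strip().split('\n')

        # rows[0] is the header line; rows[1] is the entry (missing if nothing matched)
        if len(rows) > 1:
            print(rows[1])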

Save as yourName.py and run with: python yourName.py accessionIDsList

This script goes through each accession ID in the list, requests the entry, and prints the features, the domain count, and the domain names in tab-delimited format. To display other information, check the REST service FAQ and add your own columns to the URL.
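To swap in other columns without hand-encoding the URL, the query string can be assembled with the standard library. This is only a minimal sketch: it assumes the same endpoint and parameter names (query, format, columns) as the script above, build_query_url is a hypothetical helper name, and no request is actually sent.

```python
from urllib.parse import urlencode

# Endpoint taken from the script above; whether it still accepts these
# parameters is an assumption to check against the REST FAQ.
BASE_URL = 'http://www.uniprot.org/uniprot'


def build_query_url(accession, columns):
    """Return the tab-format query URL for one accession (hypothetical helper)."""
    params = {
        'query': 'accession:' + accession,
        'format': 'tab',
        'columns': ','.join(columns),
    }
    # urlencode percent-encodes ':' and ',' as %3A and %2C
    return BASE_URL + '?' + urlencode(params)


print(build_query_url('P31749', ['id', 'feature', 'domain', 'domains']))
```

Passing a different column list changes only the columns parameter, so the rest of the loop can stay as it is.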

13.0 years ago

You could use the following simple XSLT stylesheet:


<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:u="http://uniprot.org/uniprot"
    version='1.0'
    >

<xsl:output method="text" encoding="UTF-8"/>
<xsl:param name="temporary">temporary</xsl:param>

<xsl:template match="/">
  <xsl:apply-templates select="u:uniprot"/>
</xsl:template>

<xsl:template match="u:uniprot">
  <xsl:apply-templates select="u:entry"/>
</xsl:template>

<xsl:template match="u:entry">
  <xsl:variable name="name" select="u:name[1]"/>
  <xsl:for-each select="u:feature">
    <xsl:value-of select="$name"/>
    <xsl:text>    </xsl:text>
    <xsl:value-of select="@type"/>
    <xsl:text>    </xsl:text>
    <xsl:value-of select="@description"/>
    <xsl:text>    </xsl:text>
    <xsl:value-of select="@evidence"/>
    <xsl:text>    </xsl:text>
    <xsl:value-of select="@status"/>
    <xsl:text>    </xsl:text>
    <xsl:apply-templates select="u:location"/>
    <xsl:text>
</xsl:text>
  </xsl:for-each>
</xsl:template>

<xsl:template match="u:location[u:begin and u:end]">
  <xsl:value-of select="u:begin/@position"/>
  <xsl:text>    </xsl:text>
  <xsl:value-of select="u:end/@position"/>
</xsl:template>

<xsl:template match="u:location[u:position]">
  <xsl:value-of select="u:position/@position"/>
  <xsl:text>    </xsl:text>
  <xsl:value-of select="u:position/@position"/>
</xsl:template>

</xsl:stylesheet>

Example:

xsltproc --novalid stylesheet.xsl http://www.uniprot.org/uniprot/P31749.xml

AKT1_HUMAN    chain    RAC-alpha serine/threonine-protein kinase            1    480
AKT1_HUMAN    domain    PH            5    108
AKT1_HUMAN    domain    Protein kinase            150    408
AKT1_HUMAN    domain    AGC-kinase C-terminal            409    480
AKT1_HUMAN    nucleotide phosphate-binding region    ATP        by similarity    156    164
AKT1_HUMAN    region of interest    Inositol-(1,3,4,5)-tetrakisphosphate binding            14    19
AKT1_HUMAN    region of interest    Inositol-(1,3,4,5)-tetrakisphosphate binding            23    25
AKT1_HUMAN    region of interest    Inhibitor binding            228    230
AKT1_HUMAN    active site    Proton acceptor        by similarity    274    274
AKT1_HUMAN    binding site    Inositol-(1,3,4,5)-tetrakisphosphate            53    53
(...)
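
If xsltproc is not at hand, roughly the same table can be built in Python with the standard library. The sketch below assumes the same XML namespace and element names that the stylesheet matches on (entry, name, feature, location, begin, end, position). The embedded sample is a hand-written illustration using two rows from the output above, not a copy of the real entry, and feature_rows is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace used by the stylesheet above
NS = {'u': 'http://uniprot.org/uniprot'}

# Hand-written illustrative sample, not the real P31749 entry
SAMPLE_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <name>AKT1_HUMAN</name>
    <feature type="domain" description="Protein kinase">
      <location><begin position="150"/><end position="408"/></location>
    </feature>
    <feature type="active site" description="Proton acceptor">
      <location><position position="274"/></location>
    </feature>
  </entry>
</uniprot>"""


def feature_rows(xml_text):
    """Return (entry name, type, description, begin, end) for each feature."""
    rows = []
    root = ET.fromstring(xml_text)
    for entry in root.findall('u:entry', NS):
        name = entry.findtext('u:name', default='', namespaces=NS)
        for feat in entry.findall('u:feature', NS):
            loc = feat.find('u:location', NS)
            if loc is None:
                continue
            pos = loc.find('u:position', NS)
            if pos is not None:
                # Single-residue feature: begin and end are the same
                begin = end = pos.get('position')
            else:
                b, e = loc.find('u:begin', NS), loc.find('u:end', NS)
                if b is None or e is None:
                    continue  # skip locations this sketch does not handle
                begin, end = b.get('position'), e.get('position')
            rows.append((name, feat.get('type'),
                         feat.get('description', ''), begin, end))
    return rows


for row in feature_rows(SAMPLE_XML):
    print('\t'.join(row))
```

To run it on a real entry, the XML text would be downloaded first and passed to feature_rows in place of the sample.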

Thanks a lot for sharing your XSLT answer, which works directly on the online XML files.

13.0 years ago
Chris ▴ 190

You can download the flat file containing all Swiss-Prot proteins here [1]. To parse that file, I'd use something like Biopython, which makes it easy to retrieve the feature section of each protein.

[1] ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
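
For readers without Biopython installed, the FT lines can also be pulled out with a small standard-library parser. This is a rough sketch and not Biopython's API: it assumes the one-line FT layout quoted in the question (key, start, end, description), ignores continuation lines and uncertain positions, and may need adjusting if the flat file lays out FT lines differently. parse_ft_lines and the sample text are made up for illustration.

```python
import re

# Matches lines such as "FT   DOMAIN      150    408       Protein kinase."
# Assumption: key, start, and end all appear on one line, as in the question.
FT_RE = re.compile(r'^FT\s+([A-Z_]+)\s+(\d+)\s+(\d+)\s*(.*)$')

# Illustrative sample using values quoted elsewhere in this thread
SAMPLE_FLAT = """ID   AKT1_HUMAN
FT   DOMAIN        5    108       PH.
FT   DOMAIN      150    408       Protein kinase.
"""


def parse_ft_lines(text):
    """Return (key, start, end, description) for each one-line FT record."""
    features = []
    for line in text.splitlines():
        m = FT_RE.match(line)
        if m:
            key, start, end, desc = m.groups()
            # Drop the trailing period that ends the description
            features.append((key, int(start), int(end),
                             desc.strip().rstrip('.')))
    return features


for feature in parse_ft_lines(SAMPLE_FLAT):
    print(feature)
```

The same function can be fed the decompressed flat file text one entry at a time, splitting entries on the "//" terminator line.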


Thanks Chris, but I am looking for an alternative way, as I said in my post.

13.0 years ago

You could also use the GFF format (cf. http://biowiki.org/GffFormat)

examples:

single entry: http://www.uniprot.org/uniprot/P31749.gff

query: http://www.uniprot.org/uniprot/?query=AKT1&sort=score&format=gff

(see http://www.uniprot.org/faq/28)
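
Once downloaded, GFF output is easy to filter because each record is one tab-separated line with nine columns (sequence ID, source, type, start, end, score, strand, phase, attributes). A minimal sketch follows. The sample lines are written by hand for illustration and are not copied from the UniProt output, and parse_gff is a hypothetical helper name.

```python
# Hand-written illustrative GFF lines (assumed layout, not real UniProt output)
SAMPLE_GFF = "\n".join([
    "##gff-version 3",
    "P31749\tUniProtKB\tDomain\t150\t408\t.\t.\t.\tNote=Protein kinase",
    "P31749\tUniProtKB\tActive site\t274\t274\t.\t.\t.\tNote=Proton acceptor",
])


def parse_gff(text, feature_type=None):
    """Return (seqid, type, start, end, attributes), optionally one type only."""
    rows = []
    for line in text.splitlines():
        if not line or line.startswith('#'):
            continue  # skip blank lines and header/comment lines
        cols = line.split('\t')
        if len(cols) < 9:
            continue  # not a complete nine-column record
        seqid, _source, ftype, start, end = cols[:5]
        if feature_type and ftype.lower() != feature_type.lower():
            continue
        rows.append((seqid, ftype, int(start), int(end), cols[8]))
    return rows


for row in parse_gff(SAMPLE_GFF, feature_type='Domain'):
    print(row)
```

Leaving feature_type unset returns every feature, which is handy for checking which type names actually occur in the file.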

PS: To get a reply from the UniProt team, the best channel is to send an email to help@uniprot.org
