Hi, here's a question that turns out to be trickier than it first looks. I am trying to convert SwissProt entries into a tabular format for import into SQL, containing the "best bet" subcellular localization of every protein (one row per (accession, location) pair):
Accession Location Evidence
Q9YH95 Nucleus Manual
Just the way it looks in the picture on the HTML page: http://www.uniprot.org/uniprot/Q9YH95
Parsing the XML format would be easy. http://www.uniprot.org/uniprot/Q9YH95.xml contains:
<comment type="subcellular location">
<subcellularLocation>
<location evidence="1 3">Nucleus</location>
</subcellularLocation>
</comment>
Edit: This case should be nicely solved using this XSLT by Pierre: "How to map sub-cellular localisation to enteries in uniprot database fasta file".
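For anyone who prefers Python over XSLT, here is a minimal sketch of this "easy" case using only the standard library. It matches elements by local name, so it should work whether or not the document declares an XML namespace. The helper names (local, subcellular_locations) are made up for illustration, and the evidence column simply carries the raw evidence keys from the attribute; turning those into a label such as "Manual" is not attempted here.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def subcellular_locations(xml_text):
    """Return (location, evidence) pairs from 'subcellular location' comments."""
    root = ET.fromstring(xml_text)
    rows = []
    for comment in root.iter():
        if local(comment.tag) != "comment":
            continue
        if comment.get("type") != "subcellular location":
            continue
        for el in comment.iter():
            if local(el.tag) == "location":
                rows.append(((el.text or "").strip(), el.get("evidence", "")))
    return rows


# The fragment quoted above for Q9YH95.
snippet = """<comment type="subcellular location">
<subcellularLocation>
<location evidence="1 3">Nucleus</location>
</subcellularLocation>
</comment>"""

# One tab-separated row per (accession, location) pair.
for location, evidence in subcellular_locations(snippet):
    print("Q9YH95", location, evidence, sep="\t")
```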
That is not the case for all entries, though: e.g. http://www.uniprot.org/uniprot/Q96AT9 (XML: http://www.uniprot.org/uniprot/Q96AT9.xml), which contains:
<dbReference type="GO" id="GO:0005829">
<property type="term" value="C:cytosol"/>
<property type="evidence" value="ECO:0000318"/>
<property type="project" value="GO_Central"/>
</dbReference>
<dbReference type="GO" id="GO:0070062">
<property type="term" value="C:extracellular exosome"/>
<property type="evidence" value="ECO:0007005"/>
<property type="project" value="UniProtKB"/>
</dbReference>
Does that mean the way to get the full information is to:
- Parse the <subcellularLocation> elements for those entries that have them.
- For the remaining entries, parse the GO terms with a GO parser and select those that belong to the "cellular component" aspect (the terms prefixed with "C:")?
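If that two-step approach is indeed the way to go, it could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function names are hypothetical, the <entry> wrapper is there just to make the quoted fragment well-formed, and cellular-component terms are recognised purely by the "C:" prefix seen in the XML above rather than by a real GO parser.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def comment_locations(root):
    """(location, evidence) pairs from 'subcellular location' comments."""
    rows = []
    for comment in root.iter():
        if local(comment.tag) == "comment" and comment.get("type") == "subcellular location":
            for el in comment.iter():
                if local(el.tag) == "location":
                    rows.append(((el.text or "").strip(), el.get("evidence", "")))
    return rows


def go_component_terms(root):
    """(term, evidence) pairs from GO cross-references whose term starts with 'C:'."""
    rows = []
    for ref in root.iter():
        if local(ref.tag) == "dbReference" and ref.get("type") == "GO":
            props = {p.get("type"): p.get("value")
                     for p in ref if local(p.tag) == "property"}
            term = props.get("term") or ""
            if term.startswith("C:"):
                rows.append((term[2:], props.get("evidence") or ""))
    return rows


def best_bet_locations(xml_text):
    """Prefer the curated comment; fall back to GO terms only if it is absent."""
    root = ET.fromstring(xml_text)
    return comment_locations(root) or go_component_terms(root)


# The GO fragment quoted above for Q96AT9, wrapped so that it parses.
go_only = """<entry>
<dbReference type="GO" id="GO:0005829">
<property type="term" value="C:cytosol"/>
<property type="evidence" value="ECO:0000318"/>
<property type="project" value="GO_Central"/>
</dbReference>
<dbReference type="GO" id="GO:0070062">
<property type="term" value="C:extracellular exosome"/>
<property type="evidence" value="ECO:0007005"/>
<property type="project" value="UniProtKB"/>
</dbReference>
</entry>"""

for location, evidence in best_bet_locations(go_only):
    print("Q96AT9", location, evidence, sep="\t")
```

Note that the evidence column then mixes two vocabularies (evidence keys from the comment, ECO codes from GO), so it would still need to be normalised before import.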
I think it would be best to simply reproduce the logic of the code that draws the compartment image. Does somebody have access to that?
Related but not the same: what is the Query to find proteins which Subcellular location have Manually-assigned evidence in uniprot ?
Tagging: Elisabeth Gasteiger
Those of us at SIB Swiss-Prot who work on UniProt are offsite; please wait until Thursday ;)