11.3 years ago
Vladimir Chupakhin
Hello!
I need advice on how to annotate proteins (identified by UniProt accession) with their taxonomic lineage. I am currently querying UniProt with:
http://www.uniprot.org/uniprot/?query=$(QueryProperty)&format=tab&columns=$(columns)
where columns are
id,genes,domains,ec,database(PDB),organism,organism-id
I am stuck on finding the correct column name for the taxonomic lineage. Any suggestions?
Many thanks!
Sorry, but I don't understand what you are trying to do. Could you add an example of what your input is, and what you expect as output?
On the UniProt entry page there is a field: Taxonomic lineage Eukaryota › Metazoa › Chordata › Craniata › Vertebrata › Euteleostomi › Mammalia › Eutheria › Euarchontoglires › Primates › Haplorrhini › Catarrhini › Hominidae › Homo
I want to download that data for each protein.
You have to parse the XML or text file with Python or another program.
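For the XML route, a minimal sketch using only the standard library might look like the following. It assumes the entry XML uses the namespace http://uniprot.org/uniprot and nests the lineage as organism/lineage/taxon; check that against a real download before relying on it. The function name and the sample entry (including the accession X00000) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of UniProt entry XML; verify against a real file.
NS = {"up": "http://uniprot.org/uniprot"}


def lineage_from_xml(xml_text):
    """Return the taxon names found under organism/lineage, in document order."""
    root = ET.fromstring(xml_text)
    taxa = root.findall(".//up:organism/up:lineage/up:taxon", NS)
    return [t.text for t in taxa]


# Hypothetical, heavily shortened entry used only to exercise the parser.
sample_xml = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>X00000</accession>
    <organism>
      <name type="scientific">Homo sapiens</name>
      <lineage>
        <taxon>Eukaryota</taxon>
        <taxon>Metazoa</taxon>
        <taxon>Chordata</taxon>
      </lineage>
    </organism>
  </entry>
</uniprot>"""

print(" > ".join(lineage_from_xml(sample_xml)))
```

The same XPath should work with lxml if you prefer it; ElementTree is used here only to avoid an extra dependency.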
I was looking for other solutions, since parsing txt/xml will take a lot of time. I am currently using a list of mammals only; that helps, but it needs updating.
Sorry to burst your bubble, but parsing txt/xml is incredibly fast. Just use a library such as lxml or whatever you like. Or get it as JSON instead, and use a simplejson flavor.
Anyway, the easiest solution would probably be:
Get all IDs for your query using http://www.uniprot.org/uniprot/?query=$(QueryProperty)&format=tab&columns=$(columns)
Then download the respective text file: http://www.uniprot.org/uniprot/$(id).txt and extract the OC lines.
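The second step could be sketched roughly like this in Python. The parser assumes OC lines start with "OC" followed by three spaces, list taxa separated by semicolons, and end with a period. The function names and the sample entry are made up for illustration, and fetch_entry simply reuses the URL pattern quoted above, which may have changed since.

```python
from urllib.request import urlopen


def fetch_entry(accession):
    """Download one flat-file entry (URL pattern taken from the post above)."""
    url = "http://www.uniprot.org/uniprot/%s.txt" % accession
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def lineage_from_flatfile(text):
    """Collect the taxa listed on the OC lines of one flat-file entry."""
    taxa = []
    for line in text.splitlines():
        if line.startswith("OC   "):
            # Drop the 5-character line code, then split the taxa on ';'.
            for name in line[5:].split(";"):
                name = name.strip().rstrip(".").strip()
                if name:
                    taxa.append(name)
    return taxa


# Hypothetical, shortened entry; the lineage is the one quoted in this thread.
sample = """ID   EXAMPLE_HUMAN
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Homo.
OX   NCBI_TaxID=9606;
"""

print(" > ".join(lineage_from_flatfile(sample)))
```

For a real run you would call lineage_from_flatfile(fetch_entry(acc)) for each accession returned by the first query.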
Yes, I did it that way. But I was looking for an "online" solution, so I can check on the fly, especially since UniProt is updated regularly and I therefore need to update my list regularly as well.
You might have better luck using the UniProt API, though I've never used it, so I don't know whether more options are available. If you will be doing this very often, I would email the UniProt helpdesk and ask for that piece of information to be added. Seeing as they release every four weeks, you might get lucky.