7.0 years ago
a.rex
I have a very long list of genes and their corresponding UniProt IDs from a BLAST search against the UniProt database.
I was wondering: is there a tool I can download to convert these IDs to gene names?
Thank you
Here is UniProt's page explaining programmatic (i.e. terminal) access to the retrieve/mapping tool: Programmatic access - Mapping database identifiers. They have examples of how to write a script in several languages just for this. Also, if you are unfamiliar with the UniProt ID converter tool, here is a UniProtID Tutorial.
You can manually upload a file of thousands of IDs and convert them. I believe their limit is a file with ~40,000 IDs. If you have a very large number of IDs from your BLAST output, you can use the split command in the terminal to split the list into files of 40,000 IDs each (or whatever size you wish), and then write a script based on their examples to access that tool programmatically on each of these files.
Using #R
You can find your data under the column "Gene.names".
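For the split-and-batch approach suggested earlier in the thread (the split command plus a per-file script), here is a minimal Python sketch of the splitting step only. The function names, the output file prefix, and the one-ID-per-line input format are assumptions made for illustration; the 40,000 figure is simply the limit quoted above, not a verified value.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists containing at most `size` IDs each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def read_ids(path: str) -> List[str]:
    """Read one ID per line (assumed input format), skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def write_batches(ids: Iterable[str], size: int,
                  prefix: str = "uniprot_ids_part") -> List[str]:
    """Write each batch to its own file and return the file paths.

    The prefix is a hypothetical name chosen for this example.
    """
    paths = []
    for number, batch in enumerate(chunked(ids, size), start=1):
        path = f"{prefix}{number:03d}.txt"
        with open(path, "w") as out:
            out.write("\n".join(batch) + "\n")
        paths.append(path)
    return paths
```

Something like write_batches(read_ids("blast_ids.txt"), 40000) would then give you one file per batch, each of which can be passed to whatever script you adapt from UniProt's examples ("blast_ids.txt" is a placeholder file name).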
For some reason, when I search for a UniProt accession using this function, it returns data for twice as many proteins. Is there something special about the A0A2 or A0A0 prefix on the accessions that makes this work?
Edit: Figured it out. Put "accession:" in front of each accession ID to make this work. I suspect that a bare accession (e.g. P60710) is searched against both entry IDs and accessions.
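To illustrate the "accession:" fix, here is a small Python sketch that prefixes every ID before building a query string. The function name is made up for this example, and joining the terms with OR is an assumption about how you want to combine them in one search; adjust it to whatever your query function expects.

```python
from typing import Iterable


def accession_query(accessions: Iterable[str]) -> str:
    """Prefix each accession with 'accession:' and join the terms with OR.

    Blank entries and surrounding whitespace are dropped so that a messy
    input list does not produce empty search terms.
    """
    terms = []
    for accession in accessions:
        accession = accession.strip()
        if accession:
            terms.append(f"accession:{accession}")
    return " OR ".join(terms)
```

For a single ID this simply turns P60710 into accession:P60710, which restricts the match to the accession field instead of a free-text search.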
UniProt ID converter.
Thank you, but I am already aware of this. I just wanted to know if there is a way of inputting a very large list (>1000 loci). Something terminal-based?
The code I posted here could solve your problem as well, once you add the necessary input and output logic you're looking for.
You can download ID mappings and parse them in any way you want.
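As a rough sketch of the download-and-parse route, the Python below pulls gene names out of a mapping file. It assumes the three-column, tab-separated layout (accession, ID type, ID value) with gene names stored under the ID type "Gene_Name"; check the README that ships with the file you download, since both the layout and the type label are assumptions here. The function name is made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def gene_names_from_mapping(lines: Iterable[str],
                            wanted: Iterable[str]) -> Dict[str, List[str]]:
    """Collect gene names for the wanted accessions.

    Each line is assumed to look like: accession<TAB>id_type<TAB>value.
    Lines that do not have exactly three fields are skipped.
    """
    wanted_set = set(wanted)
    names = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue
        accession, id_type, value = fields
        if id_type == "Gene_Name" and accession in wanted_set:
            names[accession].append(value)
    return dict(names)
```

Because the function only iterates over lines, you can stream a large compressed file through it without loading it into memory, e.g. with gzip.open("idmapping.dat.gz", "rt") as the line source (the file name is an assumption about what you downloaded).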
This conversion tool will let you (in theory) convert a list of unlimited IDs, but I think the UI limits it to about 3,000. They also have an API that you should be able to access from a terminal, and I believe that route supports unlimited lists.
https://biodbnet-abcc.ncifcrf.gov/db/db2db.php