Question

How do I go from UniProt ID to retrieving the gene name?


7.0 years ago

a.rex ▴ 350

I have a very long list of genes and their corresponding UniProt IDs from a BLAST search against the UniProt database.

I was wondering: is there a tool I can download to convert these IDs to gene names?

Thank you

sequence • 13k views


Here is UniProt's page explaining programmatic (i.e. terminal) access to the Retrieve/ID mapping tool: Programmatic access - Mapping database identifiers. It has examples of how to write a script for exactly this in several languages. Also, if you are unfamiliar with how to use the UniProt ID converter tool, here is a UniProtID Tutorial.

You can manually upload a file of thousands of IDs and convert them; I believe the limit is a file of about 40,000 IDs. If your BLAST output has more IDs than that, you can use the split command in the terminal (e.g. split -l 40000) to break the list into files of 40,000 IDs (or whatever size you wish), then write a script based on their examples that runs the tool programmatically on each file.
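A minimal R sketch of the splitting step, as an alternative to running split in the terminal. The default batch size of 40,000 is only the limit estimated above for the upload form; chunk_ids() and write_chunks() are hypothetical helper names, and the chunk_N.txt file naming is an assumption.

```r
# Split a character vector of IDs into batches of at most batch_size IDs,
# similar to what `split -l` does to a file with one ID per line.
chunk_ids <- function(ids, batch_size = 40000) {
  ids <- unique(ids[!is.na(ids) & nzchar(ids)])  # drop blanks and duplicates
  if (length(ids) == 0) return(list())
  # ceiling(i / batch_size) gives batch numbers 1, 1, ..., 2, 2, ...
  unname(split(ids, ceiling(seq_along(ids) / batch_size)))
}

# Write each batch to its own file, one ID per line
# (hypothetical naming: <prefix>1.txt, <prefix>2.txt, ...).
write_chunks <- function(batches, prefix = "chunk_") {
  files <- paste0(prefix, seq_along(batches), ".txt")
  for (i in seq_along(batches)) writeLines(batches[[i]], files[i])
  files
}
```

Each resulting file can then be uploaded to the ID mapping form or fed to a script one at a time.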

7.0 years ago by ladypurrsia ▴ 60


Using R:

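uniprot_mapping <- function(ids) {
  # The old www.uniprot.org/uniprot/?query=...&format=tab URL was replaced
  # by the REST API at rest.uniprot.org; /stream returns all hits at once
  uri <- "https://rest.uniprot.org/uniprotkb/stream"
  # Prefix "accession:" so only the accession field is matched (see the edit below)
  idStr <- paste0("accession:", ids, collapse = " OR ")
  params <- "&fields=accession,id,gene_names&format=tsv"
  fullUri <- paste0(uri, "?query=", URLencode(idStr, reserved = TRUE), params)
  # read.delim() reads the tab-separated response straight from the URL
  read.delim(fullUri, quote = "", stringsAsFactors = FALSE)
}

  ## Usage
  ids <- c("A0A2T3D680", "A0A0F0E143", "A0A0F0E266")
  uniprot_mapping(ids)

You can find the gene names under the column "Gene.Names" (read.delim() converts the "Gene Names" header into a valid R column name).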
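Because the function puts every ID into a single URL, a list of thousands of IDs will probably produce a URL that is too long for the server. A hedged sketch of looping over batches and stacking the results; map_in_batches() is a hypothetical helper, and the batch size of 200 and the one-second pause are assumptions, not documented UniProt limits.

```r
# Run a lookup function over IDs in batches and stack the results.
# `fetch` is any function that takes a character vector of IDs and returns
# a data.frame, e.g. uniprot_mapping() above. batch_size = 200 is an
# assumed value chosen to keep each request URL reasonably short.
map_in_batches <- function(ids, fetch, batch_size = 200, pause = 1) {
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- fetch(batches[[i]])
    # Wait between requests so the server is not hammered
    if (i < length(batches)) Sys.sleep(pause)
  }
  do.call(rbind, results)  # one data.frame with all batches
}
```

Usage, with my_ids standing in for your full ID vector: res <- map_in_batches(my_ids, uniprot_mapping). Passing the lookup function as an argument keeps the batching logic independent of which endpoint you call.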

4.9 years ago by josev.die ▴ 70


For some reason, when I search for a UniProt accession using this function, it returns data for twice as many proteins. Is there something special about the A0A2 or A0A0 prefix on those accessions that makes this work?

Edit: Figured it out. Put "accession:" before each accession ID to make this work. I suspect that a bare accession (e.g. P60710) makes the query search both entry IDs and accessions.
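Even with the prefix, it is worth checking that the result contains only the accessions you asked for. A minimal sketch, assuming the accession column is called "Entry" as in the tab-separated output of the function above; keep_requested() is a hypothetical helper. An ID reported as missing may be a secondary accession that comes back under a different primary accession, or an entry that has since been removed, so check those by hand.

```r
# Keep only rows whose accession was actually requested, and warn about
# requested IDs that did not come back at all.
# id_col = "Entry" is an assumption about the result's column name.
keep_requested <- function(res, ids, id_col = "Entry") {
  hit <- res[[id_col]] %in% ids
  missing <- setdiff(ids, res[[id_col]])
  if (length(missing) > 0) {
    warning(length(missing), " requested ID(s) not returned, e.g.: ",
            paste(head(missing, 5), collapse = ", "))
  }
  res[hit, , drop = FALSE]
}
```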

4.9 years ago by alexandercmonovich • 0


UniProt ID converter.

7.0 years ago by GenoMax 147k


Thank you, but I am already aware of this. I just wanted to know if there is a way of inputting a very large list (>1,000 loci). Something terminal-based?

7.0 years ago by a.rex ▴ 350


The code I posted here could solve your problem as well, once you add the necessary input and output logic you're looking for.

7.0 years ago by mobiusklein ▴ 180


You can download ID mappings and parse them in any way you want.
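A hedged R sketch of parsing a downloaded mapping file. It assumes the three-column, tab-separated layout (UniProtKB accession, ID type, mapped ID) of the per-organism idmapping.dat files on the UniProt FTP site, with gene names stored under the ID type "Gene_Name"; check the README next to the files before relying on this. gene_names_from_idmapping() is a hypothetical helper name.

```r
# Look up gene names in a UniProt ID-mapping file with three tab-separated
# columns: accession, ID type, mapped ID (assumed layout, see README).
gene_names_from_idmapping <- function(path, ids) {
  map <- read.delim(path, header = FALSE, quote = "", comment.char = "",
                    col.names = c("accession", "id_type", "id"),
                    colClasses = "character")
  # Keep only gene-name rows for the accessions of interest
  genes <- map[map$id_type == "Gene_Name" & map$accession %in% ids,
               c("accession", "id")]
  names(genes)[2] <- "gene_name"
  rownames(genes) <- NULL
  genes
}
```

The path can also be a connection, e.g. gene_names_from_idmapping(gzfile("HUMAN_9606_idmapping.dat.gz"), my_ids), where my_ids is your ID vector. This reads the whole file into memory, which is simple but may need several GB for large organisms; filtering with grep in the terminal first is an alternative.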

7.0 years ago by GenoMax 147k


This conversion tool (bioDBnet db2db) will in theory let you submit an unlimited list of IDs, but I think the web UI limits you to about 3,000. They also have an API that you should be able to access from a terminal, and I believe that route supports unlimited lists.

https://biodbnet-abcc.ncifcrf.gov/db/db2db.php

7.0 years ago by andrew ▴ 560