I have a list of gene names in a file:
CHRNB2
EGR2
GCK
KRT14
LMNA
FGF3
TK2
ABCC8
How can I map them to UniProt IDs?
P.S. I tried the UniProt "ID mapping" tool (from "GENEID" to "UNIPROTKB AC"), but it couldn't map them.
Please suggest what I should do. Thanks.
Use MyGene.info. You can do batch requests via POST, or you can use the live API page to run batch requests as well.
Here's how via the live API:
Click on "Try API live!", select "gene query service", then click on "post".
For "q", put in your gene names separated by commas.
For "scopes", type "symbol" (without the quotation marks).
For "fields", use "symbol,entrezgene,uniprot" plus any other fields of interest.
Click "try it" when done.
The result will be in the response body.
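The same batch request can also be scripted instead of clicked through. Below is a minimal sketch, not an official client. It assumes the query service lives at https://mygene.info/v3/query, that each hit carries its accessions under "uniprot" then "Swiss-Prot" (as a string or a list), and that the genes are human. The "species" filter and all function names are my own additions for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: current location of the MyGene.info gene query service.
MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_form(symbols, species="human"):
    """Return the same form fields as in the live API steps, plus a species filter."""
    return {
        "q": ",".join(symbols),                  # gene names separated by commas
        "scopes": "symbol",                      # treat the query terms as gene symbols
        "fields": "symbol,entrezgene,uniprot",
        "species": species,                      # assumption: the list is human
    }


def swissprot_ids(hit):
    """Return the Swiss-Prot accessions of one hit as a list (possibly empty)."""
    value = hit.get("uniprot", {}).get("Swiss-Prot", [])
    return [value] if isinstance(value, str) else list(value)


def collect_mapping(hits):
    """Group accessions by the symbol that was queried; unmatched symbols get []."""
    mapping = {}
    for hit in hits:
        mapping.setdefault(hit["query"], []).extend(swissprot_ids(hit))
    return mapping


def map_symbols(symbols, species="human"):
    """POST the batch query (needs network access) and return {symbol: [accessions]}."""
    data = urllib.parse.urlencode(build_form(symbols, species)).encode("ascii")
    with urllib.request.urlopen(MYGENE_QUERY_URL, data=data) as response:
        return collect_mapping(json.load(response))
```

Calling map_symbols(["CHRNB2", "EGR2", "GCK"]) should then give a dictionary keyed by gene symbol. Symbols that end up with an empty list are worth checking by hand, since they may be outdated or misspelled.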
These look like gene symbols rather than Entrez Gene identifiers, so identifier mapping from Entrez Gene would not work. However, UniProt includes most gene symbols and their various synonyms in its data, so a query should work and find the set of UniProtKB entries that match. For example:
I suspect you may have a specific species in mind so you might want to use additional terms to limit the results further.
Once you have worked out the form of the required query you could use the UniProt.org REST API to script the required queries and return only selected data.
The UniProt.org REST API works specifically with UniProt IDs. Can you please demo it using a gene name?
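A gene-name query of the kind described above could be scripted roughly as follows. This is a sketch under assumptions, not the answerer's own code. It assumes the current search endpoint https://rest.uniprot.org/uniprotkb/search, the query fields gene_exact, organism_id and reviewed, and TSV output with the accession in the first column. Taxon 9606 (human) is a guess at the species in question, and the function names are illustrative.

```python
import urllib.parse
import urllib.request

# Assumption: current UniProtKB search endpoint.
UNIPROT_SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(symbol, taxon_id=9606, reviewed_only=True):
    """Build a search URL for one gene symbol, limited to one species."""
    query = f"gene_exact:{symbol} AND organism_id:{taxon_id}"
    if reviewed_only:
        query += " AND reviewed:true"  # reviewed = Swiss-Prot
    params = {"query": query, "fields": "accession,gene_names", "format": "tsv"}
    return UNIPROT_SEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_accessions(tsv_text):
    """Return the first column of a TSV response, skipping the header row."""
    lines = tsv_text.strip().splitlines()
    return [line.split("\t")[0] for line in lines[1:] if line]


def fetch_accessions(symbol, **kwargs):
    """Run the query (needs network access) and return the matching accessions."""
    with urllib.request.urlopen(build_search_url(symbol, **kwargs)) as response:
        return parse_accessions(response.read().decode("utf-8"))
```

Looping fetch_accessions over the gene list gives one small request per symbol, which is fine for a handful of genes. For thousands of symbols a batch approach is likely the better choice.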
Your entries look like gene symbols. A couple of options are available, to name just a few:
Adding to the above list, you can also try bioDBnet.
For HGNC symbols it's useful to go through the Symbol Checker first.
You can then do some cross-checks (e.g. see if the names are what you expect and if any symbols are outdated).
Then paste the HGNC ID column into the UniProt ID mapper.
You can then filter by species and reviewed (= Swiss-Prot).
(It would be interesting if you ran a few thousand symbols through all the methods above and told us how you got on!)
A note of caution, related to this question, from the results of sequentially restricting human Swiss-Prot by its cross-references to HGNC, Entrez Gene and Ensembl. It looks like http://www.ncbi.nlm.nih.gov/pubmed/24939910, which suggests sub-19K numbers, could be not far short of the mark, or at least nearer than the (conservative back then) 25K estimate of a decade ago, http://www.ncbi.nlm.nih.gov/pubmed/15174140
You should be able to obtain what you are looking for by following the instructions in this UniProt FAQ:
Please note that it is planned to extend the UniProt identifier mapping to gene symbols.
Don't hesitate to contact the UniProt helpdesk if you have additional questions.
There is also a "mygene" Python module available to use.
Usage examples are available in this ID mapping tutorial.
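For a gene list stored one symbol per line, use of the module might look like the sketch below. It assumes the third-party package is installed (pip install mygene) and relies on its MyGeneInfo().querymany call. The file name genes.txt, the helper names and the human species filter are illustrative assumptions.

```python
def clean_symbols(lines):
    """Strip whitespace, drop blank lines and duplicates, keep the original order."""
    seen = set()
    symbols = []
    for line in lines:
        symbol = line.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            symbols.append(symbol)
    return symbols


def query_uniprot(symbols, species="human"):
    """Look the symbols up with the mygene module (needs network access)."""
    import mygene  # imported here so the helper above works without the package

    client = mygene.MyGeneInfo()
    # Returns one dict per hit; unmatched symbols are flagged with "notfound".
    return client.querymany(symbols, scopes="symbol", fields="uniprot", species=species)


def run(path="genes.txt"):
    """Read the (hypothetical) gene list file and print each hit's UniProt field."""
    with open(path) as handle:
        symbols = clean_symbols(handle)
    for hit in query_uniprot(symbols):
        print(hit["query"], hit.get("uniprot", "not found"), sep="\t")
```

Calling run() with the path to your own file prints one line per hit. De-duplicating the input first keeps the output easier to compare against the original list.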
Very helpful. Thanks