How can I get the reference id of a protein from its UniProt id using the UniProt tool in the Mac terminal?
There are several ways to do this with R and some Bioconductor packages. Below are three methods: annotation packages, the biomaRt package, and the UniProt.ws package. Each requires you to specify the target species, which is inconvenient if you need to map ids across several species.
Here I use the annotation package org.Hs.eg.db through the interface provided by AnnotationDbi. This method returns all associated RefSeq ids together, peptide and nucleotide alike, which may or may not be what you want. Advantage: no access to an online server is needed.
id <- "P62195" # vector of ids to map.
library(org.Hs.eg.db)
# columns(org.Hs.eg.db) # check other columns to be returned.
# keytypes(org.Hs.eg.db) # check other keys for query.
select(org.Hs.eg.db, id, "REFSEQ", "UNIPROT")
# 'select()' returned 1:many mapping between keys and columns
# UNIPROT REFSEQ
# 1 P62195 NM_001199163
# 2 P62195 NM_002805
# 3 P62195 NP_001186092
# 4 P62195 NP_002796
# 5 P62195 XR_934508
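If you only want the protein ids from this mixed result, one option is to filter on the accession prefix, since RefSeq protein accessions start with prefixes such as NP_, XP_ or YP_. This is a minimal sketch in base R; refseq_proteins is a hypothetical helper name (not part of AnnotationDbi), and the data frame is rebuilt by hand from the output above so that the example runs without the annotation package.

```r
# Keep only RefSeq protein accessions (prefixes such as NP_, XP_, YP_).
# `refseq_proteins` is a hypothetical helper, not part of AnnotationDbi.
refseq_proteins <- function(df, col = "REFSEQ") {
  is_protein <- grepl("^[ANXYW]P_", df[[col]])
  df[is_protein, , drop = FALSE]
}

# The mapping printed above, rebuilt by hand so the example runs offline.
res <- data.frame(
  UNIPROT = "P62195",
  REFSEQ = c("NM_001199163", "NM_002805", "NP_001186092",
             "NP_002796", "XR_934508"),
  stringsAsFactors = FALSE
)

prot <- refseq_proteins(res)
print(prot$REFSEQ)
```

In real use you would pass the data frame returned by select() instead of the hand-built one.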
The biomaRt package connects to the BioMart resource at Ensembl and makes queries based on different filters. You can get nucleotide and peptide ids separately, if desired. Check also the other attributes that can be returned. Advantage: access to the latest information in Ensembl.
library(biomaRt)
mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
# listAttributes(mart) # check other mappings.
# listFilters(mart) # check other filters.
getBM(
  attributes = c("refseq_peptide", "external_gene_name", "description"),
  filters = "uniprot_swissprot",
  values = id,
  mart = mart
)
# refseq_peptide external_gene_name
# 1 NP_002796 PSMC5
# 2 PSMC5
# 3 NP_001186092 PSMC5
getBM(
  attributes = c("refseq_mrna", "external_gene_name"),
  filters = "uniprot_swissprot",
  values = id,
  mart = mart
)
# refseq_mrna external_gene_name
# 1 NM_002805 PSMC5
# 2 PSMC5
# 3 NM_001199163 PSMC5
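Both biomaRt results above contain a row with no RefSeq id. A minimal sketch for dropping such rows follows, assuming the blanks come back as empty strings or NA. drop_blank_ids is a hypothetical helper name (not part of biomaRt), and the data frame is rebuilt by hand from the output above so that the example runs without a connection to Ensembl.

```r
# Drop rows whose id column is empty or NA.
# `drop_blank_ids` is a hypothetical helper, not part of biomaRt.
drop_blank_ids <- function(df, col) {
  keep <- !is.na(df[[col]]) & nzchar(df[[col]])
  df[keep, , drop = FALSE]
}

# The refseq_mrna result printed above, rebuilt by hand.
res <- data.frame(
  refseq_mrna = c("NM_002805", "", "NM_001199163"),
  external_gene_name = "PSMC5",
  stringsAsFactors = FALSE
)

clean <- drop_blank_ids(res, "refseq_mrna")
print(clean)
```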
UniProt.ws works like something between the other two, in the sense that you have to establish a connection first and then make queries with the select() interface. Advantage: seems the most natural way to make queries about UniProt ids.
library(UniProt.ws)
up <- UniProt.ws(taxId=9606) # taxid for homo sapiens.
# columns(up) # check other columns to be returned.
# keytypes(up) # check other keys for query.
select(up, id, columns = "REFSEQ_PROTEIN", keytype = "UNIPROTKB")
# Getting mapping data for P62195 ... and P_REFSEQ_AC
# 'select()' returned 1:many mapping between keys and columns
# UNIPROTKB REFSEQ_PROTEIN
# 1 P62195 NP_001186092.1
# 2 P62195 NP_002796.4
select(up, id, columns = c("REFSEQ_NUCLEOTIDE"), keytype = "UNIPROTKB")
# Getting mapping data for P62195 ... and REFSEQ_NT_ID
# 'select()' returned 1:many mapping between keys and columns
# UNIPROTKB REFSEQ_NUCLEOTIDE
# 1 P62195 NM_001199163.1
# 2 P62195 NM_002805.5
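To convert many ids at once from the Mac terminal, any of the methods above can be wrapped in a small script and run with Rscript. The following is a minimal sketch using the offline org.Hs.eg.db method, assuming human proteins only and an input file with one UniProt accession per line. The function names read_ids and map_to_refseq and the file names in the usage line are hypothetical.

```r
# Read one UniProt accession per line, dropping blank lines and duplicates.
# `read_ids` is a hypothetical helper name.
read_ids <- function(path) {
  ids <- trimws(readLines(path, warn = FALSE))
  unique(ids[nzchar(ids)])
}

# Map UniProt accessions to RefSeq protein ids (human only).
# `map_to_refseq` is a hypothetical helper name; it needs the Bioconductor
# package org.Hs.eg.db to be installed.
map_to_refseq <- function(ids) {
  if (!requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
    stop("Please install org.Hs.eg.db from Bioconductor first.")
  }
  res <- AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db, keys = ids,
                               columns = "REFSEQ", keytype = "UNIPROT")
  # Keep protein accessions only; unmapped ids (NA) are dropped here too.
  res[grepl("^[ANXYW]P_", res$REFSEQ), , drop = FALSE]
}

# When a file name is given on the command line, print a tab-separated
# mapping table to standard output.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) >= 1) {
  out <- map_to_refseq(read_ids(args[1]))
  write.table(out, sep = "\t", quote = FALSE, row.names = FALSE)
}
```

Saved as, say, map_ids.R, this could be run as: Rscript map_ids.R ids.txt > mapping.tsv. Note that this method returns RefSeq ids without the version suffix, as in the first example above.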
A: get the uniprot accession from ensembl protein ID
Hi Wei, I can use this server, but I have to do this for a large number of UniProt ids, so I need a script in any programming language to convert them all at once. If you could help me out with this, that would be a great help.
Thanks
What is a reference id? Can you give an example of the kind of conversion you are looking for?
Let's say the UniProt id for a protein is P62195; then the reference id (RefSeq Protein) for this particular protein is NP_001186092.1. I have to do this conversion for a large number of proteins. One way is to use the ID mapping tool on the UniProt server manually, but I want to automate this with a script. Please visit http://www.uniprot.org/uploadlists/ and set the options as follows: identifier = P62195, from = UniProtKB AC/ID, to = RefSeq Protein, then click Go.
Now the problem is that I want to do this with a script, so how can I do it?