Question

Id conversion using uniprot script on mac terminal

1

7.9 years ago

deepakshar211 ▴ 10

How can I get the reference ID of a protein from its UniProt ID using the UniProt tool in the Mac terminal?

sequence software error R • 2.2k views

ADD COMMENT • link updated 7.9 years ago by ddiez ★ 2.0k • written 7.9 years ago by deepakshar211 ▴ 10

0

A: get the uniprot accession from ensembl protein ID

ADD REPLY • link 7.9 years ago by shenwei356 8.7k

0

Hi Wei, I can use this server, but I have to do this for a large number of UniProt IDs, so I need a script (in any programming language) that converts them all at once. If you could help me out with this, that would be of great help.

Thanks

ADD REPLY • link 7.9 years ago by deepakshar211 ▴ 10

0

What is a reference ID? Can you give an example of the kind of conversion you are looking for?

ADD REPLY • link 7.9 years ago by ddiez ★ 2.0k

0

Let's say the UniProt ID for a protein is P62195; then the reference ID (RefSeq Protein) for this protein is NP_001186092.1. I have to do this conversion for a large number of proteins. One way is to use the ID mapping tool on the UniProt server manually, but I want to automate this with a script. Please visit http://www.uniprot.org/uploadlists/ and set the options as follows: identifier = P62195, from = UniProtKB AC/ID, to = RefSeq Protein, then click Go.

Now the problem is that I want to do this process with a script, so how can I do it?

ADD REPLY • link 7.9 years ago by deepakshar211 ▴ 10

score 0 · Answer 1 · 2017-02-08

There are several ways to do this with R and some Bioconductor packages. Here I show three methods: annotation packages, the biomaRt package and the UniProt.ws package. In each case you need to specify the target species, which makes this less convenient if you have multi-species mapping in mind.

Annotation package approach

Here I use the annotation package org.Hs.eg.db and the interface provided by AnnotationDbi. This method returns all associated RefSeq IDs together, including peptide and nucleotide IDs, which may or may not be what you want. Advantage: no access to an online server is needed.

id <- "P62195" # vector of ids to map.

library(org.Hs.eg.db)
# columns(org.Hs.eg.db) # check other columns to be returned.
# keytypes(org.Hs.eg.db) # check other keys for query.

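# Named arguments make the direction of the mapping explicit (UNIPROT -> REFSEQ).
select(org.Hs.eg.db, keys = id, columns = "REFSEQ", keytype = "UNIPROT")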
# 'select()' returned 1:many mapping between keys and columns
# UNIPROT       REFSEQ
# 1  P62195 NM_001199163
# 2  P62195    NM_002805
# 3  P62195 NP_001186092
# 4  P62195    NP_002796
# 5  P62195    XR_934508
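
Since this method returns nucleotide and peptide IDs together, one possible follow-up is to filter the result by accession prefix. The sketch below is only an illustration in base R: `res` is a stand-in data frame copied from the output above (not a live query), and it assumes that RefSeq protein accessions start with `NP_` (curated) or `XP_` (predicted).

```r
# Stand-in for the select() result shown above (values copied from that output).
res <- data.frame(
  UNIPROT = rep("P62195", 5),
  REFSEQ = c("NM_001199163", "NM_002805", "NP_001186092", "NP_002796", "XR_934508"),
  stringsAsFactors = FALSE
)

# Keep only rows whose accession carries a protein prefix (NP_ or XP_).
protein_only <- res[grepl("^[NX]P_", res$REFSEQ), ]
print(protein_only)
```

With a real query, `res` would simply be the data frame returned by `select()`.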

biomaRt package approach

The biomaRt package connects to the BioMart resource at Ensembl and runs queries based on different filters. You can retrieve nucleotide and peptide IDs separately, if desired. Also check the other attributes that can be returned. Advantage: access to the latest information in Ensembl.

library(biomaRt)
mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
# listAttributes(mart) # check other mappings.
# listFilters(mart) # check other filters.

getBM(
  attributes = c("refseq_peptide", "external_gene_name"),
  filters = "uniprot_swissprot",
  values = id,
  mart = mart
)
# refseq_peptide external_gene_name
# 1      NP_002796              PSMC5
# 2                             PSMC5
# 3   NP_001186092              PSMC5

getBM(
  attributes = c("refseq_mrna", "external_gene_name"),
  filters = "uniprot_swissprot",
  values = id,
  mart = mart
)
# refseq_mrna external_gene_name
# 1    NM_002805              PSMC5
# 2                           PSMC5
# 3 NM_001199163              PSMC5
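
The outputs above contain a row with an empty RefSeq field. If those rows are not wanted, they could be dropped after the query. This is a minimal base R sketch in which `bm` is a stand-in data frame copied from the first output above rather than a live `getBM()` result.

```r
# Stand-in for the getBM() result shown above (values copied from that output).
bm <- data.frame(
  refseq_peptide = c("NP_002796", "", "NP_001186092"),
  external_gene_name = rep("PSMC5", 3),
  stringsAsFactors = FALSE
)

# Drop rows where the RefSeq peptide field is missing or an empty string.
bm_clean <- bm[!is.na(bm$refseq_peptide) & nzchar(bm$refseq_peptide), ]
print(bm_clean)
```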

UniProt.ws package approach

UniProt.ws sits somewhere between the other two, in the sense that you first establish a connection and then make queries with the select() interface. Advantage: it seems the most natural way to make queries about UniProt IDs.

library(UniProt.ws)
up <- UniProt.ws(taxId=9606) # taxonomy ID for Homo sapiens.
# columns(up) # check other columns to be returned.
# keytypes(up) # check other keys for query.

select(up, id, columns = "REFSEQ_PROTEIN", keytype = "UNIPROTKB")
# Getting mapping data for P62195 ... and P_REFSEQ_AC
# 'select()' returned 1:many mapping between keys and columns
# UNIPROTKB REFSEQ_PROTEIN
# 1    P62195 NP_001186092.1
# 2    P62195    NP_002796.4

select(up, id, columns = c("REFSEQ_NUCLEOTIDE"), keytype = "UNIPROTKB")
# Getting mapping data for P62195 ... and REFSEQ_NT_ID
# 'select()' returned 1:many mapping between keys and columns
# UNIPROTKB REFSEQ_NUCLEOTIDE
# 1    P62195    NM_001199163.1
# 2    P62195       NM_002805.5
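
For the original use case of converting many IDs at once, the same select() call can take a vector of IDs read from a file. The sketch below shows only the surrounding bookkeeping in base R, under stated assumptions: the input file (one accession per line) and the output file are hypothetical temporary files, and `map` is a stand-in table copied from the output above, used in place of a live query.

```r
# Hypothetical input file with one UniProt accession per line
# (includes a blank line and a duplicate to show the clean-up).
id_file <- tempfile(fileext = ".txt")
writeLines(c("P62195", "", "P62195 "), id_file)

# Read the IDs, strip whitespace, drop empty lines and duplicates.
ids <- trimws(readLines(id_file))
ids <- unique(ids[nzchar(ids)])

# With a live connection the mapping would come from, for example:
# map <- select(up, ids, columns = "REFSEQ_PROTEIN", keytype = "UNIPROTKB")
# Stand-in table copied from the output above:
map <- data.frame(
  UNIPROTKB = c("P62195", "P62195"),
  REFSEQ_PROTEIN = c("NP_001186092.1", "NP_002796.4"),
  stringsAsFactors = FALSE
)

# Collapse the 1:many mapping to one row per UniProt ID.
collapsed <- aggregate(
  REFSEQ_PROTEIN ~ UNIPROTKB,
  data = map,
  FUN = function(x) paste(x, collapse = ";")
)

# Write the result as a tab-separated file (hypothetical output path).
out_file <- tempfile(fileext = ".tsv")
write.table(collapsed, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
print(collapsed)
```

Collapsing with a separator is just one choice; keeping one row per pair may be more convenient if the table is later joined to other data.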