UniProt ID, genes
2.0 years ago
Rob ▴ 170

Hi friends,

How can I download the UniProt IDs for protein-coding genes? I have about 20,000 genes, and searching for them one by one is not practical. Is there any code for this?

I have code that retrieves gene names as part of the DESeq2 workflow, but it does not return UniProt IDs.

library(biomaRt)  # getBM(), useMart()
library(dplyr)    # %>%, left_join(), rename()
library(tibble)   # rownames_to_column()

attributeNames <- c("ensembl_gene_id", "external_gene_name", "hgnc_symbol",
                    "chromosome_name", "description", "entrezgene_id")

filterValues <- rownames(result_DESeq2)

Annotations <- getBM(attributes = attributeNames,
                     filters = "ensembl_gene_id",
                     values = filterValues,
                     mart = useMart(biomart = "ensembl",
                                    dataset = "hsapiens_gene_ensembl"))

resAnnotated <- as.data.frame(res) %>%
  rownames_to_column("ensembl_gene_id") %>%
  left_join(Annotations, "ensembl_gene_id") %>%
  dplyr::rename(logFC = log2FoldChange, FDR = padj)

Thanks

ID UniProt RNASeq gene • 2.2k views

You can download UniProt IDs from the UniProt site. Customize the columns you need, or filter further using the options in the left column.

Thank you GenoMax

The workflow uses the biomaRt library (as @Hamid points out below). Once you have created the mart, you can list the attributes it can return with listAttributes(mart). You'll see roughly 3000 attributes you can get back, including UniProt IDs.
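
As a rough sketch of that lookup: listAttributes() returns a data frame with name and description columns, so the UniProt-related attributes can be picked out with a pattern match. The helper name find_attributes is hypothetical, and the toy table below only stands in for the real output, which needs a network connection to Ensembl.

```r
# Hypothetical helper: keep the rows of a listAttributes()-style table whose
# name or description matches a pattern (case-insensitive).
find_attributes <- function(attr_table, pattern) {
  hit <- grepl(pattern, attr_table$name, ignore.case = TRUE) |
    grepl(pattern, attr_table$description, ignore.case = TRUE)
  attr_table[hit, , drop = FALSE]
}

# Toy stand-in for listAttributes(mart); illustrative rows only.
toy_attributes <- data.frame(
  name = c("ensembl_gene_id", "hgnc_symbol",
           "uniprot_gn_id", "uniprotswissprot"),
  description = c("Gene stable ID", "HGNC symbol",
                  "UniProtKB Gene Name ID", "UniProtKB/Swiss-Prot ID"),
  stringsAsFactors = FALSE
)

uniprot_attributes <- find_attributes(toy_attributes, "uniprot")
print(uniprot_attributes$name)

# With a live mart (needs network access), the same call would be:
# library(biomaRt)
# mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# find_attributes(listAttributes(mart), "uniprot")
```

Any attribute name found this way can then be added to the attributes vector passed to getBM().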

Thanks seidel

2.0 years ago

Give the UniProt ID conversion service a try.

Indeed, the UniProt ID mapping/batch retrieval service at https://www.uniprot.org/id-mapping has an option to map from gene names (ideally restricted to an organism) to UniProtKB. Several download formats are available.

In addition, you can get the sequences, GO and KEGG terms, and a long list of other features.

2.0 years ago

I assume you have a list of Ensembl gene IDs (let's call it "ens_gene_ids"). The following should give you the UniProt IDs for the Ensembl IDs that code for proteins and have an entry in UniProt:

library(biomaRt)
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
genes <- getBM(filters = "ensembl_gene_id",
               attributes = c("ensembl_gene_id", "hgnc_symbol", "uniprot_gn_id"),
               values = ens_gene_ids,
               mart = mart)
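
One thing to watch for when joining this back onto a results table: getBM() returns one row per gene/UniProt pair, so a gene with several UniProt entries appears several times, and genes without an entry typically come back with an empty string or NA. Below is a minimal sketch of collapsing the table to one row per gene before a join. The toy table stands in for real getBM() output; its IDs, the helper name collapse_uniprot, and the uniprot_ids column are made up for illustration.

```r
# Hypothetical helper: one row per Ensembl gene, multiple UniProt IDs
# joined with ";". Rows without a UniProt ID are dropped.
collapse_uniprot <- function(genes) {
  has_id <- !is.na(genes$uniprot_gn_id) & genes$uniprot_gn_id != ""
  mapped <- genes[has_id, , drop = FALSE]
  # Group the UniProt IDs by gene; groups come back sorted by gene ID.
  ids_by_gene <- split(mapped$uniprot_gn_id, mapped$ensembl_gene_id,
                       drop = TRUE)
  collapsed <- vapply(ids_by_gene,
                      function(ids) paste(unique(ids), collapse = ";"),
                      character(1))
  data.frame(ensembl_gene_id = names(collapsed),
             uniprot_ids = unname(collapsed),
             stringsAsFactors = FALSE)
}

# Toy stand-in for the `genes` table returned by getBM(); fake IDs.
genes <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_A", "ENSG_B", "ENSG_C"),
  hgnc_symbol     = c("GENEA", "GENEA", "GENEB", "GENEC"),
  uniprot_gn_id   = c("P00001", "Q00002", "P00003", ""),
  stringsAsFactors = FALSE
)

uniprot_per_gene <- collapse_uniprot(genes)
print(uniprot_per_gene)
```

The collapsed table can then be left-joined onto the DESeq2 results by ensembl_gene_id without duplicating result rows. Genes with no UniProt entry end up with NA in uniprot_ids after the join.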

Thank you Hamid, it was helpful.
