From ensembl id to protein structure
5.2 years ago

I have a list of ~700 Ensembl IDs. I need to retrieve the protein structure for each Ensembl ID. How can I do this?

Is there a script (Python or R) that can download the structures from a website such as RCSB?


Hello pratik24111991!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/10412/download-protein-structure-from-gene-name

This is typically not recommended as it runs the risk of annoying people in both communities.

Also, it looks like you created two posts each, both here and on Bioinformatics SE. That is bad etiquette on multiple levels.

5.2 years ago
noodle ▴ 590

The R package biomaRt may help you connect the Ensembl ID to a RefSeq or PDB ID. Below is an example linking a gene name to an Ensembl ID and a RefSeq ID, where gene.name is provided by you. The input values and the output attributes can easily be changed according to what you have and what you want.

library(biomaRt)
ensembl <- useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl")

ensembl.match <- getBM(attributes = c("refseq_mrna", "ensembl_gene_id",
                                      "ensembl_transcript_id", "external_gene_name"),
                       filters = "external_gene_name",
                       values = gene.name,
                       mart = ensembl)

Thanks @joe. I don't have a list of gene names, though; I have a list of Ensembl IDs like:

ENSG00000088305
ENSG00000119772
ENSG00000116030
ENSG00000116717
ENSG00000033327
ENSG00000182492
ENSG00000134352
ENSG00000065609

The point is that you can use getBM() to match each ENSG ID to its PDB ID, and once you have the PDB ID you can retrieve the structure information. Here I do the first part for you. I recommend the bio3d package for part two.

Edit: no loop

library(biomaRt)

ENSG.list <- c("ENSG00000088305", "ENSG00000119772", "ENSG00000116030", "ENSG00000116717",
               "ENSG00000033327", "ENSG00000182492", "ENSG00000134352", "ENSG00000065609")

# Connect to the human Ensembl gene dataset
ensembl <- useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl")

# This lists all the attributes you can retrieve
listAttributes(ensembl)

ensembl_matches <- getBM(attributes = c("pdb",
                                        "refseq_mrna", 
                                        "external_gene_name",
                                        "ensembl_transcript_id", 
                                        "ensembl_gene_id"),
                        filters = "ensembl_gene_id",
                        values = ENSG.list, 
                        mart = ensembl)
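
For part two, here is a minimal base R sketch of how the download step might look once the PDB IDs are in hand. The helper names (clean_pdb_ids, rcsb_url, download_pdb_files) and the pdb_files directory are hypothetical and not part of biomaRt or bio3d. It assumes the pdb column holds plain four-character PDB IDs and that each entry is available in legacy PDB format from the RCSB file server, which may not hold for very large structures.

```r
# Hypothetical helpers, written for illustration only.

# Keep unique, non-empty PDB IDs in upper case.
clean_pdb_ids <- function(x) {
  x <- as.character(x)
  x <- x[!is.na(x)]
  x <- toupper(trimws(x))
  unique(x[nzchar(x)])
}

# Build the RCSB download URL for each ID (legacy PDB format assumed).
rcsb_url <- function(ids) {
  paste0("https://files.rcsb.org/download/", ids, ".pdb")
}

# Download each structure into `dir`, skipping entries that fail.
download_pdb_files <- function(ids, dir = "pdb_files") {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(dir, paste0(ids, ".pdb"))
  for (i in seq_along(ids)) {
    tryCatch(
      download.file(rcsb_url(ids[i]), dest[i], quiet = TRUE),
      error = function(e) message("Could not download ", ids[i])
    )
  }
  invisible(dest)
}

# Usage (needs network access; assumes ensembl_matches from the query above):
# ids <- clean_pdb_ids(ensembl_matches$pdb)
# download_pdb_files(ids)
```

The bio3d package offers its own download function, get.pdb(), which could replace the loop here if you prefer to stay within that package.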

A nice answer, but I just wanted to point out that biomaRt will be really slow if you do this in a loop for a large number of Ensembl IDs. It will work much better if you pass the vector of IDs to the values argument e.g.

ensembl <- useEnsembl('ensembl', dataset = 'hsapiens_gene_ensembl')

ensembl_matches <- getBM(attributes = c("pdb",
                                        "refseq_mrna", 
                                        "external_gene_name",
                                        "ensembl_transcript_id", 
                                        "ensembl_gene_id"),
                        filters = "ensembl_gene_id",
                        values = ENSG.list, 
                        mart = ensembl)
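
Because this query asks for transcript-level attributes alongside pdb, the same PDB ID will typically appear on several rows, and genes without a known structure are assumed here to come back with an empty or NA pdb value. A rough sketch of how the result might be collapsed to unique gene/structure pairs, using a made-up toy data frame in place of the real getBM() output:

```r
# Toy stand-in for the data frame returned by getBM(); all values are made up.
ensembl_matches <- data.frame(
  ensembl_gene_id       = c("ENSG_A", "ENSG_A", "ENSG_A", "ENSG_B"),
  ensembl_transcript_id = c("ENST_1", "ENST_2", "ENST_1", "ENST_3"),
  pdb                   = c("1ABC", "1ABC", "2XYZ", ""),
  stringsAsFactors = FALSE
)

# Rows that actually carry a PDB ID
has_pdb <- !is.na(ensembl_matches$pdb) & nzchar(ensembl_matches$pdb)

# One row per unique gene/structure pair
gene_pdb <- unique(ensembl_matches[has_pdb, c("ensembl_gene_id", "pdb")])

# Genes present in the result but without any structure
genes_without_pdb <- setdiff(unique(ensembl_matches$ensembl_gene_id),
                             gene_pdb$ensembl_gene_id)

print(gene_pdb)
print(genes_without_pdb)
```

With ~700 input IDs, it is worth checking genes_without_pdb before starting any downloads, since many genes may have no experimentally determined structure.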

Ah, right. Good catch!
