I have a list of ~700 Ensembl IDs. I need to extract the protein structure for each Ensembl ID. How can I do this?
Is there a script (Python, R) that can download the structures from a website (like RCSB)?
The R package biomaRt may help you map the Ensembl IDs to RefSeq or PDB IDs. Below is an example linking gene names to Ensembl and RefSeq IDs, where gene.name is a vector provided by you. The input values and the output attributes can easily be changed according to what you have and what you want.
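# assumes `ensembl` is a Mart object (created with useEnsembl(), as below) and gene.name is your vector of gene symbols
ensembl.match <- getBM(attributes = c("refseq_mrna", "ensembl_gene_id", "ensembl_transcript_id", "external_gene_name"), filters = "external_gene_name", values = gene.name, mart = ensembl)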
The point is that you can use getBM() to match each ENSG ID to its PDB IDs, and once you have the PDB IDs you can retrieve the structures. Here I do the first part for you; I recommend the bio3d package for part two.
Edit: no loop
library(biomaRt)
# connect to the human Ensembl gene mart (ENSG IDs are human gene IDs)
ensembl <- useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
ENSG.list <- c("ENSG00000088305", "ENSG00000119772", "ENSG00000116030", "ENSG00000116717",
               "ENSG00000033327", "ENSG00000182492", "ENSG00000134352", "ENSG00000065609")
# list all the attributes you can retrieve
listAttributes(ensembl)
# one query for the whole vector of IDs (no loop)
ensembl_matches <- getBM(attributes = c("pdb",
                                        "refseq_mrna",
                                        "external_gene_name",
                                        "ensembl_transcript_id",
                                        "ensembl_gene_id"),
                         filters = "ensembl_gene_id",
                         values = ENSG.list,
                         mart = ensembl)
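getBM() returns one row per combination of the requested attributes, so a gene with several transcripts or RefSeq IDs appears many times, and a gene without a PDB cross-reference comes back with an empty (or NA) pdb field. A minimal sketch for reducing such a result to unique gene/PDB pairs, assuming a data.frame with the ensembl_gene_id and pdb columns requested above. pdb_per_gene() is a hypothetical helper, and the toy rows use placeholder values, not real mappings:

```r
# Hypothetical helper: collapse a getBM() result to unique gene/PDB pairs.
# Assumes columns named "ensembl_gene_id" and "pdb", as requested above;
# a missing PDB cross-reference may appear as "" or NA.
pdb_per_gene <- function(matches) {
  pairs <- matches[, c("ensembl_gene_id", "pdb")]
  has_pdb <- !is.na(pairs$pdb) & nzchar(pairs$pdb)
  pairs <- unique(pairs[has_pdb, ])
  rownames(pairs) <- NULL
  pairs
}

# Toy input with placeholder values (not real mappings), mimicking the
# repeated rows getBM() produces when several transcripts share a gene.
toy <- data.frame(
  pdb = c("pdbA", "pdbA", "", NA, "pdbB"),
  refseq_mrna = c("nm1", "nm2", "nm3", "nm4", "nm5"),
  ensembl_gene_id = c("geneA", "geneA", "geneB", "geneC", "geneA"),
  stringsAsFactors = FALSE
)

result <- pdb_per_gene(toy)
print(result)

# Genes in the input with no PDB cross-reference at all
no_structure <- setdiff(unique(toy$ensembl_gene_id), result$ensembl_gene_id)
print(no_structure)
```

With a real list of ~700 IDs, checking how many genes end up in no_structure before downloading anything gives a quick idea of how much of the list actually has PDB entries in Ensembl.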
A nice answer, but I just want to point out that biomaRt will be really slow if you call it in a loop for a large number of Ensembl IDs. It works much better if you pass the whole vector of IDs to the values argument, e.g.:
ensembl <- useEnsembl('ensembl', dataset = 'hsapiens_gene_ensembl')
ensembl_matches <- getBM(attributes = c("pdb",
                                        "refseq_mrna",
                                        "external_gene_name",
                                        "ensembl_transcript_id",
                                        "ensembl_gene_id"),
                         filters = "ensembl_gene_id",
                         values = ENSG.list,
                         mart = ensembl)
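For part two, the answer above recommends bio3d, whose get.pdb() function downloads structures for a vector of PDB IDs. If you would rather use base R only, a minimal sketch is to build the RCSB download URLs yourself. This assumes the https://files.rcsb.org/download/<ID>.pdb URL pattern (very large entries are only distributed as mmCIF, hence the "cif" option) and that pdb_ids is a character vector of PDB IDs, e.g. unique non-empty values of ensembl_matches$pdb. Both helper names are hypothetical:

```r
# Hypothetical helpers (base R only) for fetching structures from RCSB.
# Assumes the URL pattern https://files.rcsb.org/download/<ID>.<ext>.
rcsb_urls <- function(pdb_ids, format = c("pdb", "cif")) {
  format <- match.arg(format)
  # drop empty/NA IDs and duplicates; RCSB IDs are case-insensitive
  ids <- unique(toupper(pdb_ids[!is.na(pdb_ids) & nzchar(pdb_ids)]))
  stats::setNames(sprintf("https://files.rcsb.org/download/%s.%s", ids, format), ids)
}

download_structures <- function(pdb_ids, dest_dir = "structures",
                                format = c("pdb", "cif")) {
  format <- match.arg(format)
  urls <- rcsb_urls(pdb_ids, format)
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  ok <- vapply(names(urls), function(id) {
    dest <- file.path(dest_dir, paste0(id, ".", format))
    # tryCatch so one unavailable entry does not stop the whole batch
    status <- tryCatch(utils::download.file(urls[[id]], dest, quiet = TRUE),
                       error = function(e) 1L)
    status == 0
  }, logical(1))
  ok  # named logical vector: TRUE where the download succeeded
}

# Usage (needs network access):
# ok <- download_structures(unique(ensembl_matches$pdb))
# names(ok)[!ok]  # IDs that failed, e.g. retry them with format = "cif"
```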
Hello pratik24111991!
It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/10412/download-protein-structure-from-gene-name
This is typically not recommended as it runs the risk of annoying people in both communities.
Also, it looks like you created two posts each, both here and on Bioinformatics SE. That is bad etiquette on multiple levels.