I am working with STRING protein interaction data where each node is an Ensembl protein ID, as follows:
ENSP00000272298
ENSP00000253401
ENSP00000401445
ENSP00000418915
ENSP00000327801
ENSP00000466298
ENSP00000232564
ENSP00000393379
ENSP00000371253
ENSP00000373713
I wonder how I can convert them into gene symbols in Python. If you don't recommend that, how can I convert my gene symbols into Ensembl protein IDs instead?
I know I can use mygene in Python:
import mygene
mg = mygene.MyGeneInfo()
and convert gene symbols into Ensembl gene IDs, but I wonder how I can tweak this to get protein IDs instead:
result = mg.query("APOE", scopes='symbol', fields=['ensembl'], species="human")
for hit in result["hits"]:
    if "ensembl" in hit and "gene" in hit["ensembl"]:
        print(hit["ensembl"]["gene"])
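One possible tweak, sketched here under assumptions rather than as a confirmed answer: mygene.info appears to expose the protein IDs under the `ensembl.protein` field, so the same query can request that field instead. The helper names `protein_ids_from_hit` and `symbol_to_protein_ids` are hypothetical, and the `symbol:` query prefix is used in place of `scopes`. The `ensembl` entry can come back as either a dict or a list of dicts, and `protein` as either a string or a list, so the sketch normalises both.

```python
def protein_ids_from_hit(hit):
    """Collect Ensembl protein IDs from one mygene hit (hypothetical helper).

    "ensembl" may be a single dict or a list of dicts, and "protein"
    may be a single string or a list of strings; both shapes are handled.
    """
    ensembl = hit.get("ensembl", [])
    if isinstance(ensembl, dict):
        ensembl = [ensembl]
    ids = []
    for entry in ensembl:
        proteins = entry.get("protein", [])
        if isinstance(proteins, str):
            proteins = [proteins]
        ids.extend(proteins)
    return ids


def symbol_to_protein_ids(symbol, species="human"):
    """Query mygene for one gene symbol; needs network access and mygene."""
    import mygene  # imported lazily so the helper above works without it

    mg = mygene.MyGeneInfo()
    # Assumption: "ensembl.protein" is the field holding ENSP identifiers.
    result = mg.query("symbol:" + symbol, fields="ensembl.protein", species=species)
    ids = []
    for hit in result.get("hits", []):
        ids.extend(protein_ids_from_hit(hit))
    return ids
```

Calling `symbol_to_protein_ids("APOE")` would then return a flat list of ENSP identifiers for the matching hits, possibly several per gene because each protein-coding transcript has its own protein ID.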
I have also tried BioMart through Python:
from pybiomart import Server
server = Server(host='http://www.ensembl.org')
dataset = (server.marts['ENSEMBL_MART_ENSEMBL']
           .datasets['hsapiens_gene_ensembl'])
dataset.query(attributes=['ensembl_gene_id', 'ensembl_transcript_id', 'ensembl_peptide_id'],
              filters={'ensembl_transcript_id': ['ENSP00000371253']})
However, the error I get is
BiomartException: Unknown filter ensembl_transcript_id, check dataset filters for a list of valid filters.
and as far as I can see there is no other relevant filter. I also couldn't find any examples in Python; the most relevant example I found is in R.
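For the original direction (protein ID to gene symbol), a minimal sketch with mygene might look like the following. It assumes that `ensembl.protein` is accepted as a scope by `querymany`, and the helper names `symbols_from_results` and `protein_ids_to_symbols` are made up for illustration. Results flagged `notfound` are skipped, and a list is kept per query ID in case one ID returns more than one hit.

```python
def symbols_from_results(results):
    """Turn a mygene querymany result list into {query_id: [symbols]}."""
    mapping = {}
    for item in results:
        # Unmatched queries come back with "notfound": True and no symbol.
        if item.get("notfound") or "symbol" not in item:
            continue
        mapping.setdefault(item["query"], []).append(item["symbol"])
    return mapping


def protein_ids_to_symbols(protein_ids, species="human"):
    """Look up gene symbols for ENSP IDs; needs network access and mygene."""
    import mygene  # imported lazily so the helper above works without it

    mg = mygene.MyGeneInfo()
    # Assumption: "ensembl.protein" is a valid scope for ENSP identifiers.
    results = mg.querymany(
        protein_ids, scopes="ensembl.protein", fields="symbol", species=species
    )
    return symbols_from_results(results)
```

Passing the list of node IDs above to `protein_ids_to_symbols` would give a dictionary keyed by protein ID; IDs missing from the dictionary are the ones mygene could not match.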
Have you tried other utilities, such as eutils or biomart?
Just added to the question!
The ensembl_peptide_id that you have used is a valid filter, and so is ensembl_transcript_id (I checked using biomaRt). Unless the Python API is inferior to the R API, I think things should work. You can check using the show_filters() method.
It looks like it is a Python API problem! Thanks for the confirmation. I do get the result in R, but that filter does not exist in the Python API.
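If the filter really is unavailable in pybiomart, one workaround is to skip server-side filtering: download the protein ID and gene name columns without any filter and select the rows locally. The sketch below assumes that `ensembl_peptide_id` and `external_gene_name` are valid attributes for this dataset and that `dataset.list_filters()` shows what the Python client exposes. The function names are hypothetical, and the unfiltered query downloads the full table, so it is slow.

```python
def gene_names_for_proteins(records, protein_ids):
    """Pick gene names for the wanted protein IDs from a list of row dicts."""
    wanted = set(protein_ids)
    mapping = {}
    for row in records:
        pid = row.get("ensembl_peptide_id")
        if pid in wanted:
            mapping[pid] = row.get("external_gene_name")
    return mapping


def fetch_protein_gene_records():
    """Download the protein ID / gene name table; needs network and pybiomart."""
    from pybiomart import Server  # imported lazily; only needed for the download

    server = Server(host="http://www.ensembl.org")
    dataset = (server.marts["ENSEMBL_MART_ENSEMBL"]
               .datasets["hsapiens_gene_ensembl"])
    # dataset.list_filters() can be inspected here to see the exposed filters.
    table = dataset.query(
        attributes=["ensembl_peptide_id", "external_gene_name"],
        use_attr_names=True,  # keep attribute names as column names
    )
    return table.to_dict("records")
```

With this, `gene_names_for_proteins(fetch_protein_gene_records(), ["ENSP00000371253"])` would return a dictionary from protein ID to gene name for whichever of the requested IDs appear in the table.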