Using the Ensembl API to get a gene ID from a protein ID
4
1
9.3 years ago
Joseph Hughes ★ 3.0k

I am using a Homology adaptor:

my $homology_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi', 'compara', 'Homology');
my $homologies = $homology_adaptor->fetch_all_by_Member($gene_member);

When I loop through the results:

foreach my $homology (@{$homologies}) {
   my $member = ${$homology->get_all_Members}[1];

the member does not always have a description:

$member->description

but it does return a protein ID. What is the best way to get the gene ID from the protein ID using the Ensembl API?

ensembl API • 4.0k views
0

I was trying to do something similar: converting gene IDs to gene symbols. I was doing this for a large number of genes and found the API quite slow. I had much more success downloading a TSV file from the UCSC Genome Browser, provided what you're looking for is in there.

If you choose the "selected fields from primary and related tables" option as the output format in the Table Browser, you should be able to get the information you want as a text file, which you can either read into memory or load into MySQL or a similar database.

Hope this helps!

-Kyle

2
2.5 years ago

This takes one line of code with gget info, which also returns the parent gene:

pip install gget, then simply:

# Command-line
gget info ENSP00000354687
# Python
import gget
gget.info(["ENSP00000354687"])
1
9.3 years ago
Emily 24k

You could use the protein ID to get a translation object, then use $translation->transcript->get_Gene->stable_id (not ->gene, as first posted) to get the gene stable ID.

0

Hi Emily, that would actually be good if I could get it to work. I am running API version 81 and get the following error:

Can't locate object method "gene" via package "Bio::EnsEMBL::Transcript"
1

Sorry, it's Get_gene, not gene. This is what happens when I write stuff from memory rather than check it.

0

For the translation adaptor, is there a way I can specify the genome? In the examples I have seen, it is usually given by the common name, e.g.:

my $translation_adaptor = $reg->get_adaptor('human', 'Core', 'Translation');

But it would be easier if I could use the genome name (e.g. homo_sapiens). Or is there an easy way to get from common name to species name?

0

Either works: the registry accepts both the common name ('human') and the species name ('homo_sapiens').

0

Hi Emily,

I'm still getting:

Can't locate object method "Get_gene" via package "Bio::EnsEMBL::Transcript"
0

The capitalisation is wrong. It should be get_Gene.

Works a treat now.

Thanks

1
9.3 years ago
Joseph Hughes ★ 3.0k

Following Emily's suggestion, I used the following to get the gene ID:

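# $reg is the registry, e.g. 'Bio::EnsEMBL::Registry'
my @members = @{$homology->get_all_Members()};
foreach my $this_member (@members) {
   # Core Translation adaptor for this member's species (e.g. homo_sapiens)
   my $translation_adaptor = $reg->get_adaptor($this_member->genome_db->name,'Core','Translation');
   # The member's stable ID is the protein (translation) ID
   my $translation = $translation_adaptor->fetch_by_stable_id($this_member->stable_id);
   next unless defined $translation;   # skip IDs not found in the core database
   print $translation->transcript->get_Gene->stable_id,"\n";
}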
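
For anyone without the Perl API installed, the same translation -> transcript -> gene walk can be sketched in Python against the Ensembl REST service. This is only a minimal sketch under assumptions: it assumes the /lookup/id endpoint at rest.ensembl.org returns a "Parent" field for translation and transcript records, and the names fetch_lookup and gene_id_for_protein are made up for illustration.

```python
import json
import urllib.request

# Assumption: public Ensembl REST server; not mentioned in the thread above.
REST_SERVER = "https://rest.ensembl.org"


def fetch_lookup(stable_id):
    """Fetch the /lookup/id record for one stable ID as a dict (needs network access)."""
    url = f"{REST_SERVER}/lookup/id/{stable_id}?content-type=application/json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def gene_id_for_protein(protein_id, lookup=fetch_lookup):
    """Walk translation -> transcript -> gene using the assumed 'Parent' field."""
    translation = lookup(protein_id)            # record for the ENSP... ID
    transcript = lookup(translation["Parent"])  # record for its parent transcript
    return transcript["Parent"]                 # parent of the transcript is the gene
```

Calling gene_id_for_protein("ENSP00000354687") makes two HTTP requests. The lookup argument is there so a cached or batched lookup function can be swapped in when converting many IDs.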
0
9.3 years ago
Kamil ★ 2.3k

I'm not sure if I understand your question, but I would skip BioPerl and use Ensembl BioMart.

(Also see the biomaRt Bioconductor package to run the same queries programmatically from R.)

Ensembl Gene ID      Ensembl Protein ID
ENSG00000198888      ENSP00000354687
ENSG00000198763      ENSP00000355046
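
Once a table like this is exported as tab-separated text, turning it into a protein-to-gene lookup takes only a few lines. The sketch below uses Python and the two rows shown above; it assumes the export keeps the same column headers as the table, which may differ between BioMart releases, and protein_to_gene is just an illustrative name.

```python
import csv
import io

# Sample rows copied from the table above; a real export would be read from a file.
BIOMART_TSV = (
    "Ensembl Gene ID\tEnsembl Protein ID\n"
    "ENSG00000198888\tENSP00000354687\n"
    "ENSG00000198763\tENSP00000355046\n"
)


def protein_to_gene(handle):
    """Build a {protein ID: gene ID} dict from a tab-separated BioMart export."""
    reader = csv.DictReader(handle, delimiter="\t")
    mapping = {}
    for row in reader:
        protein_id = row["Ensembl Protein ID"]
        if protein_id:  # skip rows where the protein field is empty
            mapping[protein_id] = row["Ensembl Gene ID"]
    return mapping


lookup = protein_to_gene(io.StringIO(BIOMART_TSV))
print(lookup["ENSP00000354687"])  # ENSG00000198888
```

For a real export, pass an open file handle instead of the io.StringIO wrapper.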