Access BioMart Homologs from Python?
2.2 years ago
ngarber ▴ 60

From within Python, I want to be able to query BioMart to return a list containing information about genes and their homologs:

  1. Source species - Stable Protein ID
  2. Source species - Gene name
  3. Source species - Protein sequence
  4. Target species - Stable Protein ID of homolog
  5. Target species - Gene name of homolog
  6. Target species - Protein sequence of homolog

For example, say I were to input:

dataset = "Ensembl Genes 107"
target_species_dataset = "Elephant genes (Loxafr3.0)"
homolog_query = "Human"

How do I feed that into BioMart so that it spits out the six parameters I listed earlier?

Thanks so much in advance if anyone can help!

sequences Python biomaRt homology Ensembl • 4.2k views

On top of Arup Ghosh's answer, you can also consider using the files available on the Ensembl FTP site:

Say we want all orthologous gene pairs between Human and Cow from the default Vertebrate ncRNA-trees. We could download the entire set of default Vertebrate ncRNA-trees homologies in one TSV file. For Ensembl 107 this would be located at:

http://ftp.ensembl.org/pub/release-107/tsv/ensembl-compara/homologies/Compara.107.ncrna_default.homologies.tsv.gz

This is a pretty massive file (3.2 GB), but if we filter it to keep only the rows in which 'homology_type' is an orthology (i.e. 'ortholog_one2one', 'ortholog_one2many' or 'ortholog_many2many') and 'species' and 'homology_species' are 'homo_sapiens' and 'bos_taurus' (or vice versa), we get a reasonably sized file of Human-Cow orthologues.
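A minimal sketch of that filtering step in Python, streaming the gzipped file so it never has to fit in memory. It relies only on the three column names mentioned above ('homology_type', 'species', 'homology_species'); the function name, the default species pair and the output path are my own placeholders, so adjust them to your case.

```python
import csv
import gzip

# Orthology types named above; paralogue rows are skipped.
ORTHOLOGY_TYPES = {"ortholog_one2one", "ortholog_one2many", "ortholog_many2many"}


def filter_orthologs(in_path, out_path,
                     species_a="homo_sapiens", species_b="bos_taurus"):
    """Copy orthologue rows for one species pair to a plain TSV.

    Returns the number of rows kept. The pair is matched in either
    direction (A->B or B->A).
    """
    wanted = {species_a, species_b}
    kept = 0
    with gzip.open(in_path, "rt", newline="") as src, \
            open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        for row in reader:
            is_orthology = row["homology_type"] in ORTHOLOGY_TYPES
            is_pair = {row["species"], row["homology_species"]} == wanted
            if is_orthology and is_pair:
                writer.writerow(row)
                kept += 1
    return kept
```

For example, `filter_orthologs("Compara.107.ncrna_default.homologies.tsv.gz", "human_cow_orthologs.tsv")` would write the Human-Cow subset next to the downloaded file.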

You can also use the language-agnostic Ensembl REST API to retrieve orthologue data programmatically using the homology endpoints. For example: http://rest.ensembl.org/documentation/info/homology_ensemblgene
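Here is a rough sketch of what a REST call could look like from Python, using only the standard library. Some assumptions on my side: I use the symbol-based variant of the homology endpoint (`homology/symbol/:species/:symbol`) rather than the ID-based one linked above, and the response field names ('data', 'homologies', 'source', 'target', 'protein_id', 'align_seq') are what I recall from the full-format JSON, so please check them against the endpoint documentation before relying on this. The function names are made up for illustration. Note that, as far as I know, the homology output carries stable IDs and sequences but not gene names, so those would need a separate lookup.

```python
import json
import urllib.parse
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def build_homology_url(symbol, source_species="human",
                       target_species="loxodonta_africana"):
    """Build the URL for an orthologue query by gene symbol."""
    query = urllib.parse.urlencode({
        "target_species": target_species,
        "type": "orthologues",
        "sequence": "protein",
    })
    path = f"/homology/symbol/{source_species}/{urllib.parse.quote(symbol)}"
    return f"{REST_SERVER}{path}?{query}"


def parse_homologies(payload):
    """Flatten the (assumed) JSON layout into one dict per homology pair."""
    rows = []
    for entry in payload.get("data", []):
        for hom in entry.get("homologies", []):
            src = hom.get("source", {})
            tgt = hom.get("target", {})
            rows.append({
                "type": hom.get("type"),
                "source_gene_id": src.get("id"),
                "source_protein_id": src.get("protein_id"),
                # Aligned sequences contain gap characters; strip them.
                "source_sequence": (src.get("align_seq") or "").replace("-", ""),
                "target_gene_id": tgt.get("id"),
                "target_protein_id": tgt.get("protein_id"),
                "target_sequence": (tgt.get("align_seq") or "").replace("-", ""),
            })
    return rows


def fetch_orthologs(symbol, **kwargs):
    """Query the REST server (needs network access) and parse the reply."""
    request = urllib.request.Request(
        build_homology_url(symbol, **kwargs),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_homologies(json.load(response))
```

Something like `fetch_orthologs("BRCA2")` would then return a list of dicts, one per Human-Elephant orthologue pair, assuming the server answers in the layout sketched above.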


Both tools are outdated and poorly documented, with few examples of the expected input or of how to process the output.

2.2 years ago

You can also pass gene symbols to gget search with your target species and then use gget info on the returned Ensembl IDs to get the other information: https://github.com/pachterlab/gget
