Question

Access BioMart Homologs from Python?

1

2.6 years ago

ngarber ▴ 60

From within Python, I want to be able to query BioMart to return a list containing information about genes and their homologs:

Source species - Stable Protein ID
Source species - Gene name
Source species - Protein sequence
Target species - Stable Protein ID of homolog
Target species - Gene name of homolog
Target species - Protein sequence of homolog

For example, say I were to input:

dataset = "Ensembl Genes 107"
target_species_dataset = "Elephant genes (Loxafr3.0)"
homolog_query = "Human"

How do I feed that into BioMart so that it spits out the six parameters I listed earlier?

Thanks so much in advance if anyone can help!

sequences Python biomaRt homology Ensembl • 4.8k views

ADD COMMENT • link updated 17 months ago by Rockbar • 0 • written 2.6 years ago by ngarber ▴ 60

0

There are Python interfaces to the BioMart API:

https://pypi.org/project/biomart/

https://pypi.org/project/pybiomart/
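A minimal sketch of how the query from the question might look with pybiomart, assuming the package is installed and the Ensembl mart is reachable. The dataset name `hsapiens_gene_ensembl`, the `lafricana` prefix for elephant, and the homolog attribute names are assumptions based on Ensembl's usual naming, so check them against `dataset.attributes` before relying on them. As far as I know, BioMart serves sequences from a separate attribute page, so the two protein sequences would likely need a second query.

```python
def homolog_attributes(target_prefix):
    """Build the attribute list for a source -> target homolog query.

    The names below are assumptions that follow Ensembl's usual
    '<species>_homolog_*' pattern; verify them against dataset.attributes.
    """
    return [
        "ensembl_peptide_id",                             # source protein ID
        "external_gene_name",                             # source gene name
        f"{target_prefix}_homolog_ensembl_peptide",       # homolog protein ID
        f"{target_prefix}_homolog_associated_gene_name",  # homolog gene name
    ]


def query_homologs(dataset, target_prefix):
    """Run the query on any object exposing pybiomart's Dataset.query()."""
    return dataset.query(
        attributes=homolog_attributes(target_prefix),
        use_attr_names=True,  # keep machine-readable column names
    )


def open_human_dataset():
    """Connect to the human gene mart (needs pybiomart and network access)."""
    from pybiomart import Dataset  # lazy import: helpers above work without it
    return Dataset(name="hsapiens_gene_ensembl", host="http://www.ensembl.org")
```

Something like `df = query_homologs(open_human_dataset(), "lafricana")` should then give a pandas DataFrame with one row per source protein and homolog pair.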

ADD REPLY • link 2.6 years ago by Arup Ghosh 3.3k

0

On top of Arup Ghosh's answer, you can also consider using the files available on the Ensembl FTP site:

Say we want all orthologous gene pairs between Human and Cow from the default Vertebrate ncRNA-trees. We could download the entire set of default Vertebrate ncRNA-trees homologies in one TSV file. For Ensembl 107 this would be located at:

http://ftp.ensembl.org/pub/release-107/tsv/ensembl-compara/homologies/Compara.107.ncrna_default.homologies.tsv.gz

This is a pretty massive file (3.2 GB), but it can be filtered down. Keep only the rows in which 'homology_type' is an orthology (i.e. 'ortholog_one2one', 'ortholog_one2many' or 'ortholog_many2many') and in which 'species' and 'homology_species' are 'homo_sapiens' and 'bos_taurus' (or vice versa). The result is a reasonably sized file of Human-Cow orthologues.
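The filtering step could be sketched roughly as below, using only the standard library. It relies only on the three column names mentioned above ('species', 'homology_type', 'homology_species') and assumes the file is tab-separated with a header row; `filter_orthologs` and `filter_file` are hypothetical helper names, not part of any Ensembl tooling.

```python
import csv
import gzip

ORTHOLOGY_TYPES = {"ortholog_one2one", "ortholog_one2many", "ortholog_many2many"}


def filter_orthologs(lines, species_a="homo_sapiens", species_b="bos_taurus"):
    """Yield rows (as dicts) that are orthologies between the two species."""
    wanted = {species_a, species_b}
    for row in csv.DictReader(lines, delimiter="\t"):
        # Comparing sets accepts the pair in either direction.
        pair = {row["species"], row["homology_species"]}
        if row["homology_type"] in ORTHOLOGY_TYPES and pair == wanted:
            yield row


def filter_file(path, **kwargs):
    """Stream a gzipped homologies TSV from disk without loading it all."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from filter_orthologs(handle, **kwargs)
```

Because both functions are generators, the 3.2 GB file is read line by line rather than held in memory; for the elephant case from the question you would pass a different `species_b`.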

You can also use the language-agnostic Ensembl REST API to retrieve orthologue data programmatically via the homology endpoints, e.g. http://rest.ensembl.org/documentation/info/homology_ensemblgene
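A rough sketch of calling the REST API from Python with the standard library. The endpoint path (`/homology/symbol/:species/:symbol`) and the `target_species`, `type` and `sequence` parameters are written as I recall them from the REST documentation, so treat them as assumptions and check the documentation page linked above; `BRCA2` in the usage note is just an illustrative gene symbol.

```python
import json
import urllib.parse
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def homology_url(symbol, species="homo_sapiens", target_species="bos_taurus"):
    """Build a homology-by-symbol URL (path and parameters are assumptions)."""
    params = urllib.parse.urlencode({
        "target_species": target_species,
        "type": "orthologues",   # skip paralogues
        "sequence": "protein",   # ask for protein sequences in the response
    })
    path = "/homology/symbol/{}/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(symbol)
    )
    return f"{REST_SERVER}{path}?{params}"


def fetch_homologies(symbol, **kwargs):
    """Request the URL and return the parsed JSON (needs network access)."""
    request = urllib.request.Request(
        homology_url(symbol, **kwargs),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `fetch_homologies("BRCA2", target_species="loxodonta_africana")` would request elephant orthologues of a human gene; inspect the returned JSON to see which fields hold the IDs and sequences.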

ADD REPLY • link 2.6 years ago by Ben Moore ★ 2.4k

0

Both tools are outdated and poorly documented when it comes to examples of input and/or how to process the output.

ADD REPLY • link 17 months ago by Rockbar • 0

score 2 · Answer 1 · 2022-09-22

2

2.5 years ago

Laura Luebbert ▴ 450

You can also pass gene symbols to gget search with your target species and then use gget info on the returned Ensembl IDs to get the other information: https://github.com/pachterlab/gget
