Question

Ensembl Homology REST requests - possible to download whole database and query locally?

0

2.5 years ago

ngarber ▴ 60

I'm querying the Ensembl Homology REST database (https://rest.ensembl.org/homology/id/) with a list of genes to get their homologs, but my list of IDs is long, so this takes quite a while. I'm doing the requests in Python, which is the only language I work in. I know there is a Perl API, but I have no idea how to use it.

Is there a way to download all entries in the Ensembl Homology REST database and then query them locally?

Here is my code as it currently stands, which requests one entry at a time, since I believe REST can't accept requests for multiple genes (but please correct me if I'm wrong). Hopefully there is a way to do this locally...

import requests
import pandas as pd
import time

gene_id_list = data_df[ensembl_gene_col].tolist() #data_df is generated elsewhere and contains a list of genes and data
gene_id_list = list(dict.fromkeys(gene_id_list)) #removes duplicates

rest_server = "https://rest.ensembl.org"
rest_ext = "/homology/id/"
rest_suffix = "?"

gene_homologies_dict = {}
for i, gene_id in enumerate(gene_id_list, start=1): #start=1 so the progress counter reads "1 of N"
    if gene_id != "None": #skip entries recorded as the string "None"
        print("Retrieving homology data for", gene_id, "(" + str(i) + " of " + str(len(gene_id_list)) + ")")
        query_url = rest_server + rest_ext + gene_id + rest_suffix
        response = requests.get(query_url, headers = {"Content-Type" : "application/json"})
        response.raise_for_status() #raises only on 4xx/5xx responses, so no separate response.ok check is needed
        decoded = response.json()

        decoded_data = decoded.get("data", []) #default to an empty list so len() cannot fail on a missing key
        if len(decoded_data) == 0:
            homologies = []
        elif len(decoded_data) == 1:
            homologies = decoded_data[0].get("homologies", [])
        else:
            raise Exception("For " + gene_id + " in gene_id_list, decoded_data length was " + str(len(decoded_data)) + " (expected: 1)")

        print("\t... retrieved! Data length:", len(homologies))

        gene_homologies_dict[gene_id] = homologies
        time.sleep(0.2) #brief pause between requests
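
If the per-gene loop above has to stay, one possible refinement is to reuse a single connection and back off when the server answers with HTTP 429 (too many requests). The following is only a sketch under assumptions: the helper name `get_json_with_retry` and the `max_tries` parameter are hypothetical, and it assumes the server sends a `Retry-After` header with its 429 responses.

```python
import time

def get_json_with_retry(session, url, max_tries=3):
    """GET `url` and return the parsed JSON, waiting and retrying on HTTP 429.

    `session` is any object with a requests-style .get(url, headers=...) method,
    for example requests.Session(), which reuses one connection across calls.
    """
    for attempt in range(max_tries):
        response = session.get(url, headers={"Content-Type": "application/json"})
        if response.status_code == 429:
            # Rate limited: wait as long as the server asks (assumed header), then retry.
            wait_seconds = float(response.headers.get("Retry-After", 1))
            time.sleep(wait_seconds)
            continue
        response.raise_for_status()  # any other 4xx/5xx is treated as a real error
        return response.json()
    raise RuntimeError("Still rate limited after " + str(max_tries) + " tries: " + url)
```

In the loop, this would replace the `requests.get(...)` and `response.json()` lines, with `session = requests.Session()` created once before the loop. It does not remove the one-request-per-gene cost; it only avoids reconnecting each time and failing outright on a rate-limit response.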

python REST homology biomart ensembl • 1.0k views

updated 2.5 years ago by Ben Moore ★ 2.4k • written 2.5 years ago by ngarber ▴ 60

1

possible to download whole database and query locally?

http://ftp.ensembl.org/pub/current_compara/

2.5 years ago by Pierre Lindenbaum 165k

0

So for looking at homologs of human proteins, do I want the following file?

http://ftp.ensembl.org/pub/current_compara/conservation_scores/91_mammals.gerp_conservation_score/gerp_conservation_scores.homo_sapiens.GRCh38.bw

And if so, what do I do with a bigWig file? I've never worked with those before. Don't they just contain genomic data? It's protein homologs I want...

2.5 years ago by ngarber ▴ 60

0

Homologies can be found in the following directory on the Ensembl FTP: http://ftp.ensembl.org/pub/current_emf/ensembl-compara/homologies/

2.5 years ago by Ben Moore ★ 2.4k
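
For what local querying might look like once homology data from the FTP site is on disk, here is a minimal sketch. It assumes the data is available as (or has been converted to) a tab-separated table with a header row. The column names `gene_stable_id` and `homology_gene_stable_id`, the helper `load_homologs`, and the gene IDs are all assumptions for illustration and are not confirmed by this thread; the files in the directory linked above may be in a different format and need converting first.

```python
import csv
from collections import defaultdict

def load_homologs(tsv_lines, gene_col="gene_stable_id", homolog_col="homology_gene_stable_id"):
    """Build {gene ID: [homolog gene IDs]} from tab-separated lines with a header row.

    The default column names are assumptions; pass the real ones for your file.
    """
    homologs = defaultdict(list)
    for row in csv.DictReader(tsv_lines, delimiter="\t"):
        homologs[row[gene_col]].append(row[homolog_col])
    return dict(homologs)

# Tiny made-up table standing in for a downloaded file (IDs are placeholders).
example_rows = [
    "gene_stable_id\thomology_gene_stable_id",
    "GENE_A\tGENE_X",
    "GENE_A\tGENE_Y",
    "GENE_B\tGENE_Z",
]
homologs_by_gene = load_homologs(example_rows)
print(homologs_by_gene.get("GENE_A", []))  # ['GENE_X', 'GENE_Y']
```

For a real gzipped download, the file object returned by `gzip.open(path, "rt")` can be passed in place of `example_rows`, since `csv.DictReader` accepts any iterable of lines. After the table is loaded once, each lookup is a dictionary access rather than a network request.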