I'm querying the Ensembl Homology REST database (https://rest.ensembl.org/homology/id/) with a list of genes to get their homologs, but my list of IDs is pretty long, so this takes quite a while. I'm doing the requests in Python, which is the only language I work in... alas, I know there is a Perl API, but I have no idea how to use it.
Is there a way to download all entries in the Ensembl Homology REST database and then query them locally?
Here is my code as it currently stands, which requests one entry at a time, since I believe REST can't accept requests for multiple genes (but please correct me if I'm wrong). Hopefully there is a way to do this locally...
import requests
import pandas as pd
import time

gene_id_list = data_df[ensembl_gene_col].tolist()  # data_df is generated elsewhere and contains a list of genes and data
gene_id_list = list(dict.fromkeys(gene_id_list))  # removes duplicates, keeps order

rest_server = "https://rest.ensembl.org"
rest_ext = "/homology/id/"
rest_suffix = "?"

gene_homologies_dict = {}

# start=1 so the progress message reads "1 of N" rather than "0 of N"
for i, gene_id in enumerate(gene_id_list, start=1):
    if gene_id != "None":
        print("Retrieving homology data for", gene_id, "(" + str(i) + " of " + str(len(gene_id_list)) + ")")

        query_url = rest_server + rest_ext + gene_id + rest_suffix
        response = requests.get(query_url, headers = {"Content-Type" : "application/json"})
        if not response.ok:
            response.raise_for_status()

        decoded = response.json()
        decoded_data = decoded.get("data", [])  # fall back to an empty list if "data" is missing

        if len(decoded_data) == 0:
            decoded_data = {}
            homologies = []
        elif len(decoded_data) == 1:
            decoded_data = decoded_data[0]
            homologies = decoded_data.get("homologies")
        else:
            raise Exception("For " + gene_id + " in gene_id_list, decoded_data length was " + str(len(decoded_data)) + " (expected: 1)")

        print("\t... retrieved! Data length:", len(homologies))
        gene_homologies_dict[gene_id] = homologies

        time.sleep(0.2)  # pause between requests
http://ftp.ensembl.org/pub/current_compara/
So for looking at homologs of human proteins, do I want the following file?
And if so, what do I do with a bigWig file? I've never worked with those before. Don't they just contain genomic data? It's protein homologs I want...
Homologies can be found in the following directory on the Ensembl FTP: http://ftp.ensembl.org/pub/current_emf/ensembl-compara/homologies/
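Once one of those files is downloaded and decompressed, querying it locally could look roughly like the sketch below. This is only a sketch under assumptions: it assumes the file is tab-separated with a header row, and that it has columns named `gene_stable_id` and `homology_gene_stable_id`, so check the header of the file you actually download and adjust. The function name `load_homologies` and the sample rows are made up for illustration.

```python
import io
import pandas as pd

def load_homologies(tsv_source, gene_ids, gene_col="gene_stable_id", chunksize=100_000):
    """Return the rows of a tab-separated homology dump whose gene_col is in gene_ids.

    gene_col is an ASSUMED column name; check the header of your file.
    """
    wanted = set(gene_ids)
    kept = []
    # Read in chunks so a large dump does not need to fit in memory at once.
    for chunk in pd.read_csv(tsv_source, sep="\t", dtype=str, chunksize=chunksize):
        kept.append(chunk[chunk[gene_col].isin(wanted)])
    if not kept:
        return pd.DataFrame()
    return pd.concat(kept, ignore_index=True)

# Made-up sample standing in for a downloaded file (IDs are placeholders).
sample = io.StringIO(
    "gene_stable_id\thomology_type\thomology_gene_stable_id\thomology_species\n"
    "GENE_A\tortholog_one2one\tGENE_X\tspecies_1\n"
    "GENE_A\tortholog_one2many\tGENE_Y\tspecies_2\n"
    "GENE_B\tortholog_one2one\tGENE_Z\tspecies_1\n"
)

hits = load_homologies(sample, ["GENE_A"])

# Collapse to {query gene: [homolog gene IDs]}, similar in spirit to gene_homologies_dict.
homologs_by_gene = (
    hits.groupby("gene_stable_id")["homology_gene_stable_id"].apply(list).to_dict()
)
print(homologs_by_gene)
```

For a real file, pass its path instead of `sample`, e.g. `load_homologies("path/to/your_file.tsv", gene_id_list)`. Reading in chunks and filtering as you go means only the rows for your genes are kept in memory.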