Question

Obtaining every "Protein ID" from a genome in OMA

2.1 years ago

christyh ▴ 10

Hi all,

I am trying to obtain Entrez IDs for orthologous genes between two organisms using OMA. My steps are as follows:

1. OMA > Tools > Genome Pair Orthology
2. Select Organism 1, select Organism 2, and select "Entrez Gene IDs" for Preferred ID.
3. Download the provided file (File A) as a .tsv file. The file's headers are: "Organism 1", "Organism 2", "Orthology", "OMA Group". All of Organism 1's IDs are OMA IDs (I think; they begin with the organism's unique identifier followed by five digits), and Organism 2's IDs are Entrez Gene IDs (I think).
4. Cross-reference Organism 1's IDs to UniProtKB gene IDs (OMA > Search: Taxon ID: "Organism 1's unique identifier" > List Genes).
5. Download the list of genes with OMA and UniProtKB IDs (File B).
6. Merge files A and B in R by OMA IDs (File C). Headers are "Organism 1 (OMA IDs)", "Organism 1 (UniProtKB IDs)", "Organism 2 (Entrez IDs)", "Orthology", "OMA Group".
7. Merge file C and another file D (with extraneous information about Organism 1 that is irrelevant to this post) by UniProtKB IDs (File E). Headers are "Organism 1 (OMA IDs)", "Organism 1 (UniProtKB IDs)", "Organism 1 (extraneous info...)", "Organism 2 (Entrez IDs)", "Orthology", "OMA Group".

My problem is in step 5. The organism of interest has over 4000 genes, but the list that I am downloading from OMA only has 100 genes. Is there a way to download the entire 4000+ gene list?

Thank you, Christy

OMA orthologs • 1000 views

updated 2.1 years ago by Adrian Altenhoff ★ 1.1k • written 2.1 years ago by christyh ▴ 10

score 2 · Answer 1 · 2023-02-03

Hello,

It seems that the download on the genome pages is currently limited to 100 genes. We will raise this limit so that it covers the full size of each genome.

In the meantime, if you need this information, you can use the API: https://omabrowser.org/api/docs

For your specific case: https://omabrowser.org/api/genome/HUMAN/proteins/ From there, you can download the JSON file by opening the 'GET' dropdown and selecting 'JSON'.
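If you would rather script this than click through the browser, here is a minimal Python sketch of the same request. It rests on two assumptions that are not confirmed in this thread: that the endpoint returns a JSON list per page, and that further pages are advertised through a standard `Link` response header with `rel="next"`. The function names (`http_get`, `next_link`, `all_proteins`) are made up for illustration; please check the API docs linked above before relying on it.

```python
import json
import re
import urllib.request

API_ROOT = "https://omabrowser.org/api"


def http_get(url):
    """Return (parsed JSON body, Link header or '') for one GET request."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp), resp.headers.get("Link", "")


def next_link(link_header):
    """Extract the rel="next" URL from a Link header, or None if absent."""
    match = re.search(r'<([^>]+)>\s*;\s*rel="?next"?', link_header)
    return match.group(1) if match else None


def all_proteins(species_code, fetch=http_get):
    """Collect protein entries for one genome, following pagination.

    Assumes each page body is a JSON list (an assumption, see above).
    `fetch` can be swapped out, e.g. for offline testing.
    """
    url = f"{API_ROOT}/genome/{species_code}/proteins/"
    entries = []
    while url:
        page, link = fetch(url)
        entries.extend(page)
        url = next_link(link)  # None once there is no further page
    return entries
```

Calling `all_proteins("HUMAN")` would then request the URL above and keep following "next" links until the list is complete; the result can be written out with `json.dump` or converted to a table for merging.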

I hope this helps

Clement.

score 1 · Answer 2 · 2023-02-06

Dear Christy,

Thanks for your interest in the OMA browser. Indeed, there seems to be an issue with listing all the genes of a species. We are working on a fix for this and hope to have it ready soon.

However, I would like to suggest an alternative approach that you may find easier to use. You correctly realized that for Organism 1 we were returning OMA IDs instead of the requested Entrez Gene IDs. We use OMA IDs as a fallback when we have not stored a cross-reference of the requested type in our database.

The current functionality for retrieving orthologs between genome pairs will soon be updated: we would like to improve it and allow it to return more specific information. But the current version already has a feature that lets you obtain different types of cross-references for the two species. So you could directly get the UniProtKB/TrEMBL accessions for species 1 and Entrez Gene IDs for species 2. For this, you need to manually form the following URL:

https://omabrowser.org/cgi-bin/gateway.pl?f=PairwiseOrthologs&p1=<species1>&p2=<species2>&p3=UniProt&p4=EntrezGene

and replace <species1> and <species2> with the respective OMA species codes.
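To avoid typos when filling in the placeholders, the URL pattern above can be assembled with a small helper. This is only a sketch: the function name `pairwise_orthologs_url` is hypothetical, and `HUMAN` and `MOUSE` in the usage line are example species codes, not the ones from the question.

```python
from urllib.parse import urlencode

GATEWAY = "https://omabrowser.org/cgi-bin/gateway.pl"


def pairwise_orthologs_url(species1, species2,
                           id_type1="UniProt", id_type2="EntrezGene"):
    """Build the PairwiseOrthologs URL for two OMA species codes.

    p3 and p4 select the cross-reference type reported for species 1
    and species 2; the defaults are the values from the URL above.
    """
    params = {
        "f": "PairwiseOrthologs",
        "p1": species1,
        "p2": species2,
        "p3": id_type1,
        "p4": id_type2,
    }
    return f"{GATEWAY}?{urlencode(params)}"
```

For example, `pairwise_orthologs_url("HUMAN", "MOUSE")` gives a URL you can paste into the browser or pass to a download tool.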

Hope this helps as a temporary fix.

Best wishes, Adrian