Question

OMA use case

7 weeks ago

nmalexan • 0

Hello,

I have been using OMA standalone to assign orthologs between vertebrate genomes. I essentially have one mammal and three birds and am trying to find 1:1 orthologs, as TOGA was reporting too many missing orthologs due to intronic and intergenic divergence.

I run it like so:

#using OutgroupSpecies := ['mus_musculus']; in parameters.drw

./bin/oma -c -n 100
./bin/oma -n 100 -s -W 7000
./bin/oma -n 100

I have four questions I can't find answers to online:

1. Is it a problem to include mitochondrial proteins, or proteins annotated from NUMTs in the genome?
2. Is it possible to add my proteins from a handful of species to a precomputed database and look at orthologs?
3. Does a precomputed database improve ortholog detection?
4. I also have various FASTA header formats for the proteins from these different genomes. Is that a problem?

For example:

Mouse
>ENSMUSP00000070648.4|ENSMUST00000070533.4|ENSMUSG00000051951.5|OTTMUSG00000026353.2|OTTMUST00000065166.1|Xkr4-001|Xkr4|647

Chicken
>NP_001001127.1 endothelin receptor type B precursor [Gallus gallus]

oma omastandalone • 885 views

ADD COMMENT • link updated 6 weeks ago by GenoMax 151k • written 7 weeks ago by nmalexan • 0

score 0 · Answer 1 · 2025-03-19

7 weeks ago

Adrian Altenhoff ★ 1.1k

Hi @nmalexan,

Thanks for your interest in OMA standalone. Here are my thoughts on your questions.

1. It is generally not a problem to include mitochondrial proteins, but OMA does not handle them in any special way.
2. I don't understand your question exactly. Are you asking whether you can place your proteins into existing gene families and extract the orthologous/paralogous relations from them, or rather whether you can run OMA standalone with incomplete proteomes? The latter would be fine (you can export the genomes of interest from https://omabrowser.org/export and run OMA standalone). If you want to place your proteins inside existing Hierarchical Orthologous Groups (HOGs), you could use the Fastmapping tool on the OMA Browser (https://omabrowser.org/oma/fastmapping/), where you can upload your sequences and have them mapped into existing HOGs, or simply to the closest sequence in the database.
3. If you mean using the export functionality of the OMA Browser, this will mainly speed up computations, as the expensive All-vs-All computations among the exported genomes are already done. Also, adding more species usually improves orthology detection, as they bring more resolution to the family.
4. Having different ID formats is OK for OMA standalone. The output files will simply use the FASTA header line as IDs.

A small remark about how you started the OMA standalone jobs. The following would be better:

./bin/oma -c
./bin/oma -n 100 -s -W 7000
./bin/oma

Only the All-vs-All part should be run in parallel. If you don't use a scheduler but run on a single machine, you can also drop the -W 7000 argument, as you don't want your jobs to stop after ~2 hours.

Let us know if you have more specific questions. Cheers, Adrian

ADD COMMENT • link 7 weeks ago by Adrian Altenhoff ★ 1.1k

Thank you!!

One more question:

I'm thinking I will keep only the longest transcript for each gene based on the GTF and then extract the protein sequences. I assume this preprocessing is required? I have the following formats, and the mouse faa seems to have proteins for multiple transcripts of the same gene.

For example:

Mouse

>ENSMUSP00000070648.4|ENSMUST00000070533.4|ENSMUSG00000051951.5|OTTMUSG00000026353.2|OTTMUST00000065166.1|Xkr4-001|Xkr4|647

Chicken

>NP_001001127.1 endothelin receptor type B precursor [Gallus gallus]

ADD REPLY • link 7 weeks ago by nmalexan • 0

Yes, you need to provide protein sequences, so you will need to translate your transcripts before running OMA standalone.

OMA standalone is able to work with several splicing variants per gene, but you need to provide an additional <genome>.splice file (next to the <genome>.fa file) that lists all the splicing variants of a gene on one line. OMA standalone will then select the evolutionarily best-conserved variant. However, this requires quite a bit of additional computing time.
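
If you would rather keep a single isoform per gene up front, as proposed in the question, a minimal Python sketch could look like the one below. It assumes pipe-separated FASTA headers with the gene ID in a known field (index 2 in the Ensembl-style mouse header shown in the question). The function names and the `gene_field` parameter are illustrative and not part of OMA, and the sequences in the example are toy data.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_per_gene(records, gene_field=2, sep="|"):
    """Keep the longest protein per gene; ties keep the first one seen."""
    best = {}
    for header, seq in records:
        gene = header.split(sep)[gene_field]  # assumed position of the gene ID
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return list(best.values())


# Toy input: two isoforms of gene g1 and one isoform of gene g2.
toy = [">p1|t1|g1", "MKT", ">p2|t2|g1", "MKTAA", ">p3|t3|g2", "MA"]
for header, seq in longest_per_gene(read_fasta(toy)):
    print(f">{header}\n{seq}")
```

Note that "longest" is only a heuristic; the .splice route described above lets OMA standalone choose the variant itself, at the cost of extra computing time.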

ADD REPLY • link 7 weeks ago by Adrian Altenhoff ★ 1.1k

Thank you!! That answers my question.

ADD REPLY • link 7 weeks ago by nmalexan • 0

What I've done is, for each gene ID, list all the unique transcript IDs, like so:

ENSMUSG00000089707.1    ENSMUST00000160117.1
ENSMUSG00000073747.2    ENSMUST00000097849.2
ENSMUSG00000035000.8    ENSMUST00000047812.7,ENSMUST00000156871.1,ENSMUST00000129738.1,ENSMUST00000150979.1,ENSMUST00000149306.1,ENSMUST00000151818.1
ENSMUSG00000104919.1    ENSMUST00000197640.1
ENSMUSG00000088677.1    ENSMUST00000158052.1
ENSMUSG00000103889.1    ENSMUST00000195543.1
ENSMUSG00000081037.3    ENSMUST00000117553.1
ENSMUSG00000075164.1    ENSMUST00000099867.1
ENSMUSG00000030510.11   ENSMUST00000066475.10,ENSMUST00000208521.1
ENSMUSG00000103601.1    ENSMUST00000195341.1

Does this look like the format you're looking for?

ADD REPLY • link 7 weeks ago by nmalexan • 0

A slightly different format is required. Simply put all the FASTA headers of a gene on the same line, separated by a ; character:

ENSMUSG00000035000.8|ENSMUST00000047812.7;ENSMUSG00000035000.8|ENSMUST00000156871.1;ENSMUSG00000035000.8|ENSMUST00000129738.1
ENSMUSG00000089707.1|ENSMUST00000160117.1
...

This assumes your FASTA headers look like this:

>ENSMUSG00000035000.8|ENSMUST00000047812.7
MGAHASVTDTNILSGLESNAT
>ENSMUSG00000035000.8|ENSMUST00000156871.1
QREAQEKPPDDSDLRSVRTNENK
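
A file in this format can be generated from the FASTA headers themselves. The following Python sketch assumes headers of the form gene|transcript, as in the example above, groups them by the gene part, and joins each group with a ; character. The function name and the `gene_field` parameter are illustrative and not part of OMA.

```python
def splice_lines(headers, gene_field=0, sep="|"):
    """Group FASTA headers by gene and join each group with ';'."""
    groups = {}
    for header in headers:
        header = header.lstrip(">").strip()
        gene = header.split(sep)[gene_field]  # assumed position of the gene ID
        groups.setdefault(gene, []).append(header)
    # One output line per gene, in order of first appearance.
    return [";".join(variants) for variants in groups.values()]


headers = [
    ">ENSMUSG00000035000.8|ENSMUST00000047812.7",
    ">ENSMUSG00000035000.8|ENSMUST00000156871.1",
    ">ENSMUSG00000089707.1|ENSMUST00000160117.1",
]
for line in splice_lines(headers):
    print(line)
```

In practice the headers would be read from the <genome>.fa file and the resulting lines written to the matching <genome>.splice file.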

ADD REPLY • link 7 weeks ago by Adrian Altenhoff ★ 1.1k

Re: "I don't understand your question exactly. Are you asking if you can place your proteins into existing gene families and extract the orthologous/paralogous relations from them? or rather if you can run OMA standalone with non-complete proteomes? [...]"

I was referring to this line from the documentation:

"Additionally it is possible to export the precomputed all-against-all for any of the >2000 genomes currently in the oma database."

Thank you so much, this answers my questions.

ADD REPLY • link 7 weeks ago by nmalexan • 0

@Adrian Altenhoff, after I've exported the tar file:

Do I set the outgroup based on my genomes only, or including the imported data?

When I untar this file, it's a new OMA directory with the executables, Cache directory and all. Can I just replace the Cache with my original directory, or do I need to run my jobs in this new one?

ADD REPLY • link 6 weeks ago by nmalexan • 0

You can add your own genomes and the comparisons among them to the Cache/DB/ and Cache/AllAll folders, after also adding the FASTA files to DB/. Running bin/oma will then compute the alignments between the genomes you exported and your own set of genomes.

You need to specify the outgroup based on all the genomes in the analysis.
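
The copy step described above could be scripted roughly as follows. This is a sketch under assumptions: `merge_run` is a hypothetical helper, the two directory arguments are placeholders for your original run and the untarred export, and only the DB, Cache/DB and Cache/AllAll folders named above are copied.

```python
import shutil
from pathlib import Path


def merge_run(own_run, exported_run, subdirs=("DB", "Cache/DB", "Cache/AllAll")):
    """Copy the listed subfolders of own_run into exported_run, merging contents."""
    own_run, exported_run = Path(own_run), Path(exported_run)
    for sub in subdirs:
        src = own_run / sub
        if src.is_dir():
            # dirs_exist_ok=True merges into an existing folder (Python 3.8+);
            # files with the same name in the destination are overwritten.
            shutil.copytree(src, exported_run / sub, dirs_exist_ok=True)


# Example call with placeholder paths (adjust to your own layout):
# merge_run("path/to/original_run", "path/to/exported_OMA")
```

Working on a copy of the exported directory is a sensible precaution, since files with identical names are overwritten.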

Cheers

ADD REPLY • link 6 weeks ago by Adrian Altenhoff ★ 1.1k

Thanks a million!!

ADD REPLY • link 6 weeks ago by nmalexan • 0

nmalexan: Please consider accepting the original answer (green check mark) to provide closure to this thread.

ADD REPLY • link 6 weeks ago by GenoMax 151k