Question

How to make an ortholog table with Reciprocal Best Hit BLAST (Python?)

8.8 years ago

jolespin ▴ 150

I have two FASTA files containing all of the proteins from two distantly related organisms (one from the Spirochaetes and one from the Firmicutes). I want to map each Firmicutes gene to its best hit in the Spirochaetes genome.
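For reference, the reciprocal best hit (RBH) idea itself is small once you have all-vs-all BLAST results in both directions: a Firmicutes protein F and a Spirochaetes protein S are paired only if S is F's top-scoring hit and F is S's top-scoring hit. A minimal sketch, assuming each hit has already been reduced to a hypothetical `(query_id, subject_id, bitscore)` tuple (for example from a BLAST tabular report); ties go to whichever hit is seen first, which is an arbitrary choice.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical hit representation: (query_id, subject_id, bitscore),
# e.g. taken from a BLAST tabular (-outfmt 6) report.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-bitscore subject (first seen wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> Dict[str, str]:
    """Keep (a, b) only when b is a's best hit AND a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {a: b for a, b in best_ab.items() if best_ba.get(b) == a}


if __name__ == "__main__":
    # Toy scores, made up for illustration only.
    firm_vs_spiro = [("F1", "S1", 250.0), ("F1", "S2", 90.0), ("F2", "S2", 60.0)]
    spiro_vs_firm = [("S1", "F1", 245.0), ("S2", "F3", 80.0), ("S2", "F2", 55.0)]
    print(reciprocal_best_hits(firm_vs_spiro, spiro_vs_firm))  # {'F1': 'S1'}
```

In the toy data F2 is dropped because S2's own best hit is F3, which is exactly the one-directional case RBH is meant to filter out.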

What is the best, and most widely accepted, way to do this?

I'm very familiar with Python, and my first thought was to use skbio to do pairwise alignments of all the proteins (http://scikitbio.org/docs/0.4.1/generated/skbio.alignment.StripedSmithWaterman.html). However, since this is a local alignment, it may give a high score to a single shared domain, which is not what I want.

I then thought about using Biopython and its BLAST wrapper, but I don't know how to specify the query, the database to search against, and a length threshold (http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc87).
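One way to sidestep the wrapper question is to call the BLAST+ executables directly with `subprocess`: `makeblastdb` turns each proteome into a protein database, and `blastp -query ... -db ...` searches one proteome against the other, once in each direction. A sketch, assuming BLAST+ is installed and on `PATH`; the database and output names (`db_a`, `a_vs_b.tsv`, ...) and the 1e-5 E-value cutoff are hypothetical choices. BLAST is not asked to enforce a length threshold here; the requested coordinate and `qlen`/`slen` columns let you filter afterwards.

```python
import shutil
import subprocess
from typing import List

# Tabular output with coordinates and lengths so coverage can be filtered later.
OUTFMT = "6 qseqid sseqid qstart qend sstart send qlen slen evalue bitscore"


def makeblastdb_cmd(fasta: str, db_name: str) -> List[str]:
    """Command to build a protein BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "prot", "-out", db_name]


def blastp_cmd(query: str, db_name: str, out: str, evalue: float = 1e-5) -> List[str]:
    """Command to search every protein in `query` against database `db_name`."""
    return ["blastp", "-query", query, "-db", db_name, "-out", out,
            "-outfmt", OUTFMT, "-evalue", str(evalue)]


def run_both_directions(fasta_a: str, fasta_b: str) -> None:
    """Build both databases, then run A-vs-B and B-vs-A (file names hypothetical)."""
    if shutil.which("blastp") is None or shutil.which("makeblastdb") is None:
        raise RuntimeError("BLAST+ executables not found on PATH")
    subprocess.run(makeblastdb_cmd(fasta_a, "db_a"), check=True)
    subprocess.run(makeblastdb_cmd(fasta_b, "db_b"), check=True)
    subprocess.run(blastp_cmd(fasta_a, "db_b", "a_vs_b.tsv"), check=True)
    subprocess.run(blastp_cmd(fasta_b, "db_a", "b_vs_a.tsv"), check=True)
```

Calling `run_both_directions("firmicutes.faa", "spirochaetes.faa")` (hypothetical file names) would leave two tables that can be filtered for coverage and then reduced to reciprocal best hits.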

Tags: blast, genome, protein, gene, ortholog

score 4 · Answer 1 · 2016-10-14

8.8 years ago

Christophe Dessimoz ▴ 740

It may be more straightforward to infer your orthologs using our software package OMA standalone (http://omabrowser.org), which takes your two fasta files as input and produces all pairwise relationships.

If you want to combine your two genomes with publicly available data, you can also export precomputed OMA genomes at http://omabrowser.org/export

I quickly checked the OMA documentation, and it looks promising. Have you compared your tool with the standard reciprocal BLAST approach? Why is your tool better? Also, as I understand it, the output of the tool is a table of pairwise relations. Is there an option to directly extract all ortholog pairs from the input files (e.g. in FASTA format)?

Reply by Denis ▴ 320 · 8.1 years ago
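Whatever tool produces the pairs (OMA or an RBH script), pulling the corresponding sequences back out of the original FASTA files is a short step. A sketch, assuming the pairs are already available as `(id_a, id_b)` tuples and that each FASTA identifier is the first whitespace-delimited word of its header; it does not parse any particular OMA output format, and the function names are hypothetical.

```python
import io
from typing import Dict, Iterable, TextIO, Tuple


def read_fasta(handle: Iterable[str]) -> Dict[str, str]:
    """Read FASTA records into {id: sequence}; id = first word of the header."""
    records: Dict[str, str] = {}
    name, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def write_pairs(pairs: Iterable[Tuple[str, str]], seqs_a: Dict[str, str],
                seqs_b: Dict[str, str], out: TextIO) -> None:
    """Write each ortholog pair as two consecutive FASTA records."""
    for id_a, id_b in pairs:
        out.write(f">{id_a}\n{seqs_a[id_a]}\n>{id_b}\n{seqs_b[id_b]}\n")


if __name__ == "__main__":
    seqs_a = read_fasta(io.StringIO(">F1 some protein\nMKV\nLL\n>F2\nAAA\n"))
    seqs_b = read_fasta(io.StringIO(">S1\nMKI\n"))
    out = io.StringIO()
    write_pairs([("F1", "S1")], seqs_a, seqs_b, out)
    print(out.getvalue(), end="")
```

With real files, `read_fasta(open(path))` works the same way, since a file object also yields lines.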

Conceptually, the main limitation of reciprocal best hit is that it cannot cope with one-to-many or many-to-many orthology, which exist whenever a gene has duplicated after the speciation of interest. More discussion here:

https://academic.oup.com/gbe/article/5/10/1800/520875/Bidirectional-Best-Hits-Miss-Many-Orthologs-in
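The one-to-many problem is easy to reproduce on toy numbers. Suppose gene `a1` in one genome has two in-paralogs `b1` and `b2` in the other, both orthologous to `a1` because they duplicated after the speciation. RBH can report at most one partner per gene, so one true ortholog is lost. A sketch with made-up scores, illustrating the argument only (not the analysis in the linked paper):

```python
from typing import Dict, List, Set, Tuple

Hit = Tuple[str, str, float]  # (query, subject, bitscore); scores below are made up


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Highest-scoring subject per query."""
    best: Dict[str, Tuple[str, float]] = {}
    for q, s, score in hits:
        if q not in best or score > best[q][1]:
            best[q] = (s, score)
    return {q: s for q, (s, _) in best.items()}


def rbh_pairs(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> Set[Tuple[str, str]]:
    """Pairs that are each other's best hit."""
    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}


# b1 and b2 arose by duplication after speciation, so both are orthologs of a1.
a_vs_b = [("a1", "b1", 300.0), ("a1", "b2", 290.0)]
b_vs_a = [("b1", "a1", 300.0), ("b2", "a1", 290.0)]
true_orthologs = {("a1", "b1"), ("a1", "b2")}

found = rbh_pairs(a_vs_b, b_vs_a)
print(found)                   # {('a1', 'b1')}
print(true_orthologs - found)  # {('a1', 'b2')}: missed by RBH
```

Even though `b2`'s best hit is `a1`, the pair is rejected because `a1` can only "choose" `b1`.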

Now, if you are interested in the relative performance of different orthology inference methods, including OMA and reciprocal best hit BLAST, please refer to this paper:

http://www.nature.com/nmeth/journal/v13/n5/full/nmeth.3830.html

Reply by Christophe Dessimoz ▴ 740 · 8.0 years ago