Question: Orthology

3.4 years ago by PhyloW

I am doing a phylogenetic analysis of two sets of proteins (A and B) that are functionally very closely related and share a high degree of sequence similarity. From various species I have identified the protein sequences (for both proteins) that I want to include in the analysis. Each sequence was included based on its similarity / BLAST results to the known and functionally characterized proteins A and B in Arabidopsis. I am worried, however, that some of the sequences from the included species might represent paralogs rather than orthologs. Is there an analysis where I can "plug and play" the data I have and see whether the sequences come out as orthologs (hypothetically, one orthologous group for protein A and one for protein B)? I do not want to run an analysis where I search a database for orthologs; I want to identify them among the sequences I already have in my dataset (which were, of course, included based on certain preselected criteria).

Tags: Paralogy, Orthology, BLAST

You can use OrthoMCL for this purpose.

3.4 years ago by Nitin Narwade ★ 1.6k

Thank you for the answer. I have read up along similar lines, but was not quite sure whether it was the best approach. Will give it a try though.

3.4 years ago by PhyloW

I think it would help you.

To give a very brief introduction to how it works: it takes a set of protein sequences (say, the proteomes of three different species) and performs homology-based clustering, first running an all-vs-all BLAST to measure sequence similarity and then clustering the resulting similarity graph with the MCL program. Finally, it reports the predicted orthologous and paralogous proteins and groups them into ortholog groups.

3.4 years ago by Nitin Narwade ★ 1.6k
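To make the clustering step of the OrthoMCL workflow described above more concrete, here is a toy Markov clustering (MCL) run in plain Python. It is a minimal sketch, not OrthoMCL itself: real OrthoMCL derives normalized edge weights from its BLAST results and calls the external `mcl` program, whereas the matrix below is a hypothetical, hand-made similarity graph (six proteins, two tightly linked trios joined by one weak edge). It only illustrates how alternating expansion and inflation separate densely connected groups; the function names are illustrative.

```python
def normalize_columns(m):
    """Scale each column of a square matrix (list of lists) to sum to 1, in place."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            m[i][j] /= total
    return m


def matmul(a, b):
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(weights, inflation=2.0, max_iter=100, tol=1e-9, threshold=1e-3):
    """Toy Markov clustering of a symmetric weight matrix; returns sorted clusters."""
    n = len(weights)
    # Add self-loops, then turn weights into column-stochastic transition probabilities.
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    normalize_columns(m)
    for _ in range(max_iter):
        prev = m
        m = matmul(m, m)                                  # expansion: two-step random walk
        m = [[x ** inflation for x in row] for row in m]  # inflation: favour strong flows
        normalize_columns(m)
        if max(abs(m[i][j] - prev[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    # Rows with a non-negligible diagonal are "attractors"; each attractor row
    # lists the nodes that flow into it, i.e. one cluster.
    clusters = {tuple(j for j in range(n) if m[i][j] > threshold)
                for i in range(n) if m[i][i] > threshold}
    return sorted(clusters)


# Hypothetical similarity weights: proteins 0-2 and 3-5 form two dense groups,
# with a single weak link (0.1) between proteins 2 and 3.
W = [
    [0, 1, 1, 0,   0, 0],
    [1, 0, 1, 0,   0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1,   0, 1],
    [0, 0, 0, 1,   1, 0],
]
print(mcl(W))
```

The inflation parameter controls granularity: higher values tend to give smaller, tighter clusters, lower values merge more.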

Note that this is a shortcut which I would be wary of using in this case. Strictly speaking, by their very definition, paralogy and orthology can only be inferred from a phylogenetic tree. I would add the sequences to be tested to the relevant multiple sequence alignment, rebuild a phylogenetic tree from it, and then infer the relationships from the tree.

3.4 years ago by Jean-Karim Heriche 27k
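The tree-based approach suggested above can be sketched with the species-overlap rule: in a rooted gene tree, an internal node whose child subtrees share a species is labelled a duplication, otherwise a speciation, and two sequences are orthologs if their last common ancestor is a speciation node. The sketch assumes the tree is already rooted and given as nested Python tuples, with leaves named `species|protein`; the species codes (Ath, Osa, Ppa), the topology and the function names are hypothetical. It ignores gene-tree/species-tree reconciliation and uncertainty in the topology, so treat it as an illustration of the reasoning rather than a replacement for inspecting the tree.

```python
from itertools import combinations


def species_of(leaf):
    """Leaf names are assumed to look like 'species|protein'."""
    return leaf.split("|")[0]


def leaves(node):
    """All leaf names below a node (a leaf is a str, an internal node a tuple)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def classify_pairs(tree):
    """Map every pair of leaves to 'ortholog' or 'paralog' via the species-overlap rule."""
    relations = {}

    def visit(node):
        if isinstance(node, str):
            return
        child_leaves = [leaves(child) for child in node]
        species_sets = [{species_of(leaf) for leaf in group} for group in child_leaves]
        # If two child subtrees share a species, this node is a duplication.
        duplication = any(a & b for a, b in combinations(species_sets, 2))
        label = "paralog" if duplication else "ortholog"
        # Pairs split between different children have this node as their LCA.
        for group_a, group_b in combinations(child_leaves, 2):
            for x in group_a:
                for y in group_b:
                    relations[frozenset((x, y))] = label
        for child in node:
            visit(child)

    visit(tree)
    return relations


# Hypothetical rooted tree: an A/B duplication predating the split of three species.
tree = ((("Ath|A", "Osa|A"), "Ppa|A"), (("Ath|B", "Osa|B"), "Ppa|B"))
rel = classify_pairs(tree)
print(rel[frozenset(("Ath|A", "Osa|A"))])  # ortholog
print(rel[frozenset(("Ath|A", "Osa|B"))])  # paralog
```

Here the A and B clades meet at a duplication node, so every A-versus-B pair is reported as paralogous. A sequence picked as an "A" candidate by BLAST but placed inside the B clade would likewise come out as a paralog of the A sequences.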

I should perhaps also mention that not all of the species / sequences we include are necessarily from fully sequenced and annotated genomes. Isn't OrthoMCL too "specialised" in that regard, since it relies on these assumptions?

3.4 years ago by PhyloW
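The concern about incomplete genomes applies to any method built on best BLAST hits, and OrthoMCL's ortholog step is based on reciprocal best hits between species. The toy example below uses hypothetical bitscores for two species (Sp2, Sp3) that each carry copies of A and B, and shows the failure mode: if one species' A copy and the other species' B copy are absent from the data (not sequenced, not annotated, or genuinely lost), the two remaining paralogs become each other's reciprocal best hit and look like orthologs. This is a simplified sketch with illustrative function names, not OrthoMCL's actual scoring.

```python
# Hypothetical symmetric bitscores; names are placeholders, not real data.
SCORES = {
    ("Sp2|A", "Sp3|A"): 900.0,
    ("Sp2|B", "Sp3|B"): 880.0,
    ("Sp2|A", "Sp3|B"): 600.0,
    ("Sp2|B", "Sp3|A"): 590.0,
}


def score(scores, a, b):
    """Look up a symmetric similarity score; 0.0 means 'no hit'."""
    return scores.get((a, b), scores.get((b, a), 0.0))


def best_hit(scores, query, targets):
    """Return the target with the highest score to query, or None if there is no hit."""
    hits = [t for t in targets if score(scores, query, t) > 0]
    if not hits:
        return None
    return max(hits, key=lambda t: score(scores, query, t))


def reciprocal_best_hits(scores, genes1, genes2):
    """Pairs (g1, g2) where each is the other's best hit in the other species."""
    pairs = set()
    for g in genes1:
        h = best_hit(scores, g, genes2)
        if h is not None and best_hit(scores, h, genes1) == g:
            pairs.add((g, h))
    return pairs


# Both copies present in both species: A pairs with A, B with B.
complete = reciprocal_best_hits(SCORES, ["Sp2|A", "Sp2|B"], ["Sp3|A", "Sp3|B"])
# Sp2|A and Sp3|B missing: the paralogs Sp2|B and Sp3|A pair up instead.
incomplete = reciprocal_best_hits(SCORES, ["Sp2|B"], ["Sp3|A"])
print(sorted(complete))    # [('Sp2|A', 'Sp3|A'), ('Sp2|B', 'Sp3|B')]
print(sorted(incomplete))  # [('Sp2|B', 'Sp3|A')]
```

A tree that also contains the characterized Arabidopsis A and B sequences as reference points is one way to catch such spurious pairs, since Sp2|B and Sp3|A would be expected to fall into different clades.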