Parsing Protein Trees to determine orthologs and paralogs
10.5 years ago
lchau91 ▴ 20

Hi Everyone,

I'm trying to find orthologs and lineage-specific paralogs between two species. I tried to use the Ensembl homology pipeline, but neither of my species is in the database, so I wrote my own similar pipeline. So far I've done the following:

  1. All-against-all BLAST of every gene in both genomes
  2. Filtering of the BLAST results based on e-value and alignment length
  3. Single-linkage clustering with MCL to form gene families
  4. For each gene family, a protein alignment with PRANK and a tree built with TreeBeST, which also takes in the species tree and tries to build the gene tree accordingly
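For anyone reproducing step 2, here is a minimal sketch of the filtering in Python. It assumes the BLAST results are in tabular format (`-outfmt 6`); the cutoff values and the gene names are placeholders I made up, not the ones used in the pipeline above.

```python
import csv
import io

# Column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, max_evalue=1e-5, min_length=50):
    """Yield (query, subject, evalue, length) for hits passing both cutoffs.

    max_evalue and min_length are illustrative defaults, not recommendations.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST_COLUMNS, row))
        evalue = float(hit["evalue"])
        length = int(hit["length"])
        if evalue <= max_evalue and length >= min_length:
            yield hit["qseqid"], hit["sseqid"], evalue, length


# Three made-up hits: the last one fails both cutoffs.
example = (
    "spA_g1\tspA_g2\t90.0\t300\t30\t0\t1\t300\t1\t300\t0.0\t600\n"
    "spA_g1\tspB_g7\t85.0\t280\t42\t0\t1\t280\t5\t284\t1e-120\t410\n"
    "spA_g1\tspB_g9\t40.0\t30\t18\t0\t10\t39\t50\t79\t2.1\t22.3\n"
)
kept = list(filter_hits(io.StringIO(example)))
for query, subject, evalue, length in kept:
    print(query, subject, evalue, length)
```

With a real file you would pass an open file handle instead of the `io.StringIO` object.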

My question deals with parsing these gene trees. I want to use these gene trees to find paralogs and orthologs between my two species but I'm not sure how to parse all of the topologies of these trees and how to determine paralogous or orthologous relationships.

Are there any programs that can take in gene trees and output a list of paralogs and orthologs?

Thanks.

LC

phylogeny orthologs paralogs • 4.1k views

Not to my knowledge. I would probably use the ETE library for Python to write a parser that does the job.


I think it could still be worthwhile to contact the Ensembl helpdesk (helpdesk@ensembl.org) or the person in their Compara team who deals with the gene trees (Matthieu Muffato, muffato@ebi.ac.uk). They would happily give you their software (for tree building and/or homology/paralogy inference) as well as any advice.


I ended up using the ETE library to write a parser for my gene trees, but I've also contacted the Ensembl helpdesk. I'm still a novice at writing my own scripts, so I'll do a comparison to see how my skills match up!

Thank you everyone!

10.5 years ago
jhc ★ 3.0k

The ETE toolkit is indeed capable of doing that. You can either use a species overlap algorithm to detect duplication and speciation events, or reconcile your gene tree with the expected species tree (this paper includes a comparison of both methods).
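To make the species overlap idea concrete, here is a small pure-Python sketch that does not use ETE at all. It assumes a strictly binary gene tree written as nested tuples, and a naming scheme I invented for the example (species code before the first underscore). A node whose two child clades share at least one species is called a duplication; otherwise it is called a speciation. Gene pairs split by a speciation node are reported as orthologs, and pairs split by a duplication node as paralogs.

```python
from itertools import product


def species_of(gene):
    """Hypothetical naming scheme: 'spA_g1' -> 'spA'."""
    return gene.split("_", 1)[0]


def leaves(node):
    """Return the leaf names under a node; a leaf is a plain string."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def homology_pairs(tree):
    """Classify every internal node by species overlap and collect gene pairs."""
    orthologs, paralogs = [], []

    def visit(node):
        if isinstance(node, str):
            return
        left, right = node
        left_genes, right_genes = leaves(left), leaves(right)
        shared = ({species_of(g) for g in left_genes}
                  & {species_of(g) for g in right_genes})
        # Shared species on both sides -> duplication; none -> speciation.
        target = paralogs if shared else orthologs
        target.extend(product(left_genes, right_genes))
        visit(left)
        visit(right)

    visit(tree)
    return orthologs, paralogs


# Toy family: an old duplication at the root, plus a recent
# duplication in species spA (spA_g2 / spA_g3).
toy_tree = (("spA_g1", "spB_g1"), (("spA_g2", "spA_g3"), "spB_g2"))
orthologs, paralogs = homology_pairs(toy_tree)
print("orthologs:", orthologs)
print("paralogs:", paralogs)
```

In this toy tree, spA_g2 and spA_g3 come out as paralogs that are both orthologous to spB_g2, which is the kind of lineage-specific duplication asked about in the question.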

Briefly, you will need to load your gene tree as a PhyloTree object and then call either tree.get_descendant_evol_events or tree.reconcile. Both methods process your tree, label the nodes as speciation or duplication, and return a list of speciation and duplication events. You can then visualize your tree with tree.show() to see the predictions, or process the list of events. If you used tree.reconcile, a reconciled PhyloTree object is also returned, including the inferred lost branches (also visible with tree.show()). There are some examples in the ETE tutorial showing how to get orthology/paralogy predictions based on gene trees: http://pythonhosted.org/ete2/tutorial/tutorial_phylogeny.html#detecting-evolutionary-events
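A rough sketch of the species overlap route with ETE could look like the following. It assumes the `ete3` package (the tutorial linked above is for ete2, where the import would be `from ete2 import PhyloTree`), and it assumes leaf names such as `spA_gene1` with the species code before the first underscore; both the naming scheme and the helper names `species_code` and `predict_homologs` are mine. The import is placed inside the function only so that the naming helper can be used without ETE installed.

```python
def species_code(leaf_name):
    """Hypothetical naming scheme: 'spA_gene1' -> 'spA'."""
    return leaf_name.split("_", 1)[0]


def predict_homologs(newick):
    """Label events with ETE's species overlap method and collect gene pairs.

    Returns (orthologs, paralogs), each a list of (gene, gene) tuples.
    """
    from ete3 import PhyloTree  # assumption: ete3 is installed

    tree = PhyloTree(newick, sp_naming_function=species_code)
    orthologs, paralogs = [], []
    for event in tree.get_descendant_evol_events():
        # in_seqs / out_seqs hold the leaf names on either side of the node.
        pairs = [(a, b) for a in event.in_seqs for b in event.out_seqs]
        if event.etype == "S":    # speciation -> orthologs
            orthologs.extend(pairs)
        elif event.etype == "D":  # duplication -> paralogs
            paralogs.extend(pairs)
    return orthologs, paralogs
```

Usage would be something like `orthologs, paralogs = predict_homologs(open("family1.nwk").read())`, where `family1.nwk` is a placeholder file name, run once per gene family tree.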
