Hi Everyone,
I'm trying to find orthologs and lineage-specific paralogs between two species. I tried using the Ensembl homology pipeline, but neither of my species is in the database. Therefore, I tried to write my own similar pipeline. So far I've accomplished the following:
- All-against-all BLAST of every gene in both genomes
- Filtering of the BLAST results based on e-value and alignment length
- Single-linkage clustering with MCL to form gene families
- For each gene family, I did a protein alignment with PRANK and built a tree with TreeBeST, which also takes in the species tree and tries to build the gene tree accordingly.
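For reference, the BLAST filtering step can be sketched roughly as below. This is only an illustration, not the exact script: it assumes tabular BLAST output (`-outfmt 6`, where column 4 is the alignment length and column 11 the e-value), and the cut-offs `max_evalue` and `min_length` are made-up placeholder values.

```python
import csv
import io


def filter_blast_hits(handle, max_evalue=1e-5, min_length=50):
    """Yield (query, subject, evalue, length) for hits passing both cut-offs.

    Assumes BLAST tabular format (-outfmt 6). The default thresholds are
    hypothetical and should be tuned for the data at hand.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        length = int(row[3])      # column 4: alignment length
        evalue = float(row[10])   # column 11: e-value
        if query == subject:
            continue  # a gene hitting itself carries no homology information
        if evalue <= max_evalue and length >= min_length:
            yield query, subject, evalue, length


# Tiny made-up example: one good hit, one self-hit, one short weak hit.
sample = (
    "g1\tg2\t90.0\t120\t5\t0\t1\t120\t1\t120\t1e-40\t200\n"
    "g1\tg1\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600\n"
    "g1\tg3\t35.0\t20\t13\t0\t1\t20\t1\t20\t0.5\t15\n"
)
hits = list(filter_blast_hits(io.StringIO(sample)))
print(hits)
```

The surviving pairs are what would then be passed on to the clustering step.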
My question is about parsing these gene trees. I want to use them to find paralogs and orthologs between my two species, but I'm not sure how to handle all of the possible topologies or how to determine paralogous and orthologous relationships from them.
Are there any programs that can take in gene trees and output a list of paralogs and orthologs?
Thanks.
LC
Not to my knowledge. I would probably use the ETE library for Python to write a parser to do the job.
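To give an idea of what such a parser might do, here is a minimal sketch of the species-overlap rule in plain Python (no ETE needed to run it). At each internal node, if the child subtrees share a species the node is treated as a duplication and the gene pairs across it as paralogs; otherwise it is treated as a speciation and the pairs as orthologs. A duplication whose subtree contains a single species is reported as lineage-specific. Everything here is an assumption for illustration: the leaf naming scheme `geneid_SPECIES`, the helper names, and the toy tree. The parser only handles plain Newick (no quoted labels or NHX tags). ETE also ships its own event-detection routines on its phylogenetic tree objects, which are worth checking before relying on a hand-rolled version like this.

```python
from itertools import combinations, product


def parse_newick(text):
    """Parse plain Newick into nested tuples; leaves are name strings."""
    text = text.strip().rstrip(";")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos].split(":")[0].strip()  # drop branch length

    def read_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip '('
            children = [read_node()]
            while text[pos] == ",":
                pos += 1
                children.append(read_node())
            pos += 1  # skip ')'
            read_label()  # internal label / branch length is ignored
            return tuple(children)
        return read_label()

    return read_node()


def leaves(node):
    """Return all leaf names below a node."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def species_of(leaf):
    """Assumed naming scheme: species code follows the last underscore."""
    return leaf.rsplit("_", 1)[-1]


def classify_pairs(node, out=None):
    """Map each sorted gene pair to a relation using species overlap."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    groups = [leaves(child) for child in node]
    for left, right in combinations(groups, 2):
        left_sp = {species_of(g) for g in left}
        right_sp = {species_of(g) for g in right}
        if not (left_sp & right_sp):
            relation = "ortholog"  # no shared species: speciation
        elif len(left_sp | right_sp) == 1:
            relation = "lineage_specific_paralog"  # duplication in one species
        else:
            relation = "paralog"  # duplication predating the speciation
        for a, b in product(left, right):
            out[tuple(sorted((a, b)))] = relation
    for child in node:
        classify_pairs(child, out)
    return out


# Toy gene family with hypothetical gene and species names.
tree = parse_newick("(((a1_sp1,a2_sp1),a_sp2),(b_sp1,b_sp2));")
pairs = classify_pairs(tree)
for (gene1, gene2), relation in sorted(pairs.items()):
    print(gene1, gene2, relation)
```

Each pair is classified once, at the node where the two genes last share an ancestor. For nodes with more than two children the subtrees are compared pairwise, which is only an approximation.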
I think it could still be worthwhile to contact the Ensembl helpdesk (helpdesk@ensembl.org) or the person on their Compara team who deals with the gene trees (Matthieu Muffato, muffato@ebi.ac.uk), as they would happily give you their software (for tree building and/or homology/paralogy inference) as well as any advice.
I ended up using the ETE library to write a parser for my gene trees, but I've also contacted the Ensembl helpdesk. I'm still a novice at writing my own scripts, so I'll do a comparison to see how my skills match up!
Thank you everyone!