Hi there,
does anyone know how I can parse paralogs in a Newick tree, please?
Using Fitch's definition, any two genes related by a duplication event are paralogs. We can classify them into within-species paralogs (two genes in the same species) and between-species paralogs (two genes in different species). You will find a longer explanation and examples at http://www.ensembl.org/info/docs/compara/homology_method.html
If you are interested in within-species paralogs, the solution is pretty straightforward. By definition, any two genes in your tree that belong to the same species are paralogs.
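As a rough illustration of the within-species case, here is a minimal Python sketch that pulls the leaf names out of a Newick string and pairs up genes from the same species. It assumes leaf labels are unquoted and follow a hypothetical `species_gene` naming convention (e.g. `human_A`); the example tree and the function names are made up for illustration, so adapt the species lookup to however your own tree encodes it.

```python
import re
from collections import defaultdict
from itertools import combinations


def leaf_names(newick):
    """Return the leaf labels of a Newick string (unquoted labels only)."""
    # A leaf label directly follows "(" or ","; internal labels follow ")".
    return re.findall(r"(?<=[(,])\s*([^():,;\s]+)", newick)


def within_species_paralogs(newick):
    """Map each species to all pairs of its genes found in the tree."""
    by_species = defaultdict(list)
    for leaf in leaf_names(newick):
        # ASSUMPTION: the species is the part of the label before the first "_".
        species = leaf.split("_", 1)[0]
        by_species[species].append(leaf)
    # Only species with at least two genes can contribute paralog pairs.
    return {
        species: list(combinations(sorted(genes), 2))
        for species, genes in by_species.items()
        if len(genes) > 1
    }


# Hypothetical example tree, not taken from any real data set.
tree = "((human_A:0.1,mouse_A:0.2):0.3,(human_B:0.1,(mouse_B:0.2,mouse_C:0.1):0.1):0.3);"
for species, pairs in within_species_paralogs(tree).items():
    print(species, pairs)
```

For anything beyond simple labels (quoted names, NHX annotations), a proper Newick parser such as the one in ETE is a safer choice than a regular expression.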
Between-species paralogs (paralogous genes in different species) are more complicated to infer. You need to reconcile the gene tree with a species tree first. You can use RAP (http://pbil.univ-lyon1.fr/software/RAP/RAP.htm) or ETE (http://ete.cgenomics.org/) for this. Once you have done this, you can go through all the duplication nodes and infer a paralogy relationship between any gene in one of the branches and any gene in the other branch.
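To make the last step concrete, here is a minimal Python sketch of walking the duplication nodes. It assumes the reconciliation has already been done and that each internal node carries a flag saying whether it is a duplication. The `Node` class, the `is_duplication` field and the `species_gene` leaf names are hypothetical stand-ins for illustration; this is not the RAP or ETE API, and you would fill such a structure from whatever your reconciliation tool outputs.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    """Hypothetical gene-tree node, labelled by a prior reconciliation."""
    name: str = ""
    is_duplication: bool = False
    children: list = field(default_factory=list)


def leaves(node):
    """Return the names of all leaves below (or at) this node."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


def paralog_pairs(node):
    """Collect gene pairs separated by a duplication node."""
    pairs = set()
    if node.is_duplication:
        groups = [leaves(child) for child in node.children]
        # Pair every gene in one child branch with every gene in another.
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                for a, b in product(groups[i], groups[j]):
                    pairs.add(tuple(sorted((a, b))))
    for child in node.children:
        pairs |= paralog_pairs(child)
    return pairs


# Hypothetical tree: one duplication at the root, then a speciation in each copy.
tree = Node(is_duplication=True, children=[
    Node(children=[Node("human_A"), Node("mouse_A")]),
    Node(children=[Node("human_B"), Node("mouse_B")]),
])

pairs = paralog_pairs(tree)
# ASSUMPTION: the species is the part of the leaf name before the first "_".
between = {p for p in pairs if p[0].split("_")[0] != p[1].split("_")[0]}
print(sorted(between))
```

Pairs whose two genes come from the same species are the within-species paralogs; the remaining pairs are the between-species ones.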
Note that any phylogenetic package will build a tree from all the sequences you have aligned, whether or not they truly belong together. In other words, you are also relying on the sequence similarity search that you (or someone else) performed beforehand. You may want to check that neither of the two paralogs is an outlier.
Can you please clarify your question a bit: What is your input? Can you provide an example?