6.0 years ago
Moses
Hi,
I'm using FastTree to construct a phylogenetic tree for an application. The application requires a full binary tree; however, FastTree generates a tree that is binary at all of its nodes except the root. Is there a way to change this so that the root also has only two children?
This is specified in the documentation: "the root will have three children and other internal nodes will have two children"
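For a pipeline, it may help to check how many children the root of each Newick output has. Here is a minimal sketch; `root_child_count` is a hypothetical helper name, and it assumes the Newick string contains no quoted labels or bracketed comments that themselves hold commas or parentheses:

```python
def root_child_count(newick):
    """Count the children of the root by counting commas at nesting depth 1."""
    depth = 0
    commas = 0
    for ch in newick:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            # A comma directly inside the outermost parentheses separates root children.
            commas += 1
    return commas + 1


trifurcating = "((A:1.0,B:1.0):1.0,C:3.0,D:1.0);"  # root with three children
bifurcating = "((A:1.0,B:1.0):1.0,(C:3.0,D:1.0):1.0);"  # root with two children
print(root_child_count(trifurcating), root_child_count(bifurcating))
```

A result of 3 means the tree still has the trifurcating root described in the documentation; 2 means the root is bifurcating.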
Can you provide your data?
Polytomies are more often the result of certain aspects of the data than of the specific tree algorithm.
Have you attempted to plot your tree with anything else? (IQTree is very good).
You can access the multiple sequence alignment that I used to build the tree with FastTree through this link: https://iu.box.com/s/ihequ56at14e2f9twb4j5vu71m6zvpeu
I used the Archaeopteryx tool to visualize it.
You can always save the tree in a text-based format (phylip, nexus) and then open it and edit it manually in a text editor. I used to have software for editing trees visually, but I can't recall what it was at the moment.
To my knowledge, and I asked a few colleagues too, there's no way to force the bifurcation. Whether Lieven's suggestion of manually editing it is appropriate I'm not sure - you'd have to know a priori which the outgroup would be and that's not always going to be the case.
Well, it might indeed not be 'biologically' appropriate but it might solve the issue. That being said, the outgroup remark is certainly valid as jrj.healey mentions. Not sure about FastTree but can't you add an outgroup species when constructing the tree? Or go for an un-rooted tree?
Thanks all for your responses and suggestions. Unfortunately, I will be constructing many trees by calling FastTree in a pipeline, so manually updating the Newick output would not be feasible. One approach that works for me is the "retree" program that comes with the PHYLIP package: its midpoint option places the root at the midpoint of the tree, equidistant from the two furthest leaves. After this post-processing I end up with a full binary tree: for n leaf nodes I have n-1 internal nodes (including the "root" node).
The only issue is that these programs were written to run interactively. If you want to call them from another script with a single command, you have to write a physical file containing all of the parameters, one per line, which makes the process more complicated. I need to find a more practical way to automate this post-processing!
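One possible way around the physical parameter file is to send the same answers on standard input from the calling script. The sketch below assumes the tool reads its menu answers from standard input; `run_with_answers` is a hypothetical helper, and the actual answers for retree are not shown here because they depend on its menu. A tiny Python child process stands in for the interactive program:

```python
import subprocess
import sys


def run_with_answers(command, answers, timeout=60):
    """Run an interactive program, feeding one answer per line on stdin."""
    stdin_text = "\n".join(answers) + "\n"
    result = subprocess.run(
        command,
        input=stdin_text,      # replaces the on-disk parameter file
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,            # raise if the program exits with an error
    )
    return result.stdout


# Stand-in for an interactive tool: reads one line and echoes it in upper case.
demo_output = run_with_answers(
    [sys.executable, "-c", "print(input().upper())"],
    ["y"],
)
print(demo_output)
```

In a real pipeline the command and the list of answers would be replaced with whatever the interactive program expects, in the order it asks for them.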
Look into DendroPy or ete3. Both are Python modules which have methods for midpoint rooting a tree. Are you using FastTree for any reason other than speed?
How did you make your initial alignment by the way? I looked at it earlier and it seems to be very 'gappy'.
This would make life much easier. Thank you for this!
Update! I tried the DendroPy method. It gave the same results as the FastTree program, but it takes much more time to execute!
Yes, I wasn’t suggesting you make your tree with it. I was just saying to use it to apply a midpoint root to the existing tree.
In terms of speed you probably won’t beat FastTree (hence the name). You might want to revisit your alignment process though, as a bad alignment can slow down tree creation, and I’m not convinced your alignment is optimal.
What tool do you suggest for alignments? I used MUSCLE to generate the file that I shared the link to!
I have also noticed another problem: FastTree is not keeping the entire sequence names from the FASTA headers. For example, if the FASTA header in my multiple sequence alignment is 'D08_k99_412321_1342_1701_+', then in the Newick output file it becomes 'D08_k99_412321_1342',
or if the FASTA header in my multiple sequence alignment is 'D13_k99_152816_1343_1711_+', then it becomes 'D13_k99_152816_1343' in the Newick output file, etc. I don't understand why FastTree is not keeping the entire sequence header, even though there are no restricted characters and no spaces in my FASTA IDs, just underscores, which the documentation encourages using instead of space characters.
Many phylogenetic tools have a length limit on the sequence names they allow (don't ask me why). Just make sure that the names stay unique.
As lieven says, many phylo tools impose limits. I'm not sure what the limit is in the case of FastTree, but for example, PHYLIP strict format has a hard limit of 10 characters.
Yes, the PHYLIP tools are constraining the names. I will have to use a different tool, since I need those sequence headers to appear in full in the Newick file.
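One common workaround for name limits is to swap the headers for short unique IDs before building the tree and to put the full names back into the Newick output afterwards. Here is a minimal sketch; both function names and the `s000001`-style ID scheme are assumptions for illustration, and it assumes the original headers contain no characters that are special in Newick (parentheses, commas, colons, semicolons, quotes, whitespace):

```python
import re


def shorten_fasta_names(fasta_text, prefix="s"):
    """Replace each FASTA header with a short unique ID.

    Returns the rewritten FASTA text and a dict mapping short ID -> full name.
    """
    id_to_name = {}
    out_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            short_id = f"{prefix}{len(id_to_name) + 1:06d}"
            id_to_name[short_id] = line[1:].strip()
            out_lines.append(">" + short_id)
        else:
            out_lines.append(line)
    return "\n".join(out_lines) + "\n", id_to_name


def restore_newick_names(newick, id_to_name, prefix="s"):
    """Put the full names back wherever a short ID appears in the Newick string."""
    pattern = re.compile(rf"\b{re.escape(prefix)}\d{{6}}\b")
    return pattern.sub(lambda m: id_to_name.get(m.group(0), m.group(0)), newick)


fasta = ">D08_k99_412321_1342_1701_+\nACGT\n>D13_k99_152816_1343_1711_+\nAC-T\n"
short_fasta, mapping = shorten_fasta_names(fasta)
# The Newick string below is made up; in practice it comes from the tree builder.
restored = restore_newick_names("(s000001:0.1,s000002:0.2);", mapping)
print(restored)
```

Because the IDs are generated from a counter, they stay unique regardless of how similar the original headers are.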
Concerning my alignment, what tool should I use to align my sequences, since you mentioned above that my alignment doesn't seem to be optimal?
MUSCLE is a solid choice, so you can probably proceed as you are, but it would be scientifically rigorous to compare the alignments you get from other tools. MAFFT is also pretty speedy, so maybe give that a try and just see what you get.
Thank you for your directions. I'll also align it with MAFFT and see what I get.