I have a concatenated alignment of candidate phyla ribosomal proteins [4 markers]. I built it by generating each alignment separately and then simply concatenating the aligned strings for each organism. Anyway, I've been doing some digging, and most sources say that UPGMA is a terrible method for phylogenetic reconstruction.
I'm attaching the neighbor-joining and UPGMA results side by side. The UPGMA tree makes MUCH more sense in terms of taxonomy: similar taxa are next to each other and the outgroups are off on their own. This is not the case with the neighbor-joining tree.
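For anyone following along, here is a minimal sketch of that concatenation step in Python. It assumes each marker alignment is already loaded as a dict of organism name to aligned sequence; the function name `concatenate_alignments`, the organism labels, and the toy sequences are made up for illustration. One thing plain string concatenation can get wrong is an organism missing from one marker, so this version pads those with gaps.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-marker alignments into one supermatrix.

    alignments: list of dicts mapping organism name -> aligned sequence.
    Organisms missing from a marker are padded with gap characters so
    every concatenated row ends up the same length.
    """
    taxa = sorted({name for aln in alignments for name in aln})
    parts = {name: [] for name in taxa}
    for index, aln in enumerate(alignments):
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"marker {index} is not aligned: lengths {sorted(widths)}")
        width = widths.pop()
        for name in taxa:
            parts[name].append(aln.get(name, gap * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Hypothetical toy input: two markers, three organisms.
markers = [
    {"orgA": "MK-L", "orgB": "MKIL"},
    {"orgA": "AAG", "orgC": "A-G"},
]
supermatrix = concatenate_alignments(markers)
for name, seq in supermatrix.items():
    print(f">{name}\n{seq}")
```

If every organism is present in every marker, this gives the same result as gluing the strings together by hand.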
Why is neighbor-joining better and why should I not trust the UPGMA results even though they are more consistent with a priori taxonomy assignments?
To my knowledge both methods are pretty poor. Maximum likelihood approaches are usually more accurate, if a little slower.
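On the UPGMA vs neighbor-joining part of the question: UPGMA assumes all lineages evolve at the same rate (a molecular clock), while neighbor-joining does not. Below is a rough sketch showing only the first pairing step of each method on a toy distance matrix. The distances are invented for illustration, not taken from the attached trees: they come from a tree ((A,B),(C,D)) with an internal branch of 1, short branches (1) leading to A and C, and long branches (5) leading to B and D.

```python
from itertools import combinations

# Toy pairwise distances (assumed, for illustration only).
# Generating tree: ((A,B),(C,D)); A and C evolve slowly, B and D quickly.
DIST = {
    ("A", "B"): 6, ("A", "C"): 3, ("A", "D"): 7,
    ("B", "C"): 7, ("B", "D"): 11, ("C", "D"): 6,
}
TAXA = ["A", "B", "C", "D"]


def d(x, y):
    """Look up the symmetric distance between two taxa."""
    return 0 if x == y else DIST[tuple(sorted((x, y)))]


def upgma_first_join(taxa):
    """UPGMA joins the pair with the smallest raw distance."""
    return min(combinations(taxa, 2), key=lambda pair: d(*pair))


def nj_first_join(taxa):
    """Neighbor-joining joins the pair that minimises the Q criterion,
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)."""
    n = len(taxa)
    row_sum = {t: sum(d(t, u) for u in taxa) for t in taxa}

    def q(pair):
        i, j = pair
        return (n - 2) * d(i, j) - row_sum[i] - row_sum[j]

    return min(combinations(taxa, 2), key=q)


print("UPGMA joins first:", upgma_first_join(TAXA))  # ('A', 'C')
print("NJ joins first:   ", nj_first_join(TAXA))     # ('A', 'B')
```

UPGMA pairs the two slow-evolving taxa (A with C) even though they are not sisters in the generating tree, whereas the Q criterion corrects for the unequal rates and recovers A with B. So a UPGMA tree can look tidy while really reflecting overall similarity rather than shared ancestry.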
I was experimenting with maximum likelihood trees in the CLC GUI, and it looks like they need a starting tree, which can be generated with either neighbor-joining or UPGMA. Do you know of any command-line tools that can create maximum likelihood trees from UPGMA starting trees? Preferably taking a FASTA-formatted alignment as input, but it's OK if they don't.
I’m not sure about specifying a starting tree, but two of the best tree-building tools are RAxML (lots of customisation and advanced options) and IQ-TREE (still very advanced, but much more user-friendly).
Trying out both right now :) Wow, you're right about RAxML, there are so many options. I have them both running and will compare the results. IQ-TREE seems to have very interpretable output and parameters.
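On the starting-tree question, as far as I know both tools take a user-supplied Newick starting tree via a `-t` option. Here is a small Python sketch that only assembles the command lines (it does not run anything). The file names `concat.fasta` and `upgma_start.nwk`, the run name, the executable names `iqtree2` and `raxmlHPC`, and the model choices (`MFP`, `PROTGAMMAAUTO`) are assumptions for illustration; check them against your installed versions and each tool's manual.

```python
def iqtree_command(alignment, start_tree=None, exe="iqtree2"):
    """Build an IQ-TREE command: -s alignment, -m MFP (model selection),
    and optionally -t for a user-supplied starting tree."""
    cmd = [exe, "-s", alignment, "-m", "MFP"]
    if start_tree:
        cmd += ["-t", start_tree]
    return cmd


def raxml_command(alignment, run_name, start_tree=None, exe="raxmlHPC", seed=12345):
    """Build a RAxML (v8-style) command: -s alignment, -n run name,
    -m protein model, -p random seed, and optionally -t starting tree."""
    cmd = [exe, "-s", alignment, "-n", run_name,
           "-m", "PROTGAMMAAUTO", "-p", str(seed)]
    if start_tree:
        cmd += ["-t", start_tree]
    return cmd


# Hypothetical file names; swap in your own alignment and starting tree.
print(" ".join(iqtree_command("concat.fasta", "upgma_start.nwk")))
print(" ".join(raxml_command("concat.fasta", "ribo4", "upgma_start.nwk")))
```

Leaving out the starting tree lets each tool build its own, which is usually the simpler route.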