I am fairly new to bioinformatics and have been trying to build a phylogeny for approximately 100 divergent bacteria.
My approach was a MUSCLE alignment followed by a simple maximum likelihood tree. This got most things spot on, but there were a few obvious errors.
A friend told me about Gblocks, and after running my alignment through it, sure enough, I got a better tree.
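To make clear what I mean by curating the alignment: as I understand it, the idea is to drop poorly aligned columns before building the tree. Below is a minimal Python sketch of that idea using a plain gap-fraction cutoff. This is not Gblocks' actual algorithm (Gblocks selects conserved blocks using several criteria); the function name `trim_gappy_columns`, the 0.5 threshold, and the toy sequences are all assumptions made up for illustration.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    `alignment` maps a sequence name to its aligned sequence (gaps as '-').
    Hypothetical helper for illustration; NOT the Gblocks algorithm.
    """
    if not alignment:
        return {}

    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # Keep only the column indices where gaps are not too frequent.
    keep = [
        i for i in range(length)
        if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]

    # Rebuild every sequence from the retained columns only.
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


# Toy alignment (made-up data): columns 3 and 5 (0-based) are mostly gaps.
toy = {
    "taxon_a": "ACG-TTA",
    "taxon_b": "ACGGT-A",
    "taxon_c": "AC--T-A",
}
trimmed = trim_gappy_columns(toy, max_gap_fraction=0.5)
```

In this toy case the two columns with gaps in two of three sequences are removed, and the column with a single gap is kept.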
However, reading the literature (see the quoted passage below), it appears Gblocks has only been validated on protein alignments, with the conclusions expected to extend to protein-coding DNA.
Although we have only used protein alignments, the same conclusions are expected to apply to protein-coding DNA alignments of similar divergence. On the other hand, although we predict that the general conclusion that ambiguously aligned regions in any data set are best excluded when they provide more noise than signal, rRNA alignments as well as alignments from noncoding DNA have very different features from coding alignments, and our simulations were not specifically designed to explore the properties of these kinds of sequences. However, our purpose in this work is not giving strict rules about the best alignment strategy and associated parameters.
Essentially, what I am asking is: is curating the data with Gblocks valid for 16S rRNA? Has anyone used it this way in the literature (in a paper that I can't find)? Or am I going to have to find a new way to improve my data?
All opinions appreciated.