Hi,
I am trying to generate a phylogenetic tree of a certain enzyme (~160 aa) which is highly conserved throughout Eukarya. I am especially interested in its evolution in Metazoa. My problem is that no matter what I do, I always end up with trees that have low bootstrap support.
I work as follows: I collected 31 amino acid sequences from 31 species representing Bilateria (mostly Lophotrochozoa and Ecdysozoa but also Deuterostomia), Cnidaria, Ctenophora, Porifera and Placozoa. I have rooted the tree with the choanoflagellate Monosiga brevicollis.
I use MAFFT-linsi for alignment. I usually also use GUIDANCE with several cutoff values to eliminate unreliable columns.
I generate the tree using the RAxML GUI with the following settings:
ML + slow bootstrap, 1000 runs, 1000 replications, PROTGAMMA with LG (or JTT) as the protein model + empirical frequencies.
Any idea what I can change/add in order to increase the bootstrap support of the tree? I am at a loss...
You could look at whether your enzyme is represented in TreeFam and try adding your sequences to the corresponding family/tree and/or try TreeBest (or its Ensembl Compara variant) with your sequences.
How many columns do you have after removing the unreliable columns?
You could also try Gblocks and trimAl for removing poorly aligned columns.
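To answer the column question quickly, here is a minimal Python sketch that counts the columns in a trimmed alignment and how many of them are actually variable. It assumes the alignment is in FASTA format; the function names, the toy sequences and the 50% gap cutoff are illustrative choices of mine, not part of GUIDANCE, Gblocks or trimAl.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of name -> aligned sequence."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def column_stats(seqs, max_gap_fraction=0.5):
    """Count total columns, columns under the gap cutoff, and variable ones.

    max_gap_fraction is an arbitrary illustrative cutoff, not a
    recommendation from any trimming tool.
    """
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; is this an alignment?")
    n_cols = lengths.pop()
    rows = list(seqs.values())
    kept = variable = 0
    for i in range(n_cols):
        column = [row[i] for row in rows]
        gap_fraction = column.count("-") / len(column)
        if gap_fraction > max_gap_fraction:
            continue  # mostly gaps: treat as dropped
        kept += 1
        residues = {c.upper() for c in column if c != "-"}
        if len(residues) > 1:
            variable += 1  # at least two different residues in this column
    return {"total": n_cols, "kept": kept, "variable": variable}


# Toy alignment, made up for illustration only.
toy = """>sp1
MKV-LA
>sp2
MKI-LA
>sp3
MRVQ-A
"""
stats = column_stats(read_fasta(toy))
print(stats)
```

If the number of variable columns is small compared with the 31 taxa, there may simply not be enough signal in a ~160 aa protein to resolve the deeper nodes, whatever the settings.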
Perhaps you could try ProtTest for model selection.
EDIT: How did you identify/collect the orthologs in the 31 species? This is very important.
Any particular reason for not using
-m PROTGAMMAAUTO
in RAxML? I am not sure about using empirical frequencies with so few sequences in the MSA. Have you curated your alignment manually? Have you tried any other alignment algorithms?
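If you want to try this outside the GUI, here is a small Python sketch that only assembles the command line (it does not run RAxML). It assumes the standard RAxML 8 binary name raxmlHPC and its -s, -n, -m, -p, -N, -b, -x and -f a options; the file name, run name and seed are placeholders I made up.

```python
import shlex


def raxml_command(alignment, run_name, model="PROTGAMMAAUTO",
                  replicates=1000, seed=12345, rapid=False):
    """Build (but do not execute) a RAxML 8 command line as a list of args.

    alignment, run_name and seed are placeholders; adjust them to your data.
    """
    cmd = [
        "raxmlHPC",
        "-s", alignment,        # input alignment
        "-n", run_name,         # suffix for the output files
        "-m", model,            # PROTGAMMAAUTO lets RAxML pick the protein matrix
        "-p", str(seed),        # seed for the parsimony starting trees
        "-N", str(replicates),  # number of runs / bootstrap replicates
    ]
    if rapid:
        # Rapid bootstrap plus ML search in a single run.
        cmd += ["-f", "a", "-x", str(seed)]
    else:
        # Standard ("slow") bootstrap; this step only produces the replicates.
        cmd += ["-b", str(seed)]
    return cmd


print(shlex.join(raxml_command("enzyme_trimmed.phy", "enzyme_auto")))
```

Note that with -b alone RAxML only generates the bootstrap trees, so the best ML tree has to come from a separate search before the support values can be drawn onto it; the GUI normally chains these steps for you.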