Hi,
I estimated a phylogenetic tree from the partial mitogenomes of around 100 taxa, using both maximum likelihood (RAxML) and Bayesian inference (BEAST2).
The topology is the same with both methods. I ran both without partitioning the mitogenomes. I then performed a selection analysis on the mitogenomes using PAML
and calculated pairwise dN/dS ratios. I noticed that some of the mitogenomes appear to be under negative selection and some under positive selection. My
question is: how do I know which genes in the mitogenomes are under selection? Is it important to perform the analysis using partitions?
Are there any tools that can split a multiple sequence alignment file into partitions?
Suggestions please.
Alright,
Making partitions manually is doable for a few samples. How about ~100 taxa?
For genomes it was straightforward to predict orthologs and then run the inference. For mitogenomes it has been a little more difficult.
Partitions are defined by proteins/genes (by individual sequence alignments), not by taxa. I don't know of any way to delineate them except manually. That means if I have a protein A with 5000 taxa, I make an alignment of all those sequences and count the number of aligned columns. Let's say there are 200 columns in the alignment of protein A, and another 150 in protein B. When I concatenate them into a single alignment, which is now 350 residues wide (and 5000 taxa long), my two partitions will be 1-200 and 201-350. It doesn't matter how many taxa I have, as the partitions are defined by the individual alignments that make up the concatenated alignment.
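The column bookkeeping described above is easy to script, so it does not have to be done by hand. Here is a minimal Python sketch, assuming each gene has already been aligned separately and loaded as a dictionary of taxon name to aligned sequence. The function names (partition_ranges, concatenate, format_partitions) are made up for illustration and are not part of any package. The output lines follow the RAxML-style "TYPE, name = start-end" layout; check what your own program expects before using them.

```python
def partition_ranges(gene_lengths):
    """Turn [(gene, n_columns), ...] into [(gene, start, end), ...].

    Coordinates are 1-based and inclusive, in concatenation order.
    """
    ranges = []
    start = 1
    for name, length in gene_lengths:
        end = start + length - 1
        ranges.append((name, start, end))
        start = end + 1
    return ranges


def concatenate(alignments):
    """Concatenate per-gene alignments into one supermatrix.

    alignments: list of (gene_name, {taxon: aligned_sequence}).
    Taxa missing from a gene are padded with gaps ('-') for that gene.
    Returns ({taxon: concatenated_sequence}, partition ranges).
    """
    taxa = sorted({taxon for _, aln in alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    gene_lengths = []
    for name, aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{name}: sequences are not aligned to equal length")
        width = widths.pop()
        gene_lengths.append((name, width))
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, "-" * width))
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partition_ranges(gene_lengths)


def format_partitions(ranges, datatype="DNA"):
    """Format ranges as 'DNA, gene = start-end' lines (datatype is an assumption)."""
    return "\n".join(f"{datatype}, {name} = {start}-{end}"
                     for name, start, end in ranges)


# The two-protein example from the text: 200 columns, then 150 columns.
print(format_partitions(partition_ranges([("proteinA", 200), ("proteinB", 150)]),
                        datatype="LG"))
```

For the two-protein example this gives the ranges 1-200 and 201-350, regardless of whether there are 5 or 5000 taxa in each alignment.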
Just to add to this - one convenient option for automated partition generation is the pxcat program in the Phyx tools suite: https://github.com/FePhyFoFum/phyx
In the example above there is an alignment that is 2174 residues wide, and it is divided into 16 partitions - one for each of the proteins that were concatenated together. It shows you that partitions #2, #11 and #15 are assigned the same evolutionary model (LG substitution matrix, 4 categories of the gamma-shaped rate distribution), while the other partitions are assigned different matrix and gamma combinations.