I am trying to build an ML phylogeny in MEGA 6 using COI gene sequences that vary in length from around 550 to 650 bp. The sequences are aligned. Is it best to use the "Use all sites", "Partial Deletion" or "Complete Deletion" option when estimating which nucleotide model to use and when building the actual phylogeny? Thanks in advance.
I prefer to use all the information available. Missing (or ambiguous) data does not contribute to the single-site likelihood in most implementations. That said, complete deletion removes every site that has a gap or ambiguity in any sequence before running the analysis. Partial deletion removes only the sites where the proportion of missing data exceeds a threshold you set. Either kind of deletion could really truncate your dataset, depending on how sparse it is.
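To see how much each option would cost you, here is a minimal Python sketch that counts the alignment columns kept under each treatment. It is only an illustration, not MEGA's own code: the toy alignment, the function names and the 75% cutoff are made up, and I am assuming that '-', '?' and 'N' all count as missing (other IUPAC ambiguity codes are not handled).

```python
# Characters treated as missing data (an assumption for this sketch).
MISSING = set("-?N")

def site_coverage(alignment):
    """Return, for each column, the fraction of sequences with a usable base."""
    n_seqs = len(alignment)
    return [
        sum(base.upper() not in MISSING for base in column) / n_seqs
        for column in zip(*alignment)
    ]

def sites_retained(alignment, min_coverage):
    """Count columns whose coverage is at least min_coverage (0.0 to 1.0)."""
    return sum(cov >= min_coverage for cov in site_coverage(alignment))

# Hypothetical aligned sequences of equal length, with ragged ends.
alignment = [
    "ACGTACGTAC",
    "ACGTACGT--",
    "--GTACGTA-",
    "ACGTNCGTAC",
]

all_sites = sites_retained(alignment, 0.0)    # keep every column
partial_75 = sites_retained(alignment, 0.75)  # keep columns with >= 75% coverage
complete = sites_retained(alignment, 1.0)     # keep only fully populated columns

print(all_sites, partial_75, complete)
```

With ragged sequence ends like yours, it is worth running something like this on the real alignment to see how many columns complete deletion would throw away before you commit to it.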
Regardless of what you select, you need to use the same approach for estimating the model and estimating the phylogeny. This is important; the best-fit model of nucleotide sequence evolution might change once you remove sites.
You can usually perform more rigorous phylogenetic inference outside of MEGA using Garli, MrBayes, BEAST, etc. Might be something to consider if you're so inclined.
Thanks for the input. I also prefer to use all the information available. So would you recommend using all sites, if it were you? I'll be using MrBayes afterwards too, which I'm more familiar with.
I would, because even if there is a high percentage of missing data at a site there may still be some information there, so why exclude it? There is precedent for this in the literature, too, with huge gene-by-taxa supermatrices with > 90% missing data still resolving deep splits because the information content is there.
Dear Brice: thank you for this insight about "using all sites" and the maximum phylogenetic information.
Would you be able to give me references that I could use to defend such an approach?
Many thanks, Leonhard