Building phylogeny in MEGA 6
9.7 years ago

I am trying to build an ML phylogeny in MEGA 6 using COI gene sequences that vary in length from around 550 to 650 bp. The sequences are aligned. Is it best to use the "Use all sites", "Partial Deletion" or "Complete Deletion" option, both when estimating which nucleotide substitution model to use and when building the actual phylogeny? Thanks in advance.

phylogeny COI MEGA • 4.2k views
9.7 years ago
Brice Sarver ★ 3.8k

I prefer to use all of the information available. In most implementations, missing (or ambiguous) data does not contribute to the single-site likelihood. By contrast, complete deletion removes every site that has a gap or ambiguity in any sequence before the analysis is run, while partial deletion removes only the sites whose missing data exceeds a threshold (the site coverage cutoff in MEGA). Either option could really truncate your dataset, depending on how sparse it is.
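To make the difference between the three options concrete, here is a minimal Python sketch that filters the columns of a toy alignment. It is not MEGA's code: the function name `filter_sites`, the `coverage_cutoff` parameter, and the choice to treat only `-`, `?` and `N` as missing are assumptions made for illustration, and MEGA's own rules for ambiguity codes may differ.

```python
# Hypothetical helper, not part of MEGA: mimics the three site-handling options.
MISSING = set("-?N")  # assumption: only these characters count as missing


def filter_sites(alignment, mode="all", coverage_cutoff=0.95):
    """Return the alignment with columns removed according to `mode`.

    mode="all"      keeps every column
    mode="complete" keeps columns with no missing character in any sequence
    mode="partial"  keeps columns whose coverage is >= coverage_cutoff
    """
    kept = []
    for column in zip(*alignment):  # iterate over alignment columns (sites)
        present = sum(ch.upper() not in MISSING for ch in column)
        coverage = present / len(column)
        if mode == "all":
            keep = True
        elif mode == "complete":
            keep = present == len(column)
        elif mode == "partial":
            keep = coverage >= coverage_cutoff
        else:
            raise ValueError(f"unknown mode: {mode}")
        if keep:
            kept.append(column)
    if not kept:
        return ["" for _ in alignment]
    # Transpose the kept columns back into one string per sequence.
    return ["".join(chars) for chars in zip(*kept)]


# Toy alignment with ragged ends, loosely like COI reads of unequal length.
toy = [
    "ACGTACGT",
    "ACGTAC--",
    "ACGAACG-",
    "--GTACGT",
]

sites_all = len(filter_sites(toy, "all")[0])
sites_complete = len(filter_sites(toy, "complete")[0])
sites_partial = len(filter_sites(toy, "partial", coverage_cutoff=0.75)[0])
print(sites_all, sites_complete, sites_partial)
```

On this toy input, complete deletion keeps only the four columns in which every sequence has a base, while partial deletion with a 75% coverage cutoff drops just the last column, where half of the sequences are missing.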

Regardless of what you select, you need to use the same approach for estimating the model and estimating the phylogeny. This is important; the best-fit model of nucleotide sequence evolution might change once you remove sites.

You can usually perform more rigorous phylogenetic inference outside of MEGA using GARLI, MrBayes, BEAST, etc. It might be something to consider if you're so inclined.


Thanks for the input. I also prefer to use all information available. So would you recommend using all sites (if it were you)? I'll be using MrBayes afterwards too, which I'm more familiar with.


I would, because even if there is a high percentage of missing data at a site, there may still be some information in it. Why exclude it? There is precedent for this in the literature, too: huge gene-by-taxa supermatrices with > 90% missing data still resolve deep splits because the information content is there.
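The idea that a missing tip neither adds nor removes information at a site can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how MEGA or MrBayes is implemented: it uses the JC69 model, a star tree with all tips attached directly to the root, and hypothetical function names. A missing tip gets a partial likelihood of 1 for every state, so its branch contributes a factor of 1 and the site likelihood is determined by the remaining tips.

```python
import math

STATES = "ACGT"
MISSING = "-?N"  # assumption: characters treated as fully missing


def jc69_prob(i, j, t):
    """JC69 probability of state i changing to state j along branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def tip_partials(ch):
    """Partial likelihoods at a tip: all ones if missing, else an indicator."""
    if ch in MISSING:
        return [1.0] * 4
    if ch not in STATES:
        raise ValueError(f"unsupported character: {ch}")
    return [1.0 if s == ch else 0.0 for s in STATES]


def star_site_likelihood(tips, branch_lengths):
    """Likelihood of one site on a star tree (every tip joined to the root)."""
    total = 0.0
    for root in STATES:
        term = 0.25  # equal base frequencies under JC69
        for ch, t in zip(tips, branch_lengths):
            partial = tip_partials(ch)
            term *= sum(jc69_prob(root, s, t) * p
                        for s, p in zip(STATES, partial))
        total += term
    return total


# Three tips, the third one missing at this site ...
with_missing = star_site_likelihood("AC?", [0.1, 0.2, 0.3])
# ... versus the same site with that tip pruned from the tree.
pruned = star_site_likelihood("AC", [0.1, 0.2])
print(with_missing, pruned)
```

The two values agree up to floating-point rounding, which is why a column with many missing entries can still be scored from whichever sequences do have data there.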


Dear Brice: thank you for this insight about "using all sites" and the maximum phylogenetic information. Would you be able to give me references with which it would be possible to defend such an approach? Many thanks, Leonhard
