Question

Aligning DNA sequences in MEGA

3.0 years ago

sushi ▴ 10

I am trying to align matK sequences in MEGA using the MUSCLE algorithm with the UPGMA clustering method.

I observed a lot of gaps after the alignment. Would it be better to remove them manually, or should I just keep them? I would also appreciate any tips or a guide on how to trim gaps in the aligned sequences.

Lastly, I would also like to know how removing or retaining the gaps will affect the phylogenetic tree that I will be constructing.

Thank you very much!

MUSCLE UPGMA MSA DNA MEGA • 2.5k views

score 2 · Answer 1 · 2021-11-18

In a multiple sequence alignment (MSA), gaps are introduced when, for example, the sequences differ in length or some of them carry insertions or deletions. Some gaps are normal. For example, if one sequence contains a novel protein domain that the others lack, all the other sequences will have gaps in those columns. Likewise, if the sequences are of different lengths, there will likely be gaps at the ends. The first thing to do is check that your sequences are correct, especially if you are aligning DNA. Did you get the cDNA sequences right, or do some contain intronic sequence? If introns have been retained, try to remove them, or, if you cannot obtain a correct cDNA, remove the whole sequence. You can also consider removing individual sequences that cause unusually many or unusually large gaps: they may be outliers, and dropping them may reduce the number of gaps.
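One rough way to spot sequences that force gaps into everything else is to count, for each sequence, the columns where it is the only one with a residue. The sketch below is only an illustration: the function name `private_insertion_counts` and the toy alignment are made up for this example, and it assumes the alignment is already loaded as a dictionary of equal-length strings with `-` as the gap character.

```python
def private_insertion_counts(alignment):
    """For each sequence, count columns where it is the only non-gap entry.

    `alignment` is assumed to map sequence names to aligned strings of
    equal length that use '-' for gaps (hypothetical input format).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    counts = {name: 0 for name in names}
    for col in range(length):
        # Names of the sequences that have a residue in this column.
        occupied = [n for n in names if alignment[n][col] != "-"]
        if len(occupied) == 1:
            counts[occupied[0]] += 1
    return counts


# Toy alignment, invented for illustration only.
toy = {
    "taxon_A": "ATG---CCGTA",
    "taxon_B": "ATG---CCGTA",
    "taxon_C": "ATGTTTCCGTA",
    "taxon_D": "ATG---CCG--",
}

for name, n_private in private_insertion_counts(toy).items():
    print(name, n_private)
```

Here taxon_C is the only sequence with residues in three columns, so it is the one worth checking (for a retained intron, for example) before deciding whether to keep it. A high count is only a hint to look closer, not proof that the sequence is wrong.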

Do gaps or gap removal affect phylogenetic reconstruction? Likely yes, but it is not always clear in what way, or whether removing gaps or trimming the alignment makes things better or worse.

The good thing about MEGA is that you can simply try it out interactively. I suspect that gaps affect simpler methods like UPGMA, neighbor-joining, and parsimony more, whereas more state-of-the-art methods like maximum likelihood (ML) and Bayesian inference will be more robust. With those, in my experience, it is better to simply leave the alignment alone.

The downside of MEGA is that it does not include many state-of-the-art methods. Stand-alone aligners like MAFFT and T-Coffee are possibly superior to MUSCLE, at least in some cases and with some settings, and tree-building programs such as IQ-TREE, RAxML, and MrBayes are more exact and often much faster than what is built into MEGA. So I would recommend trying things out in MEGA but then using some of these more advanced tools to try to get the optimal phylogeny.
Every case is likely to be different and, unfortunately, may require a lot of trial and error to tune.
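If you want to compare trees built from trimmed and untrimmed alignments outside MEGA, column trimming is simple enough to script. The following is a minimal sketch, not a replacement for a dedicated trimming tool: the helper names (`read_fasta`, `trim_gappy_columns`), the `max_gap_fraction` parameter, and the toy data are all assumptions made for this example. It drops every column in which more than a chosen fraction of sequences has a gap.

```python
def read_fasta(text):
    """Parse aligned FASTA text into {name: sequence} (minimal parser)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Keep only columns whose gap fraction is <= max_gap_fraction."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences differ in length; is this an alignment?")
    keep = [
        col
        for col in range(length)
        if sum(alignment[n][col] == "-" for n in names) / len(names)
        <= max_gap_fraction
    ]
    return {n: "".join(alignment[n][c] for c in keep) for n in names}


# Toy aligned FASTA, invented for illustration only.
aligned_text = """>taxon_A
ATG---CCGTA
>taxon_B
ATG---CCGTA
>taxon_C
ATGTTTCCGTA
>taxon_D
ATG---CCG--
"""

aln = read_fasta(aligned_text)
moderate = trim_gappy_columns(aln, max_gap_fraction=0.5)
strict = trim_gappy_columns(aln, max_gap_fraction=0.0)
for name in aln:
    print(name, moderate[name], strict[name])
```

With `max_gap_fraction=0.5` only the three columns that are mostly gaps are dropped, while `max_gap_fraction=0.0` removes every column containing any gap at all. Writing each version back to FASTA and building a tree from each is one way to see how sensitive your result is to trimming.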