Aligning DNA sequences in MEGA
3.0 years ago
sushi ▴ 10

I am trying to align matK sequences in MEGA using the MUSCLE algorithm and UPGMA clustering method.

I observed a lot of gaps after the alignment. Would it be better to remove them manually, or should I just retain them? I would also appreciate any tips or a guide on how to trim gaps in the aligned sequences.

Lastly, I would also like to know how removing or retaining the gaps will affect the phylogenetic tree that I will be constructing.

Thank you very much!

MUSCLE UPGMA MSA DNA MEGA • 2.5k views
3.0 years ago
Michael 55k

In an MSA, gaps are introduced, for example, when the sequences differ in length or contain insertions or deletions. Gaps are normal to some extent. E.g., if one sequence contains a novel protein domain that the others do not, then all other sequences have gaps in these columns. Also, if the sequences are of different lengths, there will likely be gaps at the ends. The first thing to do is check that your sequences are correct, especially if you are aligning DNA. Did you get the cDNA sequences right, or do some contain intronic sequences? If introns are retained, try to remove them, or if you cannot get a correct cDNA, remove the whole sequence. You can also consider removing individual sequences that cause unusually many or large gaps; they might be outliers, and dropping them may reduce the number of gaps.
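To spot sequences that contribute an unusual share of the gaps, something like the following rough Python sketch could help. It assumes the alignment has been exported from MEGA as aligned FASTA; the toy records (`taxonA`, `taxonB`, `taxonC`) and the function names are made up for illustration, and this is not a MEGA feature.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def gap_fraction_per_sequence(alignment):
    """Return the fraction of '-' characters in each aligned sequence."""
    return {
        name: seq.count("-") / len(seq)
        for name, seq in alignment.items()
        if seq
    }


# Hypothetical toy alignment; replace with the contents of your own file.
example = """>taxonA
ATGCCGTTAGGA
>taxonB
ATGCCGTTAGGA
>taxonC
ATG------GGA
"""

aln = read_fasta(example)
fractions = gap_fraction_per_sequence(aln)

# List the most gap-rich sequences first.
for name, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}\t{frac:.2f}")
```

A sequence that sits far above the others in this list is worth inspecting by eye before deciding whether to drop it.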

Do gaps or gap-removal affect phylogenetic reconstruction? Likely yes, but it is not always clear in what way or if the removal of gaps or trimming of alignments makes things better or worse.
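If you want to test the effect yourself, one simple form of trimming is to drop alignment columns in which more than a chosen fraction of sequences have a gap, then build trees from both the trimmed and untrimmed versions and compare. Below is a minimal Python sketch of that idea. The 0.5 threshold, the function name, and the toy sequences are arbitrary assumptions for illustration, not a recommended setting.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop columns whose gap fraction exceeds max_gap_fraction.

    alignment: dict of name -> aligned sequence (all the same length).
    """
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # Keep a column only if its share of gaps is within the threshold.
    keep = [
        i
        for i in range(length)
        if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}


# Hypothetical toy alignment.
toy = {
    "taxonA": "ATGCC-GTTA",
    "taxonB": "ATG---GTTA",
    "taxonC": "ATG-C-GTTA",
}

trimmed = trim_gappy_columns(toy, max_gap_fraction=0.5)
for name, seq in trimmed.items():
    print(name, seq)
```

Here the all-gap column and the column with gaps in two of three sequences are removed, while the column with a single gap is kept.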

The good thing about MEGA is that you can simply try it out interactively. I suspect that gaps affect simpler methods like UPGMA, neighbor-joining, and parsimony more, whereas more state-of-the-art methods like ML and Bayesian inference will be more robust. For those, in my experience it is better to simply leave the alignment alone.

The downside of MEGA is that it doesn't contain many state-of-the-art methods. Stand-alone aligners like MAFFT and T-Coffee are possibly superior to MUSCLE, at least in some cases and with some settings, and tree-building tools such as IQ-TREE, RAxML, and MrBayes are more exact and often much faster than what is built into MEGA. So I would recommend trying things out in MEGA, but then using more advanced tools to try to get the optimal phylogeny.
Every case is likely going to be different and, unfortunately, may require a lot of trial and error to tune.


Thank you very much for answering my questions, Sir! I will take note of these and will try to apply them to my undergrad thesis.

I tried constructing an NJ tree with the alignment left as is. However, some nodes have bootstrap values lower than 70. Is this okay, or should I aim for higher bootstrap values?


NJ is only for doing a quick initial screen, or for generating an initial tree for ML, but not for the final analysis. It was maybe OK 20 years ago, before the more advanced methods were around and when computational power was limited. My recommendation is to try IQ-TREE with at least 1000 ultrafast bootstrap replicates and let it determine the optimal substitution model. BS values below 70% are common, I think; in any case, they might vary widely depending on the method and substitution model.
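As a rough sketch of what such a run could look like, the Python snippet below only assembles and prints an IQ-TREE 2 command line. It assumes the executable is called `iqtree2` and uses a made-up alignment file name, `matK_aligned.fasta`; `-m MFP` asks ModelFinder to pick the substitution model and `-B` sets the number of ultrafast bootstrap replicates (IQ-TREE 1.x uses `-bb` instead).

```python
import shlex


def build_iqtree_command(alignment_path, bootstrap_replicates=1000,
                         executable="iqtree2"):
    """Assemble an IQ-TREE 2 command with model selection and UFBoot."""
    if bootstrap_replicates < 1000:
        # IQ-TREE expects at least 1000 ultrafast bootstrap replicates.
        raise ValueError("use at least 1000 ultrafast bootstrap replicates")
    return [
        executable,
        "-s", alignment_path,             # input alignment
        "-m", "MFP",                      # ModelFinder, then tree inference
        "-B", str(bootstrap_replicates),  # ultrafast bootstrap replicates
    ]


# Hypothetical file name; point this at your own exported alignment.
cmd = build_iqtree_command("matK_aligned.fasta")
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be passed to `subprocess.run` if IQ-TREE is installed and on your PATH.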


This is noted, Sir. I will try performing these in IQ-TREE as well.

Thank you very much, your help is highly appreciated. Have a great day ahead!
