Should I Strip Alignment Columns Before Making A Tree?
5
11
13.7 years ago
John ▴ 790

After alignment I have the option to keep or strip alignment columns. Should I strip them before making a tree, either in general or specifically for MrBayes or RAxML? I've checked the manual for each program and can't find the answer, and I have no intuition for whether this matters.

phylogenetics tree alignment • 9.2k views
7
13.7 years ago

I distinguish two scenarios:

  • a region with gaps in some sequences but a good alignment overall: in this case I wouldn't remove it. MrBayes will simply treat the gaps as missing data, but the region still carries some information that you could potentially code in binary form as presence/absence of each particular gap.

  • a region with overall poor alignment and no phylogenetic inertia: then I would remove it. I use visual inspection for this rather than Gblocks (which is a good option for pipelines, though).
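
The presence/absence coding mentioned in the first scenario could look roughly like the sketch below. It is a simplified illustration, not the method of any particular program: every distinct gap (same start and end column) becomes one binary character, and nested or overlapping gaps are treated as independent characters, which more complete indel-coding schemes handle differently. The function names and the toy alignment are hypothetical.

```python
def gap_intervals(seq, gap="-"):
    """Return (start, end) pairs, end exclusive, for each run of gap characters."""
    intervals = []
    start = None
    for i, ch in enumerate(seq):
        if ch == gap and start is None:
            start = i                      # a gap run opens here
        elif ch != gap and start is not None:
            intervals.append((start, i))   # the gap run closed at the previous column
            start = None
    if start is not None:                  # gap run reaching the end of the sequence
        intervals.append((start, len(seq)))
    return intervals


def code_gaps_binary(alignment):
    """Code each distinct gap as a presence (1) / absence (0) character.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Returns (sorted list of distinct gaps, dict name -> string of 0/1).
    """
    per_seq = {name: set(gap_intervals(seq)) for name, seq in alignment.items()}
    all_gaps = sorted(set().union(*per_seq.values()))
    matrix = {
        name: "".join("1" if g in gaps else "0" for g in all_gaps)
        for name, gaps in per_seq.items()
    }
    return all_gaps, matrix


# Toy alignment (made up for illustration).
aln = {
    "taxonA": "ATG---CCGT",
    "taxonB": "ATG---CCGT",
    "taxonC": "ATGAAACCGT",
    "taxonD": "ATGAA-CC-T",
}
gaps, matrix = code_gaps_binary(aln)
for name, row in matrix.items():
    print(name, row)
```

The resulting 0/1 rows could then be appended to the sequence data as a separate binary partition, if the tree program in use supports mixed data types.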

3

I do not recommend Gblocks, either. It is overly stringent. When there are distant homologs, it leaves almost nothing. In the end, I developed something myself for my pipeline.

2

I recommend using a program for masking wherever possible, except in cases where it obviously goes wrong, purely from a reproducibility standpoint. But I wouldn't use Gblocks; it doesn't really seem to do a very good job. One option to check out is a program called MANUEL (http://www.ncbi.nlm.nih.gov/pubmed/19770262).

GUIDANCE (http://nar.oxfordjournals.org/content/early/2010/05/23/nar.gkq443) is another option, as is the already mentioned trimAl.

1

I might add that I always use Gblocks with settings that are considerably more relaxed than the default. I agree that the default is too stringent.

6
13.7 years ago
Spitshine ▴ 660

Don't forget to start with a good alignment: I prefer MUSCLE over ClustalW for more conservative alignments and better block structures. If you can, use hmmalign and suitable HMMs.

I find Gblocks overly conservative, and removing all gaps definitely removes information that ML methods can use. My sample size is one (elongation factors 1 and 2 across archaea, bacteria and eukaryotes), but I would expect to find more cases if I started looking for them. trimAl is a softer, good alternative.

It also depends on your type of question. If you need a tree of protein sequences, e.g. to define orthologous groups, being conservative and using Gblocks is probably OK. If you are gagging for a phylogenetic signal to place your organism of choice, I would go with manual selection, guided by trimAl or Gblocks.

2

+1 for the comment on Gblocks and mentioning trimAl.

0

+1 for the comment. It is essential to have a "fairly good" alignment to start with. By simple visual inspection, you can spot particular sequences that are obviously unrelated to the rest of the alignment. Once you have removed such sequences with reasonable justification, you can use trimming programs to get an idea of how much the gaps contribute to the phylogenetic noise in your data. I think trimAl is a good alternative and very flexible for this kind of noise-to-signal analysis.
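
As a rough complement to visual inspection, sequences that look unrelated to the rest could be flagged by their mean pairwise identity to all other sequences. This is only a sketch under assumptions: the 0.5 cutoff, the function names and the toy alignment are all hypothetical, identity is counted only over columns where neither sequence has a gap, and a low score is a prompt to look at the sequence, not a justification for removing it.

```python
def pairwise_identity(a, b, gap="-"):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue                 # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def mean_identity_per_sequence(alignment):
    """Mean identity of each sequence to every other sequence in the alignment."""
    names = list(alignment)
    result = {}
    for name in names:
        scores = [
            pairwise_identity(alignment[name], alignment[other])
            for other in names
            if other != name
        ]
        result[name] = sum(scores) / len(scores) if scores else 0.0
    return result


def flag_outliers(alignment, min_mean_identity=0.5):
    """Names of sequences whose mean identity falls below the (assumed) cutoff."""
    scores = mean_identity_per_sequence(alignment)
    return sorted(n for n, s in scores.items() if s < min_mean_identity)


# Toy protein alignment (made up for illustration).
aln = {
    "s1": "MKTAYIAKQR",
    "s2": "MKTAYIAKQR",
    "s3": "MKTAYLAKQR",
    "odd": "GWPLHC-DNE",
}
print(flag_outliers(aln))
```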

0

In my opinion, Gblocks works well for highly conserved alignments but poorly for distantly related sequences.

2
13.7 years ago
lh3 33k

I do not know what the best practice is, but I trim poorly aligned columns and find that the quality of the resulting tree improves on average (although trimming sometimes leads to worse results). I used NJ and PhyML.

2
13.4 years ago
scapella ▴ 390

Hi,

I agree with Carlos about the possible scenarios. Most programs for phylogenetic reconstruction ignore gap-rich columns, but columns where it is not clear whether they are well aligned can confuse the analysis and bias your final tree toward a wrong topology.

As has been said, I'd recommend using an available program, such as Gblocks or trimAl, to detect and remove gap-rich columns in a systematic manner.

For instance, you could try one of the automated options in trimAl if you are not sure which parameters suit your data. The program infers the parameters using only gaps, gaps and similarity scores, etc. If you have a clear idea of which parameters fit your case, you can always use the manual options in trimAl.
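
A minimal sketch of how the automated and manual trimAl options could be driven from a script is given below. It assumes a `trimal` executable on the PATH that accepts the `-in`, `-out`, `-gappyout`, `-strict`, `-strictplus`, `-automated1` and `-gt` options; check `trimal -h` for your installed version before relying on it. The helper names and file names are hypothetical.

```python
import shutil
import subprocess

# Automated modes assumed to be supported by the installed trimAl build.
AUTOMATED_MODES = {"gappyout", "strict", "strictplus", "automated1"}


def build_trimal_command(infile, outfile, mode="automated1", gap_threshold=None):
    """Build a trimAl command line as a list of arguments.

    If gap_threshold is given, use the manual -gt option instead of an
    automated mode (assumption: -gt takes a value between 0 and 1).
    """
    cmd = ["trimal", "-in", infile, "-out", outfile]
    if gap_threshold is not None:
        if not 0.0 <= gap_threshold <= 1.0:
            raise ValueError("gap_threshold must be between 0 and 1")
        cmd += ["-gt", str(gap_threshold)]
    elif mode in AUTOMATED_MODES:
        cmd.append("-" + mode)
    else:
        raise ValueError("unknown automated mode: " + repr(mode))
    return cmd


def run_trimal(infile, outfile, **kwargs):
    """Run trimAl if it is installed; raise a clear error otherwise."""
    if shutil.which("trimal") is None:
        raise FileNotFoundError("trimal executable not found on PATH")
    subprocess.run(build_trimal_command(infile, outfile, **kwargs), check=True)


# Print the commands without running them (file names are placeholders).
print(" ".join(build_trimal_command("aln.fasta", "aln.auto.fasta")))
print(" ".join(build_trimal_command("aln.fasta", "aln.gt.fasta", gap_threshold=0.8)))
```

Keeping the command construction separate from the call to `subprocess.run` makes it easy to log the exact command used, which helps with the reproducibility concern raised earlier in the thread.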

I think removing all columns with at least one gap is not a good idea, since many columns with a few gaps carry a lot of phylogenetic signal that can be crucial for your final tree in terms of topology, support, branch lengths, etc.
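
To see how much is lost by stripping every gapped column compared with a more relaxed cutoff, a small sketch like the following can help. It is an illustration only, not the algorithm of trimAl or Gblocks; the function names, the cutoffs and the toy alignment are hypothetical.

```python
def column_gap_fractions(seqs, gap="-"):
    """Fraction of sequences that have a gap in each alignment column."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    n = len(seqs)
    return [sum(s[i] == gap for s in seqs) / n for i in range(length)]


def trim_columns(seqs, max_gap_fraction):
    """Keep only columns whose gap fraction is at most max_gap_fraction."""
    fractions = column_gap_fractions(seqs)
    keep = [i for i, f in enumerate(fractions) if f <= max_gap_fraction]
    return ["".join(s[i] for i in keep) for s in seqs]


# Toy alignment (made up for illustration).
seqs = [
    "ATG-CCGT",
    "ATGACC-T",
    "ATGACCGT",
    "A--ACCGT",
]

strict = trim_columns(seqs, 0.0)    # drop every column with at least one gap
relaxed = trim_columns(seqs, 0.5)   # tolerate gaps in up to half the sequences

print(len(strict[0]), "columns kept when all gapped columns are removed")
print(len(relaxed[0]), "columns kept with the relaxed cutoff")
```

In this toy case the strict filter discards half of the columns, each of which had a gap in only one of the four sequences.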

1
13.7 years ago

I would always use a program such as Gblocks to remove poorly aligned columns before constructing maximum-likelihood trees. The main reason for this is that the probabilistic models generally do not deal well with gaps. Also, regions with many gaps are the ones most likely to contain alignment errors, for which reason it is desirable to eliminate them.

0

No, I do not recommend Gblocks. It is overly stringent.
