Question

Gblocks for 16S rRNA Phylogenetics?

12.8 years ago

Brett ▴ 150

I am pretty new to bioinformatics and have been trying to build a phylogeny of approximately 100 divergent bacteria from their 16S rRNA sequences.

My approach was to align the sequences with MUSCLE and then simply build a maximum likelihood tree. This got most things spot on, but there were a few obvious errors.

A friend told me about Gblocks, and sure enough, after running my alignment through it I got a better tree.
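For anyone wanting to reproduce the MUSCLE-then-Gblocks step, here is a minimal Python sketch that only builds the command lines. It assumes MUSCLE 3.8-style flags (`-in`/`-out`; MUSCLE 5 uses `-align`/`-output` instead) and the Gblocks 0.91b syntax, where `-t=d` declares a DNA alignment (the default is protein) and `-b5` sets the allowed gap positions (n = none, h = half, a = all). The function and file names are hypothetical.

```python
import shlex


def muscle_cmd(seqs_fasta: str, aln_fasta: str) -> list[str]:
    # MUSCLE 3.8 syntax (assumption); MUSCLE 5 uses -align/-output.
    return ["muscle", "-in", seqs_fasta, "-out", aln_fasta]


def gblocks_cmd(aln_fasta: str, allowed_gaps: str = "h") -> list[str]:
    # -t=d: treat the alignment as DNA rather than protein.
    # -b5: allowed gap positions, n = none, h = half, a = all.
    if allowed_gaps not in ("n", "h", "a"):
        raise ValueError("allowed_gaps must be 'n', 'h' or 'a'")
    return ["Gblocks", aln_fasta, "-t=d", f"-b5={allowed_gaps}"]


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run to execute it.
    for cmd in (muscle_cmd("16s.fasta", "16s_aln.fasta"),
                gblocks_cmd("16s_aln.fasta")):
        print(shlex.join(cmd))
```

By default Gblocks writes the trimmed alignment next to its input with `-gb` appended to the file name (here `16s_aln.fasta-gb`), which is what would then go to the tree program.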

However, from reading the literature (see the quoted passage below), it appears Gblocks is intended only for protein and protein-coding DNA alignments.

"Although we have only used protein alignments, the same conclusions are expected to apply to protein-coding DNA alignments of similar divergence. On the other hand, although we predict that the general conclusion that ambiguously aligned regions in any data set are best excluded when they provide more noise than signal, rRNA alignments as well as alignments from noncoding DNA have very different features from coding alignments, and our simulations were not specifically designed to explore the properties of these kinds of sequences. However, our purpose in this work is not giving strict rules about the best alignment strategy and associated parameters."
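The general idea in the quote, excluding ambiguously aligned regions when they add more noise than signal, can be illustrated with a deliberately simplified filter that drops alignment columns whose gap fraction exceeds a threshold. This is not the Gblocks algorithm (Gblocks also scores conservation, flanking positions and block lengths); the function name and the 0.5 threshold are assumptions for illustration only.

```python
def drop_gappy_columns(alignment: dict[str, str],
                       max_gap_fraction: float = 0.5) -> dict[str, str]:
    """Remove columns where more than max_gap_fraction of sequences are gaps.

    Simplified stand-in for column masking; NOT the Gblocks algorithm.
    Both '-' and '.' are counted as gaps.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for i in range(length):
        gaps = sum(1 for s in seqs if s[i] in "-.")
        if gaps / len(seqs) <= max_gap_fraction:
            keep.append(i)
    # Rebuild every sequence from the retained column indices only.
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


if __name__ == "__main__":
    aln = {
        "seqA": "ACG-TTA",
        "seqB": "ACGATT-",
        "seqC": "AC--TTA",
    }
    for name, seq in drop_gappy_columns(aln).items():
        print(name, seq)
```

With the toy alignment, only the fourth column (gaps in two of three sequences) is removed, giving `ACGTTA`, `ACGTT-` and `AC-TTA`.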

Essentially, what I am asking is: is curating the data with Gblocks valid for 16S rRNA? Has anyone used it in the literature before (that I can't find)? Or am I going to have to find a new way to improve my data?

All opinions appreciated

updated 12.8 years ago by Steve ▴ 10 • written 12.8 years ago by Brett ▴ 150

score 2 · Answer 1 · 2012-02-24

I would try to improve the alignment, rather than exclude regions. My take on Gblocks is that it's designed to deal with more divergent sequences than 16S.

We have come up with the following workflow that seems to work well. It uses a secondary structure-aware method that seems to produce fewer obvious misalignments than Clustal.

1. Download a curated 16S alignment from RDP and use it with cmbuild from the Infernal suite to generate a covariance model.
2. Align the sequences of interest to the covariance model with cmalign.
3. Use Biopython's AlignIO to convert the Stockholm output to PHYLIP format.
4. Inspect, trim and, if necessary, adjust the alignment in Mesquite. The main problem area for Infernal seems to be where the end of a sequence falls near a gap in the overall alignment.
5. Generate a tree using RAxML or another program.
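The first, second and last steps of the workflow above could be scripted roughly as follows. This is a sketch under assumptions: Infernal 1.x (`cmbuild`, `cmalign`) and RAxML (`raxmlHPC`) are on PATH, the reference alignment is already in Stockholm format with a consensus secondary structure line (`#=GC SS_cons`), which `cmbuild` normally expects, and all function and file names are hypothetical. The GTRGAMMA model and the seed are example choices, not recommendations from the answer.

```python
import shlex


def cmbuild_cmd(cm_out: str, seed_sto: str) -> list[str]:
    # cmbuild <cmfile_out> <msafile>: build a covariance model from the
    # curated, structure-annotated reference alignment.
    return ["cmbuild", cm_out, seed_sto]


def cmalign_cmd(cm_file: str, seqs_fasta: str, out_sto: str) -> list[str]:
    # cmalign <cmfile> <seqfile>; -o writes the Stockholm alignment to a file.
    return ["cmalign", "-o", out_sto, cm_file, seqs_fasta]


def raxml_cmd(aln_phylip: str, run_name: str, seed: int = 12345) -> list[str]:
    # -m GTRGAMMA: GTR nucleotide model with gamma rate heterogeneity;
    # -p: seed for the parsimony starting tree; -n: suffix for output files.
    return ["raxmlHPC", "-s", aln_phylip, "-n", run_name,
            "-m", "GTRGAMMA", "-p", str(seed)]


if __name__ == "__main__":
    steps = [
        cmbuild_cmd("16s.cm", "rdp_reference.sto"),
        cmalign_cmd("16s.cm", "my_16s.fasta", "my_16s.sto"),
        # (convert my_16s.sto to my_16s.phy and inspect it in between)
        raxml_cmd("my_16s.phy", "16s_ml"),
    ]
    for cmd in steps:
        # To execute instead of printing: subprocess.run(cmd, check=True)
        print(shlex.join(cmd))
```

Keeping each step as an argument list makes it easy to log exactly what was run and to stop between alignment and tree building for the manual inspection step.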

RDP: http://rdp.cme.msu.edu/misc/resources.jsp
Infernal: http://infernal.janelia.org/
Mesquite: http://mesquiteproject.org/mesquite/mesquite.html
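For the Stockholm-to-PHYLIP conversion step in the workflow above, Biopython does it in one call: `AlignIO.convert("my_16s.sto", "stockholm", "my_16s.phy", "phylip-relaxed")`. If Biopython is not available, the stdlib-only sketch below performs a simplified version of the same conversion for a single alignment. It assumes "name sequence" lines (possibly split across interleaved blocks), ignores all `#` annotation lines, and rewrites `.` gaps (which cmalign uses in insert columns) as `-`; that last choice is an assumption so that downstream tree programs see a single gap character.

```python
EXAMPLE = """# STOCKHOLM 1.0
seqA ACG-
seqB AC.A
#=GC SS_cons <<..

seqA UU
seqB UU
#=GC SS_cons >>
//
"""


def parse_stockholm(text: str) -> dict[str, str]:
    """Parse a single Stockholm alignment into {name: aligned sequence}."""
    seqs: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the header, all #= annotations and the terminator.
        if not line or line.startswith("#") or line == "//":
            continue
        name, chunk = line.split(None, 1)
        # Interleaved blocks: append each chunk to that sequence's parts.
        seqs.setdefault(name, []).append(chunk.strip())
    return {name: "".join(parts) for name, parts in seqs.items()}


def to_relaxed_phylip(alignment: dict[str, str]) -> str:
    """Format an alignment as relaxed (space-delimited) PHYLIP text."""
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same aligned length")
    lines = [f"{len(alignment)} {lengths.pop()}"]
    for name, seq in alignment.items():
        lines.append(f"{name} {seq.replace('.', '-')}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_relaxed_phylip(parse_stockholm(EXAMPLE)), end="")
```

Relaxed PHYLIP is used here because it allows sequence names longer than the 10 characters of strict PHYLIP; RAxML reads this relaxed layout.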

Ram · Answer 2 · 2012-02-24

12.8 years ago

Steve ▴ 10

I can't say whether it is right or wrong, but it has been used for 16S in multiple papers, e.g.

updated 5.2 years ago by Ram 44k • written 12.8 years ago by Steve ▴ 10

score 0 · Answer 3 · 2012-02-23

12.8 years ago

Joseph Hughes ★ 3.0k

For rRNA, you might want to look into the profile alignment options in MAFFT or Clustal. You will need a reliable profile to align to; the Greengenes Core Set might be useful for that. You might also find NAST multiple alignment useful.
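As one concrete example of profile-style alignment, MAFFT's `--add` option aligns new sequences to an existing alignment, and `--keeplength` keeps the reference columns fixed by deleting insertions in the added sequences. The sketch assumes a MAFFT build with these options on PATH; the reference file (for example, an aligned Greengenes core-set subset), the helper names and the file names are hypothetical. MAFFT writes the alignment to stdout, so the helper redirects it to a file.

```python
import subprocess


def mafft_add_cmd(new_seqs: str, reference_aln: str,
                  keep_length: bool = True) -> list[str]:
    cmd = ["mafft", "--add", new_seqs]
    if keep_length:
        # Keep the reference alignment's columns unchanged; insertions in the
        # added sequences relative to the reference are deleted.
        cmd.append("--keeplength")
    cmd.append(reference_aln)
    return cmd


def run_mafft_add(new_seqs: str, reference_aln: str, out_path: str) -> None:
    # MAFFT prints the combined alignment to stdout; capture it in a file.
    cmd = mafft_add_cmd(new_seqs, reference_aln)
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


if __name__ == "__main__":
    print(" ".join(mafft_add_cmd("my_16s.fasta", "core_set_aligned.fasta")))
```

Fixing the reference columns means every new sequence is placed into the same coordinate system as the curated profile, which is the point of aligning to a reliable reference rather than realigning everything from scratch.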
