When Is DNA a Better Choice Than Amino Acids for Multiple Sequence Alignments and Phylogenetic Reconstruction?
13.1 years ago
jhc ★ 3.0k

I generally use protein sequences to generate MSAs for phylogenetic reconstruction of gene families. What, in your opinion, are the best criteria for deciding that switching to the DNA level would be a better choice?

Some obvious cases include alignments of genes with almost identical protein sequences. I would really like to hear about your strategies or examples.

Thanks!

alignment multiple phylogenetics • 9.0k views

Take a look at this. I think it summarizes the most important reasons: http://evolution-textbook.org/content/free/contents/ch27.html#ch27-6-3


Why would one use protein sequences for phylogenetic reconstruction when DNA is also available? You want the strongest possible phylogenetic signal, and mutations act on DNA, especially at third codon positions in exonic regions. So I think DNA contains more information for phylogenetic reconstruction.

13.1 years ago
lh3 33k

My preference is to align the protein sequences and then replace each amino acid with its original codon. We build trees from the resulting codon alignment. For deep branches you rely more on the 1st and 2nd codon positions, and for shallow branches you rely more on the 3rd position. At least for TreeFam, as I have tested, this strategy gives the best trees. There is also a review about which alignment to use when constructing trees (I cannot find it now). Its conclusion is similar: a codon alignment yields better trees for both deep and shallow branches.

Ah, you can have a look at my PhD thesis about an evaluation. Table 6.2.
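
The protein-first, codon-second strategy can be sketched in a few lines of Python. This is only an illustration, not the TreeFam pipeline: the function names and the toy sequences are made up, and it assumes every CDS is in frame, has its stop codon removed, and carries exactly one codon per aligned residue.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Replace each residue of an aligned protein with its original codon."""
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    # Assumption: in-frame CDS, stop codon already stripped,
    # so there is exactly one codon per residue.
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length does not match the number of residues")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    # A protein gap becomes three nucleotide gaps; a residue consumes one codon.
    return "".join(gap * 3 if aa == gap else next(codons) for aa in aligned_protein)


def codon_alignment(protein_msa: dict, cds_by_name: dict) -> dict:
    """Thread every CDS onto its row of the protein alignment."""
    return {name: thread_codons(row, cds_by_name[name])
            for name, row in protein_msa.items()}


# Toy data (hypothetical): two aligned proteins and their unaligned CDSs.
protein_msa = {"seqA": "MK-L", "seqB": "MKQL"}
cds_by_name = {"seqA": "ATGAAACTG", "seqB": "ATGAAGCAACTT"}

codon_msa = codon_alignment(protein_msa, cds_by_name)
for name, row in codon_msa.items():
    print(name, row)
```

Because gaps are always inserted in multiples of three, the reading frame is preserved and the 1st, 2nd and 3rd codon positions stay in register across all sequences.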


Thanks, that's useful!

13.1 years ago
fransua ▴ 390

Hi, if your sequences are not too divergent, you can use codon models directly to reconstruct your tree. PAML (http://abacus.gene.ucl.ac.uk/software/paml.html) can be useful for this. It is perhaps not the most efficient program for phylogeny, but it lets you test different codon models; have a look at the manual to find the number of degrees of freedom for each model, etc.

Another tool is GARLI (https://www.nescent.org/wg_garli/Main_Page), but I have almost no experience with it.

I think that if you can argue that synonymous substitutions are not saturated, using a codon model is the best option.

One way to check this is to compare the tree support you obtain with a codon model against the support you obtain with an amino acid model.
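
Before running both models, a rough first look at saturation is possible directly from a codon alignment. The sketch below is an assumption on my part, not something PAML or GARLI provides: it compares the uncorrected p-distance at 3rd codon positions with that at 1st+2nd positions for every sequence pair. If the 3rd-position distances level off (random sequences with equal base frequencies differ at about 0.75 of sites) while the 1st+2nd distances keep growing, synonymous sites are probably saturated. Function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites, ignoring columns with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def position_distances(codon_msa: dict) -> dict:
    """Map each sequence pair to (p-distance at pos 1+2, p-distance at pos 3)."""
    out = {}
    for (n1, s1), (n2, s2) in combinations(codon_msa.items(), 2):
        d12 = p_distance(s1[0::3] + s1[1::3], s2[0::3] + s2[1::3])
        d3 = p_distance(s1[2::3], s2[2::3])
        out[(n1, n2)] = (d12, d3)
    return out


# Toy in-frame codon alignment (hypothetical data).
codon_msa = {
    "seqA": "ATGAAACTG",
    "seqB": "ATGAAGCTT",
    "seqC": "ATGCGACTC",
}

dist = position_distances(codon_msa)
for pair, (d12, d3) in dist.items():
    print(pair, f"pos1+2={d12:.3f}", f"pos3={d3:.3f}")
```

Plotting d3 against d12 for all pairs makes a plateau easy to spot. This is only a screening step; it does not replace comparing branch support between models.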

Good luck!

13.1 years ago
Tancata ▴ 210

Yes, use DNA if the protein sequences are very conserved. One advantage of using DNA sequences not mentioned so far is that you can infer a substitution matrix from the alignment, even with a smallish amount of data (like a single gene). You can't really do that for protein sequences unless you have a concatenated alignment (lots of data), so with proteins you often have to use pre-baked substitution matrices like WAG, LG, etc.

I think for older evolutionary splits, protein sequences will definitely be more informative, though - DNA will be too noisy. I suppose DNA could have more information if you were using a codon model but people don't seem to do that. Column-by-column, the signal in the four possible DNA bases will probably get saturated more quickly.
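
To illustrate why a single gene can be enough for DNA but not for protein, here is a small sketch. It is a crude pairwise tally with made-up names and data, not how real programs work (they estimate exchange rates by maximum likelihood along a tree). The point is only the bookkeeping: a reversible model has n(n-1)/2 exchange types, so 6 for nucleotides versus 190 for amino acids, and the observed differences in one alignment have to be spread over that many categories.

```python
from collections import Counter
from itertools import combinations


def n_exchange_types(n_states: int) -> int:
    """Number of unordered state pairs in a reversible substitution model."""
    return n_states * (n_states - 1) // 2


def exchange_counts(msa: dict, alphabet: str = "ACGT") -> Counter:
    """Tally observed differences per unordered state pair over all sequence pairs."""
    counts = Counter()
    for s1, s2 in combinations(msa.values(), 2):
        for x, y in zip(s1, s2):
            # Skip identical sites, gaps and ambiguity codes.
            if x != y and x in alphabet and y in alphabet:
                counts[tuple(sorted((x, y)))] += 1
    return counts


print(n_exchange_types(4), "exchange types for DNA")
print(n_exchange_types(20), "exchange types for protein")

# Toy nucleotide alignment (hypothetical data).
msa = {"a": "ACGT", "b": "ACGA", "c": "GCGT"}
counts = exchange_counts(msa)
print(dict(counts))
```

With the same number of variable columns, each of the 6 nucleotide categories gets far more observations than each of the 190 amino acid categories, which is why protein analyses usually fall back on fixed matrices such as WAG or LG.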

13.1 years ago
Niek De Klein ★ 2.6k

I found http://www.bio.net/bionet/mm/methods/1998-January/064055.html to be a good overview of when to use which BLAST program. To sum it up, you would use blastn with very similar sequences (think of two different strains of the same bacterium, or two individuals of the same organism) or to find non-coding regions.

blastp is better for more distantly related proteins (since protein sequence is more conserved than DNA sequence), and it is faster than blastn because of the smaller database size.
