Deleting the 3rd Codon Position in a Multiple Sequence Alignment and Substitution Saturation
13.6 years ago

Dear All,

I was wondering if anyone could point me to a script or (preferably) an application that I could use to delete all the 3rd codon positions in an MSA? I did this some time ago but can't remember how! I am planning to construct a phylogenetic tree excluding the 3rd position. I get clear-cut phylogenies with the maximum-likelihood approach implemented in PhyML (1000 bootstrap replicates), but I get polytomies in the Bayesian tree (10 million generations, MrBayes).

When I perform the "Test of substitution saturation (Xia et al. 2003; Xia and Lemey 2009)", the program tells me that there is little saturation (Iss is significantly smaller than Iss.c). The standard output of the program is:

-----------------------------------------------------------------------
                      Significant difference
                      ----------------------
                      Yes                 No
-----------------------------------------------------------------------
Iss < Iss.c           Little              Substantial
                      saturation          saturation
-----------------------------------------------------------------------
Iss > Iss.c           Useless             Very poor
                      sequences           for phylogenetics
-----------------------------------------------------------------------
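The table boils down to two checks: which side of Iss.c the observed Iss falls on, and whether the difference is significant. A small sketch of that lookup (the function name `interpret_iss` is hypothetical, and `significant` is assumed to be the yes/no outcome of the program's own test); note that the table itself has no separate "no saturation" cell.

```python
def interpret_iss(iss, iss_c, significant):
    """Map a saturation-test result onto the four cells of the table.

    iss: observed index of substitution saturation (Iss).
    iss_c: critical value (Iss.c) reported by the program.
    significant: True if the program reports a significant difference.
    """
    if iss < iss_c:
        # Top row of the table
        return "little saturation" if significant else "substantial saturation"
    # Bottom row of the table (Iss > Iss.c)
    return "useless sequences" if significant else "very poor for phylogenetics"


print(interpret_iss(0.25, 0.70, True))
```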

What I want to know is: would this output be different if there were no saturation at all? Or is 'little saturation' as good as no saturation, and therefore usable for phylogenetic analyses? I have also plotted genetic distance against transition and transversion rates, but I am not able to interpret the graph. Any help is greatly appreciated.
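In the usual saturation plot, the transitions and transversions between each pair of sequences are plotted against their genetic distance; transitions levelling off while the distance keeps growing is the classic sign of saturation. A minimal sketch of the numbers behind such a plot, assuming the alignment is already in memory as a dict of aligned sequences, and using the uncorrected p-distance as a simplification (programs that draw this plot normally use a model-corrected distance such as F84):

```python
from itertools import combinations

PURINES = set("AG")
PYRIMIDINES = set("CT")
NUCLEOTIDES = PURINES | PYRIMIDINES


def substitution_type(a, b):
    """Return 'ts' (transition), 'tv' (transversion) or None (identical bases)."""
    if a == b:
        return None
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "ts"
    return "tv"


def saturation_points(seqs):
    """Return (name1, name2, p_distance, ts_proportion, tv_proportion) per pair.

    seqs: dict of name -> aligned sequence (all the same length).
    Columns with a gap or ambiguity code in either sequence are skipped.
    """
    points = []
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        ts = tv = compared = 0
        for a, b in zip(s1.upper(), s2.upper()):
            if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
                continue  # ignore gaps and ambiguous bases
            compared += 1
            kind = substitution_type(a, b)
            if kind == "ts":
                ts += 1
            elif kind == "tv":
                tv += 1
        if compared:
            points.append((n1, n2, (ts + tv) / compared, ts / compared, tv / compared))
    return points


print(saturation_points({"seq1": "AAAA", "seq2": "GACT"}))
```

Plotting the transition and transversion proportions (y axis) against the distance (x axis) for all pairs, with any plotting tool, gives the familiar saturation graph.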

Thank you very much, Kartik

phylogenetics
13.6 years ago
David W 4.9k

Kartik,

I've not used the tests you mention, so I'm not sure how to interpret them. Looking at your plots, it does seem that the 3rd position is a little saturated and won't provide much signal for the deepest nodes in your tree. But before you get drastic and lop it off, I'd check a couple of things...

  • Is F84 the best model for these sequences? It does allow separate transition and transversion rates, but a different model (selected by jModelTest or similar) might deal with other rate differences better and recover some more signal.

  • Has your MrBayes run converged, and do you have a good MCMC sample? You can use Are We There Yet (AWTY) to take a look while it runs. I'd be surprised if PhyML recovered a nicely resolved tree and MrBayes didn't. But...

If that doesn't work, MEGA will export only the 1st and 2nd codon positions, but be aware that you'll also be throwing away a lot of the signal that helps resolve the shallow nodes in your tree.
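If you'd rather script it than go through MEGA, dropping every third column is only a few lines. A minimal sketch, assuming an aligned FASTA file whose first column is the 1st position of a codon (the script and function names are hypothetical):

```python
import sys


def read_fasta(path):
    """Return a list of (name, sequence) tuples from a FASTA file."""
    records, name, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records.append((name, "".join(chunks)))
                name, chunks = line[1:], []
            else:
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def drop_third_positions(seq):
    """Keep codon positions 1 and 2 only.

    Column i (0-based) is a 3rd codon position when i % 3 == 2, assuming
    the alignment starts in frame at the 1st position of a codon.
    """
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


def main(in_path, out_path):
    records = read_fasta(in_path)
    if len({len(seq) for _, seq in records}) > 1:
        raise ValueError("sequences differ in length; is this an alignment?")
    with open(out_path, "w") as out:
        for name, seq in records:
            out.write(f">{name}\n{drop_third_positions(seq)}\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage: `python drop_pos3.py aligned.fasta aligned_pos12.fasta`. Gaps are kept as they are, so the output is still an alignment.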

EDIT to answer the questions below:

Sounds like you've taken all the obvious steps and are now in the murky world of phylogenetic troubleshooting, which usually requires knowing what's going on in your tree. Michael Sanderson has a review of some of the steps you might take (doi:10.1146/annurev.ecolsys.33.010802.150509), and for Bayesian analyses DensiTree is a good way of visualising just what's happening in your sample.

(You could also treat the 3rd codon positions as one partition and the 1st and 2nd positions as another, then unlink their parameters so each can behave differently.)
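In MrBayes this partitioning is done with `charset`, `partition`, `set partition`, `unlink` and `prset ratepr=variable` in a `mrbayes` block. The sketch below just assembles such a block as text so the commands are visible in one place; the function name is hypothetical, and the GTR+G substitution model (`nst=6 rates=gamma`) is an assumption you should replace with whatever jModelTest picked:

```python
def mrbayes_codon_partition_block(ngen=10_000_000):
    """Return a MrBayes block with codon positions 1+2 and 3 as separate,
    unlinked partitions (model settings are placeholders)."""
    lines = [
        "begin mrbayes;",
        # Every 3rd site starting at 1 and at 2, i.e. codon positions 1 and 2
        "  charset pos12 = 1-.\\3 2-.\\3;",
        # Every 3rd site starting at 3, i.e. codon position 3
        "  charset pos3 = 3-.\\3;",
        "  partition codonpos = 2: pos12, pos3;",
        "  set partition = codonpos;",
        # Assumed model; substitute the one selected for your data
        "  lset applyto=(all) nst=6 rates=gamma;",
        # Let each partition have its own frequencies, rates matrix and gamma shape
        "  unlink statefreq=(all) revmat=(all) shape=(all);",
        # Allow the partitions to evolve at different overall rates
        "  prset applyto=(all) ratepr=variable;",
        f"  mcmc ngen={ngen};",
        "end;",
    ]
    return "\n".join(lines)


print(mrbayes_codon_partition_block())
```

Pasting the printed block after the data block of your NEXUS file runs the partitioned analysis.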


Hey David. I followed your advice on one of the other posts and constructed a strictly bifurcating tree. This tree is now just like the PhyML tree. I was wondering what exactly the difference is between the two contype options (halfcompat and allcompat). I read that halfcompat is like the 50% majority-rule consensus in PAUP, but being new to this field, I have no clue what exactly this does. Finally, can I include the tree I generated using allcompat in publications? Thank you very much for all the help, mate :)


Oh right! Yeah, halfcompat is the majority-rule consensus, and allcompat also adds every compatible clade, even those with only low support. Just explain what you did in your methods and you'll be fine :)
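A sketch of how the two consensus types differ, assuming each clade (split) is stored as the set of taxa on the side that does not contain a chosen reference taxon, together with its frequency across the post-burn-in trees. This illustrates the idea, not MrBayes's actual `sumt` code, and the function names are hypothetical:

```python
def compatible(a, b):
    """Two splits (taxon sets on the side without the reference taxon)
    can coexist in one tree when they are nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def consensus_splits(split_freqs, contype="halfcompat"):
    """Return the clades a consensus tree would display.

    split_freqs: dict mapping frozenset(taxa) -> frequency in the sample (0..1).
    halfcompat: only clades found in more than half of the trees.
    allcompat: the same, then lower-frequency clades are added greedily,
    most frequent first, if compatible with every clade already kept.
    """
    ordered = sorted(split_freqs.items(), key=lambda kv: kv[1], reverse=True)
    kept = [split for split, freq in ordered if freq > 0.5]
    if contype == "allcompat":
        for split, freq in ordered:
            if freq <= 0.5 and all(compatible(split, k) for k in kept):
                kept.append(split)
    return kept


# Five taxa A..E with A as the reference taxon
freqs = {frozenset("DE"): 0.80, frozenset("CDE"): 0.45, frozenset("BC"): 0.40}
print(consensus_splits(freqs, "halfcompat"))
print(consensus_splits(freqs, "allcompat"))
```

Here (C,D,E) gets into the allcompat tree despite 45% support because it nests around (D,E), while (B,C) is left out because it conflicts with (C,D,E). That is why the allcompat tree looks fully resolved while its extra clades can have low support.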


Dear David,

Thank you very much for the reply. I used jModelTest to select the best model for the dataset and used it for the phylogenetic analyses. I did not see a difference in the plot when I used the GTR or F84 model, so I did not pay much attention to it. I will check with the appropriate model. Thanks for the tip!

I was surprised too that PhyML gave a resolved tree but MrBayes did not. I guess it's as you pointed out: the little saturation is masking the signal required to resolve the 'deepest nodes'.


The Bayesian analyses had reached convergence. Both the maximum-likelihood and Bayesian analyses produce the same topology, but the Bayesian tree is not resolved at some positions. Could this also be because of an adaptive radiation? Also, the bootstrap values are really low while the Bayesian posterior probabilities are well over 0.90. I guess that's a general trend?
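One common way to judge convergence between independent MrBayes runs is the average standard deviation of split frequencies: how much the clade frequencies differ from run to run. A sketch of that idea, assuming the clade frequencies of each run are already available as a dict, a sample standard deviation, and a 0.10 cutoff below which rare clades are ignored (MrBayes's exact implementation may differ in these details):

```python
from statistics import stdev


def avg_sd_split_freqs(runs, min_freq=0.10):
    """Average standard deviation of split frequencies across runs.

    runs: list (at least two) of dicts mapping a split to its frequency in that run.
    Splits that stay below min_freq in every run are skipped.
    """
    splits = set().union(*runs)
    sds = []
    for split in splits:
        freqs = [run.get(split, 0.0) for run in runs]
        if max(freqs) < min_freq:
            continue
        sds.append(stdev(freqs))
    return sum(sds) / len(sds) if sds else 0.0


run1 = {"DE": 0.90, "CDE": 0.45}
run2 = {"DE": 0.70, "CDE": 0.45}
print(avg_sd_split_freqs([run1, run2]))
```

Values close to zero mean the runs are sampling the same clades at the same frequencies.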


Thanks for the help David. I haven't tried that yet, but will do so. Thank you.

7.2 years ago
al-ash ▴ 210

For deleting the 3rd codon position using R, see a related Q&A on Biostars: "Reduce saturation by deleting the 3rd position in each codon of DNA seqs. R ape package".
