Dear All, I have a lineage made up entirely of parasitic plant sequences, and I want to examine whether any sites within these sequences are evolving under positive selection.
I have a tree file that contains this parasitic-plant lineage plus some other sequences from several legume species. It is part of a tree reflecting horizontal gene transfer: we detected that our parasitic plants acquired this gene from their host, Medicago (a non-parasitic plant). We are very interested in identifying whether the parasite sequences contain sites that have evolved a new function. However, because these sequences are very similar, Ka/Ks (dN/dS) < 1 over the whole sequence, suggesting the gene is under negative (purifying) selection. So I was wondering whether particular sites are under positive selection.
I use the following major parameters in my .ctl file:

    fix_kappa = 0   * 1: kappa fixed, 0: kappa to be estimated
    kappa = .3      * initial or fixed kappa
    fix_omega = 0   * 1: omega or omega_1 fixed, 0: estimate
    omega = 1.3     * initial or fixed omega, for codons or codon-based AAs
    ncatG = 10      * # of categories in the dG or AdG models of rates
I am not sure whether my settings are correct. If anyone can give me suggestions on how to do this in PAML, I would appreciate it a lot. When I ran this in PAML, it seemed that there were no positively selected sites, but I am not sure.
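The parameters listed above only set starting values for kappa and omega; in codeml the site models themselves are selected with NSsites (together with model = 0), and if NSsites is left at its default of 0 codeml fits only the one-ratio model M0, which cannot detect positively selected sites. Below is a minimal sketch, assuming Python 3 and hypothetical file names (parasite_aln.phy, parasite.tree, mlc_sites), that writes a control file running M1a, M2a, M7 and M8 in one batch while keeping the kappa/omega/ncatG values from the post. It is an illustration, not a complete list of codeml options.

```python
def build_codeml_ctl(seqfile, treefile, outfile, nssites=(1, 2, 7, 8)):
    """Return the text of a codeml control file for a site-model analysis.

    File names are hypothetical placeholders; the kappa/omega/ncatG values
    are the ones quoted in the question above.
    """
    settings = {
        "seqfile": seqfile,
        "treefile": treefile,
        "outfile": outfile,
        "runmode": 0,       # 0: use the user-supplied tree
        "seqtype": 1,       # 1: codon sequences
        "CodonFreq": 2,     # 2: F3x4 codon frequencies
        "model": 0,         # 0: one omega shared by all branches (site models)
        "NSsites": " ".join(str(m) for m in nssites),  # e.g. 1 2 7 8 = M1a, M2a, M7, M8
        "icode": 0,         # 0: universal genetic code
        "fix_kappa": 0,     # estimate kappa
        "kappa": 0.3,       # initial kappa
        "fix_omega": 0,     # estimate omega
        "omega": 1.3,       # initial omega
        "ncatG": 10,        # categories for the discretized beta in M7/M8
    }
    return "\n".join(f"{key} = {value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    ctl = build_codeml_ctl("parasite_aln.phy", "parasite.tree", "mlc_sites")
    with open("codeml.ctl", "w") as handle:
        handle.write(ctl)
    print(ctl)
```

The usual comparisons are then M1a vs M2a and M7 vs M8; only if the selection model fits significantly better is it worth looking at the Bayes Empirical Bayes (BEB) site list in the output.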
Does anyone have experience identifying sites under positive selection in a particular lineage? Suggestions are welcome!
Many thanks!
Best, Jenny
This is a 20-month-old thread, but I am adding some answers to the questions posted by yangzhenzhen1988 above, which may benefit others. Please read the full question in the post above to get a clear idea of what I am answering.
1. I wonder what the degrees of freedom are, and how we can estimate them.
If we look into the codeml output file, we will see a line of the form lnL(ntime: ...  np: ...): followed by the log-likelihood of the model.
Here, np is the number of free parameters in the model.
So, the degrees of freedom = np of the alternative model - np of the null model.
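To turn the two np values into a test, the usual likelihood ratio test compares 2 x (lnL_alt - lnL_null) against a chi-square distribution with the degrees of freedom described above. Below is a minimal, standard-library-only sketch; the function names and the lnL/np values in the usage line are hypothetical, and scipy.stats.chi2.sf would give the same p-value if SciPy is available.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_r x^((2r-1)/2) / (1*3*...*(2r-1))
    tail = math.erfc(math.sqrt(x / 2))
    term, s = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        s += term
    return tail + 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * s


def codeml_lrt(lnl_null: float, np_null: int, lnl_alt: float, np_alt: int):
    """Return (test statistic, degrees of freedom, p-value) for nested models."""
    df = np_alt - np_null                        # difference in np
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # 2 * delta lnL, floored at 0
    return stat, df, chi2_sf(stat, df)


# Hypothetical M7 (null) vs M8 (alternative) values read from two lnL lines:
stat, df, p = codeml_lrt(-2500.0, 20, -2495.0, 22)
print(f"2dlnL = {stat:.2f}, df = {df}, p = {p:.4g}")
```

For M1a vs M2a and M7 vs M8 the difference in np is 2. For the branch-site test with omega fixed at 1 under the null, a 50:50 mixture of 0 and chi-square(1) is often used, which halves the chi-square(1) p-value.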
2. If, with model = 1 (free-ratio model), we find that our branch has a higher w than the background w (from model = 0, where all branches share the same w), what can we conclude? Can we say that our branch evolves under relaxed constraint compared to the background branches? Or that our branch has a faster evolutionary rate?
To my understanding, the free-ratio model only says that different branches have different w (dN/dS) ratios. It does not confirm that any branch is under positive selection. To test whether a branch is under positive selection, we have to use model = 2 and a tree file in which the branch being tested (the foreground branch) is labelled with #1, and then compare this against a null model in which the foreground w is fixed at 1.
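In the tree file, the #1 label goes after a taxon name to mark a terminal branch, or after the closing parenthesis of a clade to mark that clade's stem branch. Below is a small sketch for labelling one tip, assuming a Newick tree without branch lengths; the helper name and taxon names are hypothetical. The regular expression only matches the whole taxon name, so a name like Parasite_sp1 does not also match Parasite_sp10.

```python
import re


def label_foreground_tip(newick: str, taxon: str, label: str = "#1") -> str:
    """Append a codeml branch label (e.g. '#1') to one tip of a Newick tree.

    Assumes a topology-only tree (no ':branch_length' fields) in which the
    taxon name is preceded by '(' ',' or whitespace and followed by ','
    ')' or whitespace.
    """
    pattern = re.compile(r"(?<=[(,\s])" + re.escape(taxon) + r"(?=[,)\s])")
    labelled, count = pattern.subn(f"{taxon} {label}", newick)
    if count != 1:
        raise ValueError(f"expected exactly one match for {taxon!r}, found {count}")
    return labelled


tree = "((Medicago_sp1,Parasite_sp1),(Legume_sp2,Legume_sp3));"
print(label_foreground_tip(tree, "Parasite_sp1"))
```

To test the stem branch of the whole parasite clade instead, the label is placed after that clade's closing parenthesis, which is usually easiest to do by hand.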
3. Is it because they allow w to vary among sites while keeping it the same across all branches? And is that the model used to detect positively selected sites? (Read the complete question in the post above.)
Yes, this is the model used to find out whether there are positively selected sites, and it is called a site model (not a branch-site model, as assumed in the question).
4. Then another question came to mind: will there be a difference if I set the background of my tree differently? Will it give me different results? When running PAML, what are the criteria for choosing background taxa or sequences if we want to get significant results?
I do not understand the question completely, but I would add one note: running codeml with a different input tree topology gives different results, because the likelihood is calculated on the tree you supply.