Hi,
I am running codeml on orthogroup tree to determine whether a certain lineage is under positive selection. The way I did is to set null model with fixed omega (model=2
, NSsites=0
, fix_omega=1
) while branch-specific model by labeling branch of interest (foreground) in tree file (model=2
, NSsites=0
, fix_omega=0
). After running codeml on both models, I was able to extract likelihood (lnL) and number of parameters (np) for each model, as well as Kn/Ks value for background branches and foreground branches in branch-specific model.
To found out the significant difference on Kn/Ks values between foreground and background branches, I calculate LRT by LRT=2*(lnL1-lnL0)
and corresponding pvalues using dchisq function (df =1) in R. I noticed that in some orthogroups, LRT was reported as a negative value which leads to pvalue =0. That is definitely abnormal. How could LRT being negative happen?
Also, I am still a little bit confused about Kn/Ks value predicted for background and foreground in branch-specific model report. If I compared them and Kn/Ks is larger in foreground compared to background, may I make a comment that lineage I chose is under positive selections? The other way around, if Kn/Ks is smaller, the lineage is under negative selection.
Hope I have made my question clear. Thank you
Yes, I agree that 'The alternative model should always give you a higher or equal likelihood than the null model' but in my situation, some alternative model have lower likelihood than null model, which causes LRT < 0. I do not know how it is caused. I had run branch-site model and it has the same problem.
Are you sure you are setting the alternative and null models appropriately? It may help if you send the LnL values for each model. For example, if the alternative model has LnL of -10000 and the null of -10040, then that is perfect, the alternative model is more likely.
Here is the output summary I got from branchsite-based model for 8 different trees. I ran dchisq on LRT in R and get pvalue for each. You can notice that some LRT is negative, all the pvalue were estimated = 0
That's not negative in practice. Here, -3.8E-05 is basically zero (no difference between alternative and null models; p-value=1).
I see, thank you!