Hi all,
Could someone just clarify the following output for me?
Example output:
branch        t        N       S    dN/dS       dN       dS    N*dN    S*dS
38..1     0.431   2257.5   655.5   0.0546   0.0294   0.5379    66.3   352.6
38..39    0.068   2257.5   655.5   0.0546   0.0046   0.0843    10.4    55.2
39..40    0.026   2257.5   655.5   0.0546   0.0018   0.0323     4.0    21.1
tree length for dN: 0.1848
tree length for dS: 3.3855
I don't understand what t, N, and S are. I know dN/dS is the omega ratio, and that dN and dS are the average numbers of nonsynonymous/synonymous substitutions along each branch (node..node). So what is N, and what does N*dN mean? Same question for S and S*dS.
If I want to find the average dN and dS (individually) for particular branches, should I use the values from model M0 or M3? Essentially, I want to find out whether some species are evolving faster than others.
Thank you all!
I'm still confused: if N is over 2000 and S is only about 650, how can the overall dN/dS be less than 1?
N & S are the estimated number of sites in the alignment where a substitution would either be non-synonymous (N) or synonymous (S). This is calculated based on the genetic code and the codon usage model estimated from the data. It is not the same thing as the actual number of synonymous or non-synonymous changes. So in your alignment, there are about 2257 non-synonymous sites and 655 synonymous sites. This is roughly what you'd expect given the genetic code.
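To tie N and S back to the other columns: as far as I understand codeml, N*dN and S*dS are the expected numbers of nonsynonymous and synonymous substitutions along each branch, and t is their total divided by the number of codons (substitutions per codon). The Python sketch below recomputes those columns from the printed dN and dS. It assumes N + S = 3 × (number of codons), which gives 971 codons for this alignment; `derived_columns` is just a hypothetical helper name.

```python
# Branch rows copied from the codeml output above: (branch, t, N, S, dN, dS)
ROWS = [
    ("38..1", 0.431, 2257.5, 655.5, 0.0294, 0.5379),
    ("38..39", 0.068, 2257.5, 655.5, 0.0046, 0.0843),
    ("39..40", 0.026, 2257.5, 655.5, 0.0018, 0.0323),
]


def derived_columns(N: float, S: float, dN: float, dS: float) -> tuple[float, float, float]:
    """Recompute N*dN, S*dS and t for one branch.

    Assumption: N + S = 3 * (number of codons), so dividing the total number
    of substitutions by (N + S) / 3 gives substitutions per codon.
    """
    n_changes = N * dN        # expected nonsynonymous substitutions on this branch
    s_changes = S * dS        # expected synonymous substitutions on this branch
    codons = (N + S) / 3      # 2913 / 3 = 971 codons for this alignment
    t = (n_changes + s_changes) / codons
    return n_changes, s_changes, t


for branch, t_reported, N, S, dN, dS in ROWS:
    nd, sd, t = derived_columns(N, S, dN, dS)
    print(f"{branch:7s} N*dN={nd:6.1f} S*dS={sd:6.1f} t={t:.3f} (printed t={t_reported})")
```

The recomputed values agree with the table to within rounding (for example, N*dN for 38..1 comes out near 66.4 against the printed 66.3), because dN and dS are printed to only four decimals.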
Despite there being more non-synonymous sites, the rate of non-synonymous change per site is much lower than the rate of synonymous change per site, giving a dN/dS well below 1 (0.0546 here). This suggests strong purifying selection acting on whatever gene you're looking at.
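One way to see why the larger N doesn't push dN/dS above 1: dN and dS are already per-site rates, so the number of sites cancels out of the ratio. The sketch below uses the values for branch 38..1 and a hypothetical helper, `neutral_expectation`, to compare the nonsynonymous changes inferred on that branch with how many there would be if nonsynonymous sites evolved at the synonymous rate.

```python
def neutral_expectation(N: float, dN: float, dS: float) -> tuple[float, float, float]:
    """Compare inferred nonsynonymous changes with a neutral expectation.

    Both dN and dS are per-site rates. If nonsynonymous sites evolved at the
    synonymous rate (dN == dS, i.e. omega = 1), the branch would carry about
    N * dS nonsynonymous changes.
    """
    inferred = N * dN     # nonsynonymous changes inferred on the branch (the N*dN column)
    neutral = N * dS      # nonsynonymous changes expected if omega were 1
    return inferred, neutral, inferred / neutral   # N cancels, leaving dN / dS


# Branch 38..1 from the output above
inferred, neutral, omega = neutral_expectation(2257.5, 0.0294, 0.5379)
print(f"inferred ~{inferred:.0f}, neutral expectation ~{neutral:.0f}, ratio {omega:.4f}")
```

Roughly 66 nonsynonymous changes are inferred where about 1214 would be expected under neutrality, and their ratio is simply dN/dS (about 0.055, matching the table up to rounding) whatever the value of N.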
Also, why would model M1a be used to check for branch-specific dN and dS? Why not model M0, which doesn't constrain it, since M1a does not allow positive selection anyway?
It's possible I got the model terminology wrong. If you set model = 1 (with NSsites = 0) in the control file, you get the free-ratio model, which allows each branch to have an independent dN/dS. Looking at the manual again, I see that this is different from model M1a (NSsites = 1), which is a nearly neutral model used as the null model when testing for positive selection. M1a is typically compared against M2a.
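For reference, here is how I understand the control-file settings for the models mentioned in this thread. Only `model` and `NSsites` are shown; everything else in codeml.ctl (seqfile, treefile, CodonFreq, etc.) stays as in your existing file. The dictionary and the `ctl_lines` helper are just a hypothetical way to lay this out, not part of PAML, so please double-check against the PAML manual.

```python
# codeml.ctl settings that distinguish the models discussed in this thread.
# Only `model` and `NSsites` are shown; all other options keep their usual values.
CODEML_SETTINGS = {
    "M0": {"model": 0, "NSsites": 0},          # one dN/dS ratio for the whole tree
    "free-ratio": {"model": 1, "NSsites": 0},  # independent dN/dS on every branch
    "M1a": {"model": 0, "NSsites": 1},         # nearly neutral (null for M2a)
    "M2a": {"model": 0, "NSsites": 2},         # positive selection (site model)
    "M3": {"model": 0, "NSsites": 3},          # discrete site classes
}


def ctl_lines(name: str) -> str:
    """Render the model-related lines of a codeml control file (hypothetical helper)."""
    options = CODEML_SETTINGS[name]
    return "\n".join(f"{key:>8} = {value}" for key, value in options.items())


print(ctl_lines("free-ratio"))
```

Note that M1a, M2a and M3 are site models: they let dN/dS vary among codons, not among branches, so only the free-ratio setting gives a separate dN/dS per branch.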
In your case, you can run M0 to get dN and dS for each branch, but the dN/dS ratio is constrained to be the same across the whole tree, consistent with your output, where every branch shows dN/dS = 0.0546 (this is rather counterintuitive, at least to me!). I was just suggesting you could also try the free-ratio model to get independent estimates of dN/dS for each branch.
Okay, I can try the free-ratio model as well, but I don't think it gives me individual dN and dS values as separate branch lengths. I need to build a dN tree and a dS tree from their individual branch lengths so that I can compare the differences visually. M0 gives individual dN and dS, but I believe the branch length (t) then reflects the average dN/dS.
The free-ratio model will actually give you individual dN and dS values for each branch. The output file will contain three different trees: one for dS, one for dN, and one for omega.
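If you want to compare lineages numerically as well as visually, the dN and dS trees can be paired branch by branch. This is a rough sketch that assumes both trees are plain Newick strings with the same topology and branch order (how I'd expect codeml to print them, but check your own output). `branch_lengths` and `compare_trees` are hypothetical helpers, and the example trees are made up.

```python
import re

# A branch length follows a colon in Newick, e.g. ": 0.0294" or ":0.5379".
_BRANCH_LEN = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def branch_lengths(newick: str) -> list[float]:
    """Return branch lengths in the order they appear in a Newick string."""
    return [float(x) for x in _BRANCH_LEN.findall(newick)]


def compare_trees(dn_tree: str, ds_tree: str) -> list[tuple[float, float, float]]:
    """Pair dN and dS branch lengths, assuming identical topology and branch order."""
    dn = branch_lengths(dn_tree)
    ds = branch_lengths(ds_tree)
    if len(dn) != len(ds):
        raise ValueError("dN and dS trees have different numbers of branches")
    # A branch with dS == 0 has no defined ratio; report it as infinity.
    return [(n, s, n / s if s > 0 else float("inf")) for n, s in zip(dn, ds)]


# Made-up trees for illustration only
DN_TREE = "((A: 0.010, B: 0.020): 0.005, C: 0.030);"
DS_TREE = "((A: 0.200, B: 0.250): 0.100, C: 0.400);"

for dn, ds, ratio in compare_trees(DN_TREE, DS_TREE):
    print(f"dN={dn:.3f}  dS={ds:.3f}  dN/dS={ratio:.3f}")
print("tree length for dN:", round(sum(branch_lengths(DN_TREE)), 4))
```

Summing a tree's branch lengths should reproduce the corresponding "tree length for dN" or "tree length for dS" line, if I'm reading codeml's summary correctly.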