I have a list of orthologous genes and a species tree. How can I assign a dN/dS ratio to each of these species for the whole set of orthologous genes?
Could this ratio be computed for a set of genes rather than just one? If not, is a mean dN/dS ratio a feasible way to get an idea of the speciation time for each species from a set of orthologous genes?
I have a list of orthologous genes and a species tree. How can I assign a dN/dS ratio to each of these species for the whole set of orthologous genes?
If you just want to know the global rate of protein sequence evolution, you can choose one of these methods:
- computing dN/dS for each gene individually and then taking the mean of those ratios (you should check their distribution first).
- concatenating the aligned sequences of all your genes and computing dN/dS for the concatenated sequence.
You can check previously published papers that use each technique to learn the pros and cons.
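To make the difference between the two summaries concrete, here is a minimal Python sketch. It assumes you already have per-gene dN and dS estimates for one branch (for example from codeml); the gene names, the values and the `min_ds` threshold are all invented for illustration. The pooled ratio is only a rough stand-in for re-estimating dN/dS on a concatenated alignment, not the same calculation.

```python
from statistics import mean, median

# Hypothetical per-gene estimates for ONE branch: gene -> (dN, dS, codons).
# All numbers are invented for illustration only.
genes = {
    "geneA": (0.010, 0.100, 300),
    "geneB": (0.020, 0.050, 500),
    "geneC": (0.005, 0.001, 200),  # tiny dS: the ratio is unreliable
}

def summarize(estimates, min_ds=0.01):
    """Return mean, median and length-weighted pooled dN/dS for one branch."""
    # Drop genes whose dS is too small for a stable ratio
    # (the threshold is an assumption, pick one that suits your data).
    kept = {g: v for g, v in estimates.items() if v[1] >= min_ds}
    ratios = [dn / ds for dn, ds, _ in kept.values()]
    # Pooled ratio: weight dN and dS by gene length, then divide once.
    total_dn = sum(dn * n for dn, _, n in kept.values())
    total_ds = sum(ds * n for _, ds, n in kept.values())
    return {
        "kept": sorted(kept),
        "mean": mean(ratios),
        "median": median(ratios),
        "pooled": total_dn / total_ds,
    }

result = summarize(genes)
print(result)
```

Comparing the mean with the median is a quick way to see whether a few genes with very small dS are driving the average, which is the reason for checking the distribution first.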
Of course, this will only give you an overview of the selection acting on your different branches, but it can still be useful for detecting differences in the efficiency/pressure of selection between lineages.
For more fine-tuned analyses and to detect sites or branches whose patterns of selection might differ (typically to detect positive selection) you might perform more advanced analyses using different available PAML models.
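As a starting point for the PAML route, the sketch below builds the text of a codeml control file for the basic branch models and computes a likelihood ratio test between two fitted models. The file names and log-likelihood values are hypothetical, the option values are common defaults rather than recommendations, and the p-value formula assumes the two models differ by exactly one parameter (for instance the one-ratio model against a two-ratio model with a single labelled foreground branch).

```python
import math

def codeml_ctl(seqfile, treefile, outfile, model=0):
    """Build codeml control-file text for a branch model.

    model=0: one dN/dS ratio for the whole tree
    model=1: free ratios, one dN/dS per branch
    model=2: separate ratios for branches labelled (#1, ...) in the tree file
    """
    if model not in (0, 1, 2):
        raise ValueError("model must be 0, 1 or 2")
    options = {
        "seqfile": seqfile,    # codon alignment
        "treefile": treefile,  # species tree in Newick format
        "outfile": outfile,
        "runmode": 0,          # use the user-supplied tree
        "seqtype": 1,          # codon sequences
        "CodonFreq": 2,        # F3x4 codon frequencies
        "model": model,
        "NSsites": 0,          # no variation in dN/dS among sites
        "fix_omega": 0,        # estimate dN/dS
        "omega": 0.4,          # starting value
    }
    return "\n".join(f"{k:>10} = {v}" for k, v in options.items()) + "\n"

def lrt_df1(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by ONE parameter."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

ctl_text = codeml_ctl("aln.phy", "tree.nwk", "two_ratio.out", model=2)
print(ctl_text)

# Hypothetical log-likelihoods read from two codeml output files.
stat, p_value = lrt_df1(lnl_null=-1000.0, lnl_alt=-998.0)
print(stat, p_value)
```

For comparisons with more than one extra parameter (for example free ratios against one ratio), the degrees of freedom change and the p-value needs a general chi-square function instead of this shortcut.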
Could this ratio be computed for a set of genes rather than just one? If not, is a mean dN/dS ratio a feasible way to get an idea of the speciation time for each species from a set of orthologous genes?
As David said above, this is not directly possible. The dN/dS ratio only lets you assess differences in the efficiency/pressure of selection between your different branches.
dS itself is an estimate of the neutral substitution rate, but:
- it depends on the generation time along the whole branch: the rate might be higher if the generation time is shorter in one branch.
- you will need a molecular clock to know how many mutations occur per unit of time (this can be calibrated using fossil data, as previously mentioned).
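The calibration step can be sketched in a few lines of Python. This is only an illustration under a strict molecular clock (one constant rate on every branch, which the generation-time caveat above warns may not hold); the pairwise dS values and the fossil age are invented.

```python
def calibrate_rate(ds_pair, node_age_years):
    """Neutral rate per site per year from one fossil-dated split.

    ds_pair is the pairwise dS between two species whose common
    ancestor is dated by the fossil. Both lineages accumulate
    substitutions after the split, hence the factor 2.
    """
    return ds_pair / (2.0 * node_age_years)

def divergence_time(ds_pair, rate):
    """Age in years of the split between two species, given a rate."""
    return ds_pair / (2.0 * rate)

# Hypothetical calibration: two species with pairwise dS = 0.20
# whose split is dated to 10 million years by a fossil.
rate = calibrate_rate(0.20, 10e6)

# Apply the calibrated rate to another pair with pairwise dS = 0.10.
age = divergence_time(0.10, rate)
print(rate, age)
```

Dedicated dating software relaxes the constant-rate assumption, so treat numbers obtained this way as a first approximation.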
I have a list of orthologous genes and a species tree. How can I assign a dN/dS ratio to each of these species for the whole set of orthologous genes?
You could... but I don't know what it would tell you. dN/dS is used to determine whether a given locus (or codon) has been under selection. You could compare different loci using something like HyPhy (i.e., is one class of gene more likely to have been a target of selection?), but I'm not sure what you'd learn by comparing species (unless one has been under genome-wide selection, which seems unlikely).
If not, is a mean dN/dS ratio a feasible way to get an idea of the speciation time for each species from a set of orthologous genes?
No, dN/dS is a ratio of two types of substitution. When you want to add dates to your species tree, you want (very crudely) to use the total number of substitutions between species as your data. To set the dates in years you'll need either a fossil or a biogeographic date (i.e., the latest or earliest time a particular node on your species tree could have formed) or an idea of the mutation rate of your genes (perhaps from another study).
If you decide to go down that route this question might help (but feel free to ask another if you get stuck).
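To illustrate what "the total number of substitutions between species" can mean in practice, here is a deliberately crude Python sketch that estimates substitutions per site between two aligned sequences with the Jukes-Cantor (JC69) correction. The choice of JC69 is an assumption made for simplicity, and the two sequences are made up; real dating analyses use richer substitution models.

```python
import math

def jc69_distance(seq_a, seq_b):
    """Substitutions per site between two aligned sequences (JC69)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where both sequences have an unambiguous base.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    # Observed proportion of differing sites.
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("sequences are saturated; JC69 is undefined")
    # The correction accounts for multiple hits at the same site.
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Invented toy alignment: one difference over ten sites.
distance = jc69_distance("ACGTACGTAC", "ACGTACGTAA")
print(distance)
```

A distance like this only becomes a date once it is combined with a calibration point or an independently estimated rate.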
Phillipe, I think you mean "dS itself is an estimate of the neutral substitution rate" not dN.
Indeed, thanks for noticing this obvious mistake, I'll edit it.