4.0 years ago
Chvatil
Hello everyone, for a project I need to calculate dS along the branches of a phylogeny of 100 species, for 300 housekeeping genes.
The idea is to get a distribution of dS values across these genes.
But here is my question. Do you think it is better to:
- Build a phylogeny for each housekeeping gene and use the corresponding gene tree for its codeml analysis (so each analysis uses its own gene tree)?
- Or use a well-supported consensus tree of the 100 species for all the analyses (so every analysis uses the same tree)?
The issue I see with the first option is that I would have to constrain the topology of the gene trees so that they reflect the correct speciation order given by the well-supported consensus tree.
Thanks for your help and have a nice day.
It totally depends on your aim. For instance, if you want the distribution of dN or dS values across species, it is better to use a concatenated file that contains all the genes, together with a phylogenetic tree. That way you get overall dN or dS values for each species, but you cannot see which gene has which value. If you are interested in the dN or dS distribution for each single gene, you need to run the analysis for each gene with its corresponding tree. You may get a different topology for each tree, since each topology reflects the evolutionary history of that gene. So, are you interested in a species-wide or a gene-wide dN or dS distribution?
Well, I'm interested in building a distribution of dS along branches. For instance, I want to take two species in the tree and say: OK, the sum of dS (a kind of divergence time) between those two species is 3.4. Then, by using the 300 genes, I want to say: OK, the dS value between species 1 and 2 lies within [3.1, 3.7] at the 95% level.
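A minimal sketch of that calculation, assuming the per-branch dS values for each gene have already been extracted from the codeml output into a dictionary. The branch names, the `path_ds` and `percentile_interval` helpers, and the three toy genes are all hypothetical and only illustrate the idea. Note that this gives the empirical spread of the per-gene sums, not a confidence interval on their mean.

```python
def path_ds(branch_ds, path):
    """Sum the dS of every branch on the path connecting two species."""
    return sum(branch_ds[branch] for branch in path)


def percentile_interval(values, lower=2.5, upper=97.5):
    """Empirical (lower, upper) percentiles, with linear interpolation."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")

    def pct(p):
        # Fractional position of percentile p in the sorted list.
        pos = (len(ordered) - 1) * p / 100.0
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    return pct(lower), pct(upper)


# Hypothetical input: one dict of branch -> dS per gene (toy numbers).
genes = [
    {"sp1": 1.4, "anc": 0.5, "sp2": 1.5},
    {"sp1": 1.6, "anc": 0.4, "sp2": 1.3},
    {"sp1": 1.5, "anc": 0.6, "sp2": 1.6},
]
# Branches on the path between species 1 and species 2 (assumed known).
path = ["sp1", "anc", "sp2"]

sums = [path_ds(gene, path) for gene in genes]
low, high = percentile_interval(sums)
print(f"dS between sp1 and sp2: [{low:.2f}, {high:.2f}]")
```

With 300 genes, `genes` would simply hold 300 such dictionaries, one per codeml run.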
For a distribution along branches, it is better to use a concatenated file (all genes in one file) with a phylogenetic tree. If you want to compare two species, you can perform a pairwise comparison, using the sequences of the two species without a phylogenetic tree.
How much time does it take to run for 300 genes? Do you use any method to parallelize codeml?