Hi! I'm looking for a little bit of guidance.
My question is about simulating DNA sequences with a fixed substitution rate.
Most programs for simulating sequences use continuous-time Markov chain models, whose instantaneous rate matrices are determined by the free parameters of each model (JC69, HKY, GTR, etc.). For example, in HKY the free parameters are the transition/transversion rate ratio (kappa) and the nucleotide frequencies (pi).
In these programs (for example INDELible, Seq-Gen, Pyvolve), the substitution rate is expected to be expressed through the branch lengths of the phylogenetic tree used for the simulation.
INDELible paper.
"..Rate matrices are rescaled by INDELible such that the branch lengths represent the expected number of substitutions per site (or the average expected number of substitutions per site under a heterogeneous-sites model)." (P2 - Simulation of substitutions)
INDELible: a flexible simulator of biological sequence evolution
"...Each branch length is assumed to denote the mean number of nucleotide substitutions per site that will be simulated along that branch..." (P2 - Algorithm)
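To make sure I am reading the rescaling statement correctly, here is a minimal sketch of what I think it means for HKY. This is my own illustration, not code from INDELible or Seq-Gen; the function name `hky_rate_matrix` and the example values are made up. The matrix is divided by its mean rate, so that one unit of branch length corresponds to one expected substitution per site.

```python
def hky_rate_matrix(kappa, pi):
    """Return an HKY rate matrix (dict of dicts) scaled to a mean rate of 1.

    kappa: transition/transversion rate ratio.
    pi:    equilibrium frequencies for A, C, G, T (assumed to sum to 1).
    """
    bases = "ACGT"
    purines = {"A", "G"}
    q = {i: {} for i in bases}
    for i in bases:
        for j in bases:
            if i == j:
                continue
            # A<->G and C<->T are transitions; everything else is a transversion.
            is_transition = (i in purines) == (j in purines)
            q[i][j] = (kappa if is_transition else 1.0) * pi[j]
        # Diagonal entry makes each row sum to zero.
        q[i][i] = -sum(q[i].values())

    # Mean substitution rate at equilibrium: -sum_i pi_i * q_ii.
    mean_rate = -sum(pi[i] * q[i][i] for i in bases)

    # Rescale so the expected number of substitutions per site per unit time is 1.
    for i in bases:
        for j in bases:
            q[i][j] /= mean_rate
    return q


# Hypothetical parameter values, for illustration only.
example_pi = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
example_q = hky_rate_matrix(2.0, example_pi)
```

If this is right, then after rescaling the matrix itself carries no information about the absolute rate, and everything about "how much" substitution happens has to come from the branch lengths.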
Let's say I want to simulate sequences with a mean substitution rate of 2.5 x 10^-7 substitutions per site per year. How would I prepare a tree that accurately represents that rate and can be used in the simulation? In other words, how do I choose branch lengths (in expected substitutions per site) so that they correspond to this per-year rate?
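My current guess is that the conversion is simply branch length = rate x time, so a branch spanning 1,000,000 years would get a length of 2.5 x 10^-7 x 10^6 = 0.25 substitutions per site. Below is a rough sketch of that idea. It assumes a strict clock (the same rate on every branch) and a time-calibrated Newick tree whose branch lengths are in years; the helper name `scale_newick` and the example tree are hypothetical, and the regex only handles plain numeric branch lengths.

```python
import re

# Matches ":<number>" branch lengths, including optional scientific notation.
BRANCH_LENGTH = re.compile(r":([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def scale_newick(newick, rate_per_year):
    """Convert branch lengths from years to expected substitutions per site.

    Assumes a strict molecular clock: every branch evolves at rate_per_year.
    """
    def convert(match):
        years = float(match.group(1))
        subs_per_site = years * rate_per_year
        # Fixed-point output, trimmed of trailing zeros.
        text = f"{subs_per_site:.10f}".rstrip("0").rstrip(".")
        return ":" + text

    return BRANCH_LENGTH.sub(convert, newick)


# Hypothetical time tree with branch lengths in years.
time_tree = "((A:1000000,B:1000000):500000,C:1500000);"
subs_tree = scale_newick(time_tree, 2.5e-7)
print(subs_tree)
```

Is this the right way to think about it, or is there more to it when the simulator rescales the rate matrix?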
Any insight is appreciated! Thanks in advance!