Hi all, this is my first time posting a query here, so please do not be offended if I have written something that goes against the norms here.
Essentially, I have an alignment of protein-coding genes, and I am trying to find out whether certain regions in the alignment evolve under dual constraints, i.e. in addition to the constraint of coding for protein, certain DNA elements also have some regulatory role. Such constraints are usually observed for elements like exonic splicing enhancers.
One way to do that is to look at conservation at synonymous sites. Since synonymous sites are free from protein-coding constraints, any conservation observed at such sites would imply an additional role for such subsequences, possibly regulatory. However, I do not know how to do the math/stats to show that the conservation observed at synonymous sites is statistically significant, i.e. more than what would be expected under a null model (where the null model states that the only constraint/conservation in a multiple sequence alignment is due to protein coding). Any help in this regard would be very welcome.
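To make concrete what I mean by conservation at synonymous sites, here is a minimal sketch in Python. It assumes an in-frame codon alignment given as a list of equal-length strings and the standard genetic code; the function names are made up for illustration. It only flags codon columns and does not test significance, which is the part I am stuck on.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the genes use the standard code, not a mitochondrial or other variant).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
DEGENERACY = Counter(CODON_TABLE.values())  # number of codons per amino acid


def codon_columns(alignment):
    """Yield one list of codons per in-frame codon column."""
    length = len(alignment[0])
    for start in range(0, length - length % 3, 3):
        yield [seq[start:start + 3].upper() for seq in alignment]


def synonymous_conservation(alignment):
    """Return one entry per codon column.

    None  -> column is uninformative (amino acid varies, a codon contains a
             gap or ambiguous base, or the amino acid has a single codon)
    True  -> amino acid conserved and every sequence uses the same codon
    False -> amino acid conserved but codon usage differs
    """
    flags = []
    for column in codon_columns(alignment):
        amino_acids = {CODON_TABLE.get(codon) for codon in column}
        if len(amino_acids) != 1 or None in amino_acids:
            flags.append(None)
            continue
        if DEGENERACY[next(iter(amino_acids))] == 1:
            flags.append(None)  # Met/Trp: no synonymous alternative exists
            continue
        flags.append(len(set(column)) == 1)
    return flags


toy_alignment = ["ATGGCTAAA", "ATGGCTAAG", "ATGGCTAAA"]
print(synonymous_conservation(toy_alignment))
```

In the toy alignment the first codon (ATG) is skipped because methionine has only one codon, the second (GCT in all sequences) is synonymously conserved, and the third (AAA/AAG) is not.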
Regards
Sankalp
Why not email the authors to ask for their code?
They used MULTIZ alignments as input, which you can generate for your data. They are not specific about how they estimate parameters, but in general they use a dataset of coding regions to estimate a codon-based rate matrix, which you could do in PAML. I would not talk yourself out of contacting them for code and more information about their approach, especially the parameter estimation. You could save yourself a lot of time by doing so.
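If you want a rough first look before going down the codon-model route, a simple window permutation test is one option. To be clear, this is not what the authors did: it is a much cruder null that treats all synonymous sites in the gene as exchangeable, so it ignores phylogeny, codon usage bias and mutation rate variation. The sketch below assumes you have already reduced the alignment to a list of 0/1 flags, one per codon column where the amino acid is conserved (1 if every sequence uses the same codon, 0 otherwise). The function names, window width and number of permutations are all illustrative choices.

```python
import random


def window_sums(flags, width):
    """Number of synonymously conserved codons in each sliding window."""
    return [sum(flags[i:i + width]) for i in range(len(flags) - width + 1)]


def window_permutation_pvalues(flags, width, n_perm=1000, seed=0):
    """Permutation p-value for each window.

    Null: conserved codons are scattered at random along the gene.
    Each observed window is compared against the maximum window sum seen
    in each shuffled copy, which corrects for scanning many windows.
    """
    rng = random.Random(seed)
    observed = window_sums(flags, width)
    shuffled = list(flags)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_max.append(max(window_sums(shuffled, width)))
    # Add-one correction so that a p-value is never exactly zero.
    return [
        (1 + sum(m >= obs for m in null_max)) / (n_perm + 1)
        for obs in observed
    ]


# Toy input: 8 conserved codons clustered in the middle of 68 codons.
toy_flags = [0] * 30 + [1] * 8 + [0] * 30
pvalues = window_permutation_pvalues(toy_flags, width=8)
best = min(range(len(pvalues)), key=pvalues.__getitem__)
print(best, pvalues[best])
```

Windows with a small p-value are candidates for extra constraint, but given how crude the null is, I would treat them as regions to check with a proper codon model rather than as a result.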
Thanks Casey, I have seen that paper. But I am not really a molecular evolution person, and I was looking for something that is already available rather than trying to figure out all the parameters for modelling sequence/codon evolution and then writing code for it. But thanks anyway for the prompt reply.
Things are not that simple. What these people did was specific to the 29-way vertebrate alignments that they were interested in. The parameters they estimated were again specific to those organisms. I have an altogether different dataset and hence cannot use the same.
Thanks Casey, I will get in touch with these guys.