Hi everyone,
Does anyone know how to calculate the number of synonymous sites for a certain sequence? I know the principles, but I don't know whether there is any software or script that can be used for this. All the software I know of calculates dN, dS, Ka, and Ks... I would really appreciate it if anyone could help me out with this problem. Thank you in advance!
--Patricia
I don't understand "for a certain sequence". Normally people calculate the number of syn. sites for a set of SNPs. What you would have to do is align that sequence to a reference and compute the variants.
To be clear - do you mean you are looking for a way to locate 4-fold degenerate sites (i.e. sites for which all mutations will be synonymous) in a single sequence?
I think, rather, that Patricia is using one of the sometimes-called "approximate" or "counting" methods of looking for selection, in which an attempt is made to count the number of synonymous sites by looking at sequences and identifying positions that are 4-fold degenerate (which might count as a single synonymous site), then a fraction of a synonymous site for 2-fold degenerate sites (a third, if one of the three possible changes at that position is synonymous), and so on.
Or at least that's how I understand these methods. The references I give in my answer below cover these issues.
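For the narrower reading raised above (locating 4-fold degenerate sites in a single sequence), here is a minimal Python sketch. It assumes the standard genetic code, an in-frame coding sequence starting at the first base, and that only third codon positions can be 4-fold degenerate (true for the standard code). The function name `fourfold_positions` is made up for illustration and does not come from any package.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: standard code applies).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def fourfold_positions(seq):
    """Return 0-based indices of third codon positions that are 4-fold degenerate."""
    seq = seq.upper().replace("U", "T")
    hits = []
    # Walk the sequence codon by codon, ignoring a trailing partial codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon not in CODON_TABLE:
            continue  # skip codons with ambiguous bases such as N
        prefix = codon[:2]
        # 4-fold degenerate: all four third-position bases give the same amino acid.
        if len({CODON_TABLE[prefix + b] for b in BASES}) == 1:
            hits.append(i + 2)
    return hits


print(fourfold_positions("ATGGGGTTA"))
```

In this example only the third base of GGG (index 5) is reported; ATG and TTA have no 4-fold degenerate position.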
The fact that one is (always? almost always?) interested in synonymous (or non-synonymous) changes between two (or more) sequences highlights that a method that compares 2+ sequences to address such questions is likely to be a good way to go.
In addition, the fact that software such as PAML (generally acknowledged as providing good ways of estimating such things) doesn't provide this kind of information (or at least not as far as I can tell, after looking at this a bit just now) further suggests that estimating these kinds of things on their own is unlikely to be of high interest.
Patricia, it would be great to get some feedback on whether these answers/comments are useful for your question.
Hi aidan-budd, Thank you very much for taking the trouble to help me. I really appreciate it. You are right. I'm looking for software to count the number of potential synonymous or nonsynonymous sites of a sequence, by identifying 4-fold degenerate and 2-fold degenerate sites (Nei-Gojobori method). Like the others say, this information is usually reported when the input is a sequence alignment file. The problem is that I don't want to estimate dN, dS, and the like, so there is no need to generate sequence alignments. I saw that some people get this kind of information by writing programs, like Perl scripts, which I am not good at :(
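In case it helps, here is a minimal Python sketch of the site-counting step of the Nei-Gojobori method for a single sequence, with no alignment needed. For each codon position it takes the fraction of the three possible base changes that are synonymous, sums these to get the codon's synonymous sites s, and uses n = 3 - s for the nonsynonymous sites. The assumptions are mine: standard genetic code, an in-frame sequence, changes to stop codons counted as nonsynonymous, and stop or ambiguous codons skipped. Other implementations treat stop codons differently, so the numbers may not match a given program exactly. The function names are made up for illustration.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: standard code applies).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites_codon(codon):
    """Synonymous sites s for one codon: sum over positions of (synonymous changes / 3)."""
    aa = CODON_TABLE[codon]
    s = 0.0
    for pos in range(3):
        syn = sum(
            1
            for b in BASES
            if b != codon[pos]
            and CODON_TABLE[codon[:pos] + b + codon[pos + 1:]] == aa
        )
        s += syn / 3  # changes to stop codons count as nonsynonymous here
    return s


def count_sites(seq):
    """Return (S, N): potential synonymous and nonsynonymous sites of an in-frame sequence."""
    seq = seq.upper().replace("U", "T")
    S = 0.0
    codons = 0
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon not in CODON_TABLE or CODON_TABLE[codon] == "*":
            continue  # skip ambiguous and stop codons (an assumption of this sketch)
        S += syn_sites_codon(codon)
        codons += 1
    N = 3 * codons - S  # each counted codon contributes 3 sites in total
    return S, N


S, N = count_sites("ATGGGGTTA")
print(f"S = {S:.3f}, N = {N:.3f}")
```

For the toy sequence ATG GGG TTA, the per-codon values are s = 0, 1 and 2/3, so S = 5/3 and N = 22/3. To run it on a real gene, pass the coding sequence as a plain string.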