Hello,
I was wondering how to compute 1GC, 2GC and 3GC (the GC content at codon positions 1, 2 and 3).
I want to compare the GC content of predicted CDSs. At the moment, I simply compute it with this formula:
GC_content = (G+C)/(A+T+G+C)
For 1GC, 2GC and 3GC I tried:
1GC = (G1+C1)/(A1+T1+G1+C1)
2GC = (G2+C2)/(A2+T2+G2+C2)
3GC = (G3+C3)/(A3+T3+G3+C3)
But I'm not confident about this way of computing it. I understand that the formula is supposed to take the mutation rate into account. I haven't found any software (such as a small Python script) so far. It would not be difficult to write my own Python script if I had the right formula, but I would prefer an existing script, since this will involve a lot of mathematics and probability.
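For reference, here is a minimal Python sketch of the simple positional formulas above, nothing more. It assumes the CDS is already in frame (the first base is codon position 1), ignores any trailing incomplete codon, and leaves ambiguous bases such as N out of both numerator and denominator. It does not model mutation rates in any way. The function name `positional_gc` is just a placeholder for illustration.

```python
from collections import Counter


def positional_gc(cds):
    """Return (1GC, 2GC, 3GC) for an in-frame coding sequence.

    Each value is (G + C) / (A + T + G + C) counted only over the bases
    found at that codon position. Assumes the sequence starts at codon
    position 1; bases other than A, T, G, C are ignored.
    """
    seq = cds.upper()
    # Drop trailing bases that do not form a complete codon.
    seq = seq[: len(seq) - len(seq) % 3]

    fractions = []
    for pos in range(3):
        # Every third base, starting at offset pos, is one codon position.
        counts = Counter(seq[pos::3])
        gc = counts["G"] + counts["C"]
        total = gc + counts["A"] + counts["T"]
        # No usable bases at this position: report NaN instead of dividing by zero.
        fractions.append(gc / total if total else float("nan"))
    return tuple(fractions)


if __name__ == "__main__":
    gc1, gc2, gc3 = positional_gc("ATGGCCTAA")
    print(gc1, gc2, gc3)
```

For the toy sequence ATGGCCTAA (codons ATG, GCC, TAA) this gives 1/3, 1/3 and 2/3 for positions 1, 2 and 3.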
In addition, I'm looking for an in-depth review of GC content in prokaryotes, to really understand how to draw conclusions from GC content. I have looked through several articles but have not found a review that synthesizes the subject.
Thanks in advance for any help.
R may be somewhat slow for large-scale analysis, so I suggest avoiding it if you have a lot of data to analyze.