I'm writing a small script that does the following:
1. Takes an amino acid sequence (e.g. GVGP...) and a nucleotide length k.
2. Represents the amino acid sequence as residue triplets, repeating each residue once per codon position (e.g. GGGVVVGGGPPP...).
3. Slides a window of length k across the tripled sequence (e.g. for k=5: GGGVV, GGVVV, etc.).
4. Counts how many times each unique window from (3) occurs in the entire sequence, and also counts the number of synonymous nucleotide sequences that can code for each one.
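The steps above can be sketched roughly in Python. This is a minimal sketch under my own assumptions, not a reference implementation: it uses the standard genetic code (the table `SYN_CODONS` is built from the usual TCAG ordering), and the function names `tripled`, `synonymous_count` and `count_fragments` are hypothetical. Because the same window string can start at different positions within a codon and then have different synonymous counts (e.g. GGGGG), the sketch keys its results by (window, phase) rather than by the window string alone.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# residue -> synonymous codons, e.g. "G" -> ["GGT", "GGC", "GGA", "GGG"]
# (residues outside this table, such as "X", will raise KeyError below)
SYN_CODONS: dict[str, list[str]] = {}
for codon_bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    SYN_CODONS.setdefault(aa, []).append("".join(codon_bases))


def tripled(protein: str) -> str:
    """Repeat each residue once per codon position: "GV" -> "GGGVVV"."""
    return "".join(aa * 3 for aa in protein)


def synonymous_count(protein: str, start: int, k: int) -> int:
    """Distinct nucleotide k-mers that can encode the window starting at `start`.

    Each overlapped codon is chosen independently, so the total is the product,
    over those codons, of the number of distinct codon substrings at the
    positions the window covers.
    """
    total, pos, end = 1, start, start + k
    while pos < end:
        codon_idx, offset = divmod(pos, 3)
        stop = min(end, 3 * (codon_idx + 1))  # window end or next codon boundary
        length = stop - pos
        pieces = {c[offset:offset + length] for c in SYN_CODONS[protein[codon_idx]]}
        total *= len(pieces)
        pos = stop
    return total


def count_fragments(protein: str, k: int):
    """Return (occurrence counts, synonymous counts), both keyed by (window, phase)."""
    seq = tripled(protein)
    counts: Counter = Counter()
    syn: dict[tuple[str, int], int] = {}
    for start in range(len(seq) - k + 1):
        key = (seq[start:start + k], start % 3)  # phase = offset into the first codon
        counts[key] += 1
        syn[key] = synonymous_count(protein, start, k)
    return counts, syn


if __name__ == "__main__":
    counts, syn = count_fragments("GVGVG", 3)
    for (window, phase), n in sorted(counts.items()):
        print(window, phase, n, syn[(window, phase)])
```

If only per-string occurrence counts are wanted, as in the counting step, `counts` can be summed over phases; the synonymous count, however, is only well defined per (window, phase).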
I'm writing a description of this script, and I feel there should be a clearer, more concise way to describe it in a couple of sentences, or an existing term for the sequences produced in (3).
Has this been defined in the literature before? If not, is there concise pseudocode that describes the process? I'm quite new to writing bioinformatics/CS papers, so forgive me if this is an obvious question.
It looks like there is no accepted terminology. "Synonymous codon fragment" sounds pretty good, though I will probably still need to give a proper, detailed definition in my text.
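For that detailed definition, one possible formalization is below. The notation is my own, not an established one from the literature: $C(a)$ is the set of codons for residue $a$ under the standard genetic code, $c[\ell..r]$ is the substring of codon $c$ from position $\ell$ to $r$, and all positions are 1-based.

```latex
% Protein a_1 ... a_n; tripled string t_1 ... t_{3n} with t_i = a_{\lceil i/3 \rceil}
% Window starting at s (1 \le s \le 3n - k + 1): w_s = t_s \cdots t_{s+k-1}
\[
  \operatorname{count}(w) = \bigl|\{\, s : w_s = w \,\}\bigr|
\]
\[
  N(s) = \prod_{j=\lceil s/3 \rceil}^{\lceil (s+k-1)/3 \rceil}
         \bigl|\{\, c[\ell_j .. r_j] : c \in C(a_j) \,\}\bigr|,
  \qquad
  \ell_j = \max(s,\, 3j-2) - 3(j-1), \quad
  r_j = \min(s+k-1,\, 3j) - 3(j-1)
\]
```

Here $N(s)$ depends on the phase $(s-1) \bmod 3$ as well as on $w_s$ itself: with $k=5$, the fragment GGGGG has 4 possible encodings at phase 0 but 16 at phase 1.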