I have a large number of FASTA files, each containing a single nucleotide sequence. All of the sequences are in frame, although not all of them contain a start codon. Some contain ambiguous characters (Ns) where the identity of a base is unknown because of poor quality or missing sequence information.
At times, a particular codon may only have a single base that was properly sequenced, such as below, a short example sequence from one such file: ...GTGCTGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAAG...
I would like to trim the Ns from all of these sequences in a "codon-sensitive" way, so that the reading frame is preserved: for the example above, the trimming would either leave the partial codon CNN intact or, ideally, remove the C together with the Ns. Simply removing the Ns is trivial for me, but I am not sure how to do it while respecting codon boundaries. If it is helpful, I already have the corresponding amino acid sequence.
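
To make the desired behaviour concrete, here is a minimal sketch in plain Python of what I mean by codon-sensitive trimming. It assumes the sequence starts in frame at its first base and that N is the only ambiguity code. The function name `trim_ambiguous_codons` and the `keep_partial` flag are hypothetical names of my own, not part of any library, and dropping a trailing incomplete codon is my assumption rather than a requirement.

```python
def trim_ambiguous_codons(seq, keep_partial=False):
    """Remove codons containing N while preserving the reading frame.

    Assumes `seq` starts in frame at index 0 (an assumption of this sketch).
    keep_partial=False: drop any codon with at least one N (CNN is removed).
    keep_partial=True:  drop only codons that are entirely N (CNN is kept).
    """
    seq = seq.upper()
    # Ignore a trailing incomplete codon, if any (assumption of this sketch).
    usable_length = len(seq) - len(seq) % 3
    kept = []
    for i in range(0, usable_length, 3):
        codon = seq[i:i + 3]
        n_count = codon.count("N")
        if n_count == 0 or (keep_partial and n_count < 3):
            kept.append(codon)
    return "".join(kept)


# Shortened version of the example: GTG CTG CNN NNN NNN AAG
example = "GTGCTGC" + "N" * 8 + "AAG"
print(trim_ambiguous_codons(example))                     # ideal: C removed with the Ns
print(trim_ambiguous_codons(example, keep_partial=True))  # alternative: CNN left in place
```

Because whole codons are removed, the same codon positions could also be dropped from the amino acid sequence to keep the two in step, although the sketch does not do that.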