I am trying to find all instances of a specific codon in a given gene or RNA sequence using Python's regular expressions. The findall function seems able to do the job; the problem is that it has to match whole codons, not just any three consecutive letters, which may straddle two codons. Here's an example:
>>> seq='CTCTTACTT'
>>> import re
>>> re.findall(r'CTT',seq)
['CTT', 'CTT']
The first CTT that it finds does not correspond to a codon (CT**CTT**ACTT), since we have only three codons in the given sequence: CTC, TTA, CTT.
Obviously, the most straightforward way is to use a loop, extract each codon from the sequence, and compare it with CTT (the codon we are searching for), but I am looking for a smarter way of doing so.
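For reference, here is a rough sketch of both ideas: splitting the sequence into codons, and a regex variant that keeps only in-frame hits. It assumes the reading frame starts at index 0; the function names `split_codons` and `find_codon_positions` are made up for illustration.

```python
import re

def split_codons(seq):
    """Split seq into consecutive 3-letter codons (trailing leftovers are dropped)."""
    return re.findall(r'...', seq)

def find_codon_positions(seq, codon):
    """Return 0-based start indices where codon occurs in frame (frame assumed to start at 0)."""
    # A zero-width lookahead reports every occurrence, including overlapping ones;
    # only matches whose start index is a multiple of 3 are kept.
    pattern = re.compile(r'(?=' + re.escape(codon) + r')')
    return [m.start() for m in pattern.finditer(seq) if m.start() % 3 == 0]

seq = 'CTCTTACTT'
print(split_codons(seq))                 # ['CTC', 'TTA', 'CTT']
print(find_codon_positions(seq, 'CTT'))  # [6]
```

The out-of-frame hit at index 2 is filtered out because 2 is not a multiple of 3. For a different reading frame, the modulo check would need an offset.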