Hi,
I want to find a sequence pattern in a genome, for example (G4N(1-10))5,
which means four guanines followed by 1 to 10 bases of any kind (A, T, G, or C), with this whole unit repeated five times.
I have a FASTA file for the organism I work with, and I have basic knowledge of Python and regex. Is there a package or library that does this, or should I write the whole code myself? Initially I only want to know how many times the pattern occurs in the reference sequence, but later it would be useful to know the start and stop positions as well.
Thanks in advance for your help!
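For reference, here is a minimal sketch of how the pattern described above might be searched with only Python's standard `re` module. It assumes that (G4N(1-10))5 translates to the regex `(?:G{4}[ACGT]{1,10}){5}`, that only non-overlapping hits on the forward strand are wanted, and that the sequence fits in memory. The names `G4_PATTERN`, `read_fasta`, `find_hits`, and the file name `genome.fa` are made up for illustration.

```python
import re

# Assumed translation of (G4N(1-10))5: four G's followed by 1-10 bases
# of A/C/G/T, with that unit repeated five times in a row.
G4_PATTERN = re.compile(r"(?:G{4}[ACGT]{1,10}){5}", re.IGNORECASE)


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts; emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_hits(sequence, pattern=G4_PATTERN):
    """Return (start, end) of each non-overlapping match.

    Coordinates are 0-based and the end is exclusive, as in Python slices.
    """
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


# Hypothetical usage, assuming a file called genome.fa exists:
# for header, seq in read_fasta("genome.fa"):
#     hits = find_hits(seq)
#     print(header, len(hits), hits[:5])
```

Two caveats with this sketch: `finditer` reports only non-overlapping matches, and the quantifiers are greedy, so the end of a hit can extend into the bases that follow the last G run. The reverse strand is not searched; one option would be to run the same function on the reverse complement of each sequence.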
Thanks for sharing it.