8.1 years ago
jinkuozhang
I am trying to find every possible "N20NGG" sequence (any 20 bases, followed by any base and then GG) in a target sequence like:
example_seq ATTAATACTTTTAACAATTGTAGTATATAAAAAAGGGAGTAACCGAAAACGGTCGGGACCGAAAACGG
I used a Python regular expression:
import re
example_seq = "ATTAATACTTTTAACAATTGTAGTATATAAAAAAGGGAGTAACCGAAAACGGTCGGGACCGAAAACGG"
pattern = re.compile(r'(.{20}).GG')
all_matched_seq = pattern.finditer(example_seq)
for record in all_matched_seq:
    print(record.group(1), end="\t")
    print(record.span())
I only got two matched sequences:
- ACAATTGTAGTATATAAAAA (13, 36)
- AAAACGGTCGGGACCGAAAA (45, 68)
My script failed to retrieve the other four matching sequences:
CAATTGTAGTATATAAAAAA; AAAAAGGGAGTAACCGAAAA; AGGGAGTAACCGAAAACGGT; GGGAGTAACCGAAAACGGTC
How can I modify my script to get all the valid matches?
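For readers landing here: the four sequences are missed because `finditer` returns non-overlapping matches, so after each 23-character match the search resumes past the end of that match. One common workaround, sketched below and not necessarily the approach in the accepted answer, is to wrap the whole pattern in a zero-width lookahead so the regex engine consumes no characters and retries at every position. The names `hits` and `start` are illustrative only, and the span is computed by hand because a lookahead match is itself zero-width.

```python
import re

example_seq = "ATTAATACTTTTAACAATTGTAGTATATAAAAAAGGGAGTAACCGAAAACGGTCGGGACCGAAAACGG"

# (?=...) is a zero-width lookahead: it checks that the 23-character
# N20NGG window starts here but consumes nothing, so the next search
# begins one position later and overlapping sites are all reported.
pattern = re.compile(r'(?=(.{20}).GG)')

hits = []
for m in pattern.finditer(example_seq):
    start = m.start()
    # m.span() would be (start, start); rebuild the full 23-base span.
    hits.append((m.group(1), (start, start + 23)))

for seq20, span in hits:
    print(seq20, span, sep="\t")
```

On the example sequence this should report six sites, the two found by the original script plus the four it skipped. If the wildcard should be restricted to nucleotides, `.` could be swapped for a character class such as `[ACGT]`.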
This precisely answered my question. Thanks, John!
If this answered your question it's appropriate to mark this answer as "accepted".