Rosalind exercise: Finding a Motif in DNA
caro-ca, 4.4 years ago

Hi, community!

I am trying to find a motif in a DNA sequence. This is my code:

#!/usr/bin/env python3

from sys import argv
import re

# Functions
def find_motif(seq_dna, motif):
    # Collect the 1-based start position of each match yielded by re.finditer
    results = re.finditer(motif, seq_dna)
    r = []
    for result in results:
        r.append(result.span()[0] + 1)
    print(" ".join(map(str, r)))

if __name__ == '__main__':
    seq_dna = argv[1]
    motif = argv[2]
    find_motif(seq_dna, motif)

When I run my code as python finding_motif.py "GATATATGCATATACTT" "ATAT", this is the stdout:

2 10

However, there is another occurrence of the motif at position 4 (0-based index 3) that is not counted. Could somebody suggest how to tackle this? The expected output is:

2 4 10

Thank you in advance for your help.

Tags: python rosalind

By the way, you can write print(*r) instead of the print/join/map construction. It is equivalent to print(r[0], r[1], ...).
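
A small sketch that checks the claim above by capturing stdout. The list r = [2, 4, 10] is just example data and captured() is a hypothetical helper. The idea is that print(*r) unpacks the list into separate arguments, and print separates its arguments with a single space by default.

```python
import io
from contextlib import redirect_stdout

r = [2, 4, 10]  # example positions, as find_motif would report them

def captured(fn):
    """Hypothetical helper: run fn() and return whatever it printed."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        fn()
    return buf.getvalue()

joined = captured(lambda: print(" ".join(map(str, r))))  # original approach
unpacked = captured(lambda: print(*r))                   # same as print(2, 4, 10)
explicit = captured(lambda: print(r[0], r[1], r[2]))

print(joined == unpacked == explicit)  # True
print(*r)                              # 2 4 10
```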


Wow! I did not know about that. Thank you!! It worked.

hugo.avila, 4.4 years ago

I don't know exactly why your code doesn't work. Maybe it is because re resumes searching after the end of each match. Here is my solution:

s = "GATATATGCATATACTT"

# Test every start position, so overlapping occurrences are found too
for i in range(len(s)):
    if s[i:].startswith("ATAT"):
        print(i + 1)  # 1-based position
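
The same brute-force idea can also be written with str.find, which accepts a start index. The key step is to resume one character after the previous start, not after the end of the match. This is a sketch, and the function name find_motif_positions is hypothetical.

```python
def find_motif_positions(seq_dna, motif):
    """Return 1-based start positions of all (possibly overlapping) occurrences."""
    positions = []
    start = seq_dna.find(motif)          # 0-based index, or -1 if absent
    while start != -1:
        positions.append(start + 1)      # convert to 1-based, as Rosalind expects
        # Resume just after the previous *start* (not its end),
        # so overlapping occurrences are not skipped
        start = seq_dna.find(motif, start + 1)
    return positions

print(*find_motif_positions("GATATATGCATATACTT", "ATAT"))  # 2 4 10
```

Unlike slicing with s[i:], this does not copy the rest of the string at every position.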

Thank you! You were really helpful. It worked!

Mensur Dlakic, 4.4 years ago

The documentation page for the re module says:

re.finditer(pattern, string, flags=0)
Return an iterator yielding match objects over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result.

Your matches overlap. Once finditer finds a match, it continues scanning after the end of that match, so any match that overlaps one already found is skipped. You should be able to solve this by looping through the string using re.search, which is described on the same page, or by using re.findall as described here.
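
A minimal sketch of the search-in-a-loop approach, with the function name find_motif_search chosen for illustration. The module-level re.search has no start-position argument, so this version compiles the pattern and uses the compiled pattern's search(string, pos) method. After each hit, it restarts one character past that hit's start.

```python
import re

def find_motif_search(seq_dna, motif):
    """1-based starts of all overlapping matches, found by repeated searches."""
    pattern = re.compile(motif)
    positions = []
    m = pattern.search(seq_dna)              # first match anywhere in the string
    while m is not None:
        positions.append(m.start() + 1)      # convert to 1-based
        # Pattern.search accepts a start position; restart just after
        # this match's start so overlapping matches are not skipped
        m = pattern.search(seq_dna, m.start() + 1)
    return positions

seq = "GATATATGCATATACTT"
print([m.start() + 1 for m in re.finditer("ATAT", seq)])  # [2, 10] (non-overlapping)
print(find_motif_search(seq, "ATAT"))                      # [2, 4, 10]
```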


Another option is the third-party regex module. It is largely a drop-in replacement for re, but it supports overlapping matches natively through the overlapped=True argument to findall and finditer.


Thank you so much for your help. However, I could not understand what flags means. Is it possible to get overlapping matches by changing it to another value? I could not find that information on the documentation page.


Here is an explanation of those flags. There is no flag for overlapping matches. But the flags re.MULTILINE and re.IGNORECASE could be useful in another context.
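
A short sketch of what a flag does and does not change. The mixed-case sequence below is made-up example data, such as a soft-masked region. With re.IGNORECASE, the lowercase occurrence at position 2 is found. The match at position 4 is still missing, because flags do not alter finditer's non-overlapping scan.

```python
import re

seq = "gatatatgcATATACTT"  # hypothetical mixed-case sequence

case_sensitive = [m.start() + 1 for m in re.finditer("ATAT", seq)]
case_insensitive = [m.start() + 1 for m in re.finditer("ATAT", seq, flags=re.IGNORECASE)]

print(case_sensitive)    # [10]
print(case_insensitive)  # [2, 10]  (position 4 still skipped: matches cannot overlap)
```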


Thank you so much, it was really helpful.

4.4 years ago

This is only a minor modification of your code using the lookahead from this answer.

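import re

OFFSET = 1  # Rosalind reports 1-based positions

def find_motif(seq_dna, motif):
    # "(?=...)" is a zero-width lookahead: it matches at a position without
    # consuming characters, so overlapping occurrences are all reported
    return [m.start() + OFFSET for m in re.finditer("(?=" + motif + ")", seq_dna)]

print(find_motif("GATATATGCATATACTT", "ATAT"))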

Prints: [2, 4, 10]
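
To see why the lookahead helps, the sketch below prints each match object. Every match is zero-width, with an empty span and an empty group(0), so finditer only advances one position before trying again. Wrapping the motif in a capturing group inside the lookahead still exposes the matched text as group(1). The helper name lookahead_matches is hypothetical.

```python
import re

def lookahead_matches(seq_dna, motif):
    """Hypothetical helper: (1-based start, span, group(0), group(1)) per match."""
    return [(m.start() + 1, m.span(), m.group(0), m.group(1))
            for m in re.finditer("(?=(" + motif + "))", seq_dna)]

for pos, span, whole, text in lookahead_matches("GATATATGCATATACTT", "ATAT"):
    print(pos, span, repr(whole), text)
# 2 (1, 1) '' ATAT
# 4 (3, 3) '' ATAT
# 10 (9, 9) '' ATAT
```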


start() and ?= were new to me. Great comment! Thank you


How can I get the result without the brackets and commas? I need the output in this form: 2 4 10


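The [ ] shows that the result is a Python list. To print its entries separated by spaces, print the entries rather than the list itself. For example, unpack it with print(*find_motif(...)) or build a string with " ".join(map(str, ...)). More on Python lists: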

https://www.programiz.com/python-programming/list
