Finding overlapping motifs to increase length of motif
2.0 years ago
andrea • 0

Hi All,

I am able to find a matching motif in my sequence, and I would now like to find overlapping motifs. Basically, after matching my motif, I also want to get the 6 amino acids that follow it. This is the code I used to find the motif:

import regex
from Bio import SeqIO

input_file = 'sequences.fasta'
fasta_sequences = SeqIO.parse(input_file, 'fasta')
for fasta in fasta_sequences:
    name, sequence = fasta.id, str(fasta.seq)
    result = regex.finditer(r"[YFWLIMVA]..[LMALVN]..[AGSTCD].[LAIVNFYMW]", sequence)
    # Print the record id, match coordinates, and matched motif
    for x in result:
        print(name, x.start(), x.end(), x.group())

The above code works perfectly because it gives me the sequence ID, the positions, and the motif. The output is below:

P1  33 41 VTLLPAADL

Right now, what I want to do is to also get the 6 amino acids after matching this motif, such that I get an output like the one below.

P1 33 47 VTLLPAADLLMAIID

The code that I have tried to get the 6 amino acids after my match is below.

import regex
from Bio import SeqIO

input_file = 'sequences.fasta'
fasta_sequences = SeqIO.parse(input_file, 'fasta')
for fasta in fasta_sequences:
    name, sequence = fasta.id, str(fasta.seq)
    result = regex.finditer(r"[YFWLIMVA]..[LMALVN]..[AGSTCD].[LAIVNFYMW]", sequence)
    for x in result:
        # Adding 6 changes only the printed end coordinate; x.group() is still the 9-residue match
        print(name, x.start(), x.end() + 6, x.group())

This is the output it gives me:

#It does not extend my motif by 6 amino acids, after getting the match.     
P1  33 47 VTLLPAADL 

#My desired output is this, which includes the overlapping LMAIID motif
P1   33 47 VTLLPAADLLMAIID

I also tried the code below, but it returns an error.

import regex
from Bio import SeqIO

input_file = 'sequences.fasta'
fasta_sequences = SeqIO.parse(input_file, 'fasta')
for fasta in fasta_sequences:
    name, sequence = fasta.id, str(fasta.seq)
    result = regex.finditer(r"[YFWLIMVA]..[LMALVN]..[AGSTCD].[LAIVNFYMW]", sequence)
    for x in result:
        # Raises TypeError: x.group() is a str, so the int 6 cannot be added to it
        print(name, x.start(), x.end() + 6, x.group() + 6)
motif fasta biopython aminoacid • 1.0k views

You have your regex result x, but x.group() holds only the matched text, not the whole FASTA record. To use the extended coordinates, you need to slice the sequence from the FASTA record.
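
To illustrate the point about slicing: a match object only knows the text it matched, so adding 6 to x.end() changes the reported number but never x.group(). Below is a minimal sketch, assuming the standard-library re module in place of regex (this pattern behaves the same in both) and a made-up toy sequence that is not from the original FASTA file:

```python
import re

# Same motif pattern as in the question.
MOTIF = re.compile(r"[YFWLIMVA]..[LMALVN]..[AGSTCD].[LAIVNFYMW]")

# Hypothetical toy sequence, invented for illustration only.
sequence = "GGGVTLLPAADLLMAIIDGGG"

x = MOTIF.search(sequence)

# group() is exactly the slice of the sequence between start() and end().
matched = x.group()
same_as_slice = sequence[x.start():x.end()]

# Extending the slice (not the match object) yields the 6 extra residues.
extended = sequence[x.start():x.end() + 6]

print(matched)   # the 9-residue motif
print(extended)  # the motif plus the next 6 residues
```

Here the only thing that changes between the two results is the upper bound of the slice taken from the original string.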


Thank you Michael, how do I do that? I am still new to this; could you perhaps provide example code showing how to do it?

2.0 years ago
iraun 6.2k

You have to go back to the original sequence, and fetch the subsequence using the coordinates, like this:

print(sequence[x.start():x.end() + 6])
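
For a fuller picture, the slice can be wrapped in a small helper that reports the extended coordinates together with the extended subsequence. This is only a sketch under a few assumptions: the function name extended_motifs and the extra parameter are hypothetical, the standard-library re module stands in for regex, and the end coordinate is clamped so a motif near the end of a sequence does not report positions past it.

```python
import re

# Same motif pattern as in the question.
MOTIF = re.compile(r"[YFWLIMVA]..[LMALVN]..[AGSTCD].[LAIVNFYMW]")


def extended_motifs(sequence, extra=6):
    """Yield (start, end, subsequence) for each motif match plus `extra` residues.

    Hypothetical helper: `extra` is the number of residues to append after
    the match; the end is clamped to the sequence length.
    """
    for x in MOTIF.finditer(sequence):
        end = min(x.end() + extra, len(sequence))
        yield x.start(), end, sequence[x.start():end]


# Hypothetical toy sequences, invented for illustration only.
hits = list(extended_motifs("GGGVTLLPAADLLMAIIDGGG"))
short_hits = list(extended_motifs("VTLLPAADLLM"))

for start, end, subseq in hits:
    print("toy", start, end, subseq)
```

In the original loop, this would be called once per record, passing str(fasta.seq) as the sequence and printing fasta.id alongside each result.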

Perfect, it worked.
