I am having trouble searching for motifs in a multi-FASTA file. I have tried two approaches: the first gives me the name of the sequence where the motif is found, but not the motif itself or its position; the second does not return anything. Please see the code below; any assistance will be highly appreciated. The first one is:
import re

infile = open("sequence.fasta", 'r')
out = open("Result.csv", 'w')
pattern = re.compile(r"(P[A-Z]{2}P)")
for line in infile:
    line = line.strip("\n")
    if line.startswith('>'):
        name = line
    else:
        s = re.finditer(pattern, line)
        print('%s:%s' % (name, s))
        out.write('%s:\t%s\n' % (name, s))
The code above returns the output below. It gives me the name of the sequence, which is P1, but it does not give me (1) the motif found or (2) the position of the motif. The Result.csv file that it generates is also blank.
#output
>P1:<callable_iterator object at 0x000001D0611B7AF0>
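For context, here is a minimal sketch of why that output appears. `re.finditer` returns a lazy iterator of match objects, so formatting the iterator itself only shows its repr. Each match object yielded by the iterator exposes `.group()`, `.start()` and `.end()`. The sequence string below is made up for illustration and is not taken from sequence.fasta.

```python
import re

pattern = re.compile(r"P[A-Z]{2}P")
seq = "MKTPACPLLVPDCPAA"  # hypothetical sequence, for illustration only

matches = pattern.finditer(seq)
print(matches)  # shows the iterator object itself, not the matches

# Looping over the iterator yields one match object per hit
hits = [(m.group(), m.start(), m.end()) for m in matches]
print(hits)
```

Start positions are 0-based and the end position is exclusive, so `seq[start:end]` gives back the motif.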
I then tried a different technique:
import re

infile = open("sequence.fasta", 'r')
out = open("result.csv", 'w')
pattern = re.compile(r"(P[A-Z]{2}P)")
for line in infile:
    line = line.strip("\n")
    if line.startswith('>'):
        name = line
    else:
        s = re.finditer(pattern, line)
        for match_obj in s:
            print(match_obj)
            print(match_obj.group())
            out.write('%s:\t%s\t%s\n' % (name, match_obj, match_obj.group()))
The code above does not return anything at all, and the result.csv file is also blank. I'm still new to Python, and some of the techniques I used here I found on Bioinformatics Stack Exchange.
The output I want gives (1) the name of the sequence, (2) the motif found and (3) the position of the motif (where it starts and ends) for every sequence in the multi-FASTA file, as in the example below:
>P1
PACP
22:26
>P2
PDCP
34:38
Any format is fine, as long as I can get something similar to the format above.
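One possible way to produce that kind of output is sketched below, under a few assumptions: sequences may be wrapped over several lines (so each record is joined before searching), positions are reported as Python's 0-based start and exclusive end, and only non-overlapping matches are wanted. The helper names `read_fasta`, `find_motifs` and `write_hits` are hypothetical, and the demo records are made up rather than taken from sequence.fasta.

```python
import re

MOTIF = re.compile(r"P[A-Z]{2}P")  # same motif as in the question


def read_fasta(lines):
    """Yield (name, sequence) pairs, joining sequences wrapped over several lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line, []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_motifs(lines, pattern=MOTIF):
    """Yield (name, motif, start, end) for every non-overlapping match."""
    for name, seq in read_fasta(lines):
        for m in pattern.finditer(seq):
            yield name, m.group(), m.start(), m.end()


def write_hits(fasta_path, out_path):
    """Write one tab-separated line per hit; 'with' closes and flushes both files."""
    with open(fasta_path) as infile, open(out_path, "w") as out:
        for name, motif, start, end in find_motifs(infile):
            out.write(f"{name}\t{motif}\t{start}:{end}\n")


# Small in-memory demo with made-up records (the P1 motif spans a line break)
demo = [">P1", "MKTPAC", "PLLV", ">P2", "AAPDCPGG"]
demo_hits = list(find_motifs(demo))
for name, motif, start, end in demo_hits:
    print(name, motif, f"{start}:{end}", sep="\n")
```

With real files, this would be called as `write_hits("sequence.fasta", "Result.csv")`. Opening the files in a `with` block ensures the output is flushed to disk when the block ends, which is one possible reason an output file looks blank when a script leaves it open in an interactive session.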
Thank you, I will give this a try.
It worked, thank you so much.
Good :). Please consider marking the answer as "Accepted".