Hello, I have a problem with my Python script, which should simulate protein cleavage by trypsin (it cuts after K and R, but not if the next residue is P):
for i in range(len(s)-1):
    if ( (s[i]=='K' or s[i]=='R')and (s[i+1]!='P')):
        f.append(s[n:i+1])
        n=i-1
The problem is that in some places it works properly and doesn't cut sequences with KP or PR, but in other places it does cut them. Do you know why? Thanks for any answer! zuzka
I'm not sure why it doesn't work, but you might consider using a regular expression instead
import re
pattern = re.compile('[KR][^P]')  # regular expression: K or R followed by any character except P
peptides = pattern.split(sequence)  # split() cuts the sequence at every match and returns the pieces as a list; the matched characters themselves are dropped
If I try it with
sequence = 'LTRPTGKJHIKPTHHKTTGHV'
it returns
peptides = ['LTRPTG', 'HIKPTHH', 'TGHV']
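To keep the K/R at the end of each peptide (and the residue that follows it), one option is a zero-width pattern built from a lookbehind and a lookahead, so that the match marks a position instead of consuming characters. This is only a sketch: the names CUT_SITE and digest are made up for illustration, and it assumes the simple rule from the question (cut after K or R unless the next residue is P).

```python
import re

# Zero-width cut site: the previous residue is K or R and the next one is not P.
# The pattern matches a position only, so no residues are lost.
CUT_SITE = re.compile(r'(?<=[KR])(?!P)')

def digest(sequence):
    """Return the peptides of `sequence` (hypothetical helper, not a library function)."""
    # Positions where the sequence is cut.
    cuts = [m.start() for m in CUT_SITE.finditer(sequence)]
    bounds = [0] + cuts + [len(sequence)]
    # Slice between consecutive boundaries; skip the empty slice that appears
    # when the sequence itself ends in K or R.
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]

print(digest('LTRPTGKJHIKPTHHKTTGHV'))
```

Using finditer on positions, rather than split, avoids relying on how a given Python version treats empty matches in split.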
I think your code doesn't work because "n" is not defined before the "if ( (s[i]=='K' or s[i]=='R')and (s[i+1]!='P')):" line, and even if you define "n" before the if, slicing "s" between "n" and "i+1" won't give you the proper fragment: after each cut "n" is set to "i-1" instead of "i+1", so the next fragment starts two residues too early.
I made an adaptation of your code that uses two loops: the first one finds the cut points, and the second one makes the cuts and stores the fragments in "f":
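def main(s):
    cuts = [0]
    f = []
    # First loop: record every cut point.
    for i in range(len(s) - 1):
        if (s[i] == 'K' or s[i] == 'R') and s[i + 1] != 'P':
            cuts.append(i + 1)  # trypsin cuts after K/R, so the next fragment starts at i + 1
    cuts.append(len(s))
    # Second loop: slice the sequence between consecutive cut points.
    for a, b in zip(cuts, cuts[1:]):
        f.append(s[a:b])
    print(f)

main("LTRPTGKJHIKPTHHKTTGHV")
That prints the following list:
['LTRPTGK', 'JHIKPTHHK', 'TTGHV']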
If you want to optimise this code, you can merge the two loops into one, as you attempted in your code, but I think this way of writing it is more intuitive and easier to read (at least for me).
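For what it's worth, the single-loop version could look roughly like the sketch below. It follows the loop from the question, with the start index initialised to 0, moved to i + 1 after each cut, and the remaining tail appended at the end. The function name trypsin_digest and the variable names are my own, not from the original script.

```python
def trypsin_digest(s):
    """Single-pass sketch; the name trypsin_digest is hypothetical."""
    fragments = []
    start = 0  # index where the current fragment begins
    for i in range(len(s) - 1):
        # Cut after K or R unless the next residue is P.
        if s[i] in 'KR' and s[i + 1] != 'P':
            fragments.append(s[start:i + 1])  # the fragment includes the K/R itself
            start = i + 1                     # the next fragment begins right after the cut
    if start < len(s):
        fragments.append(s[start:])  # whatever remains after the last cut
    return fragments

print(trypsin_digest("LTRPTGKJHIKPTHHKTTGHV"))
```

The final append matters: without it the last peptide (everything after the last cut) is silently dropped.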